I interned with the Data Platform team at AppD’s Bangalore office in 2018. The team is responsible for ingesting and storing the metrics sent by agents running on client servers, and it processes millions of metrics per minute on its staging server.
When I first arrived, the team’s tech stack, and even Java itself, was new to me. The code base was large, and for quite some time I was lost in non-essential details. There were several smaller pain points as well: YAML’s indentation is particularly unforgiving, and as an intern it isn’t possible to have full context on every library you are using or changing. I also hadn’t expected the delays involved in getting access to staging machines, help from other teams, and so on. Watching my mentor debug errors quickly made me appreciate the value he brought both to his own work and to my internship.
My Project
The figure above shows the architecture maintained by the Data Platform team. Metric data packets are sent by agents on client machines to an API gateway, which publishes the packets to a Kafka queue. A metadata annotator reads packets from this queue, validates them, enriches them with metadata, and caches the relevant metadata in memory. These caches are warmed up gradually by querying metadata from remote services and databases. After processing, each packet is published to another queue. If a packet isn’t processed within the specified SLA, it is dropped.
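To make the data flow concrete, here is a minimal, hypothetical sketch of that consume-annotate-produce loop in Java. The topic names, consumer group, and SLA value are assumptions for illustration, not the team’s actual configuration.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Hypothetical sketch of the annotator's consume-annotate-produce loop. */
public class MetadataAnnotatorLoop {
    private static final long SLA_MILLIS = 60_000; // assumed SLA, for illustration only

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "kafka:9092");
        consumerProps.put("group.id", "metadata-annotator");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "kafka:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("raw-metrics"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // Packets that have already missed the SLA are dropped instead of enriched.
                    if (System.currentTimeMillis() - record.timestamp() > SLA_MILLIS) {
                        continue;
                    }
                    byte[] enriched = annotate(record.value());
                    producer.send(new ProducerRecord<>("annotated-metrics", record.key(), enriched));
                }
            }
        }
    }

    private static byte[] annotate(byte[] packet) {
        // Placeholder for validation plus metadata lookups against in-memory caches
        // that are warmed lazily from remote services and databases.
        return packet;
    }
}
```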
Since all the metadata caches in the annotator are memory-based, they have to warm up again after partition rebalancing or service restarts. Throughput dips during this warmup and we miss the SLA for a brief period. While there are different ways to fix the SLA drop, we took inspiration from the state management solutions used in stream processing and decided to use RocksDB as a persistent cache for the metadata. The plan was to replace an in-memory cache with RocksDB while matching the existing steady-state performance. A secondary goal was to reduce the memory footprint of the annotator.
Implementation: Making Changes Locally
We first wrote a wrapper over RocksJava, the JNI binding for RocksDB. Depending on the metadata type, we used different serialization primitives. RocksDB exposes over 120 tunable parameters, so we initialized them based on the implementation given here. RocksDB’s built-in TTL feature was used to evict stale entries. We prioritized making the smallest possible modification to the existing code for ease of testing. A code-flow analysis of how caching was done showed that all the existing caches implemented Guava’s cache interface, so we implemented the same interface and added a boolean flag to control whether a particular cache would be flash-based or not. We added unit tests as needed, tested locally, published a Docker image with our changes, and created a new Helm chart so we could test them in staging.
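Below is a rough sketch of what such a wrapper could look like, using RocksJava’s TtlDB and Guava’s AbstractCache. The class name, the codec parameters, and the error handling are my own simplifications for illustration, not the actual implementation.

```java
import com.google.common.cache.AbstractCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.TtlDB;

import java.util.function.Function;

/**
 * Hypothetical sketch of a flash-backed cache that plugs into the existing Guava
 * cache call sites. Key/value codecs are injected because the cached metadata
 * types varied (Protobuf objects, Java primitives, ...).
 */
public class RocksDbCache<K, V> extends AbstractCache<K, V> {

    private final TtlDB db;
    private final Function<K, byte[]> keyCodec;
    private final Function<V, byte[]> valueSerializer;
    private final Function<byte[], V> valueDeserializer;

    public RocksDbCache(String path,
                        int ttlSeconds,
                        Function<K, byte[]> keyCodec,
                        Function<V, byte[]> valueSerializer,
                        Function<byte[], V> valueDeserializer) throws RocksDBException {
        RocksDB.loadLibrary();
        Options options = new Options().setCreateIfMissing(true);
        // RocksDB's built-in TTL lazily evicts entries older than ttlSeconds during compaction.
        this.db = TtlDB.open(options, path, ttlSeconds, /* readOnly = */ false);
        this.keyCodec = keyCodec;
        this.valueSerializer = valueSerializer;
        this.valueDeserializer = valueDeserializer;
    }

    @Override
    @SuppressWarnings("unchecked")
    public V getIfPresent(Object key) {
        try {
            byte[] raw = db.get(keyCodec.apply((K) key));
            return raw == null ? null : valueDeserializer.apply(raw);
        } catch (RocksDBException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void put(K key, V value) {
        try {
            db.put(keyCodec.apply(key), valueSerializer.apply(value));
        } catch (RocksDBException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public void invalidate(Object key) {
        try {
            db.delete(keyCodec.apply((K) key));
        } catch (RocksDBException e) {
            throw new RuntimeException(e);
        }
    }
}
```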
The existing metadata annotator was stateless. We created a new service (λ) with the same functionality as the metadata annotator, but stateful. Using node affinity and anti-affinity rules in Kubernetes, we ensured that at most one pod of λ was scheduled per node. A separate Kafka consumer group was created for the pods of λ so that λ would read the same data as the metadata annotator, and we changed the topic that λ produced to so that downstream relays weren’t affected. We then deployed λ to the staging cluster.
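On the Kafka side, this isolation amounts to two configuration differences, sketched below with made-up topic and group names: a separate consumer group, so λ re-reads the same input stream as the annotator rather than splitting partitions with it, and a distinct output topic, so existing downstream consumers are untouched.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of how λ was isolated from the existing annotator on the
 * Kafka side. Topic and group names are assumptions for illustration.
 */
public class LambdaKafkaConfig {
    static final String INPUT_TOPIC = "raw-metrics";                // shared with the annotator
    static final String OUTPUT_TOPIC = "annotated-metrics-rocksdb"; // distinct from the annotator's output

    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        // A new consumer group keeps its own offsets, so λ consumes the full stream
        // in parallel with the annotator instead of sharing partitions with it.
        props.put("group.id", "metadata-annotator-rocksdb");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        return props;
    }
}
```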
Up to this point, I had only encountered minor errors. However, after deploying λ on staging, several issues came up. First, the lag kept increasing. Second, CPU usage was higher than expected: a single pod of λ used up to six times as much CPU as the existing metadata annotator. We decided to dig deeper into the RocksDB parameters to optimize further. All of this took up the first half of my internship; I spent the second half improving λ’s performance.
Performance Tuning
We started out by analyzing what kind of workload we required from RocksDB: read-heavy, write-heavy, or mixed. We realized we had a read-heavy workload with infrequent writes, and the values being written were on the order of kilobytes. After reading up on RocksDB internals, we made changes to optimize for this workload. We enabled Bloom filters and kept them in memory at all times. Instead of using binary search to find a key’s exact location within a file, we switched to a hash-based index. We also disabled compression, checksum checks, write-ahead logging, and statistics gathering, and we increased RocksDB’s in-memory cache for frequently accessed data from the SSD by two orders of magnitude. All these changes reduced CPU usage and read amplification. However, we were still getting only about 70% of the performance of the existing metadata annotator. To improve further, we analyzed λ using various profiling tools.
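The sketch below shows roughly how these settings map onto RocksJava options. The specific values (bits per key, prefix length, cache size) are illustrative assumptions, not the exact production configuration.

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.CompressionType;
import org.rocksdb.IndexType;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.ReadOptions;
import org.rocksdb.WriteOptions;

/** Hypothetical sketch of the read-heavy tuning described above. */
public class RocksDbTuning {

    static Options buildOptions() {
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
                // Bloom filters let reads skip SST files that cannot contain the key ...
                .setFilterPolicy(new BloomFilter(10, false))
                // ... and filter/index blocks are kept in memory rather than re-read from SSD.
                .setCacheIndexAndFilterBlocks(true)
                .setPinL0FilterAndIndexBlocksInCache(true)
                // Hash-based lookup within an SST file instead of binary search.
                .setIndexType(IndexType.kHashSearch)
                // Much larger block cache so hot data is served from memory (assumed 2 GiB).
                .setBlockCache(new LRUCache(2L * 1024 * 1024 * 1024));

        return new Options()
                .setCreateIfMissing(true)
                .setTableFormatConfig(tableConfig)
                // The hash index needs a prefix extractor; an 8-byte prefix is an assumption.
                .useFixedLengthPrefixExtractor(8)
                // Values are small and CPU was the bottleneck, so compression is disabled.
                .setCompressionType(CompressionType.NO_COMPRESSION);
        // Statistics collection is opt-in in RocksJava, so it is simply never enabled here.
    }

    static ReadOptions buildReadOptions() {
        // Skip per-read checksum verification.
        return new ReadOptions().setVerifyChecksums(false);
    }

    static WriteOptions buildWriteOptions() {
        // The cache can be rebuilt from upstream services, so the write-ahead log is disabled.
        return new WriteOptions().setDisableWAL(true);
    }
}
```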
We used Perf, Oa (an internal AppD tool), and thread dumps to find out where most of the CPU time was being spent, which code flows were taking the most time, and what exceptions were being generated.
From the profiling and code-flow analysis, we realized there was a mistake in our serialization/deserialization procedure. Looking at the code flow for the calls that took the most time, we observed that λ was always falling back to the Java (de)serializer, which should not have happened since the data types we were caching were either Protobuf objects or Java primitives. The thread dumps and Perf reports confirmed that a significant share of CPU time was going into (de)serialization. After correcting this error, we achieved the following performance:
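The fix amounted to dispatching on the value’s type before resorting to Java serialization. The sketch below is a simplified illustration with made-up class and method names, not the actual code; deserialization mirrors the same dispatch in reverse.

```java
import com.google.protobuf.Message;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical type-dispatched serializer for cached metadata values. */
public final class ValueSerializer {

    static byte[] serialize(Object value) {
        if (value instanceof Message) {
            // Protobuf metadata objects already know how to encode themselves cheaply.
            return ((Message) value).toByteArray();
        }
        if (value instanceof Long) {
            return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
        }
        if (value instanceof String) {
            return ((String) value).getBytes(StandardCharsets.UTF_8);
        }
        // The bug: this expensive fallback was being hit for every value.
        return javaSerialize((Serializable) value);
    }

    private static byte[] javaSerialize(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```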
λ Performance Increases
λ achieved performance comparable to the existing metadata annotator. The figure above shows the single-packet processing time for both the annotator (labeled DIS-Metric-Processor) and λ (labeled DIS-Metric-Processor-RocksDB). Each pod of either service was serving 10 million metrics per minute. For a single pod, λ’s CPU usage still averaged 3.5 times that of the annotator (down from 6), but we cut memory usage in half. Like Kafka, RocksDB makes good use of the OS page cache, which helped its performance.
Changes That Didn’t Work Out
The staging cluster only contained network-attached SSDs. We wanted to get an idea of how λ would fare if its pods had access to local NVMe SSDs. Although there was an existing cluster in AWS, barely any packets were flowing through it, so I would have had to synthetically generate the required volume of packets and set up the other related components. Getting access, setting everything up, and making these components work together proved to be quite a challenge, especially since I didn’t know how they interacted with each other.
I faced an endless stream of errors in the various components. The experience made me appreciate the work the operations team does to keep every component running. After analyzing the hardware configuration of the staging cluster, I felt that reducing RAM usage should not have been a priority at all, and that we should have tried a memory-based RocksDB configuration in which only the log is stored on disk. However, I was unable to try this out.
I had come into the internship expecting a hackathon-type environment. On the contrary, it was a far more structured one, with thorough testing to avoid issues in production. I was pleasantly surprised to find that people weren’t stressed about the work but rather focused on doing things the right way. I got a real feel for work-related challenges in the software industry, which was a huge plus for me. Finally, a good deal of time was spent on discussions and on ironing out the pros and cons of any proposed solution before moving on to implementation.
Despite being new to the technology stack, I was never left to fend for myself. After doing my own debugging on any issue that cropped up, I could approach the other team members, all of whom had the technical knowledge to bail me out when needed.
I also got a chance to take part in a lot of fun activities at the office. The picture above is from a parody of the Powerpuff Girls, in which I played Professor Utonium. Team outings and social events gave me a chance to socialize with my colleagues and catch a glimpse of their non-technical lives.
Interning at AppD was a great learning experience, both in terms of how the industry functions and the technologies I picked up. The stress was very manageable, the team members were considerate in helping out when needed, and there were plenty of recreational activities during my time as an intern. As my internship came to a close, I was offered a job at AppD and have now joined its Bangalore office as a full-time employee.