How To Manage 45 Billion Client Records With Aerospike

At Aerospike's Real-Time Data Summit last week, Adjust's Bubunyo Nyavor explained how the company used Aerospike to help clients track the return on investment of their marketing channels.
Jul 1st, 2024 6:51am by
Images from Aerospike’s Real-Time Data Summit. 

When your operations outgrow the capabilities of a single database, what are your options?

For the Berlin-based mobile measurement service provider Adjust, the answer came with Aerospike, a real-time, high-performance NoSQL key-value store that can be run across multiple data centers.

At Aerospike’s Real-Time Data Summit last week, Adjust Senior Software Engineer Bubunyo Nyavor explained how the company used Aerospike to help clients track the return on investment of their marketing channels.

Adjust’s service generates an average of 52 million requests every minute. Each request can trigger an operation of some sort, such as a query, and, of course, the need to reconcile state. A customer may post material on Meta, LinkedIn, or some other social media outlet, and Adjust gathers the number of people who viewed the content and how many clicked on it.

“Depending on what operation it is, we fetch some data, we write some data. Sometimes we write in batches, sometimes delete data, and then we return a response for these requests,” Nyavor said.

Overall, the company keeps about 45 billion records in Aerospike, and these record only the states of devices. With an average of 512 bytes per record, this results in 351TB worth of data.

The data is stored in three separate clusters, located in geographically dispersed data centers. Each cluster has 64 nodes and runs on bare metal, with Gentoo Linux serving as the operating system. Each server has about 400GB of RAM, 16TB of NVMe solid-state disk space, and a 10 Gigabit network card. Either two or three copies of the data are kept as backup.

“So that if a single rack goes offline, it doesn’t send us into a tailspin,” Nyavor said.
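
The sizing figures quoted above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the constants come from the talk as reported here, while the function names and the choice of decimal terabytes are assumptions. It computes the raw payload of a single unreplicated copy of the records, the total raw NVMe capacity across all three clusters, and the average request rate per second. How the quoted 351TB total divides among copies, clusters, and indexes is not itemized in the source, so the code does not try to reproduce that number.

```python
# Figures as quoted in the article.
RECORDS = 45_000_000_000        # device-state records
AVG_RECORD_BYTES = 512          # average record size
CLUSTERS = 3
NODES_PER_CLUSTER = 64
DISK_TB_PER_NODE = 16           # NVMe capacity per server
REQUESTS_PER_MINUTE = 52_000_000

TB = 10**12  # assumption: decimal terabytes; the talk did not specify units


def raw_payload_tb(records: int, avg_bytes: int) -> float:
    """Size of one unreplicated copy of the record payload, in TB."""
    return records * avg_bytes / TB


def raw_disk_tb(clusters: int, nodes: int, disk_tb: int) -> int:
    """Total raw disk capacity across every node in every cluster, in TB."""
    return clusters * nodes * disk_tb


def requests_per_second(per_minute: int) -> float:
    """Average request rate per second."""
    return per_minute / 60


if __name__ == "__main__":
    print(f"payload, one copy: {raw_payload_tb(RECORDS, AVG_RECORD_BYTES):.2f} TB")
    print(f"raw disk, total:   {raw_disk_tb(CLUSTERS, NODES_PER_CLUSTER, DISK_TB_PER_NODE)} TB")
    print(f"requests/second:   {requests_per_second(REQUESTS_PER_MINUTE):,.0f}")
```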

A chart showing the average number of devices connecting to Aerospike.

Beyond Key-Value Stores

The Aerospike key-value store was launched in 2009 (originally as CitrusLeaf) and quickly found an audience in the online advertising industry for storing and subsequently analyzing customer cookies at rapid speed.

Subsequent releases expanded the analytics, incorporated batch processing, and introduced secondary indexes and cross-data center replication.

At the Real-Time Data Summit, Aerospike Senior Developer Experience Engineer Art Anderson discussed how Aerospike can also handle graph and vector data, which can help online shops build out recommendation systems.

For Adjust, low latency was critical. Customers wanted data updated as close to real-time as possible. This is a challenge given the cross-cluster communications.

As with any distributed system with duplicate data, Adjust must make trade-offs between consistency and availability of the data (two of the three pillars of the CAP Theorem).

In a consistent mode, accurate data will always be delivered, though it may take some time. In an availability-oriented mode, data will be returned to the requester as quickly as possible, though it may not include the most recent changes (as it takes time to propagate new data across different clusters).

Operational modes of Aerospike: Consistency and Available.

“You will get fast responses but there’s no guarantee on the freshness of the data,” Nyavor explained, especially since Adjust writes a lot more data to disk than it reads.
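
The difference between the two modes can be illustrated with a toy model. The sketch below is not how Aerospike implements either mode; the class `ToyReplicatedStore` and its methods are hypothetical names invented for this example. It keeps a primary copy and a replica that only catches up when replication runs, so an availability-style read answers at once but may be stale, while a consistency-style read pays the replication cost before answering.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplicatedStore:
    """Two copies of a key space; the replica lags until sync() runs."""
    primary: dict = field(default_factory=dict)
    replica: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # writes not yet replicated

    def write(self, key, value):
        """Accept the write on the primary and queue it for replication."""
        self.primary[key] = value
        self.pending.append((key, value))

    def sync(self):
        """Apply every queued write to the replica, in order."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_available(self, key):
        """Answer immediately from the replica; the value may be stale."""
        return self.replica.get(key)

    def read_consistent(self, key):
        """Wait for replication to finish, then answer with the latest value."""
        self.sync()
        return self.replica.get(key)


if __name__ == "__main__":
    store = ToyReplicatedStore()
    store.write("device-1", "installed")
    print(store.read_available("device-1"))   # fast, but not yet replicated
    print(store.read_consistent("device-1"))  # slower, reflects the write
```

In this toy model the first read returns nothing because the write has not yet reached the replica, which is the "no guarantee on the freshness" trade-off in miniature.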

There are several tools that help. Aerospike offers an intelligent client driver that knows which nodes on a cluster to send the requests to. The database system also allows Adjust to store secondary indexes on the speedy solid-state drives, an advantage given that it would be cost-prohibitive to store them in the servers’ main memory.

“Aerospike does sufficiently well to be able to help us take advantage of cheaper hardware,” Nyavor said.
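
The idea behind a client that "knows which nodes" to contact can be sketched as a partition map: hash the key to a partition, then look up which node owns that partition, so the request goes straight to the right server without an intermediate hop. The sketch below is a simplified illustration, not the Aerospike client's code. Aerospike's documentation describes 4,096 partitions per namespace and a RIPEMD-160 key digest; SHA-1 is used here purely as a portable stand-in, and the round-robin partition assignment and all function names are assumptions.

```python
import hashlib

N_PARTITIONS = 4096  # partition count per namespace, per Aerospike's documentation


def partition_id(key: str) -> int:
    """Map a key to a partition. SHA-1 is a stand-in hash for illustration."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def build_partition_map(nodes: list) -> dict:
    """Assign partitions to nodes round-robin (an assumption; a real cluster
    computes and rebalances ownership itself and shares the map with clients)."""
    return {pid: nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)}


def node_for_key(key: str, partition_map: dict) -> str:
    """Return the node a client would contact directly for this key."""
    return partition_map[partition_id(key)]


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(64)]  # 64 nodes per cluster, as at Adjust
    pmap = build_partition_map(nodes)
    print(node_for_key("device-123", pmap))
```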

Overall, the system handles, on average, about 1.2 million write operations per second and 2 million get operations per second.

Aerospike operations per second at Adjust.

About 50% of all requests take 500 milliseconds or less, an impressive feat given the vastness of the database itself, Nyavor said.
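
A figure like this is simply the share of requests at or under a latency threshold. As a minimal sketch, the function below computes that share from a list of latency samples; the function name and the sample values are invented for illustration and are not Adjust's data.

```python
def fraction_within(latencies_ms: list, threshold_ms: float) -> float:
    """Share of requests that completed at or under the threshold."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return within / len(latencies_ms)


if __name__ == "__main__":
    # Hypothetical samples, in milliseconds.
    samples = [120, 300, 480, 500, 650, 900, 1200, 2000]
    print(fraction_within(samples, 500))
```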

Aerospike operations under 500 milliseconds (Chart).

Scanning is one of the larger operations. It is needed to delete user records, either on request or when a customer leaves the program. Scanning an entire cluster takes about three days.

“It is a slow and intensive process because it takes a lot of resources to scan,” he said. The good news is that Aerospike can run the scan operations as a background task, temporarily suspending them when reads and writes need to be executed.
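
The general pattern of a background scan that yields to foreground work can be sketched as follows. This is a toy illustration of cooperative scheduling, not Aerospike's scheduler: the function `cooperative_scan`, its parameters, and the record layout are all hypothetical. The scan proceeds in small batches and drains any waiting foreground requests before each batch, which is why it finishes slowly but does not block reads and writes.

```python
from collections import deque


def cooperative_scan(records: dict, foreground: deque, should_delete, batch_size: int = 2):
    """Scan records in batches, serving queued foreground work before each batch.

    Returns the deleted keys and a log of the order in which work ran.
    """
    deleted = []
    log = []
    keys = list(records)  # snapshot so deleting during the scan is safe
    for start in range(0, len(keys), batch_size):
        # Foreground reads and writes always go first.
        while foreground:
            log.append(("foreground", foreground.popleft()))
        # Then the scan processes one small batch.
        for key in keys[start:start + batch_size]:
            if should_delete(records[key]):
                del records[key]
                deleted.append(key)
        log.append(("scan", start))
    return deleted, log


if __name__ == "__main__":
    records = {"a": {"user": 1}, "b": {"user": 2}, "c": {"user": 1}}
    pending = deque(["get x"])
    removed, order = cooperative_scan(records, pending, lambda r: r["user"] == 1)
    print(removed)
    print(order)
```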

How Aerospike Is Upgraded

There is still work on Aerospike that needs to be done, according to Nyavor.

For instance, the upgrade process is still fairly manual.

The process involves going through the change log to ensure nothing has been broken in the upgrade process.

But overall, the database is very configurable, and you need to understand all the options to get the most out of it, Nyavor said.

And if you don’t know something, ask. The Aerospike support team has been really helpful in answering questions, he added.

“Don’t take anything in the documentation that you don’t understand for granted, because it can snowball and bite you in the ass,” he said.

TNS owner Insight Partners is an investor in: Real.