Kerf introduces a new data structure called an “Atlas” that we think is a helpful thing to have inside of a programming language. We’ve found it very useful to have database Tables inside of a programming language, and the Atlas concept is a natural extension of that. Be careful about jumping to conclusions, though, since an Atlas does a few special things that make it unlike what you might expect. I’ll give a preview of the major points here:
- Automatic indexing
- Responds to SQL
- First-class language type
- Columnar implementation
We didn’t spend much time talking about Atlases originally, because interest in NoSQL-style computation seemed to be on the wane. Recently, however, we’ve had a lot of requests for those sorts of applications, and so a detailed description seems in order. (If you’ve come across any interesting problems like that in your work, tell us about them.)
If a Table is a way of including a SQL table inside of the language, then an Atlas is a way of including a NoSQL store: none of the rows have to have the same columns. In other words, an Atlas is a schemaless table, or simply a list of JSON objects, with keys in the objects pointing to values. The Atlas is a generalization of the Table where the rows don’t have to be uniform. In one sense, an Atlas is a sparse Table. You could also think of a Table as a vectorized Atlas.
- A regular table
- A JSON array of objects
- Awkward tabular representation of the JSON array
In Kerf we use the term “Map” to refer to what JSON calls an Object {"a":1}. So the term Atlas to refer to a List of Maps (with an Index) was an obvious choice. We use the special convenience syntax {{ and }} to create a table, and the related {[ and ]} convenience syntax to create an atlas. Both of these came about because they fit in well with JSON, and Kerf is a JSON (& SQL) superset.
1. The first thing that makes an Atlas special is that every key is indexed invisibly and by default. This means that no matter what key you query on, your query will be optimized. Atlases let you forget about indexing entirely. In a typical NoSQL store, if you want optimized queries on a key you have to issue the equivalent of a CREATE INDEX command, wait for the index to be created, incur future performance penalties, and so on. With Atlases you don’t have to fool around with any of that. The keys are already indexed, there’s no additional performance penalty, and adding a never-before-seen key doesn’t impact performance. Effectively, you can just forget about indices entirely, even for nested keys.
- A large, (mostly) random atlas
- Already optimized selects on two keys
2. The second thing that makes Atlases interesting is that you can use SQL on them and it will just work. Typical NoSQL stores force you to learn a proprietary query language. It’s a lot easier to be able to use the SQL that you already know, and then to have the output be what you would expect. Even though there’s no specification in SQL for how a NoSQL table should behave, most programmers have an intuitive sense of what should come back. A lot of crazy corner cases have to be handled to accommodate this method of thinking, but because that’s all taken care of on our end (the internals), you don’t have to worry about it at all.
- Sample query
- Multiple renamed columns
- Nested keys
- WHERE clause
- Aggregation, GROUP BY
- Integrated language calculations
3. The third thing that make Atlases special is that the data type is integrated with the language. An Atlas is a first-class data type, just like an int or an array. This means that from inside the language, you can create Atlases, query them, manipulate the results, etc. If you’re querying an old-fashioned database, you need to slurp your results in through the tiny straw of an API. But in Kerf, the language and database are the same, so the code you write works directly on the data. There’s no traversal between a language layer and a database layer. Because the two layers are integrated, so to speak, this means that you can perform “out-of-core” computations without putting special thought into it. It also means that there’s no impact to performance.
- Intermix atlases with the language
- Store and modify selections
4. The fourth thing that makes Atlases special is that they are completely and entirely columnar in order to optimize performance. The index is columnar, the keys are columnar, the values are columnar, everything is columnar. This helps with performance and also with serializing the object.
Because a few existing solutions for schemaless data don’t really perform as advertised, we’ve come across several companies who are suffering in silence with irregular data problems they don’t know how to solve. But there is a tool that exists to solve it, called an Atlas, and we package it in Kerf. If you are faced with a problem with a lot of columns, or ragged or irregular columns, or sparse data, or more than one table mixed together, or have just been disappointed in NoSQL in general, you may want to look into Kerf. Contact us for any reason at founders@kerfsoftware.com.










