What is SlimLib?
SlimLib is a class library for developing Big Memory applications on .NET. Big Memory applications use main memory (RAM) to store and process data sets. Modifications are made in memory and persisted to disk, while reads are served mostly from memory.
Yet another in-memory NoSQL data store?
No. SlimLib is not another in-memory store like Redis, Oracle TimesTen or SAP HANA. SlimLib is a “real” Big Memory system. “Real” means that the objects (class instances) are the data, not a temporary representation of data stored in a separate memory space. For this technical reason, it can be faster than any in-memory embedded DBMS or caching system: stored data is never transformed into objects, and objects are never transformed back into storable data. In other words, the data is always instantly available as full-featured objects, without any pre-processing, because the objects physically are the data.
Yet another POCO object persistence layer?
No. SlimLib is a low-tech library that redefines the way an object is managed in memory, because .NET cannot natively manage billions of instances of a class without running into serious garbage collector problems. With SlimLib, you can call new MyBusinessEntity() billions of times. You can then add these instances to a persistent, transactional, optionally replicated in-memory repository and establish billions of constrained relationships between these objects. Because it avoids any data transformation (serialization and/or normalization of each object field), SlimLib can saturate the fastest SSDs and networks. From the developer’s point of view, performance is close to that of POCO objects managed in a simple ConcurrentDictionary<TKey, TValue>.
A hell of complexity for more performance?
No. SlimLib is easy to use. It introduces a new strategy for managing persistent objects in C#, aiming at a new level of performance and code simplicity. The intention is to spend more computing power on core business data processing and less on data access, transformation and transportation. It also reduces code complexity, all at a limited cost. To scale, you simply add more RAM, more CPU, more replicated nodes and a faster LAN switch. As long as you do not reach the hardware limits, getting the same scalability with a classical distributed DBMS approach is far more expensive and complex. Up to a certain point, vertical scaling is cheap and much easier than horizontal scaling. SlimLib can contribute to a new step, in the spirit of Moore’s law, in the performance of data processing applications.
A progressive merge
Over the last decades, RAM prices have fallen. The computer industry is gradually merging working memory (RAM) and storage memory (disks). Today, many low-cost servers have 8 to 32 gigabytes of RAM (Random Access Memory, the main memory). A mid-range server often has from 64 to several hundred gigabytes of RAM. Many production databases are smaller than that. Many business databases could be managed entirely in RAM in the form of simple C# objects. Business code could be a lot simpler too. The Big Memory approach makes it possible to do more work with fewer servers and simpler code, which means being more scalable both in terms of complexity and in terms of throughput.
This is SlimLib’s main goal. SlimLib permits the creation, updating and storage of billions of contextually immutable, persistent objects (class instances) in a transactional, real-time, distributed repository. What makes SlimLib new and unique is that the objects are the data, not a temporary copy of the data as they usually are. Accessing these objects and their field values is, in most cases, nearly as fast as accessing any native .NET class instance.
Database management systems, such as MS SQL, Oracle, MySQL, PostgreSQL, MongoDB or Redis, were designed under technical constraints that are becoming less and less relevant. They still have advantages over the Big Memory approach: they are generic, sometimes standardized (SQL), their instances are independent of applications and usually shared on a network, and there is no delay for pre-loading data into RAM. This separation between application and database has strong benefits. But the drawbacks are real too:
- Strong programming constraints on accessing and modifying data.
- Limited integration with programming languages, despite sophisticated ORMs.
- Writing massively parallel, concurrent data processing code is tricky.
- Poor performance compared to pure in-memory object processing.
- They need connectors and transport layers that slow applications down.
- Solid performance is achieved only by programming the database engine itself with stored procedures, complex queries or extension assemblies.
Today, developers pay little attention to these limits because they have always lived with them. The benefits seem to outweigh the drawbacks, and performance looks good. The culture has been shaped by decades of on-disk DBMS use. There are technologies that reduce the impedance mismatch with OOP (NHibernate, Entity Framework) and strong network infrastructure that keeps things stable and decently quick. If you work with Entity Framework, you get good language integration, but you pay for it in performance and memory consumption. If you need performance, you end up with scattered code (in-database logic written in specific languages), a more complex design, and code that is less maintainable and less extensible. Achieving high performance is hard work compared to in-memory data processing. In the next decade, the perception of these constraints may change. With the rise of SSDs that permit a 3 GB/s preload rate (between 5 and 6 minutes to read one terabyte of data), the Big Memory approach will shine for complex, feature-rich, scalable applications that process small and mid-size data sets, up to terabytes.
A promising approach
Access to main memory is really fast: latency is very low and throughput is very high. On modern, up-to-date hardware, accessing a piece of data is around 1,400 times faster in RAM than on an NVMe SSD. The comparison with a 10 ms query against a DBMS (local or shared on a network) is worse: RAM is around 150,000 times faster. Ratios vary with configuration, of course, but overall you will find the same order of magnitude of difference between Big Memory applications and DBMS-based ones. This is why RAM-based systems of all kinds, even in Big Data, will develop rapidly in the years to come. It is also why all major DBMS vendors have developed in-memory systems, and why so much money has been invested in a more robust version of Redis.
A new way
SlimLib is a low-tech, lightweight technology for server application development that takes a new path. In the .NET world, the LINQ query language was a huge advance. Many developers use it with object collections (LINQ to Objects). Inspired by the functional world, it will keep evolving to produce better business code. Combined with an in-memory approach like SlimLib’s, code is simpler and applications can be a lot faster. You can enumerate tens of millions of records per thread per second on a commodity server. You don’t have to pay attention to lazy loading because the objects are already there. This removes the need to deal with the black box of a separate data management system. You simply start the application, load the data as pure objects and process it at full speed, like any other collection of in-memory objects.
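As a minimal sketch of what “processing data like any other collection of in-memory objects” looks like, the query below runs LINQ to Objects over plain class instances. The Invoice class and the InvoiceQueries helper are hypothetical names made up for this illustration; they are not part of SlimLib, and a real SlimLib collection may expose a different API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical business entity: a plain C# class, not a SlimLib type.
public sealed class Invoice
{
    public int CustomerId { get; set; }
    public decimal Amount { get; set; }
    public bool Paid { get; set; }
}

public static class InvoiceQueries
{
    // Total unpaid amount per customer, computed directly over in-memory
    // objects: no connection, no query language, no mapping layer.
    public static Dictionary<int, decimal> UnpaidByCustomer(IEnumerable<Invoice> invoices) =>
        invoices
            .Where(i => !i.Paid)
            .GroupBy(i => i.CustomerId)
            .ToDictionary(g => g.Key, g => g.Sum(i => i.Amount));
}
```

Calling InvoiceQueries.UnpaidByCustomer(invoices) on any in-memory sequence returns one total per customer that has at least one unpaid invoice.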
First, with larger main memory becoming widely available at low cost, the SlimLib Big Memory approach could become a serious option for developing any small to mid-size server application. Its benefits in code simplicity make it attractive not only for real-time, performance-critical systems, but for any business application. If you manage a set of hot data ranging from a few gigabytes to a few hundred gigabytes, SlimLib should be one of the easiest technologies to work with, replacing MySQL, MS SQL Express, Redis, embedded database engines or in-memory caches. You don’t have to learn and master any language or system other than the ones you already know, namely C# and LINQ.
But SlimLib is also a modular system:
- You can use only the Ghost classes. You can create hundreds of millions of these objects in one process and manage them in standard .NET collections.
- You can use a non-persistent repository as a caching system for a classical DBMS, including SQL ones. You can use it as a big cache and store the updates in a classical RDBMS.
- You can use SlimLib real-time replication to build a distributed data or messaging system. This can be volatile or persistent. If you use SlimLib as a persistent messaging system, your messaging system will be crash-proof: the whole system can stop abruptly and reboot without losing messages.
The starting points
The transportation overhead problem
With a separate in-memory DBMS or caching system, you have to transport, decode and encode your data in a costly process. Every query must be processed server-side, and the result set is transformed into class instances for the business code. Once processed, the objects are pushed back to the server. The fact is that the business code cannot directly access the data where it is stored. In production, most business logic takes only a fraction of the total time needed to process each user event or command. Most of the processing time is spent transporting, loading and storing data. Computing a bill’s total amount with many rules is fast; data access latencies make it slow. Developers must anticipate all the data they need directly in the initial query, because lazy loading has to be avoided.
The garbage collector freeze problem
.NET is a great platform, perhaps the most productive and robust one. You can try to create a big repository of POCOs (Plain Old C# Objects) in RAM. But the biggest problem is that if you create collections made of millions of objects with strings and other references to sub-instances, the Garbage Collector will freeze your process for seconds to minutes during memory collection and compaction. This is not compatible with production availability requirements. Each time you create a new object, which is really fast, you trigger a deferred analysis to determine whether it can be collected or not. The Garbage Collector does not like long-lived objects. The more instances you create, the slower your application gets: under stress tests, your system seems anemic while significant power is spent releasing tons of intermediate objects and tracking long-lived ones. Garbage Collector tuning does not help. The reality is that if you have a .NET process that takes a few gigabytes of RAM, you are already in the danger zone. Having a lot of RAM is useless with .NET, or a real danger if you try to use it.
This is a major bottleneck. The problem can be even more pronounced in long-running application servers. The Large Object Heap becomes more and more saturated as the server retains hot data.
The read / write ratio opportunity
Many of today’s applications read data far more often than they write it. The ratio is usually one write for tens to thousands of reads. Rendering a page, finding data, requesting a list view or computing analytics are mostly read operations. We often transform stored raw data into objects. A programming language like C# interacts at optimal speed with objects (class instances) and structures that contain fields. Slowing down write operations (creating class instances and changing field values) in order to make accessing, reading, sending and persisting objects faster can improve the overall system. It also pushes back the limits of code complexity.
The multi-processing problem
Today’s mid-range servers have 4 to 16 cores. This fantastic power is hard to exploit. Object-oriented programming languages like C# are still based on the old paradigm of mutating data in place, one change at a time. Mutability is a serious problem: it requires critical sections to avoid side effects, and those critical sections can be so large that they cancel out the benefits of parallelization. Functional languages make it easier to parallelize data processing, but they generate many intermediate data structures that must be allocated on the stack to be processed efficiently. The result of the computation still has to be stored in a long-lived heap memory space, which brings the application back to the GC problem.
Taking these facts into account, SlimLib defines the following global goals:
- Avoid any transportation and transformation of stored raw data into the object world, and from the object world back into the raw data store, each time the data is accessed or mutated, as is done with any database system, even in-memory ones. Native objects must be the data.
- Avoid garbage collector freezes.
- Exploit the read / write ratio as an opportunity to improve overall performance: slow mutation operations down a little in order to strongly speed up every access operation.
- Ease the writing of concurrent processing code through contextual immutability (not systematic immutability, which is memory and power consuming).
To reach these goals, SlimLib improves basic mechanisms to push back major .NET bottlenecks:
- It redefines the way class instances and their fields are managed, allowing any .NET application to allocate terabytes of in-memory objects without garbage collector freezes. It does not need any custom runtime. It is written in pure, portable C# code with large unsafe parts.
- It eliminates all processing time dedicated to serialization and makes class instances globally cloneable, comparable and contextually immutable. Objects can be stored, read, sent and compressed at a speed that no serialization technique, even the best ones like protobuf, can approach.
- It eases multiprocessing programming. Each object is, or can be, fully immutable: the object itself together with the first level of its sub-objects (internal strings and arrays).
- It offers a real-life feature set for building both dictionary-style (key/value) databases and graph-oriented relational databases. It can insert, update, find and remove millions of objects and object relations per second, as fast as the .NET ConcurrentDictionary<TKey, TValue>.
- In the SlimLib code base, classes like ConcurrentQueue, Task, Monitor or Hashtable are replaced by low-contention, garbage-free versions.
- It supports on-the-fly disk reads to limit the memory footprint of large objects like blobs or files.
Coming back to native memory
SlimLib manages data “objects” in the process’s native memory. SlimLib objects are not managed by the Garbage Collector. Because they are unsafe, they need care to avoid corrupting the process memory and to avoid memory leaks. C and C++ developers are used to doing that. But using SlimLib is not a brutal return to the C or C++ memory management paradigm for everything. SlimLib allows a mix between the managed world and the unmanaged one. The majority of the application’s long-lived data can take the form of unmanaged memory blocks, while intermediate processing objects can benefit from the Garbage Collector. This compartmentalization makes it possible to take advantage of both worlds with a decent level of safety. SlimLib maintains strong coherency from the ground up between these two worlds where, in most cases, unmanaged resources are driven by managed ones. For the developer, the complexity of unmanaged memory management is hidden. Memory management (allocation, release, collection) is done in a deterministic way, driven by the managed world.
In a normal class instance, string and array fields are references to independently allocated objects. SlimLib introduces the concept of object Ghosts, or “packed objects”. A Ghost is a single block of memory that contains all the fields of a given class instance: primitive values, strings and arrays are kept in one contiguous memory block from the start to the end of the object’s lifecycle. All field values are read where they are, in the unmanaged memory space. In most situations there is no copy and no managed object creation, except for strings, for which there is currently no alternative. And when a local copy is made, its life is extremely short and the GC collects it very quickly.
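To make the idea of a packed object concrete, here is a minimal sketch, not SlimLib’s actual implementation. PackedProduct and its field layout are assumptions invented for this illustration: one native block holds a double, an int and a string, primitives are read in place, and reading the string materializes a short-lived managed copy.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical packed object. Assumed layout for this sketch:
// [0..7] double Price | [8..11] int Id | [12..15] name length in chars | [16..] UTF-16 chars
public sealed class PackedProduct : IDisposable
{
    private const int PriceOffset = 0;
    private const int IdOffset = 8;
    private const int NameLengthOffset = 12;
    private const int NameOffset = 16;

    private IntPtr _block;

    public int SizeInBytes { get; }

    public PackedProduct(int id, double price, string name)
    {
        SizeInBytes = NameOffset + name.Length * sizeof(char);
        _block = Marshal.AllocHGlobal(SizeInBytes);   // one allocation, outside the GC heap
        Marshal.WriteInt64(_block, PriceOffset, BitConverter.DoubleToInt64Bits(price));
        Marshal.WriteInt32(_block, IdOffset, id);
        Marshal.WriteInt32(_block, NameLengthOffset, name.Length);
        Marshal.Copy(name.ToCharArray(), 0, _block + NameOffset, name.Length);
    }

    // Primitive fields are read where they are, without creating managed objects.
    public int Id => Marshal.ReadInt32(_block, IdOffset);
    public double Price => BitConverter.Int64BitsToDouble(Marshal.ReadInt64(_block, PriceOffset));

    // Reading the string field has to create a short-lived managed string.
    public string Name =>
        Marshal.PtrToStringUni(_block + NameOffset, Marshal.ReadInt32(_block, NameLengthOffset));

    // Deterministic release: the garbage collector never tracks this block.
    public void Dispose()
    {
        if (_block != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_block);
            _block = IntPtr.Zero;
        }
    }
}
```

The sketch uses the safe Marshal helpers so that it compiles without the unsafe switch; a library that cares about raw speed would more likely use pointers directly, as the source says SlimLib does.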
To access a Ghost from the managed world, you need to use a Body class. A Body class contains all the code necessary to manage the Ghost’s fields (reading and writing values). Each time you create a Body class instance, a Ghost is created. This Ghost can live without a Body instance. Most Ghosts have no Body instance attached to them. Bodies are created only to manipulate Ghosts; almost all Ghosts are stored in native memory structures that are separate from the managed world.
Advantages of Ghost objects:
- Copying an object and all its fields (including arrays and strings), whatever they are, is simply a matter of allocating a memory block and copying into it. With a .NET object, you have to use a slower clone method that copies every field, string and array individually.
- Comparing two objects is a simple memory block comparison. If the two objects are not the same size, you immediately know that they are not equal. With a .NET object, you have to compare each of its fields, strings and arrays one by one.
- You can take the object’s memory block as is for persistence or for sending over the network: all strings, arrays and values are already in a serialized format. There is no serialization step anymore. Sending, writing and reading objects is extremely fast.
- You can temporarily compress the whole object at low cost.
- Processor cache pressure is lower because all the sub-structures of an object are in the same memory region, usually in the same memory page.
- Making an object immutable is a lot simpler and does not need any lock.
- There is no overhead for accessing value-type fields, and only a small one for strings and arrays. That small overhead is still a good trade: small local computations are more efficient than random memory accesses.
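The copy and comparison advantages above can be sketched in a few lines. This is only an illustration: BlockOps is a hypothetical helper, and managed byte spans stand in for the native memory blocks that SlimLib is described as using.

```csharp
using System;

public static class BlockOps
{
    // Cloning a packed object: one allocation plus one block copy,
    // regardless of how many strings or arrays the block contains.
    public static byte[] Clone(ReadOnlySpan<byte> block) => block.ToArray();

    // Equality: blocks of different sizes can never be equal; otherwise
    // a single linear memory comparison decides.
    public static bool AreEqual(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b) =>
        a.Length == b.Length && a.SequenceEqual(b);
}
```

With ordinary .NET objects, the equivalent operations would need a per-class clone method and a field-by-field Equals implementation.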
Drawbacks and limitations:
- SlimLib does not auto-serialize a complete object graph, only the first level of unmanaged fields. However, the majority of database schemas are flat designs where rows contain primitives, strings and arrays.
- Modifying a variable-size field requires a memory copy and often a reallocation.
- Getting the value of a string field requires creating a new, short-lived string object.
- The definition of the objects’ memory topology (field alignment) and endianness is captured during persistence operations. If you want to load objects from a file on a platform that does not support the alignment and / or endianness of the platform that generated it, you will face problems: the data will be unreadable and will appear corrupted. Currently, most hardware supports x64 data structures under .NET, even ARM architectures.
What does “contextual immutability” mean?
Immutability means that any state change of an instance is made in a new memory space. In the C# world, the String class is the most common immutable type. You cannot modify a string; you can only build a new string from older ones. Immutability is great when many threads read the same data: modifying data does not change what the other threads see. In functional programming languages, immutability is global, as it is for the String class in C#. This kind of systematic immutability is necessary neither in a data management context nor in symmetric processing.
In a data processing system, you only need to guarantee that each thread accesses a stable version of a given piece of data. SlimLib implements something like an MVCC mechanism (Multi-Version Concurrency Control). The shared state of an instance (the one seen by all threads) is stable because that instance is not linked to any transaction. When a transaction starts to modify an instance, the instance is duplicated. While the duplicate is being modified, possibly many times within the same transaction, it is no longer immutable. In SlimLib, this immutability switch is handled at the memory management level. Instance immutability is contextual: it applies only to the state shared across threads.
This kind of immutability is faster than the global immutability established as a general rule in functional languages, because it does not imply a systematic “copy on write” internal mechanism. Imagine you have a 1 KB class. If you modify a single byte 4 times, systematic copy-on-write generates 4 allocations and 4 KB of memory copy. With SlimLib, if these modifications are made by the same thread in a single transaction, you generate only 1 allocation and 1 KB of copy.
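The arithmetic above can be illustrated with a small single-threaded sketch. VersionedBlock is a hypothetical class written for this example and is not SlimLib’s implementation: the shared version is never mutated in place, the first write of a transaction copies the block once, and later writes in the same transaction mutate that private copy.

```csharp
using System;

// Hypothetical sketch of contextual immutability (not thread-safe).
public sealed class VersionedBlock
{
    private byte[] _shared;                          // stable version seen by readers
    private byte[] _working = Array.Empty<byte>();   // private copy of the open transaction
    private bool _inTransaction;

    // Number of block copies made on behalf of writers.
    public int CopyCount { get; private set; }

    public VersionedBlock(byte[] initial) => _shared = (byte[])initial.Clone();

    public byte ReadShared(int index) => _shared[index];

    public void Write(int index, byte value)
    {
        if (!_inTransaction)
        {
            // First write of the transaction: duplicate the block once.
            _working = (byte[])_shared.Clone();
            _inTransaction = true;
            CopyCount++;
        }
        // Subsequent writes mutate the private copy in place.
        _working[index] = value;
    }

    public void Commit()
    {
        if (!_inTransaction) return;
        _shared = _working;      // publish the new version; the old block is left untouched
        _working = Array.Empty<byte>();
        _inTransaction = false;
    }
}
```

Writing the same byte of a 1 KB block four times and then committing leaves CopyCount at 1, whereas a systematic copy-on-write scheme would have copied the block four times.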
Note that SlimLib can also generate globally immutable classes if you need that kind of immutability in a particular context.
The Repository pattern
SlimLib implements a standard Ghost Repository. Each Ghost has metadata: a type, a version of its type and a GUID as identifier. Thanks to the GUID, the repository does not have to generate identifiers. You can create a Body anywhere in the code and manipulate it like a real POCO: modify its properties, pass it as a parameter, store it in collections, and so on. When you want to make it persistent, you add its Ghost to a Repository within a transaction and commit the transaction. The repository has a table for each Ghost type and manages relationships between Ghosts.
The repository makes it possible to establish relationships between Ghosts, as is done in an RDBMS with join tables. Relations let you create lists of Bodies inside a Body, with various options:
- Constrain relations to define 0-1, 1-1, 1-N or N-N cardinality.
- Enumerate collections of Bodies with one Body created for each Ghost, or with only one Body.
- When you delete an object, all its relations are deleted: you cannot have invalid relations.
- Add lifecycle constraints to forbid the deletion of objects that are linked through particular relations.
- Define cascading deletion.
- Forbid multiple relations of a given type.
All these constraints are checked during transaction commits, as a consistency check.
All modifications of the Body tables (insert, delete, update of Bodies) and of the relations between Bodies are made in isolated ACID transactions. All mutations are stored in a log file on disk. Multiple transactions can be open at the same time. Transactions are isolated: a transaction keeps seeing the same state while other transactions commit mutations. Transactions can be of two types:
- Concurrent: modifications are overwritten. If multiple transactions modify the same Ghosts, the resulting state is that of the last committed transaction.
- Exclusive: two transactions cannot modify the same Ghosts. If an Exclusive transaction tries to modify a Ghost that has already been modified by another transaction, an exception is thrown.
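The difference between the two transaction types can be sketched with a toy, single-threaded model. ToyRepository, ToyTransaction and TxMode are hypothetical names and do not reflect SlimLib’s real API. The sketch also assumes that only Exclusive transactions check for conflicting writers, which the description above does not spell out.

```csharp
using System;
using System.Collections.Generic;

public enum TxMode { Concurrent, Exclusive }

// Toy model (not thread-safe): shared state plus the set of open
// transactions that have pending writes on each identifier.
public sealed class ToyRepository
{
    private readonly Dictionary<Guid, string> _shared = new Dictionary<Guid, string>();
    private readonly Dictionary<Guid, HashSet<ToyTransaction>> _pending =
        new Dictionary<Guid, HashSet<ToyTransaction>>();

    public ToyTransaction Begin(TxMode mode) => new ToyTransaction(this, mode);

    public bool TryRead(Guid id, out string value) => _shared.TryGetValue(id, out value);

    internal void RegisterWrite(ToyTransaction tx, Guid id)
    {
        if (!_pending.TryGetValue(id, out var writers))
            _pending[id] = writers = new HashSet<ToyTransaction>();

        // Assumption: an Exclusive transaction fails as soon as another
        // open transaction has already modified the same entry.
        bool otherWriter = writers.Count > (writers.Contains(tx) ? 1 : 0);
        if (tx.Mode == TxMode.Exclusive && otherWriter)
            throw new InvalidOperationException("Entry already modified by another transaction.");

        writers.Add(tx);
    }

    internal void Apply(ToyTransaction tx, Dictionary<Guid, string> writes)
    {
        foreach (var write in writes)
        {
            _shared[write.Key] = write.Value;   // the last committed transaction wins
            _pending[write.Key].Remove(tx);
        }
    }
}

public sealed class ToyTransaction
{
    private readonly ToyRepository _repository;
    private readonly Dictionary<Guid, string> _writes = new Dictionary<Guid, string>();

    public TxMode Mode { get; }

    internal ToyTransaction(ToyRepository repository, TxMode mode)
    {
        _repository = repository;
        Mode = mode;
    }

    // Writes stay private to the transaction until Commit.
    public void Write(Guid id, string value)
    {
        _repository.RegisterWrite(this, id);
        _writes[id] = value;
    }

    public void Commit()
    {
        _repository.Apply(this, _writes);
        _writes.Clear();
    }
}
```

In this model, two Concurrent transactions that write the same identifier both succeed and the one that commits last determines the shared value, while an Exclusive transaction writing to an identifier that another open transaction has touched gets an InvalidOperationException.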
Transactions can be committed in synchronous or asynchronous mode. In synchronous mode, a new transaction opened immediately after a commit can read the mutations of the previous transaction. Otherwise, everything is asynchronous. Asynchronous mode is faster and makes it possible to handle a large number of concurrent transactions.
Transactions can be replicated over multiple Repositories, in multiple processes, on multiple machines, in a Single Master / Multiple Slaves model. A copy of a Repository that is synchronized with another Repository is a Replica. To enable replication, a description of the various hosts (endpoints, processes and machines) must be provided to the constructor. Slaves can write and react exactly like the Master. Master and Slave client code is exactly the same. If a Slave Replica is disconnected from the Master, all missing transactions are transferred and replayed when the connection comes back.
To synchronize the various nodes, a transaction can explicitly lock a Ghost at global scale. If this operation is performed on a Slave Replica, it requires network communication with the Master. Taking a lock is a model design decision, made to ensure the coherency of critical data. Transactions made by a Slave Replica become visible once they have been sent to the Master and the Master has sent them back for replay on all Replicas. Slave transaction validation is based on a Master-executes-first loop: the mutations of any write transaction made by a Slave are sent to its Master and, once they are completely applied on the Master, the Slave receives its own transaction mutations and applies them locally. This mechanism provides global coherency of the data:
- Only transactions validated on the Master are replicated on Slaves, in exactly the same order and with consistent transaction identifiers.
- Desynchronized Slaves are resynchronized when they reconnect to the Master. A Slave isolated from the Master cannot see the results of its own transactions.
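The Master-executes-first loop can be sketched as a toy in-process model with no networking. ToyMaster and ToyReplica are hypothetical names, the single key/value mutation per transaction is a simplification, and none of this reflects SlimLib’s real replication API.

```csharp
using System;
using System.Collections.Generic;

// Toy Master: orders transactions, logs them, and replays them on every
// attached Replica in the same order with the same identifier.
public sealed class ToyMaster
{
    private readonly List<ToyReplica> _replicas = new List<ToyReplica>();
    private readonly List<(long Id, string Key, string Value)> _log =
        new List<(long Id, string Key, string Value)>();

    public void Attach(ToyReplica replica)
    {
        _replicas.Add(replica);
        // A reconnecting Replica first replays the transactions it missed.
        foreach (var entry in _log)
            if (entry.Id > replica.LastAppliedId)
                replica.Replay(entry.Id, entry.Key, entry.Value);
    }

    public void Detach(ToyReplica replica) => _replicas.Remove(replica);

    public long Submit(string key, string value)
    {
        long id = _log.Count + 1;            // the Master assigns the transaction identifier
        _log.Add((id, key, value));
        foreach (var replica in _replicas)   // including the Replica that submitted it
            replica.Replay(id, key, value);
        return id;
    }
}

public sealed class ToyReplica
{
    private readonly Dictionary<string, string> _state = new Dictionary<string, string>();

    public long LastAppliedId { get; private set; }

    // A Replica never applies its own write directly: it goes to the Master first.
    public void Write(ToyMaster master, string key, string value) => master.Submit(key, value);

    internal void Replay(long id, string key, string value)
    {
        if (id != LastAppliedId + 1)
            throw new InvalidOperationException("Transactions must be replayed in order.");
        _state[key] = value;
        LastAppliedId = id;
    }

    public bool TryRead(string key, out string value) => _state.TryGetValue(key, out value);
}
```

A Replica’s local state changes only inside Replay, so every attached Replica applies the same transactions in the same order, and a detached Replica catches up from the Master’s log when it is attached again.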
SlimLib lets you define two types of triggers:
- CommitTriggers: they are called while the transaction is being committed.
- SharedTriggers: they are called when the shared state is modified, either while replaying a transaction from another Replica or when a local transaction is finalized.
CommitTriggers are used to add modifications to the current transaction. SharedTriggers are used to update indexes or to initiate linked local operations.
SlimLib is made for asynchronous and parallel processing. Although you can do everything in synchronous, serialized mode to maintain strong state coherency, the best way to distribute processing is never to try to read back what has just been written.
Real-life benchmarks
DBMS benchmarks often lie. For remote high-performance DBMSs like Redis, benchmarks usually measure the performance of the network or communication pipes more than that of the engine. For in-memory systems like FASTER or LMDB, or any in-memory key/value store, benchmarks may end up measuring the efficiency of Equals(), GetHashCode() or the serialization code. In many benchmarks, the performance of the external code becomes as important as that of the DBMS itself. That’s why benchmarking overall performance by inserting long keys and tiny fixed-size values into a DBMS is close to a lie: real-world performance will be far from these impressive results and will depend on the performance of the surrounding code. Benchmarking a key/value system with two tiny byte arrays as key and value proves nothing about the performance of the final application.
This is where SlimLib is different. The data injected into the SlimLib persistence system consists of full-featured Ghosts, each a single block of memory: they need no additional processing to be managed by the repository. For this reason, benchmarks are more meaningful. SlimLib benchmarks can be real-world ones, with complex objects that have various fields and relations. They are designed to be relevant for estimating the performance you will get in real-world production applications.
Remember that you do not have to “load” anything: all Ghosts are available in memory at all times, like native objects in a Dictionary.
Transactions per second
With a single thread on a standard 4-core Intel i7 processor (3.0 GHz), you can outperform most database engines:
- Create: you can create more than one million Body objects with their Ghosts per second.
- Insert / delete: you can insert more than 2 million pre-constructed Ghost instances per second, and run series of inserts, updates and deletes at 3 million operations per second or more.
- Lookup: you can find more than 7 million objects per second in the repository.
- Enumeration: enumeration can be done at 15 million objects per second.
Because of internal sharding, the more threads are working, the more throughput can be reached.
SlimLib is the result of a decade of research and practice by me, Gabriel RABHI, in the field of in-house data management systems. My background is in video game programming, graphical user interface engines, multimedia application development, and a full-stack proprietary web engine for social networks on .NET (Facebook-like).
At this time, SlimLib is not a full-featured, optimized system with strong abstractions, full tooling and cloud-ready distribution. It is more an introduction, a sample, of the set of techniques that will underlie a full-featured, easy-to-use, well-tooled, production-ready version to come. It will evolve gradually over the next months with the goal of building a production-ready DBMS.
You can test it, contribute, fund it, or wait and see.