What’s NoSQL?
Even though the name itself says little, NoSQL designates a new generation of data stores, many of them key/value stores. The movement is gaining both popularity and maturity: FOSDEM dedicated a full day and a developer room to the subject. Wikipedia defines the movement as follows: “NoSQL is an umbrella term for a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees. Data stores that fall under this term may not require fixed table schemas, and usually avoid join operations. The term was first popularised in early 2009. Trends in computer architectures are pressing databases in a direction that requires horizontal scalability. NoSQL-style data stores attempt to address this requirement. Prominent closed-source examples are Google’s BigTable and Amazon’s Dynamo. Several open-source variants exist including Facebook’s Cassandra, Apache HBase, LinkedIn’s Project Voldemort and many others.”
Check up before swapping out your relational DB
These projects look very promising and efficient. However, if you plan to adopt such technologies, you must be sure they fit your needs. A first analysis should therefore cover the following points:
- Consistency model analysis: most of these projects implement an eventually consistent model [1], which requires a repair mechanism. Does it fit your concurrency policy? For instance, it is not directly compatible with Multi-Version Concurrency Control (MVCC) combined with optimistic locking, as used by JBoss Mobicents JSLEE. Who must provide the repair mechanism: the storage engine or the clients? Does the storage you want to use let you choose? In addition, there are a number of practical refinements of the eventual consistency model, such as session-level consistency and monotonic reads, which give the client application stronger guarantees.
- Scalability model analysis: you must analyze how these storage systems scale in terms of indexing, shard distribution, configuration and access, request routing, optimization, etc. For instance, MongoDB [2] does not use consistent hashing to route requests and locate shards. Instead it uses an internal router and location tables. The advantage is flexibility: you can use any data you want as an index. The catch is how it optimizes what it keeps in memory. If you have little data, it keeps the data itself in memory; if you have more data, it keeps only the indexes; and if you have a huge data volume, it keeps only the portion of the indexes you need. That makes MongoDB very useful as a web application back end, but not efficient for applications that store millions of records and access them randomly.
- Transactional model: how are transactions managed? Is there a safe mode that does not affect performance?
- Client API and query model: for instance, HBase [3] provides a very simple interface without a real query model, while MongoDB can embed jQuery [4].
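To make the routing point in the scalability item more concrete, here is a minimal sketch of the two approaches: a consistent hash ring, and a router that looks up key ranges in a location table. It is only an illustration under stated assumptions and not the actual implementation of MongoDB or any other store; the class names, node names, shard names and key ranges are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical router: a key goes to the first node point clockwise from its hash."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed at several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def route(self, key: str) -> str:
        # Wrap around to the first point when the hash is past the last one.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


class LocationTableRouter:
    """Hypothetical router: a key goes to the shard whose key range contains it."""

    def __init__(self, chunks):
        # chunks: (inclusive lower bound, shard name) pairs; the bound can be
        # any ordered value, which is where the indexing flexibility comes from.
        self._chunks = sorted(chunks)
        self._bounds = [bound for bound, _ in self._chunks]

    def route(self, key: str) -> str:
        index = bisect.bisect_right(self._bounds, key) - 1
        if index < 0:
            raise KeyError(f"no chunk covers {key!r}")
        return self._chunks[index][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
table = LocationTableRouter([("", "shard-1"), ("h", "shard-2"), ("q", "shard-3")])

print(ring.route("user:42"))  # one of the three nodes, chosen by hash
print(table.route("alice"), table.route("mallory"), table.route("zoe"))
# prints: shard-1 shard-2 shard-3
```

The trade-off shows up in the constructors. The ring needs no shared state beyond the node list, and removing a node only moves the keys that lived on it, but keys are placed by hash, so range scans over an index are not possible. The location table keeps keys ordered and lets any ordered field act as the shard key, at the cost of maintaining and consulting the table.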
NoSQL storage is very promising, but you should carefully analyze the impact of its architecture on your applications before swapping out your RDBMS.
References:
[1] Werner Vogels, Eventually consistent, December 2008, http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
[2] The MongoDB project, http://www.mongodb.org
[3] The HBase project, http://hadoop.apache.org/hbase/
[4] The jQuery project, http://jquery.com/