
FOSDEM 2010: The Rise of the NoSQL initiative

What’s NoSQL?

Even if the name itself says little, NoSQL designates a new generation of key/value pair storage. The initiative is gaining both popularity and maturity: FOSDEM dedicated a full day and a developer room to the subject. Wikipedia defines the movement as follows: “NoSQL is an umbrella term for a loosely defined class of non-relational data stores that break with a long history of relational databases and ACID guarantees. Data stores that fall under this term may not require fixed table schemas, and usually avoid join operations. The term was first popularised in early 2009. Trends in computer architectures are pressing databases in a direction that requires horizontal scalability. NoSQL-style data stores attempt to address this requirement. Prominent closed-source examples are Google’s BigTable and Amazon’s Dynamo. Several open-source variants exist including Facebook’s Cassandra, Apache HBase, LinkedIn’s Project Voldemort and many others.”

Check up before swapping out your relational DB

These projects look very promising and really efficient. However, if you plan to adopt such technologies, you must be sure they fit your needs. A first analysis should therefore cover the following points:

  • Consistency model analysis: most of these projects implement an eventually consistent model [1], which requires a repair mechanism. Does it fit your concurrency policy? For instance, this consistency model is not directly compatible with Multi-Version Concurrency Control (MVCC) with optimistic locking, as used by JBoss Mobicents JSLEE. Who must provide the repair mechanism: the storage engine or the clients? Does the storage you want to use give you the flexibility to choose? In addition, there are a number of practical refinements of the eventual consistency model, such as session-level consistency and monotonic reads, which give the client application stronger guarantees.
  • Scalability model analysis: you must analyze how these storage systems scale in terms of indexing, shard distribution, configuration and access, request routing, optimization, etc. For instance, MongoDB [2] does not use consistent hashing to route requests and locate shards. Instead, it uses an internal router and location tables. The advantage is flexibility: you can use any data you want as an index. The catch lies in how it optimizes what it keeps in memory. If you have little data, it keeps the data itself in memory; if you have more data, it keeps only the indexes; and if you have a huge data volume, it keeps only the portion of the indexes you need. That makes MongoDB very useful as a web application back end, but not efficient for applications that store millions of records and access them randomly.
  • Transactional model: how are transactions managed, and is there a safe mode that does not affect performance?
  • Client API and query model: for instance, HBase [3] provides a really simple interface without a real query model, while MongoDB can embed jQuery [4].
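The two routing approaches contrasted in the scalability point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `ConsistentHashRing` and `RangeRouter`, the node and shard names, and the key ranges are all hypothetical, and neither class reflects the actual implementation of MongoDB or of any other store mentioned here.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Routes a key to the first node found clockwise from the key's hash."""

    def __init__(self, nodes, replicas=64):
        # Each node is placed at several pseudo-random ring positions
        # ("virtual nodes") to spread the load more evenly.
        ring = [(_hash(f"{node}#{i}"), node)
                for node in nodes for i in range(replicas)]
        ring.sort()
        self._positions = [pos for pos, _ in ring]
        self._nodes = [node for _, node in ring]

    def route(self, key: str) -> str:
        # Wrap around to the first position when the hash is past the last one.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._nodes)
        return self._nodes[idx]


class RangeRouter:
    """Routes a key through an explicit location table of key ranges."""

    def __init__(self, table):
        # table: list of (exclusive_upper_bound, shard) sorted by bound;
        # the last entry uses None to mean "no upper bound".
        self._bounds = [bound for bound, _ in table[:-1]]
        self._shards = [shard for _, shard in table]

    def route(self, key: str) -> str:
        return self._shards[bisect.bisect_right(self._bounds, key)]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    router = RangeRouter([("h", "shard-a"), ("q", "shard-b"), (None, "shard-c")])
    for user in ["alice", "mallory", "zoe"]:
        print(user, ring.route(user), router.route(user))
```

With the ring, placement is decided by a hash and adding a node only moves the keys that fall to that new node. With the location table, placement is decided by the value of whatever key you choose to shard on, which is the flexibility described above, at the cost of having to maintain and consult the table.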

 

NoSQL storage is really promising, but you should carefully analyze the impact of these architectures on your applications before swapping out your RDBMS.

References:

[1] Werner Vogels, Eventually consistent, December 2008, http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

[2] The MongoDB project, http://www.mongodb.org

[3] The HBase project, http://hadoop.apache.org/hbase/

[4] The jQuery project, http://jquery.com/


 
