Skip to content

New SQL RDBMS Architectures Vs Old ones Vs NoSQL

Those recent years we have seen the NoSQL initiative emerging against the so-called “old, slow and legacy” relational DBs. But today the debate is extending with a newcomer “The New RDMS architecture”. The concept is simple, the RDMS architecture was developed a long time ago, at that time computer science,  computers and processors were extremely different from what we can find today. Why are we not able to gather the recent researches in storage, distributed computing and threading system to re-design a modern RDMS?

That is the case of several open source projects such as INGRESS [1] or VoltDB [2].  I really recommend to spend time to read the excellent post VoltDB Decapitates Six SQL Urban Myths and Delivers Internet Scale OLTP in the Process.

The author positions VoltDB against the NoSQL:  if you can get a relational database with all the scalable ACIDy goodness, why would you ever stoop to using a NoSQL database that might only ever be eventually reliable?  But also by tagging a traditional RDMS as legacy architecture and therefore not scalable not efficient.

Other similar initiatives try to approach the problem in a same way. Hadoop DB [3] uses normal RDMS on each cluster node but exposes a Map/reduce API based on HDFS for generating the query plan on the nodes. The user expresses the SQL statement as usual, but it will be interpreted into HDFS query by the Hive framework [6].

AsterData [4] or Greenplum [5] uses directly a variant of SQL, the SQL map reduce. In this case the SQL statement references coded-procedure which are executed according to MR at run time. As result, the RDMS is both a relational DB and a data application server.

The debate is not NoSQL Vs RDMS anymore, new approaches enforce the relational side but create new debate between RDMS generation. At the end of the day the main question remains: do all address the same issue ? Of course not.  The application architect will have to study all those different variants and define which one suit for the particular needs of the application. However, the story becomes more complex when facing computer science challenges. Do you really think that the storage system for Facebook must be the same as for a BI system?  But what happens if the BI system must deal with the same amount of data that Facebook and in near real time, is is still the same problem?

References

[1] INGRESS Vector wise technology, http://www.ingres.com/vectorwise/

[2] VoltDB, http://voltdb.com/

[3] The HadoopDB project, http://db.cs.yale.edu/hadoopdb/hadoopdb.html

[4] AsterData, http://www.asterdata.com/

[5] Greenplum, http://www.greenplum.com/

[6] The Hive project, http://hadoop.apache.org/hive/


 

Releated Posts

Calibrate to Interpret

Trustworthy machine learning is driving a large number of the ML community works in order to improve ML acceptance and adoption. In this paper, we show a first link between uncertainty and explainability, by studying the relation between calibration and interpretation.
Read More

Mass Estimation of Planck Galaxy Clusters using Deep Learning

Galaxy cluster masses can be inferred indirectly using measurements from X-ray band, Sunyaev-Zeldovich (SZ) effect signal or optical observations. Unfortunately, all of them are affected by some bias. Alternatively, we provide an independent estimation of the cluster masses from the Planck PSZ2 catalogue of galaxy clusters using a machine-learning method.
Read More