Skip to content

Data storage elasticity – quick view on master thesis work (part 2)

In this second part I welcome Nicolas Degroodt who explains how he has extended the YCSB for implementing TPC-C benchmark for NoSQL. In this post we call DBMS a storage framework whether it is RDBMS or a NoSQL.

When dealing with new DBMS in a cloud environment, traditional benchmarks (like TPC-C [1]) need to be re-designed. First of all, the semantic of their queries is too rigid and cannot fit, as they are defined, with a NoSQL system. For example, a key-value store cannot execute immediately a GROUP BY SQL statement. These queries need to be translated regarding to the tested storage.
Secondly, their data-schemas are oriented and designed for relational databases. Storing data, as they are described by TPC, in a non-relational databases would be weak in terms of performance. The data-schema needs to be modeled by taking into account the underlying DBMS, its indexation schema, its data sharding policy, etc.

We proposed to adapt the TPC-C to these new DBMS and to study the related issues. We proposed to describe a new benchmark, TPC-C like, which defines data and queries as functional values (instead of semantic fields).
On the other hand, Yahoo Cloud Serving Benchmark (YCSB) proposes a framework [2] and a common set of workloads for evaluating the performance the performance of different key-value and cloud serving stores. Their approach consists of testing DBMS considering only their common basic functions (i.e. put, get, delete operation).

YCSB
The Yahoo! Cloud Serving Benchmark

The modular architecture of YCSB (see figure above from [3]) is very powerful and easy to configure, for instance, scenari are defined by in-line parameters such as number of clients, etc. We adapted this architecture to meet our requirements: (1) we added a pseudo-random data generator module for producing not only data but also parameters for TPC-C transactions, and (2) we added a queries generator module which produces queries for putting data into the data base (according to the data schema) and for attacking the DBMS (according to its syntax).

ycsb2

 

We tested successfully this adapted framework in the EURA NOVA lab on Cassandra 0.5 deployed on a 4-nodes cluster. For testing another DBMS, we need to adapt the Queries Generator and the DB Interface Layer modules. In our next work, we will continue to deal with YCSB and focusing on the elasticity aspects ( definition, criteria and test scenario).

References:

[1] Transaction Processing Performance Council: http://www.tpc.org/
[2] YCSB Application, https://github.com/brianfrankcooper/YCSB
[3] YCSB Article, http://research.yahoo.com/files/ycsb.pdf


 

Releated Posts

IEEE Big Data 2023 – A Summary

Our CTO, Sabri Skhiri, recently travelled to Sorrento for IEEE Big Data 2023. In this article, Sabri explores for you the various keynotes and talks that took place during the
Read More

Robust ML Approach for Screening MET Drug Candidates in Combination with Immune Checkpoint Inhibitors

Present study highlights the significance of dataset size in ICI microbiota models and presents a methodology to enhance the performances of a multi-cohort-based ML approach.
Read More