Data storage elasticity – quick view on master thesis work (part 2)

In this second part, I welcome Nicolas Degroodt, who explains how he extended YCSB to implement a TPC-C benchmark for NoSQL stores. In this post, "DBMS" refers to any storage framework, whether it is an RDBMS or a NoSQL store.

When dealing with new DBMSs in a cloud environment, traditional benchmarks (like TPC-C [1]) need to be redesigned. First of all, the semantics of their queries are too rigid and cannot be applied, as defined, to a NoSQL system. For example, a key-value store cannot directly execute a SQL GROUP BY statement. These queries need to be translated for the storage system under test.
Secondly, their data schemas are designed for relational databases. Storing data as described by TPC in a non-relational database would perform poorly. The data schema needs to be modeled with the underlying DBMS in mind: its indexing scheme, its data sharding policy, etc.
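
As a minimal sketch of the two points above, the following Python snippet emulates a query of the form SELECT district, COUNT(*) ... GROUP BY district on a store that only offers put and get. The dict-based store, the key layout (order:<id> plus one index key per district), and all function names are hypothetical illustrations; they are not taken from the thesis, from TPC-C, or from any real NoSQL client.

```python
# Hypothetical in-memory key-value store: only put/get are available.
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)

def insert_order(order_id, district):
    """Write an order and maintain an index keyed by district.

    The index is a schema decision made for the key-value store:
    it replaces the GROUP BY that the store cannot run itself.
    """
    put(f"order:{order_id}", {"id": order_id, "district": district})
    index_key = f"orders_by_district:{district}"
    put(index_key, (get(index_key) or []) + [order_id])
    districts = get("districts") or []
    if district not in districts:
        put("districts", districts + [district])

def count_orders_by_district():
    """Client-side equivalent of a GROUP BY district with COUNT(*)."""
    return {
        d: len(get(f"orders_by_district:{d}") or [])
        for d in (get("districts") or [])
    }

insert_order(1, "north")
insert_order(2, "south")
insert_order(3, "north")
counts = count_orders_by_district()
```

The grouping work moves from the query engine to the write path (maintaining the index) and to the client (assembling the result), which is why the data schema has to be designed together with the translated queries.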

We proposed to adapt TPC-C to these new DBMSs and to study the related issues. We proposed a new TPC-C-like benchmark, which defines data and queries as functional values (instead of semantic fields).
On the other hand, the Yahoo! Cloud Serving Benchmark (YCSB) provides a framework [2] and a common set of workloads for evaluating the performance of different key-value and cloud serving stores. Its approach consists of testing DBMSs using only their common basic functions (i.e., put, get, and delete operations).
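
To illustrate the idea of benchmarking through a common set of basic operations, here is a small Python sketch. The KeyValueStore interface, the InMemoryStore backend, and the run_workload function are assumptions made for this example; they do not reproduce YCSB's actual classes or workload definitions.

```python
import random
import time
from abc import ABC, abstractmethod

class KeyValueStore(ABC):
    """Hypothetical minimal interface shared by every tested backend."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def delete(self, key): ...

class InMemoryStore(KeyValueStore):
    """Stand-in backend; a real adapter would call the DBMS client here."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

def run_workload(db, operations, read_ratio=0.5, seed=0):
    """Issue a mix of get/put calls and record per-operation latencies."""
    rng = random.Random(seed)
    latencies = {"get": [], "put": []}
    for i in range(operations):
        key = f"user{rng.randrange(100)}"
        op = "get" if rng.random() < read_ratio else "put"
        start = time.perf_counter()
        if op == "get":
            db.get(key)
        else:
            db.put(key, {"field0": i})
        latencies[op].append(time.perf_counter() - start)
    return latencies

latencies = run_workload(InMemoryStore(), operations=200)
```

Because the workload only depends on the interface, the same run can be repeated against another store by swapping in a different adapter.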

Figure: The Yahoo! Cloud Serving Benchmark (YCSB)

The modular architecture of YCSB (see figure above, from [3]) is very powerful and easy to configure. For instance, scenarios are defined by in-line parameters such as the number of clients. We adapted this architecture to meet our requirements: (1) we added a pseudo-random data generator module that produces not only data but also parameters for TPC-C transactions, and (2) we added a Queries Generator module that produces queries for loading data into the database (according to the data schema) and for exercising the DBMS (according to its syntax).
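
The two added modules can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the thesis: the nurand formula and the value ranges follow our reading of the TPC-C specification for the New-Order transaction and should be checked against it, while the key layout and the to_put_operations function are hypothetical.

```python
import random

def nurand(rng, a, x, y, c):
    """Non-uniform random integer in [x, y] (TPC-C style; c is a constant in [0, a])."""
    return (((rng.randint(0, a) | rng.randint(x, y)) + c) % (y - x + 1)) + x

def new_order_params(rng, warehouses, c_id_const=0, item_const=0):
    """Data generator: draw input parameters for one New-Order-like transaction."""
    line_count = rng.randint(5, 15)
    return {
        "w_id": rng.randint(1, warehouses),
        "d_id": rng.randint(1, 10),
        "c_id": nurand(rng, 1023, 1, 3000, c_id_const),
        "lines": [
            {
                "i_id": nurand(rng, 8191, 1, 100000, item_const),
                "quantity": rng.randint(1, 10),
            }
            for _ in range(line_count)
        ],
    }

def to_put_operations(order_id, params):
    """Queries generator for a key-value backend (hypothetical key layout).

    Emits one put for the order header and one put per order line.
    """
    prefix = f"order:{params['w_id']}:{params['d_id']}:{order_id}"
    header = {"c_id": params["c_id"], "ol_cnt": len(params["lines"])}
    ops = [("put", prefix, header)]
    for n, line in enumerate(params["lines"], start=1):
        ops.append(("put", f"{prefix}:line:{n}", line))
    return ops

rng = random.Random(42)
params = new_order_params(rng, warehouses=4)
operations = to_put_operations(order_id=1, params=params)
```

Keeping parameter generation separate from query rendering means only to_put_operations has to change when the target store uses a different syntax or key layout.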

We successfully tested this adapted framework in the EURA NOVA lab on Cassandra 0.5 deployed on a 4-node cluster. To test another DBMS, we need to adapt the Queries Generator and DB Interface Layer modules. In our next work, we will continue working with YCSB and focus on the elasticity aspects (definition, criteria, and test scenarios).

References:

[1] Transaction Processing Performance Council, http://www.tpc.org/
[2] YCSB Application, https://github.com/brianfrankcooper/YCSB
[3] YCSB Article, http://research.yahoo.com/files/ycsb.pdf


 
