This month EURA NOVA presented the paper “A Distributed Data Mining Framework Accelerated with Graphics Processing Units” at the 2013 CloudCom-ASIA conference in Fuzhou, China. This is the third time EURA NOVA has published a paper at this conference, which is now considered one of the three most important IEEE events covering the topic of Cloud Computing.
What is interesting to notice, year after year, when attending the same conference is how scientific contributions become more and more refined around the same topics. The “cloud” is no longer just a buzzword: when used, it covers a series of underlying concepts. These now include on-demand resources, elasticity, the abolition of central control, dynamic resources and the allocation of resources based on a market. This shows in the scientific contributions, which are now more precise and better defined.
THE CLOUD OF ENERGY
As technologies around the cloud reach a mature state, many discussions are taking place about drawing inspiration from the mechanisms used in cloud computing and applying them in the energy sector. In the same way that resources from different providers can interact and compete on the same resource market, as in the federation of public and private clouds, the idea is to enable independent energy producers to supply their energy on the same global “smart grid”. This becomes a necessity as datacenters require more energy the more they grow. Other ideas include routing energy with “electricity routers”, the same way information is routed through load-balanced servers on the Internet. The government of China has invested considerable effort in this type of innovation over the past years. Even though the concept is promising, some questions remain unsolved at the moment, for example concerning the security of the transferred electricity, much like that of the unsecured bits conveyed on the Internet.
NEW HARDWARE ARCHITECTURE FOR THE CLOUD
It is now an acknowledged fact that Big Data exists and can be processed. But the big question remains: “How to generate value from Big Data?” Answering that question still requires improving the performance of the way data is processed today. To achieve this, new hardware architectures have to be defined in addition to new processing models. The hardware stack that will handle Big Data processing needs to be fully redesigned to take into account manycore processors, GPUs, Software-Defined Networks and SSD storage. This will also require DSLs that allow users to define coherent jobs despite the heterogeneous hardware architecture. This is what we already propose in our extension of the DFG processing model, which takes into account nodes with GPU processing.
DISCUSSION ABOUT DISTRIBUTED DATA MINING FOR BIG DATA
Much research in data mining has now taken the path of distributing the processing of the algorithm. In the same vein, in the framework proposed by EURA NOVA, workers produce partial models which are then merged to form the final model. For iterative algorithms, distributing the processing will in some cases actually make convergence slower, meaning that more iterations are needed. This is however acceptable in the context of Big Data processing as long as each distributed iteration is sufficiently faster than an iteration on a single node to offset the slower convergence [REFERENCE: Chengjie Qin & Florin Rusu, Speeding up Distributed Low-rank Matrix Factorisation].
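To make the idea of partial models concrete, here is a minimal sketch of one iteration of k-means in which each worker summarises its own shard of the data and the summaries are merged into new centroids. This is only an illustration under assumptions: k-means is chosen as an example algorithm, the names `partial_model` and `merge_models` are hypothetical, and the code is not taken from the EURA NOVA framework or the paper.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
# A partial model holds, for each cluster, the coordinate sums and the point count.
Partial = List[Tuple[List[float], int]]


def partial_model(points: List[Point], centroids: List[Point]) -> Partial:
    """Run on one worker: summarise the local shard against the current centroids."""
    dim = len(centroids[0])
    partial: Partial = [([0.0] * dim, 0) for _ in centroids]
    for p in points:
        # Index of the centroid closest to p (squared Euclidean distance).
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
        )
        sums, count = partial[nearest]
        for i, value in enumerate(p):
            sums[i] += value
        partial[nearest] = (sums, count + 1)
    return partial


def merge_models(partials: List[Partial], centroids: List[Point]) -> List[Point]:
    """Merge the workers' partial models into the centroids of the next iteration."""
    merged: List[Point] = []
    for k, old in enumerate(centroids):
        total = [0.0] * len(old)
        n = 0
        for part in partials:
            sums, count = part[k]
            for i, value in enumerate(sums):
                total[i] += value
            n += count
        # An empty cluster keeps its previous centroid.
        merged.append(tuple(t / n for t in total) if n else old)
    return merged


# Two shards, as if held by two workers (made-up data for illustration).
shards = [[(0.0, 0.0), (10.0, 10.0)], [(2.0, 2.0), (12.0, 12.0)]]
centroids = [(1.0, 1.0), (9.0, 9.0)]
partials = [partial_model(shard, centroids) for shard in shards]
new_centroids = merge_models(partials, centroids)
```

In this particular case the merge is exact, because sums and counts add up across shards. For algorithms where merging is only approximate, each iteration makes less progress, which is the slower convergence mentioned above.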
CLOSING THOUGHT
As the years pass, like a good wine that gets better with age, research on Cloud Computing keeps proposing new ambitious architectures and concepts. From my point of view, what we are doing in the field is still paving the way for the applications of tomorrow. Cloud Computing is a movement that has settled, is gaining balance and soundness, and is sure to still be around for the next decade.
Nam-Luc Tran