Last week I was in Taipei with Nam-Luc to present the AROM paper. I wanted to come back to this year's trends at the conference which, by the way, give a really good insight into the hot topics in cloud, distributed computing and HPC. I will not dive into the details of each of them; if you have any questions, just post a comment or send me an email!
Data mining on the cloud
An important part of the keynotes and talks was about using the cloud and its computing power for data mining. Even if, from a high-level view, it could make sense, it still seems a bit odd to me to use the cloud for this kind of computation. Let me argue my view:
(1) the bandwidth is just awful and we have no control over it, which is a serious issue when dealing with large volumes of data,
(2) elasticity is not a good argument: few processing layers can really be elastic; even Hadoop must have a pre-defined set of worker nodes before starting a job,
(3) multi-tenancy: the main purpose of a cloud architecture is to serve the maximum number of users while minimizing their resource usage, which is far from the objectives of data mining, whose workloads are much closer to those of the grid,
(4) performance: because of the virtualization of network and resources, cloud applications are not really oriented toward high performance,
(5) industry trends: if we look at the data mining trends in the industry, especially in the data warehouse and enterprise world, the direction is completely the opposite. The major players focus on high-end appliances with in-memory computing deeply integrated with the underlying platform. In addition, almost all of them propose a cluster of a few machines, rarely more than 10, connected by InfiniBand.
Using the cloud as a platform for applications
A lot of talks were about using the cloud for vertical markets such as e-health, food tracking, the Internet of Things, etc. This is a trend we have seen for the last one or two years: many people think we could use the cloud as a new generation of service platform. That is interesting because, in the telco world, the concept of the SDP (Service Delivery Platform) is well known; however, nobody points out what the cloud lacks to be functionally equivalent to an SDP. As a result, we present the cloud as a service delivery environment, but each application requires rebuilding a complete ad hoc stack. This is an issue that architects and developers will have to face more and more.
Security and accountability
Security is definitely a hot topic. That makes sense, since it is one of the top show stoppers for entering the cloud. Going further, a really interesting keynote from HP (which leads the EU research project A4Cloud) described the concept of accountability, not only of the service provider but also of the complete chain of services that compose the main one. Actually, when we think about it, we quickly realize that the cloud brings the typical issues of outsourcing operations, but in addition it brings: a longer chain of trust, limited auditability, a new set of liability and legal issues, and a new set of technical attacks (typically hypervisor-oriented). So, security is definitely a rich source of research.
Cloud federation
A lot of people explained the need for federating clouds, mainly because of vendor lock-in issues, but also for reliability and even for security (e.g., sharding encrypted data across different providers). Federation opens up a new class of technical issues. Going further, I see the hybrid cloud as a particular case of federation. Indeed, I do not belong to the class of people who think that the hybrid cloud will be demanded and adopted by every application, mainly because most enterprise applications are data-driven, so absorbing a burst on a public cloud would require migrating and synchronizing the data between the private and the public cloud. That would bring a lot of complexity; smart architecture designs exist, such as messaging and asynchronous distributed caches, but they would require a complete re-design of the applications. Instead, I see the hybrid cloud more as a federation of a private and a public cloud: some services run on one of the clouds and the applications run on the other. That is a much more realistic scenario, and a typical case of federation.
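To make the messaging-based design mentioned above concrete, here is a minimal sketch, in plain Python, of the idea of keeping a public-cloud cache in sync through asynchronous messages instead of bulk data migration. All names (the queue, the cache, the record keys) are illustrative assumptions, not from any real hybrid-cloud product; a real deployment would use a cross-cloud message broker and a distributed cache.

```python
import queue
import threading

# Illustrative sketch: the private cloud publishes data-change events to a
# message queue; a worker on the public-cloud side consumes them and keeps
# a local cache in sync, so burst computation can run against fresh data.
events = queue.Queue()   # stands in for a cross-cloud message broker
public_cache = {}        # stands in for the public cloud's distributed cache

def private_side_update(key, value):
    """Called in the private cloud whenever a record changes."""
    events.put((key, value))  # publish asynchronously, no bulk migration

def public_side_worker():
    """Runs in the public cloud, applying updates as they arrive."""
    while True:
        item = events.get()
        if item is None:      # sentinel to stop the worker
            break
        key, value = item
        public_cache[key] = value

worker = threading.Thread(target=public_side_worker)
worker.start()

private_side_update("customer:42", {"balance": 100})
private_side_update("customer:42", {"balance": 85})

events.put(None)              # shut the worker down
worker.join()
print(public_cache["customer:42"])  # the public side sees the latest state
```

The point of the sketch is the decoupling: the application does not need to know where its data physically lives, which is exactly why retrofitting this pattern onto an existing data-driven application means a complete re-design.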
We have been speaking about cloud standardization for a while, and too many standards and standardization bodies are fighting each other. However, it seems that some initiatives aim at federating the existing standards. The most promising one is the Global Inter-cloud Technology Forum (GICTF).
GPU for HPC
It is interesting to see how High Performance Computing (HPC) researchers have moved from the grid to GPUs, and how they have dramatically decreased the cost of their infrastructures while keeping the same performance. Today we see not only computational simulations on GPU, but also data mining algorithms and statistical packages. However, GPUs still need to deal with limited RAM, and big data problems go beyond that RAM capacity. That is a really exciting research area.
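A common answer to that RAM limit is out-of-core processing: stream the data through device memory in chunks and keep only a small running aggregate. Here is a minimal sketch of the idea in plain Python, where the "device" is simulated by a chunk-size limit; on a real GPU the per-chunk reduction would run on the accelerator after a host-to-device copy.

```python
CHUNK_SIZE = 4  # pretend the accelerator can only hold 4 values at a time

def chunked(data, size):
    """Yield successive fixed-size chunks, as a host-to-device copy would."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

def out_of_core_mean(data):
    """Compute the mean without ever holding more than one chunk."""
    total, count = 0.0, 0
    for chunk in chunked(data, CHUNK_SIZE):
        # on a real GPU, this partial reduction would run on the device
        total += sum(chunk)
        count += len(chunk)
    return total / count

values = list(range(1, 11))      # 10 values, larger than our 4-value "RAM"
print(out_of_core_mean(values))  # identical to the in-memory mean
```

The result is exact for associative reductions like sums and counts; the research challenge is that many data mining algorithms are iterative and do not decompose this cleanly, so every pass means re-streaming the data over a slow host-device link.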
Collecting big data
Everybody speaks about big data, but few speak about collecting it. In the off-line discussions I had with researchers, mainly working in the IT field, they mentioned the significant problems they face in collecting and aggregating events.
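The aggregation half of the problem can be sketched quickly: raw events arrive from many sources and must be normalized and rolled up into buckets before any analysis can start. The event shapes and field names below are invented for illustration; collecting events reliably at scale is of course the hard part that the sketch leaves out.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw events from heterogeneous sources (fields are invented)
raw_events = [
    {"ts": "2012-12-05T10:00:12", "source": "web", "type": "login"},
    {"ts": "2012-12-05T10:00:45", "source": "api", "type": "login"},
    {"ts": "2012-12-05T10:01:02", "source": "web", "type": "error"},
    {"ts": "2012-12-05T11:30:00", "source": "web", "type": "login"},
]

def aggregate_per_hour(events):
    """Count events per (hour, type) bucket -- a minimal roll-up step."""
    buckets = Counter()
    for ev in events:
        hour = datetime.fromisoformat(ev["ts"]).strftime("%Y-%m-%d %H:00")
        buckets[(hour, ev["type"])] += 1
    return buckets

counts = aggregate_per_hour(raw_events)
print(counts[("2012-12-05 10:00", "login")])  # 2 logins in the 10:00 hour
```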
As you can see, there are really interesting and important topics for the research community in the coming year!
A4Cloud CORDIS page, http://cordis.europa.eu/search/index.cfm?fuseaction=proj.document&PJ_RCN=134045
GICTF, http://www.gictf.jp/index_e.html