This week I attended the 3rd IEEE International Conference on Cloud Computing Technology and Science to present our paper about the Elastic Queuing Service. This gives me a new opportunity to give you an overview of the hot topics and trends from the conference. As usual, I will only cover the talks I had the chance to attend, since there were a lot of parallel tracks.
MapReduce in the cloud
This is clearly a hot topic in the community: there was a dedicated MapReduce track, and many more papers were spread across the other tracks and workshops. Those papers mainly focused on improvements to MapReduce (MR):
- Better fault tolerance and high availability
- New schedulers, especially ones that try to optimize data-local task assignment
- Attempts to make MR asynchronous and a little less batch-oriented than it is today
One paper deserves a special mention: it explores the convergence between MR and stream programming [1]. The idea is to extend the reduce phase into a sliding window of reduces, during which the reducers can still receive inputs. As a result, you can integrate the MR process into a real streaming approach. I found it really cool, because (1) it can be used in a new generation of data warehouses, in which you push data into the processing stream instead of extracting it as is done today, and (2) you can even think about implementing a kind of MR-based CEP (complex event processing), letting MR and a streaming platform implement your event storage and even your correlation software.
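To make the sliding-window idea concrete, here is a minimal Python sketch. It is not the algorithm from the paper [1]: the class name, the window size and the sum aggregate are all assumptions chosen for illustration. It only shows a reducer that keeps accepting inputs and emits an updated result per key as the window slides.

```python
from collections import defaultdict, deque


class SlidingWindowReducer:
    """Hypothetical reducer that keeps the last `window_size` values per key."""

    def __init__(self, window_size):
        # deque(maxlen=...) silently drops the oldest value once the window is full.
        self.windows = defaultdict(lambda: deque(maxlen=window_size))

    def feed(self, key, value):
        """Accept one more mapped value and return the updated aggregate."""
        window = self.windows[key]
        window.append(value)
        return key, sum(window)


def map_event(event):
    """Map step: turn a raw (sensor, reading) event into key/value pairs."""
    sensor, reading = event
    yield sensor, reading


reducer = SlidingWindowReducer(window_size=3)
stream = [("a", 1), ("a", 2), ("b", 10), ("a", 3), ("a", 4)]

# Unlike a batch reduce, a result is emitted after every incoming event.
results = [reducer.feed(k, v) for event in stream for k, v in map_event(event)]
```

The last result for key "a" only covers the readings 2, 3 and 4, because the first reading has already slid out of the window.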
We even had a complete tutorial session by a Yahoo! architect about Hadoop HDFS, Pig Latin [2] and Oozie [3]. For those of you who think that Pig Latin is an Italian pig and Oozie a former rock star doing reality shows: Pig Latin is the procedural data processing language developed on top of MR. The idea is that you express a data flow as a set of SQL-like statements. Behind the scenes, Pig generates a logical and a physical execution plan, and finally the set of MR jobs that must be scheduled.
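As a rough illustration of how a few SQL-like data flow statements can end up as a single MR job, here is a small Python sketch. It does not reproduce Pig's planner: the sample records, the Pig-style statements in the comments and the function names are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical input records (not from the tutorial).
records = [
    {"user": "alice", "status": 200},
    {"user": "bob", "status": 500},
    {"user": "alice", "status": 200},
    {"user": "bob", "status": 200},
]

# The data flow, written the way a Pig-style script would read:
#   ok      = FILTER records BY status == 200;
#   by_user = GROUP ok BY user;
#   counts  = FOREACH by_user GENERATE group, COUNT(ok);


def map_phase(record):
    """FILTER is folded into the map step; the GROUP key becomes the MR key."""
    if record["status"] == 200:
        yield record["user"], 1


def reduce_phase(key, values):
    """COUNT over each group is computed in the reduce step."""
    return key, sum(values)


def run_job(data):
    """Tiny in-memory stand-in for one scheduled MR job."""
    shuffled = defaultdict(list)
    for record in data:
        for key, value in map_phase(record):
            shuffled[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in sorted(shuffled.items()))


counts = run_job(records)
```

Here three statements collapse into one map function and one reduce function; a longer script would typically need a chain of such jobs.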
Oozie is a workflow scheduler for Hadoop. The idea is that you can express a real workflow of operations (a directed acyclic graph of actions) whose actions are MR jobs, but also Pig jobs. It gives you a kind of orchestration of MR jobs. A nice usage example is described in a Cloudera blog post [4].
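The "directed acyclic graph of actions" can be sketched in a few lines of Python. This is only a conceptual illustration and not Oozie's own workflow definition format or API: the action names and the `run_workflow` helper are hypothetical, and the ordering relies on the standard library's `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "import_raw": set(),
    "clean_with_pig": {"import_raw"},
    "index_with_mr": {"clean_with_pig"},
    "stats_with_mr": {"clean_with_pig"},
    "publish": {"index_with_mr", "stats_with_mr"},
}


def run_workflow(dag, run_action):
    """Run every action only after all of its dependencies have run."""
    order = []
    for action in TopologicalSorter(dag).static_order():
        run_action(action)  # in a real scheduler this would submit an MR or Pig job
        order.append(action)
    return order


executed = run_workflow(workflow, lambda name: None)
```

The two MR actions in the middle do not depend on each other, so a real scheduler would be free to run them in parallel.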
I have to say that I was a little disappointed by that track. Every talk started with “MR is increasingly popular, so we have to work with …”, but nobody spoke about the limitations of the paradigm, or about how complex it is to deal with anything other than map and reduce.
Cloud Architecture
There were a lot of papers about new cloud architectures for interoperability, PaaS container scalability, and privacy and security management. Interoperability is clearly a hot topic: it was addressed through resource management description languages, but also in terms of contextualization [5]. The idea behind this concept is to model the context in which the service (and its VMs) is executed. This can include the IP addresses of the nodes composing the service, the admin domain in which you are running and all the associated policies, the type of VM you are using, the VPN topology configuration, the data location and data management, etc. The contextualization runtime should then be able to configure your service through its extension points, to make the service compliant with its context.
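A minimal Python sketch of that idea could look as follows. It is not the solution proposed in [5]: the `Context` fields, the `Service` class and the `contextualize` function are hypothetical names, chosen only to show a runtime pushing context information into a service through its extension points.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical description of where the service runs."""
    node_ips: list
    admin_domain: str
    vm_type: str
    data_location: str


@dataclass
class Service:
    """Hypothetical service exposing named extension points."""
    name: str
    config: dict = field(default_factory=dict)
    # Each extension point is a function that derives one setting from the context.
    extension_points: dict = field(default_factory=dict)


def contextualize(service, context):
    """Runtime step: call every extension point with the current context."""
    for point, configure in service.extension_points.items():
        service.config[point] = configure(context)
    return service


service = Service(
    name="queue",
    extension_points={
        "cluster_members": lambda ctx: ",".join(ctx.node_ips),
        "storage_region": lambda ctx: ctx.data_location,
    },
)
context = Context(["10.0.0.1", "10.0.0.2"], "tenant-a", "small", "eu")
contextualize(service, context)
```

Moving the same service to another cloud would then only require a different `Context`, not a different service.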
It is worth noting that there were a few papers describing architectures optimized for resource energy management and SLA management. Although SLA management is still a great challenge, there was not much about it. The majority of those papers came from former researchers of the reference project in this area, SLA@SOI [6]. The basic idea is that you define your SLA templates (business SLA, service SLA, infrastructure SLA), which you can negotiate with the infrastructure operator. The negotiation is done by analysing the available resources and comparing them against the requested price and SLA. Once the negotiation phase is finished, the cloud operator must define the monitoring probes required to monitor the SLA and define the SLA enforcement logic.
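The negotiation step can be sketched in Python as below. This is not the SLA@SOI framework or its API: the request and provider fields, the pricing rule and the probe names are assumptions invented for this example. It only illustrates checking a requested SLA against available resources and price, then deriving what has to be monitored.

```python
from dataclasses import dataclass


@dataclass
class SlaRequest:
    """Hypothetical infrastructure SLA requested by the customer."""
    cpus: int
    min_availability: float
    max_price: float


@dataclass
class Provider:
    """Hypothetical view of what the infrastructure operator can offer."""
    free_cpus: int
    availability: float
    price_per_cpu: float


def negotiate(request, provider):
    """Return an agreement dict, or None when the request cannot be met."""
    price = request.cpus * provider.price_per_cpu
    if request.cpus > provider.free_cpus:
        return None  # not enough resources available
    if provider.availability < request.min_availability:
        return None  # the operator cannot guarantee the requested level
    if price > request.max_price:
        return None  # too expensive for the customer
    return {
        "cpus": request.cpus,
        "availability": request.min_availability,
        "price": price,
        # Probes the operator must deploy to monitor and enforce the agreement.
        "probes": ["cpu_allocation", "availability"],
    }


provider = Provider(free_cpus=8, availability=0.999, price_per_cpu=2.0)
accepted = negotiate(SlaRequest(cpus=4, min_availability=0.99, max_price=10.0), provider)
rejected = negotiate(SlaRequest(cpus=4, min_availability=0.99, max_price=5.0), provider)
```

The second request fails only on price: the resources and the availability level would both have been acceptable.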
However, in a private cloud you are limited in terms of resources, so how do you deal with a negotiated SLA? There was an interesting piece of research [7] on merging the results of SLA@SOI with the advanced resource reservation [8]. I strongly recommend that you take a look.
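To see why reserving resources ahead of time matters when capacity is fixed, here is a small Python sketch of an admission check. It is neither the approach of [7] nor the scheduler described in [8]: the tuple layout, the CPU-only capacity model and the function names are assumptions made for illustration.

```python
def peak_usage(reservations, start, end):
    """Highest number of CPUs already booked at any time in [start, end)."""
    # Usage can only rise at the window start or when another reservation starts.
    points = {start} | {s for s, _, _ in reservations if start <= s < end}
    return max(
        sum(cpus for s, e, cpus in reservations if s <= t < e)
        for t in points
    )


def admit(reservations, capacity, start, end, cpus):
    """Book (start, end, cpus) only if it never exceeds the fixed capacity."""
    if peak_usage(reservations, start, end) + cpus > capacity:
        return False
    reservations.append((start, end, cpus))
    return True


booked = []
first = admit(booked, 10, 0, 10, 6)       # fits in the empty private cloud
overlap = admit(booked, 10, 5, 15, 6)     # would need 12 CPUs between t=5 and t=10
later = admit(booked, 10, 10, 20, 6)      # starts once the first reservation ends
small = admit(booked, 10, 5, 12, 4)       # 6 + 4 CPUs exactly reaches capacity
```

A negotiated SLA would only be accepted if the matching reservation is admitted, which is the link between the two lines of work.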
That is already a lot for this post, so I will leave my thoughts on cloud standardisation challenges for the next post.
To be continued …
References
[1] Andrey Brito et al., Scalable and Low-Latency Data Processing with Stream MapReduce. In Proceedings of the 3rd IEEE International Conference on Cloud Computing Technology and Science, IEEE CloudCom 2011. IEEE Press, November 2011.
[2] Hadoop PigLatin project, http://pig.apache.org/
[3] Apache Oozie project, http://incubator.apache.org/oozie/
[4] Tim Robertson, Lars Francke and Oliver Meyn. Biodiversity Indexing: Migration from MySQL to Hadoop, June 2011, http://www.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
[5] Django Armstrong et al., Towards a Contextualization Solution for Cloud Platform Services. In Proceedings of the 3rd IEEE International Conference on Cloud Computing Technology and Science, IEEE CloudCom 2011. IEEE Press, November 2011.
[6] SLA@SOI FP7 Project, http://sla-at-soi.eu/
[7] Kuan Lu et al., QoS-aware SLA-based Advanced Reservation of Infrastructure as a Service. In Proceedings of the 3rd IEEE International Conference on Cloud Computing Technology and Science, IEEE CloudCom 2011. IEEE Press, November 2011.
[8] B. Sotomayor, R. Santiago Montero, I. Martín Llorente, I. Foster, Virtual Infrastructure Management in Private and Hybrid Clouds. IEEE Internet Computing, vol. 13, no. 5, pp. 14-22, Sep./Oct. 2009.