This week I had the chance to attend the ASE/IEEE International Conference on Big Data here in Washington DC. EURA NOVA had a notable presence, with two papers: imGraph (big up to @AldemarReynaga) and a paper presenting our empirical performance analysis of some of the major graph databases on the current market.
Big Data is all about applications
Research on Big Data has been around for several years now, and at this stage much of it focuses on actual use cases of Big Data in wide-scale scenarios. Indeed, past developments and research efforts have paved the way for (1) massive data collection, (2) storage and (3) processing.
It is only natural, then, that most research now focuses on using these building blocks in complete, end-to-end scenarios.
As such I have seen many presentations of Big Data applications: from the worldwide monitoring of fish in the sea (fish4knowledge), classifying them and allowing marine biologists to execute queries over a time frame (e.g. how many goldfish in the South Pacific between 1/2001 and 6/2006), to disaster management improvement using mobile data and machine learning. It is now an acknowledged fact that Big Data can ease the work in many scenarios, bring additional knowledge and, even, save lives.
Big Data becomes social
Another big trend, which I had already noticed at last year’s edition of SIGKDD, is social computing, mainly driven by the proliferation of easily accessible data from social networks such as Twitter, Epinions, … Social computing has gained much momentum, and nearly 3 out of 5 presented projects revolved around inferring information about human social behaviour (e.g. retweet models for users, or trust and friendship transfer between members) from public data acquired from social networks.
But hey, where is the Big Data?
The thing is, while one might think of Big Data when speaking about the abundant data available in social networks, the volumes of data generally involved in these projects are remarkably low. Most of these social computing projects end up working on pre-processed datasets ranging from tens to hundreds of megabytes and restricted to a few users: microscopic volumes with regard to the actual volume of data streaming from those social networks.
From my experience in dealing with the storage and distributed processing of data, I can tell from what I have seen at the conference that there is currently a gap between (1) the actual technological bricks which make it possible to design resilient applications where data volumes and complexity are high, and (2) the applications designed on top of the features enabled by (1). Even though many of the presented social computing applications have shown interesting results and perspectives, there is still a way to go before they become actually scalable, usable real-time applications.
While some might argue that Big Data is only a passing trend, and that many of its focuses have already been widely covered in past years by disciplines such as very large databases, parallel processing, real-time computing and machine learning, I believe that, beyond the overly hyped buzzword, the intelligent combination of these domains in specific applications will actually form the “Big Data” evolution and can ultimately change the game.