‘BigExcel’: A Web-Based Framework for Exploring Big Data in Social Sciences

Asif Saleem, Blesson Varghese and Adam Barker
University of St Andrews, UK
Agenda
  Introduction
  Challenges
  Framework
  Demo
  Feasibility Study
  Conclusions

B. Varghese - Big Humanities 2014
Introduction

Transformative change in the data analysis landscape
  o Traditionally, spreadsheet-like applications were used
  o Now, big data tools
Big data technologies are maturing
  o Cloud computing – infrastructure support
  o Hadoop and Hive – programming paradigm
These technologies are sometimes hard to use, even for computer scientists
  o Set-up, programming, adapting to hardware infrastructure, etc.
Challenges

Limited Accessibility of Big Data Tools
  o Gap between technology and end user
  o In-depth knowledge of the tools is required to use them
  o Knowledge of hardware and excellent programming skills required
Lack of Exploratory Tools for Big Data
  o Needed to perform quick analysis without undertaking large programming tasks
Lack of Lightweight Big Data Tools
  o Full-fledged, comprehensive tools are available but require professional training
BigExcel Framework

Three-tier framework:
  o User Interaction Layer
      Data browser built using RichFaces
      Connects to the next layer using RESTful Web Services
  o Query Management Layer
      Constructs queries for Hive
      Manages the data
      Stores the logic for analytical operations in MapReduce
  o Infrastructure Management Layer
      Connects to the Cloud
      Amazon Web Services SDK used
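The slides do not show how the Query Management Layer builds its Hive queries. The following is only a minimal sketch of the idea, written in Python for illustration. `Selection`, `build_hive_query` and the whitelist patterns are hypothetical names that do not come from BigExcel. The table and column names are borrowed from the feasibility-study query later in the deck, and the date bounds are left as parameters.

```python
import re
from dataclasses import dataclass

# Conservative whitelists (an assumption of this sketch, not BigExcel's rules):
# identifiers and literals are checked before being placed in the query text.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_LITERAL = re.compile(r"[A-Za-z0-9_\-:. ]+")


@dataclass(frozen=True)
class Selection:
    """Hypothetical record of what the user picked in the data browser."""
    table: str
    columns: tuple
    script: str       # name of the streaming script passed to USING
    product: str
    start_date: str
    end_date: str


def _ident(name):
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def _literal(value):
    """Return value wrapped in single quotes if it passes the whitelist."""
    if not _LITERAL.fullmatch(value):
        raise ValueError(f"unsafe literal: {value!r}")
    return f"'{value}'"


def build_hive_query(sel):
    """Turn a Selection into a SELECT TRANSFORM query string for Hive."""
    cols = ", ".join(_ident(c) for c in sel.columns)
    return (
        f"SELECT TRANSFORM({cols}) USING {_literal(sel.script)} "
        f"FROM {_ident(sel.table)} "
        f"WHERE product={_literal(sel.product)} "
        f"AND date >= {_literal(sel.start_date)} "
        f"AND date <= {_literal(sel.end_date)};"
    )
```

Validating every identifier and literal before it is placed in the query text is one simple way to keep user-driven input from altering the query structure. A real implementation might choose a different mechanism.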
BigExcel v1.0 Demo
Feasibility Study

Based on Yahoo Sandbox datasets
  o Predicting market trends
  o News-related n-grams
Example
  o User clicks on the browser
  o Clicks are converted to queries:
      SELECT TRANSFORM(date, time, buzz_score)
      USING 'hourly_analysis'
      FROM Yahoo_Buzz_Scores
      WHERE product='EBOOKS' AND date >= AND date <= ;
  o Generate output like: [output figure not included]
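The query above hands rows to an external script through Hive's TRANSFORM clause. Hive streams each row to the script's standard input as tab-separated fields and reads tab-separated rows back from its standard output. The slides do not show what 'hourly_analysis' computes, so the sketch below is an assumption: it averages buzz_score per date and hour, and it assumes the time column looks like HH:MM:SS. The names `hourly_means` and `main` are hypothetical.

```python
import sys
from collections import defaultdict


def hourly_means(lines):
    """Yield 'date<TAB>hour<TAB>mean_score' rows from tab-separated input.

    Assumes each input line carries exactly (date, time, buzz_score) and that
    the hour is the part of the time field before the first colon.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (date, hour) -> [sum, count]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows
        date, time, score = fields
        try:
            value = float(score)
        except ValueError:
            continue  # skip NULLs (Hive sends \N) and non-numeric scores
        hour = time.split(":", 1)[0]
        bucket = totals[(date, hour)]
        bucket[0] += value
        bucket[1] += 1
    for (date, hour), (total, count) in sorted(totals.items()):
        yield f"{date}\t{hour}\t{total / count}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point a streaming script would call: stdin rows in, stdout rows out."""
    for row in hourly_means(stdin):
        stdout.write(row + "\n")
```

A script deployed with Hive would end by calling `main()`. Note that a plain streaming script like this only aggregates the rows it receives itself, so a complete per-hour result across the cluster would also need the rows grouped appropriately on the Hive side.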
Conclusions

Challenges remain in making big data tools accessible to the wider community
This research is our first step towards addressing the challenges
Happy to chat with anyone (non-CS and CS) about potential avenues that need to be explored
  o Specific needs of communities like Digital Humanities

Thank you for your attention!