Published by Marianna Phelps. Modified over 9 years ago.
1
‘BigExcel’: A Web-Based Framework for Exploring Big Data in Social Sciences
Asif Saleem, Blesson Varghese and Adam Barker
University of St Andrews, UK
varghese@st-andrews.ac.uk
http://www.blessonv.com
2
Agenda
  Introduction
  Challenges
  Framework
  Demo
  Feasibility Study
  Conclusions
B. Varghese - Big Humanities 2014
3
Introduction
  Transformative change in the data analysis landscape
    o Traditionally, spreadsheet-like applications were used
    o Now, big data tools
  Big data technologies are maturing
    o Cloud computing: infrastructure support
    o Hadoop and Hive: programming paradigm
  These technologies are sometimes hard to use, even for computer scientists
    o Set-up, programming, adapting to hardware infrastructure, etc.
4
Challenges
  Limited Accessibility of Big Data Tools
    o Gap between technology and end user
    o In-depth knowledge of the tools is required to use them
    o Knowledge of hardware and excellent programming skills are required
  Lack of Exploratory Tools for Big Data
    o Perform quick analysis without undertaking large programming tasks
  Lack of Lightweight Big Data Tools
    o Full-fledged, comprehensive tools are available but require professional training
5
BigExcel Framework
  Three-tier framework:
    o User Interaction Layer
        Data browser built using RichFaces
        Connects to the next layer using RESTful Web Services
    o Query Management Layer
        Constructs queries for Hive
        Manages the data
        Stores the logic for analytical operations in MapReduce
    o Infrastructure Management Layer
        Connects to the cloud
        Uses the Amazon Web Services SDK
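The Query Management Layer is described as constructing Hive queries from requests sent by the data browser. A minimal sketch of that step in Python follows, assuming the browser sends a selection with a table, columns, filters and a script name. The names `build_query`, `quote`, `ALLOWED_TABLES` and `ALLOWED_OPS`, the shape of the selection, and the `product` column in the whitelist are hypothetical; they are not taken from BigExcel itself.

```python
from datetime import date

# Hypothetical whitelist: only the table and columns from the talk's example.
ALLOWED_TABLES = {"Yahoo_Buzz_Scores": {"date", "time", "buzz_score", "product"}}
ALLOWED_OPS = {"=", "<", "<=", ">", ">="}


def quote(value):
    """Render a value as a Hive literal; strings and dates are single-quoted."""
    if isinstance(value, (int, float)):
        return str(value)
    text = value.isoformat() if isinstance(value, date) else str(value)
    # Assumption: backslash escaping is enough for the literals used here.
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_query(selection):
    """Turn a user selection (a dict) into a Hive TRANSFORM query string."""
    table = selection["table"]
    if table not in ALLOWED_TABLES:
        raise ValueError("unknown table: " + table)
    known = ALLOWED_TABLES[table]
    columns = list(selection["columns"])
    filters = list(selection.get("filters", []))

    # Identifiers cannot be quoted like values, so check them against the whitelist.
    for name in columns + [f[0] for f in filters]:
        if name not in known:
            raise ValueError("unknown column: " + name)

    clauses = []
    for column, op, value in filters:
        if op not in ALLOWED_OPS:
            raise ValueError("unsupported operator: " + op)
        clauses.append(column + " " + op + " " + quote(value))

    query = "SELECT TRANSFORM(" + ", ".join(columns) + ")"
    query += " USING " + quote(selection["script"])
    query += " FROM " + table
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query + ";"


selection = {
    "table": "Yahoo_Buzz_Scores",
    "columns": ["date", "time", "buzz_score"],
    "script": "hourly_analysis",
    "filters": [
        ("product", "=", "EBOOKS"),
        ("date", ">=", date(2005, 5, 23)),
        ("date", "<=", date(2005, 5, 27)),
    ],
}
print(build_query(selection))
```

Checking table and column names against a whitelist while quoting only the values is one way to keep a click-driven query builder from passing arbitrary text through to Hive; the real layer may handle this differently.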
6
BigExcel v1.0 Demo
7
Feasibility Study
  Based on Yahoo Sandbox datasets
    o Predicting market trends
    o News-related n-grams
  Example
    o User clicks in the data browser
    o Clicks are converted to queries:
        SELECT TRANSFORM(date, time, buzz_score)
        USING 'hourly_analysis'
        FROM Yahoo_Buzz_Scores
        WHERE product = 'EBOOKS'
          AND date >= '2005-05-23'
          AND date <= '2005-05-27';
    o Generate output like: [figure not available]
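The example query hands each selected row to a script named hourly_analysis through Hive's TRANSFORM clause, which streams rows to the script as tab-separated lines on standard input and reads tab-separated lines back from standard output. The talk does not say what the script computes, so the Python sketch below is only an illustration: it assumes the script averages buzz_score per date and hour, and that the time column looks like HH:MM:SS. The function names `hourly_average` and `run` are hypothetical.

```python
from collections import defaultdict


def hourly_average(lines):
    """Average the score per (date, hour) from tab-separated 'date, time, score' lines."""
    totals = defaultdict(lambda: [0.0, 0])  # (date, hour) -> [sum, count]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows
        day, clock, score = fields
        try:
            value = float(score)
        except ValueError:
            continue  # skip non-numeric scores, e.g. Hive's \N marker for NULL
        hour = clock.split(":")[0].zfill(2)  # assumes an HH:MM:SS-style time
        bucket = totals[(day, hour)]
        bucket[0] += value
        bucket[1] += 1
    return [
        day + "\t" + hour + "\t" + format(total / count, ".2f")
        for (day, hour), (total, count) in sorted(totals.items())
    ]


def run(source, sink):
    """Write one output row per (date, hour) bucket to the given stream."""
    for row in hourly_average(source):
        sink.write(row + "\n")


# When used as the TRANSFORM script, the entry point would be:
#   import sys
#   run(sys.stdin, sys.stdout)
```

Keeping the aggregation in a function that takes any iterable of lines lets it be tried on a small list of strings before it is wired to Hive.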
8
Conclusions
  Challenges remain in making big data tools accessible to the wider community
  This research is our first step towards addressing the challenges
  Happy to chat with anyone (non-CS and CS) about potential avenues that need to be explored
    o Specific needs of communities like Digital Humanities
Thank you for your attention!