‘BigExcel’: A Web-Based Framework for Exploring Big Data in Social Sciences
Asif Saleem, Blesson Varghese and Adam Barker
University of St Andrews, UK

Agenda
  Introduction
  Challenges
  Framework
  Demo
  Feasibility Study
  Conclusions
B. Varghese - Big Humanities 2014

Introduction
  Transformative change in the data analysis landscape
    o Traditionally: spreadsheet-like applications
    o Now: big data tools
  Big data technologies are maturing
    o Cloud computing – infrastructure support
    o Hadoop and Hive – programming paradigm
  These technologies are sometimes difficult to use, even for computer scientists
    o Setup, programming, adapting to hardware infrastructure, etc.

Challenges
  Limited accessibility of big data tools
    o Gap between the technology and the end user
    o In-depth knowledge of the tools is required to use them
    o Knowledge of hardware and excellent programming skills are required
  Lack of exploratory tools for big data
    o Users need to perform quick analysis without undertaking large programming tasks
  Lack of lightweight big data tools
    o Full-fledged, comprehensive tools are available but require professional training

BigExcel Framework
  Three-tier framework:
    o User Interaction Layer
        Data browser built using RichFaces
        Connects to the next layer using RESTful Web Services
    o Query Management Layer
        Constructs queries for Hive
        Manages the data
        Stores the logic for analytical operations in MapReduce
    o Infrastructure Management Layer
        Connects to the Cloud
        Amazon Web Services SDK used

BigExcel v1.0 Demo

Feasibility Study
  Based on Yahoo Sandbox datasets
    o Predicting market trends
    o News-related n-grams
  Example
    o User clicks on the browser
    o Clicks are converted to queries:
        SELECT TRANSFORM(date, time, buzz_score)
        USING 'hourly_analysis'
        FROM Yahoo_Buzz_Scores
        WHERE product='EBOOKS' AND date >= AND date <= ;
    o Generates output like: [figure not included in transcript]
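The query on this slide passes rows to a script named hourly_analysis, which the slides do not show. Hive's TRANSFORM clause streams each selected row to the script as a tab-separated line and reads tab-separated lines back. As a rough sketch of what such a script could look like, the example below averages buzz_score per date and hour. The averaging itself, the HH:MM:SS time format and the function name `hourly_average` are assumptions, not details taken from BigExcel.

```python
import sys
from collections import defaultdict


def hourly_average(lines):
    """Yield 'date<TAB>hour<TAB>mean score' rows from 'date<TAB>time<TAB>buzz_score' rows.

    Assumes time looks like HH:MM:SS (an assumption; the slides do not say).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (date, hour) -> [sum, count]
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed rows
        date, time, score = parts
        try:
            value = float(score)
        except ValueError:
            continue  # skip NULLs (Hive sends them as \N) and non-numeric scores
        hour = time.split(":")[0]
        bucket = totals[(date, hour)]
        bucket[0] += value
        bucket[1] += 1
    for (date, hour), (total, count) in sorted(totals.items()):
        yield f"{date}\t{hour}\t{total / count:.2f}"


def main(stream=sys.stdin, out=sys.stdout):
    """Entry point a streaming script would call: read rows, write aggregates."""
    for row in hourly_average(stream):
        out.write(row + "\n")
```

A script used with TRANSFORM would call `main()` at the bottom of the file so that it reads from standard input when Hive launches it. Keeping the aggregation in a plain function makes it possible to try the logic on a few hand-written rows before running it on a cluster.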

Conclusions
  Challenges remain in making big data tools accessible to the wider community
  This research is our first step towards addressing these challenges
  Happy to chat with anyone (CS or non-CS) about potential avenues that need to be explored
    o Specific needs of communities like Digital Humanities

Thank you for your attention!