Authors: Anant Bhardwaj, Amol Deshpande, Aaron J. Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, Rebecca Zhang.

Type: Demonstration paper
Presented by: Dardan Xhymshiti
Fall 2015

• Organizations and companies collect data from various sources, such as:
  • financial transactions,
  • server logs,
  • sensor data.
• Teams and individuals inside the company want to extract knowledge from these datasets using home-grown tools, company tools, and different programming languages. They modify the datasets (normalization, cleaning) and then exchange them back and forth.
• Problem: collaborative data analysis.
• Challenges: heterogeneity of tools, diversity in the skill sets of individuals and teams, and difficulties in storing, retrieving, and versioning the exchanged datasets.

• The authors motivate their work with two examples.
• Example 1: Expert analysis
  • Members of a web advertising team want to extract knowledge from unstructured ad-click data. They write a script that extracts the task-relevant information and store the result as a separate dataset, which is shared across the team.
• Problems:
  • Different team members may be more comfortable with a particular tool (R, Python, Awk) and use it to clean, normalize, and summarize the dataset.
  • More proficient members use multiple languages for different purposes: modeling in R, visualization in JavaScript, string extraction in Awk, etc.

• The team members manage dataset versions by encoding the version in the file name: table_v1, table_v1.1 ….
• This scheme becomes difficult to manage once there are a hundred dataset versions.
• The final result…:

• Example 2: Novice analysis
  • The coach and players of a football team want to study, query, and visualize their performance over the last season.
  • They will probably store their data in a tool like Excel, which has limited support for querying, cleaning, analysis, and versioning.
  • Query example: the coach wants to find all the games in which a star player was absent.
  • Most of the players are not proficient with data analysis tools such as SQL or scripting languages.
• Solution: point-and-click apps, which let users easily load, query, visualize, and share results with other users.
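
To make the coach's query concrete, here is a minimal sketch of how "all the games where a star player was absent" could be expressed in SQL, run through Python's built-in sqlite3 module. The tables `games` and `appearances`, their columns, and the sample rows are hypothetical and invented for illustration only; the slides do not describe the team's actual data layout.

```python
import sqlite3

# Hypothetical schema (an assumption, not from the slides):
# one row per game, and one row per (game, player) appearance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, opponent TEXT);
    CREATE TABLE appearances (game_id INTEGER, player TEXT);
""")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [(1, "Lions"), (2, "Bears"), (3, "Hawks")],
)
conn.executemany(
    "INSERT INTO appearances VALUES (?, ?)",
    [(1, "Star"), (1, "Other"), (2, "Other"), (3, "Star")],
)


def games_without(conn, player):
    """Return (game_id, opponent) for every game the player did not appear in."""
    rows = conn.execute(
        """
        SELECT g.game_id, g.opponent
        FROM games AS g
        WHERE NOT EXISTS (
            SELECT 1
            FROM appearances AS a
            WHERE a.game_id = g.game_id AND a.player = ?
        )
        ORDER BY g.game_id
        """,
        (player,),
    )
    return rows.fetchall()


absent_games = games_without(conn, "Star")
print(absent_games)  # [(2, 'Bears')]
```

Even this small query needs a correlated subquery, which illustrates why a point-and-click interface is attractive for users who do not know SQL.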

• These teams are unable to perform collaborative data analysis because they lack:
1. Flexible data sharing and versioning support.
2. Point-and-click apps that help novice users do collaborative data analysis.
3. Support for a number of data analysis languages and tools.
• A tool for collaborative analysis could be used, for example, by geneticists who want to share and collaborate on genome data with other research groups.

• To address these problems, the paper presents DataHub, a unified data management and collaboration platform for hosting, sharing, combining, and collaboratively analyzing datasets.
• DataHub has three main components:
1. Flexible data storage, sharing, and versioning capabilities.
   a) Keeps track of all versions of a dataset.
   b) Enables collaborative analysis while allowing datasets to be stored and retrieved at various stages of analysis.
2. An app ecosystem for easy querying, cleaning, and visualization.
   a) Distill: a data-cleaning-by-example tool.
   b) DataQ: a query builder that lets users build SQL queries by direct manipulation in a graphical user interface, which makes it suitable for non-technical users.
   c) Dviz: a data visualization tool.

3. Language-agnostic hooks for external data analysis. For team members who are proficient in different languages and libraries, such as Python, R, Scala, and Octave, DataHub enables collaborative data analysis by using Apache Thrift to translate between these languages and the datasets stored in DataHub.
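
The first component, tracking every version of a dataset so that it can be retrieved at any stage of analysis, can be pictured with a toy sketch. This is not DataHub's storage engine or API: the class `VersionedDataset` and its methods `commit`, `checkout`, and `history` are hypothetical names, and storing a full snapshot per version is a simplification chosen for brevity.

```python
import copy


class VersionedDataset:
    """Toy version store (illustrative only): each commit keeps a full snapshot."""

    def __init__(self):
        self._versions = {}  # version id -> (parent id, message, rows)
        self._next_id = 1

    def commit(self, rows, message, parent=None):
        """Record a new version derived from `parent` and return its id."""
        if parent is not None and parent not in self._versions:
            raise KeyError(f"unknown parent version: {parent}")
        version_id = self._next_id
        self._next_id += 1
        self._versions[version_id] = (parent, message, copy.deepcopy(rows))
        return version_id

    def checkout(self, version_id):
        """Return a copy of the rows exactly as they were at `version_id`."""
        return copy.deepcopy(self._versions[version_id][2])

    def history(self, version_id):
        """Walk parent links from `version_id` back to the first version."""
        chain = []
        while version_id is not None:
            parent, message, _ = self._versions[version_id]
            chain.append((version_id, message))
            version_id = parent
        return chain


store = VersionedDataset()
raw = [{"ad": "a1", "clicks": " 12 "}, {"ad": "a2", "clicks": "7"}]
v1 = store.commit(raw, "raw ad-click extract")

# A teammate cleans the data and commits the result as a child of v1.
cleaned = [{**row, "clicks": int(row["clicks"])} for row in store.checkout(v1)]
v2 = store.commit(cleaned, "clicks converted to integers", parent=v1)

print(store.history(v2))
# [(2, 'clicks converted to integers'), (1, 'raw ad-click extract')]
```

Compared with file names such as table_v1 and table_v1.1, explicit parent links record which version each dataset was derived from, so earlier stages remain retrievable after later edits.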