So, what was this course about?

Ingredients: Data, Analytics, Humans
- Data analytics: analyzing and extracting value from data
- Humans: as analysts extracting value; as workers helping the analysis

Course Objectives
- Reading and comprehension skills: you read ~25 papers
- Critical thinking and discussion skills: actively engaging in critically analyzing papers' flaws and insights
- Research skills: a semester-long, meaty project
- Presentation skills: present the key ideas of a database-style paper

Optimization Objectives
- Accuracy: better, more complete results
- Power: enabler of more interesting analyses
- Speed: want results quickly
- Ease of use: for both novice and expert users
- Cost: crowds, resources

Topics Covered
- Dealing with unstructured/noisy data: Crowd-Powered Data Analytics; Data Cleaning Tools
- Dealing with more data: Scalable Data Analytics; Approximate Data Analytics
- Dealing with new scenarios: ML and Graph Processing; Collaborative Query Processing
- Dealing with novice analysts: Visual Analytics Systems; New Interfaces & Usability
For each, we covered A) a system or an algorithm, plus B) connections to other (sometimes old) database topics.

Topics Covered: A. New forms of data, 1) Crowd-Powered Data Analytics
- CrowdScreen: filtering data with humans; cost/latency/accuracy, probabilistic reasoning
- So Who Won: max; graph-based maximum-likelihood reasoning
- Sorts and Joins: sorting and joins with humans; new types of interfaces (hybrid), batching
- Enumeration: gathering all entities on a topic; open-world assumption, species estimation literature
- TurKit: programming toolkit for the crowd
- CrowdDB: DB + crowds; data model (CNULL), query constructs, query processing
- Deco: DB + crowds; a more complete language
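The "probabilistic reasoning" behind filtering data with humans can be illustrated with a small Bayes-rule calculation. This is only a minimal sketch, not the strategy from any of the papers above: the function name `posterior_pass`, the uniform prior, and the single shared worker error rate are all assumptions made for illustration.

```python
def posterior_pass(yes, no, prior=0.5, error=0.2):
    """Probability that an item truly passes the filter, given crowd votes.

    Assumes (hypothetically) that workers answer independently and that
    each worker gives the wrong answer with the same probability `error`.
    """
    # Likelihood of the observed votes if the item truly passes / fails.
    like_pass = (1 - error) ** yes * error ** no
    like_fail = error ** yes * (1 - error) ** no
    evidence = prior * like_pass + (1 - prior) * like_fail
    return prior * like_pass / evidence


# Each extra agreeing vote costs money and time but raises confidence.
one_vote = posterior_pass(yes=1, no=0)
two_votes = posterior_pass(yes=2, no=0)
split_vote = posterior_pass(yes=1, no=1)
```

Under these assumptions, a filtering strategy amounts to deciding, after each vote, whether the posterior is confident enough to stop or whether another (paid) answer is worth asking for, which is where the cost/latency/accuracy tradeoff enters.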

Topics Covered: A. New forms of data, 2) Data Cleaning
- Potter's Wheel: programmatic cleaning; precursor of data cleaning systems, user-defined cleaning
- Wrangler: interactive cleaning; auto-suggested cleaning alternatives
- Profiler: cleaning + anomaly discovery; appropriate binning for discovering anomalies

Topics Covered: B. Dealing with more data, 1) Scalable Data Analytics
- Spark: in-memory query processing; NoSQL system, datasets as objects, persist (as opposed to MapReduce)
- Dremel: Google's parallel column-store system; distributed query processing, column stores
- SparkSQL: DB layer on Spark; translation from SQL to Spark queries, ...

Topics Covered: B. Dealing with more data, 2) Approximate Analytics (tradeoff between cost/latency/accuracy)
- BlinkDB: approximate query answering system; stratified samples help! Query column sets
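Why stratified samples help can be seen in a small sketch. This is a simplified illustration under stated assumptions, not BlinkDB's actual sample-selection algorithm: the helper names `stratified_sample` and `estimate_sums`, the per-group cap, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, cap, seed=0):
    """Keep at most `cap` rows for each distinct value of `key`.

    Returns {group: (sampled_rows, original_group_size)} so that
    aggregates computed on the sample can be scaled back up.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = {}
    for group, members in groups.items():
        # Rare groups are kept whole; large groups are down-sampled.
        chosen = members if len(members) <= cap else rng.sample(members, cap)
        sample[group] = (chosen, len(members))
    return sample


def estimate_sums(sample, value):
    """Estimate SUM(value) per group as (sample mean) * (group size)."""
    return {
        group: sum(r[value] for r in chosen) / len(chosen) * size
        for group, (chosen, size) in sample.items()
    }


# Toy data: one very common group and one rare group.
rows = [{"city": "NYC", "sales": 10} for _ in range(1000)]
rows += [{"city": "Urbana", "sales": 7} for _ in range(3)]

sample = stratified_sample(rows, key="city", cap=50)
estimates = estimate_sums(sample, value="sales")
```

A uniform 5% sample of the same table would quite likely miss the rare group altogether, whereas the stratified sample always retains it, so a GROUP BY on `city` still returns an answer for every group.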

Topics Covered: C. Dealing with novice analysts, 1) Visual Analytics Systems
- Polaris: basis for Tableau; idea of a data cube, visualizations = cube aggregates!
- Trust Me, I'm Partially Right: approximate visualization; online aggregation
- SeeDB: visualization recommendations; scalable grouped query execution techniques
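The idea that visualizations are cube aggregates can be made concrete with a small sketch. It is an illustrative assumption-laden example, not code from Polaris or Tableau: the function `cube`, the column names, and the toy rows are hypothetical, and a real system would push this work into the database rather than recompute it in a loop.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Compute SUM(measure) grouped by every subset of `dims`.

    Returns {subset_of_dims: {group_key_tuple: total}}; each inner dict
    is one cuboid of the data cube.
    """
    result = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in subset)] += row[measure]
            result[subset] = dict(totals)
    return result


rows = [
    {"region": "East", "year": 2014, "sales": 5},
    {"region": "East", "year": 2015, "sales": 7},
    {"region": "West", "year": 2014, "sales": 3},
]
cuboids = cube(rows, dims=("region", "year"), measure="sales")

# A bar chart of sales by region is simply the ("region",) cuboid.
sales_by_region = cuboids[("region",)]
```

Seen this way, switching from a "sales by region" chart to a "sales by year" chart, or drilling down to both dimensions at once, is just a matter of reading a different cuboid.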

Topics Covered: C. Dealing with novice analysts, 2) New Interfaces and Usability
- DBTouch: touch-based querying of data (pinch + zoom)
- Gestural Query Specification: completeness of operators; user study!
- Making Database Systems Usable: natural language interface types: forms, keyword search, QBE

Topics Covered: D. Dealing with new settings, 1) Machine Learning and Graph Processing
- MAD Skills: wrapper on traditional database; kinds of ML-based analyses of interest
- GraphLab: distributed graph analytics tools; graph analytics systems, "thinking like a vertex"
- MLBase: wrapper on ML algorithms; parameter tuning for ML is a pain

Topics Covered: D. Dealing with new settings, 2) Collaborative Analyses
- Fusion Tables: Google's public data analytics / viz tool; data cleaning problem, data integration

"Historical" Takeaways
Examples:
- Storage layer: column stores, data compression, data sampling
- Processing layer: NoSQL, adaptive query processing, parallel query processing; cost-latency-accuracy tradeoff in crowdsourcing, interfaces, batching
- Usability layer: forms, keyword search, QBE; data integration, data cleaning
- Visualization layer: binning, aggregation, data cubes, online aggregation
- Applications layer: graph processing, machine learning primitives

Mix of Papers: Vision vs. Details
- Visionary, examples: database usability, DBTouch/GestureDB, MLBase
- Detail-oriented, examples: CrowdScreen, GraphLab, Dremel, SparkSQL

Mix of Papers: Algorithmic vs. Systems
- Algorithmic: probably 30-35%
- Systems-oriented: the majority
Not surprising, given that this is a database systems course.

(Hopefully) Lessons Learned
- Don't solve non-problems!
- Importance of thinking about users: interface, language
- Careful systems architecture: generalizable, efficient / powerful, tailored to use cases
- Data analytics involves: usability; careful, scalable system architecture (systems); principled algorithm design (algorithms)