CS 784: Advanced Topics in Data Management
This semester’s focus: Data Science
AnHai Doan



What We Will Discuss
Logistics
  – course enrollment
  – no class this Friday
What is data science?
Motivation, the rise of data science
What CS at UW-Madison is doing about it
What will be covered in this class, goals of the class
Course syllabus
Next step

Data Science
No one really knows what it is
There is a popular joke about this
A very common definition
  – data science focuses on extracting (actionable) insights/knowledge from data
This does not really capture all DS activities “in the wild”

Data Science
Tasks
  – extract insights from data = performing analysis
  – build data-driven artifacts: knowledge bases, rec systems, …
  – design data-driven experiments to answer a question
Need to know
  – database management (RDBMSs), machine learning, AI, data mining
  – managing different kinds of data (relational, text, Web, graph, time series, etc.)
  – statistics
  – optimization, linear algebra
  – visualization
  – big data systems
  – distributed/parallel systems, networking
  – security/privacy
Skills
  – Python/R data science ecosystems
  – Big data systems: Hadoop, Spark, NoSQL
  – SQL

How is DS Different From …
RDBMSs
data mining
statistics
Big Data

Motivation / The Rise of Data Science
RDBMSs
  – transactional data management, belongs to the CIO
Web => Google, other Web companies
Three trends
  – much easier to generate and capture data
  – much easier to process data (e.g., on the cloud)
  – many more people become involved
These lead to Big Data
  – change in perception: data is now at the heart of enterprises
  – lots of data, how to process it? => big data systems
  – how to store/query it? => NoSQL databases
  – how to get value out of it? => data analytics, data science

Examples
Johnson Controls
WalmartLabs
  – product catalog
  – product matching
Non-profit organizations’ database
My house
My car
GE and the Internet of Things
Google Knowledge Graph
A/B testing
Everything is increasingly data driven

What Is UW-Madison Doing About This?
Data science is very hot today (sexiest job of the century, etc.)
  – pays very well out there, many bootcamps
What we think
  – we have seen fads come and go
  – is this a fad? it is likely to stay
  – the fundamental fact is that everything is increasingly data driven (electricity, digital, online)
  – so a lot of people and skills are needed to process data
  – so even if the name “data science” disappears, the fundamental problem will remain
Our current plan
  – design a sequence of DS courses for grad students: 784, 838, …
  – design a sequence of DS courses for undergrads (eventually opening up to the entire UW)
  – design DS plans for the db group, CS dept, and UW-Madison
  – many universities are doing the same thing
  – your ideas? What do you want to see?

Coverage and Goals of this Class
Tasks
  – extract insights from data = performing analysis
  – build data-driven artifacts: knowledge bases, rec systems, …
  – design data-driven experiments to answer a question
Need to know
  – database management (RDBMSs), machine learning, AI, data mining
  – managing different kinds of data (relational, text, Web, graph, time series, etc.)
  – statistics
  – optimization, linear algebra
  – visualization
  – big data systems
  – distributed/parallel systems, networking
  – security/privacy
Skills
  – Python/R data science ecosystems
  – Big data systems: Hadoop, Spark, NoSQL
  – SQL

Coverage and Goals of this Class
Tasks
  – extract insights from data = performing analysis
  – main focus of this class
  – let’s illustrate this using an example

Example
A company has multiple departments
Depts interact with customers
The boss wants to know
  – how are customer complaints distributed across depts?
  – are there any interesting patterns regarding customer complaints?
  – can we predict anything regarding customer complaints, and can we take any action?
You, the data scientist, start by collecting data
  – Emps(eid, name, phone, address, did)
  – Depts(did, name)
  – Complaints(cid, cname, ename, phone, dname, date, desc)
  – Services(sid, date, desc)
Subsequent steps
  – data extraction
  – data understanding, cleaning, transformation
  – data integration
  – (most likely) data understanding, cleaning, transformation again
  – data analysis
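The first question on this slide (how complaints are distributed across depts) can be sketched as a single grouped query once the data sits in relational tables. The following is only an illustration, not the course's method: it assumes that Complaints.dname already matches Depts.name exactly (real data would usually need the cleaning and integration steps first), and the helper names `build_toy_db` and `complaints_per_dept` and all sample rows are made up for the example.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with two tables from the slide.

    All rows are invented sample data, not taken from the lecture.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Depts(did INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Complaints(
            cid INTEGER PRIMARY KEY, cname TEXT, ename TEXT, phone TEXT,
            dname TEXT, "date" TEXT, "desc" TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO Depts VALUES (?, ?)",
        [(1, "Billing"), (2, "Support"), (3, "Sales")],
    )
    conn.executemany(
        "INSERT INTO Complaints VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "Ann", "Bob", "555-0101", "Billing", "2016-01-05", "double charge"),
            (2, "Cho", "Bob", "555-0102", "Billing", "2016-01-09", "late invoice"),
            (3, "Dee", "Eve", "555-0103", "Support", "2016-02-01", "long wait"),
        ],
    )
    return conn


def complaints_per_dept(conn):
    """Return (department name, complaint count) pairs, most complaints first.

    The LEFT JOIN keeps departments that have no complaints at all.
    Assumption: Complaints.dname matches Depts.name exactly.
    """
    query = """
        SELECT d.name, COUNT(c.cid) AS n_complaints
        FROM Depts AS d
        LEFT JOIN Complaints AS c ON c.dname = d.name
        GROUP BY d.did, d.name
        ORDER BY n_complaints DESC, d.name ASC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, count in complaints_per_dept(build_toy_db()):
        print(name, count)
```

Joining on the department name rather than on did follows the schema on the slide, where Complaints carries dname but no did; that mismatch is one reason the integration step precedes the analysis.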

Example
You will most likely do two stages
  – development
  – production
Using a data analysis stack and a big data stack

Course Syllabus
Big picture
RDBMS, machine learning, crowdsourcing, big data systems
Extracting insights from data
  – Data acquisition, data lake
  – The development stage
  – Data extraction: from HTML pages, from text
  – Data understanding, cleaning, transforming
  – Data integration: matching schemas, matching entities
  – Data exploration/analysis
  – The production stage
Building artifacts
Designing data-intensive experiments to answer questions
Misc
  – managing different kinds of data: text, Web, social media
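To make “matching entities” from the data integration item concrete, here is a minimal sketch of one naive approach: normalize two names, score them with a string similarity measure, and keep pairs above a threshold. This is an assumption-laden illustration, not the technique taught in the course; the function names, the 0.9 threshold, and the sample company names are all hypothetical, and the similarity measure is simply the one built into Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_entities(left, right, threshold=0.9):
    """Return all (left, right) pairs whose similarity reaches the threshold.

    Compares every pair, which is fine for small lists; the threshold
    value is an arbitrary choice for this sketch.
    """
    matches = []
    for l_name in left:
        for r_name in right:
            if similarity(l_name, r_name) >= threshold:
                matches.append((l_name, r_name))
    return matches


if __name__ == "__main__":
    # Made-up records from two hypothetical sources.
    source_a = ["ACME Corp.", "Globex"]
    source_b = ["acme corp", "Initech"]
    print(match_entities(source_a, source_b))
```

Comparing every pair costs time proportional to the product of the two list sizes, so this shape only suits small inputs.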

Misc Issues
Reading and lecture notes
Project