Big Data Programming II Course Introduction

Slides:



Advertisements
Similar presentations
Suggested Course Outline Cloud Computing Bahga & Madisetti, © 2014Book website:
Advertisements

+ CSCI 2141: Intro to Databases Winter 2013 Dr. Kirstie Hawkey.
Using Relational Databases and SQL Steven Emory Department of Computer Science California State University, Los Angeles Lecture 1: Introduction to Relational.
WHT/ HPCC Systems Flavio Villanustre VP, Products and Infrastructure HPCC Systems Risk Solutions.
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Lab2 CPIT 440 Data Mining and Warehouse.
Lecture 2: Introduction to Machine Learning
Knowledge Discovery and Data Mining Evgueni Smirnov.
Machine Learning Lecture 1. Course Information Text book “Introduction to Machine Learning” by Ethem Alpaydin, MIT Press. Reference book “Data Mining.
Knowledge Discovery and Data Mining Evgueni Smirnov.
470 First Lecture1 CMPT 470 Instructor: –Wo-Shun Luk, ASB 10829, –Office Hours: 3:30 – 4:30 M W F TA: –Henry Zhang,
How Companies are Using Spark And where the Edge in Big Data will be Matei Zaharia.
Advanced Analytics on Hadoop Spring 2014 WPI, Mohamed Eltabakh 1.
CPSC 315 Programming Studio Spring 2008 John Keyser.
CS 445/545 Machine Learning Winter, 2014 See syllabus at
Data Mining Copyright KEYSOFT Solutions.
Lecture 8-1: Introduction to AWS CMPT 733, SPRING 2016 JIANNAN WANG.
Big Data Programming II Course Introduction CMPT 733, SPRING 2016 JIANNAN WANG.
ODL based AI/ML for Networks Prem Sankar Gopannan, Ericsson
CS 784: Advanced Topics in Data Management This semester’s focus: Data Science AnHai Doan.
Lecture 4: Data Integration and Cleaning CMPT 733, SPRING 2016 JIANNAN WANG.
Data Summit 2016 H104: Building Hadoop Applications Abhik Roy Database Technologies - Experian LinkedIn Profile:
DATA MINING and VISUALIZATION Instructor: Dr. Matthew Iklé, Adams State University Remote Instructor: Dr. Hong Liu, Embry-Riddle Aeronautical University.
Lecture 3: Anomaly Detection CMPT 733, SPRING 2016 JIANNAN WANG INCLUDING NOTES FROM ARASH VAHDAT.
Raju Subba Open Source Project: Apache Spark. Introduction Big Data Analytics Engine and it is open source Spark provides APIs in Scala, Java, Python.
Learning Bayesian Networks for Complex Relational Data
9/24/2017 7:27 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
Bhakthi Liyanage SQL Saturday Atlanta 15 July 2017
Makes Insurance Smarter.
Who am I? Work in Probabilistic Machine Learning Like to teach 
CS 445/545 Machine Learning Winter, 2017
Program Analysis and Software Security
Make Predictions Using Azure Machine Learning Studio
Assurance Scoring: Using Machine Learning and Analytics to Reduce Risk in the Public Sector Matt Thomson 17/11/2016.
CS 445/545 Machine Learning Spring, 2017
DATA SCIENCE Online Training at GoLogica
Machine Learning I & II.
Introduction to Data Science Lecture 7 Machine Learning Overview
CH. 1: Introduction 1.1 What is Machine Learning Example:
CS 540 Database Management Systems
GR01E - Electronic Commerce Overview
A GACP and GTMCP company
Azure Machine Learning 101
Spark Software Stack Inf-2202 Concurrent and Data-Intensive Programming Fall 2016 Lars Ailo Bongo
COMP 4640 Intelligent & Interactive Systems
Introduction to Azure Machine Learning Studio
COS 518: Advanced Computer Systems Lecture 12 Mike Freedman
CMPT 733, SPRING 2016 Jiannan Wang
Who Am I ? Jingtao Wang, Assistant Professor in CS and LRDC , will become eventually) Office 5423 SENSQ
OMIS 665, Big Data Analytics
Machine Learning with R
Lecture 8-2: Introduction to Data Visualization
History of Database Systems
Experiences and Lessons Learned from UNC Wilmington
CMPT 354: Database System I
INNOvation in TRAINING BUSINESS ANALYSTS HAO HElEN Zhang UniVERSITY of ARIZONA
CMPT 354: Database System I
Course Summary ChengXiang “Cheng” Zhai Department of Computer Science
CS583 – Data Mining and Text Mining
Predictive Models with SQL Server Machine Learning Services
Machine Learning Algorithms – An Overview
Business concentration, minor and certificate programs
Panel on Research Challenges in Big Data
CMPT 733, SPRING 2017 Jiannan Wang
Semi-Supervised Learning
Azure Machine Learning
Course Introduction Data Visualization & Exploration – COMPSCI 590
Machine Learning for Cyber
Python4ML An open-source course for everyone
Presentation transcript:

Big Data Programming II Course Introduction CMPT 733, SPRING 2017 Jiannan Wang

Who am I? Assistant Professor from Simon Fraser University Postdoc from UC Berkeley AMPLab Ph.D. from Tsinghua University 8+ years of research experience in the database field 11/12/2018 Jiannan Wang - CMPT 733

Database: Research Success I am so proud of being a database guy! 4 Turing Award Winners in 50 years 1973 1981 1998 2014 Network database Relational database Transactional database Modern database systems Charles W. Bachman Edgar Codd Jim Gray Michael Stonebraker 11/12/2018 Jiannan Wang - CMPT 733

Database: Industry Success I am so proud of being a database guy! $30~$50 billion market per year 3x faster growth . . . 11/12/2018 Jiannan Wang - CMPT 733

But, what has changed in the “Big Data” world? People have different opinions on this question. You should have your own opinion! I believe there are two major changes: Cloud Computing and Machine Learning 11/12/2018 Jiannan Wang - CMPT 733

Cloud Computing Making it tremendously easier and cheaper to run applications on a cluster (e.g., 50 nodes) Oracle’s Biggest Competitor? How to Program? CMPT 732. http://bit.ly/2iaNKuT 11/12/2018 Jiannan Wang - CMPT 733

Understanding the past Predicating the future Machine Learning (ML) People are not only interested in understanding the past but also in predicting the future Imagine your boss asks you the following questions Understanding the past (SQL Analytics) Predicating the future (Machine Learning) How many Apple Watches did we sell in 2015? How many Apple Watches will we sell in 2016? Which movies did “Bob” watch before? Which movies will “Bob” like to watch in the future? … … 11/12/2018 Jiannan Wang - CMPT 733

ML in Class vs. ML in Practice Data is clean, and comes from a single source Developing a new ML model is quite important A model should have good properties in theory ML in Practice Data is messy, and often comes from multiple sources Feature selection and parameter tuning are quite important A model should have good performance in production 11/12/2018 Jiannan Wang - CMPT 733

What’s this course about? Goals Advanced analytics (beyond SQL analytics) Tools Spark MLlib, Scikit-Learn, Caffee Lecture style More “why” less “how” Assignment style Problem centric instead of tool centric 11/12/2018 Jiannan Wang - CMPT 733

Course Plan Part I: Advanced Analytics with Spark Introduction & MLlib Basics Sentiment Analysis (Feature Representation + Linear Regression) Anomaly Detection (Unsupervised Learning) Data Integration and Cleaning Movie Recommendation Part II: Advanced Analytics with Other Tools Small Data (Scikit-Learn) Deep Learning (caffee) Data Labeling (Crowdsourcing, Active Learning) Project Proposal Presentation Visualization (D3) & AWS Dr. Jiannan Wang Dr. Apala Guha 11/12/2018 Jiannan Wang - CMPT 733

Course Setup Marking Lectures (1 hour/week) Labs (5 hours/week) TAs Assignments: 9 x 8 = 72% Project: (proposal + poster + report): 28% Lectures (1 hour/week) Group A, B: Monday 9:30-10:20 Labs (5 hours/week) Group A: Tues 9:30–11:20, Thurs 9:00–11:50 Group B: Tues 14:30–16:20, Thurs 13:30–16:20 TAs Venkatesh Palani Prabhu Angalakurichi Ramaraj <vangalak@sfu.ca> Jaypratap Naidu <jna50@sfu.ca> 11/12/2018 Jiannan Wang - CMPT 733