Big Data Programming II Course Introduction CMPT 733, SPRING 2016 JIANNAN WANG.

Slides:



Advertisements
Similar presentations
Turning Data into Value Ion Stoica CEO, Databricks (also, UC Berkeley and Conviva) UC BERKELEY.
Advertisements

1 1)Introduction to Machine Learning 2)Learning problem 3)Learning system design IES 511 Machine Learning Dr. Türker İnce (Lecture notes by Prof. T. M.
WHT/ HPCC Systems Flavio Villanustre VP, Products and Infrastructure HPCC Systems Risk Solutions.
© Heikki Topi Computing and University Education in Analytics ACM Education Council San Francisco, CA November 2, 2013 Heikki Topi, Bentley University.
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
Lab2 CPIT 440 Data Mining and Warehouse.
Lecture 2: Introduction to Machine Learning
서울대학교 컴퓨터공학부 바이오지능 연구실 2014 Spring Semester Course Instructor: Prof. Byoung-Tak Zhang TAs: Ha-Young Jang & Beom-Jin Lee Classroom: , Time:
Practical Machine Learning Pipelines with MLlib Joseph K. Bradley March 18, 2015 Spark Summit East 2015.
CS345: Advanced Databases Chris Ré. What this course is Database fundamentals: –Theory –Old Crusty, Good SQL stuff –No/New/Not-Yet SQL New stuff: Knowledge.
CS525: Big Data Analytics Machine Learning on Hadoop Fall 2013 Elke A. Rundensteiner 1.
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 2.
© 2015 IBM Corporation UNIT 2: BigData Analytics with Spark and Spark Platforms 1 Shelly Garion IBM Research -- Haifa.
Machine Learning Introduction Study on the Coursera All Right Reserved : Andrew Ng Lecturer:Much Database Lab of Xiamen University Aug 12,2014.
Machine Learning Lecture 1. Course Information Text book “Introduction to Machine Learning” by Ethem Alpaydin, MIT Press. Reference book “Data Mining.
Knowledge Discovery and Data Mining Evgueni Smirnov.
How Companies are Using Spark And where the Edge in Big Data will be Matei Zaharia.
Understanding the field & setting expectations.  Personal  International  UNT Alumni (Mathematics)  Academic  Economics & Mathematics  Professional.
Advanced Analytics on Hadoop Spring 2014 WPI, Mohamed Eltabakh 1.
Advanced Software Engineering
CS 445/545 Machine Learning Winter, 2014 See syllabus at
Data Mining and Decision Support
Course Overview for Compilers J. H. Wang Sep. 20, 2011.
Guided By Ms. Shikha Pachouly Assistant Professor Computer Engineering Department 2/29/2016.
Mining of Massive Datasets Edited based on Leskovec’s from
Lecture 8-1: Introduction to AWS CMPT 733, SPRING 2016 JIANNAN WANG.
ODL based AI/ML for Networks Prem Sankar Gopannan, Ericsson
Lecture 4: Data Integration and Cleaning CMPT 733, SPRING 2016 JIANNAN WANG.
Data Summit 2016 H104: Building Hadoop Applications Abhik Roy Database Technologies - Experian LinkedIn Profile:
András Benczúr Head, “Big Data – Momentum” Research Group Big Data Analytics Institute for Computer.
Lecture 3: Anomaly Detection CMPT 733, SPRING 2016 JIANNAN WANG INCLUDING NOTES FROM ARASH VAHDAT.
Raju Subba Open Source Project: Apache Spark. Introduction Big Data Analytics Engine and it is open source Spark provides APIs in Scala, Java, Python.
Learning Bayesian Networks for Complex Relational Data
9/24/2017 7:27 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
Usman Roshan Dept. of Computer Science NJIT
Bhakthi Liyanage SQL Saturday Atlanta 15 July 2017
Matrix Factorization and Collaborative Filtering
Makes Insurance Smarter.
Who am I? Work in Probabilistic Machine Learning Like to teach 
CS 445/545 Machine Learning Winter, 2017
Introduction to Spark Streaming for Real Time data analysis
Make Predictions Using Azure Machine Learning Studio
Assurance Scoring: Using Machine Learning and Analytics to Reduce Risk in the Public Sector Matt Thomson 17/11/2016.
Intro to Machine Learning
CS 445/545 Machine Learning Spring, 2017
DATA SCIENCE Online Training at GoLogica
Machine Learning I & II.
Introduction to Data Science Lecture 7 Machine Learning Overview
CH. 1: Introduction 1.1 What is Machine Learning Example:
Deep Learning with TensorFlow online Training at GoLogica Technologies
GR01E - Electronic Commerce Overview
Introduction to Spark.
Azure Machine Learning 101
Spark Software Stack Inf-2202 Concurrent and Data-Intensive Programming Fall 2016 Lars Ailo Bongo
Big Data Programming II Course Introduction
COS 518: Advanced Computer Systems Lecture 12 Mike Freedman
CMPT 733, SPRING 2016 Jiannan Wang
Apache Spark & Complex Network
History of Database Systems
Experiences and Lessons Learned from UNC Wilmington
CMPT 354: Database System I
Machine Learning Algorithms – An Overview
Business concentration, minor and certificate programs
Panel on Research Challenges in Big Data
CMPT 733, SPRING 2017 Jiannan Wang
Semi-Supervised Learning
Usman Roshan Dept. of Computer Science NJIT
Azure Machine Learning
Jia-Bin Huang Virginia Tech
Recommender Systems Problem formulation Machine Learning.
Presentation transcript:

Big Data Programming II Course Introduction CMPT 733, SPRING 2016 JIANNAN WANG

Who am I? Assistant Professor from Simon Fraser University Postdoc from UC Berkeley AMPLab Ph.D. from Tsinghua University 7+ years of research experience in the database field 01/05/16JIANNAN WANG - CMPT 733 2

Database: Research Success I am so proud of being a database guy! Network database Charles W. Bachman 4 Turing Award Winners in 50 years Relational database Transactional database Modern database systems Edgar Codd Jim Gray Michael Stonebraker 01/05/16JIANNAN WANG - CMPT 733 3

Database: Industry Success I am so proud of being a database guy! $30~$50 billion market per year 01/05/16JIANNAN WANG - CMPT x faster growth

But, what has changed in the “Big Data” world? People have different opinions on this question. You should have your own opinion! I believe there are two major changes: Cloud Computing and Machine Learning 01/05/16JIANNAN WANG - CMPT 733 5

Cloud Computing Making it tremendously easier and cheaper to run applications on a cluster (e.g., 50 nodes) Oracle’s Biggest Competitor? How to Program? CMPT /05/16JIANNAN WANG - CMPT 733 6

Machine Learning (ML) People are not only interested in understanding the past but also in predicting the future Understanding the past (SQL Analytics) Predicating the future (Machine Learning) Imagine your boss asks you the following questions 01/05/16JIANNAN WANG - CMPT How many Apple Watches did we sell in 2015? How many Apple Watches will we sell in 2016? Which movies did “Bob” watch before? Which movies will “Bob” like to watch in the future? … …

ML in Class vs. ML in Practice ML in Class ◦Data is clean, and comes from a single source ◦Developing a new ML model is quite important ◦A model should have good properties in theory ML in Practice ◦Data is messy, and often comes from multiple sources ◦Feature selection and parameter tuning are quite important ◦A model should have good performance in production 01/05/16JIANNAN WANG - CMPT 733 8

What’s this course about? Goals ◦Advanced analytics (beyond SQL analytics) Tools ◦Spark (MLlib, GraphX, Streaming), Scikit-Learn, Caffee Lecture style ◦More “why” less “how” Assignment style ◦Problem centric instead of tool centric 01/05/16JIANNAN WANG - CMPT 733 9

Course Plan Part I: Advanced Analytics with Spark 1.Introduction & MLlib Basics 2.Sentiment Analysis (Feature Representation + Linear Regression) 3.Sentiment Analysis (Optimization) 4.Anomaly Detection (Unsupervised Learning + Dimensionality Reduction) 5.Movie Recommendation (Collaborative Filtering) 6.Scala Preliminaries (Functional Programming) 7.Social Network Analysis (GraphX Basics+Personalized PageRank) Part II: Advanced Analytics with Other Tools 8.Small Data (Scikit-Learn) 9.Project Proposal Presentation 10.Deep Learning (caffee) 01/05/16JIANNAN WANG - CMPT Dr. Jiannan Wang Dr. Apala Guha

Course Setup Marking ◦Assignments: 9 x 8 = 72% ◦Project: (proposal + poster + report): 28% Lectures (1 hour/week) ◦Group A, B: Tuesday 9:30-10:20 Labs (5 hours/week) ◦Group A: Tues 10:30–12:20, Thurs 9:30–12:20 ◦Group B: Tues 2:30–4:20, Thurs 2:30–5:20 TAs ◦Zicun Cong ◦Syed Tanveer Jisahan 01/05/16JIANNAN WANG - CMPT