CSE 544, Spring 2018 Probability and Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu.

Slides:



Advertisements
Similar presentations
Transitioning to Semesters CSE MS Program Prof. Gagan Agrawal Grad Studies Chair.
Advertisements

CSE 5522: Survey of Artificial Intelligence II: Advanced Techniques Instructor: Alan Ritter TA: Fan Yang.
CSE 531: Performance Analysis of Systems Lecture 1: Intro and Logistics Anshul Gandhi 1307, CS building
IT 240 Intro to Desktop Databases Introduction. About this course Design a database: Entity Relation (ER) modeling and normalization techniques Create.
Note 1 of 5E Welcome to Pstat5E: Statistics with Economics and Business Applications Yuedong Wang Syllabus.
CSE 221: Probabilistic Analysis of Computer Systems Topics covered: Course outline and schedule Introduction Event Algebra (Sec )
Stat 217 – Week 10. Outline Exam 2 Lab 7 Questions on Chi-square, ANOVA, Regression  HW 7  Lab 8 Notes for Thursday’s lab Notes for final exam Notes.
CSE 221: Probabilistic Analysis of Computer Systems Topics covered: Course outline and schedule Introduction (Sec )
Psyc 235: Introduction to Statistics
Topic 1: Class Logistics. Outline Class Web site Class policies Overview References Software Background Reading.
CHEMISTRY 10123/10125 Spring 2007 Instructor: Professor Tracy Hanna Phone: Office: SWR 418
© 2004 Goodrich, Tamassia CS2210 Data Structures and Algorithms Lecture 1: Course Overview Instructor: Olga Veksler.
CS 450 MODELING AND SIMULATION Instructor: Dr. Xenia Mountrouidou (Dr. X)
Math 125 Statistics. About me  Nedjla Ougouag, PhD  Office: Room 702H  Ph: (312)   Homepage:
General information CSE : Probabilistic Analysis of Computer Systems
CS 103 Discrete Structures Lecture 01 Introduction to the Course
CSE 531: Performance Analysis of Systems Lecture 2: Probs & Stats review Anshul Gandhi 1307, CS building
Penn State University, School of Business Administration 10/1/20151 MRKT 472-MARKETING RESEARCH Dr. Ugur Yucelt School of Business Administration Spring.
UNIT 8: PROBABILITY 7 TH GRADE MATH MS. CARQUEVILLE.
Econ 111 Principles of Macroeconomics Welcome!. “Economics is common sense made difficult” As seen by an undergraduate student.
SE 2030 Software Engineering Tools and Practices SE 2030 Dr. Rob Hasker 1 Based on slides written by Dr. Mark L. Hornick Used with permission.
ICS104 Computer Programming Second Semester 2012/2013 ICS1041 Tuwailaa Alshammari College of Computer Science & Engineering University.
1 WELCOME TO COMPUTER SCIENCE 1027b COMPUTER SCIENCE FUNDAMENTALS II Lecturers: Eric Schost (001) John Barron (002)
Statistics and Quantitative Analysis U4320
1 Course Overview Dr. Jerrell T. Stracener, SAE Fellow UPDATED 08/26/08 EMIS 7370/5370 STAT 5340 Probability and Statistics for Scientists and Engineers.
Quantitative Methods in Geography Geography 391. Introductions and Questions What (and when) was the last math class you had? Have you had statistics.
CSE 691: Energy-Efficient Computing Lecture 1: Intro and Logistics Anshul Gandhi 1307, CS building
Principles of Computer Science I Honors Section Note Set 1 CSE 1341 – H 1.
Econ 110 Principles of Microeconomics Welcome!. Dr. Anwar Al-Shriaan Economics Department Office hours: Monday and Wednesday 10:00 – 10:50 am and by appt.
1 Daily Announcements CS 202, Spring 2007 Aaron Bloomfield.
CS Welcome to CS 5383, Topics in Software Assurance, Toward Zero-defect Programming Spring 2007.
1 CS 320 Interaction Design Spring 2011 Course Syllabus January19, 2011.
Introduction to Statistics I MATH 1131, Summer I 2008, Department of Math. & Stat., York University.
1 STAT 3080/APMA 3501 From Data to Knowledge Fall 2014 Malathi Veeraraghavan Professor, Electrical & Computer Engineering Dept. Course web site:
Penn State University, School of Business Administration 1/21/20161 MRKT 472-MARKETING RESEARCH Dr. Ugur Yucelt School of Business Administration Spring.
1 CS 4396 Computer Networks Lab General Info. 2 Goal: This course aims at helping students get more insight into how the Internet works and gain hands.
Computer Networks CNT5106C
Final Exam Information These slides and more detailed information will be posted on the webpage later…
CSE MS Program Prof. Gagan Agrawal Grad Studies Chair.
Just one quick favor… Please use your phone or laptop Please take just a minute to complete Course Evaluations online….. Check your for a link or.
CSc 120 Introduction to Computer Programing II
Lecture 00: Introduction
PhD at CSE: Overview CSE department offers Doctoral degree in the Computer Science (CS) or Computer Engineering areas (CpE) at both MS to PhD and BS to.
Welcome to CS 4390/CS5381: Introduction to Formal Methods
CSE 390: Special Topics: Probability & Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building
CSE 544 Probability and Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building
Probabilistic Analysis of Computer Systems
CS101 Computer Programming I
Course Information and Introductions
STATISTICS FOR SCIENCE RESEARCH
How I Survived AIST2330 and Learned to Love Server Admin
Data Structures Algorithms: (Slides to be Adopted from Goodrich and aligned with Weiss' book) Instructor: Ganesh Ramakrishnan
Autonomous Cyber-Physical Systems: Course Introduction
Cpt S 471/571: Computational Genomics
Course Overview - Database Systems
TM 605: Probability for Telecom Managers
Statistics and Data Analysis
CS210 Intermediate Programming with Data Structures
Instructor: Anshul Gandhi (CS)
Lecture 00: Introduction
ADVANCED PLACEMENT CALCULUS
CSE 391: Special Topics: Probability & Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building
Lecture 1a- Introduction
STAT 400 Probability and Statistics 1
Computer Networks CNT5106C
Dr. David Matuszek Spring, 2003
Lecture 1a- Introduction
TaeKyoung Kwon Engineering Math II TaeKyoung Kwon
Class administration 10/11/05.
Presentation transcript:

CSE 544, Spring 2018 Probability and Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu anshul.gandhi@stonybrook.edu

CSE 544, Spring 2018 Probability and Statistics for Data Science What is Data Science? Analysis of data (using several tools/techniques) Statistics/Data Analysis + CS

CSE 544, Spring 2018 Probability and Statistics for Data Science Who is a Data Scientist Statistics/Data Analysis + CS Someone who is better at stats than the average CS person and someone who is better at CS than an average statistician.

Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu anshul.gandhi@stonybrook.edu

Outline Logistics Syllabus Course info Lectures Course webpage Office hours Grading Syllabus

Course Info New course (almost) Deals with probs and stats for DS Probability theory Statistical inference DS techniques MS and PhD level course Probability Theory Random Variables Stochastic Processes Statistical Inference Hypothesis Testing Regression and Time Series Analysis

Course Info Prerequisites: This is NOT a systems course Probability and Statistics Would help (not necessary) Basic CS background We will use Python This is NOT a systems course More of a theory + algorithms course

Course Info Software: Available from DoIT REQUIRED All of Statistics (AoS) RECOMMENDED Perf Modeling (MHB) DS Design Manual (DSD) Software: Available from DoIT

Course webpage www.cs.stonybrook.edu/~cse544 (will redirect) Please bookmark this page This is your best resource! Will be regularly updated

Course webpage www.cs.stonybrook.edu/~cse544

Course webpage Piazza?? Blackboard for grades

Example 1: Simple stats X is a collection of 99 integers (positive and negative) Mean(X) > 0 How many elements of X are > 0? Same question but now Median(X) > 0? At least 1 (could be 99) At least 50 (could be 99)

Lectures Mon Wed: 2:30pm – 3:50pm Humanities 1003 5-min break at the halfway point Slides + annotations Occasionally some programming (Python) Posted on website after class May have cancellations due to weather or unavailability Will be emailed and updated on website

In-class Interactive (please) Carry a book, a real one! Please mute your phones

Office hours Mon Wed 4-5pm?? (right after class) NCS 347 Tentative Will re-visit after add/drop date TA and TA Office hours: TBA

Example 2: Correlation v/s Causation Q1: Are A and B correlated? A B

Example 2: Correlation v/s Causation Q2: Which of the following is true (i) A causes B (ii) B causes A (iii) Either (i) or (ii) (iv) None of the above A B

Example 2: Correlation v/s Causation Q2: Which of the following is true (i) A causes B (ii) B causes A (iii) Either (i) or (ii) (iv) None of the above A B

Example 2: Correlation v/s Causation

Grading 30% assignments 30% exams (in-class mid-terms) 30% Group project 10% class participation Some parts are tentative!

Grading - assignments 30% assignments 5 assignments 5-8 problems per assignment Later assignments will have more programming Collaboration is allowed (groups of 3-5 students) One write-up per group. DO NOT COPY! Assignments due at the beginning of class NO LATE SUBMISSIONS Hard-copies only (typed/hand-written)

Grading - exams 30% exams Mid-terms 1 and 2 In-class exams 12% mid-term 1 (probs & stats), Feb end/early March 18% mid-term 2 (inference), mid-April Non-overlapping In-class exams Somewhat easier than assignments No collaborations, obviously Closed-book, closed-notes 75 mins

Grading – group project Data analysis project Programming involved Groups of 4-8 (need not be same as assignment groups) In the 2nd half of the semester Will discuss details as we go along

Grading – class participation Starts after add/drop date Contribute to class discussions Interactive Very helpful for bumping your grade

Grading - recap 30% assignments 30% exams (in-class mid-terms) 30% group project 10% class participation Will provide mid-sem grades (after A1, A2, M1) For self-evaluation purposes only

Average income of (A+B) goes up!! Example 3: Simpson’s Paradox   Earns below-average income in B Earns above-average income in A Developed Nation (B) Developing Nation (A) Average income of (A+B) goes up!! Average income of B goes down  Average income of A goes down 

Average income of (A+B) Before: 160K/3 = 53.3K Example 3: Simpson’s Paradox Earns below-average income in B Earns above-average income in A Developed Nation (B) Developing Nation (A) Average income of (A+B) Before: 160K/3 = 53.3K After: 200K/3 = 66.7K Person 2: 100K Person X: 80K Person 1: 20K Person X: 40K

Example 3: Simpson’s Paradox Since 2000, the median US wage has risen about 1% (adjusted) But over the same period, the median wage for: high school dropouts, high school graduates with no college education, people with some college education, and people with Bachelor’s or higher degrees have all decreased. In other words, within every educational subgroup, the median wage is lower now than it was in 2000. How can both things be true??

Syllabus

Next class Probability review - 1 Basics: sample space, outcomes, probability Events: mutually exclusive, independent Calculating probability: sets, counting, tree diagram