Instructor: Anshul Gandhi (CS)

Slides:



Advertisements
Similar presentations
CSE 5522: Survey of Artificial Intelligence II: Advanced Techniques Instructor: Alan Ritter TA: Fan Yang.
Advertisements

CSE 531: Performance Analysis of Systems Lecture 1: Intro and Logistics Anshul Gandhi 1307, CS building
IT 240 Intro to Desktop Databases Introduction. About this course Design a database: Entity Relation (ER) modeling and normalization techniques Create.
Note 1 of 5E Welcome to Pstat5E: Statistics with Economics and Business Applications Yuedong Wang Syllabus.
General information CSE 230 : Introduction to Software Engineering
Programme in Statistics (Courses and Contents). Elementary Probability and Statistics (I) 3(2+1)Stat. 101 College of Science, Computer Science, Education.
CSE 221: Probabilistic Analysis of Computer Systems Topics covered: Course outline and schedule Introduction Event Algebra (Sec )
CSE 322: Software Reliability Engineering Topics covered: Course outline and schedule Introduction, Motivation and Basic Concepts.
CSE 221: Probabilistic Analysis of Computer Systems Topics covered: Course outline and schedule Introduction (Sec )
PROBABILITY AND STATISTICS FOR ENGINEERS Session 1 Dr Abdelaziz Berrado MTH3301 —Fall 09.
Lecture 0 Introduction. Course Information Your instructor: – Hyunseung (pronounced Hun-Sung) – Or HK (not Hong Kong ) –
Topic 1: Class Logistics. Outline Class Web site Class policies Overview References Software Background Reading.
Math 125 Statistics. About me  Nedjla Ougouag, PhD  Office: Room 702H  Ph: (312)   Homepage:
General information CSE : Probabilistic Analysis of Computer Systems
CS 103 Discrete Structures Lecture 01 Introduction to the Course
Lecture 0 Course Overview. ES 345/485 Engineering Probability Course description: Probability and its axioms, conditional probability, sequential experiments,
CSE 531: Performance Analysis of Systems Lecture 2: Probs & Stats review Anshul Gandhi 1307, CS building
2 September Statistics for Behavioral Scientists Psychology W1610x.
ELEC 4030E-4Z01/COMM 6008E-6001 Random Process 隨機程序 2010 Fall Instructor: Hsiao-Ping Tsai Office: EE711 Phone:
Prof. Barbara Bernal NEW Office in J 126 Office Hours: M 4pm - 5:30 PM Class Lecture: M 6 PM - 8:30 in J133 Weekly Web Lecture between Tuesday to Sunday.
ICS 6B Boolean Logic and Algebra Fall 2015
Part 0 -- Introduction Statistical Inference and Regression Analysis: Stat-GB , C Professor William Greene Stern School of Business IOMS.
1 WELCOME TO COMPUTER SCIENCE 1027b COMPUTER SCIENCE FUNDAMENTALS II Lecturers: Eric Schost (001) John Barron (002)
Quantitative Methods in Geography Geography 391. Introductions and Questions What (and when) was the last math class you had? Have you had statistics.
Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.
Instructor: Shengyu Zhang. First week Part I: About the course Part II: About algorithms and complexity  What are algorithms?  Growth of functions 
EAS31116/B9036: Statistics in Earth & Atmospheric Sciences Lecture 3: Probability Distributions (cont’d) Instructor: Prof. Johnny Luo
Statistics: Unlocking the Power of Data Lock 5 Exam 2 Review STAT 101 Dr. Kari Lock Morgan 11/13/12 Review of Chapters 5-9.
ICS202 Data Structures King Fahd University of Petroleum & Minerals College of Computer Science & Engineering Information & Computer Science Department.
Math 4030 Midterm Exam Review. General Info: Wed. Oct. 26, Lecture Hours & Rooms Duration: 80 min. Close-book 1 page formula sheet (both sides can be.
The generalization of Bayes for continuous densities is that we have some density f(y|  ) where y and  are vectors of data and parameters with  being.
James Tam Introduction To CPSC 233 James Tam Java Object-Orientation Graphical-user interfaces.
Stats Term Test 4 Solutions. c) d) An alternative solution is to use the probability mass function and.
Final Exam Information These slides and more detailed information will be posted on the webpage later…
Statistics and probability Dr. Khaled Ismael Almghari Phone No:
Joan Donohue University of South Carolina
CSc 120 Introduction to Computer Programing II
Student Success in General Education
All important information will be posted on Blackboard
CSE 489/589 Modern Networking Concepts
Welcome to CS 4390/CS5381: Introduction to Formal Methods
CSE 390: Special Topics: Probability & Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building
CSE 544 Probability and Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building
Probabilistic Analysis of Computer Systems
COMP 283 Discrete Structures
Power Electronics Conversion 2
Theory and Practice of Web Technology
Fall 2016 MATH 250: Calculus III.
CS5040: Data Structures and Algorithms
Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.
Syllabus Overview CSE 4309 – Machine Learning Vassilis Athitsos
CS598CXZ (CS510) Advanced Topics in Information Retrieval (Fall 2016)
Syllabus Overview CSE 6363 – Machine Learning Vassilis Athitsos
Data Structures Algorithms: (Slides to be Adopted from Goodrich and aligned with Weiss' book) Instructor: Ganesh Ramakrishnan
September 27 – Course introductions; Adts; Stacks and Queues
CSE 544, Spring 2018 Probability and Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building
Simple Linear Regression
CS 6021 Advanced Computer Architecture
CSE3461/5461: Computer Networking (Internet Technologies)
Logistics Instructor: Mohamed Eltabakh
Lecture 00: Introduction
ADVANCED PLACEMENT CALCULUS
CSE 391: Special Topics: Probability & Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building
Lecture 1a- Introduction
STAT 400 Probability and Statistics 1
CS 250, Discrete Structures, Fall 2014 Nitesh Saxena
CS Computer Science II: Data Structures and Abstraction Fall 2009
CS 474/674 – Image Processing Fall Prof. Bebis.
MA Fall Instructor: Tim Rolling -Office: MATH 719 -
MA Fall 2018 Instructor: Hunter Simper Office: Math 607
Presentation transcript:

Instructor: Anshul Gandhi (CS) CSE 544, Fall 2018 Probability and Statistics for Data Science Lecture 1: Intro and Logistics Instructor: Anshul Gandhi (CS)

CSE 544, Fall 2018 Probability and Statistics for Data Science What is Data Science? Analysis of data (using several tools/techniques) Statistics/Data Analysis + CS

CSE 544, Fall 2018 Probability and Statistics for Data Science Who is a Data Scientist? Someone who is better at stats than the average CS person and someone who is better at CS than an average statistician.

Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu anshul.gandhi@stonybrook.edu

Outline Logistics Syllabus Course info Lectures Course webpage Office hours Grading Syllabus

Course Info New course (almost) Looks at probs and stats topics from a DS perspective Probability theory Statistical inference DS techniques MS and PhD level course Probability Theory Random Variables Stochastic Processes Statistical Inference Hypothesis Testing Regression and Time Series Analysis

Course Info Prerequisites: This is NOT a systems course Probability and Statistics Will greatly help (not absolutely necessary) Basic CS background We will use Python This is NOT a systems course More of a theory + algorithms course

Course Info Software: Available from DoIT REQUIRED All of Statistics (AoS) RECOMMENDED Perf Modeling (MHB) DS Design Manual (DSD) Software: Available from DoIT

Course webpage www.cs.stonybrook.edu/~cse544 (will redirect) Please bookmark this page This is your best resource! Will be regularly updated

Course webpage www.cs.stonybrook.edu/~cse544

Course webpage Piazza, soon (after enrollment settles) Blackboard for grades

Example 1: Simple stats X is a collection of 99 integers (positive and negative) Mean(X) > 0 How many elements of X are > 0? Same question but now Median(X) > 0?

Lectures Mon Wed: 4:00pm – 5:20pm Javits 111 5-min break at the halfway point Slides + annotations Occasionally some programming (Python) Posted on website after class May have cancellations due to weather or unavailability Will be emailed and updated on website Weather-related class cancelations decided by SBU

In-class Interactive (please) Carry a book, a real one! Please mute your phones No audio/video recording allowed I am working on recording lectures, more on that later Attendance is not mandatory but strongly encouraged If you are present and active in class, I will notice If you are absent, I will notice If you are present and inactive, I will notice that too! May help to bump your grade if you are on the border

Office hours Mon Wed 5:30-6:30pm?? (right after class) NCS 347 Tentative Will re-visit after add/drop date TA and TA Office hours: TBA

Example 2: Correlation v/s Causation Q1: Are A and B correlated? A B

Example 2: Correlation v/s Causation Q2: Which of the following is true (i) A causes B (ii) B causes A (iii) Either (i) or (ii) (iv) None of the above A B

Example 2: Correlation v/s Causation Q2: Which of the following is true (i) A causes B (ii) B causes A (iii) Either (i) or (ii) (iv) None of the above A B

Example 2: Correlation v/s Causation

Grading 50% assignments 40% exams (in-class mid-terms) 10% group mini-project Some parts are tentative!

Grading - assignments 50% assignments 5 assignments, 10% each (once every 2 weeks) 5-8 problems per assignment Later assignments will have more programming Collaboration is allowed (groups of 3-5 students) One write-up per group DO NOT COPY OR DISCUSS ACROSS GROUPS! Try not to assign one problem per team member If a group member is inactive, let me know asap

Grading - assignments Assignment questions will be based on lectures But tougher than examples done in class Will require some effort, helps to discuss among group Assignments due at the beginning of class NO LATE SUBMISSIONS Hard-copies only (typed/hand-written)

Grading - exams 40% exams Mid-terms 1 and 2 In-class exams 15% mid-term 1 (probs & stats), Sept end/early Oct 25% mid-term 2 (inference), mid-Nov Non-overlapping In-class exams Somewhat easier than assignments Based on material/examples covered in lectures (attend!) No collaborations, obviously Closed-book, closed-notes 75 mins

Grading – group mini-project Basically, assignment 6, due at end of semester Data analysis project Programming involved Same as assignment group Can start early, in the 2nd half of the semester Will discuss details as we go along

Grading - recap 50% assignments 40% exams (in-class mid-terms) 10% group mini-project Will provide mid-sem grades (after A1, A2, M1) For self-evaluation purposes only

Average income of (A+B) goes up!! Example 3: Simpson’s Paradox   Earns below-average income in B Earns above-average income in A Developed Nation (B) Developing Nation (A) Average income of (A+B) goes up!! Average income of B goes down  Average income of A goes down 

Average income of (A+B) Before: 160K/3 = 53.3K Example 3: Simpson’s Paradox Earns below-average income in B Earns above-average income in A Developed Nation (B) Developing Nation (A) Average income of (A+B) Before: 160K/3 = 53.3K After: 200K/3 = 66.7K Person 2: 100K Person X: 80K Person 1: 20K Person X: 40K

Example 3: Simpson’s Paradox Since 2000, the median US wage has risen about 1% (adjusted) But over the same period, the median wage for: high school dropouts, high school graduates with no college education, people with some college education, and people with Bachelor’s or higher degrees have all decreased. In other words, within every educational subgroup, the median wage is lower now than it was in 2000. How can both things be true??

Syllabus Probability Theory (8-10 lectures, 2 assignments) Probability review (events, computing probability, conditional prob., Bayes’ thm.) Random variables (Geometric, Exponential, Normal, expectation, moments, etc.) Probability inequalities (Markov’s, Chebyshev’s, Central Limit thm., etc.) Markov chains (stochastic processes, balance equations, etc.) MID-TERM 1 (Early October) Statistical Inference (8-10 lectures , 2-3 assignments) Non-parametric inference (empirical PDF, bias, kernel density, plug-in estimator) Confidence intervals (percentiles, Normal-based CIs) Parametric inference (method of moments, max likelihood estimator) Hypothesis testing (Wald’s test, t-test, KS test, p-values, permutation test) Bayesian inference (Bayesian reasoning, inference, etc.) MID-TERM 2 (Mid-November) Data Science Models (3-5 lectures, 1 assignment) Regression (simple LR, multiple LR, non-linear regression) Time series analysis (moving average, EWMA, AR, ARMA, ARIMA) MINI-PROJECT (Due early December)

Next class Probability review - 1 Basics: sample space, outcomes, probability Events: mutually exclusive, independent Calculating probability: sets, counting, tree diagram