CSE 391: Special Topics: Probability & Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu.


CSE 391: Special Topics: Probability & Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu anshul.gandhi@stonybrook.edu

CSE 391 Probability & Statistics for Data Science
What is Data Science?
- Analysis of data (using several tools/techniques)
- Statistics/Data Analysis + CS

Who is a Data Scientist?
- Statistics/Data Analysis + CS
- Someone who is better at stats than the average CS person and better at CS than the average statistician.


Outline
- Logistics: course info, lectures, course webpage, office hours, grading
- Syllabus

Course Info
- New course (almost)
- Deals with probability and statistics for data science (DS): probability theory, statistical inference, DS techniques
- Undergraduate-level course
- Topics: Probability Theory, Random Variables, Statistical Inference, Hypothesis Testing, Regression Analysis

Course Info
- Prerequisites:
  - Probability and Statistics: recommended (not necessary)
  - Basic CS background
  - Python (not necessary, but will help)
- This is NOT a systems course
- More of a theory + algorithms course

Course Info
- Recommended texts:
- Software: Available from DoIT

Lectures
- Mon, Wed: 4:00pm – 5:20pm, Frey 217
- On echo (should be available)
- 5-min break at the halfway point
- Whiteboard + maybe slides
- Occasionally some programming (Python/MATLAB)
- Interactive (please)
- Carry a book, a real one!

Example 1: Simple stats
- X is a collection of 99 integers (positive and negative)
- Mean(X) > 0. How many elements of X are > 0? Answer: at least 1
- Same question, but now Median(X) > 0? Answer: at least 50
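The two bounds in Example 1 can be checked with a small Python sketch. The lists below are hypothetical extreme cases built for illustration (they are not from the lecture). One shows that a positive mean needs only a single positive element, the others that a positive median over 99 values needs at least 50.

```python
from statistics import mean, median


def count_positive(xs):
    """Number of strictly positive elements in xs."""
    return sum(1 for x in xs if x > 0)


# Hypothetical extreme case for the mean: one large positive value
# outweighs 98 negative values, so the mean is still positive.
mean_case = [1000] + [-1] * 98

# Hypothetical extreme case for the median: with 99 values the median is
# the 50th smallest, so it is positive only if at least 50 values are positive.
median_case = [-1000] * 49 + [1] * 50

# With only 49 positive values the median is negative, however large they are.
too_few = [-1] * 50 + [1000] * 49

print(mean(mean_case) > 0, count_positive(mean_case))
print(median(median_case) > 0, count_positive(median_case))
print(median(too_few) > 0, count_positive(too_few))
```

The mean can be pulled positive by a single outlier, while the median only looks at the middle position, which is why the two bounds differ so much.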

Course webpage
- www.cs.stonybrook.edu/~cse391 (will redirect)
- Please bookmark this page
- This is your best resource!
- Will be regularly updated


Course webpage
- Piazza (link on website)
- Blackboard for assignments, solutions, and grades

Office hours
- Just before or after class?
- CS 347
- Will revisit after add/drop date
- Will have OH today till 6:30pm
- TA and TA office hours: TBA

Example 2: Correlation v/s Causation
- Q1: Are A and B correlated?
[Figure: A and B]

Example 2: Correlation v/s Causation
- Q2: Which of the following is true?
  (i) A causes B
  (ii) B causes A
  (iii) Either (i) or (ii)
  (iv) None of the above
[Figure: A and B]

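Whatever the intended answer to Q2, a measured correlation alone cannot settle it. As a sketch with synthetic data (the hidden variable C, the noise levels, and the seed are assumptions for illustration, not the data on the slide), A and B below are each driven by C and never by each other, yet they come out strongly correlated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden factor C; A and B each depend on C plus independent noise.
c = [rng.gauss(0, 1) for _ in range(1000)]
a = [ci + rng.gauss(0, 0.5) for ci in c]
b = [ci + rng.gauss(0, 0.5) for ci in c]

r = pearson(a, b)
print(f"correlation(A, B) = {r:.2f}")  # high, although neither causes the other
```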

Grading (also on website)
- 50% assignments
  - 5-6 assignments. Expect 5-7 questions per assignment.
  - Later assignments will have more programming.
  - Build on material taught in class; more challenging.
- 40% exams (2 exams)
  - Similar to assignment questions, but shorter and simpler
  - Mid-term 1: 15%, Mid-term 2: 25%
- 10% class participation
- Exact percentages are somewhat tentative

Grading - assignments
- 50% assignments
- 5-6 assignments, 5-7 problems per assignment
- Collaboration is allowed (groups of at most 3 students)
  - One write-up per group
  - DO NOT COPY across groups
- Assignments due at the beginning of class
  - NO LATE SUBMISSIONS
  - Hard copies only (typed or handwritten)
- Some programming required for later assignments

Grading - exams
- 40% exams
- Mid-terms 1 and 2
  - Non-overlapping
  - Roughly mid-way and at the end of the semester
- Written exams
  - Closed-book exams
  - No programming questions
  - Somewhat easier than assignments
  - No collaborations, obviously
  - 75 mins

Grading - class participation
- Starts after add/drop date
- Contribute to class discussions
- Interactive
- Very helpful for bumping your grade if you are on the border

Grading
- 50% assignments
- 40% exams (two exams)
- 10% class participation
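As a rough sketch of how these weights combine, the snippet below computes a weighted course score from component percentages. The weights come from the grading breakdown (with the 40% for exams split as Mid-term 1: 15% and Mid-term 2: 25%), which the lecture describes as tentative. The function name, the dictionary keys, and the example scores are hypothetical.

```python
# Weights from the grading slides; the lecture notes they are tentative.
WEIGHTS = {
    "assignments": 0.50,
    "midterm1": 0.15,
    "midterm2": 0.25,
    "participation": 0.10,
}


def course_score(scores):
    """Weighted course score (0-100) from per-component percentages (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student, for illustration only.
example = {"assignments": 90, "midterm1": 80, "midterm2": 70, "participation": 100}
print(round(course_score(example), 1))
```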

Example 3: Simpson’s Paradox
- Developing Nation (A) and Developed Nation (B)
- A person who earns above-average income in A moves to B, where the same person earns below-average income
- Average income of A goes down
- Average income of B goes down
- Average income of A+B goes up!!

Example 3: Simpson’s Paradox
- Developing Nation (A): Person 1: 20K, Person X: 40K (earns above-average income in A)
- Developed Nation (B): Person 2: 100K, Person X: 80K (earns below-average income in B)
- Average income of A+B
  - Before: 160K/3 = 53.3K
  - After: 200K/3 = 66.7K
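The arithmetic in this example can be verified with a short sketch. It assumes, as the totals 160K/3 and 200K/3 imply, that Person X starts in A earning 40K and ends up in B earning 80K. The variable and function names are illustrative.

```python
from statistics import mean

# Incomes in thousands. Person X moves from A (40K) to B (80K).
before = {"A": [20, 40], "B": [100]}
after = {"A": [20], "B": [100, 80]}


def combined_mean(groups):
    """Mean income over everyone in all groups."""
    everyone = [income for incomes in groups.values() for income in incomes]
    return mean(everyone)


for nation in ("A", "B"):
    print(nation, mean(before[nation]), "->", mean(after[nation]))
print("A+B", round(combined_mean(before), 1), "->", round(combined_mean(after), 1))
```

Both per-nation averages fall (A from 30K to 20K, B from 100K to 90K) while the combined average rises, because the person who moved was above average in A but below average in B and also got a raise.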

Example 3: Simpson’s Paradox
- Since 2000, the median US wage has risen about 1% (adjusted).
- But over the same period, the median wage has decreased for each of the following groups: high school dropouts, high school graduates with no college education, people with some college education, and people with Bachelor’s or higher degrees.
- In other words, within every educational subgroup, the median wage is lower now than it was in 2000.
- How can both things be true?
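One possible way for both statements to hold, not stated on the slide, is a shift in group sizes: if more workers move into the higher-paid educational groups, the overall median can rise even while each group's median falls. The sketch below uses two made-up groups with invented wages (in thousands) purely to show the mechanism. None of the numbers are real wage data.

```python
from statistics import median

# Hypothetical wages in thousands: (group size) * [wage]. Not real data.
year_2000 = {"less education": [30] * 7, "more education": [70] * 3}
now = {"less education": [28] * 3, "more education": [65] * 7}


def overall_median(groups):
    """Median wage over all workers, ignoring group membership."""
    return median([w for wages in groups.values() for w in wages])


for group in year_2000:
    print(group, median(year_2000[group]), "->", median(now[group]))
print("overall", overall_median(year_2000), "->", overall_median(now))
```

Each group's median drops, but the workforce has shifted toward the better-paid group, so the overall median goes up.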

Syllabus www.cs.stonybrook.edu/~cse391

Next class
- Probability review - 1
  - Basics: sample space, outcomes, probability
  - Events: mutually exclusive, independent
  - Calculating probability: sets, counting, tree diagram