CSE 390: Special Topics: Probability & Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu.

CSE 390: Special Topics: Probability & Statistics for Data Science Lecture 1: Intro and Logistics Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu anshul.gandhi@stonybrook.edu

CSE 390 Probability & Statistics for Data Science
What is Data Science?
Analysis of data (using several tools/techniques)
Statistics/Data Analysis + CS

CSE 390 Probability & Statistics for Data Science
Who is a Data Scientist?
Statistics/Data Analysis + CS
Someone who is better at statistics than the average CS person, and better at CS than the average statistician.

Anshul Gandhi 347, New CS building anshul@cs.stonybrook.edu anshul.gandhi@stonybrook.edu

Outline
Logistics: Course info, Lectures, Course webpage, Office hours, Grading
Syllabus

Course Info
New course (almost)
Deals with probability and statistics for data science: probability theory, statistical inference, DS techniques
Contents: Probability Theory, Random Variables, Statistical Inference, Hypothesis Testing, Regression Analysis

Course Info
Prerequisites: Probability and Statistics (recommended, not necessary); basic CS background; Python (not necessary, but will help)
This is NOT a systems course. It is more of a theory + algorithms course.

Course Info Recommended texts: Software: Available from DoIT

Lectures
Tue/Thu: 4:00pm-5:20pm, Frey 205
5-min break at the halfway point
Whiteboard + maybe slides
Occasionally some programming (Python/MATLAB)
Interactive (please)
Carry a book, a real one!

Example 1: Simple stats
X is a collection of 99 integers (positive and negative) with Mean(X) > 0. How many elements of X are > 0?
Same question, but now with Median(X) > 0?
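One quick way to explore both questions is to build extreme cases. The Python sketch below uses made-up data sets, and the specific values are illustrative assumptions rather than anything from the lecture. With 99 integers, Mean(X) > 0 only guarantees that at least one element is positive. Median(X) > 0 forces at least 50 positive elements, because the median is the 50th smallest value.

```python
import random
from statistics import mean, median

def count_positive(xs):
    """Number of strictly positive elements in xs."""
    return sum(1 for v in xs if v > 0)

# Hypothetical data: 98 copies of -1 plus a single 99 (99 integers).
# The sum is 1, so the mean is 1/99 > 0, yet only one element is positive.
mean_case = [-1] * 98 + [99]
print(len(mean_case), mean(mean_case), count_positive(mean_case))

# Hypothetical data: 49 large negatives and 50 ones.
# The median (50th smallest of 99) is 1 > 0, so the 50 values from the
# median upward are all positive. Note that the mean here is negative.
median_case = [-1000] * 49 + [1] * 50
print(median(median_case), mean(median_case), count_positive(median_case))

# Random spot check of the median bound (an illustration, not a proof).
random.seed(0)
for _ in range(10_000):
    xs = [random.randint(-50, 50) for _ in range(99)]
    if median(xs) > 0:
        assert count_positive(xs) >= 50
```

The second data set also shows that a positive median says nothing about the sign of the mean.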

Course webpage
www.cs.stonybrook.edu/~cse390 (will redirect)
Please bookmark this page
This is your best resource! Will be regularly updated

Course webpage www.cs.stonybrook.edu/~cse390

Course webpage
Piazza??
Blackboard for grades

Office hours
Just before or after class? CS 347
Will revisit after the add/drop date
TA and TA office hours: TBA

Example 2: Correlation v/s Causation
Q1: Are A and B correlated?
[Figure: A and B]

Example 2: Correlation v/s Causation
Q2: Which of the following is true?
(i) A causes B
(ii) B causes A
(iii) Either (i) or (ii)
(iv) None of the above
[Figure: A and B]

Example 2: Correlation v/s Causation
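A hidden common cause is one way a strong correlation can coexist with option (iv). The Python sketch below simulates a hypothetical confounder C that drives both A and B. Neither A nor B appears in the other's formula, yet the two end up strongly correlated. The variable names, noise levels, and sample size are assumptions chosen for illustration. The sketch is not a reconstruction of the slide's figure.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(1)
# Hypothetical confounder C drives both A and B.
c = [random.gauss(0, 1) for _ in range(1000)]
a = [ci + random.gauss(0, 0.3) for ci in c]  # A depends only on C (plus noise)
b = [ci + random.gauss(0, 0.3) for ci in c]  # B depends only on C (plus noise)

r = pearson(a, b)
print(f"corr(A, B) = {r:.2f}")  # strongly positive, yet A does not cause B
```

In this setup the theoretical correlation is 1/1.09, about 0.92, so the sample value should land near that.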

Grading
48% assignments
6 assignments; expect 5-6 questions per assignment
Later assignments will have more programming
42% exams (2 in-class mid-terms)
Similar to assignment questions, but shorter and simpler
Mid-term 1: 20%, Mid-term 2: 22%
10% class participation
Exact percentages are tentative!
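As a rough illustration, the tentative weights can be combined into a single weighted score. This sketch makes three assumptions. Each component is scored out of 100. The 20% + 22% mid-term split makes up the 42% exam share. The student scores are hypothetical. The real weights may change, as the slide warns.

```python
import math

# Tentative weights from the grading slide (subject to change).
WEIGHTS = {
    "assignments": 0.48,
    "midterm1": 0.20,
    "midterm2": 0.22,
    "participation": 0.10,
}
# The weights should account for the whole grade.
assert math.isclose(sum(WEIGHTS.values()), 1.0)

def course_score(scores):
    """Weighted course score; each component score is on a 0-100 scale."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Hypothetical student's component scores.
example = {"assignments": 90, "midterm1": 80, "midterm2": 85, "participation": 100}
print(round(course_score(example), 2))  # 87.9
```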

Grading - assignments
48% assignments
6 assignments, 5-6 problems per assignment
Collaboration is allowed (groups of 3-4 students)
One write-up per group. DO NOT COPY!
Assignments due at the beginning of class
NO LATE SUBMISSIONS
Hard copies only (typed/hand-written)
Some programming (Python/MATLAB) required

Grading - exams
42% exams
Mid-terms 1 and 2: in-class, non-overlapping, at 1/3 and 2/3 of the semester
Closed-book exams, 80 mins
No programming questions
Somewhat easier than assignments
No collaboration, obviously

Grading - class participation
Starts after the add/drop date
Contribute to class discussions
Interactive
Very helpful for bumping your grade if you are on the border

Grading
48% assignments
42% exams (in-class mid-terms)
10% class participation

Example 3: Simpson's Paradox
Person X earns an above-average income in Developing Nation (A), then moves to Developed Nation (B), where X earns a below-average income.
Average income of A goes down
Average income of B goes down
Average income of A+B goes up!!

Example 3: Simpson's Paradox
Developing Nation (A): Person 1: 20K, Person X: 40K (above-average income in A)
Developed Nation (B): Person 2: 100K, Person X: 80K after moving (below-average income in B)
Average income of A: 30K before, 20K after
Average income of B: 100K before, 90K after
Average income of A+B before: 160K/3 = 53.3K
Average income of A+B after: 200K/3 = 66.7K
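The averages on this slide can be checked directly. The sketch below encodes the slide's incomes (in thousands) and recomputes each average. Nation A's average falls from 30K to 20K and nation B's falls from 100K to 90K. Meanwhile, the average over everyone in A+B rises from 53.3K to 66.7K. The dictionary layout and the function name `averages` are illustrative choices.

```python
from statistics import mean

# Incomes in thousands (K), from the slide's example.
before = {"A": {"Person 1": 20, "Person X": 40}, "B": {"Person 2": 100}}
after = {"A": {"Person 1": 20}, "B": {"Person 2": 100, "Person X": 80}}

def averages(nations):
    """Return per-nation mean incomes and the mean over everyone in A+B."""
    per_nation = {name: mean(list(people.values())) for name, people in nations.items()}
    everyone = [inc for people in nations.values() for inc in people.values()]
    return per_nation, mean(everyone)

for label, state in (("before", before), ("after", after)):
    per_nation, overall = averages(state)
    print(label, per_nation, round(overall, 1))
```

Both per-nation averages drop because X lowers the average of each group X belongs to. The combined average still rises because X's own income doubles.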

Example 3: Simpson's Paradox
Since 2000, the median US wage has risen by about 1% (adjusted). But over the same period, the median wage has decreased for every educational group: high school dropouts, high school graduates with no college education, people with some college education, and people with a Bachelor's or higher degree. In other words, within every educational subgroup, the median wage is lower now than it was in 2000. How can both things be true??

Syllabus www.cs.stonybrook.edu/~cse390

Next class: Probability review - 1
Basics: sample space, outcomes, probability
Events: mutually exclusive, independent
Calculating probability: sets, counting, tree diagram