Feedback – Lab 2 9 Sept 2014. Your learning experience in this course.


Feedback – Lab 2 9 Sept 2014

Your learning experience in this course

Lab Sessions: Text Comprehension & Task Interpretation
(Always: point out inaccuracies)
Use case 1: "I do not understand the text."
- Go back to the video lecture; you have probably not yet built the background context required for completing the task.

… Continued
Use case 2: "Oh my god, what am I supposed to do here?"
Read the text several times and identify the key points in the text structure:
- Description
- The purpose
- Tasks
  - Pre-processing: feature transformation
  - Identify the best features by applying your knowledge about empirical error
  - Interpret the results based on your knowledge about empirical error and your common-sense knowledge or historical research

My expectations on your learning experience
- Students should be able to interpret the text and the tasks (diversified interpretations are allowed and welcome).
- Students should be able to show critical thinking by working out one or more plausible interpretations and motivating their choice(s).

About instructions and time…
I am not sure that the instructions were unclear. The core task is representing the data as bin0 and bin1 in order to apply the formulae. This was the cognitive effort of this lab. You could work in groups, groups could exchange information with each other, and you had several hours… And you made it!

Pre-processing: feature transformation
Categorical features → binary features
- Each feature should take the value 0 or the value 1, following the instructions under the heading "Preprocessing" (search & replace, IF formulae, whatever works…).
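As a rough illustration of this transformation step, here is a minimal Python sketch. The feature names, the example rows, and above all the mapping rules in `BINARY_RULES` are assumptions made up for the example; the rules that actually apply are the ones given under the lab's "Preprocessing" heading.

```python
# Hypothetical mapping rules (assumed for illustration only).
# Each rule turns one categorical value into 0 or 1.
BINARY_RULES = {
    "sex": lambda v: 1 if v == "female" else 0,
    "pclass": lambda v: 1 if v == 1 else 0,
    "embarked": lambda v: 1 if v == "S" else 0,
}


def binarize(row):
    """Return a copy of row where every feature with a rule is mapped to 0/1."""
    return {
        name: BINARY_RULES[name](value) if name in BINARY_RULES else value
        for name, value in row.items()
    }


# Two made-up passengers; the label column "survived" is left untouched.
rows = [
    {"survived": 1, "sex": "female", "pclass": 1, "embarked": "C"},
    {"survived": 0, "sex": "male", "pclass": 3, "embarked": "S"},
]
binary_rows = [binarize(r) for r in rows]
```

Keeping the rules in one table makes it easy to check that every feature ends up with only the values 0 and 1 before any counting starts.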

The task was about empirical error (Lect 6, min 7:44)
Empirical error: how well the chosen hypothesis classifies the training data.
How do you assess a hypothesis?
- Systematically count the correct guesses and the wrong guesses made by the hypothesis with respect to the correct labels.
- This means that you must compare the predictions of the hypothesis with the actual labels.
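The counting described above can be sketched in a few lines of Python. The function name `empirical_error` and the example lists are assumptions chosen for illustration, not data from the lab.

```python
def empirical_error(predictions, labels):
    """Fraction of training examples where the prediction differs from the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    # Compare each prediction with the actual label and count the wrong guesses.
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


# Made-up example: 1 = survived, 0 = died.
labels = [1, 0, 0, 1, 0]
predictions = [1, 0, 1, 1, 1]
error = empirical_error(predictions, labels)  # 2 wrong guesses out of 5
accuracy = 1 - error
```

Accuracy and error are two views of the same count: the correct guesses divided by the total, and the wrong guesses divided by the total.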

Lab Task
Our hypotheses were the different features. We have to assess each feature with respect to classification (survived vs died).

1) For each feature, calculate the empirical error
LEARN TO PREDICT THE FIRST COLUMN
(a) For each of the features, calculate (and write down) the training error if you used only that feature to classify the data. To do this you will need to do the following for each feature:
- Split the data based on that feature. Call bin0 all examples that have 0 for that feature and bin1 all examples that have 1 for that feature.
- Calculate the majority count for the label in each bin, i.e. for bin0, majority(bin0) = max(count(bin0 = survive), count(bin0 = notsurvive))
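The bin0/bin1 procedure can be sketched as follows. The slide stops at the majority count; the last step used here (every example outside its bin's majority counts as one error, so the error is one minus the summed majority counts over the number of examples) is the usual way to finish the calculation and is stated as an assumption. The function name and the data are made up for illustration.

```python
from collections import Counter


def single_feature_error(feature_values, labels):
    """Training error when classifying with one binary feature only."""
    # bin0 holds the labels of examples with feature value 0, bin1 those with 1.
    bins = {0: Counter(), 1: Counter()}
    for x, y in zip(feature_values, labels):
        bins[x][y] += 1
    # majority(bin) = max(count(survive), count(notsurvive)); an empty bin gives 0.
    majority_total = sum(max(c.values(), default=0) for c in bins.values())
    # Assumed final step: examples outside the majority of their bin are errors.
    return 1 - majority_total / len(labels)


# Made-up data: feature values and labels (1 = survived, 0 = died).
feature = [0, 0, 0, 1, 1, 1, 1, 0]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
err = single_feature_error(feature, labels)
```

In this toy example bin0 contains three "died" and one "survived", and bin1 contains three "survived" and one "died", so six of the eight examples fall in a majority and the error is 2/8.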

Accuracy/Error
A possible representation…
WATCH OUT! THE AGE FEATURE IS TRICKY HERE!

Other representations (etc. etc.)

Which feature would be best to use?
EMBARKED… if we trust this sample and our calculations (the error rate on this feature is the lowest).
Basically this means that many of those who started their trip from Southampton did not survive. However, the difference between the features was very small!
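Choosing the "best" feature amounts to sorting the features by their single-feature training error. The sketch below shows this on a tiny made-up table; the feature names `feature_a` and `feature_b`, the values, and the helper names are all hypothetical and are not the lab's Titanic results.

```python
from collections import Counter


def single_feature_error(values, labels):
    """Error of predicting the majority label inside each bin (0 and 1)."""
    bins = {0: Counter(), 1: Counter()}
    for x, y in zip(values, labels):
        bins[x][y] += 1
    majority_total = sum(max(c.values(), default=0) for c in bins.values())
    return 1 - majority_total / len(labels)


def rank_features(table, labels):
    """Return (feature name, error) pairs sorted from lowest to highest error."""
    errors = {name: single_feature_error(col, labels) for name, col in table.items()}
    return sorted(errors.items(), key=lambda item: item[1])


# Made-up binary features and labels (1 = survived, 0 = died).
labels = [1, 0, 0, 1, 0, 0]
table = {
    "feature_a": [1, 0, 0, 1, 0, 1],
    "feature_b": [1, 1, 0, 0, 0, 1],
}
ranking = rank_features(table, labels)
best_name, best_error = ranking[0]
```

When the errors in such a ranking are close to each other, as they were in the lab, the ordering on a small sample should be read with caution rather than as proof that the top feature really matters.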

Many interesting interpretations!
No one believed that Embarked was really a good feature:
- "this could depend on the small dataset"
- "embarked feature gave the lowest error […] Intuitively the first class feature should have the strongest relationship with the chance of surviving"
- "If we calculate accuracy with more features […], we get more interesting results"
- "The Embarked would be the best to use because it has the lowest error rate. In reality it is very unlikely that the city has any correlation with their chance of survival, unless they received some special training before boarding or shared a rough upbringing in the city"
- Etc.

Missing values
Good that you noticed that there were missing values, i.e. cells without any value!
- Some of you removed them
- Some of you converted them to >25
In practice, missing values require "more investigation".
Missing values are not considered to be "noise" in the sense that was explained during the video lecture.
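A first step of that "more investigation" is simply to count how many cells are empty in each column before deciding whether to remove or convert anything. The following is a minimal sketch; the column names, the rows, and the choice to treat `None` and the empty string as missing are assumptions for illustration.

```python
def count_missing(rows, columns):
    """Count, per column, the cells that are absent, None, or an empty string."""
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }


# Made-up rows; two of the three passengers have no recorded age.
rows = [
    {"age": 22, "embarked": "S"},
    {"age": None, "embarked": "C"},
    {"age": "", "embarked": "S"},
]
missing = count_missing(rows, ["age", "embarked"])
```

Knowing how many values are missing, and in which features, helps to judge how much removing the rows or assigning them to one category could shift the counts in bin0 and bin1.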

Technical troubles
If you experience problems with a computer (configuration problems, weird behaviour, etc.), just change computer and report the trouble (Per?)

Next…
- Those who have miscalculated the empirical error should recalculate it in the correct way, as presented.
- Those who want to can get some additional training with an optional task that is on the website. It contains the solution. You do not need to submit anything. It is just for you!
- All those who have submitted the report have completed this lab task. Well done!