Data Mining: Data. Lecture 3, TIES445 Data mining, Nov-Dec 2007, Sami Äyrämö. These slides are additional material for TIES445.

Slide 2: Data quality
- GIGO: Garbage In, Garbage Out
  - The effectiveness of a data mining (DM) exercise depends on the quality of the data
- Data quality concerns
  - individual measurements (records and fields)
  - collections of observations
- Sources of error are practically unlimited
  - Human error (e.g., keyboard error)
  - Instrumentation failure
    - measurements become inaccurate or imprecise
  - Inadequate specification of the measurement or data collection process

Slide 3: Quality of individual measurements
- Bias
  - the difference between the mean of repeated measurements and the true value
- Precision
  - the variability of repeated measurements (NOTE: precision is not the number of digits in a record)
- Accuracy
  - small bias and high precision (e.g., small variance)
  - e.g., repeated measurements of someone's height may be precise (reliable) but inaccurate (not valid) if (s)he is wearing shoes: we are not measuring the right thing
- True value (does it even exist?)
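As a rough illustration of the bias and precision definitions above, the following Python sketch summarises a handful of repeated height measurements. The measurement values, the assumed true height, and the function name `bias_and_precision` are invented for illustration and do not come from the lecture.

```python
from statistics import mean, pstdev

def bias_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   = mean of the measurements minus the true value
    spread = standard deviation of the measurements
             (a small spread means high precision)
    """
    bias = mean(measurements) - true_value
    spread = pstdev(measurements)
    return bias, spread

# Hypothetical data: a person whose true height is assumed to be 170.0 cm,
# measured five times while wearing shoes.
TRUE_HEIGHT_CM = 170.0
with_shoes = [172.9, 173.0, 173.1, 173.0, 173.0]

bias, spread = bias_and_precision(with_shoes, TRUE_HEIGHT_CM)
print(f"bias = {bias:.2f} cm, spread = {spread:.2f} cm")
```

In this made-up case the spread is small (the measurements are precise) while the bias is about 3 cm, so the measurements are not accurate.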

Slide 4: Quality of collections of data: bias
- Distorted (biased) samples
  - mismatch between the sampled population and the population of interest (selection bias)
    - e.g., calculating the average age of students in Jyväskylä when the sample is restricted to female students
  - a sample may be selected through a chain of selection steps
    - e.g., candidates for bank loans: 1) potential customers are contacted, 2) some reply, some do not, 3) of those who replied, some are creditworthy, some are not, 4) those who take out a loan are followed, 5) some are good customers, some are not, ...
  - populations are not static (population drift)
    - e.g., customers' shopping behaviour may change over time
- A biased sample leads to inconsistent estimates of population parameters
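The student-age example above can be sketched in a few lines of Python. The records below are invented purely for illustration (they are not real Jyväskylä data), and `average_age` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical student records as (gender, age) pairs; invented for illustration.
students = [
    ("F", 20), ("F", 21), ("F", 22), ("F", 21),
    ("M", 24), ("M", 26), ("M", 25), ("M", 25),
]

def average_age(records):
    """Average age over a list of (gender, age) records."""
    return mean(age for _, age in records)

# Population of interest: all students.
population_mean = average_age(students)

# Distorted sample: restricted to female students only.
female_only = [record for record in students if record[0] == "F"]
biased_mean = average_age(female_only)

print(f"mean age, all students:       {population_mean}")
print(f"mean age, female-only sample: {biased_mean}")
print(f"difference due to selection:  {biased_mean - population_mean}")
```

Because age and gender happen to be related in this toy data, restricting the sample shifts the estimate; collecting more female students would not remove the shift.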

Slide 5: Quality of collections of data: incomplete data
- Incomplete data: missing or empty values
  - Missing value: the information was not collected
    - e.g., people decline to answer a question (age, weight, position, ...)
  - Empty value: the information does not exist
    - A form may have conditional parts: e.g., the expiry date of a driver's license cannot be filled in by children
  - Determining whether a value is "empty" or "missing" requires domain knowledge
    - If the discriminating information is not provided, both empty and missing values are treated as, and called, missing
  - Fundamental question for a data mining task: "Why are the data incomplete?"
  - Note: a distorted (biased) sample is actually a special case of incomplete data
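The distinction between empty and missing values can be sketched in Python using the driver's license example. This is a simplified sketch under stated assumptions: the field names, the minimum licensing age of 18, and the function `classify_expiry` are all hypothetical, and the rule ignores adults who simply hold no license.

```python
MIN_LICENSE_AGE = 18  # assumed domain rule, not taken from the lecture

def classify_expiry(record):
    """Classify the license expiry field as 'present', 'empty' or 'missing'."""
    if record.get("license_expiry") is not None:
        return "present"
    age = record.get("age")
    if age is None:
        # The discriminating information (age) is not available, so the two
        # cases cannot be told apart and the value is simply called missing.
        return "missing"
    if age < MIN_LICENSE_AGE:
        return "empty"    # the information does not exist
    return "missing"      # the information may exist but was not collected

# Hypothetical form records, invented for illustration.
records = [
    {"age": 9, "license_expiry": None},
    {"age": 35, "license_expiry": None},
    {"age": 41, "license_expiry": "2030-05-01"},
    {"age": None, "license_expiry": None},
]

labels = [classify_expiry(record) for record in records]
print(labels)
```

The domain rule is what separates the first record (empty) from the second (missing); without the age field, as in the last record, both cases collapse into "missing".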