Introduction and descriptive statistics 30th August 2006 Tron Anders Moger.

New England Journal of Medicine, Editorial, Jan. 6, 2000, p
The eleven most important developments in medicine in the past millennium:
– Elucidation of human anatomy and physiology
– Discovery of cells and their substructures
– Elucidation of the chemistry of life
– Application of statistics to medicine
– Development of anesthesia
– Discovery of the relation of microbes to disease
– Elucidation of inheritance and genetics
– Knowledge of the immune system
– Development of body imaging
– Discovery of antimicrobial agents
– Development of molecular pharmacotherapy

Introduction
A lot of knowledge appears through numbers and quantitative data.
Problems in interpreting statistical results are often underestimated.
It is important to learn “numerical literacy”: the ability to understand numbers and quantitative relationships.

Number of births in former East Germany

Mortality in Tanzania and Norway

Research and numbers
Numbers often appear in medical research.
The numbers are often uncertain; they have variability.
They must be organized in order to interpret them.
We wish to generalize the results to the general population.

Statistical data
Appear from:
Numerical measurements with an instrument on a continuous scale (continuous data). Examples:
– Fever: 39.6 (Unproblematic)
– IQ: 116 (Problematic)
Categorization (categorical data). Examples:
– Man / woman (Unproblematic)
– Depressed / Not depressed (Problematic)

Reliability: Precision of data? How much will they differ if the measurements are repeated? Validity: Do we capture what we are really interested in? Is the measurement relevant? Variability in the data

Reliability of lung function measurements 6 repeated measurements on 12 students.

Reliability of questionnaire/interview
Alcohol use (men years):
– Mean number of times alcohol users say that they have felt intoxicated:
1993 (questionnaire): 14.1 times per year
1994 (interview): 7.3 times per year
In 1994 they used the word drunk.

Reliability of clinical study
Sackett et al: Clinical Epidemiology (Little, Brown and Company, 1985).
Pictures of the eye of 100 patients are studied by two clinicians to see if there is evidence of retinopathy.
[Table: first clinician (No / Yes) by second clinician (No / Yes); the cell counts are not preserved in the transcript]
Observed agreement: (46+32)/100 = 78%
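The agreement calculation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `observed_agreement` is an assumption, and the only figures used are the two agreeing counts (46 and 32) and the total (100) quoted on the slide.

```python
def observed_agreement(agree_counts, n_total):
    """Proportion of cases on which both raters gave the same rating.

    agree_counts: counts from the diagonal of the agreement table
    n_total: total number of rated cases
    """
    return sum(agree_counts) / n_total


# Figures quoted on the slide: 46 and 32 patients were rated the same
# way by both clinicians, out of 100 patients in total.
agreement = observed_agreement([46, 32], 100)
print(f"Observed agreement: {agreement:.0%}")
```

Observed agreement uses only the diagonal of the table, so the two disagreeing cells are not needed to reproduce the 78%.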

Sources of variation in data
– Laboratory variation
– Observer variation
– Instrument variation
– Measurement variation
– Biological variation between individuals
– Day to day variation within the same individual/hospital

Generalization
Sample: The units, experiments, individuals etc. that are in the study. E.g.:
– 15 patients with migraine
– Neurophysiological study on rats
Population: The collection of units etc. one wishes the results to apply to
– All patients with migraine
– All repetitions of the neurophysiological experiment

Pairs of terms
Sample / Population
– Histogram / Probability distribution
– Mean / Expectation
– Proportion / Risk
– Measurements of cholesterol level / Cholesterol level in the population
– Weather / Climate

Types of data:
Continuous data. Data measured on a continuous scale, e.g. height, weight, age. Can be truly continuous (with decimals) or discrete (integers).
Categorical data. Data in categories, e.g. gender, education level, grouped age, hospital department. Can be nominal or ordinal.

Data in SPSS (and other statistical software):
IMPORTANT: One line in the data file always corresponds to one observation!
It is common to have an id variable for each observation.
If a measurement is missing, leave the cell empty.
To create a new variable in SPSS, choose Data->Insert variable in the Data View window, or write the variable name in Name in the Variable View window.

Data coding:
For continuous data, enter the value of the variable.
For categorical data, define a suitable coding, e.g. 0=male and 1=female, or 0=grammar school, 1=high school and 2=college/university degree.
In Variable View, the coding can be defined in Values.
In Label you can write further information about the variable.
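As a rough illustration of the same idea outside SPSS, the coding can be kept in a small codebook and applied when the data are displayed. This is a sketch under stated assumptions: the two records, the dictionary names and the `label` helper are hypothetical, and `None` stands in for an empty cell.

```python
# Hypothetical codebook mirroring the coding described on the slide.
GENDER_LABELS = {0: "male", 1: "female"}
EDUCATION_LABELS = {
    0: "grammar school",
    1: "high school",
    2: "college/university degree",
}

# One dict per observation (one line in the data file), each with an id.
# None marks a missing measurement, i.e. an empty cell.
records = [
    {"id": 1, "gender": 0, "education": 2},
    {"id": 2, "gender": 1, "education": None},
]


def label(code, labels):
    """Translate a numeric code to its label, keeping missing values visible."""
    return "missing" if code is None else labels[code]


decoded = [
    (r["id"], label(r["gender"], GENDER_LABELS), label(r["education"], EDUCATION_LABELS))
    for r in records
]
for row in decoded:
    print(row)
```

Storing the numeric codes and keeping the labels separate plays the same role as Values in Variable View: the data stay compact while the output remains readable.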

Descriptive statistics
– Tables
– Graphs, plots
– Measures of central tendency
– Measures of variability

Types of graphs
– Histogram
– Box-plot
– Scatter plot
– Line plot
– Bar plot

The age of 100 medical students

How can you get an overview of these data in SPSS? Explore! Choose Analyze - Descriptive Statistics - Explore. Select the relevant variables by clicking them, and transferring them to Dependent List. Choose Plots, remove the check on “Stem and leaf” and check “Histogram” instead. Click Continue and OK.

Histogram: The distribution of age among the students (n=100)

Box-plot: The distribution of age among the students

Measures of central tendency
Mean. The students: 22.2 years
Median: The middle observation when the observations are arranged in increasing order. The students: 22.0 years
The mean is influenced by extreme observations. The median is robust.
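A small Python sketch can show how differently the mean and the median react to an extreme observation. The ages below are hypothetical and are not the students' data from the slides; only the standard library `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical ages, chosen only to illustrate the point.
ages = [20, 21, 21, 22, 23]

# The same data with one extreme observation added.
ages_with_outlier = ages + [45]

mean_before, median_before = mean(ages), median(ages)
mean_after, median_after = mean(ages_with_outlier), median(ages_with_outlier)

print(f"Without outlier: mean={mean_before:.1f}, median={median_before}")
print(f"With outlier:    mean={mean_after:.1f}, median={median_after}")
```

In this example the single extreme value moves the mean by several years, while the median changes by only half a year.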

Measures of variability
Standard deviation. The students: 3.06 years
Coefficient of variation: s/x̄ * 100%, i.e. the standard deviation s as a percentage of the mean x̄. The students: 13.8%
Quartiles: Arrange the data in increasing order. The 25% quartile is the observation where 25% of the observations have lower values and 75% of the observations have higher values. (In SPSS: Check Percentiles in the Statistics menu in Explore.)
The students: 25% quartile: 20.0 years, 75% quartile: 23.0 years
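The coefficient of variation quoted for the students can be checked from the two summary figures on the slides (standard deviation 3.06, mean 22.2). The sketch below also computes a standard deviation and quartiles for a hypothetical sample; the sample values and the function name are assumptions, and `statistics.quantiles` uses its own default percentile definition, which may differ slightly from the one SPSS applies.

```python
from statistics import quantiles, stdev


def coefficient_of_variation(sd, mean_value):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean_value * 100


# Summary figures from the slides: s = 3.06 years, mean = 22.2 years.
cv_students = coefficient_of_variation(3.06, 22.2)
print(f"Coefficient of variation: {cv_students:.1f}%")

# Hypothetical ages (not the students' data) to show the other measures.
sample = [19, 20, 20, 21, 22, 22, 23, 23, 24, 28]
sample_sd = stdev(sample)                  # sample standard deviation
q1, q2, q3 = quantiles(sample, n=4)        # 25%, 50% and 75% cut points
print(f"SD={sample_sd:.2f}, quartiles: {q1}, {q2}, {q3}")
```

Because the coefficient of variation is unit-free, it can be used to compare the spread of variables measured on different scales, such as age and weight.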

How to get separate plots for each category of a categorical variable, e.g. gender
Click Analyze - Descriptive Statistics - Explore.
Move the continuous variable to Dependent List.
Move gender to Factor List.
That’s it!

Separate boxplots for each gender

Relationship between two continuous variables: Scatter plot!
Choose Graphs - Scatter - Define.
Choose a variable for the Y-axis and one for the X-axis.
Separate markers for separate groups are obtained by transferring the categorical variable to Set Markers by.
You can also include a regression line by choosing “Fit line at total”, or a line for each category by choosing “Fit line at subgroups”.

Scatter plot, weight versus height for the students

Scatter plot, weight versus height, with regression lines. We will talk much more about regression later.

Correlation coefficient
A numerical measure of the linear relationship between two continuous variables x and y
Ranges between -1 and 1
Values close to 0: No linear relationship
Values close to 1 or -1: Almost linear relationship
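The properties listed above can be illustrated with a short Python sketch of the Pearson correlation coefficient. The function name `pearson_r` and all data are hypothetical. The last example shows why a value of 0 means only that there is no linear relationship.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, chosen to show the extreme cases.
r_increasing = pearson_r([1, 2, 3], [2, 4, 6])    # exact increasing line
r_decreasing = pearson_r([1, 2, 3], [6, 4, 2])    # exact decreasing line
xs = [-2, -1, 0, 1, 2]
r_curved = pearson_r(xs, [v ** 2 for v in xs])    # y depends on x, but not linearly

print(r_increasing, r_decreasing, r_curved)
```

In the last case y is completely determined by x, yet the coefficient is 0, so a scatter plot should always accompany the number.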

Descriptive statistics for categorical variables
It is not very useful to calculate the mean for e.g. educational level.
We would like to find the percentages within each category in the study.
Analyze->Descriptive Statistics->Frequencies
Move the variable to Variable(s).

Frequency table
The last column shows the cumulative percentage; it always reaches 100% in the last row.
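A frequency table of this kind can be sketched in Python as follows. The education codes are hypothetical (using the 0/1/2 coding from the data coding slide), and `frequency_table` is an illustrative helper, not the SPSS procedure.

```python
from collections import Counter


def frequency_table(values, order):
    """Return (category, count, percent, cumulative percent) rows."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for category in order:
        percent = 100 * counts[category] / n
        cumulative += percent
        rows.append((category, counts[category], percent, cumulative))
    return rows


# Hypothetical education codes for ten subjects.
education = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
table = frequency_table(education, order=[0, 1, 2])
for category, count, percent, cumulative in table:
    print(f"{category}: n={count}, {percent:.0f}%, cumulative {cumulative:.0f}%")
```

The cumulative column is only meaningful when the categories have a natural order, as they do for an ordinal variable such as education level.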

Simple bar plot

Relationships between categorical variables Choose Analyze->Descriptive Statistics ->Crosstabs Move one variable to Rows, and another to Columns Click Cells, and check relevant percentages (Rows, Columns or Total)
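The row percentages that Crosstabs reports can be sketched in Python like this. The groups "A" and "B" and all counts are hypothetical and are not the race and smoking data from the slides; `crosstab_row_percent` is an illustrative helper.

```python
from collections import Counter


def crosstab_row_percent(pairs):
    """Row percentages for a two-way table built from (row, column) pairs."""
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    return {
        (row, col): 100 * count / row_totals[row]
        for (row, col), count in cell_counts.items()
    }


# Hypothetical observations: (group, smoking status) for 20 subjects.
observations = (
    [("A", "smoker")] * 3
    + [("A", "non-smoker")] * 7
    + [("B", "smoker")] * 5
    + [("B", "non-smoker")] * 5
)
row_percent = crosstab_row_percent(observations)
for (group, status), percent in sorted(row_percent.items()):
    print(f"{group} / {status}: {percent:.0f}%")
```

Row percentages answer "what share of each group smokes?", whereas column percentages would answer "what share of the smokers belongs to each group?", so the choice depends on the question being asked.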

Crosstable: Relationship between race and smoking

Bar plot: Relationship between race and smoking

Line plot for ordinal categorical variables (time-series plot)

Conclusion
There are many different options for presenting results.
You will (hopefully) learn which option is most relevant for each problem during this course.