Biostatistics, statistical software I. Basic statistical concepts Krisztina Boda PhD Department of Medical Informatics, University of Szeged.

Presentation transcript:


Krisztina Boda INTERREG 2 What is biostatistics? Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. Biostatistics or biometry is the application of statistics to a wide range of topics in biology. It has particular applications to medicine and to agriculture.

Krisztina Boda INTERREG 3 Application of biostatistics: research; design and analysis of clinical trials in medicine; public health, including epidemiology.


Krisztina Boda INTERREG 5 Biostatistical methods: descriptive statistics; hypothesis tests (statistical tests). They depend on the type of data, the nature of the problem, and the statistical model.

Krisztina Boda INTERREG 6 The data set. A data set contains information on a number of individuals. Individuals are objects described by a set of data; they may be people, animals or things. For each individual, the data give values for one or more variables. A variable describes some characteristic of an individual, such as a person's age, height, gender or salary.

Krisztina Boda INTERREG 7 The data table. Data of one experimental unit (“individual”) must be in one record (row). Data of the answers to the same question (variables) must be in the same field of the record (column). Example columns: Number, SEX, AGE.

Krisztina Boda INTERREG 8 Variables. Categorical (discrete): a discrete random variable X has a finite number of possible values, for example gender, blood group, number of children, … Continuous: a continuous random variable X takes all values in an interval of numbers, for example concentration, temperature, …

Krisztina Boda INTERREG 9 Types of data from two aspects. Based on the number of values they can have: discrete (categorical) data, for example sex, blood group, number of children; continuous data, for example age, temperature, concentration. Based on the property they represent: qualitative data, which are either nominal (categories distinguished only by their names, for example sex, blood group) or ordinal (categories that can be put in order, for example very good - good - acceptable - wrong - very wrong, or low - normal - high); quantitative (or numerical) data, for example age, number of children.

Krisztina Boda INTERREG 10 Distribution of variables Discrete: the distribution of a categorical variable describes what values it takes and how often it takes these values. Continuous: the distribution of a continuous variable describes what values it takes and how often these values fall into an interval.
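As an illustration of the distribution of a categorical variable, the following minimal sketch counts how often each value occurs. The blood-group observations are hypothetical and serve only to show the idea.

```python
from collections import Counter

# Hypothetical blood-group observations for ten individuals (illustration only).
blood_groups = ["A", "0", "B", "A", "AB", "A", "0", "0", "A", "B"]

# Distribution of a categorical variable: each value and how often it occurs.
frequencies = Counter(blood_groups)
n = len(blood_groups)

for value, count in sorted(frequencies.items()):
    print(f"{value:>2}: {count} ({100 * count / n:.0f}%)")
```

For a continuous variable the same counting would be done on intervals rather than on individual values.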

Krisztina Boda INTERREG 11 The distribution of a continuous variable, example. [Table columns: values, categories (intervals), frequencies]

Krisztina Boda INTERREG 12 The length of the intervals (or the number of intervals) affects a histogram.
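The effect of the interval length can be sketched with a small text histogram. The body weights below are hypothetical, and `frequency_table` is a helper name introduced here only for illustration.

```python
import math
from collections import Counter

# Hypothetical body weights in kg (illustration only, not the slide's data).
weights = [48.5, 52.0, 53.5, 55.0, 57.5, 58.0, 61.0, 62.5, 66.0, 74.0]


def frequency_table(values, width):
    """Count how many values fall into each interval [k*width, (k+1)*width)."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))


# The same data look different when the interval length changes.
for width in (5, 10):
    print(f"interval length {width} kg")
    for left, count in frequency_table(weights, width).items():
        print(f"  [{left}, {left + width}): {'*' * count}")
```

With 5 kg intervals the sample shows six short bars; with 10 kg intervals it collapses into four, and some detail of the shape is lost.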

Krisztina Boda INTERREG 13 The overall pattern of a distribution. The center, spread and shape describe the overall pattern of a distribution. Some distributions have a simple shape, such as symmetric or skewed. Not all distributions have a simple overall shape, especially when there are few observations. A distribution is skewed to the right if the right side of the histogram extends much farther out than the left side.

Krisztina Boda INTERREG 14 Histogram of body weights (kg)

Krisztina Boda INTERREG 15 Outliers Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them (real data, typing mistake or other).

Krisztina Boda INTERREG 16 Describing distributions with numbers. Measures of central tendency: the mean, the mode and the median are three commonly used measures of the center. Measures of variability: the range, the quartiles, the variance and the standard deviation are the most commonly used measures of variability. Measures of an individual: rank, z score.


Krisztina Boda INTERREG 18 Measures of the center. Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Mode: the most frequent number. Median: the value that half the members of the sample fall below and half above; in other words, the middle number when the sample elements are written in numerical order. Example: 1, 2, 4, 1. Mean = 8/4 = 2. Mode = 1. Median: first sort the data (1, 1, 2, 4), then find the element(s) in the middle. If the sample size is odd, the unique middle element is the median; if the sample size is even, the median is the average of the two central elements. Median = (1 + 2)/2 = 1.5.
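The example can be checked with Python's standard `statistics` module. This is only a sketch for verification; the slides themselves do not prescribe any software here.

```python
import statistics

sample = [1, 2, 4, 1]  # the example from the slide

mean = statistics.mean(sample)      # sum of the values divided by n: 8 / 4
mode = statistics.mode(sample)      # the most frequent value
median = statistics.median(sample)  # sorts, then averages the two middle values

print("mean:", mean, "mode:", mode, "median:", median)
```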

Krisztina Boda INTERREG 19 Example. The grades of a test written by 11 students were the following: [list of grades not preserved]. A student indicated that the class average was 47, which he felt was rather low. The professor stated that nevertheless there were more 100s than any other grade. The department head said that the middle grade was 60, which was not unusual. The mean is 517/11 = 47, the mode is 100, the median is 60.

Krisztina Boda INTERREG 20 Relationships among the mean (m), the median (M) and the mode (Mo). A symmetric curve: m = M = Mo. A curve skewed to the right: Mo < M < m. A curve skewed to the left: m < M < Mo.
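A small numerical sketch of these orderings follows. Both samples are hypothetical, and `center_measures` is a helper introduced here for illustration. The ordering is typical of skewed data rather than guaranteed for every sample.

```python
import statistics


def center_measures(values):
    """Return (mean, median, mode) of a sample."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


# Hypothetical sample with a long right tail.
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]
# Mirror image of the same sample, so the long tail is on the left.
left_skewed = [11 - x for x in right_skewed]

m, M, Mo = center_measures(right_skewed)
print("right skew, Mo < M < m:", Mo < M < m)

m2, M2, Mo2 = center_measures(left_skewed)
print("left skew,  m < M < Mo:", m2 < M2 < Mo2)
```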

Krisztina Boda INTERREG 21 Measures of variability (dispersion). The range is the difference between the largest number (maximum) and the smallest number (minimum). Percentiles (5%-95%): the 5th percentile is the value below which 5% of the cases fall. Quartiles: the 25%, 50% and 75% percentiles. The variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. The standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.

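Krisztina Boda INTERREG 22 Example. Data: 1, 2, 4, 1; in ascending order: 1, 1, 2, 4. Range: max - min = 4 - 1 = 3. Quartiles: (values not preserved). Standard deviation: with mean $\bar{x} = 2$,
x_i | x_i - mean | (x_i - mean)^2
1 | 1 - 2 = -1 | 1
1 | 1 - 2 = -1 | 1
2 | 2 - 2 = 0 | 0
4 | 4 - 2 = 2 | 4
Total | 0 | 6
so $s = \sqrt{6/3} = \sqrt{2} \approx 1.414$.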
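The hand calculation can be retraced step by step in code. The sketch assumes the sample 1, 2, 4, 1 used in the earlier example and the n - 1 divisor, which is consistent with the standard deviation of about 1.414 reported later in the slides.

```python
import math
import statistics

sample = [1, 2, 4, 1]  # assumed to be the slide's example data
n = len(sample)
mean = sum(sample) / n

value_range = max(sample) - min(sample)

# Columns of the worked table: deviations from the mean and their squares.
deviations = [x - mean for x in sample]
squared = [d ** 2 for d in deviations]

variance = sum(squared) / (n - 1)  # sample variance with the n - 1 divisor
sd = math.sqrt(variance)

print("range:", value_range)
print("sum of deviations:", sum(deviations))
print("sum of squares:", sum(squared))
print("variance:", variance, "sd:", round(sd, 3))

# Cross-check against the standard library's sample standard deviation.
print("statistics.stdev:", round(statistics.stdev(sample), 3))
```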

Krisztina Boda INTERREG 23 The meaning of the standard deviation. A measure of dispersion around the mean. In a normal distribution, about 68% of cases fall within one standard deviation of the mean and about 95% of cases fall within two standard deviations. For example, if the mean age is 45, with a standard deviation of 10, about 95% of the cases would be between 25 and 65 in a normal distribution.
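The 68% and 95% figures can be checked numerically. The sketch below uses `statistics.NormalDist` (available from Python 3.8) with the mean of 45 and standard deviation of 10 from the age example.

```python
from statistics import NormalDist

# Normal distribution from the example: mean age 45, standard deviation 10.
age = NormalDist(mu=45, sigma=10)

# Probability of falling within one and within two standard deviations of the mean.
within_1_sd = age.cdf(55) - age.cdf(35)
within_2_sd = age.cdf(65) - age.cdf(25)

print(f"between 35 and 55: {within_1_sd:.1%}")
print(f"between 25 and 65: {within_2_sd:.1%}")
```

The exact proportions are slightly above the rounded 68% and 95% quoted on the slide.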

Krisztina Boda INTERREG 24 The use of sample characteristics in summary tables
Center | Dispersion | Publish
Mean | Standard deviation, standard error | Mean (SD), Mean ± SD, Mean ± SE, Mean ± SEM
Median | Min, max; 5%, 95% percentiles; 25%, 75% (quartiles) | Med (min, max), Med (25%, 75%)
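A minimal sketch of how such summary strings might be produced. It assumes the small sample 1, 2, 4, 1 from the earlier example and the usual definition of the standard error of the mean, SE = SD / sqrt(n), which the slide names but does not spell out.

```python
import math
import statistics

sample = [1, 2, 4, 1]  # assumed example data
n = len(sample)

mean = statistics.mean(sample)
sd = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)              # standard error of the mean (assumed definition)
median = statistics.median(sample)

# Strings in the "Publish" formats listed in the table.
mean_sd = f"{mean:.2f} ({sd:.2f})"
mean_se = f"{mean:.2f} ± {se:.2f}"
med_range = f"{median:.1f} ({min(sample)}, {max(sample)})"

print("Mean (SD):      ", mean_sd)
print("Mean ± SE:      ", mean_se)
print("Med (min, max): ", med_range)
```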

Krisztina Boda INTERREG 25 Displaying data. Categorical data: bar chart, pie chart. Continuous data: histogram, box-whisker plot, mean-standard deviation plot, scatter plot.

Krisztina Boda INTERREG 26 Distribution of body weights. The distribution is skewed in the case of girls. [Figure: descriptive statistics of body weight; panels: boys, girls]


Krisztina Boda INTERREG 28 Mean-dispersion diagrams: mean ± SD, mean ± SE, mean ± 95% CI. [Figure panels: Mean ± SE, Mean ± SD, Mean ± 95% CI]

Krisztina Boda INTERREG 29 Box diagram. A box plot, sometimes called a box-and-whisker plot, displays the median, quartiles, and minimum and maximum observations.
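The five numbers behind a box plot can be computed as in the sketch below. The body weights are hypothetical, and the quartile convention (`method="inclusive"` of `statistics.quantiles`) is an assumption: several conventions exist, and statistical packages may give slightly different quartiles.

```python
import statistics

# Hypothetical body weights in kg (illustration only).
weights = [48.5, 52.0, 53.5, 55.0, 57.5, 58.0, 61.0, 62.5, 66.0]

# Quartiles under the "inclusive" convention (an assumption, see above).
q1, q2, q3 = statistics.quantiles(weights, n=4, method="inclusive")

five_numbers = {
    "min": min(weights),
    "Q1": q1,
    "median": q2,
    "Q3": q3,
    "max": max(weights),
}

for name, value in five_numbers.items():
    print(f"{name:>6}: {value}")
```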

Krisztina Boda INTERREG 30 Transformations of data values: addition, subtraction. Adding (or subtracting) the same number to each data value in a variable shifts each measure of center by the amount added (subtracted). Adding (or subtracting) the same number to each data value in a variable does not change measures of dispersion.

Krisztina Boda INTERREG 31 Transformations of data values: multiplication, division. Measures of center and spread change in predictable ways when we multiply or divide each data value by the same number. Multiplying (or dividing) each data value by the same number multiplies (or divides) all measures of center by that number, and the range and the standard deviation by its absolute value; the variance changes by the square of that number.

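Krisztina Boda INTERREG 32 Proof. The effect of linear transformations. Let the transformation be $x \to ax + b$. Mean: $\overline{ax+b} = \frac{1}{n}\sum_{i=1}^{n}(a x_i + b) = a\bar{x} + b$. Standard deviation: $s_{ax+b} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(a x_i + b - (a\bar{x} + b)\bigr)^2} = \sqrt{\frac{a^2}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = |a|\,s_x$, where $s_x$ is the standard deviation of the original data.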

Krisztina Boda INTERREG 33 Example: the effect of transformations. Table columns: sample data ($x_i$), addition ($x_i + 10$), subtraction ($x_i - 10$), multiplication ($x_i \cdot 10$), division ($x_i / 10$). Table rows: mean, median, range, standard deviation. Preserved standard deviation entries: ≈ 1.414, ≈ 1.414, ≈ 14.14.
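The table can be recomputed as in the following sketch. The sample 1, 2, 4, 1 is an assumption: the slide's data column is not preserved, but this sample from the earlier examples has a standard deviation of about 1.414. `summary` is a helper introduced here for illustration.

```python
import statistics


def summary(values):
    """Mean, median, range and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }


sample = [1, 2, 4, 1]  # assumed data, see the note above

transformed = {
    "x": sample,
    "x + 10": [x + 10 for x in sample],
    "x - 10": [x - 10 for x in sample],
    "x * 10": [x * 10 for x in sample],
    "x / 10": [x / 10 for x in sample],
}

results = {name: summary(values) for name, values in transformed.items()}

for name, s in results.items():
    print(
        f"{name:>6}: mean={s['mean']:.2f} median={s['median']:.2f} "
        f"range={s['range']:.2f} sd={s['sd']:.3f}"
    )
```

Adding or subtracting 10 shifts the mean and median but leaves the range and standard deviation unchanged; multiplying or dividing by 10 scales all four measures.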

Krisztina Boda INTERREG 34 Special transformation: standardisation. The z score measures how many standard deviations a sample element is from the mean. The z score corresponding to a particular sample element $x_i$ is $z_i = \frac{x_i - \bar{x}}{s}$, $i = 1, 2, \ldots, n$. We standardize by subtracting the mean and dividing by the standard deviation. The resulting variables (z-scores) will have zero mean, unit standard deviation, and no unit.

Krisztina Boda INTERREG 35 Example: standardisation. Table columns: sample data ($x_i$), standardised data ($z_i$). Mean: 2 and 0. St. deviation: ≈ (values not preserved).
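Standardisation can be sketched as follows. The sample 1, 2, 4, 1 is again an assumption, chosen because its mean of 2 matches the table; the slide's own data column is not preserved.

```python
import statistics

sample = [1, 2, 4, 1]  # assumed data with mean 2

mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# z score: distance of each element from the mean, in standard deviations.
z_scores = [(x - mean) / sd for x in sample]

print("z scores:", [round(z, 3) for z in z_scores])
print("mean of z:", round(statistics.mean(z_scores), 10))
print("sd of z:  ", round(statistics.stdev(z_scores), 10))
```

The standardised values have zero mean and unit standard deviation, up to floating-point rounding.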

Krisztina Boda INTERREG 36 Review questions and exercises. Problems to be solved by hand calculations: ..\Handouts\Problems hand I.doc. Solutions: ..\Handouts\Problems hand I solutions.doc. Problems to be solved using computer: ..\Handouts\Problems comp I.doc

Krisztina Boda INTERREG 37 Useful WEB pages 2/index.html