Nemours Biomedical Research Statistics March 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility

Overview
Class goals
  –Master basic statistical concepts
  –Learn analytic techniques and when to apply them
  –Learn how to interpret analysis results
  –Develop familiarity with R and related tools
  –Gain understanding that will transfer to a broad range of other statistics tools

Overview
Class structure
  –8 sessions
  –1.5 hours per session
  –Several homework assignments
Class website
  –

R
Installing
  –Download from the class website
      Pick the right (Mac versus Windows) version
  –Run the installer program
      Go with all the defaults
Running R
  –Live demonstration

R Concepts
Command line similar to
  –Windows Command Shell
  –Mac Terminal
  –Unix/Linux Shell
Uses '>' as prompt
Accepts
  –Constants (e.g., , etc.)
  –Variables (e.g., Height, Weight, SubjID, …)
  –Operations (e.g., '+' '-' '*' '/' '^' '>' '<' '==' '<-')
  –Functions (e.g., sum(c(1,2,3)), mean(c(1,2,3)), …)
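
The pieces listed above can be tried directly at the R prompt. The following is a minimal sketch: the variable names Height and Weight come from the slide, but the values assigned to them, and the other variable names, are made up for illustration.

```r
# Variables: assign with '<-'. The values below are illustrative only.
Height <- 1.75             # metres (made-up value)
Weight <- 70               # kilograms (made-up value)

# Operations: arithmetic and comparison operators.
bmi     <- Weight / Height^2   # '/' divides, '^' raises to a power
is_tall <- Height > 1.8        # comparisons return TRUE or FALSE
same    <- (1 + 2) == 3        # '==' tests equality

# Functions: called with arguments in parentheses; c() builds a vector.
total <- sum(c(1, 2, 3))
avg   <- mean(c(1, 2, 3))

print(bmi)
print(is_tall)
print(same)
print(total)
print(avg)
```

Typing a variable name on its own at the prompt (for example, `bmi`) prints its value, just as print() does here.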

R Concepts
Variable types
  –Scalar (a = 128)
  –Vector (a = c(3, 2, 9, 5))
  –Matrix (dim(a) = c(2,2))
  –Data frame: a collection of vectors, similar to a spreadsheet
      Rows are indexed
      Columns are named and indexed
      E.g., df$a is the column of data frame df named 'a', and df$a[5] is the 5th element of 'a'.
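
As a rough illustration of the four variable types, the sketch below builds one of each. The data frame contents and the names v, m, and b are invented for the example; only df, df$a, and df$a[5] come from the slide.

```r
# Scalar: in R this is simply a vector of length 1.
a <- 128

# Vector: c() combines values into one object.
v <- c(3, 2, 9, 5)

# Matrix: giving a 4-element vector dimensions 2 x 2 reshapes it.
# R fills matrices column by column.
m <- v
dim(m) <- c(2, 2)

# Data frame: named columns of equal length (made-up data).
df <- data.frame(a = c(10, 20, 30, 40, 50),
                 b = c("u", "v", "w", "x", "y"))

print(m)
print(df$a)      # the column of df named 'a'
print(df$a[5])   # the 5th element of that column
print(df[2, ])   # rows are selected by index
```

Because the matrix is filled column by column, the first column of m holds 3 and 2, and the second holds 9 and 5.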

Statistics
The science of data collection, summarization, analysis, and interpretation.
Descriptive versus inferential statistics:
  –Descriptive statistics: describing (summarizing) data, such as the center, variability, and shape of a quantitative variable (e.g., age), and the number (frequency) and percentage in each category of a categorical variable (e.g., gender, race).
  –Inferential statistics: drawing conclusions beyond the sample studied, allowing for prediction.

Statistical Description of Data
Statistics describes a numeric data set by its
  –Center (mean, median, mode, etc.)
  –Variability (standard deviation, range, etc.)
  –Shape (skewness, kurtosis, etc.)
Statistics describes a categorical data set by
  –Frequency, percentage, or proportion of each category
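
The summaries above map onto a handful of base R functions. The sketch below uses made-up age and gender values. Note that base R has no built-in function for the statistical mode, skewness, or kurtosis (mode() reports an object's storage type), so those are left out here.

```r
# Made-up numeric data: ages of seven subjects.
age <- c(34, 41, 29, 52, 41, 38, 45)

age_mean   <- mean(age)     # center
age_median <- median(age)   # center
age_sd     <- sd(age)       # variability (sample standard deviation)
age_range  <- range(age)    # variability (minimum and maximum)

# Made-up categorical data: gender of the same subjects.
gender <- c("F", "M", "F", "F", "M", "F", "M")

counts  <- table(gender)                # frequency of each category
percent <- 100 * prop.table(counts)     # percentage of each category

print(age_mean)
print(age_median)
print(age_sd)
print(age_range)
print(counts)
print(percent)
```

summary(age) gives several of the numeric summaries in one call.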

Statistical Inference
Statistical inference is the process by which we acquire information about populations from samples.
Two types of estimates for making inferences:
  –Point estimate
  –Interval estimate
[Figure: sample and population]
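
To make the two kinds of estimate concrete, here is a small sketch for a population mean. It assumes a t-based 95% confidence interval as the interval estimate, which is one common choice and not something the slide specifies, and the measurements are invented.

```r
# Made-up sample of eight measurements.
x <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9)
n <- length(x)

# Point estimate: a single number, here the sample mean.
point_est <- mean(x)

# Interval estimate: a range of plausible values, here a 95%
# confidence interval taken from t.test().
ci <- t.test(x, conf.level = 0.95)$conf.int

# The same interval computed by hand: mean +/- t * standard error.
se     <- sd(x) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
ci_manual <- c(point_est - t_crit * se, point_est + t_crit * se)

print(point_est)
print(ci)
print(ci_manual)
```

The point estimate always lies at the center of this interval; the interval's width reflects how much the estimate could vary from sample to sample.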

Population and Sample
Population: the entire collection of individuals or measurements about which information is desired.
Sample: a subset of the population selected for study.
  –The primary objective is to select a subset whose center, spread, and shape are as close as possible to those of the population.
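
The idea can be sketched with a simulated population. Everything here is an assumption made for illustration: the population is 100,000 normally distributed values, and the sample is a simple random sample of 200 of them.

```r
set.seed(1)   # makes the random draws reproducible

# A simulated population (assumed normal with mean 50 and SD 10).
population <- rnorm(100000, mean = 50, sd = 10)

# A simple random sample of 200 members, drawn without replacement.
smp <- sample(population, size = 200)

# Compare center and spread of the sample with the population.
comparison <- data.frame(
  measure    = c("mean", "median", "sd"),
  population = c(mean(population), median(population), sd(population)),
  sample     = c(mean(smp), median(smp), sd(smp))
)
print(comparison)
```

With random selection, the sample's center and spread tend to land close to the population's, and they tend to get closer as the sample size grows.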

Parameter vs. Statistic
Parameter:
  –Any statistical characteristic of a population.
  –The population mean, population median, and population standard deviation are examples of parameters.
  –A parameter describes the distribution of a population.
  –Parameters are fixed and usually unknown.

Parameter vs. Statistic
Statistic:
  –Any statistical characteristic of a sample.
  –The sample mean, sample median, and sample standard deviation are examples of statistics.
  –A statistic describes the distribution of a sample.
  –The value of a statistic is known, and it varies from sample to sample.
  –Statistics are used for making inferences about parameters.

Parameter vs. Statistic
Statistical issue
  –Estimate a population parameter using a sample statistic.
  –E.g., the sample mean is an estimate of the population mean.
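
A short simulation can show a fixed parameter next to a statistic that varies. This is only a sketch: the population is simulated, so its mean is known here, which is rarely true in practice, and the sample size of 30 and the 1000 repetitions are arbitrary choices.

```r
set.seed(42)   # makes the random draws reproducible

# Simulated population; its mean is the (fixed) parameter.
population <- rnorm(50000, mean = 120, sd = 15)
mu <- mean(population)

# Draw 1000 samples of size 30 and record each sample mean (the statistic).
sample_means <- replicate(1000, mean(sample(population, size = 30)))

print(mu)                    # the parameter: one fixed value
print(head(sample_means))    # the statistic: differs from sample to sample
print(mean(sample_means))    # the sample means center near the parameter
print(sd(sample_means))      # how much the statistic varies across samples
```

Each individual sample mean is an estimate of mu. No single one equals it exactly, but together they cluster around it.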