Biostatistics-Lecture 2 Ruibin Xi Peking University School of Mathematical Sciences.



Data exploration: categorical variable (1)
Single nucleotide polymorphism (SNP)

Data exploration: categorical variable (2)
Zhao and Boerwinkle (2002) studied the pattern of SNPs
– Collected all available SNPs in NCBI through 2001
– Looked at the distribution of the different SNP types
– Why are there so many more transitions?
Figure: bar graph

Data exploration: categorical variable (2)
Zhao and Boerwinkle (2002) studied the pattern of SNPs
– Collected all available SNPs of the human genome in NCBI through 2001
– Looked at the distribution of the different SNP types
– Why are there so many more transitions?
Figure: pie chart

Data exploration: quantitative variable (1)
Fisher’s Iris data: E. S. Anderson measured iris flowers
Variables
– Sepal length
– Sepal width
– Petal length
– Petal width

Data exploration: quantitative variable (2)
Histogram
– Unimodal distribution
– Bimodal distribution
What is a possible reason for the two peaks?

Data exploration: quantitative variable (2)
Scatter plot (figure labels: Cluster 1, Cluster 2)

Data exploration: quantitative variable (3)
In fact, there are three species of iris: setosa, versicolor and virginica

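Summary statistics
– Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
– Sample median
– Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
– Sample standard deviation: $s = \sqrt{s^2}$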
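The four summary statistics can be computed directly from their definitions. The following is a minimal sketch, assuming the usual n − 1 divisor for the sample variance and the midpoint convention for the median of an even-sized sample; the function names and the five data values are illustrative, not taken from the lecture.

```python
import math

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_median(xs):
    # Middle value of the sorted data; average of the two middle values if n is even.
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def sample_variance(xs):
    # Assumes the n - 1 divisor (unbiased sample variance).
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    return math.sqrt(sample_variance(xs))

data = [5.1, 4.9, 4.7, 4.6, 5.0]  # illustrative values only
print(sample_mean(data), sample_median(data), sample_variance(data), sample_sd(data))
```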

Quantiles
– Median: the smallest value that is greater than or equal to at least half of the values
– qth quantile: the smallest value that is greater than or equal to at least 100q% of the values
– First quartile Q1: the 25% quantile
– Third quartile Q3: the 75% quantile
– Interquartile range (IQR): Q3 − Q1
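The quantile definition on this slide translates into a short function. This is a sketch of that specific definition (statistical software often uses interpolating variants instead); the name `quantile` and the data are illustrative assumptions.

```python
import math

def quantile(values, q):
    """Smallest value that is >= at least 100q% of the values (0 < q <= 1)."""
    s = sorted(values)
    n = len(s)
    # The k-th smallest value is >= at least k of the values,
    # so take the smallest k with k >= q * n.
    k = max(1, math.ceil(q * n - 1e-9))  # small tolerance guards against float round-off
    return s[k - 1]

data = [2, 4, 4, 5, 7, 9, 10, 12]  # illustrative values only
q1 = quantile(data, 0.25)
median = quantile(data, 0.5)
q3 = quantile(data, 0.75)
iqr = q3 - q1
print(q1, median, q3, iqr)
```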

Boxplot (figure labels: IQR, 1.5 IQR)

Data exploration: quantitative variable (4)
Boxplot

Relationships between categorical variables (1)
A study randomly assigned physicians to a case group (11037) or a control group (11034).
– In the control group, 189 (p1 = 1.71%) had a heart attack
– In the case group, 104 (p2 = 0.94%) had a heart attack

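Relationships between categorical variables (2)
– Relative risk: the ratio of the two sample proportions, $p_1 / p_2$
– Odds ratio
– Sample odds: $p / (1 - p)$
– Odds ratio: $\dfrac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)}$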
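Using the counts from the physicians' study above, the relative risk and odds ratio can be worked out as follows. This is a sketch that assumes both ratios are taken as control relative to case; the variable names are illustrative.

```python
# Counts from the study: heart attacks and group sizes.
control_events, control_n = 189, 11034
case_events, case_n = 104, 11037

p_control = control_events / control_n   # p1, about 1.71%
p_case = case_events / case_n            # p2, about 0.94%

# Relative risk: ratio of the two sample proportions (assumed direction: control / case).
relative_risk = p_control / p_case

# Sample odds p / (1 - p) in each group, then their ratio.
odds_control = p_control / (1 - p_control)
odds_case = p_case / (1 - p_case)
odds_ratio = odds_control / odds_case

print(round(relative_risk, 3), round(odds_ratio, 3))
```

Because heart attacks are rare in both groups, the odds ratio (about 1.83) is close to the relative risk (about 1.82).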

Relationships between categorical variables (3)
Contingency table
Group     Heart attack   No heart attack   Total
Case      104            10933             11037
Control   189            10845             11034
Does taking aspirin really reduce the risk of heart attack?
– P-value: 3.253e-07 (one-sided Fisher’s test)
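A one-sided Fisher's exact test conditions on the table margins and sums hypergeometric probabilities over tables at least as extreme as the observed one. The sketch below assumes the tail of interest is "at least as many heart attacks in the control group as observed"; the function name is hypothetical and this is not necessarily how the slide's p-value was produced.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """Upper-tail hypergeometric probability P(X >= a) for the 2x2 table
    [[a, b], [c, d]] with all margins held fixed."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total  # exact integers, converted to float only at the end

# Row 1: control (189 heart attacks, 10845 without); row 2: case (104, 10933).
p_value = fisher_one_sided(189, 10845, 104, 10933)
print(p_value)
```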

Probability
Randomness
– A phenomenon (or experiment) is called random if its outcome cannot be determined with certainty before it occurs
– Examples: coin tossing, die rolling, the genotype of a baby

Some genetics terms (1)
– Gene: a segment of DNA sequence (can be transcribed to RNA and then translated to protein)
– Allele: an alternative form of a gene
– Human genomes are diploid (two copies of each chromosome, except the sex chromosomes)
– Homozygous, heterozygous: the two copies of a gene are the same or different

Some genetics terms (2)
– Genotype: in the bi-allelic case (A or a), there are 3 possible outcomes: AA, Aa, aa
– Phenotype: hair color, skin color, height; a split nail on the little toe (descendants of the ancestors from under the Great Pagoda Tree?)
– Genotype is the genetic basis of phenotype
– Dominant, recessive
– Phenotype may also depend on environmental factors

Probability
Sample space S
– The collection of all possible outcomes
– The sample space might contain an infinite number of possible outcomes, e.g. survival time (all positive real values)

Probability
Probability: the proportion of times a given outcome will occur if we repeat an experiment or observation a large number of times
Given outcomes A and B
– $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
– If A and B are disjoint, $P(A \cup B) = P(A) + P(B)$

Conditional probability
In the die rolling case
– E1 = {1,2,3}, E2 = {2,3}
– P(E1|E2) = ?, P(E2|E1) = ?
Assume $P(B) > 0$; then $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
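The two questions in the die rolling case can be answered by counting outcomes. The sketch below assumes a fair six-sided die, so every outcome has probability 1/6; the helper names are illustrative.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # fair die: all outcomes equally likely (assumption)

def prob(event):
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    return prob(a & b) / prob(b)

E1 = {1, 2, 3}
E2 = {2, 3}
print(cond_prob(E1, E2))  # E2 is contained in E1
print(cond_prob(E2, E1))
```

Since E2 is a subset of E1, knowing E2 occurred makes E1 certain, while knowing E1 occurred leaves two favorable outcomes out of three.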

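Law of Total Probability
From the conditional probability formula, $P(A) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c)$
In general, for a partition $B_1, \dots, B_n$ of the sample space, we have $P(A) = \sum_{i=1}^{n} P(A \mid B_i)P(B_i)$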

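Independence
Two events are independent if the outcome of one event does not change the probability of occurrence of the other event
For two independent events A and B, $P(A \cap B) = P(A)P(B)$, so $P(A \mid B) = P(A)$ when $P(B) > 0$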

Bayes’ rule
$P(B \mid A) = \dfrac{P(A \mid B)P(B)}{P(A)}$
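The law of total probability and Bayes' rule can both be checked by enumeration on a fair die. In this sketch the events (A = roll is at most 3, with the partition into even and odd rolls) are chosen purely for illustration and do not come from the lecture.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # fair die (assumption)

def prob(event):
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def cond_prob(a, b):
    return prob(a & b) / prob(b)

A = {1, 2, 3}
EVEN, ODD = {2, 4, 6}, {1, 3, 5}  # a partition of the sample space

# Law of total probability: P(A) = P(A|even)P(even) + P(A|odd)P(odd)
total = cond_prob(A, EVEN) * prob(EVEN) + cond_prob(A, ODD) * prob(ODD)

# Bayes' rule: P(even|A) = P(A|even)P(even) / P(A)
posterior_even = cond_prob(A, EVEN) * prob(EVEN) / total

print(total, posterior_even)
```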

Random variables and their distributions (1)
A random variable X assigns a numerical value to each possible outcome of a random experiment
– Mathematically, a mapping from the sample space to the real numbers
For the bi-allelic genotype case, random variables X and Y can be defined on the outcomes AA, Aa, aa

Random variables and their distributions (2)
Probability distribution
– A probability distribution specifies the range of a random variable and the probability of each value in that range
In the genotype example, a probability distribution assigns a probability to each of AA, Aa and aa

Random variables and their distributions (3)
Discrete random variable
– Takes only discrete values
– Finitely many or countably infinitely many possible values
Continuous random variable
– Takes values on intervals or unions of intervals
– Uncountably many possible values
– Sepal/petal length and width in the iris data
– Weight, height, BMI …

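Distributions of discrete random variables (1)
Probability mass function (pmf): specifies the probability $P(X = x)$ of the discrete variable X taking one particular value x (in the range of X)
– In the genotype case, the pmf lists the probability of each genotype value
– The pmf sums to 1: $\sum_x P(X = x) = 1$
– Population mean: $\mu = \sum_x x\,P(X = x)$
– Population variance: $\sigma^2 = \sum_x (x - \mu)^2 P(X = x)$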
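A pmf can be stored as a mapping from values to probabilities, from which the population mean and variance follow by weighted sums. In this sketch X is taken to be the number of A alleles in a genotype, and the probabilities 1/4, 1/2, 1/4 are hypothetical; the slide's own genotype distribution is not reproduced in the transcript.

```python
from fractions import Fraction

# Hypothetical pmf: X = number of A alleles (aa -> 0, Aa -> 1, AA -> 2).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def pmf_total(pmf):
    return sum(pmf.values())

def population_mean(pmf):
    return sum(x * p for x, p in pmf.items())

def population_variance(pmf):
    mu = population_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

print(pmf_total(pmf), population_mean(pmf), population_variance(pmf))
```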

Distributions of discrete random variables (2)
Bernoulli distribution
– The pmf: $P(X = x) = p^x (1 - p)^{1 - x}$, $x \in \{0, 1\}$

Distributions of discrete random variables (3)
Binomial distribution
– Sum of n independent Bernoulli random variables with the same parameter p
– Denoted by $\mathrm{Binomial}(n, p)$
– The pmf: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$, $k = 0, 1, \dots, n$
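The binomial pmf can be evaluated exactly with rational arithmetic, which makes it easy to confirm that it sums to 1 and has mean np and variance np(1 − p). The parameters n = 10 and p = 3/10 below are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X distributed as Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, Fraction(3, 10)  # illustrative parameters
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(probs)
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
print(total, mean, variance)
```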


Distributions of discrete random variables (4)
Poisson distribution
– A distribution for counts (no upper limit)
– The pmf: $P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$, $k = 0, 1, 2, \dots$
Feller (1957) used it to model the number of bomb hits in London during WWII
– 576 areas of one quarter square kilometer each
– λ = 0.9323
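With λ = 0.9323 and 576 areas, the Poisson model gives the expected number of areas receiving exactly k hits as 576 · P(X = k). The sketch below computes those expected counts only; the observed counts are not given in the transcript, so no comparison is attempted.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X distributed as Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 0.9323    # average number of hits per area, from the slide
n_areas = 576   # number of quarter-square-kilometer areas, from the slide

expected = [n_areas * poisson_pmf(k, lam) for k in range(5)]
for k, e in enumerate(expected):
    print(f"areas with {k} hits: {e:.2f} expected")
```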

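Distributions of continuous variables (1)
Use a probability density function (pdf) to specify the distribution
– If the pdf is f, then $P(a \le X \le b) = \int_a^b f(x)\,dx$
– Population mean: $\mu = \int x f(x)\,dx$; population variance: $\sigma^2 = \int (x - \mu)^2 f(x)\,dx$
– Cumulative distribution function (cdf): $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$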

Distributions of continuous variables (2)
Normal distribution
– The pdf: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)$
The distribution of the BMI variable in the data set Pima.tr (MASS package) can be viewed as approximately normal
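For a normal distribution, the probability of an interval can be obtained either from the cdf or by integrating the pdf numerically, and the two should agree. In the sketch below the values μ = 32 and σ = 6 are hypothetical placeholders, not estimates from Pima.tr, and the midpoint rule is just one simple choice of integration scheme.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    # Closed form via the error function.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def integrate_pdf(a, b, mu, sigma, steps=10_000):
    # Midpoint rule approximation of the integral of the pdf over [a, b].
    width = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

mu, sigma = 32.0, 6.0  # hypothetical parameters for illustration
a, b = mu - 1.96 * sigma, mu + 1.96 * sigma

by_cdf = normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
by_integral = integrate_pdf(a, b, mu, sigma)
print(by_cdf, by_integral)  # both close to 0.95
```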

Distributions of continuous variables (3)
– Student’s t-distribution (William Sealy Gosset)
– Chi-square distribution
– Gamma distribution
– F-distribution
– Beta distribution

Quantile-Quantile plot
Quantile-Quantile (QQ) plot
– Compares two distributions by plotting the quantiles of one distribution against the quantiles of the other
– Used for checking goodness of fit
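The points of a normal QQ plot can be computed without any plotting library: sort the sample and pair each value with the matching quantile of a reference normal distribution. This sketch assumes plotting positions (i − 0.5)/n and a reference normal fitted by the sample mean and standard deviation; both are common choices rather than the lecture's stated method, and the function name and data are illustrative.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (theoretical quantile, sample quantile) pairs for a normal QQ plot."""
    s = sorted(sample)
    n = len(s)
    reference = NormalDist(mean(s), stdev(s))
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(reference.inv_cdf(p), x) for p, x in zip(probs, s)]

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
points = normal_qq_points(sample)
for theoretical, observed in points:
    print(f"{theoretical:.3f}  {observed:.3f}")
```

If the sample is roughly normal, the pairs fall close to the line y = x.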