Applied statistics Usman Roshan.

Presentation transcript:

A few basic stats
Expected value of a random variable (examples: Bernoulli and Binomial)
Variance of a random variable (examples: Bernoulli and Binomial)
Correlation coefficient (same as the Pearson correlation coefficient)
Formulas:
Covariance(X,Y) = E[(X - μX)(Y - μY)]
Correlation(X,Y) = Covariance(X,Y) / (σX σY)

Correlation between variables
The Pearson correlation r measures the linear correlation between two variables. It lies between -1 and 1: a value of 1 means perfect positive correlation and -1 means perfect correlation in the other direction. The statistic t = r sqrt((n - 2) / (1 - r^2)) follows a t-distribution with n - 2 degrees of freedom and can be used to obtain a p-value.
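As a sketch of the significance statistic above (the values of r and n here are made up for illustration):

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation r computed
    from n paired samples differs from zero; under the null it follows a
    t-distribution with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# hypothetical example: r = 0.8 observed on n = 12 paired samples
t = t_statistic(0.8, 12)
print(round(t, 3))  # 4.216; a large |t| relative to t(10) indicates significance
```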

Pearson correlation coefficient (figure from Wikipedia)

Basic stats in Python
Mean and variance: define a list and compute its mean and variance
Correlation: define two lists and compute their correlation
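A minimal pure-Python sketch of those two computations (the lists are made-up examples):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # population variance; divide by len(xs) - 1 for the sample variance
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pearson(xs, ys):
    # Correlation(X,Y) = Covariance(X,Y) / (sigma_X * sigma_Y)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (math.sqrt(variance(xs)) * math.sqrt(variance(ys)))

# define a list and compute mean and variance
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), variance(data))  # 5.0 4.0

# define two lists and compute their correlation
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(pearson(x, y))  # ~1.0: perfectly correlated
```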

Statistical inference
P-value
Bayes rule
Posterior probability and likelihood
Bayesian decision theory
Bayesian inference under a Gaussian distribution
Chi-square test, Pearson correlation coefficient, t-test

P-values
What is a p-value? It is the probability, under a null distribution, of observing a value at least as extreme as your estimate. For example, if your estimate of the mean is 1 while under the null the mean is 0 and normally distributed, the p-value of your estimate is the area under the normal curve for all values at least as large as your estimate. A small p-value means the data are unlikely to have come from the null distribution, and thus the null distribution can be rejected. A large p-value supports the null distribution but may also support other distributions.

P-values from Gaussian distributions (figure courtesy of Wikipedia)

P-values from chi-square distributions (figure courtesy of Wikipedia)

Type 1 and type 2 errors (figure courtesy of Wikipedia)

Bayes rule
Fundamental to statistical inference
Conditional probability: P(M|D) = P(D|M) P(M) / P(D)
Posterior = (Likelihood * Prior) / Normalization
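A tiny numeric sketch of the rule. The two coins (P(H) = 0.5 and P(H) = 0.85), the uniform prior, and the observed data HHH are made-up illustration values:

```python
# likelihoods of observing HHH under each coin
lik_fair = 0.5 ** 3
lik_biased = 0.85 ** 3

# uniform prior over the two coins
prior = 0.5

# normalization: total probability of the data under both models
evidence = lik_fair * prior + lik_biased * prior

# posterior = likelihood * prior / normalization
post_biased = lik_biased * prior / evidence
print(round(post_biased, 3))  # 0.831: HHH shifts belief toward the biased coin
```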

Hypothesis testing
We can use Bayes rule to help make decisions. An outcome or action is described by a model; given two models we pick the one with the higher probability. Coin toss example: use the likelihood to determine which coin generated the tosses.

Likelihood example
Consider a set of coin tosses produced by a coin with P(H) = p (and P(T) = 1 - p). We are given some tosses (training data): HTHHHTHHHTHTH (9 heads, 4 tails).
Was the above sequence produced by a fair coin? What is the probability that a fair coin produced the above sequence of tosses?
What is the p-value of your sequence of tosses assuming the coin is fair? This is the same as asking for the probability that a fair coin generates 9 or more heads out of 13 tosses. Let's start with the probability of exactly 9 heads; solve it with R.
Was the above sequence more likely to be produced by biased coin 1 (p = 0.85) or biased coin 2 (p = 0.75)? Solution: calculate the likelihood (probability) of the data under each coin. Alternatively we can ask which coin maximizes the likelihood.
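The slide suggests R; the same computations can be sketched in Python (9 heads in 13 tosses, biased coins p = 0.85 and p = 0.75 as above):

```python
from math import comb

n, heads = 13, 9

# probability that a fair coin gives exactly 9 heads out of 13
p_exact = comb(n, heads) * 0.5 ** n

# p-value: probability of 9 or more heads under a fair coin
p_value = sum(comb(n, k) for k in range(heads, n + 1)) * 0.5 ** n

# likelihood of the specific sequence (9 H, 4 T) under each biased coin
lik1 = 0.85 ** heads * 0.15 ** (n - heads)  # biased coin 1
lik2 = 0.75 ** heads * 0.25 ** (n - heads)  # biased coin 2

print(round(p_exact, 4))  # 0.0873
print(round(p_value, 4))  # 0.1334
print(lik2 > lik1)        # True: coin 2 (p = 0.75) is more likely
```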

Chi-square test
Contingency table: we have two random variables:
Label (L): 0 or 1
Feature (F): categorical (here A or B)
Null hypothesis: the two variables are independent of each other (unrelated).
Under independence P(L,F) = P(L)P(F), where
P(L=0) = (c1 + c2)/n
P(F=A) = (c1 + c3)/n
Expected values: E(X1) = P(L=0) P(F=A) n, and similarly for the other cells.

                 Feature=A                   Feature=B
Label=0   Observed=c1, Expected=X1    Observed=c2, Expected=X2
Label=1   Observed=c3, Expected=X3    Observed=c4, Expected=X4

We can calculate the chi-square statistic for a given feature and the probability that it is independent of the label (using the p-value). We look up the chi-square value in the distribution with degrees of freedom = (rows - 1)*(cols - 1) to get the p-value. Features with very small p-values deviate significantly from the independence assumption and are therefore considered important.
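A sketch of the 2x2 computation with made-up observed counts. For a 2x2 table the degrees of freedom are (2-1)*(2-1) = 1, so the p-value equals the one-df chi-square tail, which can be written as erfc(sqrt(x/2)):

```python
import math

# made-up observed counts; rows are Label=0, Label=1, columns Feature=A, B
c1, c2, c3, c4 = 30, 10, 10, 30
n = c1 + c2 + c3 + c4

# expected counts under independence: row total * column total / n
rows = [c1 + c2, c3 + c4]
cols = [c1 + c3, c2 + c4]
observed = [[c1, c2], [c3, c4]]
expected = [[r * c / n for c in cols] for r in rows]

# chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

# p-value for df = 1
p_value = math.erfc(math.sqrt(chi2 / 2))
print(chi2, p_value)  # large chi2 and tiny p-value: reject independence
```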