Author(s): Alex Ocampo, Chong Zhang, Sangwon Hyun, Yiqun Hu, 2012 License: Unless otherwise noted, this material is made available under the terms of the.

Presentation transcript:

Author(s): Alex Ocampo, Chong Zhang, Sangwon Hyun, Yiqun Hu, 2012
License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution – Noncommercial – Share Alike 3.0 License:
We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material.
Copyright holders of content included in this material should contact with any questions, corrections, or clarification regarding the use of content.
For more information about how to cite these materials visit

Attribution Key
for more information see:

Use + Share + Adapt
Make Your Own Assessment

Creative Commons – Attribution License
Creative Commons – Attribution Share Alike License
Creative Commons – Attribution Noncommercial License
Creative Commons – Attribution Noncommercial Share Alike License
GNU – Free Documentation License
Creative Commons – Zero Waiver
Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ
Public Domain – Expired: Works that are no longer protected due to an expired copyright term.
Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105)
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ
Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use this content you should do your own independent analysis to determine whether or not your use will be Fair.

{ Content the copyright holder, author, or law permits you to use, share and adapt. }
{ Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }
{ Content Open.Michigan has used under a Fair Use determination. }

Descriptive Statistics
Descriptive statistics quantitatively describe the main features of a collection of data.
Employee: "My salary is $45,000. It's a middle salary in my company."
Staff. Jones: "Benefits are highly related to working age."
HR manager: "How do salaries vary across the company?"
"What should I make of all this???!!!"

Descriptive Statistics in R
Mean: > mean(x); > mean(x, trim=a)
Median: > median(x)
Mode: > sort(table(x))
Standard deviation: > sd(x)
Variance: > var(x)
Median absolute deviation: > mad(c(x))
Interquartile range: > IQR(x)
Range: > range(x)
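As a minimal sketch of the functions listed above, the following runs each of them on a small made-up salary vector. The values in `x` are hypothetical (in thousands of dollars) and are not taken from the slides; they are chosen so that one large value pulls the mean away from the median.

```r
# Hypothetical salaries in thousands of dollars (illustration only)
x <- c(30, 35, 35, 40, 45, 45, 45, 50, 60, 120)

mean(x)              # arithmetic mean, pulled upward by the 120
mean(x, trim = 0.1)  # drops the lowest and highest 10% before averaging
median(x)            # middle value of the sorted data
sort(table(x))       # frequency table sorted by count; the mode is the last entry
sd(x)                # sample standard deviation
var(x)               # sample variance, equal to sd(x)^2
mad(x)               # median absolute deviation (scaled by 1.4826 by default)
IQR(x)               # interquartile range: 3rd quartile minus 1st quartile
range(x)             # smallest and largest value
```

On data like this the trimmed mean and the median sit close together while the plain mean is noticeably higher, which is one reason to look at more than one summary.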

Data Dimensions
Matrix X ….
> length(x)
[1]
> nrow(X)
[1] 2030
> ncol(X)
[1]
> dim(X)
[1]
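The dimension functions can be tried on simulated data. In this sketch the row count of 2030 matches the slide, but the column count of 3 and the random contents are assumptions made only for illustration.

```r
# Hypothetical data: a short vector and a 2030 x 3 matrix of random numbers.
# Only the row count (2030) comes from the slide; the rest is assumed.
x <- 1:10
X <- matrix(rnorm(2030 * 3), nrow = 2030, ncol = 3)

length(x)  # number of elements in the vector
nrow(X)    # number of rows
ncol(X)    # number of columns
dim(X)     # rows and columns together
length(X)  # for a matrix, length() counts all cells: nrow * ncol
```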

Vectorization in R
Matrix X
> apply(X, MARGIN=1, FUN=mean)
> apply(X, MARGIN=2, FUN=mean)
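A small worked example may help show what the `MARGIN` argument does. The 2 x 3 matrix below is made up for illustration: `MARGIN = 1` applies the function to each row, `MARGIN = 2` to each column.

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
X <- matrix(1:6, nrow = 2)

row_means <- apply(X, MARGIN = 1, FUN = mean)  # one value per row
col_means <- apply(X, MARGIN = 2, FUN = mean)  # one value per column

row_means
col_means

# For means specifically, base R also provides rowMeans() and colMeans()
rowMeans(X)
colMeans(X)
```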

> boxplot(X)
- Good for small data sets
- Easy to compare groups side by side
- 1.5*IQR defines outlier
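The 1.5*IQR rule can be checked by hand. This sketch uses a made-up vector and compares a manual calculation with `boxplot.stats()`, the helper that `boxplot()` relies on. Note that the box edges are the "hinges" returned by `fivenum()`, which can differ slightly from the quartiles returned by `quantile()`.

```r
# Hypothetical data with one unusually large value
x <- c(30, 35, 35, 40, 45, 45, 45, 50, 60, 120)

hinges <- fivenum(x)[c(2, 4)]      # lower and upper hinge (box edges)
spread <- diff(hinges)             # hinge spread, the IQR used by the boxplot
lower  <- hinges[1] - 1.5 * spread # lower fence
upper  <- hinges[2] + 1.5 * spread # upper fence

manual_out <- x[x < lower | x > upper]  # points beyond the fences
auto_out   <- boxplot.stats(x)$out      # what boxplot() would flag

manual_out
auto_out
```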

The Big Six
Minimum, 1st Q, Median, Mean, 3rd Q, Maximum
> summary(X)
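For a numeric vector, `summary()` returns exactly these six numbers. A short sketch on made-up data:

```r
# Hypothetical data (illustration only)
x <- c(30, 35, 35, 40, 45, 45, 45, 50, 60, 120)

s <- summary(x)
s          # prints the six summary values
names(s)   # the labels R uses for the six values
```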

R tries to understand you > summary(X)
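One reading of this slide (an interpretation, since the slide itself shows only the call) is that `summary()` adapts its output to the type of object it receives. The data frame below is hypothetical.

```r
# Hypothetical data frame with one numeric and one categorical column
df <- data.frame(
  salary = c(30, 45, 60, 120),
  dept   = factor(c("HR", "HR", "IT", "IT"))
)

summary(df$salary)  # numeric column: six-number summary
summary(df$dept)    # factor column: counts per level
summary(df)         # data frame: an appropriate summary for each column
```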

Histograms: > hist(X)
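Besides drawing the plot, `hist()` returns the bin boundaries and counts it computed. The sketch below uses the built-in `mtcars` data as a stand-in, since the slide does not say what `X` contains.

```r
# Stand-in data: miles per gallon from the built-in mtcars data set
x <- mtcars$mpg

h <- hist(x, plot = FALSE)  # compute the bins without drawing
h$breaks                    # bin boundaries
h$counts                    # number of observations in each bin

# Draw the histogram only when running interactively
if (interactive()) hist(x, main = "Histogram of mpg", xlab = "mpg")
```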

Correlation
> cor(wt,mpg)
[1]
> plot(x=wt,y=mpg)
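The variable names `wt` and `mpg` match columns of R's built-in `mtcars` data set, so this sketch assumes that is where they come from (the slide does not say). It also recomputes the Pearson correlation from its definition, covariance divided by the product of the standard deviations.

```r
# Assumption: wt and mpg are the mtcars columns of the same names
wt  <- mtcars$wt   # car weight (1000 lbs)
mpg <- mtcars$mpg  # miles per gallon

r        <- cor(wt, mpg)                       # Pearson correlation
r_manual <- cov(wt, mpg) / (sd(wt) * sd(mpg))  # same value from the definition

r
r_manual

# Draw the scatterplot only when running interactively
if (interactive()) plot(x = wt, y = mpg)
```

A strongly negative value here says that heavier cars tend to have lower fuel economy; it does not by itself say the relationship is linear, which is why the scatterplot is worth drawing as well.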

Scatterplot Matrix
Iris dataset: 150 flowers, 5 variables
(Image: Goingslo, flickr)

Scatterplot Matrix > pairs(data)
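A sketch of the scatterplot matrix on the built-in `iris` data, assuming `data` on the slide refers to that data set. The four numeric columns are plotted against each other; the correlation matrix is printed as a numeric companion to the picture.

```r
# Assumption: "data" is the built-in iris data set (150 rows, 5 columns)
dim(iris)

data <- iris[, 1:4]     # the four numeric measurements
round(cor(data), 2)     # pairwise correlations, one per panel of the plot

# Draw the scatterplot matrix only when running interactively,
# colouring points by species
if (interactive()) pairs(data, col = iris$Species)
```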

> coplot(lat ~ long | depth)
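The variables `lat`, `long` and `depth` match the built-in `quakes` data set, so this sketch assumes that is the source. A conditioning plot draws `lat` against `long` separately for overlapping ranges of `depth`; `co.intervals()` shows the ranges that would be used.

```r
# Assumption: the variables come from the built-in quakes data set
str(quakes[, c("lat", "long", "depth")])

# Depth ranges used for conditioning: 6 intervals with 50% overlap
# (these are also the defaults coplot() uses)
depth_bins <- co.intervals(quakes$depth, number = 6, overlap = 0.5)
depth_bins

# Draw the conditioning plot only when running interactively
if (interactive()) coplot(lat ~ long | depth, data = quakes)
```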

Linear Regression
Why? What?
- Prediction of future or unknown observations
- Assessment of relationship between variables
- General description of data structure

Variable Selection
Why?
- Simplification
- Elimination of multicollinearity and noise
- Time and money saving
How?
- Testing-based Variable Selection Methods: Backward, Forward, Stepwise
- Criterion-based Procedures
What?
- AIC = n ln(RSS/n) + 2(p)
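The AIC formula above can be evaluated by hand and compared with R's `extractAIC()`, which is the quantity `step()` reports. This is a sketch under two assumptions: the model and data (`mtcars`) are stand-ins chosen for illustration, and p is taken to count all estimated coefficients including the intercept.

```r
# Stand-in model on built-in data (not from the slides)
g <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)       # number of observations
rss <- sum(resid(g)^2)    # residual sum of squares
p   <- length(coef(g))    # coefficients, including the intercept (assumption)

aic_manual <- n * log(rss / n) + 2 * p  # AIC = n ln(RSS/n) + 2p
aic_r      <- extractAIC(g)[2]          # value used by step()

aic_manual
aic_r
```

The general-purpose `AIC()` function returns a different number for the same model because it keeps additive constants from the likelihood; the difference does not change which model is ranked best.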

Example: U.S. State Facts and Figures
Response: Life Expectancy
Predictors: Population, Income, Illiteracy, Murder, HS Grad, Frost, Area

Selected R code
- Linear Regression
- AIC
> g <- lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area, data = statedata)
> summary(g)
> step(g)
> anova(g)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.094e e < 2e-16 ***
Population 5.180e e
Income e e
Illiteracy 3.382e e
Murder e e e-08 ***
HS.Grad 4.893e e *
Frost e e
Area e e

Analysis of Variance Table
Response: Life.Exp
Df Sum Sq Mean Sq F value Pr(>F)
Population
Income e-05 ***
Illiteracy e-07 ***
Murder e-08 ***
HS.Grad **
Frost
Area
Residuals

AIC = n ln(RSS/n) + 2(p)
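The model on this slide can be reproduced with data that ships with R. The sketch assumes `statedata` was built from the built-in `state.x77` matrix; the slide does not show that step. Converting with `data.frame()` turns the column names "Life Exp" and "HS Grad" into `Life.Exp` and `HS.Grad`.

```r
# Assumption: statedata is the built-in state.x77 matrix as a data frame
statedata <- data.frame(state.x77)

# Full model: life expectancy on all seven other variables
g <- lm(Life.Exp ~ Population + Income + Illiteracy + Murder +
          HS.Grad + Frost + Area, data = statedata)

summary(g)$coefficients  # estimates, standard errors, t values, p-values
anova(g)                 # sequential analysis of variance table
```

The p-values in `anova(g)` are sequential: each term is tested after the terms listed before it, so they depend on the order of the predictors and need not agree with the t-tests in `summary(g)`.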

Continued: U.S. State Facts and Figures

Start: AIC=
Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area
Df Sum of Sq RSS AIC
- Area
- Income
- Illiteracy
<none>
- Population
- Frost
- HS.Grad
- Murder

Step: AIC=
Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost
Df Sum of Sq RSS AIC
- Illiteracy
- Income
<none>
- Population
- Frost
- HS.Grad
- Murder

Step: AIC=
Life.Exp ~ Population + Murder + HS.Grad + Frost
Df Sum of Sq RSS AIC
<none>
- Population
- Frost
- HS.Grad
- Murder

Coefficients:
(Intercept) Population Murder HS.Grad Frost
7.103e e e e e-03

Effect on Response Variable of One Unit Change of Predictor Variable
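The stepwise search shown on these slides can be rerun as follows. As before, this sketch assumes `statedata` comes from the built-in `state.x77` matrix. With no `scope` argument, `step()` works backward from the full model, dropping at each step the term whose removal lowers AIC the most and stopping when no removal helps.

```r
# Assumption: statedata is the built-in state.x77 matrix as a data frame
statedata <- data.frame(state.x77)

g <- lm(Life.Exp ~ Population + Income + Illiteracy + Murder +
          HS.Grad + Frost + Area, data = statedata)

# trace = 0 suppresses the step-by-step tables; omit it to see them
final <- step(g, trace = 0)

formula(final)  # the model that remains after backward elimination
coef(final)     # each coefficient is the change in life expectancy
                # per one-unit change in that predictor, others held fixed
```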

What is Principal Component Analysis (PCA)?
Two general approaches to reducing the number of variables: feature selection and feature extraction
- Feature selection: "Akaike Information Criterion" (AIC), BIC or Back-Substitution
- Feature extraction: "Principal Component Analysis" (PCA) is the most widely used
- Creates several artificial variables
- Built-in functions in R = Convenient!

Actual Pima Data
pregnant glucose diastolic triceps insulin bmi diabetes age test
….
(Imagine a data set with many more (~1000) columns)
(Imagine a Linear Regression: Which variables affect diabetes in what ways?)

PCA Example: Pima Indians
The National Institute of Diabetes and Digestive and Kidney Diseases conducted a study on 768 adult female Pima Indians living near Phoenix.
9 Variables (8 continuous, 1 categorical)
- pregnant: Number of times pregnant
- glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- diastolic: Diastolic blood pressure (mm Hg)
- triceps: Triceps skin fold thickness (mm)
- insulin: 2-Hour serum insulin (mu U/ml)
- bmi: Body mass index (weight in kg/(height in metres squared))
- diabetes: Diabetes pedigree function
- age: Age (years)
- test: diabetes (coded 0 if negative, 1 if positive)
Next Slide: PCA Implementation

What principal components might look like:
PC1 : 1*Insulin *Glucose +..
PC2 : 1*Glucose *Age *DiastolicBP +..
PC3 : 0.92 * DiastolicBP *Triceps
- Principal components: What are they composed of? (less important)
- Difference with Linear Regression
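To make "composed of" concrete: each principal component score is a weighted sum of the centered original variables, with the weights given by one column of the rotation matrix. The sketch below checks this on the built-in `iris` measurements, used here only as a stand-in because the Pima data is not part of base R.

```r
# Stand-in data: the four numeric iris columns (not the Pima data)
data <- as.matrix(iris[, 1:4])
pca  <- prcomp(data)

pca$rotation[, 1]  # weights that define PC1 as a combination of the variables

# Rebuild all component scores by hand:
# center each column, then multiply by the weight matrix
centered      <- sweep(data, 2, pca$center)
scores_manual <- centered %*% pca$rotation

# Compare with the scores prcomp() computed
all.equal(unname(scores_manual), unname(pca$x))
```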

- Goal: obtain a summary of the data in lower dimensions
- How many dimensions?
- R code in the next slide:

Brief: R-Code
> data.pca <- prcomp(data[,-9]); summary(data.pca);
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation
Proportion of Variance
Cumulative Proportion

> data.pca
Rotation:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
pregnant e e+00
glucose e e-04
Diastolic e e-03
triceps e e-04
insulin e e-03
bmi e e-03
age e e-01

> barplot(totalrep, main="Representation of Principal Components", xlab="Principal Component", ylab="% of Total Variance")
> biplot(data.pca, xlabs=rep('+',768), xlim = c(-0.05,0.3), ylim = c(-0.15,0.12)); abline(h=0,v=0);
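The same workflow can be sketched end to end on data that ships with R. Two things here are assumptions: the built-in `iris` measurements stand in for the Pima data, and `totalrep`, which the slide uses but never defines, is taken to be the percentage of total variance explained by each component.

```r
# Stand-in data: the four numeric iris columns (not the Pima data)
data <- iris[, 1:4]

data.pca <- prcomp(data)
summary(data.pca)   # standard deviation, proportion and cumulative proportion
data.pca$rotation   # weights of each variable in each component

# Assumed definition of totalrep: percent of total variance per component
totalrep <- 100 * data.pca$sdev^2 / sum(data.pca$sdev^2)
names(totalrep) <- colnames(data.pca$rotation)
round(totalrep, 1)

# Draw the plots only when running interactively
if (interactive()) {
  barplot(totalrep, main = "Representation of Principal Components",
          xlab = "Principal Component", ylab = "% of Total Variance")
  biplot(data.pca, xlabs = rep("+", nrow(data)))
  abline(h = 0, v = 0)
}
```

A common way to answer "how many dimensions?" is to keep the first few components whose cumulative share of variance is high enough for the task at hand. When variables are measured on very different scales, as in the Pima data, `prcomp(..., scale. = TRUE)` is often worth considering so that one large-valued variable does not dominate the first component.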