
1 Lecture 15: Wrap-up of class

2 What we intended to do and what we have done: Topics: What is the biological problem at hand? Types of data: microarray, proteomic, RNA-seq, GWAS. Why and when does one use them?

3 Sources of variation led us to our next topic, statistical issues concerning: a. Normalization of data b. Stochastic versus systematic errors

4 Normalization: it is VERY important that we realize WHY we normalize data, as opposed to just HOW to normalize it. I am including background correction along with normalization here. The pros and cons of normalizing vs. not normalizing. What normalization is theoretically supposed to do and WHAT it actually does.
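To make "what normalizing actually does" concrete, here is a minimal sketch of quantile normalization, assuming each array (chip) is a plain list of intensities of equal length. The function name `quantile_normalize` is hypothetical, and ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list per chip).

    Each value is replaced by the mean, across all arrays, of the values
    that share its rank, so every array ends up with the same empirical
    distribution. Ties are broken by position (no tie averaging).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    # For each array, the indices that would sort it in ascending order
    orders = [sorted(range(n), key=lambda i, a=a: a[i]) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays
    reference = [
        sum(a[order[k]] for a, order in zip(arrays, orders)) / len(arrays)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # value at this rank -> reference value
        normalized.append(out)
    return normalized


# Three toy "chips" with four probes each; afterwards every chip has
# identical sorted values, which is exactly what the method forces.
chips = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
print(quantile_normalize(chips))
```

Note that the method forces identical distributions by construction, so any true global difference between conditions is removed along with the technical one; that is one of the "pros and cons" worth keeping in mind.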

5 Statistical topics: LOESS, quantile normalization, Tukey biweight, Wilcoxon signed-rank test
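Two of these topics can be sketched in a few lines. The one-step Tukey biweight below downweights points far from the median (scaled by the MAD); the constants `c = 5` and `epsilon = 1e-4` follow my recollection of the Affymetrix MAS 5.0 description and should be treated as assumptions. The Wilcoxon signed-rank sketch returns W+ with a two-sided normal-approximation p-value, without tie or continuity corrections. All function names are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight location estimate (c, epsilon assumed)."""
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + epsilon)
        # Bisquare weight: 0 for points beyond c*MAD from the median
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        num += w * v
        den += w
    return num / den


def wilcoxon_signed_rank(differences):
    """W+ and a two-sided normal-approximation p-value (no corrections)."""
    d = [v for v in differences if v != 0]  # zero differences are dropped
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


print(tukey_biweight([1, 2, 3, 4, 100]))  # the outlier 100 gets weight 0
print(wilcoxon_signed_rank([1.5, -0.5, 2.0, 3.0, -1.0, 0.0, 2.5]))
```

For small n an exact null distribution would be preferable to the normal approximation used here.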

6 Now come the QUESTIONS OF INTEREST: What are the genes that are different for the healthy versus diseased cells? – Gene discovery, differential expression. Is a specified group of genes all up-regulated in a specified condition? – Gene set differential expression. We did not get much time for this, but it can be included in clustering after DE.

7 Tests we talked about: For 2 conditions: pooled t test, Welch's t test, Wilcoxon rank-sum test, permutation test, bootstrap t test, empirical Bayes (EB) test
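A minimal sketch of three of the two-condition tests for a single gene, assuming `x` and `y` hold that gene's expression values under the two conditions. The pooled and Welch functions return the statistic and its degrees of freedom (Welch–Satterthwaite for Welch); they do not compute t-distribution p-values, since the Python standard library has no t CDF. The permutation test enumerates every relabeling exactly, which is only feasible for small samples; the bootstrap t and EB tests are not shown. Function names are hypothetical.

```python
import itertools
import math
import statistics


def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def permutation_pvalue(x, y):
    """Exact two-sided permutation p-value for |difference in means|."""
    pooled = list(x) + list(y)
    n, nx = len(pooled), len(x)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    extreme = total = 0
    for idx in itertools.combinations(range(n), nx):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        # Small tolerance so the observed labeling counts despite rounding
        if abs(statistics.mean(gx) - statistics.mean(gy)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


healthy = [5.1, 5.8, 6.0, 6.3]
diseased = [4.0, 4.2, 4.5, 4.9]
print(pooled_t(healthy, diseased), welch_t(healthy, diseased))
print(permutation_pvalue(healthy, diseased))
```

With equal group sizes the pooled and Welch statistics coincide and only the degrees of freedom differ; with unequal sizes and variances they can disagree noticeably. For larger samples, random relabelings would replace full enumeration.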

8 Announcement: I am totally voiceless today, so we will present as follows: – Andrew – Cameron – Lili – Huinan – Ben – Amit – Xin

9 Contd… – David – Chongjin – Jie – Jeff – Miaoru – Jillian – Jeff

10 Tests contd. For multiple conditions: ANOVA F test, Kruskal-Wallis test, empirical Bayes (EB) test
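For more than two conditions, here is a sketch of the one-way ANOVA F statistic and the Kruskal-Wallis H statistic for one gene, assuming `groups` is a list of lists (one list of values per condition). Only the statistics are computed (F is referred to an F(k-1, n-k) distribution, H approximately to chi-square with k-1 df), and H has no tie correction. Function names are hypothetical.

```python
import statistics


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anova_f(groups):
    """One-way ANOVA F statistic and its (numerator, denominator) df."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H on ranks of the pooled data (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


conditions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(anova_f(conditions), kruskal_wallis_h(conditions))
```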

11 Multiplicity: the question of multiplicity adjustment: family-wise error (FWE), per-comparison error (PCE) or false discovery rate (FDR)? Bonferroni corrections, false discovery rates (FDR), sequential Bonferroni (the Holm adjustment), bootstrapping and permutation adjustments
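A sketch of three of these adjustments, written to return adjusted p-values (so a gene is called significant when its adjusted p-value is at most the chosen level). Bonferroni and Holm control the FWE; Benjamini-Hochberg controls the FDR under independence or positive dependence. The bootstrap and permutation adjustments are not shown, and the function names are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests m (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down (sequential Bonferroni) adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m-1, and so on;
        # the running max keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up FDR-adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))
print(holm(p))
print(benjamini_hochberg(p))
```

For the same raw p-values, Bonferroni is the most conservative, Holm is uniformly at least as powerful while still controlling the FWE, and BH is the most permissive because it controls a different error rate.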

12 Class discovery, clustering: To do clustering we need a distance metric and a linkage method. We can have hierarchical or non-hierarchical clustering. Non-hierarchical clustering: partitioning methods (need to know the number of clusters). Hierarchical clustering: produces trees (tree diagrams, or dendrograms).
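As an example of a partitioning method, and of why the number of clusters must be fixed in advance, here is a sketch of k-means (Lloyd's algorithm) with Euclidean distance. It assumes points are tuples of numbers and that the caller supplies the k initial centroids; real analyses usually try several random starts. The name `kmeans` is hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm; k is fixed by the number of initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


genes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(genes, initial_centroids=[(0, 0), (10, 10)]))
```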

13 Distance and linkages: Distance: Euclidean, Manhattan, Mahalanobis, correlation. Linkages: complete, single, centroid, average
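A sketch of how a distance and a linkage combine in agglomerative (hierarchical) clustering. It covers Euclidean, Manhattan and correlation distance (1 minus Pearson r) and single, complete and average linkage; Mahalanobis distance and centroid linkage are omitted for brevity. The function returns the merge history (merged members and merge height), which is what a dendrogram draws. This naive version is O(n^3) or worse and is meant only to show the idea; all names are hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def correlation_distance(a, b):
    """1 - Pearson correlation: small when profiles rise and fall together."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1 - cov / (sa * sb)


def agglomerate(points, dist, linkage="average"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[i] for i in range(len(points))]
    merges = []

    def cluster_distance(c1, c2):
        d = [dist(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(d)  # closest pair
        if linkage == "complete":
            return max(d)  # farthest pair
        if linkage == "average":
            return sum(d) / len(d)  # mean over all pairs
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


pts = [(0,), (1,), (5,), (6,), (20,)]
print(agglomerate(pts, euclidean, "single"))
print(agglomerate(pts, euclidean, "complete"))
```

On the same data, single and complete linkage give the same tree shape here but different merge heights for the last merge (closest versus farthest pair), which changes where one would cut the dendrogram.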

14 Class prediction, classification Are there tumour sub-types not previously identified? Do my genes group into previously undiscovered pathways?

15 LDA: Feature selection: gene filtering – Differential expression – PCA – Penalized least squares. Choosing the rules – Parametric ones: likelihood, linear discriminant rule, Mahalanobis rule, posterior probability rule, the general classification rule (using cost of misclassification and priors)
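A sketch of the two-class linear discriminant rule, assuming both classes are Gaussian with a common covariance matrix estimated by pooling. With w = S⁻¹(μ₁ − μ₂), an observation x goes to class 1 when wᵀ(x − (μ₁ + μ₂)/2) exceeds log[c(1|2)·π₂ / (c(2|1)·π₁)], where c(1|2) is the cost of assigning a class-2 case to class 1. Equal priors and costs give the plain linear discriminant rule; unequal ones give the general rule from the slide. Feature selection is assumed to have already happened, and all names (including the cost parameters) are hypothetical.

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(class1, class2):
    """Pooled within-class covariance matrix (divisor n1 + n2 - 2)."""
    p = len(class1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (class1, class2):
        mu = mean_vector(rows)
        for r in rows:
            d = [x - m for x, m in zip(r, mu)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(class1) + len(class2) - 2
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def lda_rule(class1, class2, prior1=0.5, prior2=0.5,
             cost_1_given_2=1.0, cost_2_given_1=1.0):
    """Return a classifier implementing the two-class Gaussian LDA rule."""
    mu1, mu2 = mean_vector(class1), mean_vector(class2)
    # w = S^{-1} (mu1 - mu2), computed by solving S w = mu1 - mu2
    w = solve(pooled_covariance(class1, class2), [a - b for a, b in zip(mu1, mu2)])
    midpoint = [(a + b) / 2 for a, b in zip(mu1, mu2)]
    threshold = math.log((cost_1_given_2 * prior2) / (cost_2_given_1 * prior1))

    def classify(x):
        score = sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, midpoint))
        return 1 if score > threshold else 2

    return classify


rule = lda_rule([(1, 2), (2, 3), (3, 3), (2, 1)], [(6, 7), (7, 8), (8, 8), (7, 6)])
print(rule((2, 2)), rule((7, 7)))
```

With thousands of genes and few arrays the pooled covariance is singular, which is one reason the feature-selection step (or a diagonal approximation) comes first.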

16 Misclassifications – Non-parametric ones: k-NN. Estimating misclassification rates – Resubstitution – Hold-out samples – Cross-validation/jackknife
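A sketch of k-NN (Euclidean distance, majority vote) together with two of the error-rate estimates. Resubstitution reuses the training data to score the classifier, so it is optimistic: with k = 1 every point finds itself and the error is zero. Leave-one-out cross-validation holds each sample out in turn. Vote ties are broken arbitrarily by `Counter.most_common`, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def resubstitution_error(xs, ys, k=3):
    """Error when the training set itself is classified (optimistic)."""
    wrong = sum(knn_predict(xs, ys, x, k) != y for x, y in zip(xs, ys))
    return wrong / len(xs)


def loocv_error(xs, ys, k=3):
    """Leave-one-out cross-validation estimate of the misclassification rate."""
    wrong = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        wrong += knn_predict(train_x, train_y, xs[i], k) != ys[i]
    return wrong / len(xs)


xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
ys = ["a", "a", "a", "b", "b", "b", "b"]  # the last point is a mislabeled-looking "b"
print(resubstitution_error(xs, ys, k=1), loocv_error(xs, ys, k=1))
```

The gap between the two estimates on this toy data (no errors versus several) is the usual argument for hold-out or cross-validated estimates.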

17 This is just the beginning of this journey. Remember, you still have loads to learn. You have to keep reading and be willing to incorporate new ideas. Thanks a bunch for sharing this journey with me!

