Published by Calvin Richard, modified over 8 years ago
1
Lecture 15: Wrap-up of the class
2
What we intended to do and what we have done. Topics:
– What is the biological problem at hand?
– Types of data: microarray, proteomic, RNA-seq, GWAS. Why and when does one use each of them?
3
Sources of variation led us to our next topic, statistical issues concerning:
a. Normalization of data
b. Stochastic versus systematic error
4
Normalization: it is VERY important that we understand WHY we normalize data, as opposed to just HOW to normalize it. I am including background correction along with normalization here.
– The pros and cons of normalizing versus not normalizing.
– What normalization is theoretically supposed to do, and WHAT it actually does.
5
Statistical topics:
– LOESS
– Quantile normalization
– Tukey biweight
– Wilcoxon signed-rank test
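A minimal pure-Python sketch of two of the methods above: quantile normalization and a one-step Tukey biweight average. It assumes each array is a list of (already background-corrected, log-scale) intensities with no missing values, and it breaks ties in quantile normalization by position instead of averaging their quantiles. The biweight defaults c = 5 and ε = 0.0001 are the values usually quoted for Affymetrix MAS 5.0 summarization; treat them, and the function names, as assumptions rather than the course's own code.

```python
import statistics


def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of intensities per array).

    Simplifications (assumptions): no missing values, and tied values are
    ranked by position instead of receiving the average of their quantiles.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the r-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        # order[r] is the index of the r-th smallest value in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for r, i in enumerate(order):
            out[i] = reference[r]  # same rank, shared reference value
        normalized.append(out)
    return normalized


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean: points far from the median get little or no weight."""
    m = statistics.median(values)
    mad = statistics.median([abs(x - m) for x in values])  # median absolute deviation
    weights = []
    for x in values:
        u = (x - m) / (c * mad + eps)  # scaled distance from the median
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)
```

For example, `quantile_normalize([[5, 2, 3, 4], [4, 1, 3, 2], [3, 4, 6, 8]])` gives every array the same set of values {2, 3, 13/3, 17/3}, each placed at that array's own ranks, while `tukey_biweight([1.0, 1.2, 0.9, 1.1, 10.0])` gives the outlier 10.0 zero weight and returns a value close to 1.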
6
Now come the QUESTIONS OF INTEREST:
– Which genes differ between healthy and diseased cells? (gene discovery, differential expression)
– Is a specified group of genes all up-regulated in a specified condition? (gene set differential expression) We did not get much time for this, but it can be included in clustering after DE.
7
Tests we talked about for 2 conditions:
– Pooled t test
– Welch’s t test
– Wilcoxon rank-sum test
– Permutation test
– Bootstrap t test
– EB (empirical Bayes) test
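To make two of these tests concrete for a single gene, here is a hedged pure-Python sketch of Welch's t statistic (with Welch-Satterthwaite degrees of freedom) and a two-sided Monte Carlo permutation test on the difference in means. The function names, the 10,000 permutations and the add-one correction are illustrative choices, not the course's code; turning t into a p-value would additionally need the t distribution (for example from SciPy).

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples under H0
        diff = abs(statistics.mean(pooled[:len(x)]) - statistics.mean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_perm + 1)
```

With the made-up values healthy = [2.1, 2.5, 2.3, 2.8, 2.4] and diseased = [3.9, 4.2, 3.6, 4.4, 4.0], `welch_t` gives t ≈ -8.97 on about 7.8 degrees of freedom. Only 2 of the 252 ways of splitting the ten values into two groups of five are as extreme, so the permutation p-value lands close to 2/252 ≈ 0.008.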
8
Announcement: I am totally voiceless today, so we will present as follows:
– Andrew
– Cameron
– Lili
– Huinan
– Ben
– Amit
– Xin
9
Contd.:
– David
– Chongjin
– Jie
– Jeff
– Miaoru
– Jillian
– Jeff
10
Tests contd., for multiple conditions:
– ANOVA F test
– Kruskal-Wallis test
– EB (empirical Bayes) test
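A minimal sketch of the two classical statistics above for one gene measured under several conditions: the one-way ANOVA F ratio and the Kruskal-Wallis H statistic computed on pooled ranks. It assumes each condition is a list of expression values; ties receive average ranks, but the usual tie correction to H is omitted, and the p-values (from the F and chi-square reference distributions) are left out. Function names are hypothetical.

```python
import statistics


def anova_f(groups):
    """One-way ANOVA F: between-condition mean square over within-condition mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from pooled ranks (average ranks for ties, no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    rank_sums = [sum(rank_of[x] for x in g) for g in groups]
    total = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```

For the groups [[1, 2, 3], [4, 5, 6], [7, 8, 9]] this gives F = 27 and H = 7.2.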
11
Multiplicity: which error rate should the multiplicity adjustment control, FWE (family-wise error), PCE (per-comparison error) or FDR (false discovery rate)?
– Bonferroni correction
– False discovery rate (FDR) control
– Sequential Bonferroni: the Holm adjustment
– Bootstrap and permutation adjustments
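A short pure-Python sketch of three of these adjustments, returning adjusted p-values in the original order: Bonferroni and Holm (both control the FWE) and Benjamini-Hochberg (controls the FDR under independence or positive dependence). It is meant to mirror what R's `p.adjust` computes for the methods "bonferroni", "holm" and "BH", but the function names here are hypothetical.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests m (capped at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Sequential (step-down) Bonferroni: the r-th smallest p-value is multiplied by m - r + 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Enforce monotonicity: an adjusted p-value never drops below earlier ones.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: p * m / rank, with a cumulative minimum from the largest p."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For p = [0.01, 0.04, 0.03, 0.005] the three functions give [0.04, 0.16, 0.12, 0.02], [0.03, 0.06, 0.06, 0.02] and [0.02, 0.04, 0.04, 0.02] respectively (up to floating-point rounding).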
12
Class discovery, clustering: to do clustering we need a distance metric and, for hierarchical methods, a linkage method. Clustering can be hierarchical or non-hierarchical.
– Non-hierarchical clustering: partitioning methods (the number of clusters must be known in advance)
– Hierarchical clustering: produces a tree diagram (dendrogram)
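As a concrete instance of a partitioning method, here is a minimal k-means (Lloyd's algorithm) sketch in pure Python. The number of clusters k must be supplied up front, which is exactly the "need to know the number of clusters" point above. The Euclidean distance and the deterministic maximin start (begin at the first point, then repeatedly add the point farthest from the current centers) are assumptions made to keep the example reproducible; in practice k-means is often run from several random starts.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic maximin start; returns (labels, centers)."""
    # Maximin initialisation: first point, then the point farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:  # assignments have stabilised
            break
        centers = new_centers
    return labels, centers
```

On two well-separated groups such as [(1, 1), (1.2, 0.8), (0.9, 1.1), (5, 5), (5.2, 4.9), (4.8, 5.1)] with k = 2, the first three points end up in one cluster and the last three in the other.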
13
Distances and linkages:
– Distances: Euclidean, Manhattan, Mahalanobis, correlation
– Linkages: complete, single, centroid, average
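A hedged sketch of three of the distances above and of the four linkages, each of which turns point-to-point distances into a cluster-to-cluster distance. Mahalanobis distance is left out because it also needs an inverse covariance matrix. Correlation distance is taken here as 1 minus the Pearson correlation, which is one common convention among several. Function names are hypothetical.

```python
import math
import statistics


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def correlation_distance(x, y):
    """1 - Pearson correlation: profiles with the same shape are close, whatever their scale."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1 - sxy / math.sqrt(sxx * syy)


def linkage(cluster_a, cluster_b, method="average", dist=euclidean):
    """Distance between two clusters (lists of points) under the given linkage."""
    if method == "centroid":
        ca = [statistics.mean(col) for col in zip(*cluster_a)]
        cb = [statistics.mean(col) for col in zip(*cluster_b)]
        return dist(ca, cb)
    pairwise = [dist(a, b) for a in cluster_a for b in cluster_b]
    if method == "single":
        return min(pairwise)  # closest pair
    if method == "complete":
        return max(pairwise)  # farthest pair
    if method == "average":
        return sum(pairwise) / len(pairwise)  # mean over all pairs
    raise ValueError(f"unknown linkage: {method}")
```

For clusters A = [(0, 0), (0, 1)] and B = [(3, 0), (4, 0)], single linkage gives 3, complete linkage sqrt(17) ≈ 4.12, average linkage ≈ 3.57 and centroid linkage sqrt(12.5) ≈ 3.54.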
14
Class prediction, classification:
– Are there tumour subtypes not previously identified?
– Do my genes group into previously undiscovered pathways?
15
LDA
Feature selection (gene filtering):
– Differential expression
– PCA
– Penalized least squares
Choosing the rules, parametric ones:
– Likelihood rule
– Linear discriminant rule
– Mahalanobis rule
– Posterior probability rule
– The general classification rule (using misclassification costs and priors)
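To illustrate the general classification rule, here is a minimal sketch for two classes whose single feature is assumed normal with a common, known standard deviation (the one-dimensional case of the linear discriminant rule). An observation x goes to class 1 when f1(x)/f2(x) ≥ [c(1|2)/c(2|1)]·(p2/p1), where p1 and p2 are the priors and c(i|j) is the cost of assigning to class i an item that really belongs to class j; with equal priors and costs the boundary is the midpoint of the two means. The normality, the single feature and all names are assumptions for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def classify(x, mean1, mean2, sd, prior1=0.5, cost_1_given_2=1.0, cost_2_given_1=1.0):
    """Return 1 or 2 using the minimum-expected-cost rule for two normal classes.

    cost_1_given_2: cost of assigning to class 1 an item that is really class 2.
    cost_2_given_1: cost of assigning to class 2 an item that is really class 1.
    """
    prior2 = 1.0 - prior1
    ratio = normal_pdf(x, mean1, sd) / normal_pdf(x, mean2, sd)  # likelihood ratio
    threshold = (cost_1_given_2 / cost_2_given_1) * (prior2 / prior1)
    return 1 if ratio >= threshold else 2


def posterior_class1(x, mean1, mean2, sd, prior1=0.5):
    """Posterior probability of class 1 (the posterior probability rule picks the larger one)."""
    f1 = prior1 * normal_pdf(x, mean1, sd)
    f2 = (1 - prior1) * normal_pdf(x, mean2, sd)
    return f1 / (f1 + f2)
```

With means 0 and 2 and sd 1, x = 1.1 is assigned to class 2 under equal costs, but to class 1 once `cost_2_given_1 = 5`, because the likelihood ratio e^(-0.2) ≈ 0.82 then exceeds the threshold 0.2. `posterior_class1` is exactly 0.5 at the midpoint x = 1.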
16
Misclassification
Non-parametric rules:
– k-NN
Estimating misclassification rates:
– Resubstitution
– Hold-out samples
– Cross-validation / jackknife
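k-NN and the misclassification-rate estimates above fit into one small sketch: a k-NN classifier (Euclidean distance, majority vote) together with the resubstitution and leave-one-out cross-validation estimates of its error rate. It is pure Python and illustrative only; function names are hypothetical, and vote ties are broken in favour of the class first encountered among the nearest neighbours, which is an arbitrary choice.

```python
import math
from collections import Counter


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def resubstitution_error(xs, ys, k=3):
    """Error rate when the same samples are used for training and testing (optimistic)."""
    wrong = sum(knn_predict(xs, ys, x, k) != y for x, y in zip(xs, ys))
    return wrong / len(xs)


def loocv_error(xs, ys, k=3):
    """Leave-one-out cross-validation: predict each sample from all the others."""
    wrong = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        wrong += knn_predict(train_x, train_y, xs[i], k) != ys[i]
    return wrong / len(xs)
```

On the toy one-feature data xs = [(1.0,), (1.2,), (1.4,), (3.0,), (3.2,), (1.3,)] with labels ys = ['A', 'A', 'A', 'B', 'B', 'B'], 1-NN has a resubstitution error of 0 (each point is its own nearest neighbour) but a leave-one-out error of 0.5, which shows why resubstitution is optimistic.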
17
This is just the beginning of this journey. Remember, you still have loads to learn: keep reading and be willing to incorporate new ideas. Thanks a bunch for sharing this journey with me!