Published by Stanley Parker. Modified over 9 years ago.
1
Stat 565 - Lecture 0: Introduction and Map of this Class
2
What are we trying to do?
The main purpose of this class is to give:
– Statisticians a feel for the specific issues in genomic data
– Biologists an idea of how and why we analyze data the way we do
– Computer scientists an idea of the biological (very briefly) and statistical issues with this data
Bottom line: to allow three groups of people to talk to each other.
3
My role versus your role in the class
My role: to pose the problems and try to teach the logic behind the statistical procedures.
4
Your role:
– If you are a statistician: talk to your classmates about the statistics. Some of the statistics will be very familiar to you but NOT to your classmates.
– If you are a biologist: explain to me and the class what some of the issues are, including specific problems such as hybridization and PCR, and ideas that are familiar to you but not to all your classmates.
– If you are a computer scientist: explain what some of the computational challenges are and how we address them.
5
Statistical terms and topics we will use
– Descriptive Statistics
– Hypothesis Testing
– Multivariate Statistics
– Non-parametric Statistics
– Bayesian Statistics
– Design of Experiments: Optimal Design
6
Descriptive Statistics
Terms we will see and use:
– Histograms and Shapes
– Boxplots
– Scatter Plots
– Mean, Median
– Standard Deviation, Quartiles, Quantiles
– Coefficient of Variation
– Distribution plots: normal Q-Q plots, etc.
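The numerical summaries listed above can be illustrated with a minimal sketch. The class itself uses R; this example uses Python's standard library only, and the data values are hypothetical, made up purely for illustration.

```python
import statistics

# Hypothetical expression intensities for one gene across eight arrays.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)
median = statistics.median(values)

# Sample standard deviation (n - 1 in the denominator).
sd = statistics.stdev(values)

# Quartiles: the three cut points that split the sorted data into four parts.
# statistics.quantiles uses the "exclusive" method by default; other software
# may use a different interpolation rule and give slightly different values.
q1, q2, q3 = statistics.quantiles(values, n=4)

# Coefficient of variation: spread relative to the mean (unitless).
cv = sd / mean

print(f"mean={mean}, median={median}, sd={sd:.3f}")
print(f"quartiles={q1}, {q2}, {q3}, cv={cv:.3f}")
```

The mean and median differ here because the two largest values pull the mean upward, which is the kind of shape information a histogram or boxplot would also reveal.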
7
Hypothesis Testing
– Hypotheses
– Type I and Type II errors
– t-test
– F-test for ANOVA
– Chi-square tests
– P-values
– Multiplicity (simultaneous testing/multiple comparisons)
– Error control: family-wise error rate (FWER), Bonferroni, FDR
– Single-step vs sequential methods
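To make the multiplicity items concrete, here is a rough sketch of two common p-value adjustments: Bonferroni (a single-step method that controls the FWER) and Benjamini-Hochberg (a sequential step-up method that controls the FDR). It is written in plain Python rather than the R used in class; the function names and the p-values are assumptions chosen for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply each by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure for FDR control)."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five per-gene tests.
pvals = [0.01, 0.04, 0.03, 0.005, 0.20]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

for p, b, h in zip(pvals, bonf, bh):
    print(f"raw={p:.3f}  bonferroni={b:.3f}  bh={h:.3f}")
```

The Benjamini-Hochberg adjusted values are never larger than the Bonferroni ones, which reflects the trade-off: FDR control is less strict than FWER control and so keeps more power when many genes are tested at once.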
8
Multivariate Statistics
– EDA: Exploratory Data Analysis
– Cluster Analysis (hierarchical, non-hierarchical, distance metrics, types of clustering)
– Principal Components (the idea behind this)
– Discriminant Analysis
– Supervised vs Unsupervised Learning
9
Non-parametric Statistics
– How non-parametric tests work in general
– Sign Test
– Wilcoxon Signed Rank Test
– Wilcoxon Rank Sum Test (Mann-Whitney Test)
– Tukey’s biweight algorithm
– Kruskal-Wallis test
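As a small illustration of how a non-parametric test works, the sketch below implements the sign test, the simplest item on the list above. It uses only the signs of the paired differences, not their sizes, so no distributional shape is assumed. The code is plain Python rather than R, and the function name and paired measurements are hypothetical.

```python
from math import comb


def sign_test(before, after):
    """Two-sided exact sign test for paired samples.

    Ties (zero differences) are dropped. Under the null hypothesis, the number
    of positive differences follows a Binomial(n, 1/2) distribution.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    k = min(n_pos, n - n_pos)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for two sides and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n, min(1.0, 2 * tail)


# Hypothetical paired measurements on eight samples.
before = [10, 12, 9, 11, 13, 8, 10, 12]
after = [12, 14, 9, 13, 15, 9, 9, 14]

n_pos, n, p_value = sign_test(before, after)
print(f"{n_pos} positive differences out of {n} non-tied pairs, p = {p_value}")
```

The Wilcoxon signed rank test follows the same idea but also ranks the sizes of the differences, which usually gives it more power than the sign test.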
10
Bayesian Statistics
– How these work
– Empirical Bayes methods
– Moderated t or F test
11
Design of Experiments
– Why design?
– Block designs
– Criteria for determining optimality
– Dye-swaps, block designs, loop designs
12
Structure of the Class
We will use the basic definition of Statistics to define the structure of the class: Statistics comprises methods for collecting, compiling, describing, analyzing and inferring from data.
13
Our steps
– We talk about the experiment that generates this data
– Specific nuances of the data collection, design issues, systematic effects
– This leads us to normalization and the issues therein
– Type of data: description and compilation
– Analyze data for overall effects (clustering etc.)
– Inferring from data (hypothesis testing)
– Overall process
14
We will use R
– R is freeware and we can access it readily. I will mainly use it in class.
– http://cran.r-project.org/
– The version I have is 3.1.2.
– Choose your computer and operating system, then download and install R. The binary versions are the fastest.
– Also install as many packages as you can. It will ask for a CRAN site or mirror that is close to you. I always use USA (WA) as my CRAN site.
– I will give you SAS code as well if you are interested.