Role and Place of Statistical Data Analysis and Very Simple Applications

Role and Place of Statistical Data Analysis and very simple applications
- Simplified diagram of scientific research
- When you know the system: estimation of parameters and feedback
- When the system is too complicated

A simple diagram of scientific research, when you know the system:

Knowledge → Model → Experiment → Estimate → Verify → Predict → New knowledge about the system

Data analysis enters at the estimation and verification stages.

Simple application of statistics

1. Using previously accumulated knowledge, you want to study a system.
2. Build a model of the system based on the previous knowledge.
3. Set up an experiment and collect data.
4. Estimate the parameters of the model, and change the model if needed.
5. Verify that the parameters are correct and that they describe the current model.
6. Predict the behaviour of the experiment and set up a new experiment. If the prediction gives good results then you have done a good job; if not, you need to reconsider your model and repeat the cycle.
7. Once you are done and satisfied, your data as well as your model become part of world knowledge.

Data analysis is used at the estimation and verification stages.

Simple application of statistics

The result of the modelling is usually a function of two types of variables: those you can vary, and those you want to estimate:

y = f(x, θ)

where x is a variable you can control and θ is a parameter you want to estimate. As a result of the experiment you get observations of y. Then, using one of the available techniques (e.g. maximum likelihood, Bayesian statistics), you carry out the estimation. Prediction is made for values of x at which you have not done an experiment. In real-life problems the situation is more complicated: in many cases the controllable variables and the observations are dictated by the nature of the experiment, while the model is something different that depends on the parameters you estimate using this experiment. That is, the experiment gives you observations of one set of quantities, but the model you actually want is still expressed in terms of the parameters.

Simple application of statistics

You have a model and the results of an experiment. You then estimate the parameters, e.g. using the simplest least-squares technique: choose the θ that minimises

Σᵢ (yᵢ − f(xᵢ, θ))²

This simple estimation relies on the assumptions that the errors in the experiment are independent, and that all of them have zero mean and exactly the same variance. After estimating the parameters, the next stage is to find out how accurate they are. Once this stage is complete, you carry out prediction (can you predict the value of y at a point x where you have not done an experiment?). If prediction works at this stage then the model is fine, and you give your results to the scientific community.
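This estimate–check accuracy–predict workflow can be sketched in R. The straight-line model, the true parameter values, and the simulated data below are all invented for illustration:

```r
# Simulate an experiment: y depends linearly on a controlled variable x,
# with independent, zero-mean, constant-variance errors
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 3 * x + rnorm(50, mean = 0, sd = 1)

fit <- lm(y ~ x)      # least-squares estimation of intercept and slope
coef(fit)             # the parameter estimates
confint(fit)          # how accurate are they? 95% confidence intervals
predict(fit, newdata = data.frame(x = c(2.5, 7.5)))  # predict at new x
```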

When the system is too complicated

Sometimes the system you are trying to study is too complicated to build a model for. In psychology or biology, for example, the system is very complicated and there is no unifying model. Nonetheless you would like to understand the system or its parts. Then you use observations, build some sort of model, and check it against (new) data. Schematic diagram:

Data (Design) → Model → Estimate → Verify → Predict

Data analysis is used at all stages.

When the system is too complicated

Usually you start from the simplest models: linear models. If a linear model does not fit, then you start complicating it. By linearity we mean linearity in the parameters. This way of modelling can be good when you do not know anything and you want to build a model in order to understand the system.
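"Linear in the parameters" does not mean a straight line in x. As a small sketch with simulated data, a quadratic curve is still a linear model in this sense, because the unknown coefficients enter linearly:

```r
set.seed(2)
x <- seq(-3, 3, length.out = 60)
y <- 1 + 2 * x - 0.5 * x^2 + rnorm(60, sd = 0.3)

# Quadratic in x, but linear in the parameters, so least squares applies
fit <- lm(y ~ x + I(x^2))
coef(fit)   # estimates of the three coefficients
```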

When the system is too complicated

In many cases a simple linear model may not be sufficient, and you need to analyse the data before you can build any sort of model. In these cases you want to find some kind of structure in the data. If you can find a structure, it is a very good idea to go back to the subject area the data came from. We will learn some of the techniques that can give an idea of the structure of the data. The usual techniques include: principal component analysis, correspondence analysis, factor analysis, discriminant analysis, metric scaling, and clustering.
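As a sketch of one of these techniques, here is principal component analysis in R on a built-in data set (USArrests: four variables measured on 50 states):

```r
# PCA on R's built-in USArrests data; scale variables to unit variance
pca <- prcomp(USArrests, scale. = TRUE)
summary(pca)    # proportion of variance explained by each component
head(pca$x)     # the observations expressed in the new (principal) coordinates
```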

When the system is too complicated

When the system is too complicated, instead of building a model that can answer all your questions, you sometimes want answers to simple questions only, e.g. whether the effects of two or more factors are significantly different. For example, you may want to compare the effects of two different drugs or of two different treatments. We will have a lecture about ANOVA and how to analyse the results using R. ANOVA is useful when you want to compare the effects of more than two groups or factor levels.
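A minimal sketch of such a comparison in R, with invented data for three hypothetical treatments (the group means and sample sizes are made up for illustration):

```r
set.seed(3)
response  <- c(rnorm(10, mean = 5), rnorm(10, mean = 5), rnorm(10, mean = 7))
treatment <- factor(rep(c("A", "B", "C"), each = 10))

# One-way ANOVA: are the treatment effects significantly different?
summary(aov(response ~ treatment))
```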

When system complicated: Various criteria Occam’s razor: “entities should not be multiplied beyond necessity” or “All things being equal, the simplest solution tends to be the right one” A potential problem: There might be conflict between simplicity and accuracy. You can build tree of models that would have different degree of simplicity at different levels Rashomon: Multiple choices of models When simplifying a model you may come up up with different simplifications that have similar prediction errors. In these cases, techniques like bagging (bootstrap aggregation) may be helpful

Some applications of data analysis

The simplest application of statistics is this: you have a vector of observations and you want to know whether the mean is equal to some pre-specified value (say zero). You calculate the mean value and check it against this value. This is done by a simple t-test:

t.test(data)

This command will calculate the mean and variance of the data and then the relevant statistic. It will also give you confidence intervals. If the confidence interval does not contain the value you want to test against (say zero), then you can say, according to these data and with 95% confidence, that the mean is not equal to zero. Moreover, if the p-value is very small, then you can say with (1 − p)·100 percent confidence that the mean is different from zero.
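A runnable sketch of this one-sample test; the data are simulated, with the true mean shifted away from zero so the test has something to find:

```r
set.seed(4)
data <- rnorm(50, mean = 0.8, sd = 1)   # sample whose true mean is 0.8

result <- t.test(data)   # H0: the mean equals 0
result$conf.int          # 95% confidence interval for the mean
result$p.value           # small value: evidence the mean is not 0
```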

Some applications of data analysis

Another very simple application of statistics is comparing the means of two samples using t.test. Before doing this test, it is a good idea to have a look at a box plot and to test whether the variances are equal:

var.test(data1,data2)

If it can be assumed that the variances are equal, then you can use

t.test(data1,data2,var.equal=TRUE)

If the variances are not equal, then use the Welch version, which is the default:

t.test(data1,data2)
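Put together as a runnable sketch, with simulated samples (the sample sizes, means and variances are invented):

```r
set.seed(5)
data1 <- rnorm(25, mean = 10, sd = 2)
data2 <- rnorm(25, mean = 13, sd = 2)

boxplot(data1, data2, names = c("data1", "data2"))  # look at the data first
var.test(data1, data2)                   # F test: are the variances equal?
t.test(data1, data2, var.equal = TRUE)   # pooled test, if variances equal
t.test(data1, data2)                     # Welch test (default), if not
```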

Some applications of data analysis

If you can influence the experiment, then you should emphasise the importance of paired designs. If the design is paired, then many systematic differences due to unknown factors can be avoided. This is done easily using t.test again:

t.test(data1,data2,paired=TRUE)
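A sketch of a paired design in R, with simulated before/after measurements on the same 15 hypothetical subjects:

```r
set.seed(6)
before <- rnorm(15, mean = 120, sd = 10)            # e.g. measurement before treatment
after  <- before - 5 + rnorm(15, mean = 0, sd = 3)  # same subjects after treatment

# Pairing removes the large subject-to-subject differences
t.test(before, after, paired = TRUE)
```

Note that t.test(before - after) gives the same result: a paired t-test is a one-sample t-test on the within-subject differences.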