Post Analysis Project1. Christoph F. Eick

Presentation transcript:

Ch. Eick Christoph F. Eick

Post Analysis Project1 (slide 2)

Disclaimer
- The main purpose of these slides is not to criticize groups but rather to learn how to do a better job when analyzing data and interpreting data mining results.
- Most of you do not have much experience in these tasks.
- Learning without making errors is impossible; therefore, students can benefit from discussing the errors of other students.

Visualization
- Use large, high-resolution displays; some students used displays that did not reveal much because their density was too high.
- The quality of the visualization impacts what you are able to see.
- If you compare displays, put them next to each other!
- Use the same coordinate system and scale in displays you compare.
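The advice about using the same scale in displays that are compared can be sketched in a few lines. This is only an illustration in Python (the project itself used R); the function name `shared_limits` and the toy values are assumptions made up for this example, not part of the project.

```python
def shared_limits(*series, padding=0.05):
    """Return one (low, high) axis range that covers every series.

    Using the same range for all displays keeps them visually comparable.
    `padding` adds a small margin, expressed as a fraction of the range.
    """
    low = min(min(s) for s in series)
    high = max(max(s) for s in series)
    margin = (high - low) * padding
    return (low - margin, high + margin)


# Made-up values for two groups that will be plotted side by side.
group_a = [10, 20, 30]
group_b = [25, 60, 45]

limits = shared_limits(group_a, group_b, padding=0.0)
print("use this y-range for both displays:", limits)
```

The returned pair would then be passed to whatever axis-limit setting the plotting tool offers, once per display.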

Post Analysis Project1, Part 2 (slide 3)

Interpretation
- Scatterplot: the key question is whether the attribute or pair of attributes provides some evidence for the dominance of a particular class in a particular region of the attribute space, not whether the attribute pair clearly separates the classes.
- Vague interpretation of quantitative results; e.g. “Att1 seems to be more important than Att2” versus “the fact that the regression coefficient of Att1 is 12 times as large as the regression coefficient of Att2 suggests that attribute Att1 has a much stronger impact on class membership”.
- Overlooking patterns in displays; e.g. regions that are dominated by one class, or only looking for patterns in the E/W direction when there are also clear patterns in the N/S direction.
- Not giving summaries at all, or giving very “quick” summaries.
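One way to make "dominance of a particular class in a particular region" quantitative is to cut the attribute space into a coarse grid and report the class shares per cell. The sketch below is a Python illustration under stated assumptions: the grid approach, the function name `region_class_shares`, and the toy points are all made up for this example and are not the method used in the project.

```python
from collections import Counter, defaultdict


def region_class_shares(points, labels, bins=2):
    """Split the 2-D attribute space into a bins x bins grid.

    Returns {cell: (dominant_class, share_of_that_class, points_in_cell)}
    for every non-empty cell.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def cell_index(value, low, high):
        if high == low:
            return 0
        idx = int((value - low) / (high - low) * bins)
        return min(idx, bins - 1)  # the maximum value goes into the last cell

    cells = defaultdict(Counter)
    for (x, y), label in zip(points, labels):
        key = (cell_index(x, x_min, x_max), cell_index(y, y_min, y_max))
        cells[key][label] += 1

    summary = {}
    for key, counts in cells.items():
        label, n = counts.most_common(1)[0]
        total = sum(counts.values())
        summary[key] = (label, n / total, total)
    return summary


# Made-up (Att1, Att2) pairs with class labels 0/1.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 9), (9, 8), (9, 9), (2, 9), (8, 2)]
labels = [0, 0, 0, 1, 1, 1, 1, 0, 1]

summary = region_class_shares(points, labels, bins=2)
for cell, (label, share, n) in sorted(summary.items()):
    print(f"cell {cell}: class {label} holds {share:.0%} of {n} points")
```

A statement such as "class 0 makes up 75% of the points in the low/low region" is the kind of quantitative summary the slide asks for, and a grid like this looks in both the E/W and the N/S direction.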

Some Displays (slide 4)

Discussion of the scatter plots generated by Group 8 (slide 5)

Regression Results (slide 6)

No Scaling:
- Multiple R-squared: 0.286
- Adjusted R-squared: (value not preserved in the transcript)
- Coefficients: (Intercept), V2, V3, V6, V (values not preserved in the transcript)

With Scaling:
- Attributes: GlucoseConc, BloodP, BMI, Pedigree
- Coefficients: Intercept, scale(GlucoseConc), scale(BloodP), scale(BMI), scale(Pedigree) (values not preserved in the transcript)
- Mean Value (values not preserved in the transcript)

The fact that the R² is 0.28 suggests that the results are suggestive but do not indicate a strong finding about the importance of the attributes.
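For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares, and the adjusted version penalizes the number of predictors. The following Python sketch shows both computations on made-up numbers; the function names and the data are illustrative assumptions and do not reproduce the project's results.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_rows, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_predictors - 1)


# Made-up class labels (0/1) and fitted values from some regression model.
observed = [0, 0, 1, 1, 0, 1]
predicted = [0.2, 0.4, 0.6, 0.7, 0.5, 0.6]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.3f}")
print(f"adjusted R^2 with 2 predictors = {adjusted_r_squared(r2, len(observed), 2):.3f}")
```

Read this way, an R² near 0.28 means the model accounts for a bit more than a quarter of the variation in the class variable, which is why the findings are only suggestive.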

Box Plots (slide 7). Thanks to Group 10!

Post Analysis Project1, Part 3 (slide 8)

Statistical Summaries
- If there were minor disagreements, I took away 1 point.
- If the results did not make any sense, I took away a lot of points (this only happened once).
- If it was not clear how the results were generated (no R-code, incomplete R-code, or lack of explanation), I also took away points.

Other
- You were also supposed to interpret the histograms, but the project specification failed to ask you to do that! → discuss another example in Review2

Importance of Attributes
- GC (GlucoseConc) is definitely very helpful for diagnosing diabetes (scatter plot, regression); e.g. if it is quite low, it is very unlikely that the person has diabetes (→ useful for a diabetes test).
- BMI (boxplot, scatterplot, regression coefficients) and, to a lesser extent, Pedigree have some usefulness in diagnosing diabetes.
- No group has presented evidence that DBP (BloodP) has any usefulness in diagnosing diabetes, although it has a weak positive correlation of 0.28 with BMI.
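To see why a correlation of 0.28 counts as weak, it helps to recall how Pearson's r is computed and that r² is the share of variance one attribute linearly explains in the other. The Python sketch below is illustrative only: the function names and the toy BMI/DBP values are made up and are not taken from the project data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def explained_share(r):
    """Share of variance linearly explained by the other attribute (r squared)."""
    return r ** 2


# Made-up values, only to show the call.
bmi = [22, 27, 31, 35, 29, 40]
dbp = [70, 64, 80, 72, 90, 76]
r = pearson_r(bmi, dbp)
print(f"r = {r:.2f}")

# For the correlation reported on the slide:
print(f"r = 0.28 explains about {explained_share(0.28):.1%} of the variance")
```

With r = 0.28 the explained share is under 8%, so the two attributes move together only slightly.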

Post Analysis Project1, Part 4 (slide 9)

Linear Regression
- If you do not scale the data, interpretation of the observed coefficients is quite complicated (see previous slide).
- Lack of quantitative assessment of results.

Star Plots
- What is, in your opinion, the usefulness of this technique? I myself have difficulties making sense of those, but some of you do seem to like Star Plots much more...

Conclusion/Other Findings
- Half of the groups wrote quite short conclusions, and most summaries are somewhat vague; e.g. they do not write about:
  - The importance/usefulness of the attributes
  - The usefulness of the employed techniques
  - Knowledge about diabetes generated in Project1
  - …

Project Weights Fall 2013
- Project2>Project3??>Project4  Project1
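The point about scaling can be shown with a small experiment: when attributes are measured in very different units, raw coefficients mostly reflect the units, whereas coefficients of standardized attributes can be compared directly. The sketch below is a Python illustration on made-up data (the project used R, where `scale()` standardizes with the sample standard deviation); the helper names `standardize` and `ols` and the tiny least-squares solver are assumptions written for this example.

```python
import math


def standardize(values):
    """z-scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept.

    Returns [intercept, coef_1, ..., coef_p]; solves the normal equations
    with Gauss-Jordan elimination and partial pivoting.
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                factor = a[r][i] / a[i][i]
                a[r] = [u - factor * v for u, v in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up attributes on very different scales.
att1 = [1, 2, 3, 4, 5, 6]
att2 = [100, 300, 200, 600, 400, 500]
y = [2 * a + 0.01 * b for a, b in zip(att1, att2)]

raw = ols([att1, att2], y)
scaled = ols([standardize(att1), standardize(att2)], y)

print("raw coefficient ratio Att1/Att2:   ", round(raw[1] / raw[2], 2))
print("scaled coefficient ratio Att1/Att2:", round(scaled[1] / scaled[2], 2))
```

In this toy example the raw coefficients differ by a factor of about 200, while the standardized ones differ only by a factor of about 2. Statements such as "the coefficient of Att1 is 12 times as large" are therefore only meaningful once the attributes are on a common scale.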