COSC 6335 Fall 2014 Post Analysis Project1


COSC 6335 Fall 2014 Post Analysis Project1 Christoph F. Eick Arko Barman

Post Analysis Project1

Disclaimer
The main purpose of these slides is not to criticize groups but rather to learn how to do a better job when analyzing data and interpreting data mining results.
Most of you do not have much experience in these tasks.
Learning without making errors is impossible; therefore, students can benefit from discussing errors of other students.

Visualization
Use large, high-resolution displays; some students used displays that did not reveal much because the point density was too high.
Be careful when you plot points of different attributes with different colors! Two groups made plots with random color assignments to points.
If you compare displays, put them next to each other!
Use the same coordinate system and scale in the displays you compare.
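One way to follow the advice about using the same scale in displays that are compared is to compute a single axis range from all of the data first and then pass it to every plot. The sketch below is only an illustration: the helper name `shared_limits`, the padding value, and the sample values are assumptions, not part of the project.

```python
def shared_limits(*series, pad=0.05):
    """Return one (low, high) range that covers every series, plus padding."""
    values = [v for s in series for v in s]
    if not values:
        raise ValueError("need at least one value")
    low, high = min(values), max(values)
    span = high - low
    # Fall back to a fixed margin when all values are identical.
    margin = span * pad if span > 0 else 1.0
    return low - margin, high + margin


# Hypothetical attribute values shown in two displays that will be compared.
plot_a_x = [-7.0, -2.5, 0.0, 3.1]
plot_b_x = [-1.0, 2.0, 6.8]

x_limits = shared_limits(plot_a_x, plot_b_x)
print(x_limits)
```

The resulting pair can be handed to the axis-limit setting of whatever plotting tool is used (for example the `xlim` argument of R's `plot`), so that both displays share one coordinate system.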

Post Analysis Project1 Part2

Interpretation
Scatterplot: the key question is whether the attribute or pair of attributes provides some evidence for the dominance of a particular class in a particular region of the attribute space, not whether the attribute pair clearly separates the classes.
Vague interpretation of quantitative results; e.g. “Att1 seems to be more important than Att2” versus “the fact that the regression coefficient of Att1 is 12 times as large as the regression coefficient of Att2 suggests that attribute Att1 has a much stronger impact on class membership”.
Overlooking patterns in displays; e.g. regions that are dominated by one class, or only looking for patterns in the E/W direction when there are also clear patterns in the N/S direction.
Not giving summaries at all or giving very “quick” summaries.
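The scatterplot question above (is some region of the attribute space dominated by one class?) can also be checked numerically instead of by eye. The following is a minimal sketch that assumes a simple square grid over two attributes; the function name `region_dominance`, the cell size, and the toy points are made up for illustration.

```python
from collections import Counter, defaultdict


def region_dominance(points, labels, cell_size=1.0):
    """Map each grid cell to (majority class, share of that class, point count)."""
    cells = defaultdict(Counter)
    for (x, y), label in zip(points, labels):
        cell = (int(x // cell_size), int(y // cell_size))
        cells[cell][label] += 1
    summary = {}
    for cell, counts in cells.items():
        label, count = counts.most_common(1)[0]
        total = sum(counts.values())
        summary[cell] = (label, count / total, total)
    return summary


# Made-up toy data: (attribute1, attribute2) pairs with a class label each.
points = [(0.2, 0.3), (0.7, 0.9), (0.5, 0.1), (1.4, 0.2), (1.6, 0.8), (1.1, 0.5)]
labels = [0, 0, 1, 1, 1, 1]

for cell, (label, share, n) in sorted(region_dominance(points, labels).items()):
    print(f"cell {cell}: class {label} holds {share:.0%} of {n} points")
```

Reporting shares per region gives the kind of quantitative statement the slides ask for: a cell can be strongly dominated by one class even when the attribute pair does not separate the classes overall.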

Some Displays Group F

Regression Results
You needed to scale the data!!! Only one group provided weird results.
Class = -0.405322*variance - 0.459553*skewness - 0.437963*curtosis - 0.001676*entropy + 0.444606 (Group G)
R-squared of the regression function is 0.8648525, which tells us the function fits the data well.
You needed to mention:
  Importance of attributes
  Role of the sign of coefficients
  How you normalized (there is more than one way)
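As an illustration of the kind of quantitative reading asked for here, the sketch below ranks the coefficients of the Group G function by absolute value and reports the direction implied by each sign. It assumes the coefficients were fitted on attributes that share a common scale; the slides do not say how Group G normalized, so the comparison is only meaningful under that assumption. The helper name `rank_by_magnitude` is hypothetical.

```python
# Coefficients copied from the Group G regression function above.
coefficients = {
    "variance": -0.405322,
    "skewness": -0.459553,
    "curtosis": -0.437963,
    "entropy": -0.001676,
}


def rank_by_magnitude(coefs):
    """Sort (attribute, coefficient) pairs by absolute coefficient, largest first."""
    return sorted(coefs.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_by_magnitude(coefficients)
smallest = abs(ranking[-1][1])

for name, value in ranking:
    # A negative sign means larger attribute values lower the predicted class value.
    direction = "lowers" if value < 0 else "raises"
    print(f"{name}: {value:+.6f} ({direction} the predicted class value, "
          f"{abs(value) / smallest:.1f}x the smallest coefficient)")
```

Read this way, the coefficients of variance, skewness, and curtosis are of similar magnitude, while the entropy coefficient is more than two hundred times smaller than any of them.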

Box Plots Thanks to Group B!

Decision Trees
Mention how you divide the data into test and training sets.
Bonus for cross-validation or trying out more than one way.
Trees should have <=10 nodes! One group had more than 20 nodes!!!
Analyze importance of attributes!
Group C
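To make the point about training/test splits and cross-validation concrete, here is a minimal sketch of k-fold index generation using only the standard library. The function name `k_fold_indices`, the row count, and the seed are assumptions for illustration; in practice the cross-validation helper of the chosen data mining tool would typically be used instead.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every row is tested exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


n_rows = 20  # hypothetical dataset size
splits = list(k_fold_indices(n_rows, k=5, seed=42))

for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train_idx)} training rows, {len(test_idx)} test rows")
```

Stating the value of k, the seed, and the resulting fold sizes in the report is one way to document how the data was divided.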

Conclusion (Q.10 + 11) An excellent idea to remove entropy and plot the rest of the points! (Group A)

Post Analysis Project1 Part3

Statistical Summaries
If there was a minor disagreement, I took away 1 point.
If the results did not make any sense, I took away a lot of points.

Importance of Attributes
Variance is likely to be the most important in classifying.
Entropy does not have much impact.
Curtosis and skewness are somewhere in between.

Post Analysis Project1 Part4

Linear Regression
If you do not scale the data, interpretation of the observed coefficients is quite complicated.
Lack of quantitative assessment of results.

Star Plots
What is, in your opinion, the usefulness of this technique? I myself have difficulties making sense of those, but some of you do seem to like Star Plots much more...

Conclusion/Other Findings
Half of the groups wrote quite short conclusions, and most summaries are somewhat vague; e.g. they do not write about:
  The importance/usefulness of the attributes
  The usefulness of the employed techniques
  …
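Returning to the linear regression point in Part4: the sketch below shows why unscaled coefficients are hard to interpret. It fits a one-attribute least-squares slope on made-up data, then refits after changing only the attribute's unit. The helpers `z_score`, `min_max`, and `ols_slope` and all numbers are assumptions for illustration; z-score and min-max are two common normalizations, not necessarily the ones used in the project.

```python
from statistics import mean, pstdev


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # sigma must be non-zero
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly to the interval [0, 1]; values must not all be equal."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def ols_slope(x, y):
    """Least-squares slope of y on a single attribute x."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = sum((a - x_bar) ** 2 for a in x)
    return num / den


# Made-up data: the same attribute expressed in two different units.
x_raw = [1.0, 2.0, 3.0, 4.0, 5.0]
x_rescaled = [v * 1000 for v in x_raw]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print("raw units:      ", ols_slope(x_raw, y))
print("x1000 units:    ", ols_slope(x_rescaled, y))
print("z-score (both): ", ols_slope(z_score(x_raw), y), ols_slope(z_score(x_rescaled), y))
print("min-max (both): ", ols_slope(min_max(x_raw), y), ols_slope(min_max(x_rescaled), y))
```

The raw slope changes by a factor of 1000 with the unit, while either normalization yields the same slope for both versions of the attribute. The two normalizations give different slopes from each other, which is why a report should state which one was used.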