COSC 6335 Fall 2014 Post Analysis Project1


1 COSC 6335 Fall 2014 Post Analysis Project1
Christoph F. Eick, Arko Barman

2 Post Analysis Project1 Disclaimer
The main purpose of these slides is not to criticize groups but rather to learn how to do a better job when analyzing data and interpreting data mining results.
Most of you do not have much experience in these tasks.
Learning without making errors is impossible; therefore, students can benefit from discussing errors of other students.
Visualization
Use large, high-resolution displays; some students used displays that did not reveal much because the point density was too high.
Be careful when you plot points of different attributes with different colors! 2 groups made plots with random color assignments to points.
If you compare displays, put them next to each other!!
Use the same coordinate systems/scale in displays you compare.
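The two visualization points about comparable scales and non-random colors can be sketched in a few lines. This is only an illustration, not code from the project: the helper names `shared_limits` and `class_colors`, the palette, and the sample values are all assumptions.

```python
def shared_limits(datasets, pad=0.05):
    """Return one (low, high) range covering every dataset, plus a small margin."""
    values = [v for data in datasets for v in data]
    low, high = min(values), max(values)
    margin = (high - low) * pad
    return low - margin, high + margin


def class_colors(labels, palette=("tab:blue", "tab:orange", "tab:green")):
    """Map each distinct class label to one fixed color, in sorted label order."""
    classes = sorted(set(labels))
    if len(classes) > len(palette):
        raise ValueError("palette has fewer colors than there are classes")
    return {c: palette[i] for i, c in enumerate(classes)}


# Hypothetical attribute values shown in two plots that will be compared.
plot_a = [-7.0, -1.2, 3.4, 6.8]
plot_b = [-2.5, 0.1, 4.0]

limits = shared_limits([plot_a, plot_b])   # use the same range on both plots
colors = class_colors([0, 1, 1, 0])        # same class -> same color everywhere
```

The resulting range and color mapping would then be passed to whatever plotting tool is used, so that every display in a comparison shares one scale and one class-to-color assignment.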

3 Post Analysis Project1 Part2
Interpretation
Scatterplot: the key question is whether the attribute/pair of attributes provides some evidence for the dominance of a particular class in a particular region of the attribute space, not whether the attribute pair clearly separates the classes.
Vague interpretation of quantitative results; e.g. “Att1 seems to be more important than Att2” versus “the fact that the regression coefficient of Att1 is 12 times as large as the regression coefficient of Att2 suggests that attribute Att1 has a much stronger impact on class membership”.
Overlooking patterns in displays; e.g. regions that are dominated by one class, or only looking for patterns in the E/W direction when there are also clear patterns in the N/S direction.
Not giving summaries at all or giving very “quick” summaries.
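One way to back up a statement such as "this region is dominated by one class" with numbers is to bin the scatterplot into cells and report the majority class and its share per cell. The sketch below assumes square cells; the function name `dominance_by_cell` and the sample points are hypothetical.

```python
from collections import Counter, defaultdict


def dominance_by_cell(points, labels, cell_size):
    """Bin 2-D points into square cells; return {cell: (majority class, share)}."""
    cells = defaultdict(Counter)
    for (x, y), label in zip(points, labels):
        key = (int(x // cell_size), int(y // cell_size))
        cells[key][label] += 1
    summary = {}
    for key, counts in cells.items():
        label, n = counts.most_common(1)[0]
        summary[key] = (label, n / sum(counts.values()))
    return summary


# Hypothetical points (attribute pair) with class labels 0 and 1.
points = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9), (1.5, 0.2), (1.1, 0.7), (1.9, 0.4)]
labels = [0, 0, 1, 1, 1, 1]

summary = dominance_by_cell(points, labels, cell_size=1.0)
for cell, (label, share) in sorted(summary.items()):
    print(f"cell {cell}: class {label} holds {share:.0%} of the points")
```

A cell with a share close to 100% is evidence for class dominance in that region, even when the attribute pair does not separate the classes overall.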

4 Some Displays Group F

5 Regression Results
You needed to scale the data!!!
Only one group provided weird results.
Class= * variance *skewness *curtosis *entropy (Group G)
R-squared of the regression function is , which tells us the function fits the data well.
You needed to mention:
Importance of attributes
Role of the sign of coefficients
How you normalized (there is more than one way)
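Since there is more than one way to normalize, a report should say which one was used. Two common choices, z-score standardization and min-max scaling, are sketched below; the function names and the sample column are assumptions, and the project may have used a different method.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("constant column cannot be standardized")
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("constant column cannot be rescaled")
    return [(v - low) / (high - low) for v in values]


# Hypothetical attribute column.
column = [2.0, 4.0, 6.0, 8.0]
standardized = z_score(column)
rescaled = min_max(column)
```

After either transformation all attributes are on comparable scales, so the magnitudes of the regression coefficients can be compared directly.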

6 Box Plots Thanks to Group B!

7 Decision Trees
Mention how you divide the data into test and training sets.
Bonus for cross-validation or trying out more than one way.
<=10 nodes! One group had more than 20 nodes!!!
Analyze importance of attributes!
Group C
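For reference, the train/test division used in k-fold cross-validation can be sketched as follows. This is a plain-Python illustration of the idea, not the tool used in the project; the function name `k_fold_indices`, the seed, and the sample sizes are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical setup: 10 samples, 5 folds.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Reporting the number of folds (or the train/test ratio) and whether the data was shuffled lets a reader reproduce the accuracy figures.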

8 Conclusion (Q ) An excellent idea to remove entropy and plot the rest of the points! (Group A)

9 Post Analysis Project1 Part3
Statistical Summaries
If there was a minor disagreement, I took away 1 point.
If the results did not make any sense, I took away a lot of points.
Importance of Attributes
Variance is likely to be the most important attribute for classification.
Entropy does not have much impact.
Curtosis and skewness are somewhere in between.

10 Post Analysis Project1 Part4
Linear Regression
If you do not scale the data, interpretation of the observed coefficients is quite complicated.
Lack of quantitative assessment of results.
Star Plots
What is, in your opinion, the usefulness of this technique? I myself have difficulties making sense of those, but some of you do seem to like Star Plots much more...
Conclusion/Other Findings
Half of the groups wrote quite short conclusions and most summaries are somewhat vague; e.g. they do not write about:
The importance/usefulness of the attributes
The usefulness of the employed techniques
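The remark on unscaled data under Linear Regression can be illustrated with a one-attribute least-squares fit: the coefficient changes with the unit of the attribute even though the relationship is the same. The data and the unit change below are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope of y on x for a single attribute."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical attribute measured in one unit, then in a unit 1000 times smaller.
xs_unit = [1.0, 2.0, 3.0, 4.0]
xs_small_unit = [x * 1000 for x in xs_unit]
ys = [2.1, 3.9, 6.2, 7.8]

coef_unit = slope(xs_unit, ys)
coef_small_unit = slope(xs_small_unit, ys)
print(coef_unit, coef_small_unit)
```

The two coefficients differ by the factor of 1000 introduced by the unit change, which is why raw coefficients of attributes on different scales cannot be compared directly as measures of importance.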

