© 2002 by Prentice Hall 1 SI 654 Database Application Design Winter 2003 Dragomir R. Radev.


1 SI 654 Database Application Design, Winter 2003, Dragomir R. Radev

2 Data Mining (continued)

3 arff files
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
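The ARFF layout on this slide (a relation name, typed attribute declarations, then comma-separated data rows) can be read with a few lines of code. What follows is a minimal sketch, not Weka's own loader: the function name parse_arff and the SAMPLE constant are hypothetical, and the parser assumes only the features used on the slide (nominal and real attributes, no quoting, no missing values). SAMPLE holds the header and the first two data rows from the slide.

```python
SAMPLE = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
"""


def parse_arff(text):
    """Parse a small ARFF subset into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comment lines
        lower = line.lower()
        if in_data:
            # One comma-separated value per declared attribute.
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                row[name] = float(value) if kind == "real" else value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            kind = kind.strip()
            if kind.startswith("{"):
                # Nominal attribute: keep the list of allowed values.
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

Each row comes back as a dictionary keyed by attribute name, with real attributes converted to floats and nominal ones left as strings.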

4 Predictive models
Inputs (e.g., medical history, age)
Output (e.g., will the patient experience any side effects)
Some models are better than others

5 Operating curves
[Figure: operating curves labeled optimal, practical, and random; other labels: success, failure, most likely, least likely]

6 Principles of data mining
Training/test sets
Error analysis and overfitting
Cross-validation
Supervised vs. unsupervised methods
[Figure labels: error, input size, training, test]
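The training/test split and cross-validation items on this slide can be illustrated with a short index-splitting routine. This is a sketch under stated assumptions: the function name k_fold_indices, the shuffling seed, and the choice of 7 folds over 14 examples (the size of the weather data) are all illustrative, not taken from the slides.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(14, 7):
    print(len(train), len(test))
```

Every example lands in exactly one test fold, so each is used for evaluation once and for training in the remaining folds.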

7 Representing data
Vector space
[Figure labels: salary, credit, pay off, default]

8 Decision surfaces
[Figure labels: salary, credit, pay off, default]

9 Decision trees
[Figure labels: salary, credit, pay off, default]
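A decision tree separates classes with thresholds on individual attributes. As a minimal sketch of the simplest case, a one-level tree (a decision stump), the code below searches for the best single threshold on one numeric attribute. It is an illustration, not Weka's implementation: the name best_stump is hypothetical, and because the slide's salary/credit figure carries no data, the example uses the humidity and play columns of the weather data from the arff slide.

```python
from collections import Counter

# (humidity, play) pairs copied from the weather data on the arff slide.
HUMIDITY_PLAY = [
    (85, "no"), (90, "no"), (86, "yes"), (96, "yes"), (80, "yes"),
    (70, "no"), (65, "yes"), (95, "no"), (70, "yes"), (80, "yes"),
    (70, "yes"), (90, "yes"), (75, "yes"), (91, "no"),
]


def best_stump(pairs):
    """Return (threshold, left_label, right_label, correct) for the
    split 'x <= threshold' that classifies the most pairs correctly."""
    values = sorted({x for x, _ in pairs})
    best = None
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = Counter(y for x, y in pairs if x <= threshold)
        right = Counter(y for x, y in pairs if x > threshold)
        # Each side predicts its own majority label.
        left_label, left_hits = left.most_common(1)[0]
        right_label, right_hits = right.most_common(1)[0]
        correct = left_hits + right_hits
        if best is None or correct > best[3]:
            best = (threshold, left_label, right_label, correct)
    return best


threshold, left_label, right_label, correct = best_stump(HUMIDITY_PLAY)
print(threshold, left_label, right_label, correct, "of", len(HUMIDITY_PLAY))
```

A full decision tree would apply the same search recursively to each side of the split and across all attributes; the stump stops after one level.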

10 Linear boundary
[Figure labels: salary, credit, pay off, default]

11 kNN models
Assign each element to the closest cluster
Demos:
– http://www-2.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html
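In the usual formulation of k-nearest neighbors, a new point receives the majority label among the k closest labeled training points. The sketch below illustrates that rule; it is not the linked demo's code. The name knn_predict, the Euclidean distance, and the default k=3 are assumptions, and the training points are the (temperature, humidity) pairs and play labels from the weather data on the arff slide.

```python
import math
from collections import Counter

# ((temperature, humidity), play) copied from the weather data.
TRAIN = [
    ((85, 85), "no"), ((80, 90), "no"), ((83, 86), "yes"),
    ((70, 96), "yes"), ((68, 80), "yes"), ((65, 70), "no"),
    ((64, 65), "yes"), ((72, 95), "no"), ((69, 70), "yes"),
    ((75, 80), "yes"), ((75, 70), "yes"), ((72, 90), "yes"),
    ((81, 75), "yes"), ((71, 91), "no"),
]


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (64, 65), k=3))
```

With k=1 the classifier simply copies the label of the single closest training point; larger odd values of k smooth the decision and avoid ties between two classes.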

12 Other methods
Decision trees
Neural networks
Support vector machines
Demos
– http://www.cs.technion.ac.il/~rani/LocBoost/

13 arff files
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

14 Weka
http://www.cs.waikato.ac.nz/ml/weka
Methods:
rules.ZeroR
bayes.NaiveBayes
trees.j48.J48
lazy.IBk
trees.DecisionStump
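Of the methods listed, rules.ZeroR is the usual baseline: for a nominal class it ignores every input attribute and always predicts the most frequent class. The sketch below reproduces that idea in Python on the play column of the weather data; it is an illustration rather than Weka's Java implementation, and the name zero_r is hypothetical.

```python
from collections import Counter

# The play column of the weather data, in file order.
PLAY = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]


def zero_r(labels):
    """Return the most frequent label and its share of the labels."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


label, accuracy = zero_r(PLAY)
print(label, round(accuracy, 3))  # yes 0.643
```

Any other classifier on this data should beat the 9-of-14 training accuracy of this baseline to be worth using.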

15 kMeans clustering
http://www.cs.mcgill.ca/~bonnef/project.html
http://www.cs.washington.edu/research/imagedatabase/demo/kmcluster/
http://www-2.cs.cmu.edu/~dellaert/software/
java weka.clusterers.SimpleKMeans -t data/weather.arff
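The command on this slide runs Weka's SimpleKMeans on the weather file. To show the underlying idea only, here is a sketch of the standard k-means iteration (Lloyd's algorithm) in Python. It is not Weka's implementation: the name k_means, the choice of the first k points as initial centroids, the fixed iteration count, and k=2 are assumptions, and the points are the (temperature, humidity) pairs from the weather data.

```python
import math

# (temperature, humidity) pairs copied from the weather data.
POINTS = [
    (85, 85), (80, 90), (83, 86), (70, 96), (68, 80), (65, 70), (64, 65),
    (72, 95), (69, 70), (75, 80), (75, 70), (72, 90), (81, 75), (71, 91),
]


def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [tuple(p) for p in points[:k]]  # assumed initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each non-empty cluster's centroid to its mean.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    # `clusters` holds the assignment from the final iteration, made
    # before the last centroid update.
    return centroids, clusters


centroids, clusters = k_means(POINTS, k=2)
print(centroids)
print([len(c) for c in clusters])
```

Unlike the classifiers earlier in the deck, this procedure never looks at the play label, which is what makes it an unsupervised method.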

16 More useful pointers
http://www.kdnuggets.com/
http://www.twocrows.com/booklet.htm

