

1 By Dan Stalloch

2 Association – which items could be linked together in some way
Patterns – sequential and time-series patterns, which show us how often certain things occur
Classification – shows us how data is grouped

3 Prediction – detecting a stable pattern in the data that may continue into the future
Identification – what can be learned from system usage, or what might be present in an item
Classification – how the data could be grouped
Optimization – finding ways to make the best use of resources

4 Apriori – finds frequent (large) itemsets
Sampling – finds frequent itemsets from a small sample of the database
Frequent-Pattern (FP) Tree and FP-Growth – improves on Apriori by avoiding candidate generation
Partition – splits the database into partitions so that Apriori can be applied efficiently
Decision Tree Induction – constructs a decision tree from a training data set
k-Means – groups the data into k clusters
And others
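To make the Apriori entry concrete, here is a minimal sketch of the level-wise idea: count single items, then repeatedly join frequent itemsets into larger candidates and keep only those that meet a support threshold. The function name `apriori`, the use of an absolute count for `min_support`, and the shopping baskets are all assumptions made for illustration, not something taken from the slides.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count support for the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Made-up baskets, purely for illustration.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]
frequent_itemsets = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps the candidate set small: an itemset can only be frequent if all of its subsets are frequent.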

5 Marketing – analyzing customer behavior
Finance – keeping track of credit and fraud
Manufacturing – optimizing use of resources
Health Care – checking patterns for useful information

6 http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
This is a car database from a repository of databases that UCI makes available to everyone.
When mining a database, it is essential to ask what we would like to be able to predict from it. In this instance, we would like to know which cars have decent mpg.
We might also be able to predict which companies are likely to stay in business.
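Before the file can be mined, each line has to be turned into a record. The sketch below assumes the layout we believe the file uses: eight whitespace-separated numeric fields (mpg, cylinders, displacement, horsepower, weight, acceleration, model year, origin), then the car name in double quotes, with `?` marking a missing value. The column names, the helper `parse_auto_mpg_line`, and the two sample rows are illustrative assumptions; check them against the data set's own description before relying on them.

```python
# Assumed column order for the numeric fields (verify against the data set's notes).
COLUMNS = [
    "mpg", "cylinders", "displacement", "horsepower",
    "weight", "acceleration", "model_year", "origin",
]


def parse_auto_mpg_line(line):
    """Parse one line of the assumed layout into a dict; '?' becomes None."""
    # Everything before the first double quote is numeric; the rest is the name.
    numeric_part, _, name_part = line.partition('"')
    record = {}
    for name, raw in zip(COLUMNS, numeric_part.split()):
        record[name] = None if raw == "?" else float(raw)
    record["car_name"] = name_part.strip().rstrip('"')
    return record


# Illustrative rows written in the assumed layout.
sample_lines = [
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"',
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"',
]
records = [parse_auto_mpg_line(line) for line in sample_lines]
for rec in records:
    print(rec["car_name"], rec["mpg"], rec["horsepower"])
```

Keeping missing values as `None` rather than dropping the row leaves the choice of how to handle them to the later mining step.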

7 We must create or use a program that shows us either a 2-D or a 3-D contingency table.
http://www.autonlab.org/tutorials/dtree18.pdf
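A 2-D contingency table simply counts how often each pair of attribute values occurs together. The following is a small sketch of one way to build it; the helper name `contingency_table`, the attribute names, and the car records are made up for illustration and are not taken from the data set.

```python
from collections import Counter


def contingency_table(rows, row_attr, col_attr):
    """Count co-occurrences of two attributes as a nested dict: table[row][col]."""
    counts = Counter((r[row_attr], r[col_attr]) for r in rows)
    row_values = sorted({r[row_attr] for r in rows})
    col_values = sorted({r[col_attr] for r in rows})
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    return {rv: {cv: counts[(rv, cv)] for cv in col_values}
            for rv in row_values}


# Hypothetical records: cylinder count and a coarse "good"/"bad" mpg label.
cars = [
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "bad"},
    {"cylinders": 6, "mpg": "good"},
    {"cylinders": 6, "mpg": "bad"},
    {"cylinders": 6, "mpg": "bad"},
    {"cylinders": 8, "mpg": "bad"},
    {"cylinders": 8, "mpg": "bad"},
]
table = contingency_table(cars, "cylinders", "mpg")
for cylinders, row in table.items():
    print(cylinders, row)
```

A 3-D table follows the same pattern, counting triples of values instead of pairs.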

8 We use a formula to decide which attributes give the highest information gain for what we would like to know. The formula is:
IG(Y|X) = H(Y) - H(Y|X)
where H(Y) is the entropy of Y and H(Y|X) is the conditional entropy of Y given X.
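The formula can be tried out on a small example. The sketch below assumes the usual Shannon entropy in bits, H(Y) = -sum of p(y) log2 p(y), and takes H(Y|X) as the average entropy of Y within each group of X, weighted by group size. The function names and the toy cylinder and mpg values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(Y|X): entropy of ys within each group of xs, weighted by group size."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    total = len(xs)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)


# Toy data: does the cylinder count tell us anything about the mpg label?
cylinders = [4, 4, 4, 4, 8, 8, 8, 8]
mpg_label = ["good", "good", "good", "bad", "bad", "bad", "bad", "bad"]
print(information_gain(cylinders, mpg_label))
```

Computing this value for every candidate attribute and picking the largest is how a decision tree learner chooses which attribute to split on.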

9 http://www.autonlab.org/tutorials/dtree18.pdf
http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
http://www.autonlab.org/tutorials/infogain11.pdf
Chapter 28 from Fundamentals of Database Systems, 6th Edition, by Elmasri and Navathe
Pictures from Andrew W. Moore's slides

