Forecast Anything! The Seven Data Mining Models
Andy Cheung, ISV Developer Evangelist, Microsoft Hong Kong


Agenda: Announcement; Overview; Microsoft Mining Model Algorithms; Lucky Draw!!!

Announcement: Learn Microsoft Technologies and Win a Prize! To make it easier for you to learn Microsoft technologies, we have changed the way we deliver seminar content: we now offer Offline Webcast CDs. 3 CDs in 6 months: 3 topics and an assessment. If you pass the assessment criteria, you will receive a $150 Park’n Shop cash coupon! Since this is a trial offer, the number of participants is limited to 50, on a first-come, first-served basis. Register now by sending an email to the Microsoft Macau Team at macaugp@microsoft.com!

Data Mining Overview: Microsoft Data Mining Algorithms

Microsoft Mining Model Algorithms: Decision Trees, Naive Bayes, Cluster Analysis, Sequence Clustering, Association Rules, Time Series, Neural Networks

Decision Trees: Classify each case into one of a few discrete, broad categories of the selected attribute. The tree is built by recursive partitioning: split the data into partitions, then split each partition further. Initially, all cases are in one big box.

Decision Trees: The algorithm tries every possible split, using every possible value of each input attribute, and selects the split that partitions the data into the purest classes of the target (predicted) attribute. Several measures of purity can be used, such as entropy. The algorithm then repeats the process for each new partition, again testing every possible split. Branches that do not help can be pre-pruned (never grown) or post-pruned (removed after the tree is built).
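The split search described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's implementation: it assumes one-versus-rest binary splits (attribute == value vs. everything else) and uses entropy as the purity measure; the `customers` data and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1/p). 0 means a pure class."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def best_split(rows, target):
    """Try every value of every input attribute as a binary split and return
    (weighted_entropy, attribute, value) for the purest one (lowest entropy)."""
    best = None
    inputs = [a for a in rows[0] if a != target]
    for attr in inputs:
        for value in sorted({r[attr] for r in rows}):  # sorted for a deterministic order
            left = [r[target] for r in rows if r[attr] == value]
            right = [r[target] for r in rows if r[attr] != value]
            if not left or not right:
                continue  # one side empty: not a real split
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best

# Hypothetical churn data: contract type separates the classes perfectly, age does not.
customers = [
    {"age": "young", "contract": "monthly", "churn": "yes"},
    {"age": "young", "contract": "yearly", "churn": "no"},
    {"age": "senior", "contract": "monthly", "churn": "yes"},
    {"age": "senior", "contract": "yearly", "churn": "no"},
]
score, attr, value = best_split(customers, "churn")
```

Here the split on `contract` yields two pure partitions (weighted entropy 0), so the tree would branch on it first and then recurse into each partition.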

Decision Trees: Decision trees are used for classification and prediction. Typical uses: predicting which customers will leave; supporting mailing and promotion campaigns; explaining the reasons for a decision; answering questions such as “What movies do young female customers like to buy?”

Microsoft Mining Models

Naïve Bayes: A classification and prediction model. It calculates the probability of each possible state of each input attribute, given each state of the predictable attribute.
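The counting behind Naïve Bayes can be sketched as follows. This is a simplified illustration, not Microsoft's implementation: it assumes discrete attributes, add-one (Laplace) smoothing, and the usual "naive" independence of inputs given the class. The `loans` data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count each class, and each input-attribute state per class."""
    priors = Counter(r[target] for r in rows)
    joint = defaultdict(Counter)  # (attribute, class) -> counts of attribute states
    states = defaultdict(set)     # attribute -> all states seen in training
    for r in rows:
        for attr, state in r.items():
            if attr != target:
                joint[(attr, r[target])][state] += 1
                states[attr].add(state)
    return priors, joint, states

def posterior(model, case):
    """P(class | case), multiplying P(state | class) over independent inputs."""
    priors, joint, states = model
    total = sum(priors.values())
    scores = {}
    for cls, n_cls in priors.items():
        score = n_cls / total  # prior P(class)
        for attr, state in case.items():
            # add-one smoothing so an unseen state does not zero out the product
            score *= (joint[(attr, cls)][state] + 1) / (n_cls + len(states[attr]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical bank loan applications
loans = [
    {"income": "high", "history": "good", "approved": "yes"},
    {"income": "high", "history": "bad", "approved": "yes"},
    {"income": "low", "history": "good", "approved": "yes"},
    {"income": "low", "history": "bad", "approved": "no"},
    {"income": "low", "history": "bad", "approved": "no"},
]
post = posterior(train(loans, "approved"), {"income": "low", "history": "bad"})
```

For a low-income applicant with a bad history, the unnormalized scores are 0.6 × 0.4 × 0.4 = 0.096 for "yes" and 0.4 × 0.75 × 0.75 = 0.225 for "no", so the model assigns about 70% probability to "no".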

Naïve Bayes: Used for classification, that is, assigning new cases to predefined classes. Some typical uses: categorizing bank loan applications; determining which home telephone lines are used for Internet access; assigning customers to predefined segments; quickly gaining a basic understanding of the data.

Cluster Analysis: Groups data into clusters so that objects within a cluster are highly similar, based on their attribute values. The class label of each object is not known in advance. Several techniques exist: partitioning methods, hierarchical methods, density-based methods, model-based methods, and more.
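As an example of a partitioning method, here is a minimal k-means sketch. It assumes 2-D numeric points and explicitly supplied starting centroids (so the result is deterministic); it is a generic illustration of the technique, not the Microsoft Clustering implementation. The `points` data is hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its members; repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # recompute centroids; keep the old one if a cluster ends up empty
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers described by (visits per month, average spend)
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, [(1, 1), (9, 9)])
```

The two groups separate after the first iteration, with the second centroid settling at (8.5, 8.5). Note that the class labels are never supplied; the grouping comes only from attribute similarity.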

Cluster Analysis: Segments a heterogeneous population into a number of more homogeneous subgroups, or clusters. Some typical uses: discovering distinct groups of customers; identifying groups of houses in a city; deriving animal and plant taxonomies in biology.

Sequence Clustering: Analyzes sequence-oriented data that contains discrete-valued series. The sequence attribute in each series holds a set of events in a specific order, and that ordering is what the algorithm models. It is typically used for Web customer analysis, but it can be used for any other sequential data.

Sequence Clustering: Click-Stream Analysis
User 1: frontpage, news, travel, travel
User 2: news, news, news, news, news
User 3: frontpage, news, frontpage, news, frontpage
User 4: news, news
User 5: frontpage, news, news, travel, travel, travel
User 6: news, weather, weather, weather, weather
User 7: news, health, health, business, business, business
User 8: frontpage, sports, sports, sports, weather
User 9: weather
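One common way to model ordered click paths is a first-order Markov chain, where the next page depends only on the current one. The sketch below estimates transition probabilities from the click streams above. It is an assumption-laden simplification: it treats all nine users as one cluster and omits the clustering step entirely, so it shows only the sequence-modeling half of sequence clustering. The function name is hypothetical.

```python
from collections import Counter, defaultdict

# Click streams from the slide above
clicks = {
    1: ["frontpage", "news", "travel", "travel"],
    2: ["news", "news", "news", "news", "news"],
    3: ["frontpage", "news", "frontpage", "news", "frontpage"],
    4: ["news", "news"],
    5: ["frontpage", "news", "news", "travel", "travel", "travel"],
    6: ["news", "weather", "weather", "weather", "weather"],
    7: ["news", "health", "health", "business", "business", "business"],
    8: ["frontpage", "sports", "sports", "sports", "weather"],
    9: ["weather"],
}

def transition_probabilities(sequences):
    """Estimate P(next page | current page) from consecutive page pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, c in counts.items():
        total = sum(c.values())
        probs[state] = {nxt: n / total for nxt, n in c.items()}
    return probs

probs = transition_probabilities(clicks.values())
```

From "frontpage", users moved to "news" 4 times and to "sports" once, so P(news | frontpage) = 0.8. A full sequence-clustering model would estimate a separate chain like this for each cluster of users.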

Association Rules: Used for market basket analysis, for example to identify cross-selling opportunities or to arrange attractive packages. The algorithm considers each attribute/value pair as an item; an item set is a combination of items in a single transaction. It scans through the dataset to find item sets that tend to appear in many transactions.
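The scan for frequent item sets can be sketched by brute force: count every item combination of a given size in each transaction and keep those above a minimum support. This is an illustrative assumption, not Microsoft's algorithm; practical implementations prune the search (for example with the Apriori property, which skips any item set containing an infrequent subset). The `baskets` data and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, size, min_support):
    """Return {item_set: support} for all item sets of `size` items whose
    support (fraction of transactions containing them) is >= min_support."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}

# Hypothetical market baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
]
pairs = frequent_itemsets(baskets, 2, 0.5)
```

Only bread and butter appear together in at least half of the baskets (2 of 4), so they form the only frequent pair and a candidate for a cross-selling rule.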

Association Rules – Support: Support is the percentage of rows that contain the item combination, relative to the total number of rows:
Transaction 1: frozen pizza, cola, milk
Transaction 2: milk, potato chips
Transaction 3: cola, frozen pizza
Transaction 4: milk, pretzels
Transaction 5: cola, pretzels
Cola and frozen pizza appear together in 2 of the 5 transactions (1 and 3), so the support for the rule “If a customer purchases cola, then they will purchase frozen pizza” is 40%.

Association Rules – Confidence: Support alone can mislead. What if 100% of customers buy milk, but only 20% of them also buy potato chips? The confidence of an association rule is the support for the combination divided by the support for the condition. In the transactions above, milk and potato chips appear together in 1 of 5 transactions (20%), and milk appears in 3 of 5 (60%), so the rule “If a customer purchases milk, they will purchase potato chips” has a confidence of 20% / 60% = 33%.
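Both measures are simple ratios over the transaction list, so the slide's two numbers can be checked directly. A minimal sketch, with item names lowercased and function names chosen for illustration:

```python
# The five transactions from the Support slide
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]

def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(condition, result, transactions):
    """Support of the combination divided by support of the condition."""
    return support(condition | result, transactions) / support(condition, transactions)

cola_pizza = support({"cola", "frozen pizza"}, transactions)          # 2 of 5
milk_chips = confidence({"milk"}, {"potato chips"}, transactions)     # (1/5) / (3/5)
```

Support is symmetric (cola then pizza and pizza then cola have the same support), while confidence depends on which side is the condition; that asymmetry is why both are reported.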

Time Series: Predicts continuous columns, such as product sales or stock performance, in a forecasting scenario. It builds a model in two stages. The first stage creates a list of optimal candidate input columns; the second stage investigates each candidate input column and determines whether it improves the model.
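The two-stage column selection above is specific to Microsoft's algorithm and is not reproduced here. As a minimal sketch of the underlying forecasting idea only, the code below fits a first-order autoregressive model, y[t] = a + b·y[t-1], by ordinary least squares and rolls it forward. The model choice, the `sales` data, and the function names are all assumptions for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def forecast(series, steps):
    """Predict `steps` future values, feeding each prediction back in."""
    a, b = fit_ar1(series)
    history = list(series)
    for _ in range(steps):
        history.append(a + b * history[-1])
    return history[len(series):]

# Hypothetical monthly sales that grow by 2 each month
sales = [10, 12, 14, 16, 18]
next_two = forecast(sales, 2)
```

For this series the fit is a = 2, b = 1, so the next two forecasts are 20 and 22. Real sales data would add noise, seasonality, and other candidate input columns, which is where a model-selection stage becomes useful.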

Neural Network: A data modeling tool that can capture and represent complex input/output relationships. Neural networks resemble the human brain in two ways: a neural network acquires knowledge through learning, and its knowledge is stored in inter-neuron connection strengths known as synaptic weights. It can explore a very wide range of relationships in the data, but training is slow.

Back-Propagation: Training a neural network means finding the best weights on the inputs of each unit. The back-propagation process: take a training example and calculate the outputs; calculate the error, that is, the difference between the calculated and the expected (known) result; propagate the error back through the network and adjust the weights to minimize it. Repeat for each training example.
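The three steps above map directly onto code. Below is a minimal sketch of a tiny network (2 inputs, 2 hidden sigmoid units, 1 sigmoid output) trained by back-propagation on squared error. The architecture, learning rate, seed, and the logical-AND training data are assumptions chosen to keep the example small; it is not the Microsoft Neural Network implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output."""
    def __init__(self, seed=0):
        rng = random.Random(seed)
        # each unit's weight list is [w_input0, w_input1, bias]
        self.hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
        self.output = [rng.uniform(-1, 1) for _ in range(3)]

    def forward(self, x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in self.hidden]
        o = sigmoid(self.output[0] * h[0] + self.output[1] * h[1] + self.output[2])
        return h, o

    def train_step(self, x, target, lr=0.5):
        h, o = self.forward(x)                 # 1. calculate outputs
        error = target - o                     # 2. calculate the error
        delta_o = error * o * (1 - o)          # error gradient through the output sigmoid
        # 3. propagate the error back to each hidden unit (uses pre-update weights)
        delta_h = [delta_o * self.output[j] * h[j] * (1 - h[j]) for j in range(2)]
        # adjust weights in the direction that reduces the squared error
        for j in range(2):
            self.output[j] += lr * delta_o * h[j]
        self.output[2] += lr * delta_o
        for j in range(2):
            self.hidden[j][0] += lr * delta_h[j] * x[0]
            self.hidden[j][1] += lr * delta_h[j] * x[1]
            self.hidden[j][2] += lr * delta_h[j]
        return error

def mean_squared_error(net, data):
    return sum((t - net.forward(x)[1]) ** 2 for x, t in data) / len(data)

# Hypothetical training set: logical AND
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
net = TinyNet()
before = mean_squared_error(net, AND)
for _ in range(2000):
    for x, t in AND:
        net.train_step(x, t)
after = mean_squared_error(net, AND)
```

Each pass over the examples nudges every weight against its share of the error, so the mean squared error falls as training proceeds. The many repeated passes also illustrate why the slide calls neural networks slow.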

Conclusion: When To Use What
Classification (assign cases to predefined classes). Examples: credit risk analysis, churn analysis, customer retention. Algorithms: Decision Trees, Naive Bayes, Neural Nets.
Segmentation (taxonomy for grouping similar cases). Examples: customer profile analysis, mailing campaign. Algorithms: Clustering, Sequence Clustering.
Association (advanced counting for correlations). Examples: market basket analysis, advanced data exploration. Algorithms: Decision Trees, Association.
Time Series Forecasting (predict the future). Examples: forecast sales, predict stock prices. Algorithms: Time Series.
Prediction (predict a value for a new case based on values for similar cases). Examples: quote insurance rates, predict customer income. Algorithms: All.
Deviation analysis (discover how a case or segment differs from others). Examples: credit card fraud detection, network intrusion analysis. Algorithms: All.
