Data Mining Techniques: Cluster Analysis, Induction, Neural Networks, OLAP, Data Visualization


1 Data Mining Techniques: Cluster Analysis, Induction, Neural Networks, OLAP, Data Visualization

2 Association Rule An association rule states an association relationship among a set of objects in a database, such as “occur together” or “one implies the other”. Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X => Y, where X and Y are sets of items. Intuitively, the rule means that transactions in the database that contain X tend to also contain Y.

3 Support The support of an itemset S is the percentage of transactions in the transaction set T that contain S. If U is the set of all transactions that contain all items in S, then support(S) = (|U| / |T|) * 100%, where |U| and |T| are the numbers of elements in U and T, respectively.
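The support formula can be sketched in a few lines of Python. This is only an illustration: the function name `support` and the five-transaction database `T` are invented here and do not come from the slides.

```python
def support(itemset, transactions):
    """Return support(S) = |U| / |T| * 100, where U is the set of
    transactions that contain every item of S."""
    if not transactions:
        raise ValueError("the transaction database must not be empty")
    s = frozenset(itemset)
    # U: all transactions that contain all items in S.
    u = [t for t in transactions if s <= t]
    return 100.0 * len(u) / len(transactions)


# Hypothetical transaction database T (made up for this example).
T = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "cheese"}),
    frozenset({"bread", "ham"}),
    frozenset({"cheese", "coke"}),
    frozenset({"bread", "butter", "salt"}),
]

print(support({"bread", "butter"}, T))  # 3 of 5 transactions -> 60.0
```

Storing each transaction as a `frozenset` makes the "contains all items in S" check a single subset test (`s <= t`).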

4 Confidence The confidence of a candidate rule X => Y is calculated as support(X ∪ Y) / support(X). It represents the percentage of transactions containing the items in X that also contain the items in Y.
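As a rough illustration, the confidence of a rule can be computed directly from transaction counts. The helper names (`count`, `support`, `confidence`) and the 20-transaction database below are assumptions made for this sketch; the data was chosen so that bread => butter has 60% support and 80% confidence.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t)


def support(itemset, transactions):
    """Support of an itemset as a percentage of all transactions."""
    return 100.0 * count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X => Y: support(X u Y) / support(X), as a percentage.

    The division by |T| cancels out, so plain counts are used."""
    x, y = frozenset(x), frozenset(y)
    x_count = count(x, transactions)
    if x_count == 0:
        raise ValueError("no transaction contains X; confidence is undefined")
    return 100.0 * count(x | y, transactions) / x_count


# Hypothetical database: 12 baskets with bread and butter,
# 3 with bread only, 5 with cheese and coke.
T = (
    [frozenset({"bread", "butter"})] * 12
    + [frozenset({"bread"})] * 3
    + [frozenset({"cheese", "coke"})] * 5
)

print(support({"bread", "butter"}, T))       # 12 of 20 -> 60.0
print(confidence({"bread"}, {"butter"}, T))  # 12 of 15 -> 80.0
```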

5 Example: Association Rule In a store we might have the set of items I = {cheese, ham, bread, butter, salt, coke}. A transaction could look like t = {bread, butter} for a customer who bought bread and butter. An association rule could be bread => butter with support 60% and confidence 80%: 60% of all customers bought both bread and butter, and 80% of the customers who bought bread also bought butter.

6 Apriori Algorithm Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets. Use the frequent itemsets to generate the desired rules.

7 Apriori Algorithm (cont’d)
Pass 1
1. Generate the candidate itemsets in C_1
2. Save the frequent itemsets in L_1
Pass k
1. Generate the candidate itemsets in C_k from the frequent itemsets in L_{k-1}
   a. Join L_{k-1} with L_{k-1}, as follows:
      insert into C_k
      select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1}
      from L_{k-1} p, L_{k-1} q
      where p.item_1 = q.item_1, ..., p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}

8 Apriori Algorithm (cont’d)
   b. Generate all (k-1)-subsets of the candidate itemsets in C_k
   c. Prune every candidate itemset from C_k for which some (k-1)-subset is not in the frequent itemsets L_{k-1}
2. Scan the transaction database to determine the support for each candidate itemset in C_k
3. Save the frequent itemsets in L_k
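The passes described above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than a reference implementation: the function names `apriori` and `generate_rules`, the use of fractions (0 to 1) for the thresholds, and the sample database are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support fraction} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Scan the database once and keep candidates with enough support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Pass 1: every single item is a candidate (C_1); keep the frequent ones (L_1).
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = sorted(tuple(sorted(s)) for s in level)  # L_{k-1}, items ordered
        candidates = set()
        for i, p in enumerate(prev):
            for q in prev[i + 1:]:
                # Join: first k-2 items equal; ordering gives p[-1] < q[-1].
                if p[:-1] == q[:-1]:
                    cand = frozenset(p + (q[-1],))
                    # Prune: every (k-1)-subset must itself be frequent.
                    if all(frozenset(s) in level for s in combinations(cand, k - 1)):
                        candidates.add(cand)
        level = keep_frequent(candidates)  # L_k
        frequent.update(level)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (X, Y, support, confidence) for rules X => Y above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for x in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[x]  # subsets of a frequent set are frequent
                if conf >= min_confidence:
                    rules.append((x, itemset - x, sup, conf))
    return rules


# Hypothetical transaction database (made up for this example).
T = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "ham"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

frequent = apriori(T, min_support=0.4)
for x, y, sup, conf in generate_rules(frequent, min_confidence=0.7):
    print(sorted(x), "=>", sorted(y), f"support={sup:.0%} confidence={conf:.0%}")
```

The sketch rescans all transactions for every candidate, which is fine for a toy database; real implementations typically use hash trees or similar structures to speed up the counting step.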

9 Smart Web Search Agents
Data Search Engines >> Information Search Agents
- Traditional searching on the Web uses one of the following three:
  - Directories (Yahoo, Lycos, etc.)
  - Search Engines (AltaVista, NorthernLight, etc.)
  - Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.)
All of these involve keyword searches. Drawbacks: not easily personalized; too many results (although many give relevancy factors).

10
- local cache databases (containing frequently asked queries/results; possibly updated periodically, e.g. nightly)
- local cache information base (containing mined information and discovered knowledge for efficient personal use)
- domain-based agents (e.g. Job Search; Sports: NBA Stats; Bibliography: Digital Libraries)

11 Intelligent Tools for E-Business
- Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems
- Learning Algorithms, Heuristic Searching
- Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery
- Prediction & Time Series Analysis
- Information Retrieval, Intelligent User Interface
- Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems

12 Enhancing E-Business Process Through Data Mining
Quality of discovered knowledge
- Having the right data
- Having appropriate data mining tools
Traditional Data Mining Tools
- Simple query and reporting
- Visualization-driven data exploration tools, OLAP
- Discovery process is user-driven

13 Intelligent Data Mining Tools
- Automate the process of discovering patterns/knowledge in data
- Require hypothesis, exploration
- Derive business knowledge (patterns) from data
- Combine business knowledge of users with results of discovery algorithms

14 Intelligent Information Agents
The Data Mining Problem:
- Clustering/Classification
- Association
- Sequencing
Viewed as an Optimization Problem
Tools: Genetic Algorithms

15 Fuzzy Rule Discovery Rule discovery is the discovery of associations between business events, i.e. which items are purchased together. To support flexible querying and intelligent searching, fuzzy queries are used to uncover potentially valuable knowledge. A fuzzy query uses fuzzy terms such as tall, small, and near to define linguistic concepts and formulate a query. The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segments in the data.
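One simple way to picture a fuzzy query is to attach a membership function to a linguistic term such as "tall" and return the records whose membership degree reaches a threshold. The sketch below is only an assumption about how such a query might look: the linear ramp shape, the 160 cm and 190 cm breakpoints, the sample records, and the function names are all invented for illustration.

```python
def ramp(x, low, high):
    """Membership degree rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm):
    """Degree to which a height counts as 'tall' (breakpoints are assumed)."""
    return ramp(height_cm, 160.0, 190.0)


def fuzzy_query(records, membership, threshold):
    """Return (name, degree) pairs with degree >= threshold, best match first."""
    scored = [(name, membership(value)) for name, value in records]
    matches = [(name, degree) for name, degree in scored if degree >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical records: (name, height in cm).
people = [("Ann", 158), ("Bob", 175), ("Cy", 192)]

print(fuzzy_query(people, tall, threshold=0.5))  # [('Cy', 1.0), ('Bob', 0.5)]
```

Unlike a crisp condition such as `height > 180`, the query returns a degree of match for each record, so results can be ranked instead of merely filtered.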

