

Chase Repp

Data mining (knowledge discovery): searching, analyzing, and sifting through large data sets to find new patterns, trends, and relationships contained within them.

Data mining differs from database querying: a database query asks “Which company purchased $100,000 worth of widgets last year?”, while data mining asks “Which company is likely to purchase over $100,000 of widgets next year, and why?”

The term was coined in the 1960s. Early data mining was used to find basic information from collections of data, such as total revenue over the last three years. Roots: classic statistics, artificial intelligence, and machine learning.

Predictive data mining: target value, future trends.
Descriptive data mining: no target value, focuses on relations.

Predictive data mining focuses on discovering relationships among independent variables and between dependent and independent variables. It is used to forecast specific outcomes.

Descriptive data mining describes a data set in a brief but comprehensive way and highlights interesting characteristics of the data without any predefined target. The focus is on relations.

Association: patterns are discovered based on the relationship of a specific item with other items in the same transaction. Descriptive. Example: groceries.

Classification: assigns each item in a data set to one of a predefined set of classes or groups. Often used with machine learning. Predictive. Example: cat or dog person?

Clustering: unlike classification, the clustering technique also defines the classes itself and then puts objects in them. Descriptive. Example: a library.

Regression: used to predict numbers from data sets that have known target values. Predictive. Example: sales, distance, temperature, value, etc.

Sequential pattern mining: discovers frequent sequences or subsequences as patterns in a sequence database. Descriptive. Derived from association mining.
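
As a rough illustration of what a "frequent subsequence" is, the sketch below counts how many sequences in a small database contain a candidate pattern in order (gaps allowed). The database `sequence_db`, the helper names `is_subsequence` and `support`, and the threshold `min_support` are made up for illustration and do not come from the slides.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


def support(pattern, sequence_db):
    """Fraction of sequences in the database that contain `pattern` as a subsequence."""
    hits = sum(1 for sequence in sequence_db if is_subsequence(pattern, sequence))
    return hits / len(sequence_db)


# Hypothetical sequence database: each string is one ordered sequence of events.
sequence_db = ["abcd", "acbd", "abd", "bca"]
min_support = 0.5  # assumed threshold, chosen only for this example

for pattern in ["ab", "abd", "ca"]:
    s = support(pattern, sequence_db)
    label = "frequent" if s >= min_support else "not frequent"
    print(pattern, s, label)
```

Here "ab" and "abd" are contained in three of the four sequences, while "ca" is contained in only one, so only the first two pass the assumed threshold.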

The main sequential pattern mining techniques fall into three categories: Apriori-based, pattern-growth, and early-pruning.

Apriori-based techniques follow the Apriori property: all nonempty subsets of a frequent itemset must also be frequent. If {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets. Examples: AprioriAll, GSP, PSP, and SPAM.

Transaction data:
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes
Assume: minsup = 30%, minconf = 80%
An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7, about 43%]
Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
……
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
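
The support and confidence figures in this example can be checked with a few lines of Python. This is a minimal sketch: the transactions are copied from the example, while the function names `support` and `confidence` are chosen here for illustration. Exact fractions are used so that 3/7 is not rounded.

```python
from fractions import Fraction

# Transactions t1..t7, copied from the example.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs):
    """Support of the whole rule divided by the support of its left-hand side."""
    return support(set(lhs) | set(rhs)) / support(lhs)


print(support({"Chicken", "Clothes", "Milk"}))
print(confidence({"Clothes"}, {"Milk", "Chicken"}))
print(confidence({"Clothes", "Chicken"}, {"Milk"}))
```

The itemset appears in t5, t6, and t7, which gives the support of 3/7. Clothes appears only in those same three transactions, which is why both rules reach a confidence of 3/3.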

Two steps:
1. Find all itemsets that have minimum support (frequent itemsets).
2. Use the frequent itemsets to generate rules.
E.g., a frequent itemset {Chicken, Clothes, Milk} [sup = 3/7] and one rule from the frequent itemset: Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
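
The second step can be sketched as follows: split a frequent itemset into every possible left-hand side and right-hand side, and keep the rules whose confidence reaches minconf. This is an illustrative sketch that assumes the seven grocery transactions from the example and a minconf of 80%; the names `count` and `rules_from_itemset` are hypothetical, and the itemset passed in is assumed to be frequent already.

```python
from itertools import combinations

# Transactions t1..t7 from the grocery example.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(itemset, minconf):
    """Return (lhs, rhs, confidence) for every rule lhs -> rhs meeting minconf."""
    itemset = frozenset(itemset)
    whole = count(itemset)
    rules = []
    # Try every nonempty proper subset as the left-hand side.
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = whole / count(lhs)
            if conf >= minconf:
                rules.append((sorted(lhs), sorted(itemset - lhs), conf))
    return rules


rules = rules_from_itemset({"Chicken", "Clothes", "Milk"}, minconf=0.8)
for lhs, rhs, conf in rules:
    print(", ".join(lhs), "->", ", ".join(rhs), f"[conf = {conf:.2f}]")
```

With these assumptions, three rules survive, including the two quoted in the example. Rules with only Chicken or only Milk on the left-hand side fall below 80% because those items also appear in transactions without Clothes.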

Dataset T (minsup = 50%):

TID   Items
T100  1, 3, 4
T200  2, 3, 5
T300  1, 2, 3, 5
T400  2, 5

Format: itemset:count
1. scan T → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
   → F1: {1}:2, {2}:3, {3}:3, {5}:3
   → C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
2. scan T → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
   → F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
   → C3: {2,3,5}
3. scan T → C3: {2,3,5}:2
   → F3: {2,3,5}
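
The level-wise scan above can be reproduced with a compact Apriori sketch. It is a simplified illustration, not an optimized implementation: candidates are built by taking unions of frequent itemsets from the previous level and then pruned with the Apriori property, which yields the same candidate sets as the walkthrough on this data set. The function name `apriori` and its parameters are chosen here for illustration.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {frequent itemset: count} for all itemsets meeting minsup."""
    min_count = minsup * len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]  # C1
    frequent = {}
    k = 1
    while candidates:
        # Scan T: count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}  # Fk
        frequent.update(level)
        k += 1
        # Join: unions of two frequent itemsets that have exactly k items.
        unions = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in unions
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Dataset T from the walkthrough (T100, T200, T300, T400).
T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(T, minsup=0.5)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is where the Apriori property pays off: {1,2,3} and {1,3,5} are never counted against T because {1,2} and {1,5} were already found to be infrequent.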

Pattern-growth techniques use a divide-and-conquer strategy to focus the search on a restricted portion of the initial database and generate as few candidate sequences as possible. Examples: FreeSpan, PrefixSpan, WAP-mine, and FS-Miner.
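
To give a feel for the divide-and-conquer idea, here is a much simplified PrefixSpan-style sketch. It assumes that every element of a sequence is a single item (no itemsets inside a sequence), and it represents sequences as strings for brevity. The function name `prefixspan`, the example database, and the threshold are all illustrative assumptions and do not reproduce the published algorithm in full.

```python
from collections import Counter


def prefixspan(sequences, min_count):
    """Return (pattern, support count) pairs for all frequent sequential patterns."""
    patterns = []

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item in sorted(counts):
            if counts[item] < min_count:
                continue
            pattern = prefix + item
            patterns.append((pattern, counts[item]))
            # Projected database: what follows the first occurrence of `item`.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            # Recurse on the smaller, restricted database only.
            mine(pattern, next_projected)

    mine("", sequences)
    return patterns


# Hypothetical sequence database; each string is one ordered sequence.
sequence_db = ["abcd", "acbd", "abd", "bca"]
found = prefixspan(sequence_db, min_count=3)
for pattern, n in found:
    print(pattern, n)
```

Each recursive call works only on the suffixes that follow the current prefix, so the search is restricted to a shrinking portion of the database and no candidate sequences are generated up front.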

Early-pruning techniques use a form of position induction to prune candidate sequences very early in the mining process and to avoid support counting as much as possible. Examples: LAPIN, HVSM, and DISC-all.

Web mining searches for patterns in data through content mining (search engines), structure mining (hyperlinks; HITS / PageRank), and usage mining (users' browser data and submitted forms).

 One use is for finding user navigational patterns on the World Wide Web by extracting knowledge from web logs

An example of applying sequential pattern mining:
S = {a, b, c, d, e, f}
[P1, ] [P2, ] [P3, ] [P4, ]
Frequent pattern of abac

Visual data mining combines traditional mining methods with information visualization techniques; the user is directly involved. VDMS: simplicity, reliability, reusability, availability, and security.
