Berendt: Advanced databases, winter term 2007/08, 1 Advanced databases – Inferring implicit/new.

Presentation transcript:

Berendt: Advanced databases, winter term 2007/08
Advanced databases – Inferring implicit/new knowledge from data(bases): Tying it all together (a start)
Bettina Berendt, Katholieke Universiteit Leuven, Department of Computer Science
Last update: 6 December 2007

Goal 1 for today: wrap up yesterday's lecture and discussion, and prepare you for the next assignment.

Goal 2 for today: identify "missing links" and point to solution approaches (on the board).

Agenda
* Naïve Bayes [remaining from yesterday]
* Changing representation: LSI [remaining from yesterday]
* Ont.+KDD: Apriori and taxonomies
* KDD+DB: Constrained pattern mining – ex. WUM
* KDD+DB: Inductive databases (very brief)
* KDD+Ont.: Induction and Semantic Web (very brief)

Mining association rules
* Apriori: (slides from D. Delic)
* Mining generalized association rules: (Karlsruhe slides)
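
To make the link between Apriori and taxonomies a little more concrete: the basic approach to generalized association rules described by Srikant and Agrawal extends every transaction with the ancestors of its items, so that an ordinary frequent-itemset miner also counts the higher-level concepts. The sketch below shows only that extension step plus support counting for single items. The taxonomy, the item names, and the helper names are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical taxonomy (child -> parent); not taken from the slides.
PARENT = {
    "jacket": "outerwear",
    "ski_pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}


def ancestors(item, parent=PARENT):
    """Return all ancestors of an item, nearest ancestor first."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def extend_transaction(transaction, parent=PARENT):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def item_support(transactions):
    """Relative support of each single item over the extended transactions."""
    counts = Counter()
    for t in transactions:
        # A set is passed to update(), so each item counts once per transaction.
        counts.update(extend_transaction(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


# Hypothetical toy data.
transactions = [{"jacket", "shoes"}, {"ski_pants"}, {"shirt", "shoes"}, {"shoes"}]
support = item_support(transactions)
```

In this toy data no single outerwear item reaches a support of 0.5, but the concept "outerwear" does, which is the kind of rule candidate a taxonomy makes visible.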

Main interestingness measures of association rules
* Support of a rule A → B = no. of instances with A and B / no. of all instances
* Confidence of a rule A → B = no. of instances with A and B / no. of instances with A = support(A & B) / support(A)
* Lift of a rule A → B = support(A & B) / [support(A) * support(B)]
  * What does this measure, and in what numerical interval can it lie?
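
The three definitions above can be sketched directly in code. This is a minimal illustration on hypothetical toy transactions (the data and function names are assumptions, not part of the lecture). It may also help with the question about lift: as a ratio of relative frequencies it cannot be negative, and a value of 1 corresponds to A and B being statistically independent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, body, head):
    """support(A & B) / support(A); assumes the body occurs at least once."""
    both = set(body) | set(head)
    return support(transactions, both) / support(transactions, body)


def lift(transactions, body, head):
    """support(A & B) / [support(A) * support(B)]; assumes both occur."""
    both = set(body) | set(head)
    return support(transactions, both) / (
        support(transactions, body) * support(transactions, head)
    )


# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"jacket", "umbrella"},
    {"jacket", "umbrella", "boots"},
    {"jacket"},
    {"umbrella"},
    {"boots"},
]

s = support(transactions, {"jacket", "umbrella"})  # 2 of 5 transactions
c = confidence(transactions, {"jacket"}, {"umbrella"})  # 2 of the 3 with jacket
l = lift(transactions, {"jacket"}, {"umbrella"})  # 0.4 / (0.6 * 0.6)
```

Here the lift is slightly above 1, i.e. jacket and umbrella co-occur a little more often than independence would predict.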

Interestingness measures

Interestingness as a constraint
So we are not interested in "show me all patterns", but in "show me all patterns that are interesting", i.e. "that have properties X" → constraints!

Examples from MINERULE

    MINE RULE exemple as
    SELECT DISTINCT 1..n Item as BODY, 1..1 Item as HEAD, SUPPORT, CONFIDENCE
    WHERE HEAD.Item = "umbrellas"    // also other fields, e.g. Date
    FROM Purchase
    GROUP BY Tid
    HAVING COUNT(*) < 6
    EXTRACTING RULES WITH SUPPORT: 0.06, CONFIDENCE: 0.9

E.g., jacket flight_Dublin → umbrellas (0.08, 0.93)
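
As a rough illustration of what this query asks for, the following sketch emulates it by brute force in plain Python. It is not the MINERULE implementation: the function name, its parameters, and the toy rows are hypothetical, and counting support relative to the groups that pass the HAVING filter is an assumption of this sketch.

```python
from collections import defaultdict
from itertools import combinations


def mine_rules(purchases, head_item, max_group_size, min_support, min_confidence):
    """Emulate the query: rules body -> head_item over groups of purchase rows."""
    # GROUP BY Tid
    groups = defaultdict(set)
    rows_per_tid = defaultdict(int)
    for tid, item in purchases:
        groups[tid].add(item)
        rows_per_tid[tid] += 1

    # HAVING COUNT(*) < max_group_size
    kept = [items for tid, items in groups.items()
            if rows_per_tid[tid] < max_group_size]
    n = len(kept)

    body_count = defaultdict(int)  # groups containing the body
    rule_count = defaultdict(int)  # groups containing body and head
    for items in kept:
        others = sorted(items - {head_item})
        # Bodies of 1..n items; exhaustive enumeration is fine for small groups.
        for size in range(1, len(others) + 1):
            for body in combinations(others, size):
                body_count[body] += 1
                if head_item in items:
                    rule_count[body] += 1

    # EXTRACTING RULES WITH SUPPORT / CONFIDENCE thresholds
    rules = []
    for body, hits in rule_count.items():
        s = hits / n
        c = hits / body_count[body]
        if s >= min_support and c >= min_confidence:
            rules.append((body, head_item, s, c))
    return sorted(rules)


# Hypothetical (Tid, Item) rows of a Purchase table.
purchases = [
    (1, "jacket"), (1, "flight_Dublin"), (1, "umbrellas"),
    (2, "jacket"), (2, "flight_Dublin"), (2, "umbrellas"),
    (3, "jacket"), (3, "boots"),
    (4, "boots"),
]
rules = mine_rules(purchases, "umbrellas", max_group_size=6,
                   min_support=0.06, min_confidence=0.9)
```

The point of the example is that every clause of the query acts as a constraint: the head restriction, the group-size filter, and the two thresholds each prune the set of rules that is returned.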

The site
Business understanding / problem definition:
* How do users search in this online catalog?
* Which search criteria are popular?
* Which are efficient?
[Berendt & Spiliopoulou, VLDB Journal 2000]

The concept hierarchies / site ontology (excerpt)
* SEITE1-...LI (1st page of a list) or SEITEn-...LI (further page); "Seite" is German for "page"
* LA ("Land", country or state), SA ("Schulart", school type), SU ("Suche", search)

Sequence mining – one result pattern: successful search for a school in Germany
(Figure labels: a refinement, a repetition, a continuation; one example pattern)

    select t
    from node a b, template a * b as t
    where a.url startswith "SEITE1-" and a.occurrence = 1
    and b.url contains "1SCHULE" and b.occurrence = 1
    and (b.support / a.support) >= 0.2

(Berendt & Spiliopoulou, VLDB J. 2000)
Example URL: /liste.html?offset=920&zeilen=20&anzahl=1323&sprache=de&sw_kategorie=de&erscheint=&suchfeld=&suchwert=&staat=de&region=by&schultyp=
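
The query above can be read as: find pages a and b such that a is followed, after any number of intermediate pages, by b, and at least 20% of the visits through a continue to b. The sketch below approximates this reading on plain session lists. It is a strong simplification and not WUM's algorithm: it scans raw sessions instead of an aggregated log, it treats all pages matching a predicate as one page, and the session data and function names are hypothetical.

```python
def first_index(session, predicate):
    """Index of the first page in the session that satisfies the predicate."""
    for i, url in enumerate(session):
        if predicate(url):
            return i
    return None


def pattern_confidence(sessions, is_a, is_b):
    """Count sessions with a first 'a' page, and those whose first 'b' page follows it."""
    a_support = 0
    b_support = 0
    for session in sessions:
        i = first_index(session, is_a)  # stands in for a.occurrence = 1
        if i is None:
            continue
        a_support += 1
        j = first_index(session, is_b)  # stands in for b.occurrence = 1
        if j is not None and j > i:     # template a * b: b somewhere after a
            b_support += 1
    confidence = b_support / a_support if a_support else 0.0
    return a_support, b_support, confidence


def is_a(url):
    return url.startswith("SEITE1-")


def is_b(url):
    return "1SCHULE" in url


# Hypothetical sessions; the page names only imitate the naming scheme above.
sessions = [
    ["SEITE1-LA-LI", "SEITEn-LA-LI", "1SCHULE"],
    ["SEITE1-SU-LI", "1SCHULE"],
    ["SEITE1-SA-LI", "SEITEn-SA-LI"],
    ["index", "1SCHULE"],
    ["SEITE1-LA-LI"],
]
result = pattern_confidence(sessions, is_a, is_b)
```

With this toy data, four sessions contain a first list page and two of them reach a school page afterwards, so the pattern would pass the 0.2 threshold of the query.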

Sequences

Generalized sequences, navigation patterns, hits in WUM

Aggregated Logs: The basic internal representation in WUM

The confidence measure for generalized sequences

Templates in the query language MINT, g-sequences, and navigation patterns

Interestingness measures: Support (hits) and confidence

Aggregated Logs, queries, and query results

The basic idea of the WUM algorithm

MINT can express 3 types of constraints ("predicates")

The WUM gseqm algorithm (B predicates)

Also for higher-order structures (graphs): Ex. MolFea

The basic idea (on the board)

(One) basic idea (on the board)

Next lecture
* Naïve Bayes [remaining from yesterday]
* Changing representation: LSI [remaining from yesterday]
* Ont.+KDD: Apriori and taxonomies
* KDD+DB: Constrained pattern mining – ex. WUM
* KDD+DB: Inductive databases (very brief)
* KDD+Ont.: Induction and Semantic Web (very brief)
* Applications

References and background reading; acknowledgements
* Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May.
  * (presentation from Delic, D. (2002). Mining Association Rules with Rough Sets and Large Itemsets - A Comparative Study.)
* Ramakrishnan Srikant and Rakesh Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September.
  * (presentation from kassel.de/lehre/ss2004/kdd/folien/4Folie_VII.3_Assoziationsregeln.pdf)
* P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July.
* MINERULE: R. Meo, G. Psaila, and S. Ceri. An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, Vol. 2 (2).
* WUM and the Schulweb study: Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9.
* MolFea (esp. the example): S. Kramer, L. De Raedt, and C. Helma. Molecular Feature Mining in HIV Data. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press.
* De Raedt, L. (2002). A perspective on inductive databases. SIGKDD Explorations, Volume 4, Issue 2.