Emerging Trend Detection Shenzhi Li. Introduction What is an Emerging Trend? –An Emerging Trend is a topic area for which one can trace the growth of.

Presentation transcript:

Emerging Trend Detection Shenzhi Li

Introduction
What is an Emerging Trend?
–An emerging trend is a topic area for which one can trace the growth of interest and utility over time.
–Example: “XML”, a technology that emerged in the mid-1990s.
Goals
–Teach students how to do a literature search.
–Offer students ways to go beyond the knowledge presented in the coursework by exploring current research trends.

Semi-automatic methodology
Step 1
–Decide on a domain area to search in.
Step 2
–Find the well-known journals, conferences, or workshops in this domain area.
–Read papers to find several candidate emerging trends.
Step 3
–Find more evidence for the candidate trends.
Step 4
–Verify the candidate trends using the INSPEC database.

Semi-automatic methodology
# Define searchTerm = "Candidate Emerging Trend" "constraining term"
# Define L2 = an empty list used to store the candidate emerging trends
Click on link = 1 // first link of interest in the search results
While (there are more links from the search engine retrieved pages && the desired number of candidate trends has not been found) {
    # Define m = frequency of "searchTerm" in page
    # Define n = sum of all the frequencies of helper terms in page
    if ("Year of page" in range of [current year - 4, current year]) {
        if (m >= 2 && n >= 2) {
            Accept the page;
            Add "searchTerm" to L2 if it is a candidate emerging trend;
            Look for the phrases with the highest frequency of occurrence. Add them to L2 if they qualify as candidate emerging trends (use domain knowledge);
            Give special attention to the line (or paragraph) containing the pattern "constraining term helper term". Add phrases appearing in that line (or paragraph) that are judged to be candidate emerging trends to L2 (use domain knowledge)
        } else {
            Reject the page;
        }
    } else {
        Reject the page;
    }
    Click on link++ or exit // click on next link of interest or exit
}
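The accept/reject test at the core of the procedure above can be sketched in Python. This is only an illustration under stated assumptions: the `Page` type, the helper-term list, and plain substring counting are hypothetical stand-ins, and the steps that call for domain knowledge (judging which phrases are candidate trends) are left to the human reader, as in the slides.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical container for one retrieved page."""
    text: str
    year: int


def count_occurrences(text: str, phrase: str) -> int:
    """Case-insensitive, non-overlapping substring count.

    Assumption: substring counting is a crude stand-in for term frequency,
    so "new" would also match inside "newer".
    """
    return text.lower().count(phrase.lower())


def accept_page(page: Page, search_term: str, helper_terms: list[str],
                current_year: int, min_m: int = 2, min_n: int = 2,
                window: int = 4) -> bool:
    """Apply the year test, then the m >= 2 and n >= 2 frequency test."""
    # Reject pages outside [current year - 4, current year].
    if not (current_year - window <= page.year <= current_year):
        return False
    # m: frequency of the search term in the page.
    m = count_occurrences(page.text, search_term)
    # n: summed frequency of all helper terms in the page.
    n = sum(count_occurrences(page.text, term) for term in helper_terms)
    return m >= min_m and n >= min_n
```

A page that passes this filter would then be read by the student, who decides which of its frequent phrases qualify as candidate emerging trends.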

Improvements
Literature search
–How to form queries
–How to judge the quality of sources
–How to assess the authority of documents
–Introduce several search engines

Improvements - tools

Noun Phrase Extraction
Fuzzy Phrase Matching
–Searches with a combination of part-of-speech tags and exact words to find candidate trends that appear in certain patterns.
–Example: “JJ+programming” (JJ is the adjective tag) returns phrases like “object-oriented programming”.
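A pattern such as “JJ+programming” can be matched against text that has already been part-of-speech tagged. The sketch below is not the tool described in the slides: the `+`-separated pattern syntax is taken from the example above, while the small `KNOWN_TAGS` set, the function names, and the pre-tagged input format are assumptions made for illustration.

```python
# Assumption: a small subset of Penn Treebank tags is enough for the demo.
KNOWN_TAGS = {"JJ", "NN", "NNS", "NNP", "VB", "VBG"}


def element_matches(element: str, word: str, tag: str) -> bool:
    """A pattern element is either a POS tag or an exact word."""
    if element in KNOWN_TAGS:
        return tag == element
    return word.lower() == element.lower()


def find_phrases(tagged_tokens: list[tuple[str, str]], pattern: str) -> list[str]:
    """Return every run of consecutive tokens that matches the pattern."""
    elements = pattern.split("+")
    size = len(elements)
    matches = []
    # Slide a window of the pattern's length over the token sequence.
    for start in range(len(tagged_tokens) - size + 1):
        window = tagged_tokens[start:start + size]
        if all(element_matches(el, word, tag)
               for el, (word, tag) in zip(elements, window)):
            matches.append(" ".join(word for word, _ in window))
    return matches


tagged = [("We", "PRP"), ("teach", "VBP"), ("object-oriented", "JJ"),
          ("programming", "NN"), ("and", "CC"), ("functional", "JJ"),
          ("programming", "NN")]
phrases = find_phrases(tagged, "JJ+programming")
```

In practice the tagged tokens would come from a part-of-speech tagger; here they are written by hand so the example stays self-contained.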

Experiments
Methodology
–Students were randomly split into two groups of roughly equal size.
–Students from both groups, A and B, were expected to have attended the class lectures and to have introductory knowledge of the main topic area before participating in the experiment.
–All students had access to their textbooks, reference books, and the handouts given in class.
–Only Group B had access to the multimedia tutorial, which introduces the methodology for the algorithmic identification of an emerging trend and the tools designed to help with searching.

Experiment
Metrics
–Precision
–Recall was not used: we did not have the resources to obtain a complete list of emerging trends at the time of this experimental evaluation, nor was it our pedagogical goal to have students retrieve all trends.
Results
–Conducted three experiments
–At a 95% confidence level, Group B in both classes performed significantly better than Group A.
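Precision here is the fraction of a student's submitted trends that are judged to be genuine emerging trends. A minimal sketch follows; the function name and the sample trend names are hypothetical and are not data from the experiments.

```python
def precision(submitted: list[str], verified: set[str]) -> float:
    """Fraction of distinct submitted trends that appear in the verified set."""
    distinct = set(submitted)
    if not distinct:
        # Assumption: an empty submission scores 0 rather than raising.
        return 0.0
    return len(distinct & verified) / len(distinct)


# Hypothetical submission: three of the four trends are verified.
score = precision(
    ["XML", "Grid computing", "Punch cards", "Web services"],
    {"XML", "Grid computing", "Web services"},
)
```

Recall would need the complete list of true trends in the denominator, which is exactly the list the slides say was not available.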

Experiment
Methodology
–R O1 X O2
–R O1 O2
(R: random assignment; O1: pretest; X: treatment, here the multimedia tool; O2: posttest)
Results
–There was no difference on the pretest between the two randomly assigned groups.
–The multimedia group showed a significant improvement in scores from the pretest to the posttest.
–The posttest scores of the multimedia group were higher than those of the control group.
–There was a greater increase in learning for the group that used the multimedia tool than for the group that only attended the lecture.

Lattice
Definition
–Let $P$ be a set. An order (or partial order) on $P$ is a binary relation $\le$ on $P$ such that, for all $x, y, z \in P$:
  $x \le x$
  $x \le y$ and $y \le x$ imply $x = y$
  $x \le y$ and $y \le z$ imply $x \le z$
–Let $P$ be a non-empty ordered set. If $x \vee y$ and $x \wedge y$ exist for all $x, y \in P$, then $P$ is called a lattice. If $\bigvee S$ and $\bigwedge S$ exist for all $S \subseteq P$, then $P$ is called a complete lattice.

Fuzzy Lattice
Definition: A fuzzy lattice is a pair $(L, \mu(x,y))$, where $L$ is a conventional lattice and $\mu: S \to [0,1]$ is a fuzzy membership function on the universe of discourse $S = \{(x,y) : x, y \in L\}$. It holds that $\mu(x,y) = 1$ if and only if $x \le y$ in $L$. $\mu$ is called an inclusion measure.
Definition: Let $h: L \to \mathbb{R}$ be a function on a complete lattice $L$ that satisfies the following three properties:
–$h(O) = 0$, where $O$ is the least element in $L$
–$u < w \Rightarrow h(u) < h(w)$, for $u, w \in L$
–$u \le w \Rightarrow h(x \vee w) - h(x \vee u) \le h(w) - h(u)$, for $x, u, w \in L$
The inclusion measure of a lattice is then defined as $k(x,y) = h(y) / h(x \vee y)$.
V. Petridis and V.G. Kaburlasos. "Clustering and Classification in Structured Data Domains Using Fuzzy Lattice Neurocomputing". IEEE Trans on Knowledge and Data Engineering, Vol 12, No 2, March, 2001.
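As a concrete instance, take the lattice of subsets of a finite set, ordered by inclusion, with set union as the join. Choosing h as set cardinality is an assumption made for this sketch, not a choice stated in the slides: cardinality is zero on the empty set, grows strictly under proper inclusion, and satisfies the third property, so k(x, y) = h(y) / h(x ∨ y) can be computed directly.

```python
def h(a: frozenset) -> int:
    """Assumed valuation: the number of elements in the set."""
    return len(a)


def inclusion(x: frozenset, y: frozenset) -> float:
    """k(x, y) = h(y) / h(x v y), with set union as the lattice join."""
    join = x | y
    if not join:
        # Both sets are empty, so x is included in y; avoid dividing by zero.
        return 1.0
    return h(y) / h(join)


full = inclusion(frozenset("ab"), frozenset("abc"))      # x is a subset of y
partial = inclusion(frozenset("abc"), frozenset("ab"))   # x is not a subset of y
```

With this choice of h, the measure equals 1 exactly when x is a subset of y and falls toward 0 as x contains more elements outside y.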

Formal Concept Analysis
A context is a triple $(G, M, I)$ where $G$ and $M$ are sets and $I \subseteq G \times M$. The elements of $G$ and $M$ are called objects and attributes respectively.
For $A \subseteq G$ and $B \subseteq M$, define
–$A' = \{ m \in M \mid (\forall g \in A)\ gIm \}$,
–$B' = \{ g \in G \mid (\forall m \in B)\ gIm \}$.
A concept of the context $(G, M, I)$ is defined to be a pair $(A, B)$ where $A \subseteq G$, $B \subseteq M$, $A' = B$ and $B' = A$. The set of all concepts of the context $(G, M, I)$ is denoted by $\mathfrak{B}(G, M, I)$.
For concepts $(A_1, B_1)$ and $(A_2, B_2)$ in $\mathfrak{B}(G, M, I)$ we write $(A_1, B_1) \le (A_2, B_2)$, and say that $(A_1, B_1)$ is a subconcept of $(A_2, B_2)$, or that $(A_2, B_2)$ is a superconcept of $(A_1, B_1)$, if $A_1 \subseteq A_2$ (which is equivalent to $B_2 \subseteq B_1$).
$(\mathfrak{B}(G, M, I); \le)$ is a complete lattice. It is known as the concept lattice of the context $(G, M, I)$.
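The two derivation operators and the concept test translate directly into set operations. The sketch below uses a tiny hypothetical context (objects 1 to 3, attributes a, b, c) chosen only for illustration, and stores the relation I as a mapping from each object to its attribute set.

```python
# Hypothetical context: each object g maps to the attributes m with g I m.
CONTEXT = {1: {"a", "b"}, 2: {"a"}, 3: {"b", "c"}}
ATTRIBUTES = {"a", "b", "c"}


def common_attributes(objects: set) -> frozenset:
    """A': the attributes shared by every object in A."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def common_objects(attributes: set) -> frozenset:
    """B': the objects that have every attribute in B."""
    return frozenset(g for g, row in CONTEXT.items() if set(attributes) <= row)


def is_concept(objects: set, attributes: set) -> bool:
    """(A, B) is a concept when A' = B and B' = A."""
    return (common_attributes(objects) == frozenset(attributes)
            and common_objects(attributes) == frozenset(objects))
```

For example, ({1, 2}, {a}) is a concept of this context, while ({2}, {a}) is not, because every object carrying attribute a must belong to the extent.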

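Lattice vs. Association Rules
Transaction | A | B | C | D | E | F
1           | * | * | * |   |   |
2           | * | * | * | * |   |
3           |   |   | * |   | * |
4           |   | * |   | * |   | *
5           |   |   | * | * | * |
Concept lattice nodes, written as itemset, (transactions):
∅, (12345)
C, (1235); B, (124); D, (245)
ABC, (12); CE, (35); BD, (24); CD, (25)
ABCD, (2); BDF, (4); CDE, (5)
ABCDEF, ∅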
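The link between the concept lattice and association rules can be checked by brute force. The sketch below takes the five transactions implied by the node labels on this slide (1: ABC, 2: ABCD, 3: CE, 4: BDF, 5: CDE), enumerates every concept by closing each itemset, and reads support and confidence off the extents. The function names and the exhaustive enumeration are illustrative assumptions, not the algorithm proposed in the slides.

```python
from itertools import combinations

TRANSACTIONS = {
    1: frozenset("ABC"),
    2: frozenset("ABCD"),
    3: frozenset("CE"),
    4: frozenset("BDF"),
    5: frozenset("CDE"),
}
ITEMS = frozenset("ABCDEF")


def extent(itemset) -> frozenset:
    """Transactions that contain every item in the itemset."""
    items = frozenset(itemset)
    return frozenset(t for t, row in TRANSACTIONS.items() if items <= row)


def intent(tids) -> frozenset:
    """Items shared by every transaction in tids."""
    shared = set(ITEMS)
    for t in tids:
        shared &= TRANSACTIONS[t]
    return frozenset(shared)


def all_concepts() -> set:
    """Close every subset of ITEMS; each distinct closure is one concept."""
    concepts = set()
    for size in range(len(ITEMS) + 1):
        for combo in combinations(sorted(ITEMS), size):
            tids = extent(combo)
            concepts.add((intent(tids), tids))
    return concepts


def support(itemset) -> float:
    """Fraction of transactions containing the itemset."""
    return len(extent(itemset)) / len(TRANSACTIONS)


def confidence(lhs, rhs) -> float:
    """Confidence of the rule lhs -> rhs; assumes lhs occurs at least once."""
    return support(set(lhs) | set(rhs)) / support(lhs)
```

Each lattice node pairs a closed itemset with the transactions that support it, so the support of a rule's itemset is simply the size of the node's extent divided by the number of transactions.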

Lattice vs. Graph
The power set of a graph G is lattice-ordered; the corresponding lattice ordering, lattice meet, and lattice join are conventional set inclusion, set intersection, and set union. The power set of G is therefore a complete lattice, whose least and greatest elements are the empty set and the master graph.

Conclusions
–Apply fuzzy concept lattices to building association rules.
–Embed a time element and latent semantics into the concept lattice to model the term-document matrix.
–Develop algorithms that mine the fuzzy concept lattice to detect emerging trends.