

1 Generating Semantic Annotations for Frequent Patterns with Context Analysis
Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, ChengXiang Zhai
University of Illinois at Urbana-Champaign
November 3, 2015

2 Frequent Patterns
Frequent pattern mining ([Agrawal & Srikant 94] and many others) takes a transaction database (ABE, CDE, ABF, CDEF, ABEF, …) and outputs frequent patterns such as AB, ABE, ABF, C, CD, CDE, EF, DE, CE, D, AE, BE, BF, AF, …
Patterns come in many kinds:
– Itemsets: {diaper, milk}; {camera, film}; …
– Sequential patterns: "… Mining Closed Frequent Graph Patterns …"; "… Mining Graph and Structured Patterns in …"
– Subgraph patterns: …
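To make "frequent" concrete, here is a brute-force sketch that mines all itemsets from the slide's toy database (ABE, CDE, ABF, CDEF, ABEF). The 40% minimum-support threshold and the function names are assumptions for illustration; practical miners such as Apriori or FP-growth prune the search space rather than enumerating every combination.

```python
from itertools import combinations

# Toy transaction database from the slide; each transaction is a set of items.
DB = [set("ABE"), set("CDE"), set("ABF"), set("CDEF"), set("ABEF")]


def support(pattern, db):
    """Fraction of transactions that contain every item of `pattern`."""
    return sum(1 for t in db if pattern <= t) / len(db)


def frequent_itemsets(db, min_sup):
    """Enumerate all itemsets with support >= min_sup (brute force, toy-sized data only)."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            s = support(frozenset(combo), db)
            if s >= min_sup:
                result["".join(combo)] = s
                found_any = True
        if not found_any:  # anti-monotonicity: no frequent k-itemset means no larger one
            break
    return result


freq = frequent_itemsets(DB, min_sup=0.4)
print(freq["AB"])  # 0.6
```

AB is contained in three of the five transactions, which matches the Sup = 60% reported for AB on later slides.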

3 Toward Understanding the Patterns: Find Canonical Patterns
[Figure: the frequent patterns mined from the database (ABE, CDE, ABF, CDEF, ABEF, …) are summarized by a few canonical patterns such as CDEF, with the values 1.0, 0.9, 0.8 attached]
(Yan et al. '05; Xin et al. '05)

4 Toward Understanding the Patterns: How to Interpret Patterns?
Example patterns: {diaper, beer}; "female sterile (2) tekele"
– Do they all make sense? What do they mean? How are they useful?
– Mining provides morphological info. and simple statistics; what is missing is semantic information.
– Not all frequent patterns are useful, only those with meanings…
Our goal: annotate patterns with semantic information.

5 Challenges
– How can we represent the semantics of a frequent pattern? (What should we annotate a pattern with?)
– How can we infer pattern semantics? (How do we annotate?)
– How can we do it in a general way? (For all kinds of patterns)
– Once such annotations are generated, what can we use them for? (Applications)

6 A Dictionary Analogy
Dictionary entry for the word "pattern" (from Merriam-Webster):
– Non-semantic info.
– Definitions indicating semantics
– Examples of usage
– Synonyms
– Related words

7 What about a "Pattern Dictionary"? Semantic Pattern Annotation (SPA)
Word: "pattern"
  Non-semantic: function; pronunciation; date; etc.
  Definitions: A form or model proposed for …
  Examples: a dressmaker's pattern; a pattern of dissent
  Synonyms: design, device, motif, motive, …
  Related words: original, constellation, …
Pattern: "latent semantic analysis"
  Non-semantic: sequential; closed; sup = 0.1%
  Context indicators (CI): "indexing", "semantic", "S. Dumais", "singular value decomposition", …
  Representative transactions: index by latent semantic analysis; probablist latent semantic analysis
  Semantically similar patterns (SSP): "latent semantic indexing", "LSA", "PLSA"

8 How Can We Generate Such an Entry?
Database: ABE, CDE, ABF, CDEF, ABEF, …
Frequent patterns: P1: AB; P2: CD; P3: …; …; Pn: …
Semantic annotation for P1:
  Pattern: AB
  Non-semantic: Sup = 60%
  CI: AB, E, F, EF, …
  Trans.: ABE; ABEF
  SSPs: CD; …
(and likewise for Pattern: CD, …)
Question: how do we infer the semantics of a frequent pattern?
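A structured entry like this maps naturally onto a record type. The sketch below is only an illustration: the class name `PatternAnnotation` and its field names are hypothetical, and the values are copied from the slide's example for pattern AB.

```python
from dataclasses import dataclass, field


@dataclass
class PatternAnnotation:
    """Hypothetical record for one 'pattern dictionary' entry."""
    pattern: str
    non_semantic: dict = field(default_factory=dict)                 # e.g. {"Sup": "60%"}
    context_indicators: list = field(default_factory=list)           # CI
    representative_transactions: list = field(default_factory=list)  # Trans.
    similar_patterns: list = field(default_factory=list)             # SSPs

    def render(self) -> str:
        """Format the entry in the slide's layout, one field per line."""
        non_sem = ", ".join(f"{k} = {v}" for k, v in self.non_semantic.items())
        return "\n".join([
            f"Pattern: {self.pattern}",
            f"Non-semantic: {non_sem}",
            "CI: " + ", ".join(self.context_indicators),
            "Trans.: " + "; ".join(self.representative_transactions),
            "SSPs: " + "; ".join(self.similar_patterns),
        ])


entry = PatternAnnotation(
    pattern="AB",
    non_semantic={"Sup": "60%"},
    context_indicators=["AB", "E", "F", "EF"],
    representative_transactions=["ABE", "ABEF"],
    similar_patterns=["CD"],
)
print(entry.render())
```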

9 Continue the Analogy…
"You shall know a word by the company it keeps." (Firth 1957)
Likewise, you'll know the meaning of a pattern by its context:
– Word contexts: "Data … association … pattern … MINE … algorithm …" vs. "mountain … Africa … diamond … MINE … weight …"
– Pattern contexts: {C, D}: { … Printer, Film, Camera, Lens, … }; {A, B}: { … Baby, Milk, Diaper, Toy, Soymilk, … }

10 Our Approach: Model the Context
[Figure: database (ABE, CDE, ABF, CDEF, ABEF, …) → frequent patterns (P1: AB, P2: CD, …, Pn) → context units → semantic annotations, e.g., for AB: Sup = 60%; CI: AB, E, F, EF; Trans.: ABE; ABEF; SSPs: CD; …]
Context units = objects co-occurring with p

11 Semantic Analysis with Context Models
– Task 1: Model the context of a frequent pattern
Based on the context model…
– Task 2: Extract the strongest context indicators
– Task 3: Extract representative transactions
– Task 4: Extract semantically similar patterns

12 Task 1: Context Modeling - A Vector Space Model
The context of each frequent pattern (P1: AB, P2: CD, …, Pn) is represented as a vector over the context units.
– Context unit weight: co-occurrence, mutual information, …
– Context similarity: cosine similarity, Pearson coefficient, …
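A minimal sketch of this vector space model, assuming the single items of the toy database serve as context units. Weights are either co-occurrence counts or the mutual information between the binary events "pattern occurs" and "unit occurs"; cosine similarity compares two context vectors. The unit choice and helper names are illustrative assumptions, not the authors' implementation.

```python
import math

# Toy database from the slides: T1..T5 = ABE, CDE, ABF, CDEF, ABEF.
DB = [set("ABE"), set("CDE"), set("ABF"), set("CDEF"), set("ABEF")]
UNITS = ["A", "B", "C", "D", "E", "F"]  # assumed context units: single items


def occurs(pattern, t):
    """True if every item of `pattern` appears in transaction `t`."""
    return set(pattern) <= t


def cooccurrence(p, u, db):
    """Weight = number of transactions containing both p and u."""
    return sum(1 for t in db if occurs(p, t) and occurs(u, t))


def mutual_information(p, u, db):
    """Mutual information (natural log) between the indicators 'p occurs' and 'u occurs'."""
    n = len(db)
    mi = 0.0
    for xp in (True, False):
        for xu in (True, False):
            joint = sum(1 for t in db if occurs(p, t) == xp and occurs(u, t) == xu) / n
            if joint == 0:
                continue  # 0 * log 0 is taken to be 0
            mp = sum(1 for t in db if occurs(p, t) == xp) / n
            mu = sum(1 for t in db if occurs(u, t) == xu) / n
            mi += joint * math.log(joint / (mp * mu))
    return mi


def context_vector(p, db, units=UNITS, weight=cooccurrence):
    """Context model of pattern p: one weight per context unit."""
    return [weight(p, u, db) for u in units]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


ab = context_vector("AB", DB)  # [3, 3, 0, 0, 2, 2]
cd = context_vector("CD", DB)  # [0, 0, 2, 2, 2, 1]
print(round(cosine(ab, cd), 3))
```

Passing `weight=mutual_information` gives the MI-weighted model instead. Note that MI is also high for strong negative association (AB and C never co-occur here, yet their MI equals that of AB and A), which is one reason the choice of weighting matters.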

13 Context Unit Selection
Example transactions: t1 = {diaper, milk, babywear, lotion}; t2 = {camera, memory stick, printer}
Valid context units:
– Single items: diaper, milk, printer, …
– Transactions: t1, t2, …
– Itemsets: {milk, lotion}, …
In general, context units are frequent patterns.

14 Context Unit Selection: Redundancy Removal
Problem: there are too many valid context units, and most are redundant
– { diaper, milk, babywear }: "diaper", "diaper, milk", "milk, babywear", "milk, lotion", …
Solution:
– use closed patterns
– micro-clustering (hierarchical or one-pass) with Jaccard distance (γ: threshold to stop clustering)
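The slide omits the distance formula. A common choice, assumed here, measures the Jaccard distance between two patterns over the sets of transactions that contain them: D(p, q) = 1 - |T(p) ∩ T(q)| / |T(p) ∪ T(q)|. The sketch below implements only the one-pass variant: each context unit joins the first cluster whose representative lies within distance γ, and otherwise starts a new cluster. Function names and γ = 0.4 are illustrative.

```python
# Toy database from the slides (T1..T5).
DB = [set("ABE"), set("CDE"), set("ABF"), set("CDEF"), set("ABEF")]


def supporting(pattern, db):
    """Indices of the transactions that contain every item of `pattern`."""
    return frozenset(i for i, t in enumerate(db) if set(pattern) <= t)


def jaccard_distance(p, q, db):
    """1 - |T(p) & T(q)| / |T(p) | T(q)| over supporting transaction sets."""
    a, b = supporting(p, db), supporting(q, db)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def one_pass_microcluster(units, db, gamma):
    """Greedy one-pass clustering; the first member of each cluster is its representative."""
    clusters = []
    for u in units:
        for cluster in clusters:
            if jaccard_distance(u, cluster[0], db) <= gamma:
                cluster.append(u)
                break
        else:  # no representative close enough: start a new cluster
            clusters.append([u])
    return clusters


units = ["AB", "A", "B", "ABE", "CD", "CDE", "C", "E", "EF"]
clusters = one_pass_microcluster(units, DB, gamma=0.4)
print([c[0] for c in clusters])  # ['AB', 'CD', 'E', 'EF']
```

Only the representatives would then be kept as context units. A hierarchical variant would typically merge the closest pair of clusters repeatedly until the smallest remaining distance exceeds γ.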

15 Task 2: Extract Context Indicators
Context unit weighting for pattern AB: AB 3.0; EF 2.0; ABE 1.0; …
The highest-weighted context units become the context indicators: CI: AB, EF, ABE, …
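Extracting context indicators amounts to sorting the context units by weight and keeping the top k. This sketch assumes co-occurrence counts as weights and a small hand-picked candidate list; both are illustrative choices, so the scores differ from the slide's example weights.

```python
# Toy database from the slides (T1..T5).
DB = [set("ABE"), set("CDE"), set("ABF"), set("CDEF"), set("ABEF")]


def cooccurrence(p, u, db):
    """Number of transactions containing every item of both p and u."""
    return sum(1 for t in db if set(p) <= t and set(u) <= t)


def context_indicators(p, units, db, k):
    """Top-k context units for pattern p; ties broken alphabetically for determinism."""
    scored = [(u, cooccurrence(p, u, db)) for u in units]
    scored.sort(key=lambda uw: (-uw[1], uw[0]))
    return [(u, w) for u, w in scored[:k] if w > 0]  # drop units never seen with p


candidates = ["E", "F", "AB", "EF", "ABE", "ABF", "CD"]
print(context_indicators("AB", candidates, DB, k=6))
```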

16 Task 3: Extract Representative Transactions
Pattern AB and each transaction are represented as vectors over the same context units, e.g., AB = (3.0, 0, …, 2.0, …, 1.0) and T1 = (1.0, 0, …, 1.0, …, 1.0).
Transactions are ranked by semantic similarity to AB: T5 0.8; T1 0.6; T3 0.6; …
Resulting annotation: Trans.: ABEF; ABE
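The ranking step can be sketched as follows, assuming single items as context units, co-occurrence weights for the pattern's vector, and a 0/1 vector per transaction (1 if the unit occurs in it). Cosine similarity stands in for the slide's "semantic similarity"; all of these are assumptions chosen for illustration.

```python
import math

# Toy database from the slides, keyed by transaction id.
DB = {"T1": set("ABE"), "T2": set("CDE"), "T3": set("ABF"), "T4": set("CDEF"), "T5": set("ABEF")}
UNITS = ["A", "B", "C", "D", "E", "F"]  # assumed context units


def pattern_vector(p, db):
    """Co-occurrence weight of each context unit with pattern p."""
    return [sum(1 for t in db.values() if set(p) <= t and u in t) for u in UNITS]


def transaction_vector(t):
    """Binary indicator of which context units a transaction contains."""
    return [1 if u in t else 0 for u in UNITS]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def representative_transactions(p, db, k):
    """Top-k transactions by cosine similarity to p's context vector (stable on ties)."""
    pv = pattern_vector(p, db)
    scored = [(tid, cosine(pv, transaction_vector(t))) for tid, t in db.items()]
    scored.sort(key=lambda item: -item[1])
    return scored[:k]


for tid, score in representative_transactions("AB", DB, k=3):
    print(tid, round(score, 2))
```

On this toy data the ordering (T5, then T1 and T3) happens to agree with the slide, although the scores themselves do not.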

17 Task 4: Extract Semantically Similar Patterns
Each frequent pattern has its own context vector, e.g., AB = (3.0, 0, …, 2.0, …, 1.0) and another pattern = (0, 3.0, …, 2.0, …, 0.5).
Other frequent patterns (P2: CD, …, Pk: EF) are ranked by semantic similarity to AB: CD 0.7; BF 0.5; EF 0.3; …
Resulting annotation: SSPs: CD; …
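Semantically similar patterns can be found the same way: compute a context vector for each candidate pattern and rank candidates by cosine similarity to the target. This sketch keeps the assumptions of single-item context units and co-occurrence weights, so it does not reproduce the slide's illustrative scores; in particular CD scores low here because AB and CD never occur in the same transaction of this tiny database.

```python
import math

# Toy database from the slides (T1..T5).
DB = [set("ABE"), set("CDE"), set("ABF"), set("CDEF"), set("ABEF")]
UNITS = ["A", "B", "C", "D", "E", "F"]  # assumed context units


def context_vector(p, db):
    """Co-occurrence count of each context unit with pattern p."""
    return [sum(1 for t in db if set(p) <= t and u in t) for u in UNITS]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def similar_patterns(target, candidates, db):
    """Rank candidate patterns by context similarity to `target`, most similar first."""
    tv = context_vector(target, db)
    scored = [(c, cosine(tv, context_vector(c, db))) for c in candidates if c != target]
    return sorted(scored, key=lambda item: -item[1])


for pattern, score in similar_patterns("AB", ["CD", "BF", "EF"], DB):
    print(pattern, round(score, 2))
```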

18 Experiments
Three different real-world applications:
– Annotating DBLP title/author patterns
– Motif/Gene Ontology (GO) matching
– Gene synonym extraction
Goals: study the effectiveness of the proposed SPA methods, and explore applications of SPA to different real-world tasks.

19 Annotating DBLP Co-authorship and Title Patterns
Database: titles and authors, e.g., "Substructure Similarity Search in Graph Databases" (X. Yan, P. Yu, J. Han), …
Frequent patterns:
– P1: { x_yan, j_han } (frequent itemset)
– P2: "substructure search" (frequent sequential pattern)
Semantic annotation for { x_yan, j_han }:
  Non-semantic: Sup = …
  CI: {p_yu}, graph pattern, …
  Trans.: gSpan: graph-base …
  SSPs: { j_wang }, {j_han, p_yu}, …

20 DBLP Results: Frequent Itemset
Pattern = {xifeng_yan, jiawei_han}
Annotations:
– Context indicators (CI): graph; {philip_yu}; mine close; graph pattern; index approach; sequential pattern; …
– Representative transactions (Trans): gSpan: graph-base substructure pattern mining; mining close relational graph connect constraint; …
– Semantically similar patterns (SSP): {jiawei_han, philip_yu}; {jian_pei, jiawei_han}; {jiong_yang, philip_yu, wei_wang}; …

21 DBLP Results: Frequent Sequential Pattern
Pattern = "Information … retrieval"
Annotations:
– Context indicators (CI): {w_bruce_croft}; web information; full text; {monika_rauch_hezinger}; {james_p_callan}; …
– Representative transactions (Trans): web information retrieval; language model information retrieval
– Semantically similar patterns (SSP): information use; web information; probabilistic information; information filter; text information; …

22 Motif-GO Matching
[Figure: sequences 1–3 contain motifs (motif1–motif5) and are annotated with GO terms 1–5; which GO terms describe motif2?]
– Motif: a subsequence pattern in the sequences
– Gene Ontology (GO) terms: annotate the functionality of sequences and motifs

23 Motif-GO Matching (Cont.)
Database: protein sequences, each labeled with GO terms (e.g., GOTerm1; GOTerm2; GOTerm3 / GOTerm3 / …)
Frequent patterns:
– P1: Motif1 (sequential pattern)
– P2: GOTerm2 (single-item pattern)
Semantic annotation for Motif1:
  CI: GOTerm1, GOTerm3, …
  SSPs: GOTerm1, GOTerm2, …
Motif-GO matching: Motif1 → GOTerm1, GOTerm2

24 Motif/GO Matching: Evaluation
– Gold standard generated by human experts
– Measure: mean reciprocal rank (MRR)
  – Reflects ranking accuracy (the higher the better)
  – 1/rank (0.5 means the correct answer is ranked 2nd)
Results (MRR):
Ranking strategy | Mutual information weights | Co-occurrence weights
Context indicators | 0.5877 | 0.6064
SSPs | 0.4017 | 0.4681
Random selection baseline: 0.0023
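For reference, MRR averages over all queries the reciprocal rank of the first correct answer, with a query contributing 0 when no correct answer is returned. The GO-term identifiers below are made-up placeholders, not data from the experiments.

```python
def reciprocal_rank(ranked, gold):
    """1/rank of the first item in `ranked` that belongs to `gold`, else 0."""
    for i, item in enumerate(ranked, start=1):
        if item in gold:
            return 1.0 / i
    return 0.0


def mean_reciprocal_rank(results):
    """`results` holds one (ranked_list, gold_set) pair per query."""
    return sum(reciprocal_rank(r, g) for r, g in results) / len(results)


# Hypothetical motif queries with placeholder GO-term ids.
results = [
    (["GO:x1", "GO:x2", "GO:x3"], {"GO:x1"}),  # correct at rank 1 -> 1.0
    (["GO:x4", "GO:x5", "GO:x6"], {"GO:x5"}),  # correct at rank 2 -> 0.5
    (["GO:x7", "GO:x8"], {"GO:x9"}),           # not retrieved    -> 0.0
]
print(mean_reciprocal_rank(results))  # 0.5
```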

25 Gene Synonym Extraction
Gene synonyms:
– A sequential pattern in the textual database
– Matching gene synonyms: a challenging and important new problem in mining biology data
– Analogy: a thesaurus, or the synonyms in a dictionary
Example:
  Gene_id: FBgn0001000
  Gene synonyms: female sterile 2 tekele; fs 2 sz 10; tek; fs 2 tek; tekele; …

26 Gene Synonym Extraction (Cont.)
Database: biomedical sentences, e.g., "… D. melanogaster gene Female sterile (2) Tekele …", "… Female sterile (2) Tekele, abbreviated as Fs(2)Tek …"
Frequent patterns: P1: female sterile (2) tekele (sequential pattern); P2: Fs(2)Tek (sequential pattern)
Context units: context units can be single words or sequential patterns.
Semantic annotation for "female sterile (2) tekele": SSPs: Fs(2)Tek, female sterile, fs 2 sz 10, …
Matched synonyms: female sterile (2) tekele; Fs(2)Tek; fs 2 sz 10; female sterile; …
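As a simplifying assumption, sequential-pattern context units over sentences can be approximated by contiguous word n-grams whose support is the number of sentences containing them (general sequential patterns may also contain gaps). The sketch uses the two example sentences from the slide; the crude tokenizer, which lowercases and drops commas, is a hypothetical simplification.

```python
from collections import Counter


def tokenize(sentence):
    """Lowercase, drop commas, split on whitespace (a deliberately crude tokenizer)."""
    return sentence.lower().replace(",", "").split()


def frequent_ngrams(sentences, min_support, max_n=5):
    """Contiguous n-grams (n <= max_n) occurring in at least `min_support` sentences."""
    support = Counter()
    for s in sentences:
        tokens = tokenize(s)
        grams = {
            " ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)
        }
        support.update(grams)  # a set, so each sentence counts at most once
    return {g: c for g, c in support.items() if c >= min_support}


sentences = [
    "D. melanogaster gene Female sterile (2) Tekele",
    "Female sterile (2) Tekele, abbreviated as Fs(2)Tek",
]
grams = frequent_ngrams(sentences, min_support=2)
print(max(grams, key=lambda g: len(g.split())))  # female sterile (2) tekele
```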

27 Gene Synonym Extraction: Results
– Effective: MRR > 0.5
– Frequent patterns as context units >> single words
– Micro-clustering is useful
[Figure: running time and MRR for hierarchical vs. one-pass micro-clustering]

28 Conclusions
– A novel problem: semantic pattern annotation
– A structured annotation for frequent patterns
– A general method based on context modeling
– A general post-processing procedure for frequent pattern mining, applicable to any type of pattern
– Applicable to, and effective for, quite different tasks
Future work:
– Tune for specific tasks
– Better context unit weights, redundancy removal, etc.

29 Thanks and Questions

