Finding Associations in Collections of Text 99419-511 김유환.

1 Finding Associations in Collections of Text 99419-511 김유환

2 Introduction
–The need to develop tools that help users access and understand large quantities of multimodal information
–Nontrivial extraction of implicit, previously unknown, and potentially useful information from data
–KDT (Knowledge Discovery from Text)

3 The FACT System Architecture
Three sources of information:
–Knowledge sources: background knowledge, given as unary and binary predicates over the keywords labeling the documents; a thesaurus (synonym dictionary)
–GUI
–Text collections: documents must either already be labeled with a set of keywords, or be fed through a text categorization system that augments them with such keywords

5 Associations
FACT focuses on the task of finding associations in collections of text.
–r = {t_1, …, t_n}: collection of documents
–R = {I_1, …, I_m}: set of keywords
–t(A) = 1: A is one of the keywords labeling document t
–(X): the set of all documents t_i that are labeled with (at least) all the keywords in X
–X is called a σ-cover if |(X)| ≥ σ
–W => B: an association over r; of all documents that are labeled with the keywords in W, at least a proportion γ are also labeled with the keywords in B
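
The definitions on this slide can be made concrete with a small sketch. It assumes documents are stored as a mapping from a document id to its set of keyword labels; the toy collection `DOCS` and the function names `covered`, `is_cover`, and `confidence` are hypothetical and chosen only for illustration, not taken from FACT.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy collection: document id -> keywords labeling that document.
DOCS: Dict[str, Set[str]] = {
    "d1": {"USA", "Iraq", "crude"},
    "d2": {"USA", "Iraq", "crude"},
    "d3": {"USA", "Iraq", "wheat"},
    "d4": {"Japan", "crude"},
}


def covered(keywords: Iterable[str], docs: Dict[str, Set[str]]) -> Set[str]:
    """Ids of all documents labeled with (at least) every keyword given."""
    wanted = set(keywords)
    return {doc_id for doc_id, labels in docs.items() if wanted <= labels}


def is_cover(keywords: Iterable[str], docs: Dict[str, Set[str]], sigma: int) -> bool:
    """True if at least `sigma` documents carry all the given keywords."""
    return len(covered(keywords, docs)) >= sigma


def confidence(left: Iterable[str], right: Iterable[str],
               docs: Dict[str, Set[str]]) -> float:
    """Fraction of documents labeled with `left` that are also labeled with `right`."""
    left_docs = covered(left, docs)
    if not left_docs:
        return 0.0  # assumption: an association with no supporting documents scores 0
    both = covered(set(left) | set(right), docs)
    return len(both) / len(left_docs)
```

With this toy data, `{"USA", "Iraq"}` labels three documents, two of which are also labeled `crude`, so the association `{USA, Iraq} => {crude}` holds for any proportion threshold up to 2/3.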

6 The Query Language
An association-discovery query specifies:
–What types of keywords are desired on the left-hand and right-hand sides of any found association
–Predicates that any found association must satisfy: unary predicates, and binary predicates, which define relationships between keywords
–Constraints on the size of the various components of the association
–The query syntax is specified by a BNF grammar

7 The Query Language (2)
Find: (5/0.5) c1:country, c2:country => t:topic
Where: c1 ∈ G7, c2 ∈ {Arab League}, t ∉ ExportCommodities(c1)
–At least half of the time, whenever a G7 country and an Arab League country label a document, the document is also labeled by some topic that is not an export commodity of the G7 country, and this occurs at least 5 times in the collection
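
To show how such a query could be evaluated, here is a minimal brute-force sketch. It is not FACT's query engine: the function `run_query`, the toy documents, and the background-knowledge tables (in particular the export commodities) are made up for illustration, and the thresholds are lowered from 5/0.5 to 2/0.5 because the toy collection is tiny.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical background knowledge (toy subsets, not real Factbook data).
G7: Set[str] = {"USA", "Japan"}
ARAB_LEAGUE: Set[str] = {"Iraq", "Egypt"}
TOPICS: Set[str] = {"crude", "wheat"}
EXPORT_COMMODITIES: Dict[str, Set[str]] = {"USA": {"wheat"}, "Japan": set()}

# Hypothetical labeled collection: document id -> keyword labels.
DOCS: Dict[str, Set[str]] = {
    "d1": {"USA", "Iraq", "crude"},
    "d2": {"USA", "Iraq", "crude"},
    "d3": {"USA", "Iraq", "wheat"},
    "d4": {"Japan", "crude"},
}


def support(keywords: Set[str], docs: Dict[str, Set[str]]) -> int:
    """Number of documents labeled with every keyword in `keywords`."""
    return sum(1 for labels in docs.values() if keywords <= labels)


def run_query(docs: Dict[str, Set[str]], min_support: int,
              min_conf: float) -> List[Tuple[str, str, str, int, float]]:
    """Find c1, c2 => t with c1 in G7, c2 in Arab League, t not exported by c1."""
    results = []
    for c1, c2, t in product(sorted(G7), sorted(ARAB_LEAGUE), sorted(TOPICS)):
        # Apply the background-knowledge constraint before counting documents.
        if t in EXPORT_COMMODITIES.get(c1, set()):
            continue
        n_left = support({c1, c2}, docs)
        n_both = support({c1, c2, t}, docs)
        if n_both >= min_support and n_both / n_left >= min_conf:
            results.append((c1, c2, t, n_both, n_both / n_left))
    return results
```

On the toy data, `run_query(DOCS, 2, 0.5)` keeps only the association from USA and Iraq to `crude`; the candidate with `wheat` is rejected by the export-commodity constraint without any document counting.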

8 Query Execution
–Prior knowledge: every subset of a σ-cover is itself a σ-cover
–The set of candidate σ-covers is built incrementally, starting from singleton σ-covers and adding elements to a set as long as the set stays a σ-cover
–Finding associations in the presence of constraints
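
The incremental construction described on this slide can be sketched as a level-wise search. This is a generic sketch of that idea under the assumption that documents are sets of keyword labels; the function name `sigma_covers` and the toy data are hypothetical, and the sketch leaves out FACT's handling of query constraints.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy collection: document id -> keyword labels.
DOCS: Dict[str, Set[str]] = {
    "d1": {"USA", "Iraq", "crude"},
    "d2": {"USA", "Iraq", "crude"},
    "d3": {"USA", "Iraq", "wheat"},
    "d4": {"Japan", "crude"},
}


def sigma_covers(docs: Dict[str, Set[str]], sigma: int) -> List[FrozenSet[str]]:
    """Return every keyword set that labels at least `sigma` documents."""

    def support(keywords: FrozenSet[str]) -> int:
        return sum(1 for labels in docs.values() if keywords <= labels)

    all_keywords = sorted(set().union(*docs.values()))
    # Level 1: singleton covers.
    level = [frozenset([k]) for k in all_keywords if support(frozenset([k])) >= sigma]
    found = list(level)
    while level:
        current = set(level)
        size = len(level[0])
        candidates = set()
        for a in level:
            for b in level:
                union = a | b
                if len(union) != size + 1:
                    continue
                # Every subset of a cover is a cover, so a candidate with a
                # non-cover subset can be discarded without counting documents.
                if all(frozenset(s) in current for s in combinations(union, size)):
                    candidates.add(union)
        level = [c for c in candidates if support(c) >= sigma]
        found.extend(level)
    return found
```

The subset property is what keeps the search small: a larger set is counted against the collection only if all of its immediate subsets already passed the threshold.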

9 Presentation of Associations
–FACT provides a browsing tool that helps the user easily focus on the subset of results that are potentially relevant

10 Applying FACT to Newswire Data
–Data: Reuters data
–Background knowledge: CIA World Factbook
–Ran a series of queries using FACT and compared the CPU time and the number of associations found for each query
–Result: the specification of background-knowledge constraints actually provides information that is exploited by our discovery algorithm, speeding up the association-discovery process

12 Final Remarks
–Better than a database query
–Presents the user with an easy-to-use graphical interface in which discovery tasks can be specified

