Finding Associations in Collections of Text 99419-511 김유환.



Introduction
– The need: develop tools to help users access and understand large quantities of multimodal information
– Knowledge discovery: the nontrivial extraction of implicit, previously unknown, and potentially useful information from data
– KDT (Knowledge Discovery from Text)

The FACT System Architecture
Three sources of information:
– Knowledge sources
  – Background knowledge: unary and binary predicates over the keywords labeling the documents
  – Thesaurus
– GUI
– Text collections
  – Documents must either already be labeled with a set of keywords,
  – or be fed through a text categorization system that augments them with such keywords
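As a rough illustration of what "unary and binary predicates over the keywords" could look like in practice, the sketch below stores a unary predicate as a set of keywords and a binary predicate as a mapping from one keyword to related keywords. The names `UNARY`, `BINARY`, `holds_unary`, and `related` are hypothetical, and the memberships are toy values chosen for illustration, not the contents of the actual knowledge source.

```python
# Hypothetical encoding of background knowledge (an assumption, not FACT's format).
# A unary predicate is the set of keywords for which it holds.
UNARY = {
    "G7": {"usa", "japan", "canada"},
    "ArabLeague": {"egypt", "iraq"},
}

# A binary predicate maps a keyword to the set of keywords related to it.
# The export lists are toy values for illustration only.
BINARY = {
    "ExportCommodities": {
        "canada": {"wheat", "crude"},
        "japan": {"ships"},
    },
}


def holds_unary(predicate, keyword):
    """True if the unary predicate holds for the keyword."""
    return keyword in UNARY.get(predicate, set())


def related(predicate, keyword):
    """Keywords that stand in the binary relation to the given keyword."""
    return BINARY.get(predicate, {}).get(keyword, set())
```

With this layout, a constraint such as "c1 is a G7 country" becomes a set-membership test, and "t is an export commodity of c1" becomes a lookup followed by a membership test.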

Associations
FACT focuses on the task of finding associations in collections of text.
– r = {t1, …, tn}: collection of documents
– R = {I1, …, Im}: set of keywords
– t(A) = 1: A is one of the keywords labeling document t
– (X): the set of all documents ti that are labeled with (at least) all the keywords in X
– X is called a σ-covering if |(X)| >= σ
– W => B: an association over r
  – of all documents that are labeled with the keywords in W, at least a given proportion are also labeled with the keywords in B
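The definitions above can be made concrete with a small sketch. It assumes each document is represented simply as the set of keywords labeling it; the function names `cover`, `is_covering`, and `confidence`, the parameter name `sigma`, and the toy collection are all illustrative choices rather than anything taken from FACT itself.

```python
def cover(docs, keywords):
    """Indices of the documents labeled with (at least) every keyword given."""
    wanted = set(keywords)
    return [i for i, labels in enumerate(docs) if wanted <= labels]


def is_covering(docs, keywords, sigma):
    """True if at least `sigma` documents carry all of the keywords."""
    return len(cover(docs, keywords)) >= sigma


def confidence(docs, left, right):
    """Proportion of documents labeled with `left` that are also labeled with `right`."""
    base = cover(docs, left)
    if not base:
        return 0.0
    both = cover(docs, set(left) | set(right))
    return len(both) / len(base)


# Toy collection: each document is the set of keywords labeling it.
docs = [
    {"usa", "iraq", "crude"},
    {"usa", "iraq", "crude"},
    {"usa", "iraq", "wheat"},
    {"japan", "ships"},
]

print(cover(docs, {"usa", "iraq"}))                  # documents labeled with both
print(is_covering(docs, {"usa", "iraq"}, 3))         # threshold check
print(confidence(docs, {"usa", "iraq"}, {"crude"}))  # strength of {usa, iraq} => {crude}
```

In this toy collection three documents carry both `usa` and `iraq`, and two of those also carry `crude`, so the association {usa, iraq} => {crude} holds for two thirds of the covered documents.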

The Query Language
An association-discovery query specifies:
– what types of keywords are desired on the left-hand and right-hand sides of any found association
– predicates that any found association must satisfy
  – unary predicates
  – binary predicates: define relationships between keywords
– constraints on the size of the various components of the association
The query syntax is given as a BNF grammar.

The Query Language (2)
Find: (5/0.5) c1:country, c2:country => t:topic
Where: c1 ∈ G7, c2 ∈ {Arab League}, t ∉ ExportCommodities(c1)
– Meaning: at least half of the time, whenever a G7 country and an Arab League country label a document, the document is also labeled by some topic that is not an export commodity of the G7 country, and this occurs at least 5 times in the collection
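To show how such a query might be evaluated, here is a brute-force sketch. It reads the pair (5/0.5) as a minimum count of 5 and a minimum proportion of 0.5, and it checks both thresholds separately for each (c1, c2, t) triple; that per-triple reading is an assumption. The constants `G7`, `ARAB_LEAGUE`, `TOPICS`, `EXPORTS`, the function `run_query`, and the data are toy stand-ins, not FACT's query engine or its real background knowledge.

```python
from itertools import product

# Toy background knowledge (illustrative values only).
G7 = {"usa", "japan"}
ARAB_LEAGUE = {"iraq", "egypt"}
TOPICS = {"crude", "wheat", "ships"}
EXPORTS = {"usa": {"wheat"}, "japan": {"ships"}}


def count(docs, keywords):
    """Number of documents labeled with all of the given keywords."""
    return sum(1 for labels in docs if keywords <= labels)


def run_query(docs, min_support, min_conf):
    """Find c1, c2 => t with c1 in G7, c2 in the Arab League, t not an export of c1."""
    found = []
    for c1, c2 in product(sorted(G7), sorted(ARAB_LEAGUE)):
        left = {c1, c2}
        n_left = count(docs, left)
        if n_left == 0:
            continue
        # Only topics that are NOT export commodities of c1 are candidates.
        for t in sorted(TOPICS - EXPORTS.get(c1, set())):
            n_both = count(docs, left | {t})
            if n_both >= min_support and n_both / n_left >= min_conf:
                found.append((c1, c2, t, n_both, n_both / n_left))
    return found


docs = (
    [{"usa", "iraq", "crude"}] * 5
    + [{"usa", "iraq", "wheat"}] * 3
    + [{"japan", "egypt", "ships"}] * 6
)

for c1, c2, t, n, conf in run_query(docs, min_support=5, min_conf=0.5):
    print(f"{c1}, {c2} => {t}  (count={n}, proportion={conf:.3f})")
```

On this toy data the only surviving association is usa, iraq => crude: `wheat` is excluded because it is listed as a US export, and `ships` is excluded for Japan for the same reason.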

Query Execution
– Prior knowledge: every subset of a σ-cover is itself a σ-cover.
– The set of candidate σ-covers is built incrementally, starting from singleton σ-covers and adding elements to a set so long as the set stays a σ-cover.
– Finding associations in the presence of constraints
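The incremental construction described here resembles level-wise (Apriori-style) candidate generation. The sketch below is one way to write it, under the assumption that documents are sets of keywords and that the threshold is a plain document count; `frequent_covers` and `sigma` are illustrative names, and the code does not reproduce FACT's constraint handling.

```python
from itertools import combinations


def frequent_covers(docs, sigma):
    """Return every keyword set that labels at least `sigma` documents."""

    def support(keys):
        return sum(1 for labels in docs if keys <= labels)

    all_keywords = sorted(set().union(*docs))

    # Level 1: singleton keyword sets that meet the threshold.
    level = {frozenset([k]) for k in all_keywords if support(frozenset([k])) >= sigma}
    result = set(level)

    while level:
        size = len(next(iter(level)))
        candidates = set()
        for a in level:
            for b in level:
                merged = a | b
                if len(merged) != size + 1:
                    continue
                # Prune: every subset one element smaller must already qualify,
                # because a subset of a qualifying set always qualifies.
                if all(frozenset(s) in level for s in combinations(merged, size)):
                    candidates.add(merged)
        # Keep only the candidates that really meet the threshold.
        level = {c for c in candidates if support(c) >= sigma}
        result |= level

    return result


docs = [
    {"usa", "iraq", "crude"},
    {"usa", "iraq", "crude"},
    {"usa", "iraq", "wheat"},
    {"japan", "ships"},
]

for keys in sorted(frequent_covers(docs, sigma=2), key=lambda s: (len(s), sorted(s))):
    print(sorted(keys))
```

The pruning step is where the subset property pays off: a candidate is counted against the collection only if all of its smaller subsets already passed, which keeps the number of expensive support counts down.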

Presentation of Associations
– Provide a browsing tool that helps the user easily focus on the subset of results that are potentially relevant

Applying FACT to Newswire Data
– Text collection: Reuters data
– Background knowledge: CIA World Factbook
– Experiment: ran a series of queries using FACT and compared the CPU time and the number of associations found for each query
– Result: specifying background-knowledge constraints provides information that the discovery algorithm exploits, speeding up the association-discovery process

Final Remarks
– Better than a database query
– Presents the user with an easy-to-use graphical interface in which discovery tasks can be specified