Published by Ophelia Booth. Modified over 8 years ago.
1
Knowledge Discovery for a Focused Domain
Scanning of documents and messages of interest to a business, and extraction of relevant facts for knowledge discovery by computer.
Driven by a Brief, a description of the domain of interest.
Based on linguistic and statistical analysis of text.
Supported by Lexica and Knowledge Bases.
Knowledge discovery by inference and visualization.
Academic project partners:
–TAI Research Centre, Helsinki University, Department of Computational Linguistics, VTT Information Technology
Industrial project partners
Linguistic Partners: Conexor Oy and Lingsoft Oy
Contact: Matti Keijola, +358 9 451 2163, matti.keijola@hut.fi
2
The BRIEFS Refinement Process
[Diagram; labels: Text; Morpho/Syntax; Expressions; Semantics; Elements; Event scenarios; Element relations; Knowledge base; Inference of trends and consequences; Interactive visualization]
3
BRIEFS Architecture
[Diagram; labels: Brief/Context; Linguistic/Statistical Analysis; Terms/Relations; Ontology Creation; Ontology (augmented with linguistic/statistical knowledge); Ontologies’ Repository; Domain Documents; Linguistic/Statistical Analysis; Relevance Estimation; Relevant Documents; Information Extraction; Inference and Knowledge Discovery; Browsing and Visualization; Ontological Information; Knowledge Warehouse; Domain Information]
4
Example: BRIEFS WAP Industry Relations
[Diagram; entity types: Company; Technology; Product/Service; Person; Stock Exchange; Location; Symbol]
[Relation labels: Is_in; Member_of; Symbol_of; Marketed_by, owned_by, manufactured-by...; Uses...; Owned_by….; Employed_by...; Is_used_in; Deals with...]
[Attribute labels: Name; category, function, features; title, function; geography]
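The entity types and relation names on this slide can be read as a small typed graph. The following is a minimal sketch, assuming a plain in-memory store of (subject, relation, object) triples. The `Entity` and `RelationStore` classes, the example instances, and the pairing of each relation with particular entity types are illustrative assumptions and do not come from the BRIEFS system itself.

```python
from collections import defaultdict
from dataclasses import dataclass

# Entity types named on the slide.
ENTITY_TYPES = {
    "Company", "Technology", "Product/Service", "Person",
    "Stock Exchange", "Location", "Symbol",
}


@dataclass(frozen=True)
class Entity:
    """A named entity with one of the slide's entity types."""
    name: str
    etype: str

    def __post_init__(self):
        # Reject types that are not part of the ontology.
        if self.etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.etype}")


class RelationStore:
    """Hypothetical in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, relation, obj):
        self._by_subject[subject].append((relation, obj))

    def objects(self, subject, relation):
        """Return all objects linked to `subject` by `relation`."""
        return [o for r, o in self._by_subject[subject] if r == relation]


# Illustrative instances only; which relation connects which
# entity types is an assumption, not read off the diagram.
nokia = Entity("Nokia", "Company")
phone = Entity("Nokia 7110", "Product/Service")
wap = Entity("WAP", "Technology")

store = RelationStore()
store.add(phone, "manufactured_by", nokia)
store.add(phone, "uses", wap)

print(store.objects(phone, "manufactured_by"))
```

Keeping relations as plain triples makes it straightforward to add further relation names from the slide without changing the store.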
5
Extraction Results Database
[Diagram; labels: DOCUMENTS; TEMPLATE; TR; TE]
6
BRIEFS Frequency Chart
7
BRIEFS Relation List
8
BRIEFS: Extracted Deal Events
9
Clustering by Extracted Deal Events
10
Some Annotation Issues
Identifying names of concepts is important for IE.
–But what is a name?
“The 7110 phone...”
“The Nokia 7110 mobile phone...”
“Nokia’s 7110 phone unit”
“The phone…”
“It …”
Harmonisation is important for computer-based KA
Coreference and pronominal anaphora resolution
Some concepts are not addressed directly by name
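To make the harmonisation problem concrete, here is a minimal sketch that maps the example mentions from this slide to one canonical name. It assumes a hypothetical lexicon of identifying tokens and falls back on a naive "most recent entity" rule for mentions such as “The phone…” and “It …”. This recency rule is a stand-in chosen for brevity, not the method used in BRIEFS; real coreference and anaphora resolution would need syntactic and semantic constraints.

```python
import re

# Hypothetical lexicon: canonical name -> tokens that identify it.
LEXICON = {
    "Nokia 7110": {"7110"},
}
STOPWORDS = {"the", "a", "an"}


def tokens(mention):
    """Lowercase, drop possessive 's, and split into word tokens."""
    text = mention.lower().replace("’", "'")
    text = re.sub(r"'s\b", "", text)
    return [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOPWORDS]


def harmonise(mention):
    """Return the canonical name if the mention contains its key tokens."""
    toks = set(tokens(mention))
    for canonical, key in LEXICON.items():
        if key <= toks:
            return canonical
    return None


def resolve(mentions):
    """Naive recency heuristic: an unresolved mention inherits the
    most recently resolved name (an assumption for illustration)."""
    last = None
    resolved = []
    for mention in mentions:
        name = harmonise(mention) or last
        resolved.append(name)
        last = name
    return resolved


mentions = [
    "The 7110 phone...",
    "The Nokia 7110 mobile phone...",
    "Nokia’s 7110 phone unit",
    "The phone…",
    "It …",
]
for mention, name in zip(mentions, resolve(mentions)):
    print(f"{mention!r} -> {name}")
```

The first three mentions are matched directly through the lexicon; the last two are resolved only through the recency fallback, which is exactly where a real system would need proper anaphora resolution.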
11
Coreference and Anaphora Resolution
12
Some Potential Uses of a BRIEFS-like System
Follow-up of the presence of entities (e.g. companies, persons, products, technologies, …) in news reports and discovery of trends therein
Follow-up of deals companies make, discovery of evolving networks of deals
Follow-up of events in industry, discovery of trends and traits
Follow-up of development of new technologies
Follow-up of changes in business practices
Investor/advisor review
Other domains: e.g. maintenance reports
In summary: extraction of specific data from written documents and messages, harmonizing the data, accumulating the data into knowledge warehouses, and making inferences based on the accumulated data
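As an illustration of the "follow-up of deals" use, the sketch below accumulates extracted deal events and derives two simple views: deals per year and the network of deal partners. The event tuple format, the function names, and the placeholder company names are all assumptions made for this example; they are not output of the BRIEFS system.

```python
from collections import Counter, defaultdict

# Hypothetical extracted deal events: (year, company_a, company_b).
deals = [
    (1999, "Company A", "Company B"),
    (1999, "Company A", "Company C"),
    (2000, "Company B", "Company C"),
    (2000, "Company A", "Company B"),
    (2000, "Company A", "Company D"),
]


def deals_per_year(events):
    """Count deal events per year, a crude trend indicator."""
    return Counter(year for year, _, _ in events)


def deal_network(events):
    """Build an undirected graph: company -> set of deal partners."""
    graph = defaultdict(set)
    for _, a, b in events:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def partners_over_time(events, company):
    """Cumulative number of distinct partners of `company` per year."""
    seen = set()
    result = {}
    for year, a, b in sorted(events):
        if company in (a, b):
            seen.add(b if a == company else a)
        result[year] = len(seen)
    return result


print(deals_per_year(deals))
print(sorted(deal_network(deals)["Company A"]))
print(partners_over_time(deals, "Company A"))
```

Repeated deals between the same pair count toward the yearly totals but not toward the partner sets, which keeps activity trends and network growth as separate signals.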