Ontea: Pattern based Annotation Platform
Michal Laclavík


Ontea, http://ontea.sourceforge.net

Ontea Method

Motivation
  – To create semantic metadata from texts or documents
Approach
  – Even unstructured text contains patterns
  – Patterns can be used to extract various objects from text
  – Results are key – value pairs
  – Such pairs can be transformed to ontology individuals
      Class – individual
      Individual – property
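The last step of the approach, turning a key – value pair into an ontology individual, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `KeyValue` type, the `to_triples` function, the `ex:` prefix and the choice of `rdfs:label` as the property are hypothetical and are not taken from Ontea itself.

```python
from typing import NamedTuple


class KeyValue(NamedTuple):
    key: str    # name of an ontology class, e.g. "Location"
    value: str  # text matched by a pattern, e.g. "Europe"


def to_triples(pair, prefix="ex:"):
    """Turn one key-value pair into (subject, predicate, object) triples.

    The prefix and the rdfs:label property are illustrative assumptions.
    """
    individual = prefix + pair.value.replace(" ", "_")
    return [
        (individual, "rdf:type", prefix + pair.key),  # class - individual
        (individual, "rdfs:label", pair.value),       # individual - property
    ]


triples = to_triples(KeyValue("Location", "Europe"))
for triple in triples:
    print(triple)
```

A real ontology store would need to check whether the individual already exists before creating it; the sketch leaves that out.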

Result Examples

Text
  – Bratislava is the capital of Slovakia. Slovakia is in Europe.
Pattern: “(in|by) + (the)? *([A-Z][a-z]+)” for Location
Ontea discovers key – value pair:
  – Location – Europe
By transformation to the ontology knowledge base, it finds Europe as a continent using inference (sub-class of Location)
  – Continent – Europe
More examples are in the table:

#  Text                     Key – value                Pattern (regular expression)
1  Apple, Inc.              Company: Apple             Company: ([A-Za-z0-9]+)[, ]+(Inc|Ltd)
2  Mountain View, CA 94043  Settlement: Mountain View  Settlement: ([A-Z][a-z]+[ ]*[A-Za-z]*)[ ]+[A-Z]{2}[ ]*[0-9]{5}
4  Mr. Michal Laclavik      Person: Michal Laclavik    Person: (Mr.|Mrs.|Dr.) ([A-Z][a-z]+ [A-Z][a-z]+)
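The examples above can be tried with Python's `re` module. This is a sketch, not Ontea's implementation, and it makes two assumptions that should be read as such: the Location pattern is written with the stray space removed, as `(in|by) +(the)? *([A-Z][a-z]+)`, and the Settlement pattern has an optional comma (`,?`) added, because the pattern as printed does not accept the comma in "Mountain View, CA 94043". The names `PATTERNS` and `extract_values` are hypothetical.

```python
import re

# key -> (regular expression, index of the capture group holding the value)
PATTERNS = {
    # Assumption: normalized reading of the Location pattern on the slide.
    "Location": (r"(in|by) +(the)? *([A-Z][a-z]+)", 3),
    "Company": (r"([A-Za-z0-9]+)[, ]+(Inc|Ltd)", 1),
    # Assumption: ",?" added so that the sample text with a comma matches.
    "Settlement": (r"([A-Z][a-z]+[ ]*[A-Za-z]*),?[ ]+[A-Z]{2}[ ]*[0-9]{5}", 1),
    "Person": (r"(Mr.|Mrs.|Dr.) ([A-Z][a-z]+ [A-Z][a-z]+)", 2),
}


def extract_values(text, key):
    """Return every value the pattern registered for `key` finds in `text`."""
    regex, group = PATTERNS[key]
    return [match.group(group) for match in re.finditer(regex, text)]


text = "Bratislava is the capital of Slovakia. Slovakia is in Europe."
print(extract_values(text, "Location"))
print(extract_values("Apple, Inc.", "Company"))
print(extract_values("Mountain View, CA 94043", "Settlement"))
print(extract_values("Mr. Michal Laclavik", "Person"))
```

Patterns this simple can over-match. For instance, the Location pattern also fires on "Mountain View", because "Mountain" ends in "in", so pattern design has a direct effect on annotation quality.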

Features

Identification of concept instances from the ontology
Automatic population of ontologies with instances
Identifying relevance when creating instances, using information retrieval techniques
Large scale semantic annotation of documents or texts using Google’s MapReduce architecture
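To illustrate how pattern-based annotation fits the MapReduce model, here is a small single-process simulation in plain Python. It does not use Hadoop or any MapReduce library, and the names `map_phase`, `reduce_phase` and the sample documents are hypothetical. The map step emits one record per pattern match; the reduce step groups document ids by annotation.

```python
import re
from collections import defaultdict

# Assumption: normalized form of the Location pattern from the slides.
LOCATION = re.compile(r"(in|by) +(the)? *([A-Z][a-z]+)")


def map_phase(doc_id, text):
    """Emit one ((key, value), doc_id) record per pattern match."""
    for match in LOCATION.finditer(text):
        yield ("Location", match.group(3)), doc_id


def reduce_phase(records):
    """Group document ids by annotation, as a reducer would after the shuffle."""
    grouped = defaultdict(set)
    for annotation, doc_id in records:
        grouped[annotation].add(doc_id)
    return {annotation: sorted(ids) for annotation, ids in grouped.items()}


documents = {
    "d1": "Slovakia is in Europe.",
    "d2": "Vienna is in Austria, which is in Europe.",
}
records = [
    record
    for doc_id, text in documents.items()
    for record in map_phase(doc_id, text)
]
index = reduce_phase(records)
print(index)
```

Because each document is processed independently in the map step, the work can be spread over many machines, which is what makes the approach suitable for large collections.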

Advantages

Simple, customizable method
Not tied to document structure
Architecture built on detection of key-value pairs and their various transformations. For example:
  – Text: “Slovensko je v Európe” (Slovak for “Slovakia is in Europe”) =>
  – Extraction: Location – Európe =>
  – Transformation, Lemmatization: Location – Európa =>
  – Transformation, Ontology: Continent – Europe
Scalable method, ported to Grid and Hadoop
Applicable to texts in any language
Success rate of 60%-90%, depending on the patterns, transformers and application used
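The chain of transformations in the Slovak example can be sketched as a sequence of functions applied to a key – value pair. Everything here is a toy stand-in: the pattern `\bv (\w+)`, the `LEMMAS` dictionary and the `ONTOLOGY` lookup are assumptions made for illustration, where a real system would use a morphological dictionary and an ontology repository.

```python
import re

LEMMAS = {"Európe": "Európa"}                   # toy lemmatization dictionary (assumption)
ONTOLOGY = {"Európa": ("Continent", "Europe")}  # toy lookup: lemma -> (class, individual)


def extract_location(text):
    """Extraction: hypothetical Slovak pattern for 'v <Place>' ('in <Place>')."""
    match = re.search(r"\bv (\w+)", text)
    return ("Location", match.group(1)) if match else None


def lemmatize(pair):
    """Transformation, lemmatization: replace the value with its base form."""
    key, value = pair
    return (key, LEMMAS.get(value, value))


def to_ontology(pair):
    """Transformation, ontology: map the lemma to a known class and individual."""
    key, value = pair
    return ONTOLOGY.get(value, pair)


def annotate(text):
    """Return the pair after extraction and after each transformation."""
    pair = extract_location(text)
    if pair is None:
        return []
    steps = [pair]
    for transform in (lemmatize, to_ontology):
        pair = transform(pair)
        steps.append(pair)
    return steps


steps = annotate("Slovensko je v Európe")
for step in steps:
    print(step)
```

Keeping each transformation as a separate function mirrors the idea that transformers can be swapped or reordered per language and application.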

Integration with other tools

[Diagram: Ontea integrated with DocConverter, Nalit, Morphonary and Lucene. Labels: URL; Plain Text; Language Identification; Pattern Matching; Transformation: Lemmatization; Transformation: Relevance Identification; Transformation: Individual Search and Creation; Ontology Repository]

Future research & development