WIKT 2007, Košice, November
Creation of Semantic Metadata
Michal Laclavík
Ústav Informatiky SAV (Institute of Informatics, Slovak Academy of Sciences)
Semantic Metadata
Ontology
– Model
– Instances = semantic metadata
Protege
Automatic forms
– NAZOU
Database wrapping
– RDB2Onto, D2R MAP, D2R, R2O
Annotation and markup of documents
Information Extraction
MUC Conferences
– Named Entity recognition (NE): finds and classifies names, places, etc.
– Coreference resolution (CO): identifies identity relations between entities.
– Template Element construction (TE): adds descriptive information to NE results (using CO).
– Template Relation construction (TR): finds relations between TE entities.
– Scenario Template production (ST): fits TE and TR results into specified event scenarios.
GATE
– Information Extraction platform
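Of the MUC tasks above, NE recognition is the simplest to illustrate. The sketch below uses plain gazetteer lookup (a list of known names with their types), which is only one of many possible techniques; the GAZETTEER entries and the recognize_entities function are hypothetical and are not taken from any MUC system or from GATE.

```python
import re

# Hypothetical gazetteer for illustration: surface form -> entity type.
GAZETTEER = {
    "Michal Laclavík": "Person",
    "Košice": "Location",
    "Slovak Academy of Sciences": "Organization",
}

def recognize_entities(text, gazetteer):
    """Return sorted (start, end, surface, type) tuples for gazetteer matches.

    Longer names are tried first, so a short name lying inside an already
    accepted longer match is not reported a second time.
    """
    taken = [False] * len(text)
    found = []
    for name in sorted(gazetteer, key=len, reverse=True):
        for m in re.finditer(re.escape(name), text):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer match accepted earlier
            for i in range(m.start(), m.end()):
                taken[i] = True
            found.append((m.start(), m.end(), name, gazetteer[name]))
    return sorted(found)

print(recognize_entities("Michal Laclavík presented the talk in Košice.", GAZETTEER))
```

Real NE recognizers combine such lookups with grammar rules or statistical models, because a gazetteer cannot find names it does not already contain.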
Goal
Identification of instances from the ontology
– Search
Automatic ontology population
– Creation
Search
Disambiguation (names with multiple meanings)
Aliases: similarity measures (IR, NLP, IE …)
– Cosine measure
– Levenshtein operations
– ...
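For aliases, an edit-distance measure tolerates small spelling differences, such as a name written without diacritics. The sketch below computes the Levenshtein distance (minimum number of insertions, deletions and substitutions) and uses it to match a mention to known aliases; the normalisation by the longer string and the 0.8 threshold in best_alias are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def normalized_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, approaching 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def best_alias(mention, aliases, threshold=0.8):
    """Return the most similar known alias, or None if none reaches the (assumed) threshold."""
    if not aliases:
        return None
    score, alias = max((normalized_similarity(mention.lower(), a.lower()), a) for a in aliases)
    return alias if score >= threshold else None

print(best_alias("Laclavik", ["Laclavík", "Laclavíková"]))
```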
Create
Patterns for creating individuals
– Structure, regex, IE techniques
Relevance
– Whether an individual should really be created
The same problems as in Search apply as well
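A regex pattern can propose new individuals, while a separate relevance check decides whether each candidate is actually created. In this minimal sketch the COMPANY pattern (capitalised words followed by a Slovak company suffix) and the relevance rule (not already in the knowledge base, seen at least min_count times) are both hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern: capitalised words followed by a Slovak company suffix.
# [A-Z] covers ASCII capitals only, which is enough for this illustration.
COMPANY = re.compile(r"\b([A-Z]\w*(?: [A-Z]\w*)*),? (s\.r\.o\.|a\.s\.)")

def create_individuals(text, existing_labels, min_count=1):
    """Propose new Company individuals matched by COMPANY.

    Assumed relevance rule: create a candidate only if no individual with the
    same label (case-insensitive) exists yet and it occurs at least min_count times.
    """
    counts = Counter(m.group(1) for m in COMPANY.finditer(text))
    known = {label.lower() for label in existing_labels}
    created = []
    for label, n in counts.items():
        if label.lower() in known or n < min_count:
            continue
        created.append({"class": "Company", "label": label})
        known.add(label.lower())
    return created

text = ("The contract was signed by Alfa Soft s.r.o. and Beta, a.s. "
        "Alfa Soft s.r.o. will deliver the software.")
print(create_individuals(text, existing_labels=["Beta"]))
```

The exact-match check against existing labels is where the Search problems reappear: an alias or misspelling of an existing individual would slip through and create a duplicate.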
Information Retrieval: Evaluation
Precision
Recall
F-measure
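The three measures follow from counting correct annotations: precision P = correct / returned, recall R = correct / expected, and the F-measure is their weighted harmonic mean, F = (1 + β²)PR / (β²P + R), which for β = 1 becomes 2PR / (P + R). A minimal sketch, assuming annotations can be compared as (text, class) pairs; the data is illustrative only.

```python
def evaluate(predicted, gold, beta=1.0):
    """Return (precision, recall, F-measure) of predicted annotations vs. a gold standard."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# Illustrative data only, not results from the evaluation slides.
predicted = {("Košice", "Location"), ("SAV", "Organization"), ("November", "Person")}
gold = {("Košice", "Location"), ("SAV", "Organization"),
        ("Laclavík", "Person"), ("WIKT", "Event")}
print(evaluate(predicted, gold))
```

Here 2 of 3 returned annotations are correct and 2 of 4 expected ones are found, so P = 2/3, R = 1/2 and F1 = 4/7.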
Manual Annotation & Browsing
Wrappers
Similar to IE
The pattern is the structure of the document
Not tied to a KB (knowledge base)
Good results in combination with other techniques
– Location: San Francisco, New York
– Job Type: Permanent, Contract
– Job Type: Full-time
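A wrapper relies on the layout of the page rather than on a knowledge base. Assuming a page that lists one "Label: value" field per line, as in the job-offer example above, a minimal wrapper could look like this; the FIELD regular expression and the comma-splitting rule are assumptions for illustration.

```python
import re

# Hypothetical layout rule: one "Label: value, value, ..." field per line.
FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.+?)\s*$", re.MULTILINE)

def wrap(document):
    """Collect 'Label: value' lines into a dict mapping each label to its list of values."""
    fields = {}
    for label, value in FIELD.findall(document):
        values = [v.strip() for v in value.split(",") if v.strip()]
        fields.setdefault(label, []).extend(values)
    return fields

page = """Location: San Francisco, New York
Job Type: Permanent, Contract
Job Type: Full-time"""
print(wrap(page))
```

The extracted values are only strings; combining the wrapper with other techniques, such as alias matching against a KB, is what turns them into ontology instances.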
C-PANKOW
POS tagging
– QTag
Google API for relevance
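C-PANKOW builds on PANKOW, which decides which concept an instance belongs to by counting how often linguistic patterns linking the two occur on the web. A simplified sketch of that idea: the pattern list is an assumption, hit_count stands in for a search-engine API and is supplied by the caller, and the POS tagging (QTag) and context analysis of C-PANKOW are omitted.

```python
from typing import Callable, List, Tuple

# Assumed Hearst-style patterns; "{concept}s" is a naive plural for illustration.
PATTERNS = [
    "{concept}s such as {instance}",
    "{instance} is a {concept}",
    "{instance} and other {concept}s",
    "the {instance} {concept}",
]

def rank_concepts(instance: str, concepts: List[str],
                  hit_count: Callable[[str], int]) -> List[Tuple[str, int]]:
    """Score each candidate concept by summing hit counts of its pattern phrases."""
    scores = {
        c: sum(hit_count(p.format(concept=c, instance=instance)) for p in PATTERNS)
        for c in concepts
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Fake counts standing in for web search results (not real data).
fake = {"towns such as Košice": 120, "Košice is a town": 300}
print(rank_concepts("Košice", ["river", "town"], lambda q: fake.get(q, 0)))
```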
KIM
Separation of
– KB (knowledge base)
– Documents
– Annotations
NE recognition
– GATE
Lucene
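The separation of KB, documents and annotations can be pictured as three independent stores, with annotations pointing into the other two by identifier (stand-off annotation). The classes, field names and example URI below are hypothetical and do not reproduce KIM's actual schema; documents_mentioning is a naive stand-in for what an index (Lucene is listed on the slide) would answer efficiently.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Instance:       # knowledge base entry, identified by URI
    uri: str
    label: str
    cls: str

@dataclass(frozen=True)
class Document:       # document store entry; its text is never modified
    doc_id: str
    text: str

@dataclass(frozen=True)
class Annotation:     # stand-off link: character span in a document -> KB instance
    doc_id: str
    start: int
    end: int
    instance_uri: str

def documents_mentioning(annotations: List[Annotation], uri: str) -> List[str]:
    """Naive semantic lookup: ids of documents annotated with the given instance."""
    return sorted({a.doc_id for a in annotations if a.instance_uri == uri})

kosice = Instance("http://example.org/kb#Kosice", "Košice", "City")
docs = [Document("d1", "WIKT 2007 was held in Košice."), Document("d2", "No places here.")]
anns = [Annotation("d1", 22, 28, kosice.uri)]
print(documents_mentioning(anns, kosice.uri))
```

Because annotations live apart from both the text and the KB, either side can be updated or re-annotated without rewriting the other.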
SemTag
Annotation only (distributed)
264 million web pages
434 million annotations
TAP knowledge base
Ambiguity resolution
– Cosine measure
Stanford
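Ambiguity resolution with the cosine measure compares the words around a mention with a description of each candidate meaning in the knowledge base. A simplified bag-of-words sketch follows; the SENSES descriptions are invented for illustration, and SemTag's actual disambiguation is more elaborate.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def disambiguate(context, senses):
    """Pick the sense whose description is most similar to the mention's context."""
    ctx = vectorize(context)
    return max(senses, key=lambda s: cosine(ctx, vectorize(senses[s])))

# Hypothetical KB descriptions for an ambiguous name.
SENSES = {
    "Jaguar (animal)": "large cat living in the jungle of south america predator",
    "Jaguar (car)": "british car manufacturer luxury vehicle engine",
}
print(disambiguate("The new Jaguar has a powerful engine and a luxury interior", SENSES))
```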
Ontea
Pattern-based annotation
– Regex
Similar methods
– C-PANKOW, SemTag
Languages other than English
– Slovak
Faster and more accurate than C-PANKOW
Also allows creating instances; SemTag does not
The architecture is designed so that other pattern-based annotation solutions can be plugged in
– Wrapper, IE, ...
NAZOU, , Poľana => +
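One way to sketch an architecture open to other pattern-based annotation solutions is a common annotator interface that regex, wrapper or IE components implement. The PatternAnnotator protocol, the RegexAnnotator class and both Slovak-oriented patterns below are hypothetical, not Ontea's actual API.

```python
import re
from typing import Iterable, List, Protocol, Tuple

Result = Tuple[str, str]  # (matched text, ontology class)

class PatternAnnotator(Protocol):
    """Extension point: any component that maps text to (text, class) pairs."""
    def annotate(self, text: str) -> Iterable[Result]: ...

class RegexAnnotator:
    """Annotates the first capture group of every regex match with a fixed class."""
    def __init__(self, pattern: str, cls: str) -> None:
        self.regex = re.compile(pattern)
        self.cls = cls

    def annotate(self, text: str) -> Iterable[Result]:
        for m in self.regex.finditer(text):
            yield m.group(1), self.cls

def run_all(text: str, annotators: List[PatternAnnotator]) -> List[Result]:
    """Run every plugged-in annotator and collect their results in order."""
    results: List[Result] = []
    for annotator in annotators:
        results.extend(annotator.annotate(text))
    return results

# Hypothetical Slovak patterns: academic title + name, and a postal code.
annotators = [
    RegexAnnotator(r"(?:Ing\.|Mgr\.|RNDr\.) (\w+ \w+)", "Person"),
    RegexAnnotator(r"\b(\d{3} \d{2})\b", "PostalCode"),
]
print(run_all("Kontakt: Ing. Ján Novák, 040 01 Košice", annotators))
```

A wrapper or IE component only needs an annotate method with the same signature to be plugged into run_all.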
Evaluation
Conclusion
A good area for future research
The metadata problem still needs to be solved, including
– Protocols
– Metadata repositories
– Upper ontologies
– Metadata creation algorithms (annotation algorithms)