Morphoogle - A Multilingual Interface to a Web Search Engine

Slides:



Advertisements
Similar presentations
Multilinguality & Semantic Search Eelco Mossel (University of Hamburg) Review Meeting, January 2008, Zürich.
Advertisements

WP 10 Multilingual Access Philipp Daumke, Stefan Schulz.
Chapter 5: Introduction to Information Retrieval
MULTI LINGUAL ISSUES IN SPEECH SYNTHESIS AND RECOGNITION IN INDIAN LANGUAGES NIXON PATEL Bhrigus Inc Multilingual & International Speech.
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus HISA ltd. Biography proforma MEDINFO Lygon Street, Brunswick East.
Page 1 June 2, 2015 Optimizing for Search Making it easier for users to find your content.
Information Retrieval in Practice
USP workshop Using the Corpógrafo Belinda Maia & Luís Sarmento PoloFLUP LINGUATECA.
Parametric search and zone weighting Lecture 6. Recap of lecture 4 Query expansion Index construction.
Multilingual Access to Biomedical Documents Stefan Schulz, Philipp Daumke Institute of Medical Biometry and Medical Informatics University Medical Center.
Methodology Conceptual Database Design
Overview of Search Engines
Pro Exchange SPAM Filter An Exchange 2000 based spam filtering solution.
1. 2 Content WSK Online is a new online database of specialized dictionaries covering all the major areas of linguistics and communication science: Biannual.
Databases & Data Warehouses Chapter 3 Database Processing.
CIG Conference Norwich September 2006 AUTINDEX 1 AUTINDEX: Automatic Indexing and Classification of Texts Catherine Pease & Paul Schmidt IAI, Saarbrücken.
Basic Web Applications 2. Search Engine Why we need search ensigns? Why we need search ensigns? –because there are hundreds of millions of pages available.
Spoken dialog for e-learning supported by domain ontologies Dario Bianchi, Monica Mordonini and Agostino Poggi Dipartimento di Ingegneria dell’Informazione.
1. 2 Content The Romanische Bibliographie Online is the only comprehensive specialist bibliography for Romance language and literature studies –available.
AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.
ICS-FORTH January 11, Thesaurus Mapping Martin Doerr Foundation for Research and Technology - Hellas Institute of Computer Science Bath, UK, January.
GoogleDictionary Paul Nepywoda Alla Rozovskaya. Goal Develop a tool for English that, given a word, will illustrate its usage.
Using a Lemmatizer to Support the Development and Validation of the Greek WordNet Harry Kornilakis 1, Maria Grigoriadou 1, Eleni Galiotou 1,2, Evangelos.
A Language Independent Method for Question Classification COLING 2004.
TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
Stefan Schulz, Kornél Markó, Philipp Daumke, Udo Hahn, Susanne Hanser, Percy Nohama, Roosewelt Leite de Andrade, Edson Pacheco, Martin Romacker Semantic.
CS : NLP, Speech and Web-Topics-in-AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 35: Semantic Relations; UNL; Towards Dependency Parsing.
Search Engine Architecture
EPI 218 Queries and On-Screen Forms Michael A. Kohn, MD, MPP 9 August 2012.
BioRAT: Extracting Biological Information from Full-length Papers David P.A. Corney, Bernard F. Buxton, William B. Langdon and David T. Jones Bioinformatics.
LINGUATECA FLUP/CLUP The Corpógrafo – a Web-based environment for corpora research extract Term Candidates.
1 Language Specific Crawler for Myanmar Web Pages Pann Yu Mon Management and Information System Engineering Department Nagaoka University of Technology,
How Google and Microsoft taught search to “understand” the Web Austin Granger Chris Hesemann.
Behrooz ChitsazLorrie Apple Johnson Microsoft ResearchU.S. Department of Energy.
Natural Language Processing Group Computer Sc. & Engg. Department JADAVPUR UNIVERSITY KOLKATA – , INDIA. Professor Sivaji Bandyopadhyay
The Development of a search engine & Comparison according to algorithms Sung-soo Kim The final report.
1 Question Answering and Logistics. 2 Class Logistics  Comments on proposals will be returned next week and may be available as early as Monday  Look.
A project aimed at developing biomedical student internships in the Israeli industry Biomedical Engineering Internship Network.
Institute of Informatics & Telecommunications NCSR “Demokritos” Spidering Tool, Corpus collection Vangelis Karkaletsis, Kostas Stamatakis, Dimitra Farmakiotou.
Shuang Wu REU-DIMACS, 2010 Mentor: James Abello. Project description Our research project Input: time data recorded from the ‘Name That Cluster’ web page.
Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content Kalina Bontcheva, Diana Maynard, Hamish Cunningham, Horacio.
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
© NCSR, Frascati, July 18-19, 2002 CROSSMARC big picture Domain-specific Web sites Domain-specific Spidering Domain Ontology XHTML pages WEB Focused Crawling.
Large-Scale Evaluation of a Medical Cross- Language Information Retrieval System Kornél Markó 1,2, Philipp Daumke 1,2, Stefan Schulz 2, Rüdiger Klar 2,
Information Retrieval in Practice
UNIFIED MEDICAL LANGUAGE SYSTEMS (UMLS)
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Search Engine Architecture
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
A Shopping Agent for the WWW
Martin Rajman, Martin Vesely
Multilingual Biomedical Dictionary
Prepared by Rao Umar Anwar For Detail information Visit my blog:
What are the Database Applications
A research literature search engine with abbreviation recognition
European Network of e-Lexicography
Lecture 12: Data Wrangling
Introduction to Search Engines
What is a Search Engine EIT, Author Gay Robertson, 2017.
IL Step 3: Using Bibliographic Databases
MATERIAL Resources for Cross-Lingual Information Retrieval
International Marketing and Output Database Conference 2005
Introduction to Information Retrieval
Haystack: an Adaptive Personalized Information Retrieval System
Lecture 4: HTML/CSS Lab Wednesday February 1, /12/2019
Databases and Information Management
WJEC GCSE Computer Science
Introduction to Search Engines
Presentation transcript:

Morphoogle - A Multilingual Interface to a Web Search Engine Freiburg University Hospital 1 Jena University 2 Pontificial University of Paraná 3 3 Philipp Daumke1, Stefan Schulz1,3, Kornél Markó1,2 Morphoogle is a query translation and expansion tool which allows users to search for biomedical content in different languages in the web. This approach is based on Morphosaurus, a system which transforms medical text into a language independent interlingua. This underlying procedure is named Morpho-Semantic Indexing (MSI), a term normalization methodology developed by the authors, which deals with various morphological processes in different languages. MSI uses a special type of dictionary, whose entries consist of subwords, i.e. semantically minimal units. Subwords are grouped into language independent equivalence classes, represented by Morpheme identifiers (MIDs). A morphosyntactic parser extracts subwords from texts and assigns MIDs in a three step procedure (cf. Figure 1). Morphoogle We acquired domain and language specific corpora from various medical sources in the WWW. Using a tokenizer we then created large lists of surface words, bigrams and trigrams of adjacent words containing their frequencies within these corpora (target words). All target words are translated to a set of MIDs and stored in language specific databases. These databases consist of about 3 M entries each (cf. Figure 2). A user can send a query via a web interface. Again, this query is firstly altered to a set of corresponding MIDs. This MID set is used to create a list of possible reading variants (partitions) (cf. Figure 3). Each partition consists of one or more subwords which are now compared to the relevant databases. All matching records are finally sorted using several heuristics and sent to a search engine (e.g. Google) (cf. Figure 4). Figure 2: Generation of target word databases Figure 3: Output of the dictionary (here in Portuguese and English) Figure 1: Morpho-Semantic Indexing (MSI) www.morphosaurus.net -> Web Tools -> Morphoogle Contact: Philipp Daumke, Freiburg Medical University, Dept. Medical Informatics, Freiburg, Germany, http://www.morphosaurus.net, email: daumke@coling.uni-freiburg.de