Integrating Greek and English Digital Resources Sean Boisen Computer Assisted Research Section, S19-108 Slides at:



Outline
Motivation and thesis
Overview of cross-lingual resources
Cross-lingual semantic mapping and applications
Conclusions

Motivation
Expand the utility of existing Greek resources
Open new possibilities for English-oriented students
Thesis: integrating English resources can provide additional tools, resources, and insights

Overview of Resources
ESV English-Greek Reverse Interlinear New Testament
OpenText.Org Syntactically Analyzed Greek New Testament (“OpenText”)
Greek-English Lexicon of the New Testament Based on Semantic Domains (“Louw-Nida”)
WordNet

ESV Reverse Interlinear
Designed to aid English readers in accessing the Greek NT
Careful attention to word-level correspondence
Example: Acts 21:1

Reverse Interlinear Information Structure
Bi-directional lexical mapping
Preserves word order in both languages
English: And when we had parted from them and set sail
Greek: Ὡς δὲ ἐγένετο ἀναχθῆναι ἡμᾶς ἀποσπασθέντας ἀπ’ αὐτῶν
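One way to picture this structure is as two word lists, each kept in its own language's order, plus a set of links between positions. The sketch below is a minimal illustration under that assumption; the specific links are illustrative guesses for Acts 21:1, not the published interlinear data, and the names `links`, `greek_for`, and `english_for` are hypothetical.

```python
from collections import defaultdict

# Each list preserves the word order of its own language.
english = ["And", "when", "we", "had", "parted", "from", "them", "and", "set", "sail"]
greek = ["Ὡς", "δὲ", "ἐγένετο", "ἀναχθῆναι", "ἡμᾶς", "ἀποσπασθέντας", "ἀπ’", "αὐτῶν"]

# (english_position, greek_position) links. Illustrative guesses only:
# several English words may share one Greek word, and some words have no link.
links = [(0, 1), (1, 0), (2, 4), (3, 5), (4, 5), (5, 6), (6, 7), (8, 3), (9, 3)]


def build_maps(pairs):
    """Build lookup tables for both directions from one list of links."""
    en_to_gr, gr_to_en = defaultdict(list), defaultdict(list)
    for en_pos, gr_pos in pairs:
        en_to_gr[en_pos].append(gr_pos)
        gr_to_en[gr_pos].append(en_pos)
    return en_to_gr, gr_to_en


EN_TO_GR, GR_TO_EN = build_maps(links)


def greek_for(en_pos):
    """Greek words linked to the English word at en_pos (may be empty)."""
    return [greek[g] for g in EN_TO_GR.get(en_pos, [])]


def english_for(gr_pos):
    """English words linked to the Greek word at gr_pos (may be empty)."""
    return [english[e] for e in GR_TO_EN.get(gr_pos, [])]


print(english_for(3))  # English words linked to ἀναχθῆναι
print(greek_for(4))    # Greek word linked to "parted"
```

Storing links separately from the two word lists is what lets the same data be read in either direction without disturbing either word order.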

Reverse Interlinear Applications
Lexicographic distribution
  – γίνομαι and English translational equivalents
  – ἔρχω: coming vs. going
Part-of-speech
  – εὐθυδρομήσαντες (VPNPMAA) vs. “by a straight course” (Adj)
  – Distributional analysis
Integration of other English resources
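The lexicographic distribution idea can be sketched as a simple tally of English renderings per Greek lemma. The pairs below are a toy sample invented for illustration, not real ESV counts; the lemma spellings and the function name `translation_distribution` are assumptions made for this example.

```python
from collections import Counter, defaultdict

# (Greek lemma, English rendering) pairs, as they might be read off
# word-level alignments. Toy sample invented for illustration only.
aligned_pairs = [
    ("γίνομαι", "become"),
    ("γίνομαι", "happen"),
    ("γίνομαι", "be"),
    ("γίνομαι", "become"),
    ("ἔρχομαι", "come"),
    ("ἔρχομαι", "go"),
    ("ἔρχομαι", "come"),
]


def translation_distribution(pairs):
    """Count how often each Greek lemma is rendered by each English word."""
    distribution = defaultdict(Counter)
    for lemma, rendering in pairs:
        distribution[lemma][rendering] += 1
    return distribution


dist = translation_distribution(aligned_pairs)
for lemma, counts in dist.items():
    print(lemma, counts.most_common())
```

With real alignment data, the same tally would show how evenly a lemma's renderings are spread, for example how often a verb of motion is rendered as coming rather than going.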

Overview: OpenText.org
Syntactic annotation of the Greek New Testament
  – Syntactic groups up to the clause level
Example: Acts 21:1

OpenText Applications
Word-level alignment to ESV Reverse Interlinear enables integration with English tools and resources
  – Numerous automated English analytical tools are available
  – Enables cross-lingual comparison: part-of-speech distribution, syntactic analysis, etc.

Overview: Louw-Nida
Domain grouping:
  – “Object referents” (entities): 1-12
  – Events:
  – “Abstracts”:
  – Discourse referentials: 92
  – Names of persons and places: 93
Sub-domains with hypernyms (“is-a”) and hyponyms

Overview: Louw-Nida (2)
Organized semantically into “meaning entries”
  – Groups of terms with a shared sense that are semantically distinguishable from others
  – Meanings within a sub-domain are ordered: “those meanings which are treated first tend to be of a more generic nature, while more specific meanings follow” (LN Introduction, p. vi)
Index of Greek terms to meaning entries
Partial index of English terms to sense groups

Louw-Nida Information Structure
6: Artifacts
  6.A: General Artifacts
  6.B: Agriculture & Husbandry
  6.C: Fishing
  6.D: Binding and Fastening
  6.E: Traps, Snares
    6.23 παγίς: “an object used for trapping or snaring, principally of birds” ‘trap’ ‘snare’
    6.24 θήρα: “an instrument used for trapping, especially of animals other than birds” ‘trap’ ‘snare’
    6.25 σκάνδαλον: “a trap, probably of the type which has a stick which when touched by an animal causes the trap to shut” ‘trap’
  Also: Weapons, Boats, Vehicles, For Writing, Money, For Music, Images, Lights, Furniture, and others …

Polysemy in Louw-Nida
6978 meanings
Greek index
  – 6805 terms
  – 8428 term-meaning pairs
English index
  – 4622 terms
  – 9586 term-meaning pairs
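The counts above imply average polysemy figures that the slide does not state. A quick calculation, using only the numbers given, shows that an English index term points to roughly two meanings on average, while a Greek term points to about 1.2. The variable names are chosen for this example.

```python
# Counts taken directly from the slide.
meanings = 6978
greek_terms, greek_pairs = 6805, 8428
english_terms, english_pairs = 4622, 9586

# Average number of meanings each index term points to.
greek_meanings_per_term = greek_pairs / greek_terms
english_meanings_per_term = english_pairs / english_terms

# Average number of index terms attached to each meaning.
greek_terms_per_meaning = greek_pairs / meanings
english_terms_per_meaning = english_pairs / meanings

print(round(greek_meanings_per_term, 2))    # 1.24
print(round(english_meanings_per_term, 2))  # 2.07
print(round(greek_terms_per_meaning, 2))    # 1.21
print(round(english_terms_per_meaning, 2))  # 1.37
```

The higher English figure matters for the mapping step: a single English gloss is a much more ambiguous pointer into Louw-Nida than a single Greek term.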

Applications of Louw-Nida
Semantic concordance
Identifying semantic coherence and lexical chains
Text similarity assessment
Collocation analysis (O’Donnell 2005)
Challenges
  – Shallow hierarchy
  – Coverage limited to NT only

Overview: WordNet
Rich hierarchy of English meanings
Organized into synonym sets (synsets)
Additional relationships beyond hypernyms
  – Part/whole (holonym/meronym)
  – Derivational relationships
On-line browser at
Using version 3.0
  – Python and Natural Language Toolkit (NLTK,

WordNet Information Structure
[Diagram of WordNet synsets around “trap”. Nodes: artifact; instrumentality, instrument; device; trap; net; fishnet, fishing net; snare, gin, noose; trap (verb); bait, decoy, lure. Labeled relations: Related-to, Has-part.]

Mapping Louw-Nida to WordNet
Extract meaning entries and their hierarchy
Extract the term-to-meaning indexes
Invert the English index to map LN meanings to English terms
Use the English terms to identify a WordNet synset (or cluster of synsets)
  – More refined approach: use Logos’ disambiguated annotation of the Greek NT with Louw-Nida data (forthcoming)
  – Refine this with mappings from ESV Reverse Interlinear
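The inversion and lookup steps can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the index entries are limited to the three 6.E meanings shown earlier, and `toy_synsets` is a hypothetical stand-in for a real WordNet lookup, with made-up labels rather than actual synset identifiers.

```python
from collections import defaultdict

# English index: English term -> Louw-Nida meaning numbers.
# Limited to the sub-domain 6.E entries quoted in the talk.
english_index = {
    "trap": ["6.23", "6.24", "6.25"],
    "snare": ["6.23", "6.24"],
}

# Hypothetical stand-in for a WordNet lookup: English term -> synset labels.
# Labels are illustrative, not real WordNet synset names.
toy_synsets = {
    "trap": {"trap", "trap (verb)"},
    "snare": {"snare, gin, noose"},
}


def invert_index(index):
    """Turn term -> meanings into meaning -> sorted list of terms."""
    inverted = defaultdict(set)
    for term, meaning_ids in index.items():
        for meaning_id in meaning_ids:
            inverted[meaning_id].add(term)
    return {meaning_id: sorted(terms) for meaning_id, terms in inverted.items()}


def candidate_synsets(meaning_id, inverted, synset_lookup):
    """Union of synsets for every English term attached to one LN meaning."""
    cluster = set()
    for term in inverted.get(meaning_id, []):
        cluster |= synset_lookup.get(term, set())
    return cluster


inverted = invert_index(english_index)
print(inverted["6.23"])
print(candidate_synsets("6.23", inverted, toy_synsets))
```

The result for each meaning is a cluster of candidates rather than a single synset, which is why the later refinement steps (disambiguated annotation, interlinear mappings) are needed to narrow it.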

Application: Lexical Chains

Application: Semantic Indexing for Search
Use the more refined WordNet hierarchy to provide a richer search interface
Example: “addiction”
  – ESV uses related adjective in 1Tim.3.8, “not addicted to much wine”
  – Relevant LN meaning for προσέχω is LN Domain Aspect, Subdomain Continue; Gloss: ‘to continue to give oneself to, to continue to apply oneself to.’
  – “addiction”, “addict”, “addicted” not in the LN English index
  – Nothing leads directly back to LN.25.A (Desire, Want, Wish) or LN.25.B (Desire Strongly)

English Applications: Search (2)
WordNet hierarchy:
  – Addiction (an abnormally strong craving)
    – Craving (an intense desire for some particular thing)
      – Desire (the feeling that accompanies an unsatisfied state)
  – “crave”, “craving” also not in LN English index
    – Though two related terms, νοσέω and ὀρέγομαι, also occur in 1Tim and are translated by ESV as “craving”
Richer semantic hierarchy “fills in the gaps”
  – Connects with user interest
  – Leads back to relevant semantic groups
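The “fills in the gaps” idea amounts to climbing the hypernym chain from the user's search term until some term is found in the Louw-Nida English index. The sketch below assumes toy data: `hypernym` encodes only the addiction, craving, desire chain from the slide, and the index entry linking “desire” to 25.A and 25.B is an assumption made for illustration.

```python
# Toy hypernym table: term -> its more general term (from the slide's chain).
hypernym = {"addiction": "craving", "craving": "desire"}

# Toy LN English index. That "desire" points to 25.A and 25.B is an
# assumption for this example; the other terms are absent, as on the slide.
ln_english_index = {"desire": ["25.A", "25.B"]}


def find_domains(term, hypernyms, index):
    """Climb hypernyms until a term appears in the index.

    Returns (matched_term, domains, path). matched_term is None and
    domains is empty if the chain ends without a match.
    """
    path, seen = [], set()
    current = term
    while current is not None and current not in seen:
        path.append(current)
        seen.add(current)  # guard against accidental cycles in the table
        if current in index:
            return current, index[current], path
        current = hypernyms.get(current)
    return None, [], path


print(find_domains("addiction", hypernym, ln_english_index))
```

A search interface built this way could show the user the path it followed, so the jump from “addiction” to the Desire sub-domains is visible rather than silent.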

Integration as a Research Strategy
General benefits of Greek and English resource integration
  – Evaluate Greek results against a larger background
  – Provide the benefits of Greek scholarship to a wider audience
  – Extend narrow resources to a broader corpus

Conclusions
Cross-lingual integration opens up new possibilities
Valuable data resources for empirically-based analysis

References
Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database.
Louw, J. P. and Nida, E. A., editors (1989). Greek-English Lexicon of the New Testament: Based on Semantic Domains.
O'Donnell, M. B. (2005). Corpus Linguistics and the Greek of the New Testament.