Sociopolitical Domain as a Bridge from General Words to Terms of Specific Domains Research Computing Center of Moscow State University NCO Center for Information.

Presentation transcript:

Sociopolitical Domain as a Bridge from General Words to Terms of Specific Domains
Research Computing Center of Moscow State University, NCO Center for Information Research
Natalia V. Loukachevitch, Boris V. Dobrov

General Words and Terms in Automatic Text Processing
- Texts in electronic collections contain both general words and terms.
- Two different research domains: lexicology and terminology.
- Wüster (founder of the Vienna school of terminology): terminologists start from a concept, whereas lexicologists start from the form of a linguistic expression.

Wüster: the difference between lexicological and terminological approaches
- Terminological research starts from the concept, which has to be precisely delimited.
- In terminology, concepts are considered to be independent of their designations.
- Terminologists talk about ‘concepts’, while linguists talk about ‘word meanings’.

Construction of Wordnets and Terminology Research
- Development of wordnets:
  - construction of hierarchical semantic networks
  - search for similar “synsets” in different languages
  - building the top ontology of language-independent concepts
- The approaches to the study of general words and of terms are converging.

Theory of Terminology: Properties of the Ideal Term
- The term must relate directly to the concept and express it clearly.
- There should be no synonyms, whether absolute, relative or apparent.
- The contents of terms should be precise and should not overlap in meaning with other terms.
- The meaning of the term should be independent of context.

Theory of terminology: a serious difference between a general word and a term
- A biunivocal (one-to-one) relationship between concepts and terms in each special field of knowledge.
- For a terminology nothing could be better than that: no synonymy, no homonymy and no polysemy.
- A huge gap between general words and terms.
- BUT!

Term Formation and Words of General Language
- The general sense of a word and its terminological senses can be really different: “function” as a general word, “function” in mathematics, “function” in biology.
- Cruse: “senses of a lexical form are antagonistic to one another; that is to say, they can not be brought into play simultaneously without oddness”

A word and a term are very similar in meaning
- arson - Law. the malicious burning of another's house or property, or in some statutes, the burning of one's own house or property, as to collect insurance (Random House Unabridged Dictionary)
- A general dictionary uses a very strict definition.

How to distinguish terminological and general senses
Example news item: “Teacher in court accused of school arson”
A teacher charged with setting fire to a West Yorkshire school has appeared in court. Amina Ditta, 23, of Scholemoor Road, Bradford, has faced the city's magistrates court charged with one count of arson. The charge relates to an incident last Wednesday at Atlas Primary School in Manningham, where Ms Ditta was employed. She spoke only to confirm her personal details and was represented by barrister Mr Narinda Sekhon. She was granted conditional bail to return to court on June 12.

Traditional point of view: definitions
- Traditional terminologists: definitions of terms are strict in comparison with glosses of general words.
- Contemporary point of view: the degree of vagueness in term definitions is lower, but in many cases vagueness is inevitable.
- Example: taxation in Russian legislation, new construction vs. repair.

How many general and terminological senses are so close? - 1
- Building - relatively permanent enclosed construction over a plot of land, having a roof and usually windows and often more than one level, used for any of a wide variety of activities, as living, entertaining, or manufacturing (Unabridged Webster dictionary)
- Domains:
  - construction industry
  - domain of public utilities
- It is impossible to separate the senses.
- Practically all denotations are the same.

How many general and terminological senses are so close? - 2
- Means of transportation, job positions, technical devices, food, agricultural plants and animals, other natural objects, works of art and others:
  - they are produced by professionals
  - we use them in everyday life
- Social, political and economic processes:
  - they are planned or restricted by professionals
  - our life is influenced by them

General words and terminologies
- The intersection is significant.
- Of the words in general dictionaries, … percent belong to the intersection area.
- We call this intersection area the socio-political domain, or the domain of social life: it describes the everyday life of contemporary society.

The sociopolitical domain and domains in WordNet
- Many researchers have proposed sets of domains for WordNet and EuroWordNet.
- The sociopolitical domain is approximately equal to the sum of the proposed domains.
- A synset is related to the sociopolitical domain if there is a professional domain (not a science) that has a term with a very similar sense (up to some vagueness).
- Emotions and feelings do not belong to the sociopolitical domain.
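The membership criterion stated on this slide can be sketched in code. This is only an illustration under stated assumptions: the class names, the `SCIENCE_DOMAINS` set and the `similar_to_general_sense` flag are hypothetical and not part of the authors' thesaurus; the similarity judgement itself is assumed to be made by a lexicographer.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical list of domains treated as sciences (excluded by the criterion).
SCIENCE_DOMAINS = {"mathematics", "biology", "physics"}


@dataclass
class DomainTerm:
    term: str
    domain: str
    # Assumed to be a manual judgement: the term's sense is very close
    # to the general-language sense, tolerating some vagueness.
    similar_to_general_sense: bool


@dataclass
class Synset:
    words: List[str]
    matching_terms: List[DomainTerm]


def in_sociopolitical_domain(synset: Synset) -> bool:
    """True if some professional (non-science) domain has a term
    whose sense is very similar to the synset's sense."""
    return any(
        t.similar_to_general_sense and t.domain not in SCIENCE_DOMAINS
        for t in synset.matching_terms
    )


arson = Synset(["arson"], [DomainTerm("arson", "law", True)])
function = Synset(["function"], [DomainTerm("function", "mathematics", False)])
joy = Synset(["joy"], [])  # emotions have no matching professional term
```

With these toy entries, "arson" falls inside the domain, while "function" and "joy" fall outside it.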

Multiword terms from specific domains
- A lot of multiword terms from professional domains are understandable to native speakers:
  - multinational country
  - single member constituency
  - amicable agreement
  - global market
  - criminal omission
- Special criteria are needed for the inclusion of multiword expressions.

Features of Sociopolitical Domain-1
- Texts of various genres (official documents, international treaties, legislative documents, newspaper articles) are related to the sociopolitical domain.
- It allows the development of a unified linguistic resource for automatic processing of such varied texts.
- It gives a broad basis for the development of domain-specific resources.

Features of Sociopolitical Domain-2
- Inclusion of multiword terms facilitates disambiguation procedures.
- Ambiguity within the domain is much lower than in the whole resource, and distinctions between senses are more definite and more important, so different disambiguation procedures can be used within the sociopolitical domain and outside it.
- Procedures for identifying lexical cohesion and lexical chains can also differ for synsets inside and outside the sociopolitical area, because concepts in the sociopolitical domain are thematically more definite (“privatization” vs. “creation”).
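The observation that multiword terms facilitate disambiguation can be illustrated with a greedy longest-match lookup: when "global market" is stored as a single text entry, the words "global" and "market" never have to be disambiguated separately. The sketch below is a toy illustration only; the `ENTRIES` dictionary, the concept labels and the function name are hypothetical, and the authors' actual indexing procedure is not described in the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical toy thesaurus: text entry (tuple of lowercase tokens) -> concept label.
ENTRIES: Dict[Tuple[str, ...], str] = {
    ("global", "market"): "GLOBAL MARKET",
    ("market",): "MARKET",
    ("amicable", "agreement"): "AMICABLE AGREEMENT",
    ("agreement",): "AGREEMENT",
}
MAX_LEN = max(len(key) for key in ENTRIES)


def index_concepts(tokens: List[str]) -> List[str]:
    """Greedy longest-match lookup: at each position prefer the longest entry."""
    tokens = [t.lower() for t in tokens]
    found: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in ENTRIES:
                found.append(ENTRIES[key])
                i += n
                break
        else:
            i += 1  # no entry starts at this token; move on
    return found
```

For example, `index_concepts("The global market reached an amicable agreement".split())` yields the two multiword concepts rather than the single-word concepts MARKET and AGREEMENT.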

Experience of Work in the Sociopolitical Domain
- Project University Information System RUSSIA: 800 thousand Russian documents (after 1991).
- Russian thesaurus on sociopolitical life (since 1994): a concept-based network of 30 thousand concepts and 75 thousand words and terms.
- Automatic text processing since 1995: text categorization, automatic conceptual indexing, text summarization.
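A concept-based network of this kind, where several words and terms point to one concept and concepts are linked hierarchically, might be represented roughly as follows. This is a minimal sketch under assumptions: the class and method names, the example concepts and the single "parent" relation are hypothetical and do not reproduce the actual structure of the Russian thesaurus.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Concept:
    name: str
    text_entries: Set[str] = field(default_factory=set)  # words and terms
    parents: List[str] = field(default_factory=list)     # broader concepts


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        self.entry_index: Dict[str, Set[str]] = {}  # text entry -> concept names

    def add(self, name: str, text_entries: Iterable[str],
            parents: Iterable[str] = ()) -> None:
        concept = Concept(name, set(text_entries), list(parents))
        self.concepts[name] = concept
        for entry in concept.text_entries:
            self.entry_index.setdefault(entry.lower(), set()).add(name)

    def concepts_for(self, entry: str) -> Set[str]:
        """Concepts that a word or term can express (more than one if ambiguous)."""
        return self.entry_index.get(entry.lower(), set())

    def ancestors(self, name: str) -> Set[str]:
        """All broader concepts reachable through the parent links."""
        seen: Set[str] = set()
        stack = list(self.concepts[name].parents)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                if parent in self.concepts:
                    stack.extend(self.concepts[parent].parents)
        return seen
```

Conceptual indexing then amounts to mapping each text entry found in a document to its concepts and, where useful, to their broader concepts.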

University Information System RUSSIA: collections

| | Source | Retrospective coverage | Documents |
|---|---|---|---|
| Legal documents | Official Publication | 1990-… | 55,000 |
| State Duma daily records | State Duma | 1994-… | 100,000 |
| Statistics | State Statistics Agency; CIS Interstate Statistics Committee | 1998-… | 20,000 |
| Mass media | Expert weekly; Nezavis. gazeta; Izvestia; … | 199(7)-… | 180,000 |
| Analytical reports | Central Bank of RF; Rus.-Europ. Center for Economic Policy; … | 1996-… | 10,000 |
| Scientific publications | MSU Publishing, RePEc, “Sociology Research”, … | 1999-… | 2,000 (+230,000 ref) |
| University Information System RUSSIA (total) | | | 800,000 / 7.5 Gb |

Socio-Political Domain vs. Lexicon
[Diagram: Lexicon, Socio-Political Domain and Sciences shown by levels of hierarchy. Labels: 110,000 text entries, 50,000 concepts; 75,000 text entries, 30,000 concepts.]

Specific Domains vs. Socio-Political
[Diagram: Socio-Political Domain shown by levels of hierarchy, with the specific domains Elections, Industrial Production and Geography.]

Interrelations between Socio-Political Domains
[Diagram: Socio-Political Domain shown by levels of hierarchy, with the domains Law, Accounting, Taxation and Banking.]

Sciences vs. Socio-Political Domain
[Diagram: Socio-Political Domain, Social Sciences, Natural Sciences.]

Specific applications of the Sociopolitical thesaurus
- Terms of economics and sociology were included: automatic text categorization of scientific papers (700 categories of the JEL, Journal of Economic Literature, subject headings).
- Terms of non-production spheres were added: automatic text categorization of Russian legislation (3000 categories of the commercial subject headings system).

Conclusions-1
- The border between the general-language lexicon and the terminologies of specific domains is not sharp or abrupt.
- It looks more like a broad strip, containing general-language senses that practically coincide with concepts of social subdomains, together with concepts of specific domains that are understandable to native speakers.

Conclusions-2
A detailed description of concepts, terms and words from this “transition area”, called the “sociopolitical domain”, can be naturally added to a wordnet's semantic network and can:
- facilitate the solution of such problems as lexical disambiguation and identification of text structure,
- enhance the coverage of domain-specific texts by wordnet synsets,
- improve the effectiveness of wordnets in various automatic text processing applications.