Claudia Marzi Institute for Computational Linguistics, “Antonio Zampolli” – Italian National Research Council University of Pavia – Dept. of Theoretical.

Slides:



Advertisements
Similar presentations
GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Advertisements

DELOS Highlights COSTANTINO THANOS ITALIAN NATIONAL RESEARCH COUNCIL.
Computational Paradigms in the Humanities – eHumanities and their role and impact in transdisciplinary research Gerhard Budin University of Vienna.
Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
WG3: Innovative e-dictionaries Simon Krek „Jožef Stefan“ Institute, Ljubljana, Slovenia Carole Tiberius Institute of Dutch Lexicology, Leiden, the Netherlands.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
11 September 2002IR/LM workshop, Amherst1 Information retrieval, language and ‘language models’ Stephen Robertson Microsoft Research Cambridge and City.
Digitisation and Access to Archival Collections: A Case Study of the Sofia Municipal Government (1878 – 1879) Maria Nisheva-Pavlova, Pavel Pavlov Faculty.
introduction to MSc projects
Data-Driven South Asian Language Learning SALRC Pedagogy Workshop June 8, 2005 J. Scott Payne Penn State University
© Anselm SpoerriInfo + Web Tech Course Information Technologies Info + Web Tech Course Anselm Spoerri PhD (MIT) Rutgers University
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
REFLECTIONS ON NOTECARDS: SEVEN ISSUES FOR THE NEXT GENERATION OF HYPERMEDIA FRANK G. HALASZ.
CALL – computer assisted language learning A short course delivered by Dr. Klaus Schwienhorst. MITE January 2002.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
WG3: Innovative e-dictionaries Simon Krek „Jožef Stefan“ Institute, Ljubljana, Slovenia Carole Tiberius Institute of Dutch Lexicology, Leiden, the Netherlands.
GL12 Conf. Dec. 6-7, 2010NTL, Prague, Czech Republic Extending the “Facets” concept by applying NLP tools to catalog records of scientific literature *E.
CAREERS IN LINGUISTICS OUTSIDE OF ACADEMIA CAREERS IN INDUSTRY.
Claudia Marzi Institute for Computational Linguistics (ILC) National Research Council (CNR) - Italy.
Module 3: Business Information Systems Chapter 11: Knowledge Management.
Dr. Kurt Fendt, Comparative Media Studies, MIT MetaMedia An Open Platform for Media Annotation and Sharing Workshop "Online Archives:
Communication Degree Program Outcomes
16-1 The World Wide Web The Web An infrastructure of distributed information combined with software that uses networks as a vehicle to exchange that information.
Reflections on Using Corpora Data in EFL Teaching CHEN BO Chongqing Jiaotong University 2006.
University of Dublin Trinity College Localisation and Personalisation: Dynamic Retrieval & Adaptation of Multi-lingual Multimedia Content Prof Vincent.
Linguistics & AI1 Linguistics and Artificial Intelligence Linguistics and Artificial Intelligence Frank Van Eynde Center for Computational Linguistics.
Defining Text Mining Preprocessing Transforming unstructured data stored in document collections into a more explicitly structured intermediate format.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Knowledge Representation and Indexing Using the Unified Medical Language System Kenneth Baclawski* Joseph “Jay” Cigna* Mieczyslaw M. Kokar* Peter Major.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
MIS – 3030 Business Technologies Social Media & Conversation Big Data.
 ByYRpw ByYRpw.
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Terminology and documentation*  Object of the study of terminology:  analysis and description of the units representing specialized knowledge in specialized.
Decision Support Systems
Knowledge Representation of Statistic Domain For CBR Application Supervisor : Dr. Aslina Saad Dr. Mashitoh Hashim PM Dr. Nor Hasbiah Ubaidullah.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
Collocations and Information Management Applications Gregor Erbach Saarland University Saarbrücken.
NLP ? Natural Language is one of fundamental aspects of human behaviors. One of the final aim of human-computer communication. Provide easy interaction.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
ARTIFICIAL INTELLIGENCE Human like intelligence Definitions: 1. Focus on intelligent Behaviour “Behaviour by a machine that, if performed by a human.
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
WEB MINING. In recent years the growth of the World Wide Web exceeded all expectations. Today there are several billions of HTML documents, pictures and.
The World Around Us and the Media Integrating ICT.
OWL Representing Information Using the Web Ontology Language.
Introduction to the Semantic Web and Linked Data
Terminology and documentation*  Object of the study of terminology:  analysis and description of the units representing specialized knowledge in specialized.
Volgograd State Technical University Applied Computational Linguistic Society Undergraduate and post-graduate scientific researches under the direction.
Cognitive Science and Biomedical Informatics Department of Computer Sciences ALMAAREFA COLLEGES.
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
Data mining, interactive semantic structuring, and collaboration: A diversity-aware method for sense-making in search Mathias Verbeke, Bettina Berendt,
Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
Quick Write Reflection How will you implement the Engineering Design Process with your students in your classes?
Corpus Linguistics MOHAMMAD ALIPOUR ISLAMIC AZAD UNIVERSITY, AHVAZ BRANCH.
Software Engineering, COMP201 Slide 1 Software Requirements BY M D ACHARYA Dept of Computer Science.
KNOWLEDGE MANAGEMENT UNIT II KNOWLEDGE MANAGEMENT AND TECHNOLOGY 1.
Discourse Analysis Week 10 Riggenbach (1999) Chapter 1 - Quotes.
INTRODUCTION TO APPLIED LINGUISTICS
Informatics for Scientific Data Bio-informatics and Medical Informatics Week 9 Lecture notes INF 380E: Perspectives on Information.
Setting the stage: linked data concepts Moving-Away-From-MARC-a-thon.
AN INTRODUCTION TO SPOKEN LANGUAGE LG4 Section A.
IB Assessments CRITERION!!!.
MYP Descriptors – Essay Types & Rubrics
European Network of e-Lexicography
Grade 6 Outdoor School Program Curriculum Map
CSE 635 Multimedia Information Retrieval
Search Engine Architecture
Language in the Media Lesson 2.
Presentation transcript:

Claudia Marzi Institute for Computational Linguistics, “Antonio Zampolli” – Italian National Research Council University of Pavia – Dept. of Theoretical and Applied Linguistics

Language conveys ideas which are essential in corporate innovation; innovation would be nearly impossible if we did not have language. Language makes us shape and refer to things, events, and concepts. Communication takes place when there is a real information exchange process:  every linguistic choice is necessarily meaningful  language is in this sense a dynamic communicative and interactive process

focus on how words and language structures become vehicle for knowledge generation, and in particular for innovation transfer. focus on lexical creativity, and its dynamic interplay with innovative contexts The communicative ability connects people into information-sharing network and information allows people to expand their knowledge. In this context, our goal is to:

WEB searching WEB searching as default mode for information retrieval, though the main sources of digital information are unstructured or semi-structured text materials WEB The WEB: information exchange; has become a primary meeting place for information exchange; resource has evolved into a primary resource for lexicographers and linguists; Is a source of machine readable texts for corpus linguists and researchers in the fields of Natural Language Processing (NLP), Information Retrieval and Text Mining

Corpus-based studies  context domainpurpose Corpus-based studies  lay emphasis on how word usage and word formation processes conveys differences in context, domain and purpose Large corpora – large and growing repositories - to investigate Large corpora – large and growing repositories - to investigate:  how words are used to describe innovation  how innovation-driven topics can influence word usage and collocational behaviour The meanings of words can change over time/discourse, and words can take on new senses

The meanings of words can change over time, and words can take on new senses Words with emergent novel senses reflect an extension of use from one domain to another Semantically ambiguous words (polysemy) can be disambiguated by defining the context – what people do in spoken language (pragmatic competence) Analysis on language productivity offer the opportunity to investigate how words become vehicle for innovative knowledge generation and transfer

 Bibliographic (domain-specific) database  Type coherent multidisciplinary database Search web engine and large corpus query tools are used:  Generic search engine  Lexicography query system Domain-specific vs. generic-purpose texts offer materials for terminology exploration Genre-oriented and stylistically heterogeneous texts are taken:

Imaging Imaging = brain imaging techniques vs. visual representation Neuro Neuro = referred to diagnosis, medicine and surgery.  Neuroimaging  Neuroimaging  coherent Storage Storage = memory capacity vs. containing units Memory Memory = process of information encoding and retrieving By comparing reference corpora – both specific and generic – with the web, examples of polysemy are given:  coherent retention What about a polysemous term like retention ? What domain does it belong to? What is the meaning and what concepts does it convey?

Lexical co-occurrences and collocations can be of considerable help in retrieving text materials which are relevant to a specific domain of interest; however they are often helpless in telling genres purposes or intended readerships Bibliographic domain-specific databases are developed to offer materials to a specialised readership, thus providing highly- selected, well-targeted documents We investigate the hypothesis that specific word-formation processes and neologisms correlate significantly with text genres and intended readership, thus providing a convenient way to track down well-targeted, highly technical repositories of openly available text materials