Claudia Marzi Institute for Computational Linguistics (ILC) National Research Council (CNR) - Italy.


The dynamic nature of modern human social interaction, and the increasing capability of wireless and mobile devices to create and share content, open up the opportunity for wide dissemination of information through complex knowledge-sharing systems. The web offers a steadily increasing availability of ubiquitously accessible information. In this context, social networks can amplify forefront ideas and highly innovative content; they offer an enormous potential to transform research, and research results, into a knowledge co-creation process.

To what extent can social networks provide a real opportunity for sharing knowledge and for generating and disseminating novel information? Can they really support a steady flow of technical and scholarly writing, or do they only provide a general channel for ephemeral communication exchanges? Is there a specific added value in the way social networking fosters people's interest in sharing and building information? Is interactive, informal and ubiquitous information exchange developing a new social framework for the creation of public-domain knowledge?

We suggest that all these questions can be addressed by applying advanced NLP tools for automated content extraction to the analysis of web-based text collections, sampled from both general-purpose and specialized social networks. The Information Extraction literature provides different modes and tools for knowledge acquisition and representation: from highly structured, standardized and objective knowledge information systems based on ontological hierarchies and relations, to more dynamic, subjective tools for volatile knowledge representation such as word clouds and concept maps. Natural language understanding technologies offer an objective measure of the information density of a text document or document collection, and ways to map out the distribution and development of information. This makes it possible to compare the information structure across texts and get a sense of their level of content sharing and knowledge coherence.

Natural Language Processing tools can augment text documents with layers of mark-up data, making the hidden linguistic structure of the document overtly represented and accessible. The input text is segmented into words and multi-word structures, mutually linked through syntactic relations. Salient terms are identified in context, to provide access keys to the basic contents of the document.
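As an illustration of the segmentation and term-identification steps just described, the following minimal Python sketch tokenizes raw text and ranks content words by frequency. The `tokenize` and `salient_terms` helpers and the tiny stopword list are hypothetical simplifications for illustration only, not the pipeline actually used in the experiments (which also handles multi-word terms and syntactic links).

```python
import re
from collections import Counter

def tokenize(text):
    """Split raw text into lowercase word tokens (a simplified stand-in
    for the segmentation step; a real pipeline would also recognize
    multi-word structures and attach syntactic relations)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def salient_terms(tokens, stopwords, top_n=5):
    """Rank content words by frequency, as rough access keys to the
    basic contents of the document."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return [term for term, _ in counts.most_common(top_n)]

# Illustrative stopword list and sentence (assumptions, not project data)
stop = {"the", "a", "of", "is", "and", "to", "in"}
doc = "The parser links words in the text; salient terms give access to the text."
print(salient_terms(tokenize(doc), stop, top_n=3))
```

A frequency ranking like this is only the first layer; the annotated output would then feed the term- and concept-level abstraction steps described below.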

Linguistically annotated documents provide a jumping-off point for the acquisition of increasingly abstract representations of document content: words are structured into terms, terms are grouped into conceptual classes, and concepts are linked together through vertical (taxonomical) and horizontal (ontological) relations. This kind of linguistic information represents the basis of a computational platform for automated content sharing, access and dissemination.

Moreover, NLP technologies can usefully represent another, orthogonal level of linguistic information: content accessibility, i.e. the readability of a text as estimated from its processing difficulty. This type of analysis allows us to compare the information structure of different text collections and get a precise sense of their level of informativeness. In particular, lexical richness has to do with the variety of the lexicon of a text, while lexical density gives a measure of the rate at which content is updated.
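These two lexical measures can be sketched as simple ratios. This is a minimal, hedged reading of the definitions above; the function-word list and sample sentence are illustrative assumptions, not the resources actually used in the experiments.

```python
def lexical_richness(tokens):
    """Type-token ratio: distinct word forms over total tokens."""
    return len(set(tokens)) / len(tokens)

def lexical_density(tokens, function_words):
    """Share of content words among all tokens, a rough proxy for how
    densely new content is packed into the text."""
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)

# Illustrative function-word list and sample (assumptions)
func = {"the", "a", "of", "is", "and", "to", "in", "it", "on"}
sample = "the cat sat on the mat and the cat slept".split()
print(round(lexical_richness(sample), 2))
print(round(lexical_density(sample, func), 2))
```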

Subject-specificity: we can identify the most salient terms in a document, and their degree of subject-specificity, by comparing their frequency distribution with the frequency distribution of the same terms in a balanced corpus. Syntactic complexity: syntactic complexity can be calculated on the basis of the average length of clauses, the way words are arranged in context, the length of dependency chains, and the word distance between head and dependent. Text excerpts are sampled from general-purpose social networks (based on friendship relations and social proximity) and from specialized subject-based communities (based on content sharing and supporting relationships).
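The frequency-comparison idea behind subject-specificity can be sketched as a smoothed ratio of relative frequencies against a balanced reference corpus. The add-one smoothing and the example counts are assumptions for illustration, not the actual scoring function used in the study.

```python
def subject_specificity(term_freq, doc_size, ref_freq, ref_size):
    """Ratio of a term's relative frequency in the document to its
    relative frequency in a balanced reference corpus. Values well
    above 1 flag domain-specific terms. Counts are add-one smoothed
    so terms unseen in the reference corpus do not divide by zero."""
    doc_rate = (term_freq + 1) / (doc_size + 1)
    ref_rate = (ref_freq + 1) / (ref_size + 1)
    return doc_rate / ref_rate

# A term frequent in the document but rare in the reference corpus
print(subject_specificity(12, 1000, 3, 1_000_000) > 1)
# A common word with comparable rates in both
print(subject_specificity(50, 1000, 60_000, 1_000_000) > 1)
```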

In the English experiment, we conducted a grammatical and content-word evaluation on three text collections: a sample of messages posted in general-purpose social networks (e.g. Facebook); a sample of message exchanges within subject-based web communities (e.g. LinkedIn); and, as a baseline, a sample of Grey Literature writings (the GL12 Conference Proceedings). The distribution of terms was comparatively evaluated on the basis of their degree of domain-specificity. Domain-specific terms (single- and multi-word): Social networks 1.75; Subject-based communities; GL Papers.

A measure of average syntactic complexity is derived from: the lexical rarity of content words; the distribution of part-of-speech tags; the average length of chains of dependency links; and the average head-complement distance. Once more, the three text samples, ranked by increasing syntactic difficulty, reflect a gradient of content accessibility which appears to mirror the degree of communicative formality (from less formal to more formal) of our text types. Difficulty level: Social networks 35.30; Subject-based communities; GL Papers 55.20.
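One of the listed components, the average distance between a head and its dependents, can be computed directly from a dependency parse. The sketch below assumes a toy parse encoded as head indices; it is an illustration of the general measure, not the parser or scoring actually used.

```python
def mean_dependency_distance(heads):
    """Average absolute distance (in tokens) between each word and its
    head, given a list where heads[i] is the index of token i's head
    (None for the root). Longer average distances suggest a heavier
    processing load, hence lower readability."""
    dists = [abs(i - h) for i, h in enumerate(heads) if h is not None]
    return sum(dists) / len(dists)

# Toy parse of "She quickly read the long report":
# she->read, quickly->read, read=root, the->report, long->report, report->read
heads = [2, 2, None, 5, 5, 2]
print(round(mean_dependency_distance(heads), 2))
```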

In the Italian experiment, we compared the overall levels of lexical coherence in post exchanges among the contacts of various Facebook accounts, based on friendship relations, and in subject-based blogs on the CNR intranet. Lexical coherence is estimated automatically by measuring the flow of new lexical items incrementally added in a post exchange on the same issue, calculated as the number of novel words introduced by each newly posted comment divided by the length of the comment. Social networks show a slower growth of average word frequency; subject-based writing tends to be lexically more coherent.
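The novelty-rate computation described above can be sketched as follows. The whitespace tokenization and the three-post thread are illustrative assumptions; the actual experiment worked on Italian posts with proper linguistic preprocessing.

```python
def coherence_flow(posts):
    """For each successive post in a thread, the share of word tokens
    not seen earlier in the exchange: novel words in the comment
    divided by comment length. Consistently low novelty rates indicate
    a lexically coherent discussion."""
    seen, rates = set(), []
    for post in posts:
        tokens = post.lower().split()
        novel = [t for t in tokens if t not in seen]
        rates.append(len(novel) / len(tokens))
        seen.update(tokens)
    return rates

# Two on-topic replies followed by an off-topic one (toy thread)
thread = ["the model parses the text",
          "the text is parsed by the model",
          "holiday photos anyone"]
print([round(r, 2) for r in coherence_flow(thread)])
```

The second post reuses most of the thread's vocabulary and scores a low novelty rate, while the off-topic third post is entirely novel: exactly the contrast between coherent subject-based exchanges and drifting general-purpose threads.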

Where there is a stronger interaction between medium and content, as in subject-based communities, we observe: domain-specific terms; high lexical coherence; more levels of syntactic embedding; and high/medium readability. The medium of general-purpose social networks tends to make communication simpler, with: sentences shorter than in traditional texts; high/medium-frequency words; simpler syntax (one verb per sentence); and high readability, but with no guarantee of knowledge sharing.

NLP tools for content analysis and Information Extraction establish a direct relation between modes of knowledge creation and sharing and dynamic, incremental approaches to automated knowledge acquisition and representation: they allow us to assess the content of a text in terms of its readability, domain-specificity, lexical coherence and the density of its conceptual maps; and they can be used to measure not only the effectiveness of a text in conveying information, but also the extent to which that information is structured as shared knowledge.

Subject-based communities, focused on supporting relationships and content sharing, act at the same time as providers and users of all kinds of GL materials in a highly distributed and collaborative scenario, and represent a conducive environment for knowledge transfer. As collaboration networks, they can be a key element in the advancement and dissemination of knowledge, in scientific domains as well as in diverse aspects of everyday human life. General-purpose social networks, reflecting friendship or superficial relationships, tend to generate ephemeral information and to create a more superficial, mosaic knowledge. Social networking is a medium with strong potential: a house of cards, powerful and fragile at the same time.