Impact of different relation extraction methods on network analysis results Jana Diesner.


Motivation
Text Data -> Network Data -> Applications
Need: scalable, reliable, robust methods & tools
Text data:
– Unstructured
– At any scale
Applications:
– Network analysis
– Answer substantive and graph-theoretic questions
– Develop and test hypotheses and theories
– Visualizations
– Populate databases
– Input to further computations, e.g. simulations, machine learning

Research Questions and Relevance
How do network data and analysis results obtained by using different relation extraction methods compare to each other?
Why does it matter?
– Increased comparability, generalizability, transparency of methods and tools
– Increased control and power for developers and users
– Supports drawing of reasonable and valid conclusions

Relation Extraction Methods
– Text, manual (TextM)
– Text, automated (TextA)
– Meta-data (META)
– Subject Matter Experts (SME)
Diagram labels: Codebook; Proximity-based linkage of nodes; Database query; Meta-Data
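The "proximity-based linkage of nodes" step named on this slide can be illustrated with a minimal sketch. The slide does not give the window size, the tokenization, or how entities are identified, so the function name `proximity_edges`, the `window` parameter, and the toy sentence below are all assumptions made for illustration and are not the author's implementation.

```python
from collections import Counter
from itertools import combinations


def proximity_edges(tokens, entities, window=2):
    """Link two distinct entities when they occur at most `window` tokens apart.

    tokens   -- list of word tokens (tokenization is assumed to be done already)
    entities -- set of tokens that count as nodes (hypothetical codebook output)
    Returns a Counter mapping an alphabetically sorted node pair to its link count.
    """
    # Positions of all tokens that are recognized as nodes.
    positions = [(i, tok) for i, tok in enumerate(tokens) if tok in entities]
    edges = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        # combinations() preserves order, so j > i always holds here.
        if a != b and j - i <= window:
            edges[tuple(sorted((a, b)))] += 1
    return edges


# Toy input, invented for this example.
tokens = "Alice met Bob in Khartoum while Carol stayed home".split()
entities = {"Alice", "Bob", "Khartoum", "Carol"}
print(dict(proximity_edges(tokens, entities, window=2)))
```

A larger window produces denser networks, which is one reason results from different extraction settings may not be directly comparable.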

Data
 | Sudan Corpus | Funding Corpus | Enron Corpus
Genre | Newswire | Scientific writing | Emails
Size | 80,000 articles | 56,000 proposals | 53,000 emails
Source | LexisNexis | Cordis | FERC/SEC
Time span | 8 years | 22 years | 4 years
Text-based networks | Article bodies | Project description bodies |
Meta-data network | Index terms | Index terms and collaborators | Email headers
Large-scale, over-time, open source data from different domains

Results I
1. Text automated vs. manual: the total number of nodes of sub-type "generic" is far higher than that of sub-type "specific"
 – Rethink focus of network analysis: collectives vs. individuals
 – Importance of detecting unnamed entities
2. Ground truth data (SME) are hardly reproduced by analyzing text bodies, and not at all by meta-data networks
 – In the best case, 50% of nodes and 20% of links are matched
3. Agreement in structure and key entities depends on type of network
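The node and link percentages reported on this slide imply some measure of how much of a reference network is recovered by an extracted one. The slide does not define that measure, so the following is only a sketch under the assumption that overlap means the share of reference nodes (or undirected links) that also appear in the extracted network. The function names and the toy edge lists are hypothetical.

```python
def share_recovered(reference, candidate):
    """Fraction of reference items that also occur in candidate (0.0 if reference is empty)."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(candidate)) / len(reference)


def compare_networks(reference_edges, candidate_edges):
    """Return (node_overlap, link_overlap) of a candidate network against a reference.

    Edges are treated as undirected, so ("A", "B") and ("B", "A") are the same link.
    Nodes are derived from the edge lists; isolated nodes are therefore ignored.
    """
    ref_links = {frozenset(edge) for edge in reference_edges}
    cand_links = {frozenset(edge) for edge in candidate_edges}
    ref_nodes = {node for link in ref_links for node in link}
    cand_nodes = {node for link in cand_links for node in link}
    return share_recovered(ref_nodes, cand_nodes), share_recovered(ref_links, cand_links)


# Toy networks, invented for this example (not data from the talk).
ground_truth = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("A", "E")]
extracted = [("B", "A"), ("A", "F")]
node_overlap, link_overlap = compare_networks(ground_truth, extracted)
print(node_overlap, link_overlap)
```

Measuring against the reference only (rather than a symmetric measure) keeps the question "how much of the ground truth is found" separate from "how much extra material the extraction adds".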

Results II
3. Agreement among text-based networks, and between text-based and meta-data networks, depends on type of network

Social networks
 Text-based networks:
 – Substantial overlap between manual and automated, esp. w.r.t. key players
 – Localized view on geo-political entities and culture
 Meta-data network:
 – Major international key players
 – Small overlap in key entities with text-based networks

Knowledge networks
 Text-based networks:
 – Gist of information in terms of common sense entities
 – Minimal overlap between manual and automated
 Meta-data network:
 – Seem more informative (mini-summaries)
 – Fewer coreference resolution issues
 – Minimal overlap with text-based networks

For a more complete view, combine automated text-based with meta-data network
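Overlap in "key entities" between two networks can be quantified in several ways. The slides do not state which centrality measure or cutoff was used, so the sketch below assumes degree centrality, a top-k cutoff, and Jaccard similarity. These choices, the function names, and the toy edge lists are illustrative assumptions only.

```python
from collections import Counter


def top_k_by_degree(edges, k):
    """Return the set of the k nodes with the highest degree in an undirected edge list.

    Ties are broken alphabetically so that the result is deterministic.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return set(ranked[:k])


def key_entity_overlap(edges_a, edges_b, k=2):
    """Jaccard similarity between the top-k degree nodes of two networks."""
    top_a = top_k_by_degree(edges_a, k)
    top_b = top_k_by_degree(edges_b, k)
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


# Toy networks, invented for this example (not data from the talk).
text_based = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
meta_data = [("A", "X"), ("A", "Y"), ("X", "Y"), ("X", "Z")]
print(key_entity_overlap(text_based, meta_data, k=2))
```

Other centrality measures or cutoffs could rank entities differently, so any overlap figure should be reported together with the measure and k that produced it.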

Acknowledgements
This work was supported by the National Science Foundation (NSF) IGERT , the Army Research Institute (ARI) W91WAW07C0063, the Army Research Laboratory (ARL/CTA) DAAD , the Air Force Office of Scientific Research (AFOSR) MURI FA , the Office of Naval Research (ONR) MURI N00014-08-11186, and a Siebel Scholarship. Additional support was provided by the CASOS Center at Carnegie Mellon University. The views and conclusions contained in this talk are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the NSF, ARI, ARL, AFOSR, ONR, or the United States Government.

Thank You! Questions, Comments, Feedback: