Cognitive Architecture for Reasoning about Adversaries T-REX: A Domain-Independent System for Automated Cultural Information Extraction Massimiliano Albanese.


Cognitive Architecture for Reasoning about Adversaries
T-REX: A Domain-Independent System for Automated Cultural Information Extraction
Massimiliano Albanese, V.S. Subrahmanian
University of Maryland Institute for Advanced Computer Studies, College Park, Maryland, USA

2. Introduction
• Several applications require the ability to extract fine-grained information from huge text collections
  » Intelligence agencies may need detailed information about diverse cultural groups around the world in order to understand and model their behavior
  » A real-time “violence-watch” around the world would require the ability to identify several attributes for every “violent event” reported in the online press
• Traditional search engines
  » Are not able to provide such information without sorting through a long list of documents
  » Are not able to integrate information from different sources

3. Key contributions
• Domain-independent framework for information extraction
  » A schema describing the information the user wants to extract is provided as an input
• Key features
  » Scalability: the system is designed to scale massively to large volumes of data
    › It currently searches 109 online news sites from 66 countries around the world, processing about 45,000 articles/day (about 10 million distinct URLs explored so far, with 7 million triples extracted)
  » Multilingual support: the system is designed to work with different languages
    › English, Spanish and Chinese
  » Flexibility: several elements can be easily customized
    › List of sources, topics of interest, type of information to extract

4. T-REX architecture
[Architecture diagram; recoverable label: Crawling and parsing]

5. Multilingual Annotation Interface
[Screenshot of the annotation interface; labeled areas:]
  » Sentence being annotated
  » Parse tree edit panel
  » List of triples that can be extracted from the sentence
  » Constraint selection panel

6. Annotation Process: Motivation
• The same fact can be reported in many slightly different ways
  » At least 73 civilians were killed February 1 in simultaneous suicide bombings at a Hilla market
  » More than 73 civilians were massacred in February in suicide attacks at a Hilla marketplace
  » 74 people were killed on February 1, 2007 in multiple bombings at a Hilla market
• Other similar events may be reported through similar sentences, describing the same set of attributes
  » About 23 U.S. soldiers were killed in August 2005 in a suicide attack in Baghdad
• Sentences describing the same type of fact in slightly different ways can be grouped into a single class
  » Learning an “extraction rule” for each class of interest to a given application makes it possible to extract the desired information from any article

7. Annotation Process: Step 1
The annotator is presented with one or more parse trees for the sample sentence:
“At least 73 civilians were killed February 1 in simultaneous suicide bombings at a Hilla market”

8. Annotation Process: Step 2
The annotator marks as “variable” all the nodes that may have different text in other sentences of the same class

9. Annotation Process: Step 3
If needed, the annotator adds constraints to variable nodes

10. Annotation Process: Constraints
• IS_ENTITY
  » restricts a noun phrase to be a “named entity”
• IS_DATE
  » restricts a noun phrase to be a temporal expression
• X_VERBS
  » restricts a verb to be any member of a class X of verbs
    › e.g. the constraint MURDER_VERBS requires a verb to be any of the following: kill, assassinate, murder, execute, etc.
• X_NOUNS
  » restricts a noun to be any member of a class X of nouns
    › e.g. the constraint ATTACK_NOUNS requires a noun to be any of the following: assault, attack, clash, etc.
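
The constraints on this slide can be read as predicates over the text of a variable node. The Python sketch below illustrates that reading only; it is not the T-REX implementation. The word lists contain just the members named on the slide, the date pattern and the suffix stripping are crude assumptions, and IS_ENTITY is left out because it would need a named-entity recognizer. All function names are hypothetical.

```python
import re

# Only the members named on the slide; the real classes are larger ("etc.").
MURDER_VERBS = {"kill", "assassinate", "murder", "execute"}
ATTACK_NOUNS = {"assault", "attack", "clash"}

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Assumed date shape: month name, optional day, optional year.
DATE_RE = re.compile(
    rf"(?:{MONTHS})(?:\s+\d{{1,2}})?(?:,?\s+\d{{4}})?", re.IGNORECASE
)


def is_date(text):
    """IS_DATE: accept a few simple English date expressions."""
    return DATE_RE.fullmatch(text.strip()) is not None


def in_word_class(word, word_class):
    """X_VERBS / X_NOUNS: crude membership test that also tries the word
    with one or two trailing characters removed (killed -> kill,
    executed -> execute). A real system would use a lemmatizer."""
    w = word.lower()
    return bool({w, w[:-1], w[:-2]} & word_class)


# Constraint name -> predicate over the text of a variable node.
CONSTRAINTS = {
    "IS_DATE": is_date,
    "MURDER_VERBS": lambda text: in_word_class(text, MURDER_VERBS),
    "ATTACK_NOUNS": lambda text: in_word_class(text, ATTACK_NOUNS),
}


def satisfies(constraint_name, text):
    """Check whether a node's text satisfies the named constraint."""
    return CONSTRAINTS[constraint_name](text)
```

For example, `satisfies("MURDER_VERBS", "killed")` and `satisfies("IS_DATE", "February 1")` hold, while `satisfies("IS_DATE", "Baghdad")` does not.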

11. Annotation Process: Step 4
The annotator describes the semantics of the annotated sentence in terms of triples, mapping attributes to variable nodes

12. Annotations in Multiple Languages
[Annotation screenshots for each supported language:]
  » English
  » Simplified Chinese (中文)
  » Spanish (Español)

13. Rule Extraction Engine
• An extraction rule is of type Head ← Body
• A rule is learned through the following steps
  » abstraction: each variable node is assigned a numeric identifier, and its text and child nodes are removed
    › the model becomes independent of the particular sentence
  » body definition: the body of the rule is built by serializing the parse tree of the annotated sentence in Treebank II style
  » head definition: the head is defined as a conjunction of RDF statements, one for each triple defined in the last step of the annotation process
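
The abstraction and body-definition steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the system's code: the `Node` class, the `Var#n` placeholder spelling, and the exact bracketed output are hypothetical, and the example tree is a simplified fragment rather than a full parse.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                                   # constituent or POS tag
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None                   # word, set on leaves
    variable: bool = False                       # marked by the annotator
    var_id: Optional[int] = None                 # assigned by abstract()


def abstract(node, counter=None):
    """Abstraction: give each variable node a numeric identifier and
    drop its text and child nodes, leaving the rest of the tree intact."""
    if counter is None:
        counter = [0]
    if node.variable:
        counter[0] += 1
        return Node(node.label, variable=True, var_id=counter[0])
    return Node(node.label,
                [abstract(child, counter) for child in node.children],
                node.text)


def serialize(node):
    """Body definition: bracketed serialization in the style of Treebank II."""
    if node.var_id is not None:
        return f"({node.label} Var#{node.var_id})"
    if not node.children:
        return f"({node.label} {node.text})"
    inner = " ".join(serialize(child) for child in node.children)
    return f"({node.label} {inner})"


# Simplified fragment of "73 civilians were killed", with two nodes
# marked as variable by a hypothetical annotator.
tree = Node("S", [
    Node("NP", [Node("CD", text="73"), Node("NNS", text="civilians")],
         variable=True),
    Node("VP", [
        Node("VBD", text="were"),
        Node("VP", [Node("VBN", text="killed", variable=True)]),
    ]),
])

body = serialize(abstract(tree))
print(body)  # (S (NP Var#1) (VP (VBD were) (VP (VBN Var#2))))
```

After abstraction the body no longer mentions "73 civilians" or "killed", which is what makes the rule independent of the particular sentence.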

14. Rule Matching Engine (1/2)
• Extracts RDF triples by matching sentences from the texts being analyzed against the set of extraction rules
  » Continuously fetches documents relevant to the application of interest
  » If the parse tree of a sentence satisfies the condition in the body of a rule, an RDF triple is instantiated for each statement in the head of the rule
  » CompareNodes() determines whether the parse tree of a sentence satisfies the condition in the body of a rule

15. Rule Matching Engine (2/2)
• CompareNodes() recursively explores the parse tree of the sentence being processed and the annotated parse tree of a rule
  » Checks satisfaction of constraints for variable nodes
  » Checks constant nodes
  » Pairwise compares child nodes of non-terminal nodes
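
The three checks listed for CompareNodes() can be sketched as a recursive function. The Python below is one plausible reading of the slide, not the actual implementation: the `Node` class, the requirement that labels and child counts agree, and the way bindings are collected are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None                       # word, set on leaves
    var_id: Optional[int] = None                     # set on rule variables
    constraint: Optional[Callable[[str], bool]] = None


def yield_text(node):
    """Text covered by a subtree, words joined by single spaces."""
    if not node.children:
        return node.text or ""
    return " ".join(yield_text(child) for child in node.children)


def compare_nodes(sent, rule, bindings):
    """Return True if the sentence subtree satisfies the rule subtree.
    On success, bindings maps variable ids to matched text; on failure
    it may be partially filled and should be discarded by the caller."""
    if sent.label != rule.label:
        return False
    if rule.var_id is not None:
        # Variable node: check its constraint, then bind the covered text.
        text = yield_text(sent)
        if rule.constraint is not None and not rule.constraint(text):
            return False
        bindings[rule.var_id] = text
        return True
    if not rule.children:
        # Constant leaf: the word itself must match.
        return not sent.children and sent.text == rule.text
    # Non-terminal: compare child nodes pairwise.
    if len(sent.children) != len(rule.children):
        return False
    return all(compare_nodes(s, r, bindings)
               for s, r in zip(sent.children, rule.children))


# Hypothetical rule body and sentence fragment.
rule = Node("S", [
    Node("NP", var_id=1),
    Node("VP", [
        Node("VBD", text="were"),
        Node("VBN", var_id=2,
             constraint=lambda t: t in {"killed", "murdered"}),
    ]),
])
sentence = Node("S", [
    Node("NP", [Node("CD", text="23"), Node("NNS", text="soldiers")]),
    Node("VP", [Node("VBD", text="were"), Node("VBN", text="killed")]),
])

bindings: Dict[int, str] = {}
matched = compare_nodes(sentence, rule, bindings)
print(matched, bindings)  # True {1: '23 soldiers', 2: 'killed'}
```

Requiring equal child counts at every non-terminal is the strictest interpretation of "pairwise compares"; a real matcher might tolerate optional modifiers.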

16. Example of Matching
The sentence satisfies the body of the rule, e.g. “About 23 U.S. soldiers were killed August 23 in a suicide attack in Baghdad”
  » Var#1 = “About 23”
  » Var#2 = “U.S. soldiers”
  » Var#3 = “were”
  » Var#4 = “killed”
  » Var#5 = “August 23”
  » Var#6 = “a suicide attack”
  » Var#7 = “Baghdad”
Resulting triples:
  » (KillingEvent9, victim, U.S. soldiers)
  » (KillingEvent9, numberOfVictims, about 23)
  » (KillingEvent9, date, August 23)
  » (KillingEvent9, location, Baghdad)
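
Once a sentence matches, each statement in the rule head is instantiated with the bound variables. The Python sketch below reproduces the four triples from this slide; the head representation (property name paired with a variable number), the `instantiate` helper, and passing the event identifier in by hand are assumptions made for illustration.

```python
# Variable bindings as listed on the slide.
bindings = {
    1: "About 23",
    2: "U.S. soldiers",
    3: "were",
    4: "killed",
    5: "August 23",
    6: "a suicide attack",
    7: "Baghdad",
}

# Hypothetical head: one (property, variable id) pair per RDF statement.
HEAD = [
    ("victim", 2),
    ("numberOfVictims", 1),
    ("date", 5),
    ("location", 7),
]


def instantiate(head, bindings, event_id):
    """Build one (subject, property, value) triple per head statement."""
    return [(event_id, prop, bindings[var]) for prop, var in head]


# How the system generates identifiers such as KillingEvent9 is not
# described on the slide, so the id is supplied directly here.
triples = instantiate(HEAD, bindings, "KillingEvent9")
for triple in triples:
    print(triple)
```

Variables 3, 4 and 6 are bound during matching but do not appear in the head, so they contribute no triples.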

17. Example of extracted data (1/2)
“At least 22 Hindus were killed by suspected Muslim militants in India's Jammu and Kashmir state Monday, the police said”
[Screenshot: Event data]

18. Example of extracted data (2/2)
[Figure: Link depth 2 from Pushtuns]

19. T-REX implementation
• The implementation of T-REX consists of several components running on different nodes of a distributed system
  » Multilingual Annotation Interface: web-based tool that is part of the web interface of T-REX (implemented as a Java applet)
  » Annotated RDF Database System for storage of annotated RDF triples: the underlying relational DBMS is PostgreSQL 8.2
  » Rule Matching Engine: a pipeline of several components
    › Crawler: explores news sources for relevant documents
    › Parsers for every language: process sentences from relevant documents, producing constituent trees in Treebank II style
    › Extractor: implements the Rule Matching Engine logic
• Distribution, database partitioning, and multithreading ensure scalability

20. Conclusions
• We have presented a general, multilingual and flexible framework for information extraction
  » Domain-specific applications are enabled by targeting the extraction to the instantiation of a schema of interest
  » Adding other languages is a relatively simple task, once a set of linguistic resources is available for those languages
• We have implemented a complex prototype that has proved to
  » effectively extract information for different applications
  » scale massively
• Future efforts will be devoted to
  » defining pruning strategies to make the extraction process faster
  » defining strategies to manage inconsistencies in the extracted data
  » extending the system to other languages (mainly Asian languages)