Multilingual Event Detection in the Europe Media Monitor


Multilingual Event Detection in the Europe Media Monitor. Hristo Tanev, Joint Research Centre, European Commission.

Europe Media Monitor (EMM): The Europe Media Monitor is a multilingual news monitoring platform of the European Commission. EMM processes up to 70 languages. EMM finds entities and trends in the news.

Europe Media Monitor (EMM): EMM is used by analysts of the European Commission, other international organizations, and member states. EMM gathers around 220,000 news articles per day from 5,000 online sources. The sources represent the major newspapers from all countries in the world, with a strong focus on Europe.

Europe Media Monitor (EMM): We have 56 news sources from Switzerland (in comparison, we have 53 from Austria). We also monitor social media (mostly Twitter) for recurrent hashtags, names and shared links.

EMM applications: NewsBrief collects and clusters news in 70 languages on various topics. MediSys collects news about public health-related issues. NewsExplorer provides a long-term view of the news. NewsDesk is a tool for human moderation. EMM Apps is a mobile app interfacing with EMM. Citizen and Science is a project that collects news from the area of science and technology.

Event Detection in EMM: Two information extraction technologies are implemented in EMM: Named Entity Recognition and Event Metadata Extraction. We detect events and extract their metadata from articles in the areas of security, disasters and health.

NEXUS - a Multilingual System for Event Detection: NEXUS has been used at the African Union headquarters and at Frontex, the European agency for border control. It is also part of the Europe Media Monitor.

NEXUS – a Multilingual Event Extraction System: NEXUS is a software system designed to work in multilingual settings. It is rule-based: it exploits local grammar rules to perform syntactic and semantic analysis of the text. It uses a large system of keyword combinations to classify events into categories.
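The keyword-combination classification mentioned on this slide could look roughly like the sketch below. The slides do not show the actual NEXUS rules, so the RULES table, the category keywords and the classify function are all hypothetical; the only idea taken from the slide is that an event category is assigned when a combination of keywords occurs in the text.

```python
import re
from typing import Dict, List, Set

# Hypothetical rule table (not from NEXUS): a category fires when every
# keyword of at least one of its combinations occurs in the text.
RULES: Dict[str, List[Set[str]]] = {
    "Terrorist attack": [{"bomb", "exploded"}, {"suicide", "bomber"}],
    "Kidnapping": [{"kidnapped"}, {"hostage", "taken"}],
    "Protest": [{"protesters", "marched"}, {"demonstration"}],
}


def classify(text: str) -> List[str]:
    """Return every category with at least one fully matched combination."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [
        label
        for label, combinations in RULES.items()
        if any(combination <= tokens for combination in combinations)
    ]
```

For instance, classify("A bomb exploded near the market") returns ["Terrorist attack"]. A multilingual system would presumably keep one such table per language; that is an assumption, not something the slides state.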

[Diagram: taxonomy of event types.] Labels shown: Crisis event; Disaster; Natural disaster; Manmade disaster/accident; Security-related; Humanitarian crisis; Arrest; Trial; Execution; Kidnapping/Hostage taking; Hostage release; Hostage video; Release; Violent crimes; Robbery; Shooting; Physical attack; Medical events; Socio-political; Protest; Riots; Strikes; Military event; Terrorist attack; Armed conflict; Heavy weapons; Air attack.

Extracted Events

EMM Pipeline [diagram]. Components shown: RSS; RSS cache; Entity Recogniser; Event Extractor; real-time news clustering (typically the last 8 hours of RSS per language); cross-lingual clustering; stories; summaries; breaking news, based on cluster growth; Mailer/SMS; continuously updated RSS; RSS+ with <text>, <entity>, <geo>, <quote>, <tonality>, <category> and duplicate=.

NEXUS Pipeline

Tokenization and sentence splitting are done by an in-house system for shallow processing. The sentence splitter uses a list of words, such as Dr. or Prof., that cannot appear at the end of a sentence.
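A minimal sketch of a sentence splitter driven by such a word list is given below. It is not the in-house system; the NON_FINAL list and the split_sentences function are illustrative names, and the splitting rule (break after a token ending in ".", "!" or "?" unless the token is on the list) is a simplifying assumption.

```python
from typing import List

# Illustrative list of tokens that cannot end a sentence (assumed, not EMM's).
NON_FINAL = {"Dr.", "Prof.", "Mr.", "Mrs."}


def split_sentences(text: str) -> List[str]:
    """Split on whitespace tokens ending in . ! or ?, except listed words."""
    sentences: List[str] = []
    current: List[str] = []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token not in NON_FINAL:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences
```

With this rule, "Dr. Smith arrived. Prof. Jones spoke!" is split into two sentences rather than four.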

NEXUS: Using finite state grammars, the system detects information in the text about the killed, the injured, perpetrators, the arrested, displaced people, targets of attacks (vehicles, buildings, people), and weapons.

Two-level finite state grammar
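The transcript does not preserve the grammar itself. One plausible reading of a two-level cascade is sketched below, with regular expressions standing in for finite state grammars: the first level recognizes person-group phrases, and the second level matches event patterns built on top of them. The word lists, slot names and the extract_slots function are assumptions made for illustration only.

```python
import re
from typing import Dict, List

# Level 1 (assumed): person-group phrases such as "five soldiers".
NUMBER = r"(?:\d+|one|two|three|four|five|six|seven|eight|nine|ten)"
PEOPLE = r"(?:people|civilians|soldiers|policemen|protesters)"
PERSON_GROUP = rf"(?P<group>\b{NUMBER}\s+{PEOPLE})"

# Level 2 (assumed): event patterns that reuse the level-1 expression.
PATTERNS = {
    "killed": re.compile(rf"{PERSON_GROUP}\s+(?:were|was)\s+killed", re.I),
    "injured": re.compile(
        rf"{PERSON_GROUP}\s+(?:were|was)\s+(?:injured|wounded)", re.I
    ),
    "arrested": re.compile(rf"police\s+arrested\s+{PERSON_GROUP}", re.I),
}


def extract_slots(text: str) -> Dict[str, List[str]]:
    """Return the person groups found for each event slot."""
    slots: Dict[str, List[str]] = {}
    for slot, pattern in PATTERNS.items():
        found = [match.group("group") for match in pattern.finditer(text)]
        if found:
            slots[slot] = found
    return slots
```

Keeping the entity level separate from the event level means a new person noun only has to be added once to be available in every event pattern.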

Automatic Learning of Lexica: EMM Terminology Discovery is a multilingual, weakly supervised lexical learning tool. The tool uses distributional semantics. The user provides a handful of seed words that belong to a certain semantic category, and the system finds many more terms.

Automatic Learning of Lexica (example): For example, if we want to learn a list of weapons, the input seed set can be: pistol, gun, rifle. The system expands this small seed set into a large set of words: handgun, firearm, knife, shotgun, weapon, revolver, guns, machine gun, pistols, rifles, bullet, …
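Seed expansion of this kind can be illustrated with a toy distributional model. The sketch below is not the EMM tool: it assumes bag-of-words context vectors over a small window, sums the seed vectors into a centroid, and ranks the remaining words by cosine similarity to it. All function names and parameters are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def context_vectors(sentences: Iterable[str], window: int = 2) -> Dict[str, Counter]:
    """Count, for each word, the words seen within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand(seeds: Set[str], sentences: List[str], top_n: int = 3) -> List[str]:
    """Rank non-seed words by similarity to the summed seed contexts."""
    vectors = context_vectors(sentences)
    centroid: Counter = Counter()
    for seed in seeds:
        centroid.update(vectors.get(seed, {}))
    scored = [
        (cosine(centroid, vector), word)
        for word, vector in vectors.items()
        if word not in seeds
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for score, word in scored[:top_n] if score > 0]
```

On a real corpus the ranking would need frequency weighting and filtering of function words, which this sketch leaves out.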

Automatic Learning of Lexica (example)

Semi-Automatic Learning of Grammar Rules for Event Extraction: (1) extract domain-specific terminology; (2) find word clusters, using distributional similarity; (3) find cluster co-occurrences.

Semi-Automatic Learning of Grammar Rules for Event Extraction: Automatically extracted terms co-occurring with killed and injured: rioting, bomb blast, gunfight, bomb, …, gunfire, enemy fire, sniper, gunshots, sniper fire, stray bullets, bullets, …, paramilitary troopers, militiamen, federal agents, soldiers, gendarmes, troops, …

Semi-Automatic Learning of Grammar Rules for Event Extraction: Extracted terms are clustered using distributional similarity. Patterns of cluster co-occurrences are then found in a corpus. Example pattern: [troopers, soldiers, militiamen, ...] were [killed, wounded, dead, …], which matches "troopers were dead".

Semi-Automatic Learning of Grammar Rules for Event Extraction: Further example patterns: [killed, wounded, dead, ...] in [rioting, bomb blast, gunfight, ...], which matches "killed in bomb blast"; and [detonated, exploded, ...] the [landmine, bomb, bombs, …], which matches "detonated the landmine".
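The search for cluster co-occurrence patterns can be sketched as follows. The sketch assumes the clusters are already given (here as a small hand-written CLUSTERS table with made-up labels) and counts how often a word from one cluster is followed, within a short gap, by a word from another cluster. The gap size, the labels and the function names are assumptions, not details from the slides.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical clusters; in the described workflow they would come from
# distributional clustering of the extracted terms.
CLUSTERS: Dict[str, Set[str]] = {
    "PERSON_GROUP": {"troopers", "soldiers", "militiamen"},
    "HARM": {"killed", "wounded", "dead"},
    "INCIDENT": {"rioting", "blast", "gunfight"},
}


def cluster_of(word: str) -> Optional[str]:
    """Return the label of the cluster containing `word`, if any."""
    for label, members in CLUSTERS.items():
        if word in members:
            return label
    return None


def cooccurrence_patterns(
    sentences: List[str], max_gap: int = 2
) -> Counter:
    """Count (left cluster, connecting words, right cluster) triples."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        labels = [cluster_of(token) for token in tokens]
        for i, left in enumerate(labels):
            if left is None:
                continue
            # Look at most `max_gap` connecting words ahead for the next cluster word.
            for j in range(i + 1, min(len(tokens), i + max_gap + 2)):
                if labels[j] is not None:
                    gap = " ".join(tokens[i + 1 : j])
                    counts[(left, gap, labels[j])] += 1
                    break
    return counts
```

Frequent triples such as ("PERSON_GROUP", "were", "HARM") are then candidates for grammar rules that a human can review, which is where the "semi-automatic" part would come in.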

Medical Event Extraction: The medical event extraction system detects disease outbreaks and the people who have died or been infected in them. The system uses grammar rules and an extensive list of infectious diseases.

Medical Event Extraction: One challenge in applying event extraction (EE) to this domain is that multiple stages of an outbreak can be reported. Currently, the system recognizes different temporal expressions in the text and finds the event descriptions anchored to those temporal expressions.

Multiple Event Detection: Taipei, May 22 (CNA) A new imported case of measles has been confirmed in Taipei after a man tested positive for the virus, the Centers for Disease Control (CDC) said Tuesday… So far this year, there have been 25 confirmed cases of measles infection, the highest recorded number in Taiwan in the last seven years, according to the latest CDC data.
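Anchoring case counts to temporal expressions can be illustrated on a shortened version of this example. The sketch below is a simplification, not the EMM grammar: it assumes that a case mention belongs to the first temporal expression in the same sentence, and the TEMPORAL and CASES patterns and the anchor_events function are hypothetical.

```python
import re
from typing import Dict, List, Optional, Union

# Assumed, deliberately small inventory of temporal expressions.
TEMPORAL = re.compile(
    r"\b(so far this year|this week|last week|today|yesterday|"
    r"monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
    re.I,
)
# Assumed pattern for case mentions such as "a new imported case" or "25 confirmed cases".
CASES = re.compile(
    r"\b(?P<count>\d+|a|one)\s+(?:new\s+)?(?:imported\s+)?(?:confirmed\s+)?cases?\b",
    re.I,
)

Event = Dict[str, Union[int, Optional[str]]]


def anchor_events(text: str) -> List[Event]:
    """Pair each case count with the temporal expression of its sentence."""
    events: List[Event] = []
    for sentence in re.split(r"[.!?…]+\s*", text):
        when = TEMPORAL.search(sentence)
        for match in CASES.finditer(sentence):
            raw = match.group("count").lower()
            count = 1 if raw in ("a", "one") else int(raw)
            events.append(
                {"cases": count, "when": when.group(0).lower() if when else None}
            )
    return events
```

Applied to the two sentences of the example, this yields one event with a single case anchored to "tuesday" and a second with 25 cases anchored to "so far this year", keeping the new case separate from the cumulative total.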

http://emm.newsbrief.eu