Multilingual Information Extraction framework for real-time detection of terrorist propaganda threats in on-line communication

Presentation transcript:

Multilingual Information Extraction framework for real-time detection of terrorist propaganda threats in on-line communication
Bogdan Babych, Centre for Translation Studies, School of Languages, Cultures and Societies – Faculty of Arts
Eric Atwell, Artificial Intelligence Research Group, School of Computing – Faculty of Engineering
XI International Conference “Military education and science: the present and the future”, Military Institute of Taras Shevchenko National University, Kyiv, Ukraine, 27 November 2015

Overview
- NLP for detection of direct terrorist threats is not enough
- Propaganda threats: radicalization, recruitment, justification
- State propaganda as an extension of ‘soft power’ used as a military instrument
- EU Horizon2020 proposal: automated real-time multilingual detection of security & terrorist propaganda threats
- Technologies: Machine Translation (MT) + Information Extraction (IE)
- Innovative challenges: IE template filling task for propaganda messages
- Exploitation: community intelligence and response development
- Future work: technological outlook & invitation for collaboration

Natural Language Processing (NLP) for direct threat detection is not enough
- Traditionally, NLP techniques are used for identification of direct terrorist threats
- Focus on illegal activities (planned attacks)
- Discovering actionable information: preventing an attack, uncovering a network
- Alerts for analysts about suspect communication
- Database of connected facts
- Intelligent decision-support systems
- US DARPA DEFT project: own-deep-learning-project-for-natural-language-processing/
- UK IDEAS Factory - Detecting Terrorist Activities: Making Sense (included Leeds team), EPSRC/ESRC/CPNI P/H023135/1

Natural Language Processing (NLP) for direct threat detection is not enough
- Problem: propaganda is not captured by traditional direct threat detection
- Terrorist propaganda and fundamentalist radicalization are not strictly illegal
- Increasingly used by terrorist groups and state sponsors of terrorism for:
  - [Radicalization] [Recruiting fighters]
  - Creation of local cells, ‘5th column’
  - Ideological justification of causes for terrorism, manipulation of public opinion
- Crowdsourcing political influence: ‘soft power’ turned into a ‘hard’ military instrument
- State propaganda targets international public opinion and political decisions
- Has direct military consequences

Computational Linguistics in propaganda wars: tasks of creating and countering propaganda
- In Russia, at least since 2004: evidence of funding research on linguistic means for manipulating public opinion
- Models based on Melchuk’s ‘Meaning-Text Theory’

Computational Linguistics in propaganda wars: tasks of creating and countering propaganda
Technologies rely on a combination of:
- Machine Translation (MT): Statistical + Rule-Based = Hybrid
  - Linguistic features for Part-of-Speech Tagging + Lemmatization
  - Parsing (string-to-tree MT)
- Information Extraction (IE) from MT-translated texts (en)
  - Named Entity recognition (Person, Organization, Location… names)
  - Scenario template filling (detection of Events, Relations, Participants)
- Text similarity detection: e.g., lexical overlap (L) + structure (S) + keywords (K) + named entities (N) (Su and Babych, 2012)
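The text similarity item on this slide combines four signals (L, S, K, N). A minimal sketch of one way to combine them is given here, assuming each signal is a set of features compared by Jaccard overlap and the four scores are merged with equal weights. The weights, the set-based features and the function names are illustrative assumptions, not the formula of Su and Babych (2012).

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical equal weights for the four components named on the slide:
# L = lexical overlap, S = structure, K = keywords, N = named entities.
WEIGHTS: Dict[str, float] = {"L": 0.25, "S": 0.25, "K": 0.25, "N": 0.25}


def combined_similarity(doc_a: Dict[str, Set[str]],
                        doc_b: Dict[str, Set[str]]) -> float:
    """Weighted sum of per-component Jaccard scores."""
    return sum(w * jaccard(doc_a[k], doc_b[k]) for k, w in WEIGHTS.items())


# Toy feature sets (invented for illustration).
doc_a = {
    "L": {"troops", "crossed", "the", "border"},
    "S": {"NOUN VERB", "VERB DET", "DET NOUN"},  # PoS bigrams as a stand-in for structure
    "K": {"troops", "border"},
    "N": {"Kyiv"},
}
doc_b = {
    "L": {"troops", "left", "the", "border"},
    "S": {"NOUN VERB", "VERB DET", "DET NOUN"},
    "K": {"troops", "border"},
    "N": {"Kyiv"},
}

score = combined_similarity(doc_a, doc_b)
print(round(score, 3))
```

In practice the weights would be tuned on annotated pairs of similar and dissimilar texts rather than fixed by hand.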

Technologies for Text and Speech processing (propaganda sites)
Statistical / Hybrid MT
- Open-source ‘Moses’ decoder
- Euronews site dump ~ (ar, de, en, fr, gr, hu, it, pe, pt, ru, tr, uk)
- Plain text extraction & tokenization; Hunalign sentence alignment: ign/
- Part-of-speech tagging (for factored models: lemma/PoS/word): TnT saarland.de/~thorsten/tnt/ + parameter files
- Leeds MT system (file translation): ar-en, fr-en, es-en, de-en, ru-en, uk-en: file.html
[Figure: statistical MT pipeline. Labels: Parallel texts (translations), training, Phrase Table (Translation Model), Target texts, Target Language model, Statistical decoder, ST, TT, Linguistic features & analysis]
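The factored-model item above (lemma/PoS/word) can be illustrated with a small sketch that writes one sentence in the pipe-separated factor format that Moses reads for factored training data. The helper name `to_factored`, the factor order and the Penn-style tags are assumptions for illustration. In a real setup the tags and lemmas would come from a tagger such as TnT and a lemmatizer, which are not called here.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (surface word, lemma, PoS tag)


def to_factored(tokens: List[Token], sep: str = "|") -> str:
    """Join the factors of each token with `sep` and tokens with spaces."""
    for token in tokens:
        for factor in token:
            # The separator and the space are structural, so they may not
            # appear inside a factor value.
            if sep in factor or " " in factor:
                raise ValueError(f"illegal character in factor: {factor!r}")
    return " ".join(sep.join(token) for token in tokens)


# Toy annotated sentence (tags and lemmas written by hand for illustration).
sentence = [
    ("threats", "threat", "NNS"),
    ("were", "be", "VBD"),
    ("detected", "detect", "VBN"),
]

line = to_factored(sentence)
print(line)  # threats|threat|NNS were|be|VBD detected|detect|VBN
```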

Technologies for Text and Speech processing (propaganda sites)
Information Extraction (IE)
- Identification of relevant information, NOT full text understanding
- Scenario template filling task = structured database of events from text
- PoS Tagging + chunking + Named Entity recognition + co-reference resolution
- System used: GATE (University of Sheffield)
- Traditionally: for direct threat detection
[Figure labels: GATE ANNIE: NER + Co-reference; Scenario Template Filling; Ontology]
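To make "scenario template filling = structured database of events from text" concrete, here is a toy sketch in plain Python. It does not use GATE or ANNIE. The gazetteers, the single surface pattern and the `AttackTemplate` record are hypothetical stand-ins for the named entity recognition and template rules that a real system would provide.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class AttackTemplate:
    """One filled scenario template: who claimed an event, and where."""
    perpetrator: str
    location: str


# Toy gazetteers standing in for a named entity recognizer (assumed data).
ORGANIZATIONS = ["Group A", "Group B"]
LOCATIONS = ["Kyiv", "Leeds"]


def _alternation(names: List[str]) -> str:
    """Regex alternation over literal names, longest first."""
    ordered = sorted(names, key=len, reverse=True)
    return "|".join(re.escape(name) for name in ordered)


# One hand-written extraction pattern; real systems use many such rules.
PATTERN = re.compile(
    rf"(?P<org>{_alternation(ORGANIZATIONS)}) claimed responsibility "
    rf"for an attack in (?P<loc>{_alternation(LOCATIONS)})"
)


def fill_templates(text: str) -> List[AttackTemplate]:
    """Return one template per pattern match in the text."""
    return [
        AttackTemplate(perpetrator=m.group("org"), location=m.group("loc"))
        for m in PATTERN.finditer(text)
    ]


report = "Yesterday Group A claimed responsibility for an attack in Leeds."
print(fill_templates(report))
```

The output is a list of records rather than an interpretation of the whole text, which is the sense in which IE targets relevant information and not full text understanding.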

Challenge: IE templates for detecting state- and terrorist propaganda messages
Scenario template filling
- Templates for identification of factual inconsistencies in texts
- Alerts about propaganda threats
- Tracking source (multilingual)
- Resources (facts) for real-time development of a response

Challenge: IE templates for detecting state- and terrorist propaganda messages
Scenario template filling: ru-en MT

Challenge: IE templates for detecting state- and terrorist propaganda messages
- More complex templates: attitude frameworks
- Consistent response needs an alternative framework
- How to identify resources for a response: European values system {?}


Challenge: Information Extraction from MT output
- Sensitivity to MT quality for Organization Names, scenario template filling
- Precision OK; Recall goes down
- Solution: adapting MT to IE?
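The precision/recall contrast on this slide can be illustrated with a short sketch that scores organization names extracted from MT output against a gold list. The name sets are invented for illustration and are not results from the talk.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted items against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data: organization names annotated in a reference translation,
# and names an IE system found in MT output of the same text.
gold = {"United Nations", "Red Cross", "World Bank", "European Commission"}
from_mt = {"United Nations", "World Bank"}

precision, recall = precision_recall(from_mt, gold)
print(precision, recall)  # 1.0 0.5
```

In this toy case every name found in the MT output is correct, so precision stays high, while names that the translation distorted are never found, so recall falls.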

Future work
- Invitation for collaboration
- Community response to propaganda threats
  - Beyond security analysts: anti-terrorist volunteers and crowd intelligence
- Automatic creation of IE propaganda templates
  - Template similarity and event similarity detection; argumentative texts
- Learning defense and security ontologies from corpora
  - Automated reasoning using ontologies (predicate & description logic)
- Modeling language distortion for real-world communication
  - Dialectal, graphical variation, misspelling, abbreviations
- MT and IE for non-literal language usage
  - Metaphors, euphemisms, indirect references
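As one small illustration of the misspelling part of language distortion, the sketch below maps noisy tokens onto a known vocabulary with fuzzy string matching from the Python standard library. The vocabulary, the 0.8 cutoff and the `normalize` helper are assumptions for illustration. Dialectal variation and abbreviations would need richer models than this.

```python
from difflib import get_close_matches
from typing import List

# Toy vocabulary of canonical forms (assumed for illustration).
VOCABULARY: List[str] = ["terrorist", "propaganda", "threat"]


def normalize(token: str, cutoff: float = 0.8) -> str:
    """Return the closest vocabulary entry, or the token itself if none is close."""
    matches = get_close_matches(token.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token


noisy = ["terorist", "propoganda", "threat"]
print([normalize(t) for t in noisy])
```

Normalizing before MT and IE is one possible placement in the pipeline. Whether it helps would have to be checked against real on-line text.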