Published by Elvin Hubbard. Modified over 8 years ago.
1
School of Languages, Cultures and Societies – Faculty of Arts
School of Computing – Faculty of Engineering
Multilingual Information Extraction framework for real-time detection of terrorist propaganda threats in on-line communication
Bogdan Babych, Centre for Translation Studies, b.babych@leeds.ac.uk
Eric Atwell, Artificial Intelligence Research Group, e.s.atwell@leeds.ac.uk
XI International Conference “Military education and science: the present and the future”, Military Institute of Taras Shevchenko National University, Kyiv, Ukraine, 27 November 2015
2
Overview
- NLP for detection of direct terrorist threats is not enough
- Propaganda threats: radicalization, recruitment, justification
- State propaganda as an extension of ‘soft power’ used as a military instrument
- EU Horizon2020 proposal: automated real-time multilingual detection of security & terrorist propaganda threats
- Technologies: Machine Translation (MT) + Information Extraction (IE)
- Innovative challenges: IE template filling task for propaganda messages
- Exploitation: community intelligence and response development
- Future work: technological outlook & invitation for collaboration
3
Natural Language Processing (NLP) for direct threat detection is not enough
- Traditionally: NLP techniques for identification of direct terrorist threats
- Focus on illegal activities (planned attacks)
- Discovering actionable information
  - preventing an attack
  - uncovering a network
- Alerts for analysts about suspect communication
- Database of connected facts
- Intelligent decision-support systems
- US DARPA DEFT project: https://gigaom.com/2014/05/02/darpa-is-working-on-its-own-deep-learning-project-for-natural-language-processing/
- UK IDEAS Factory - Detecting Terrorist Activities: Making Sense (included Leeds team), EPSRC/ESRC/CPNI: http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/H023135/1
4
Natural Language Processing (NLP) for direct threat detection is not enough
- Problem: propaganda is not captured by traditional direct threat detection
- Terrorist propaganda and fundamentalist radicalization are not strictly illegal
- Increasingly used by terrorist groups & state sponsors of terrorism for:
  - Radicalization
  - Recruiting fighters
  - Creation of local cells, a ‘5th column’
  - Ideological justification of causes for terrorism, manipulation of public opinion
- Crowdsourcing political influence: ‘soft power’ turned into a ‘hard’ military instrument
- State propaganda targets international public opinion and political decisions
- Has direct military consequences
5
Computational Linguistics in propaganda wars: tasks of creating and countering propaganda
- In Russia, at least since 2004: evidence of funding for research on linguistic means of manipulating public opinion
- Models based on Mel’čuk’s ‘Meaning-Text Theory’
6
Computational Linguistics in propaganda wars: tasks of creating and countering propaganda
Technologies rely on a combination of:
- Machine Translation (MT): Statistical + Rule-Based = Hybrid
  - Linguistic features for Part-of-Speech Tagging + Lemmatization
  - Parsing (string-to-tree MT)
- Information Extraction (IE) from MT-translated texts (en)
  - Named Entity recognition (Person, Organization, Location… names)
  - Scenario template filling (Detection of Events, Relations, Participants)
- Text similarity detection: e.g., lexical overlap (L) + structure (S) + keywords (K) + named entities (N) (Su and Babych, 2012)
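As a rough illustration of combining several similarity features into one score, the sketch below averages set-overlap scores for three of the feature types named above (lexical overlap, keywords, named entities). It is not the method of Su and Babych (2012): the Jaccard measure, the equal weights, the function names and the sample data are all assumptions made for illustration, and the structural feature (S) is left out because it would need a parser.

```python
from typing import Dict, Iterable, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard overlap of two collections of strings (0.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def combined_similarity(doc_a: Dict[str, Set[str]],
                        doc_b: Dict[str, Set[str]],
                        weights: Dict[str, float]) -> float:
    """Weighted average of per-feature Jaccard scores."""
    total = sum(weights.values())
    score = sum(w * jaccard(doc_a.get(name, set()), doc_b.get(name, set()))
                for name, w in weights.items())
    return score / total if total else 0.0


# Hypothetical feature sets for two short messages (invented data).
doc_a = {"lexical": {"rebels", "seized", "the", "airport"},
         "keywords": {"rebels", "airport"},
         "entities": {"Northport"}}
doc_b = {"lexical": {"militants", "seized", "the", "airport"},
         "keywords": {"militants", "airport"},
         "entities": {"Northport"}}

# Equal weights are an assumption; in practice they would be tuned.
weights = {"lexical": 1.0, "keywords": 1.0, "entities": 1.0}
similarity = combined_similarity(doc_a, doc_b, weights)
print(f"combined similarity: {similarity:.3f}")
```

Keeping each feature type as a separate set makes it easy to add or re-weight a component without touching the scoring function.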
7
Technologies for Text and Speech processing (propaganda sites)
Statistical / Hybrid MT
- Open-source ‘Moses’ decoder: http://www.statmt.org
- Euronews site dump ~ 2009-2015: http://www.euronews.com (ar, de, en, fr, gr, hu, it, pe, pt, ru, tr, uk)
- Plain text extraction & tokenization; Hunalign sentence alignment: http://mokk.bme.hu/en/resources/hunalign/
- Part-of-speech tagging (for factored models: lemma/PoS/word): TnT http://www.coli.uni-saarland.de/~thorsten/tnt/ + parameter files
- Leeds MT system (file translation): ar-en, fr-en, es-en, de-en, ru-en, uk-en: http://corpus.leeds.ac.uk/lingenio/indexfile.html
[Diagram: statistical MT architecture. Labels: parallel texts (translations) → training → phrase table (translation model); target texts → training → target language model; statistical decoder: ST → TT; linguistic features & analysis.]
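For factored models, Moses reads training text in which each token carries its factors separated by a vertical bar. The sketch below shows one way to write tagger output in that form. The factor order (word, lemma, PoS), the helper names and the sample tags are assumptions for illustration; the slides do not specify them, and the escaping of `|` shown here is a simplified stand-in for whatever preprocessing the real pipeline uses.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word, lemma, PoS tag); the order is an assumption


def escape_factor(value: str) -> str:
    """Replace the factor separator so it cannot occur inside a factor."""
    return value.replace("|", "&#124;")


def to_factored_line(tokens: List[Token]) -> str:
    """Join factors with '|' and tokens with single spaces."""
    return " ".join("|".join(escape_factor(f) for f in token) for token in tokens)


# Hypothetical tagger output for one sentence (invented data).
tagged = [("Rebels", "rebel", "NNS"),
          ("seized", "seize", "VBD"),
          ("the", "the", "DT"),
          ("airport", "airport", "NN")]

line = to_factored_line(tagged)
print(line)
```

Each output line corresponds to one sentence, so sentence alignment between the source and target files is preserved.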
8
Technologies for Text and Speech processing (propaganda sites)
Information Extraction (IE)
- Identification of relevant information, NOT full text understanding
- Scenario template filling task = structured database of events from text
- Processing steps: PoS tagging + chunking + Named Entity recognition + co-reference resolution
- GATE ANNIE: NER + co-reference; scenario template filling; ontology
- System used: GATE (University of Sheffield) http://www.gate.ac.uk/
- Traditionally: for direct threat detection
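To make the idea of scenario template filling concrete, here is a toy sketch that turns matching sentences into structured records. It does not use GATE or its API: the single regular expression, the `AttackTemplate` slots and the sample text are all hypothetical, and a real system would rely on the PoS tagging, chunking, named entity and co-reference steps listed above rather than on surface patterns.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttackTemplate:
    """Hypothetical scenario template with four slots."""
    perpetrator: str
    action: str
    target: str
    location: Optional[str] = None


# One hand-written surface pattern: "<Perpetrator> <verb> <target> [in <Location>]."
PATTERN = re.compile(
    r"(?P<perpetrator>[A-Z][\w ]*?) (?P<action>attacked|seized|shelled) "
    r"(?P<target>(?:the |a )?[\w ]+?)(?: in (?P<location>[A-Z]\w+))?[.]"
)


def fill_templates(text: str) -> List[AttackTemplate]:
    """Return one filled template per sentence that matches the pattern."""
    return [AttackTemplate(**m.groupdict()) for m in PATTERN.finditer(text)]


# Invented example text.
text = "Armed men seized the airport in Northport. Militants shelled a village."
templates = fill_templates(text)
for template in templates:
    print(template)
```

The result is a list of records rather than an interpretation of the whole text, which is the sense in which IE identifies relevant information without full text understanding.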
9
Challenge: IE templates for detecting state- and terrorist propaganda messages
Scenario template filling
- Templates for identification of factual inconsistencies in texts
- Alerts about propaganda threats
- Tracking source (multilingual)
- Resources (facts) for real-time development of a response
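One simple reading of "identification of factual inconsistencies" is to compare filled templates that describe the same event in different sources and report slots whose values disagree. The sketch below assumes that the templates have already been filled and matched to the same event; the slot names, the case-insensitive string comparison and the sample data are assumptions, not the project's design.

```python
from typing import Dict, List, Optional, Tuple

Template = Dict[str, Optional[str]]  # slot name -> value (None if unfilled)


def find_inconsistencies(a: Template, b: Template) -> List[Tuple[str, str, str]]:
    """Return (slot, value_a, value_b) for slots filled in both templates
    whose values differ (ignoring case)."""
    conflicts = []
    for slot in sorted(a.keys() & b.keys()):
        value_a, value_b = a[slot], b[slot]
        if value_a is None or value_b is None:
            continue  # an unfilled slot is a gap, not a contradiction
        if value_a.casefold() != value_b.casefold():
            conflicts.append((slot, value_a, value_b))
    return conflicts


# Two hypothetical reports of the same event (invented data).
source_a = {"event": "shelling", "location": "Northport",
            "perpetrator": "Group A", "date": "2015-11-01"}
source_b = {"event": "shelling", "location": "northport",
            "perpetrator": "Group B", "date": None}

conflicts = find_inconsistencies(source_a, source_b)
for slot, value_a, value_b in conflicts:
    print(f"ALERT: slot '{slot}' differs: {value_a!r} vs {value_b!r}")
```

Exact string comparison is deliberately crude; name variants and translation differences across languages would need normalization before a comparison like this is meaningful.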
10
Challenge: IE templates for detecting state- and terrorist propaganda messages
Scenario template filling (ru-en MT)
- Templates for identification of factual inconsistencies in texts
- Alerts about propaganda threats
- Tracking source (multilingual)
- Resources (facts) for real-time development of a response
11
Challenge: IE templates for detecting state- and terrorist propaganda messages
- More complex templates: attitude frameworks
- A consistent response needs an alternative framework
- How to identify resources for a response: European values system {?}
14
Challenge: Information Extraction from MT output
- Sensitivity to MT quality for Organization Names and scenario template filling
- Precision OK; recall goes down
- Solution: adapting MT to IE?
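Precision and recall for extracted organization names can be computed by comparing the extracted set with a gold-standard set. The sketch below shows the calculation; the function name and the sample sets are invented for illustration and are not results from the project.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision = correct / extracted; recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented data: organization names annotated in a reference translation
# versus names that an IE system finds in MT output of the same text.
gold = {"United Nations", "Red Cross", "World Bank", "OSCE"}
from_mt_output = {"United Nations", "OSCE"}

precision, recall = precision_recall(from_mt_output, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case every extracted name is correct but half of the gold names are missed, which is the pattern of high precision with reduced recall.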
15
Future work
- Invitation for collaboration (http://corpus.leeds.ac.uk/defence/)
- Community response to propaganda threats
  - Beyond security analysts: anti-terrorist volunteers and crowd intelligence
- Automatic creation of IE propaganda templates
  - Template similarity and event similarity detection; argumentative texts
- Learning defense and security ontologies from corpora
  - Automated reasoning using ontologies (predicate & description logic)
- Modeling language distortion for real-world communication
  - Dialectal, graphical variation, misspelling, abbreviations
- MT and IE for non-literal language usage
  - Metaphors, euphemisms, indirect references