1 Multilingual Information Extraction framework for real-time detection of terrorist propaganda threats in on-line communication
Bogdan Babych, Centre for Translation Studies, b.babych@leeds.ac.uk
Eric Atwell, Artificial Intelligence Research Group, e.s.atwell@leeds.ac.uk
School of Languages, Cultures and Societies – Faculty of Arts; School of Computing – Faculty of Engineering
XI International Conference “Military education and science: the present and the future”, Military Institute of Taras Shevchenko National University, Kyiv, Ukraine, 27 November 2015

2 Overview
- NLP for detection of direct terrorist threats is not enough
- Propaganda threats: radicalization, recruitment, justification
- State propaganda as an extension of ‘soft power’ used as a military instrument
- EU Horizon 2020 proposal: automated real-time multilingual detection of security and terrorist propaganda threats
- Technologies: Machine Translation (MT) + Information Extraction (IE)
- Innovative challenges: IE template filling for propaganda messages
- Exploitation: community intelligence and response development
- Future work: technological outlook and invitation for collaboration

3 Natural Language Processing (NLP) for direct threat detection is not enough
Traditionally, NLP techniques are used to identify direct terrorist threats:
- Focus on illegal activities (planned attacks)
- Discovering actionable information: preventing an attack, uncovering a network
- Alerts for analysts about suspect communication
- Database of connected facts
- Intelligent decision-support systems
Examples:
- US DARPA DEFT project: https://gigaom.com/2014/05/02/darpa-is-working-on-its-own-deep-learning-project-for-natural-language-processing/
- UK IDEAS Factory, “Detecting Terrorist Activities: Making Sense” (included a Leeds team), EPSRC/ESRC/CPNI: http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/H023135/1

4 Natural Language Processing (NLP) for direct threat detection is not enough
Problem: propaganda is not captured by traditional direct threat detection
- Terrorist propaganda and fundamentalist radicalization are not strictly illegal
- They are increasingly used by terrorist groups and state sponsors of terrorism for:
  - Radicalization and recruiting fighters
  - Creation of local cells, a ‘fifth column’
  - Ideological justification of causes for terrorism, manipulation of public opinion
  - Crowdsourcing political influence: ‘soft power’ turned into a ‘hard’ military instrument
- State propaganda targets international public opinion and political decisions
- It has direct military consequences

5 Computational Linguistics in propaganda wars: tasks of creating and countering propaganda
- In Russia, at least since 2004: evidence of funded research on linguistic means of manipulating public opinion
- Models based on Melchuk’s ‘Meaning ⇔ Text Theory’

6 Computational Linguistics in propaganda wars: tasks of creating and countering propaganda
Technologies rely on a combination of:
- Machine Translation (MT): Statistical + Rule-Based = Hybrid
  - Linguistic features for Part-of-Speech tagging + lemmatization
  - Parsing (string-to-tree MT)
- Information Extraction (IE) from MT-translated texts (into English)
  - Named Entity recognition (Person, Organization, Location… names)
  - Scenario template filling (detection of events, relations, participants)
- Text similarity detection: e.g., lexical overlap (L) + structure (S) + keywords (K) + named entities (N) (Su and Babych, 2012)
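
The L + S + K + N similarity idea can be illustrated with a minimal sketch. It assumes Jaccard set overlap for the lexical, keyword and named-entity components, Python’s difflib.SequenceMatcher as a stand-in for structural similarity, and equal weights. The function names, the feature definitions and the weights are all illustrative assumptions, not the method of Su and Babych (2012):

```python
from difflib import SequenceMatcher
from typing import Iterable, Sequence, Set, Tuple


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Set overlap |A & B| / |A | B|; 0.0 when both sets are empty."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined_similarity(tokens1: Sequence[str], tokens2: Sequence[str],
                        keywords: Set[str],
                        entities1: Set[str], entities2: Set[str],
                        weights: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25)
                        ) -> float:
    """Hypothetical weighted sum of four similarity components, each in [0, 1]."""
    t1 = [t.lower() for t in tokens1]
    t2 = [t.lower() for t in tokens2]
    kw = {k.lower() for k in keywords}
    lexical = jaccard(t1, t2)                             # L: shared vocabulary
    structure = SequenceMatcher(None, t1, t2).ratio()     # S: order-sensitive proxy (assumption)
    keyword = jaccard(set(t1) & kw, set(t2) & kw)         # K: shared topic keywords
    named = jaccard(entities1, entities2)                 # N: shared named entities
    w_l, w_s, w_k, w_n = weights
    return w_l * lexical + w_s * structure + w_k * keyword + w_n * named
```

With non-empty keyword and entity sets, identical inputs score 1.0 and fully disjoint inputs score 0.0; in a real system the weights would have to be tuned on annotated data.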

7 Technologies for Text and Speech processing (propaganda sites)
Statistical / Hybrid MT
- Open-source ‘Moses’ decoder: http://www.statmt.org
- Euronews site dump, ~2009-2015: http://www.euronews.com (ar, de, en, fr, gr, hu, it, pe, pt, ru, tr, uk)
- Plain-text extraction and tokenization; Hunalign sentence alignment: http://mokk.bme.hu/en/resources/hunalign/
- Part-of-speech tagging (for factored models: lemma/PoS/word): TnT, http://www.coli.uni-saarland.de/~thorsten/tnt/ + parameter files
- Leeds MT system (file translation): ar-en, fr-en, es-en, de-en, ru-en, uk-en: http://corpus.leeds.ac.uk/lingenio/indexfile.html
[Diagram: parallel texts (translations; source texts ST, target texts TT) → training → phrase table (translation model); target texts → training → target language model; both are used by the statistical decoder, together with linguistic features and analysis]
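
How the phrase table, the target language model and the decoder interact can be illustrated with a toy monotone phrase-based decoder. This is a sketch under stated assumptions: the German-English phrase table and bigram probabilities are invented, there is no reordering, and the translation-model and language-model log-probabilities are added with equal weight. Moses itself uses a tuned log-linear model with many more features and beam-search decoding with reordering:

```python
import math

# Hypothetical toy phrase table (translation model): source phrase -> [(target phrase, P(target | source))]
PHRASE_TABLE = {
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("haus",): [("house", 0.8), ("home", 0.2)],
    ("das", "haus"): [("the house", 0.9)],
    ("ist",): [("is", 1.0)],
    ("klein",): [("small", 0.6), ("little", 0.4)],
}

# Hypothetical toy bigram language model P(word | previous word); "<s>" marks the sentence start
BIGRAMS = {
    ("<s>", "the"): 0.5, ("<s>", "that"): 0.1,
    ("the", "house"): 0.4, ("the", "home"): 0.05, ("that", "house"): 0.1,
    ("house", "is"): 0.5, ("home", "is"): 0.3,
    ("is", "small"): 0.3, ("is", "little"): 0.1,
}
UNSEEN = 1e-4  # crude floor for unseen bigrams (real systems use proper smoothing)


def lm_logprob(prev, words):
    """Log-probability of `words` following `prev`; returns (log-prob, last word)."""
    total = 0.0
    for w in words:
        total += math.log(BIGRAMS.get((prev, w), UNSEEN))
        prev = w
    return total, prev


def decode(source, max_phrase_len=3):
    """Monotone phrase-based decoding by dynamic programming.

    State = (number of source words covered, last target word).
    Hypothesis score = sum of log P_TM + log P_LM over the phrases used.
    """
    n = len(source)
    best = {(0, "<s>"): (0.0, [])}
    for i in range(n):
        # Extend every hypothesis that covers exactly source[:i]
        for (covered, last), (score, output) in list(best.items()):
            if covered != i:
                continue
            for j in range(i + 1, min(n, i + max_phrase_len) + 1):
                for target, p_tm in PHRASE_TABLE.get(tuple(source[i:j]), []):
                    lm, new_last = lm_logprob(last, target.split())
                    new_score = score + math.log(p_tm) + lm
                    key = (j, new_last)
                    if key not in best or new_score > best[key][0]:
                        best[key] = (new_score, output + [target])
    finals = [v for (covered, _), v in best.items() if covered == n]
    if not finals:
        return None  # some source word has no phrase-table entry
    return " ".join(max(finals, key=lambda v: v[0])[1])


print(decode("das haus ist klein".split()))  # the house is small
```

The language model is what lets the decoder prefer the fluent “the house is small” over “that house is small”, even though both are licensed by the phrase table.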

8 Technologies for Text and Speech processing (propaganda sites)
Information Extraction (IE)
- Identification of relevant information, NOT full text understanding
- Scenario template filling task = structured database of events from text
- Pipeline: PoS tagging + chunking + Named Entity recognition + co-reference resolution
- System used: GATE (University of Sheffield), http://www.gate.ac.uk/
- Traditionally: for direct threat detection
[Diagram: GATE ANNIE (NER + co-reference) → scenario template filling, linked to an ontology]
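
Scenario template filling on top of entity annotation can be sketched as follows, assuming a hypothetical gazetteer, hypothetical trigger words and two simple slot rules (agent = nearest preceding Organization, location = first following Location). This is not GATE’s or ANNIE’s API, and it omits PoS tagging, chunking and co-reference resolution:

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical gazetteer: entity string -> entity type
GAZETTEER = {
    "Group X": "Organization",
    "City Y": "Location",
}
# Hypothetical trigger words signalling an event type
TRIGGERS = {
    "attack": "Attack",
    "attacked": "Attack",
    "recruiting": "Recruitment",
}


@dataclass
class EventTemplate:
    event_type: str
    trigger: str
    agent: Optional[str] = None
    location: Optional[str] = None


def find_entities(text: str) -> List[Tuple[int, int, str, str]]:
    """Gazetteer lookup: (start, end, name, type) for every match, in text order."""
    found = []
    for name, etype in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.end(), name, etype))
    return sorted(found)


def fill_templates(text: str) -> List[EventTemplate]:
    """Create one template per trigger word and fill its slots from nearby entities."""
    entities = find_entities(text)
    templates = []
    for m in re.finditer(r"\w+", text):
        event_type = TRIGGERS.get(m.group().lower())
        if event_type is None:
            continue
        template = EventTemplate(event_type, trigger=m.group())
        # Agent slot: the nearest Organization that ends before the trigger
        orgs = [e for e in entities if e[3] == "Organization" and e[1] <= m.start()]
        if orgs:
            template.agent = orgs[-1][2]
        # Location slot: the first Location that starts after the trigger
        locs = [e for e in entities if e[3] == "Location" and e[0] >= m.end()]
        if locs:
            template.location = locs[0][2]
        templates.append(template)
    return templates
```

For example, fill_templates("Group X claimed the attack in City Y.") yields a single Attack template with agent "Group X" and location "City Y"; each filled template would become one row in the structured event database.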

9 Challenge: IE templates for detecting state- and terrorist propaganda messages
Scenario template filling:
- Templates for identification of factual inconsistencies in texts
- Alerts about propaganda threats
- Tracking sources (multilingual)
- Resources (facts) for real-time development of a response

10 Challenge: IE templates for detecting state- and terrorist propaganda messages
Scenario template filling (ru-en MT):
- Templates for identification of factual inconsistencies in texts
- Alerts about propaganda threats
- Tracking sources (multilingual)
- Resources (facts) for real-time development of a response
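
One way to make “templates for identification of factual inconsistencies” concrete is to compare slot values extracted from different sources about the same event. A sketch, assuming templates are plain dictionaries and that two reports describe the same event when their hypothetical key slots (event_type, date) agree; all slot names and values are invented:

```python
from typing import Dict, List, Tuple

Template = Dict[str, str]


def find_inconsistencies(report_a: Template, report_b: Template,
                         key_slots: Tuple[str, ...] = ("event_type", "date")
                         ) -> List[Tuple[str, str, str]]:
    """Return (slot, value_a, value_b) for every slot that both reports fill differently."""
    # Treat the reports as describing the same event only if all key slots agree
    if any(report_a.get(k) != report_b.get(k) for k in key_slots):
        return []
    conflicts = []
    for slot in sorted(set(report_a) & set(report_b)):
        if slot not in key_slots and report_a[slot] != report_b[slot]:
            conflicts.append((slot, report_a[slot], report_b[slot]))
    return conflicts


# Invented example: two sources report the same event with different agents
source_1 = {"event_type": "Attack", "date": "2015-01-01", "location": "City Y", "agent": "Organization A"}
source_2 = {"event_type": "Attack", "date": "2015-01-01", "location": "City Y", "agent": "Organization B"}
print(find_inconsistencies(source_1, source_2))  # [('agent', 'Organization A', 'Organization B')]
```

Each conflict could raise an alert for an analyst, with the (multilingual) source of each report kept alongside the template so that the disputed facts are available when a response is prepared.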

11 Challenge: IE templates for detecting state- and terrorist propaganda messages
- More complex templates: attitude frameworks
- A consistent response needs an alternative framework
- How to identify resources for a response: European values system {?}

14 Challenge: Information Extraction from MT output
- Sensitivity to MT quality for Organization names and scenario template filling
- Precision OK; recall goes down
- Solution: adapting MT to IE?
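
The precision/recall effect can be made concrete by scoring Organization names extracted from MT output against those extracted from a reference (e.g. human) translation. The names and counts below are invented for illustration only:

```python
from typing import Iterable, Tuple


def precision_recall(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of extracted names against a gold-standard set."""
    found, expected = set(extracted), set(gold)
    true_positives = len(found & expected)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Invented example: Organization names found in a reference translation vs. in MT output
reference_orgs = {"Ministry of Defence", "United Nations", "European Union", "NATO"}
mt_output_orgs = {"United Nations", "NATO"}
print(precision_recall(mt_output_orgs, reference_orgs))  # (1.0, 0.5)
```

Here every name found in the MT output is correct (precision 1.0), but half of the names are lost, e.g. when MT renders a name word by word so that the recognizer no longer identifies it, hence recall 0.5.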

15 Future work
- Invitation for collaboration: http://corpus.leeds.ac.uk/defence/
- Community response to propaganda threats
  - Beyond security analysts: anti-terrorist volunteers and crowd intelligence
- Automatic creation of IE propaganda templates
  - Template similarity and event similarity detection; argumentative texts
- Learning defense and security ontologies from corpora
  - Automated reasoning using ontologies (predicate and description logic)
- Modeling language distortion for real-world communication
  - Dialectal and graphical variation, misspelling, abbreviations
- MT and IE for non-literal language usage: metaphors, euphemisms, indirect references

