
Analysing Crime-Scene Reports Katerina Pastra and Horacio Saggion University of Sheffield Scene of Crime Information System.


1 Analysing Crime-Scene Reports Katerina Pastra and Horacio Saggion University of Sheffield Scene of Crime Information System

2 Outline > Project Overview > SOCIS Architecture > Corpus > Linguistic Analysis > Pointers

3 Project Overview > Domain: Scene of Crime Investigation (SOC) > Main Features: 1. Multimedia briefing: summarisation of text and images 2. Generation: of formal reports and of photo albums 3. Intelligent Search > 2000 - 2003

4 Project Overview (2) > Other systems for Crime Investigation: Academic R&D Projects; Governmental agencies’ Systems; Commercial Systems. BUT: SOCIS brings ‘intelligence’ to CI systems > The ‘Digital Evidence in Court’ issue: Authenticity has to be verified; Recently accepted in court

5 A view of SOCIS: Image processing + Text processing, Integrated Knowledge Base

6 Text Processing - Text Corpus - Information Extraction system >> Named Entity Recognition >> Co-reference Resolution Need: Linguistic Analysis of the Language at the SOC: Lexical Information, Morphosyntactic Information, Semantic Information

7 The Corpus > 4 days spent with a SOCO: * 12 scenes visited * 2 complete case files examined * official documentation collected > Official documentation: SOC Reports = 77, Photo Indexes = 300, Witness Statements = 14 > Reported SOC Information: Press Association = 792, Washington Post = 233, Crime Watch = 8 > NEEDED: Reports, Photo indexes, Witness statements, Photographs ! For the same case ! For major crime ! Of significant quantity

8 Examples

9 SOC Language Characteristics > General Characteristics: ! Telegraphic ! Descriptive ! Accurate ! Objective > Special text type: Reports

10 Lexical Information > Characteristics: - Extensive use of abbreviations - Jargon > Creation of Word-Lists (gazetteers): - Based on PITO’s CDM - Over 200 lists (domain + general) > Words of interest are assigned a semantic category
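
The word-list idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the SOCIS implementation: the categories, the entries, and the helper names `build_index` and `annotate` are all hypothetical, and the real lists (over 200, based on PITO's CDM) are not reproduced here.

```python
import re

# Hypothetical gazetteers: category names and entries are illustrative only,
# not taken from the SOCIS word lists.
GAZETTEERS = {
    "weapon": ["knife", "hammer", "screwdriver"],
    "location": ["kitchen", "rear door", "bedroom"],
    "evidence": ["footwear mark", "fingerprint", "blood"],
}


def build_index(gazetteers):
    """Map each lower-cased list entry to its semantic category."""
    index = {}
    for category, entries in gazetteers.items():
        for entry in entries:
            index[entry.lower()] = category
    return index


def annotate(text, index):
    """Return (surface form, category, start offset) for each list entry found."""
    # Longest entries first, so multi-word entries are tried before shorter ones.
    entries = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(e) for e in entries) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0), index[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    index = build_index(GAZETTEERS)
    for hit in annotate("Footwear mark found near rear door in kitchen", index):
        print(hit)
```

A plain dictionary lookup is used here only because it is the simplest way to show a word of interest being assigned a semantic category.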

11 Morphosyntactic Features ! Extensive Ellipsis ! Simple temporal dimensions ! Limited co-ordination ! Sub-ordination avoided ! POS: NPs, PPs ! Adjuncts of place and time, Qualifiers > For identifying entities of interest automatically, we need to write specific rules using: ! The word lists + Context Information
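
A rule that combines a word list with context information might look like the sketch below. It is an assumption-laden example rather than the rule formalism used in SOCIS: the location list, the preposition list, and the function names `compile_place_rule` and `find_place_adjuncts` are hypothetical. The rule accepts a location word only when a place preposition precedes it, which is one simple form of context.

```python
import re

# Hypothetical word list and context cues, for illustration only.
LOCATION_WORDS = ["kitchen", "rear door", "bedroom window"]
PLACE_PREPOSITIONS = ["in", "on", "at", "near", "by"]


def compile_place_rule(prepositions, locations):
    """Build a pattern: place preposition, optional 'the', then a listed location."""
    preps = "|".join(re.escape(p) for p in prepositions)
    # Longest location entries first so multi-word entries are tried first.
    locs = "|".join(
        re.escape(loc) for loc in sorted(locations, key=len, reverse=True)
    )
    return re.compile(
        rf"\b(?P<prep>{preps})\s+(?:the\s+)?(?P<loc>{locs})\b",
        re.IGNORECASE,
    )


def find_place_adjuncts(text, rule):
    """Return (preposition, location) pairs matched by the rule."""
    return [
        (m.group("prep").lower(), m.group("loc").lower())
        for m in rule.finditer(text)
    ]


if __name__ == "__main__":
    rule = compile_place_rule(PLACE_PREPOSITIONS, LOCATION_WORDS)
    report = "Entry gained by rear door. Blood found in the kitchen. Kitchen untidy."
    print(find_place_adjuncts(report, rule))
```

In the sample report the bare mention "Kitchen untidy" is not returned, because no place preposition precedes it; the word list alone would have matched it, the rule with context does not.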

12 Modelling (1)

13 Modelling (2)

14 Pointers > SOCIS Sheffield Web Page http://www.dcs.shef.ac.uk/nlp/socis > SOCIS Surrey Web Page http://www.computing.surrey.ac.uk/ai/socis > NLP Group http://www.dcs.shef.ac.uk/nlp

