CrimeLink Explorer: Using Domain Knowledge to Facilitate Automated Crime Link Analysis
Lt. Jennifer Schroeder, Tucson Police Department
Jie Xu, University of Arizona
June 2, 2003
Agenda Review of Problem: Link Analysis In Law Enforcement Problem Literature Review System & Heuristic Design User Study Design Demo User Study Results & Conclusions Q & A
Link Analysis in Law Enforcement
Extremely valuable, but extremely time-consuming for investigators (sometimes months are spent constructing a large network)
Can uncover valuable investigative leads
Usually only conducted in high-profile cases that justify the resource expenditure
Can be very complex (high branching factors, especially among repeat offenders)
Data Sources for Link Analysis
Police incident reports (Police RMS)
– Often the largest source of data for analysis
– Link based on co-occurrence in an incident
– Analysts must examine each report to determine the strength of the link
– Must be searched across multiple jurisdictions
Field interviews
Phone records
Financial information
Intelligence information (sometimes stored in databases)
Interviews with witnesses, suspects, confidential informants
An example
Eddie “Smith” is in 18 incident reports
These incidents contain a total of 152 entities that are potential branches:
– 31 People
– 11 Vehicles
– 57 Locations
– 2 Organizations
– 1 Property item
– 1 Weapon
This complexity is at a depth of one!
Imagine the task for crime analysts to search each of these possible branches to create a large, multi-level link chart
Obstacles to Link Analysis (LA) Automation
Lack of integration/data consolidation
High branching factors cause information overload
Investigators must manually analyze every link to determine relevance
No domain-specific way to automate analysis for relevance of links
Proposed Approach
Use concept space to extract associations from incident records
Focus on domain-specific heuristics to provide accurate link assessment
Use a shortest-path algorithm to find the best path between individuals of interest
Incorporate the approach into a prototype system with visualization of resulting paths
Conduct a user study to evaluate the system
Literature Review
Link Analysis
– Anacapa Charting
– Free Text Association Searches (NLP)
– Watson
– COPLINK Detect
Domain Knowledge Incorporation
– Expert Systems
– Bayesian Networks
Shortest-Path Algorithms
Domain Knowledge Incorporation
Expert Systems
Bayesian Networks
Law Enforcement Specific Research
System Design
Incident Reports
Concept Space
Heuristics (crime types, shared address, shared phone)
Co-occurrence Weights
Heuristic Weights
Association Path Search (shortest-path algorithm)
Graphical User Interface
Experimental Database
The dataset must contain real data so that crime investigators will be engaged and interested in the results
The dataset must contain sufficient amounts of data for association paths between a reasonable number of subjects to exist
Approximately 20 months of incident reports were extracted
Age, gender, race, addresses, and phone numbers of persons involved in the incidents were also extracted
Simple data consolidation on name for the prototype
Heuristic Design Goals
Provide a weighting scheme for links that more accurately reflects the judgment of human analysts
Weights should be understandable to law enforcement users
Improved weights should be used for shortest-path calculations
Heuristic Design
Incorporated the most important information considered by human analysts:
– Relationship between crime type and person roles
– Shared addresses or telephone numbers
– Repeated co-occurrence in incident reports
Employed a scale familiar to users (used in RMS queries)
Logarithmic transformation of link weight used to compute shortest path during searches
Crime Type and Person Role
We constructed a matrix and assigned scores to role combinations in each of the crime types
To construct the crime type/role matrix we interviewed sergeants from Homicide, Aggravated Assault, Robbery, Fraud, Auto Theft, Sexual Assault, Child Sexual Abuse, and Domestic Violence
Crime type/role combinations were assigned weights based on experts' estimates of the likelihood of association for that combination
Person roles used in the TPD dataset include: Victim, Witness, Suspect, Arrestee, and Other
Co-occurrence
Goal was to capture judgments of analysts when looking at repeated co-occurrences of entities
Analyzed a random sample of 40 incident reports, counting the number of times each pair of persons co-occurred
Read supporting narrative reports for each incident to determine whether an association was important
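The pair-counting step described on this slide can be illustrated with a minimal sketch. The function name `count_cooccurrences`, the input layout (one list of person identifiers per incident report), and the sample data are all hypothetical and are not taken from the TPD dataset or the prototype.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(incidents):
    """Count how many incident reports each unordered pair of persons shares."""
    counts = Counter()
    for persons in incidents:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(persons)), 2):
            counts[pair] += 1
    return counts


# Hypothetical incident reports, each listing the persons involved.
incidents = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "A"],
    ["D", "A"],
]
pair_counts = count_cooccurrences(incidents)
print(pair_counts[("A", "B")])  # 3
```

Counts like these would still need the manual review of narrative reports described above before they could be turned into association probabilities.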
Co-occurrence probability distribution
Co-occurrence count    Association probability (%)
4                      100
Heuristic Function
Investigators may rely more on crime type/role and shared associations, but a high co-occurrence weight can outweigh a low association weight
First value calculated from the summed crime-type/person-role relationship, shared address, and shared phone values
Second value based on the association probability of co-occurrence counts
Maximum(
  0.85 × (crime-type/person-role score) + (shared phone score) + (shared address score),
  100 × (association probability based on co-occurrence counts)
)
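As a rough illustration of the heuristic function, the sketch below takes the maximum of the two values. It rests on assumptions not confirmed by the slide: that the 0.85 factor applies only to the crime-type/person-role score, that the three scores are added, and that the association probability is given as a fraction between 0 and 1. The function name `link_weight` and the sample scores are hypothetical.

```python
def link_weight(role_score, phone_score, address_score, assoc_probability):
    """Return the larger of the heuristic value and the co-occurrence value.

    ASSUMPTION: 0.85 scales only the crime-type/person-role score, and the
    shared phone and shared address scores are added to it.
    ASSUMPTION: assoc_probability is a fraction in [0, 1].
    """
    heuristic_value = 0.85 * role_score + phone_score + address_score
    cooccurrence_value = 100 * assoc_probability
    return max(heuristic_value, cooccurrence_value)


# Hypothetical scores: a weak role relationship, but frequent co-occurrence.
weak_role_frequent = link_weight(20, 0, 5, 1.0)
# Hypothetical scores: a strong role relationship, rarely seen together.
strong_role_rare = link_weight(80, 5, 5, 0.25)
```

In the first call the co-occurrence value dominates, which matches the slide's point that a high co-occurrence weight can outweigh a low association weight.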
Association Path Search
Used Dijkstra’s shortest-path algorithm (1959) to address the search complexity problem
Conventional shortest-path algorithms could not be used directly to solve the problem of identifying the strongest association between a pair of persons (Xu & Chen 2000)
A logarithmic transformation was made on association weights
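One common way to apply a logarithmic transformation here is sketched below, under assumptions the slide does not state: link weights are normalized to the range (0, 1], and the strength of a path is the product of its link weights. Taking -log(weight) as the edge cost then turns the strongest path into the shortest one, so Dijkstra's algorithm applies directly. The function name `strongest_path` and the sample graph are hypothetical.

```python
import heapq
import math


def strongest_path(graph, source, target):
    """Find the path with the largest product of link weights.

    graph maps each node to a dict of {neighbor: weight}, with weights
    assumed to lie in (0, 1] so that -log(weight) is a non-negative cost.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            cost = d - math.log(weight)  # sum of -log(w) == -log(product of w)
            if cost < dist.get(nbr, math.inf):
                dist[nbr] = cost
                prev[nbr] = node
                heapq.heappush(heap, (cost, nbr))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])


# Hypothetical undirected association network with normalized weights.
graph = {
    "A": {"B": 0.9, "C": 0.5},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"A": 0.5, "B": 0.8},
}
path, strength = strongest_path(graph, "A", "C")
```

In this sample the two-step path through B (0.9 × 0.8 = 0.72) is stronger than the direct A to C link (0.5), which an untransformed shortest-path search summing raw weights would not select.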
User Study Questions
Can the automated link analysis approaches proposed (concept space approach, heuristic approach, and the shortest-path algorithm) help address the information overload and search complexity problems?
Can incorporated domain knowledge help identify associations between crime entities more accurately than the concept space approach?
Will domain experts perceive the automated link analysis approaches to be useful for crime investigation?
Hypotheses
H1: Subjects will achieve higher efficiency conducting an association path search with the prototype system than with the “single-level” link analysis tool
H2: Association paths found using heuristics will be more accurate than paths found using simple co-occurrence weights
H3: Subjects will perceive the heuristic approach to be more useful than the concept space approach for investigative work
Efficiency and Accuracy
H1 and H2
Efficiency = the time a subject spends completing a given task
Accuracy = the average agreement rating a subject gives to the weights of associations on a path
Usefulness = an average agreement rating greater than 4, indicating a positive assessment of usefulness
User Study Tasks
Task 1: Use COPLINK Detect to find the strongest association paths between the given criminals.
Task 2: Use the concept space approach provided by the prototype system to find the strongest association paths, evaluate each association on the path, and indicate the level of agreement with the association weights.
Task 3: Given the same set of criminal names used in Task 2, use the heuristic approach to do the same.
Two or more names were entered to search for association paths
Returns are displayed in a network
Weak links can be removed to focus investigation where more information is needed
Clicking on a link displays information about origin and strength of link
Concept Space and heuristic values were compared by the users to assess comparative accuracy
H1, H2, H3
Two-tailed t-tests
H1 was supported (t = 11.47, p < 0.001)
H2 was supported (t = 2.04, p < 0.001)
H3 was supported (t = 2.35, p < 0.05)
Weighting Agreement Scale
Conclusions
The system evaluation focused on the approaches’ efficiency, accuracy, and usefulness
These three characteristics are desirable features of a sophisticated link analysis system
The experimental results demonstrated the potential of our approach to achieve these features using domain-specific heuristics
Future Work
Apply statistical analysis to NIBRS (National Incident-Based Reporting System) data for more accurate crime type/relationship weights
Extend heuristics to include common vehicles and common organization associations
Encode expert knowledge in Bayesian networks and incrementally learn new knowledge from crime data
Interface improvements suggested by users
Improve data consolidation rules