Published by Σέργιος Δεσποτόπουλος. Modified over 5 years ago.
1
Adapting and Visualizing Association Rule Mining Systems for Law Enforcement Purposes
T.K. Cocx, W. Kosters
4/14/2019
2
Research Area
[Diagram: Criminal Career Study shown with the related fields Computer Science, Sociology, Psychology, Criminology, and Law.]
[Outline labels: Goals, Practice, Structure, Difficulties, Algorithm, Results, Problems.]
3
Criminal Careers
4
Analysis Goal Analysis
5
Different angles to solution
Analyze criminal records to predict a career. Done by de Bruin et al. (ICDM 2006).
OR
Understand crime better by (automatically) analyzing its characteristics. Focus of this talk.
Both solutions work on the same database: the national crime record database.
6
Focus
The database contains both crime AND demographic data.
Relations between occurrences of certain crimes within individual careers:
  Teaches which crimes predict others in a more general case.
  Alarms.
Relations between crimes and demographic data within individual careers:
  Discovers problematic ‘configurations’ within demographic areas.
  Better deployment of police work forces.
7
Approach
Employ association rule mining:
  Use standard Apriori methods.
  ‘Common’ means statistically over-present.
  Find common subsets of attributes.
  A subset can only be common if all of its own subsets are common as well.
  Create a tree containing all rules.
  Visualize the tree for the police end user.
Is the data suitable for this method?
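The level-wise search described on this slide can be sketched roughly as follows. This is a minimal, generic Apriori sketch and not the authors' implementation; the function name `apriori`, the absolute `min_support` count, and the toy records are assumptions made for illustration only.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset occurring in at least
    `min_support` transactions (absolute count, an assumption here)."""
    # Level 1: count single attributes.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge common (k-1)-sets into candidate k-sets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset must itself be common.
                if all(frozenset(sub) in current
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count the surviving candidates against the data.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical toy records; attribute names are illustrative only.
records = [
    frozenset({"theft", "male", "urban"}),
    frozenset({"theft", "male"}),
    frozenset({"theft", "urban"}),
    frozenset({"male", "urban"}),
]
common = apriori(records, min_support=2)
```

In this toy data every pair is common, but the triple occurs only once and is therefore dropped after counting.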
8
Problems
Data is not boolean.
The nature of the database (or of crime itself) is responsible for a large number of over- or under-present attributes:
  Male (80%)
  Dutch (90%)
  Addiction indication (4%)
Inherent relations between certain attributes pollute the outcome:
  Semi-aggregated attributes (first age / last age for one-timers)
  Descent / born
9
Approach (Cont.)
Fivefold method:
  Database refit
  Attribute ban
  Semantic split
  Interestingness
  Tree visualization
10
Database refit
Standard methods rely on boolean databases: attributes are present, or not.
Numerical attributes are discretized.
  Age: 10-year intervals
Nominal attributes are split into all available categories.
The resulting database is boolean:
  Very large (all the different ethnicities)
  Very sparse (only one of them is true)
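A rough sketch of such a refit for a single record is given below. The field names (`age`, `descent`), the category labels, and the helper `refit` are hypothetical; only the 10-year age interval and the one-column-per-category split come from the slide.

```python
def refit(record, nominal_categories, age_width=10):
    """Turn one mixed record into a dict of boolean attributes."""
    row = {}
    # Numerical attribute: map the age onto its fixed-width interval.
    # Only the matching interval is stored; all others are implicitly False.
    low = (record["age"] // age_width) * age_width
    row[f"age_{low}_{low + age_width - 1}"] = True
    # Nominal attributes: one boolean column per available category,
    # exactly one of which is True for a given record.
    for attr, categories in nominal_categories.items():
        for cat in categories:
            row[f"{attr}={cat}"] = (record[attr] == cat)
    return row

# Hypothetical record and category list, for illustration only.
example = refit({"age": 34, "descent": "B"}, {"descent": ["A", "B", "C"]})
```

The example shows why the result is both wide and sparse: each nominal attribute contributes one column per category, yet only one of them is true.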
11
Attribute ban
The database contains many over-present attributes.
  Often lacking descriptive value (shopping bags)
  Example: deceased
To cope with these, analysts can handpick disruptive attributes.
These are left out when searching.
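The ban itself amounts to filtering attributes out of every record before the search. The sketch below assumes records are sets of attribute names; `over_present` is a hypothetical helper that only suggests candidates by frequency, since the slide leaves the actual choice to the analyst.

```python
def over_present(records, threshold):
    """List attributes whose relative frequency is at least `threshold`,
    as candidates an analyst might review for banning."""
    counts = {}
    for r in records:
        for a in r:
            counts[a] = counts.get(a, 0) + 1
    n = len(records)
    return sorted(a for a, c in counts.items() if c / n >= threshold)

def apply_ban(records, banned):
    """Remove the handpicked attributes from every record."""
    banned = set(banned)
    return [frozenset(r) - banned for r in records]

# Hypothetical toy records, for illustration only.
records = [
    {"male", "theft", "deceased"},
    {"male", "fraud"},
    {"male", "theft"},
    {"female", "theft"},
]
candidates = over_present(records, threshold=0.75)
cleaned = apply_ban(records, banned=["male"])
```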
12
Semantic Split
The criminal record database has two parts that are clearly semantically different:
  Demographic data (also known for non-criminals)
  Crime data
They are strictly separated number-wise.
The analyst can handpick an x that marks the beginning of the second half.
  1:N / N:1
  Lower and upper half
Pick at most 1 item from the lower half.
  For example: born / ethnicity.
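One way to read the split rule is as a filter on candidate itemsets. The sketch assumes attributes are identified by integer index and that the lower half consists of the indices below x; both points are assumptions, as is the name `respects_split`.

```python
def respects_split(itemset, x, max_lower=1):
    """True if at most `max_lower` attribute indices lie below the split
    point x, i.e. in the lower half."""
    return sum(1 for i in itemset if i < x) <= max_lower

# Hypothetical indices: attributes 0..4 form the lower half, 5 and up the upper half.
ok = respects_split({2, 10, 11}, x=5)        # one lower-half item
rejected = respects_split({2, 3, 10}, x=5)   # two lower-half items
```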
13
Subset Search
Support: the number of occurrences (set to true) of an attribute.
When the support reaches a threshold, the attribute is considered common.
It is not commonness one wants, but interestingness.
  Female
Confidence: the conditional probability of a certain itemset given another itemset.
If a certain itemset that is ‘interesting’ implies another, the combination is also interesting.
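Support and confidence as defined here can be computed directly from the boolean records. This is a minimal sketch assuming each record is a set of the attributes that are true; the function names and the toy data are illustrative.

```python
def support(records, itemset):
    """Number of records in which every attribute of `itemset` is true."""
    itemset = frozenset(itemset)
    return sum(1 for r in records if itemset <= r)

def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the record counts."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0  # antecedent never occurs; treated as 0 in this sketch
    both = frozenset(antecedent) | frozenset(consequent)
    return support(records, both) / base

# Hypothetical toy records, for illustration only.
records = [
    frozenset({"theft", "violence"}),
    frozenset({"theft"}),
    frozenset({"theft", "violence", "fraud"}),
    frozenset({"fraud"}),
]
c_forward = confidence(records, {"theft"}, {"violence"})
c_backward = confidence(records, {"violence"}, {"theft"})
```

In the toy data the two directions differ: violence always comes with theft, but theft comes with violence in only two of three records.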
14
Subset Search (Cont.)
Confidence can be seen as:
  max(C(a,b), C(b,a))
  avg(C(a,b), C(b,a))
The latter is stronger because it demands an implication in both directions.
An itemset will certainly be interesting if its occurrence is much higher than one would expect based upon the occurrence of its individual member itemsets:
  Lift: the relation between expected occurrence and actual occurrence.
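The two symmetric confidence variants and lift can be sketched as below. Lift is written here in its usual form, the observed joint frequency divided by the frequency expected under independence; the slide does not spell out the exact formula, so that form is an assumption, as are the function names. The sketch also assumes that a and b each occur at least once.

```python
def frac(records, items):
    """Relative frequency of records containing all of `items`."""
    items = frozenset(items)
    return sum(1 for r in records if items <= r) / len(records)

def symmetric_confidence(records, a, b, combine="avg"):
    """Combine C(a,b) and C(b,a) with max or avg."""
    both = frac(records, set(a) | set(b))
    c_ab = both / frac(records, a)
    c_ba = both / frac(records, b)
    return max(c_ab, c_ba) if combine == "max" else (c_ab + c_ba) / 2

def lift(records, a, b):
    """Observed joint frequency over the frequency expected if a and b
    were independent; values above 1 indicate over-occurrence."""
    return frac(records, set(a) | set(b)) / (frac(records, a) * frac(records, b))

# Hypothetical toy records, for illustration only.
records = [
    frozenset({"a", "b"}),
    frozenset({"a", "b"}),
    frozenset({"a"}),
    frozenset({"a"}),
    frozenset({"c"}),
]
conf_max = symmetric_confidence(records, {"a"}, {"b"}, combine="max")
conf_avg = symmetric_confidence(records, {"a"}, {"b"}, combine="avg")
pair_lift = lift(records, {"a"}, {"b"})
```

Here b always implies a, but a implies b only half the time, so the max variant reports a perfect rule while the avg variant is pulled down by the weaker direction.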
15
Visualization
All interesting itemsets are put in a giant tree that is presented to the user.
In this tree each node is part of an interesting subset of the database together with all its parents, NOT its siblings.
Interesting sets: 10 – 1, 10 – 1 – 2, 10 – 4, 5 – 7
[Tree figure: roots 10 and 5; 10 has children 1 and 4; 1 has child 2; 5 has child 7.]
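The example sets on this slide can be arranged into such a tree with a simple prefix structure. The sketch assumes each interesting set is stored as an ordered tuple, as listed on the slide; `build_tree` and `path_to` are illustrative names, not part of the presented system.

```python
def build_tree(interesting_sets):
    """Insert each ordered itemset as a root-to-node path in a nested dict."""
    root = {}
    for path in interesting_sets:
        node = root
        for item in path:
            node = node.setdefault(item, {})
    return root

def path_to(tree, target, prefix=()):
    """Return the first root-to-node path ending at `target`, or None.
    The path holds the node and its parents, never its siblings."""
    for item, child in tree.items():
        here = prefix + (item,)
        if item == target:
            return here
        found = path_to(child, target, here)
        if found:
            return found
    return None

# The interesting sets listed on the slide.
tree = build_tree([(10, 1), (10, 1, 2), (10, 4), (5, 7)])
subset_for_2 = path_to(tree, 2)
```

Reading a node together with its ancestors recovers the itemset it stands for, which is why siblings such as 1 and 4 never end up in the same set.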
16
Visualization (Cont.)
It is common practice to represent the tree with all its member subsets.
  Impractical, especially for a police analyst
Use a known paradigm.
Resulting dataset
17
Some notable results
Joyriding ↔ Violation of Work Circumstances ↔ Alcohol Addiction
Drug Smuggling ↔ Drug Addiction
Manslaughter ↔ Discrimination
Male ↔ Theft with Violence ↔ Possession (of weapon)
Female ↔ Drug Abuse
African Descent ↔ Public Safety
Rural Areas ↔ Traffic Felonies
18
Conclusion & Future work
The nature of the criminal record database calls for a custom set of specific solutions.
Attribute ban, semantic split and visualization contribute largely to the results of performed queries.
Semantic bond between single attributes
Search for most uncommon itemsets
  Inherently uncommon couples (semantic bond)
Comparison with social sciences
19
Interrogation