Published by Eugenia Summers. Modified over 9 years ago.
1
Forensics and CS
Philip Chan
2
CSI: Crime Scene Investigation
www.cbs.com/shows/csi/
- High-tech forensics tools
- DNA profiling
- Used as evidence in court cases
3
DNA
- Deoxyribonucleic acid
- Each person's DNA is unique (except for identical twins)
- DNA samples can be collected at crime scenes
- About 0.1% of human DNA varies from person to person
7
Forensics Analysis
- Focus on loci (locations) in the DNA
- The values at those loci (the DNA profile) are recorded for comparing DNA samples
- Two DNA profiles from the same person have matching values at all loci
- Are more loci or fewer loci more accurate for identification? What are the tradeoffs?
- The FBI uses 13 core loci: http://www.cstl.nist.gov/biotech/strbase/fbicore.htm
11
We do not want to wrongly accuse someone
- How can we find out how likely it is that another person has the same DNA profile?
- How many people are in the world?
- How low does the probability need to be for a DNA profile to be unique in the world?
- A low probability does not mean impossible, just very unlikely
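The questions about world population and "how low" can be connected with a little arithmetic. The sketch below is only illustrative: it assumes a world population of roughly 8 billion (a round figure that is not from the slides) and uses the hypothetical names `expectedCarriers` and `thresholdProbability`. If a profile occurs with probability p, about population × p people are expected to carry it, so p would need to be well below 1/population.

```java
// Sketch only: WORLD_POPULATION is an assumed round figure, not from the slides.
final class UniquenessThreshold {
    static final double WORLD_POPULATION = 8.0e9; // assumption: roughly 8 billion people

    // Expected number of people in the population who carry a profile
    // that occurs with probability p.
    static double expectedCarriers(double p) {
        return WORLD_POPULATION * p;
    }

    // Probability at which exactly one carrier is expected;
    // below this value, fewer than one carrier is expected.
    static double thresholdProbability() {
        return 1.0 / WORLD_POPULATION;
    }

    public static void main(String[] args) {
        System.out.println("Threshold probability: " + thresholdProbability());
        System.out.println("Expected carriers at p = 1e-12: " + expectedCarriers(1e-12));
    }
}
```

Even below the threshold a second carrier remains possible, just unlikely.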
12
Review of basic probability Joint probability of two independent events P(A,B) = ?
13
Review of basic probability
- Joint probability of two independent events: P(A,B) = P(A) * P(B)
- Events are independent when knowing the outcome of one provides no information about the other
- P(Die1=1, Die2=1) = P(Die1=1) * P(Die2=1) = 1/6 * 1/6 = 1/36
14
Enumerating the events
Die1 \ Die2    1     2     3     4     5     6
1             1,1   1,2   …
2
3
4
5
6
36 events, each equally likely, so each has probability 1/36
15
Joint probability P(Die1=even, Die2=6) = ?
16
Joint probability P(Die1=even, Die2=6) = 1/2 * 1/6 = 1/12 P(Die1=1, Die2=5, Die3=4) = ?
17
Joint probability
- P(Die1=even, Die2=6) = 1/2 * 1/6 = 1/12
- P(Die1=1, Die2=5, Die3=4) = (1/6)^3 = 1/216
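The product rule can be checked by brute-force enumeration. The following is a minimal sketch (the class and method names are made up for illustration) that counts favorable outcomes among all equally likely rolls.

```java
// Sketch: verifies the joint probabilities by enumerating all dice outcomes.
final class DiceEnumeration {
    // Counts two-dice outcomes (out of 36) where die 1 is even and die 2 shows 6.
    static int countEvenThenSix() {
        int count = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                if (d1 % 2 == 0 && d2 == 6) {
                    count++;
                }
            }
        }
        return count;
    }

    // Counts three-dice outcomes (out of 216) that equal exactly (a, b, c).
    static int countTriple(int a, int b, int c) {
        int count = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                for (int d3 = 1; d3 <= 6; d3++) {
                    if (d1 == a && d2 == b && d3 == c) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEvenThenSix() + "/36");      // 3/36 = 1/12
        System.out.println(countTriple(1, 5, 4) + "/216");   // 1/216
    }
}
```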
21
DNA profile probability
- How to estimate?
- Assuming loci are independent: P(Locus1=value1, Locus2=value2, ...) = P(Locus1=value1) * P(Locus2=value2) * ...
- How to estimate P(Locus1=value1)?
- Take a random sample of size N from the population and count how many of the N people have value1 at Locus1; the estimate is that count divided by N
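A minimal sketch of this estimate, under simplifying assumptions that are not from the slides: each locus value is stored as a single `int`, the sample is an `int[][]` with one row per person, and the data in `main` is invented purely for illustration.

```java
// Sketch: estimates a profile's probability from a sample, assuming independent loci.
final class ProfileProbability {
    // Fraction of sampled profiles whose value at the given locus equals value.
    static double locusFrequency(int[][] sample, int locus, int value) {
        int count = 0;
        for (int[] profile : sample) {
            if (profile[locus] == value) {
                count++;
            }
        }
        return (double) count / sample.length;
    }

    // Product of the per-locus frequencies (valid only if loci are independent).
    static double profileProbability(int[][] sample, int[] profile) {
        double p = 1.0;
        for (int locus = 0; locus < profile.length; locus++) {
            p *= locusFrequency(sample, locus, profile[locus]);
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 4 people, 3 loci each.
        int[][] sample = {
            {5, 2, 1},
            {6, 9, 2},
            {5, 9, 1},
            {5, 2, 2},
        };
        System.out.println(profileProbability(sample, new int[] {5, 2, 1}));
    }
}
```

With such a tiny sample the estimate is very rough; a value never seen in the sample would give a probability of zero, which is one reason larger samples matter.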
22
Database of DNA profiles
Id   Locus1   Locus2   Locus3   …   Locus13
A5212
A6921
…
23
Problem Formulation
Given
- A sample profile (e.g. collected from the crime scene)
- A database of known profiles
Find
- The probability of the sample profile, if it matches a known profile in the database
26
Breaking Down the Problem
Find
- The probability of the sample profile, if it matches a known profile in the database
What are the subproblems?
- Subproblem 1: find whether the sample profile matches
  - 1a: check entries in the database
  - 1b: check loci in each entry
- Subproblem 2: calculate the probability of the profile
27
Simpler Problem for 1a (very common)
Given
- an array of integers (e.g. student IDs)
- an integer (e.g. an ID)
Find
- whether the integer is in the array (e.g. whether you can enter your dorm)
int[] directory; // student IDs
int id;          // to be found
28
Linear Search
29
Linear/Sequential Search
- Check the items one by one
- Stop if you find it
- Stop if you run out of items to check (not found)
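The steps above can be sketched in Java as follows. The class and method names are illustrative, and returning -1 for "not found" is a convention chosen here.

```java
// Sketch of linear (sequential) search over an int array.
final class LinearSearch {
    // Returns the index of id in directory, or -1 if it is not present.
    static int indexOf(int[] directory, int id) {
        for (int i = 0; i < directory.length; i++) {
            if (directory[i] == id) {
                return i; // found: stop early
            }
        }
        return -1; // ran out of items to check
    }

    public static void main(String[] args) {
        int[] directory = {42, 7, 19, 3}; // made-up student IDs
        System.out.println(indexOf(directory, 19));
        System.out.println(indexOf(directory, 8));
    }
}
```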
32
Number of Checks (speed of algorithm)
Consider N items in the array
- Best-case scenario. When does it occur? How many checks? First item; 1 check
- Worst-case scenario. When does it occur? How many checks? Last item or not there; N checks
- Average-case scenario (item is present and equally likely to be at any position): average of all cases, (1 + 2 + … + N) / N = [N(N+1)/2] / N = (N+1)/2
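These counts can be confirmed empirically. The sketch below (names are illustrative) counts how many items a linear search examines, then averages that count over a search for every item in the array, which assumes each position is equally likely.

```java
// Sketch: counts the checks made by linear search and averages them.
final class LinearSearchChecks {
    // Number of items examined by a linear search for target.
    static int checks(int[] items, int target) {
        int count = 0;
        for (int item : items) {
            count++;
            if (item == target) {
                break;
            }
        }
        return count;
    }

    // Average number of checks when each of the n stored items is searched once.
    static double averageChecks(int n) {
        int[] items = new int[n];
        for (int i = 0; i < n; i++) {
            items[i] = i;
        }
        long total = 0;
        for (int target = 0; target < n; target++) {
            total += checks(items, target);
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        int n = 9;
        System.out.println(averageChecks(n) + " vs " + (n + 1) / 2.0);
    }
}
```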
33
Matching DNA profiles
- Each profile has 13 loci
- Do we always need to check all 13 loci to decide whether a match occurs?
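One possible answer, sketched under the assumption that a profile is stored as an `int[]` with one value per locus (a simplification not stated in the slides): a single mismatching locus is enough to rule out a match, so the comparison can stop early. The names below are hypothetical.

```java
// Sketch: compares DNA profiles locus by locus, stopping at the first mismatch.
final class ProfileMatch {
    // True only if both profiles have the same value at every locus.
    static boolean matches(int[] sample, int[] known) {
        if (sample.length != known.length) {
            return false;
        }
        for (int locus = 0; locus < sample.length; locus++) {
            if (sample[locus] != known[locus]) {
                return false; // one mismatch is enough; skip the remaining loci
            }
        }
        return true; // a match requires checking every locus
    }

    // Linear search over the database; returns the index of the first match or -1.
    static int findMatch(int[][] database, int[] sample) {
        for (int i = 0; i < database.length; i++) {
            if (matches(sample, database[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[][] database = { {5, 2, 1}, {6, 9, 2} }; // made-up 3-locus profiles
        System.out.println(findMatch(database, new int[] {6, 9, 2}));
    }
}
```

Confirming a match still needs all loci; only a non-match can be decided early.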
34
Can we do better?
- Is there a faster algorithm?
- What if the array is sorted, i.e. the items are in order?
- E.g. a phone book
35
Binary Search
36
1. Check the item at the midpoint
2. If found, done
3. Otherwise, eliminate half and repeat 1 and 2
37
Breaking down the problem
While there are more items and the item is not found at the midpoint
What are the two subproblems?
38
Breaking down the problem
While there are more items and the item is not found at the midpoint
- Eliminate half of the items
- Determine the midpoint
42
Number of checks (Speed of algorithm)
- Best-case scenario. When does it occur? How many checks? The item is in the middle; 1 check
- Worst-case scenario. When does it occur? How many checks? When repeated halving leaves a half with only one item; ? checks
48
Number of checks (Speed of algorithm)
T(1) = 1
T(N) = T(N/2) + 1
     = [T(N/4) + 1] + 1
     = [[T(N/8) + 1] + 1] + 1
     = … any pattern?
51
Number of checks (Speed of algorithm)
T(1) = 1
T(N) = T(N/2) + 1
     = [T(N/4) + 1] + 1
     = [[T(N/8) + 1] + 1] + 1
     = …
     = T(N/2^k) + k
N/2^k gets smaller and eventually becomes 1; solve for k
55
Number of Checks (Speed of Algorithm)
N/2^k = 1
N = 2^k
k = log_2 N
T(N) = T(N/2^k) + k
     = T(1) + log_2 N
     = 1 + log_2 N
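The closed form can be checked against the recurrence directly. This sketch evaluates T(N) recursively and compares it with 1 + log_2 N. It assumes integer halving (N/2 rounded down), so for values of N that are not powers of two the logarithm is rounded down as well. The names are illustrative.

```java
// Sketch: evaluates T(1) = 1, T(N) = T(N/2) + 1 and compares with 1 + log_2 N.
final class BinarySearchRecurrence {
    // The recurrence itself, using integer division for N/2.
    static int t(int n) {
        return n <= 1 ? 1 : t(n / 2) + 1;
    }

    // floor(log_2 n) for n >= 1, computed from the position of the highest set bit.
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1024; n *= 2) {
            System.out.println("N=" + n + "  T(N)=" + t(n) + "  1+log2(N)=" + (1 + floorLog2(n)));
        }
    }
}
```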
56
N (Linear search) vs log_2 N + 1 (Binary search)
N              log_2 N + 1
100            7.6
1,000          11.0
10,000         14.3
100,000        17.6
1,000,000      20.9
10,000,000     24.3
100,000,000    27.6
57
Before using Binary Search The array needs to be sorted (in order)
58
Sorting
60
Sorting (arranging the items in a desired order)
- How is the phone book arranged? Why?
- Why is it not arranged by numbers?
- Order
  - Alphabetical
  - Low to high numbers
  - DNA profile with 13 loci?
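One possible ordering for multi-locus profiles, offered as an assumption rather than the deck's stated answer, is lexicographic: compare the first locus, and move on to the next locus only when the values tie, much like comparing words letter by letter. The sketch assumes profiles stored as `int[]`. The name `ProfileOrder.compare` is hypothetical.

```java
// Sketch: lexicographic ordering of profiles, one int value per locus (an assumption).
final class ProfileOrder {
    // Negative if a comes before b, positive if after, zero if equal.
    static int compare(int[] a, int[] b) {
        int shared = Math.min(a.length, b.length);
        for (int locus = 0; locus < shared; locus++) {
            if (a[locus] != b[locus]) {
                return Integer.compare(a[locus], b[locus]);
            }
        }
        // All shared loci tie: the shorter profile (if any) comes first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        int[][] profiles = { {6, 9, 2}, {5, 2, 2}, {5, 2, 1} }; // made-up data
        java.util.Arrays.sort(profiles, ProfileOrder::compare);
        System.out.println(java.util.Arrays.deepToString(profiles));
    }
}
```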
61
Sorting
- Imagine you have a thousand numbers in an array
- How would you systematically sort them?
63
Selection Sort (ascending)
- Find/select the smallest item
- Swap the smallest item with the first item
- Find/select the second smallest item
- Swap the second smallest item with the second item
- …
64
Example
6 7 2 5 1
1 7 2 5 6
1 2 7 5 6
1 2 5 7 6
1 2 5 6 7
73
Breaking down the problem
- Get all the items in ascending order
- Get one item at the wanted position/index
What are the two subproblems?
75
Breaking down the problem
- Get all the items in ascending order
- Get one item at the wanted position/index
  1. Find the smallest item
  2. Swap the smallest item with the item at the wanted position
76
Algorithm Summary (Selection Sort)
For each “desired” position
  Between the “desired” position and the end
    Find the smallest item
  Swap the smallest item with the item at the “desired” position
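A minimal Java sketch of this summary follows. The variable names `desired` and `smallest` mirror the wording above and are otherwise arbitrary.

```java
// Sketch of selection sort (ascending), sorting the array in place.
final class SelectionSort {
    static void sort(int[] items) {
        // For each "desired" position (the last one is already in place at the end).
        for (int desired = 0; desired < items.length - 1; desired++) {
            // Find the smallest item between the desired position and the end.
            int smallest = desired;
            for (int i = desired + 1; i < items.length; i++) {
                if (items[i] < items[smallest]) {
                    smallest = i;
                }
            }
            // Swap the smallest item with the item at the desired position.
            int tmp = items[desired];
            items[desired] = items[smallest];
            items[smallest] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] items = {6, 7, 2, 5, 1};
        sort(items);
        System.out.println(java.util.Arrays.toString(items));
    }
}
```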
80
Number of comparisons (Speed of Algorithm)
Consider counting the number of comparisons between array items
- Best-case scenario (fewest comparisons). When does it occur? How many comparisons?
- Worst-case scenario (most comparisons). When does it occur? How many comparisons?
- Same number of comparisons for all cases (i.e. best case = worst case)
85
Number of comparisons (Speed of Algorithm)
- To find the smallest item: N-1 comparisons
- To find the second smallest item: N-2 comparisons
- …
- Total number of comparisons: (N-1) + (N-2) + … + 1 = N(N-1)/2 = (N^2 - N)/2
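The total can be checked by instrumenting a selection sort with a counter. This is a sketch with illustrative names. It counts only comparisons between array items, and the count is the same whether the input starts sorted or reversed.

```java
// Sketch: selection sort that also counts comparisons between array items.
final class SelectionSortComparisons {
    // Sorts items in place and returns the number of item comparisons made.
    static long sortAndCount(int[] items) {
        long comparisons = 0;
        for (int desired = 0; desired < items.length - 1; desired++) {
            int smallest = desired;
            for (int i = desired + 1; i < items.length; i++) {
                comparisons++; // one comparison between two array items
                if (items[i] < items[smallest]) {
                    smallest = i;
                }
            }
            int tmp = items[desired];
            items[desired] = items[smallest];
            items[smallest] = tmp;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 5;
        long counted = sortAndCount(new int[] {6, 7, 2, 5, 1});
        System.out.println(counted + " vs " + (long) n * (n - 1) / 2);
    }
}
```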
86
Selection Sort
- Not the fastest sorting algorithm
- Faster algorithms are covered in more advanced courses
87
Revisiting Binary Search
88
Binary Search
While there are more items and the item is not found at the midpoint
1. Eliminate half of the items
2. Determine the midpoint
90
Eliminate half of the array
- How to specify the focus region?
- Hint: index/position
- Start and end
92
How can we determine whether the region has items (is not empty), using start and end?
- start <= end
99
How do we adjust start and end?
What are the two different cases?
- Item is before the middle item
  - Start: no change
  - End: position before the midpoint
- Item is after the middle item
  - Start: position after the midpoint
  - End: no change
101
How to determine the midpoint, using start and end?
- (start + end) / 2
- Integer division eliminates the fractional part
102
Algorithm Summary
1. Initialize start, end, and midpoint (I)
2. While the region has items and the item is not at the midpoint (C)
   a) Eliminate half of the items by adjusting start or end (U)
   b) Update the midpoint (U)
3. If the region has items
     Position is the midpoint
   else
     Position is -1
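The summary translates fairly directly into Java. In the sketch below the (I), (C), and (U) comments mark the initialize, condition, and update steps. One deliberate deviation from the slides: the midpoint is computed as `start + (end - start) / 2`, which gives the same result as `(start + end) / 2` for valid indices but avoids `int` overflow on very large arrays. The class and method names are illustrative.

```java
// Sketch of binary search over an array sorted in ascending order.
final class BinarySearch {
    // Returns the index of item in sorted, or -1 if it is not present.
    static int indexOf(int[] sorted, int item) {
        int start = 0;                                   // (I)
        int end = sorted.length - 1;                     // (I)
        int mid = start + (end - start) / 2;             // (I)
        // (C) region has items and the item is not at the midpoint
        while (start <= end && sorted[mid] != item) {
            if (item < sorted[mid]) {
                end = mid - 1;                           // (U) keep the lower half
            } else {
                start = mid + 1;                         // (U) keep the upper half
            }
            mid = start + (end - start) / 2;             // (U) update the midpoint
        }
        return start <= end ? mid : -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 5, 6, 7};
        System.out.println(indexOf(sorted, 6));
        System.out.println(indexOf(sorted, 3));
    }
}
```

The `start <= end` test comes first in the loop condition so that `sorted[mid]` is never read once the region is empty.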
103
Overall Summary
104
- DNA samples from a crime scene
  - Identify people using known DNA profiles
  - If there is a match, estimate the probability of the DNA profile
- Matching a sample to known DNA profiles
  - Linear/sequential search [N checks]
  - Binary search [log_2 N + 1 checks]: faster, but needs sorted data/profiles
  - Selection sort [(N^2 - N)/2 comparisons]