Forensics and CS. Philip Chan. CSI: Crime Scene Investigation (www.cbs.com/shows/csi/): high-tech forensics tools and DNA profiling.

Presentation transcript:

Forensics and CS Philip Chan

CSI: Crime Scene Investigation. High-tech forensics tools. DNA profiling. Used as evidence in court cases.

DNA: Deoxyribonucleic Acid. Each person's DNA is unique (except for identical twins). DNA samples can be collected at crime scenes. About 0.1% of human DNA varies from person to person.

Forensic Analysis. Focus on loci (locations) in the DNA. The values at those loci (the DNA profile) are recorded for comparing DNA samples. Two DNA profiles from the same person have matching values at all loci. Are more or fewer loci more accurate for identification? What are the tradeoffs? The FBI uses 13 core loci.

We do not want to wrongly accuse someone. How can we find out how likely it is that another person has the same DNA profile? How many people are in the world? How low does the probability need to be for a DNA profile to be unique in the world? Low probability doesn't mean impossible, just very unlikely.

Review of basic probability. Joint probability of two independent events: P(A,B) = P(A) * P(B). Independent events: knowing the outcome of one event provides no information about the other. Example: P(Die1=1, Die2=1) = P(Die1=1) * P(Die2=1) = 1/6 * 1/6 = 1/36.

Enumerating the events: (1,1), (1,2), …, (6,6). There are 36 events, each equally likely, so each has probability 1/36.

Joint probability. P(Die1=even, Die2=6) = 1/2 * 1/6 = 1/12. P(Die1=1, Die2=5, Die3=4) = $(1/6)^3$ = 1/216.
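The two joint probabilities above can be checked by brute-force enumeration, in the same spirit as the enumeration slide. This is a minimal sketch; the class and method names (`DiceEnumeration`, `countEvenThenSix`, `countOneFiveFour`) are illustrative and not part of the original slides.

```java
// Sketch: count favourable outcomes by enumerating every roll of fair six-sided dice.
class DiceEnumeration {
    // Outcomes of two dice where die 1 is even and die 2 shows 6.
    static int countEvenThenSix() {
        int count = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                if (d1 % 2 == 0 && d2 == 6) {
                    count++;
                }
            }
        }
        return count;
    }

    // Outcomes of three dice that show exactly (1, 5, 4), in that order.
    static int countOneFiveFour() {
        int count = 0;
        for (int d1 = 1; d1 <= 6; d1++) {
            for (int d2 = 1; d2 <= 6; d2++) {
                for (int d3 = 1; d3 <= 6; d3++) {
                    if (d1 == 1 && d2 == 5 && d3 == 4) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEvenThenSix() + " of 36");  // 3 of 36, i.e. 1/12
        System.out.println(countOneFiveFour() + " of 216"); // 1 of 216
    }
}
```

The counts agree with the product rule: 3/36 = 1/2 * 1/6, and 1/216 = (1/6)^3.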

DNA profile probability. How to estimate? Assuming loci are independent: P(Locus1=value1, Locus2=value2, ...) = P(Locus1=value1) * P(Locus2=value2) * ... How to estimate P(Locus1=value1)? Take a random sample of size N from the population and find out how many people out of N have value1 at Locus1.
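The estimate described above can be sketched as follows: count how often a value occurs at a locus in the sample, then multiply the per-locus fractions under the independence assumption. The names (`ProfileProbability`, `estimateFrequency`, `profileProbability`) and the sample values are made up for illustration; real loci values are not integers chosen like this, and the sample is assumed to be non-empty.

```java
// Sketch: estimate a profile's probability from a random population sample,
// assuming the loci are independent (as the slide does).
class ProfileProbability {
    // Fraction of the N sampled people who have `value` at one locus.
    static double estimateFrequency(int[] sampleValuesAtLocus, int value) {
        int matches = 0;
        for (int v : sampleValuesAtLocus) {
            if (v == value) {
                matches++;
            }
        }
        return (double) matches / sampleValuesAtLocus.length;
    }

    // sample[i] holds the values observed at locus i across the sampled people;
    // profile[i] is the value of the profile of interest at locus i.
    static double profileProbability(int[][] sample, int[] profile) {
        double probability = 1.0;
        for (int locus = 0; locus < profile.length; locus++) {
            probability *= estimateFrequency(sample[locus], profile[locus]);
        }
        return probability;
    }

    public static void main(String[] args) {
        // Hypothetical data: 2 loci, N = 4 sampled people.
        int[][] sample = { {5, 5, 6, 7}, {2, 9, 9, 9} };
        int[] profile = {5, 9};
        System.out.println(profileProbability(sample, profile)); // (2/4) * (3/4) = 0.375
    }
}
```

With 13 loci the same loop simply multiplies 13 fractions, which is why the resulting probability becomes very small.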

Database of DNA profiles. Columns: Id, Locus1, Locus2, Locus3, …, Locus13. Rows: A5212, A6921, …

Problem Formulation. Given: a sample profile (e.g. collected from the crime scene) and a database of known profiles. Find: the probability of the sample profile, if it matches a known profile in the database.

Breaking Down the Problem. Find: the probability of the sample profile, if it matches a known profile in the database. What are the subproblems? Subproblem 1: find whether the sample profile matches. 1a: check entries in the database. 1b: check if all 13 loci match in each entry. Subproblem 2: calculate the probability of the profile.
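Subproblems 1a and 1b map naturally onto two small methods: an outer loop over database entries and an inner check over the loci of one entry. This is only a sketch; representing a profile as an `int[]` of loci values, and the names `ProfileMatcher`, `allLociMatch`, and `findMatch`, are assumptions for illustration.

```java
// Sketch: find whether a sample profile matches any entry in a database of profiles.
class ProfileMatcher {
    // Subproblem 1b: true only if the values at every locus are equal.
    static boolean allLociMatch(int[] sample, int[] entry) {
        if (sample.length != entry.length) {
            return false;
        }
        for (int locus = 0; locus < sample.length; locus++) {
            if (sample[locus] != entry[locus]) {
                return false; // one mismatch is enough to rule this entry out
            }
        }
        return true;
    }

    // Subproblem 1a: check the entries one by one.
    // Returns the index of the first matching entry, or -1 if none matches.
    static int findMatch(int[][] database, int[] sample) {
        for (int row = 0; row < database.length; row++) {
            if (allLociMatch(sample, database[row])) {
                return row;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical database with 3 loci per profile (a real profile would have 13).
        int[][] database = { {5, 2, 1}, {6, 9, 2}, {7, 7, 4} };
        System.out.println(findMatch(database, new int[]{6, 9, 2})); // 1
        System.out.println(findMatch(database, new int[]{6, 9, 3})); // -1
    }
}
```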

Simpler Problem for 1a (very common). Given: an array of integers (e.g. student IDs) and an integer (e.g. an ID). Find: whether the integer is in the array.
int[] directory; // student IDs
int id; // the ID to be found
boolean found; // true if id is in directory

Linear/Sequential Search. Check the items one by one. Stop if you find the item. Stop if you run out of items to check; the item is then not found.
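The steps above can be sketched in Java using the variable names from the simpler problem (`directory`, `id`, `found`). The wrapper class `LinearSearch` and the method name `contains` are illustrative choices, not from the slides.

```java
// Sketch: linear (sequential) search over an unsorted array.
class LinearSearch {
    // Returns true if id occurs somewhere in directory.
    static boolean contains(int[] directory, int id) {
        boolean found = false;
        int i = 0;
        // Stop when the item is found or when there are no more items to check.
        while (i < directory.length && !found) {
            if (directory[i] == id) {
                found = true;
            }
            i++;
        }
        return found;
    }

    public static void main(String[] args) {
        int[] directory = {5, 9, 2, 6, 3};
        System.out.println(contains(directory, 3)); // true
        System.out.println(contains(directory, 7)); // false
    }
}
```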

Number of Checks (speed of algorithm). Consider N items in the array. Best-case scenario: when does it occur, and how many checks? The item is the first item; 1 check. Worst-case scenario: when does it occur, and how many checks? The item is the last item, or not there; N checks. Average-case scenario: the average of all cases (item present, each position equally likely), (1 + 2 + … + N) / N = (N + 1) / 2.

Can we do better? Is there a faster algorithm? What if the array is sorted, i.e. the items are in order? E.g. a phone book.

Binary Search. Check the item at the midpoint. If found, done. Otherwise, eliminate half of the items and repeat.

Breaking down the problem. While there are more items and the item is not found, solve two subproblems: find the midpoint, and eliminate half of the items.
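A minimal sketch of binary search built from these two subproblems, assuming `directory` is already sorted in ascending order. The class name `BinarySearch` and the method name `indexOf` are illustrative.

```java
// Sketch: binary search over an array sorted in ascending order.
class BinarySearch {
    // Returns the index of id in directory, or -1 if it is not there.
    static int indexOf(int[] directory, int id) {
        int low = 0;
        int high = directory.length - 1;
        while (low <= high) {                     // more items to consider
            int mid = low + (high - low) / 2;     // subproblem: find the midpoint
            if (directory[mid] == id) {
                return mid;                       // found, done
            } else if (directory[mid] < id) {
                low = mid + 1;                    // subproblem: eliminate the lower half
            } else {
                high = mid - 1;                   // subproblem: eliminate the upper half
            }
        }
        return -1;                                // ran out of items
    }

    public static void main(String[] args) {
        int[] directory = {2, 3, 5, 6, 9};
        System.out.println(indexOf(directory, 6)); // 3
        System.out.println(indexOf(directory, 7)); // -1
    }
}
```

Writing the midpoint as `low + (high - low) / 2` rather than `(low + high) / 2` avoids integer overflow on very large arrays.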

Number of checks (speed of algorithm). Best-case scenario: when does it occur, and how many checks? The item is in the middle; 1 check. Worst-case scenario: when does it occur, and how many checks? The array keeps being divided into two halves until a half has only one item; ? checks.

Number of checks (speed of algorithm). $T(1) = 1$ and $T(N) = T(N/2) + 1$. Expanding: $T(N) = T(N/2) + 1 = [T(N/4) + 1] + 1 = [[T(N/8) + 1] + 1] + 1 = \dots = T(N/2^k) + k$. $N/2^k$ gets smaller and eventually becomes 1; solve for $k$.

Number of checks (speed of algorithm). $N/2^k = 1$, so $N = 2^k$ and $k = \log_2 N$. Therefore $T(N) = T(N/2^k) + k = T(1) + \log_2 N = 1 + \log_2 N$.
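The closed form can be sanity-checked by evaluating the recurrence directly. The sketch below assumes N is a power of two (so that N/2 is exact at every step) and N >= 1; the names `BinarySearchChecks`, `recurrence`, and `closedForm` are illustrative.

```java
// Sketch: compare the recurrence T(1) = 1, T(N) = T(N/2) + 1 with 1 + log2(N).
class BinarySearchChecks {
    // Worst-case number of checks, computed straight from the recurrence (n >= 1).
    static int recurrence(int n) {
        return (n == 1) ? 1 : recurrence(n / 2) + 1;
    }

    // Closed form 1 + log2(n); for a power of two, log2(n) equals
    // the number of trailing zero bits in n.
    static int closedForm(int n) {
        return 1 + Integer.numberOfTrailingZeros(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1024; n *= 2) {
            System.out.println(n + ": " + recurrence(n) + " vs " + closedForm(n));
        }
        // The last line printed is "1024: 11 vs 11".
    }
}
```

For comparison, a linear search over 1024 items needs up to 1024 checks, while binary search needs at most 11.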

Sorting (arranging the items in a desired order). How is the phone book arranged? Why? Why is it not arranged by numbers? Possible orders: alphabetical; low to high numbers; a DNA profile with 13 loci?

Sorting. Imagine you have a thousand numbers in an array. How would you systematically sort them?

Selection Sort (ascending). Find/select the smallest item. Swap the smallest item with the first item. Find/select the second smallest item. Swap the second smallest item with the second item. …

Breaking down the problem. Get all the items in ascending order: get one item at a time into the wanted position/index. The two subproblems: find the smallest item, and swap the smallest item with the item at the wanted position.
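The two subproblems translate into one helper method and one swap inside a loop over the wanted positions. This is a minimal sketch for `int` arrays; the names `SelectionSort`, `indexOfSmallest`, and `sort` are illustrative.

```java
// Sketch: selection sort in ascending order, sorting the array in place.
class SelectionSort {
    // Subproblem 1: index of the smallest item in items[start..end].
    static int indexOfSmallest(int[] items, int start) {
        int smallest = start;
        for (int i = start + 1; i < items.length; i++) {
            if (items[i] < items[smallest]) {
                smallest = i;
            }
        }
        return smallest;
    }

    static void sort(int[] items) {
        // The last position needs no pass: one remaining item is already in place.
        for (int wanted = 0; wanted < items.length - 1; wanted++) {
            int smallest = indexOfSmallest(items, wanted);
            // Subproblem 2: swap the smallest item into the wanted position.
            int temp = items[wanted];
            items[wanted] = items[smallest];
            items[smallest] = temp;
        }
    }

    public static void main(String[] args) {
        int[] numbers = {5, 9, 2, 6, 3};
        sort(numbers);
        System.out.println(java.util.Arrays.toString(numbers)); // [2, 3, 5, 6, 9]
    }
}
```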

Number of comparisons (speed of algorithm). Consider counting the number of comparisons between array items. Best-case scenario (fewest comparisons): when does it occur, and how many comparisons? Worst-case scenario (most comparisons): when does it occur, and how many comparisons? The number of comparisons is the same in all cases (i.e. best case = worst case).

Number of comparisons (speed of algorithm). To find the smallest item: $N-1$ comparisons. To find the second smallest item: $N-2$ comparisons. … Total number of comparisons: $(N-1) + (N-2) + \dots + 1 = N(N-1)/2 = (N^2 - N)/2$.
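The total can be confirmed empirically by adding a counter to selection sort. The sketch below counts only comparisons between array items, as the slides do, and works on a copy so the input is left untouched; the names `SelectionSortComparisons` and `countComparisons` are illustrative.

```java
// Sketch: count item-to-item comparisons made by selection sort.
class SelectionSortComparisons {
    static long countComparisons(int[] items) {
        int[] copy = items.clone(); // sort a copy, not the caller's array
        long comparisons = 0;
        for (int wanted = 0; wanted < copy.length - 1; wanted++) {
            int smallest = wanted;
            for (int i = wanted + 1; i < copy.length; i++) {
                comparisons++; // one comparison between two array items
                if (copy[i] < copy[smallest]) {
                    smallest = i;
                }
            }
            int temp = copy[wanted];
            copy[wanted] = copy[smallest];
            copy[smallest] = temp;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(countComparisons(new int[]{1, 2, 3, 4, 5})); // 10
        System.out.println(countComparisons(new int[]{5, 4, 3, 2, 1})); // 10
    }
}
```

Both the already-sorted and the reverse-sorted input give 10 = 5 * 4 / 2 comparisons, which illustrates that the best case equals the worst case.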

Summary. DNA samples from a crime scene: identify people using known DNA profiles; if there is a match, estimate the probability of the DNA profile. Matching a sample to known DNA profiles: linear/sequential search [$N$ checks]; binary search [$\log_2 N + 1$ checks], which is faster but needs sorted data/profiles. Selection sort [$(N^2 - N)/2$ comparisons].