1 Foundations of Software Design Lecture 26: Text Processing, Tries, and Dynamic Programming Marti Hearst & Fredrik Wallenberg Fall 2002

2 Problem: String Search Determine if, and where, a substring occurs within a string

3 Approaches/Algorithms:
–Brute Force
–Rabin-Karp
–Tries
–Dynamic Programming

4 “Brute Force” Algorithm
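
For illustration, a minimal Python sketch of the brute-force approach; the function name and example are my own, not from the lecture:

    def brute_force_search(text, pattern):
        """Return the index of the first match of pattern in text, or -1."""
        n, m = len(text), len(pattern)
        for i in range(n - m + 1):          # try every starting position
            if text[i:i + m] == pattern:    # compare up to m characters
                return i
        return -1

    # brute_force_search("abracadabra", "cad") -> 4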

5 Worst-case Complexity

6 Best-case Complexity, String Found

7 Best-case Complexity, String Not Found
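
(For a text of length N and a pattern of length M, the standard analysis of brute force is: worst case O(NM), e.g. a text of all a's and a pattern of a's ending in b; best case with the string found is O(M), when the pattern matches at position 0; best case with the string not found is O(N), when the first character of the pattern never matches the text.)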

8 Rabin-Karp Algorithm
Calculate a hash value for
–The pattern being searched for (length M), and
–Each M-character subsequence in the text
Start with the first M-character sequence
–Hash it
–Compare the hashed search term against it
–If they match, then look at the letters directly (why do we need this step?)
–Else go to the next M-character sequence
(Note 1: Karp is a Turing-award-winning professor in CS here!)
(Note 2: CS theory is a good field to be in because they name things after you!)

9 Karp-Rabin example (figure): looking for a 5-digit pattern whose hash, its value mod 13, equals 7. Compute each 5-character substring of the text mod 13, looking for 7. Found a 7? Now check the digits directly: the hash match may be spurious.

10 Rabin-Karp Algorithm
Running time?
–N is the length of the string
–Expected O(N) if the hash function is chosen well; the worst case, with many spurious hash matches, degrades to O(NM)
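
As a sketch of the rolling-hash idea (the base and modulus here are arbitrary choices for illustration, not values from the lecture):

    def rabin_karp(text, pattern, base=256, mod=101):
        """Return the index of the first match of pattern in text, or -1."""
        n, m = len(text), len(pattern)
        if m > n:
            return -1
        high = pow(base, m - 1, mod)        # weight of the window's leading character
        p_hash = t_hash = 0
        for i in range(m):                  # hash the pattern and the first window
            p_hash = (p_hash * base + ord(pattern[i])) % mod
            t_hash = (t_hash * base + ord(text[i])) % mod
        for i in range(n - m + 1):
            if p_hash == t_hash and text[i:i + m] == pattern:
                return i                    # confirmed: the hash hit was not spurious
            if i < n - m:                   # roll the window: drop text[i], add text[i + m]
                t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
        return -1

Comparing the letters on a hash hit is the "look at the letters directly" step above: two different windows can share a hash value (a spurious hit), so the hash alone is not proof of a match.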

11 Tries
A tree-based data structure for storing strings in order to make pattern matching faster
Main idea:
–Store all the strings from the document, one letter at a time, in a tree structure
–Two strings with the same prefix are in the same subtree
Useful for IR prefix queries
–Search for the longest prefix of a query string Q that matches a prefix of some string in the trie
–The name comes from Information Retrieval (the "trie" in "retrieval")

12 Trie Example The standard trie over the alphabet {a,b} for the set {aabab, abaab, babbb, bbaaa, bbbab}

13 A Simple Incremental Algorithm
To build the trie, simply add one string at a time:
–Check whether the current character matches a child of the current node. If so, move to the next character
–If not, make a new branch labeled with the mismatched character, and then move to the next character
–Repeat

14 Trie-growing Algorithm (figure): the trie built by inserting buy, bell, hear, see, bid, bear, stop, bull, sell, stock one at a time
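
A minimal sketch of this incremental algorithm using nested Python dicts (the "$" end-of-word marker is my own convention):

    def trie_insert(root, word):
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # follow the branch for ch, creating it if absent
        node["$"] = True                    # mark that a whole word ends here

    def trie_contains(root, word):
        node = root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "$" in node                  # True only for whole words, not bare prefixes

    trie = {}
    for w in ["buy", "bell", "hear", "see", "bid", "bear", "stop", "bull", "sell", "stock"]:
        trie_insert(trie, w)
    # trie_contains(trie, "bell") -> True; trie_contains(trie, "be") -> False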

15 Tries, more formally
The path from the root of T to any node represents a prefix equal to the concatenation of the characters encountered while traversing the path.
–An internal node can have from 1 to d children, where d is the size of the alphabet. The previous example is a binary tree because the alphabet had only 2 letters
–A path from the root of T to an internal node at depth i corresponds to an i-character prefix of a string S
–The height of the tree is the length of the longest string
–If there are S unique strings, T has S leaf nodes
–Looking up a string of length M is O(M)

16 Compressed Tries Compression is done after the trie has been built up; can’t add more items.

17 Compressed Tries
Also known as a PATRICIA Trie
–Practical Algorithm To Retrieve Information Coded In Alphanumeric
–D. Morrison, Journal of the ACM 15 (1968)
Improves a space inefficiency of tries: tries to remove nodes with only one child (pardon the pun)
The number of nodes is proportional to the number of strings, not to their total length
–But this just makes the node labels longer
–So this only helps if an auxiliary data structure is used to actually store the strings
–The trie then stores only triples of numbers indicating where in the auxiliary data structure to look
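
As a sketch of the node-removal idea on the dict representation above (my own code, not Morrison's algorithm): collapse every single-child chain into one multi-character edge label:

    def compress(node):
        """Return a compressed copy of a dict trie built by trie_insert."""
        out = {}
        for label, child in node.items():
            if label == "$":                # keep end-of-word markers as-is
                out[label] = child
                continue
            # absorb children while the chain has one branch and no word ends mid-chain
            while len(child) == 1 and "$" not in child:
                (next_label, next_child), = child.items()
                label += next_label
                child = next_child
            out[label] = compress(child)
        return out

    # e.g. under "s", the chain t-o collapses into one edge labeled "to",
    # whose children are "p" (for "stop") and "ck" (for "stock")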

18 Compressed Trie (figure): the previous trie with each single-child chain collapsed into one edge labeled with the whole substring

19 Suffix Tries
Regular tries can only be used to find whole words. What if we want to search on suffixes?
–build*, mini*
Solution: use suffix tries, where each possible suffix of the text is stored in the trie
Example (figure): the suffix trie of "minimize"; Find: "imi"
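
Reusing trie_insert from above, a suffix trie is simply a trie containing every suffix; any substring query is then a single walk from the root (a sketch, with my own names):

    def build_suffix_trie(s):
        root = {}
        for i in range(len(s)):             # insert every suffix s[i:]
            trie_insert(root, s[i:])
        return root

    def has_substring(root, query):
        node = root
        for ch in query:                    # every substring is a prefix of some suffix
            if ch not in node:
                return False
            node = node[ch]
        return True

    # has_substring(build_suffix_trie("minimize"), "imi") -> True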

20 Dynamic Programming
Used primarily for optimization problems
–Not just a good solution, but an optimal one
Brute force algorithms
–Try every possibility
–Guarantee finding the optimal solution
–But are inefficient
DP requires a certain amount of structure, namely:
–Simple subproblems (and a simple break-down into them)
–The global optimum is a composition of subproblem optima
–Subproblem overlap: optimal solutions to unrelated subproblems can contain subproblems in common
–In other words, we can re-use the results of solving each subproblem

21 Longest Common Subsequence
LCS: find the longest string S that is a subsequence of both X and Y, where
–X is of length n
–Y is of length m
Example: what is the LCS of "supergalactic" and "galaxy"?
(The characters do not have to be contiguous)

22 Dynamic Programming Applied to the LCS Problem
Let's compare:
–X = [GTG], with prefixes X[0…i]
–Y = [CGATG], with prefixes Y[0…j]
We write L[i,j] for the length of the longest common subsequence of X[0…i] and Y[0…j]

23 Dynamic Programming for LCS
Note that the LCS length of X and Y (L[i,j]) must equal the LCS length of…
–X[0…i-1] = [GT] (removing the last G)
–Y[0…j-1] = [CGAT] (removing the last G)
…plus 1, since the matching Gs at X_i, Y_j increase the length by one.

24 Dynamic Programming for LCS
If X_i and Y_j had NOT matched, L[i,j] would have to equal the larger of L[i-1,j] and L[i,j-1]. If this is true for L[i,j], it must be true for all L. Together:
–L[i,j] = L[i-1,j-1] + 1 if X_i = Y_j
–L[i,j] = max(L[i-1,j], L[i,j-1]) otherwise
We know that L[-1,-1] = 0 (since both strings are empty)
Finally, we know that L[i,j] cannot be larger than min(i,j)+1 (the LCS is no longer than the shorter of the two prefixes)

25 Dynamic Programming for LCS
Filling in the table for X = [GTG], Y = [CGATG]:
–L[0,0] = 0 (X_0, Y_0 don't match… max of L[-1,0] and L[0,-1])
–L[0,1] = 1 (X_0, Y_1 do match… L[-1,0] + 1)
For each position, take the max of L[i-1,j] and L[i,j-1]; add 1 when a new match is found.
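
A sketch of the resulting table-filling algorithm; for convenience it uses a 0-based table with an extra empty-string row and column in place of the slide's L[-1, ·] boundary:

    def lcs_length(x, y):
        """Length of the longest common subsequence of x and y."""
        n, m = len(x), len(y)
        L = [[0] * (m + 1) for _ in range(n + 1)]   # L[i][j]: LCS of x[:i] and y[:j]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if x[i - 1] == y[j - 1]:
                    L[i][j] = L[i - 1][j - 1] + 1   # a match extends the LCS by one
                else:
                    L[i][j] = max(L[i - 1][j], L[i][j - 1])
        return L[n][m]

    # lcs_length("GTG", "CGATG") -> 3; lcs_length("supergalactic", "galaxy") -> 4 ("gala")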

26 Dynamic Programming Running Time/Space:
–Strings of length m and n
–O(mn)
–Brute force algorithm: 2^m subsequences of x to check against n elements of y: O(n · 2^m)

27 Dynamic Programming vs. Greedy Algorithms
Sometimes they are the same. Sometimes not.
What makes an algorithm greedy?
–A globally optimal solution can be obtained by making locally optimal choices
Dynamic Programming
–Solves subproblems whose results can be re-used
–Trickier to think of
–More work to program

28 (From edu/~bodik/cs536.html) Greedy vs. Dynamic Programming: the famous knapsack problem
–A thief breaks into a museum. Fabulous paintings, sculptures, and jewels are everywhere. The thief has a good eye for the value of these objects, and knows that each will fetch hundreds or thousands of dollars on the clandestine art collector's market. But the thief has only brought a single knapsack to the scene of the robbery, and can take away only what he can carry. What items should the thief take to maximize the haul?

29 (From edu/~bodik/cs536.html) The Knapsack Problem
More formally, the 0-1 knapsack problem:
–The thief must choose among n items, where the i-th item is worth v_i dollars and weighs w_i pounds
–Carrying at most W pounds, he wants to maximize value
–Note: assume v_i, w_i, and W are all integers
–"0-1" because each item must be taken or left in its entirety
A variation, the fractional knapsack problem:
–The thief can take fractions of items
–Think of items in the 0-1 problem as gold ingots, in the fractional problem as buckets of gold dust

30 (From edu/~bodik/cs536.html) The Knapsack Problem: Optimal Substructure
Both variations exhibit optimal substructure
To show this for the 0-1 problem, consider the most valuable load weighing at most W pounds
–If we remove item j from the load, what do we know about the remaining load?
–The remainder must be the most valuable load weighing at most W - w_j that the thief could take from the museum, excluding item j

31 (From edu/~bodik/cs536.html) Solving The Knapsack Problem
The optimal solution to the fractional knapsack problem can be found with a greedy algorithm
The optimal solution to the 0-1 problem cannot be found with the same greedy strategy
–Greedy strategy: take items in order of dollars/pound
–Example: 3 items weighing 10, 20, and 30 pounds; the knapsack can hold 50 pounds
–Exercise: suppose item 2 is worth $100. Assign values to the other items so that the greedy strategy will fail. (A sketch of both approaches follows below.)
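
A sketch of the greedy strategy for the fractional variant, with items as (value, weight) pairs (my own representation):

    def fractional_knapsack(items, capacity):
        """Greedy by dollars/pound; optimal for the fractional problem only."""
        total = 0.0
        for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
            take = min(weight, capacity)    # take as much of the densest item as fits
            total += value * take / weight
            capacity -= take
            if capacity == 0:
                break
        return total

One classic answer to the exercise: make items 1 and 3 worth $60 and $120. Greedy-by-density on the 0-1 problem then takes items 1 and 2 for $160, while taking items 2 and 3 yields $220; the fractional greedy above still returns the optimum ($240, topping up with 2/3 of item 3).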

32 (From edu/~bodik/cs536.html) The Knapsack Problem: Greedy vs. Dynamic
The fractional problem can be solved greedily
The 0-1 problem cannot be solved with a greedy approach
–It can, however, be solved with dynamic programming
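
A minimal DP sketch for the 0-1 problem, relying on the integer weights noted on slide 29 (my own code):

    def knapsack_01(items, capacity):
        """Max value choosing each (value, weight) item at most once; O(n * W) time."""
        best = [0] * (capacity + 1)         # best[w] = max value using weight at most w
        for value, weight in items:
            for w in range(capacity, weight - 1, -1):   # descend so each item is used once
                best[w] = max(best[w], best[w - weight] + value)
        return best[capacity]

    # knapsack_01([(60, 10), (100, 20), (120, 30)], 50) -> 220 (items 2 and 3)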