1 HEINZ NIXDORF INSTITUTE University of Paderborn Algorithms and Complexity Christian Schindelhauer Search Algorithms Winter Semester 2004/2005 25 Oct.

HEINZ NIXDORF INSTITUTE, University of Paderborn, Algorithms and Complexity
Search Algorithms, Winter Semester 2004/2005, 25 Oct 2004, 3rd Lecture
Christian Schindelhauer

Chapter I: Searching Text (18 Oct 2004)

Searching Text (Overview)

• The task of string matching
  – Easy as pie
• The naive algorithm
  – How would you do it?
• The Rabin-Karp algorithm
  – Ingenious use of primes and number theory
• The Knuth-Morris-Pratt algorithm
  – Let a (finite) automaton do the job
  – This is optimal
• The Boyer-Moore algorithm
  – Bad letters allow us to jump through the text
  – This is even better than optimal (in practice)
• Literature
  – Cormen, Leiserson, Rivest, “Introduction to Algorithms”, chapter 36, string matching, The MIT Press, 1989

The Naive Algorithm

Naive-String-Matcher(T, P)
    n ← length(T)
    m ← length(P)
    for s ← 0 to n − m do
        if P[1..m] = T[s+1..s+m] then
            return “Pattern occurs with shift s”
        fi
    od

Fact:
• The naive string matcher has worst-case running time O((n − m + 1) m)
• For n = 2m this is O(n^2)
• The naive string matcher is not optimal, since string matching can be done in time O(m + n)
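The naive matcher can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name is made up here, shifts are 0-based, and unlike the pseudocode it collects every shift instead of returning at the first one.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s (0-based) at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):
        # Compare the whole pattern with the window of length m starting at s.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_string_matcher("amnmaaanptaiiptpii", "ptai"))
```

Each window comparison costs up to m character comparisons, which is where the O((n − m + 1) m) bound comes from.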

The Rabin-Karp Algorithm

• Idea: Compute
  – a checksum for the pattern P and
  – a checksum for each substring of T of length m

[Figure: the checksum of the pattern is compared with the checksum of every length-m window of the text; equal checksums mark either a valid hit or a spurious hit.]
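A minimal sketch of the checksum idea, assuming a polynomial rolling hash modulo a small prime. The parameters `base` and `prime` and the function name are choices made for this illustration; the slides do not fix them.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, prime: int = 101):
    """Return (valid shifts, number of spurious hits)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    high = pow(base, m - 1, prime)  # weight of the leading character of a window
    p_hash = t_hash = 0
    for i in range(m):  # checksums of the pattern and of the first window
        p_hash = (p_hash * base + ord(pattern[i])) % prime
        t_hash = (t_hash * base + ord(text[i])) % prime
    hits, spurious = [], 0
    for s in range(n - m + 1):
        if p_hash == t_hash:
            if text[s:s + m] == pattern:  # verify, since checksums can collide
                hits.append(s)
            else:
                spurious += 1
        if s < n - m:  # roll the window one character to the right
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % prime
    return hits, spurious


hits, spurious = rabin_karp("amnmaaanptaiiptpii", "ptai")
print(hits, spurious)
```

A smaller `prime` makes spurious hits more likely but never changes the list of valid hits, because every checksum match is verified character by character.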

Finite-Automaton-Matcher

• The example automaton accepts at the end of each occurrence of the pattern abba
• For every pattern of length m there exists an automaton with m + 1 states that solves the pattern matching problem with the following algorithm:

Finite-Automaton-Matcher(T, δ, P)
    n ← length(T)
    m ← length(P)
    q ← 0
    for i ← 1 to n do
        q ← δ(q, T[i])
        if q = m then
            s ← i − m
            return “Pattern occurs with shift” s
        fi
    od

The Finite-Automaton-Matcher

• Q is a finite set of states
• q_0 ∈ Q is the start state
• a subset of Q is the set of accepting states
• Σ: input alphabet
• δ: Q × Σ → Q: transition function

[Figure: state diagram and transition table (state by input letter a, b) of the automaton for the pattern abba, run on the input aabbababab.]
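The automaton with m + 1 states can be sketched as follows. The construction of δ used here is the straightforward (slow) one taken directly from the definition "longest prefix of P that is a suffix of what has been read"; the slides do not show how δ is built, so `build_transition` is an assumption of this example. Shifts are 0-based and all occurrences are reported.

```python
def build_transition(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][a] = length of the longest prefix of pattern that is a suffix of P_q + a."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):  # states 0..m
        row = {}
        for a in alphabet:
            k = min(m, q + 1)
            # Shrink k until P_k is a suffix of P_q followed by the letter a.
            while k > 0 and not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def finite_automaton_matcher(text: str, pattern: str, alphabet: str) -> list[int]:
    delta = build_transition(pattern, alphabet)
    m, q, shifts = len(pattern), 0, []
    for i, letter in enumerate(text, start=1):
        q = delta[q][letter]
        if q == m:  # accepting state: an occurrence ends at position i
            shifts.append(i - m)
    return shifts


print(finite_automaton_matcher("aabbababab", "abba", "ab"))
```

Once δ is available, the matcher itself reads each text letter exactly once. Letters outside `alphabet` raise a `KeyError` in this sketch.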

Knuth-Morris-Pratt Pattern Matching

KMP-Matcher(T, P)
    n ← length(T)
    m ← length(P)
    π ← Compute-Prefix-Function(P)
    q ← 0
    for i ← 1 to n do
        while q > 0 and P[q+1] ≠ T[i] do
            q ← π[q]
        od
        if P[q+1] = T[i] then
            q ← q + 1
        fi
        if q = m then
            print “Pattern occurs with shift” i − m
            q ← π[q]
        fi
    od

[Figure: step-by-step run of the matcher on a sample text for the pattern mmaa.]
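A Python transcription of KMP-Matcher, as a sketch. The pseudocode is 1-indexed, so the list `pi` keeps an unused entry at index 0 and `P[q+1]` becomes `p[q]`; the function names are chosen for this example.

```python
def compute_prefix_function(p: str) -> list[int]:
    """pi[q] for q = 1..m; index 0 is unused padding."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(t: str, p: str) -> list[int]:
    """Return all 0-based shifts at which p occurs in t."""
    n, m = len(t), len(p)
    if m == 0:
        return []
    pi = compute_prefix_function(p)
    q, shifts = 0, []
    for i in range(1, n + 1):
        while q > 0 and p[q] != t[i - 1]:
            q = pi[q]  # fall back to the next shorter matching prefix
        if p[q] == t[i - 1]:
            q += 1
        if q == m:
            shifts.append(i - m)
            q = pi[q]  # keep going to find overlapping occurrences
    return shifts


print(kmp_matcher("ammaammaa", "mmaa"))
```

The text index `i` never moves backwards, which is the reason the search phase runs in O(n) after the O(m) preprocessing.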

Boyer-Moore: The ideas!

[Figure: the search pattern is slid along a sample text; the annotations below follow the run from left to right.]

• Start comparing at the end
• What’s this? There is no “a” in the search pattern. We can shift m+1 letters
• An “a” again...
• First wrong letter! Do a large shift!
• Bingo! Do another large shift!
• That’s it! 10 letters compared and ready!

Boyer-Moore-Matcher(T, P, Σ)
    n ← length(T)
    m ← length(P)
    λ ← Compute-Last-Occurrence-Function(P, m, Σ)
    γ ← Compute-Good-Suffix-Function(P, m)
    s ← 0
    while s ≤ n − m do
        j ← m
        while j > 0 and P[j] = T[s+j] do
            j ← j − 1
        od
        if j = 0 then
            print “Pattern occurs with shift” s
            s ← s + γ[0]
        else
            s ← s + max(γ[j], j − λ[T[s+j]])
        fi
    od

• We start comparing at the right end (j ← m)
• Bad character shift: j − λ[T[s+j]]
• Valid shifts
• Success! Now do a valid shift (s ← s + γ[0])
• Shift as far as possible, as indicated by the bad character heuristic or the good suffix heuristic

Boyer-Moore: Last-occurrence

[Figure: the pattern “piti” is aligned against a sample text at several shifts; the mismatches occur at j = 4 and j = 2.]

• What’s this? There is no “a” in the search pattern. We can shift by j − λ[a] = 4 − 0 = 4 letters
• “t” occurs in “piti” at the 3rd position: shift by j − λ[t] = 4 − 3 = 1 step
• “p” occurs in “piti” at the first position: shift by j − λ[p] = 4 − 1 = 3 letters
• There is no “a” in the search pattern. We can shift by at least j − λ[a] = 2 − 0 = 2 letters

Compute-Last-Occurrence-Function(P, m, Σ)
    for each character a ∈ Σ do
        λ[a] ← 0
    od
    for j ← 1 to m do
        λ[P[j]] ← j
    od
    return λ

Running time: O(|Σ| + m)

[Figure: table of λ for the letters a, i, p, t of the example pattern.]
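The last-occurrence table is small enough to compute directly. The sketch below uses the pattern “piti” quoted on the previous slide and a four-letter alphabet chosen for the example; the function name is illustrative.

```python
def compute_last_occurrence(pattern: str, alphabet: str) -> dict[str, int]:
    """lam[a] = 1-based position of the last occurrence of a in pattern, 0 if absent."""
    lam = {a: 0 for a in alphabet}
    for j, letter in enumerate(pattern, start=1):
        lam[letter] = j  # later positions overwrite earlier ones
    return lam


lam = compute_last_occurrence("piti", "aipt")
print(lam)
# Bad character shifts for a mismatch at j = 4, as on the slide: j - lam[letter]
print({letter: 4 - lam[letter] for letter in "atp"})
```

With j = 4 this reproduces the shifts 4, 1 and 3 for the letters a, t and p.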

The Prefix Function

π[q] := max {k : k < q and P_k is a suffix of P_q}

[Figure: text and pattern aligned after a partial match; the prefixes P_8, P_7, P_6, P_5, ... are tried in turn, illustrating π[7] = 4.]

π[q] := max {k : k < q and P_k is a suffix of P_q}

Pattern: abaabaaaa

q      1  2  3  4  5  6  7  8  9
π[q]   0  0  1  1  2  3  4  1  1
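The values of π can be checked directly against the definition. The brute-force sketch below is deliberately naive (cubic time) and only serves to verify a table; the pattern abaabaaaa is the one whose prefix function equals the values π[1..9] = 0, 0, 1, 1, 2, 3, 4, 1, 1 listed on this slide.

```python
def prefix_function_by_definition(p: str) -> list[int]:
    """Return [pi[1], ..., pi[m]] with pi[q] = max{k : k < q and P_k is a suffix of P_q}."""
    return [
        max(k for k in range(q) if p[:q].endswith(p[:k]))  # k = 0 always qualifies
        for q in range(1, len(p) + 1)
    ]


print(prefix_function_by_definition("abaabaaaa"))
```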

Computing π

Compute-Prefix-Function(P)
    m ← length(P)
    π[1] ← 0
    k ← 0
    for q ← 2 to m do
        while k > 0 and P[k+1] ≠ P[q] do
            k ← π[k]
        od
        if P[k+1] = P[q] then
            k ← k + 1
        fi
        π[q] ← k
    od

• If P_(k+1) is not a suffix of P_q, shift the pattern to the next reasonable position (given by smaller values of π)
• If the letter fits, then increment the position (otherwise k = 0)
• We have found the position such that π[q] = max {k : k < q and P_k is a suffix of P_q}

Boyer-Moore: Good Suffix - the far jump

[Figure: a pattern of length m = 8 with its first mismatch at j = 6; the pattern is shifted step by step below the text.]

Questions asked while shifting:
• Is Rev(P)_5 a suffix of Rev(P)_6?
• Is Rev(P)_5 a suffix of Rev(P)_7?
• Is Rev(P)_5 a suffix of Rev(P)_8 (or P_5 a suffix of P_8)?
• Is P_4 a suffix of P_8?
• Is P_3 a suffix of P_8?
• Is P_2 a suffix of P_8?
• Is P_1 a suffix of P_8?
• Is P_0 a suffix of P_8?

π[q] := max {k : k < q and P_k is a suffix of P_q}
π[8] = 4
Shift = m − π[m] = 8 − 4 = 4

Boyer-Moore: Good Suffix - the small jump

[Figure: a pattern of length m = 8 with its first mismatch at j = 6; the pattern is shifted step by step below the text.]

Questions asked while shifting:
• Is Rev(P)_5 a suffix of Rev(P)_6?
• Is Rev(P)_5 a suffix of Rev(P)_7?
• Is Rev(P)_5 a suffix of Rev(P)_8 (or P_5 a suffix of P_8)?
• Is P_4 a suffix of P_8?
• Is P_3 a suffix of P_8?
• Is P_2 a suffix of P_8?
• Is P_1 a suffix of P_8?
• Is P_0 a suffix of P_8?

f[j] := min {k : k > j and Rev(P)_j is a suffix of Rev(P)_k}
π’[q] := max {k : k < q and Rev(P)_k is a suffix of Rev(P)_q}
f[6] = 8
Shift = f[j] − j = 8 − 6 = 2

Why is it the same?

π’[k] := max {j : j < k and Rev(P)_j is a suffix of Rev(P)_k}
f[j] := min {k : k > j and Rev(P)_j is a suffix of Rev(P)_k}

[Figure: matrix indexed by j and k marking the pairs for which Rev(P)_j is a suffix of Rev(P)_k.]

Compute-Good-Suffix-Function(P, m)
    π ← Compute-Prefix-Function(P)
    P’ ← reverse(P)
    π’ ← Compute-Prefix-Function(P’)
    for j ← 0 to m do
        γ[j] ← m − π[m]
    od
    for l ← 1 to m do
        j ← m − π’[l]
        if γ[j] > l − π’[l] then
            γ[j] ← l − π’[l]
        fi
    od
    return γ

• The far jump: γ[j] ← m − π[m]
• Or is it a small jump? γ[j] ← l − π’[l]

Running time: O(m)

Boyer-Moore-Matcher(T, P, Σ): running time

• Running time: O((n − m + 1) m) in the worst case
• In practice: O(n/m + v m + m + |Σ|) for v hits in the text
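Putting the pieces together, the matcher with both heuristics can be sketched in Python. This follows the pseudocode of the slides with 1-indexed tables (index 0 of `pi` is padding) and 0-based result shifts. Two liberties are taken for the example: the last-occurrence table is a dictionary in which missing letters count as 0, so no alphabet needs to be passed, and the names λ and γ are spelled `lam` and `gamma`.

```python
def compute_prefix_function(p: str) -> list[int]:
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def compute_good_suffix(p: str) -> list[int]:
    m = len(p)
    pi = compute_prefix_function(p)
    pi_rev = compute_prefix_function(p[::-1])
    gamma = [m - pi[m]] * (m + 1)  # the far jump
    for l in range(1, m + 1):
        j = m - pi_rev[l]
        gamma[j] = min(gamma[j], l - pi_rev[l])  # a smaller jump where one is needed
    return gamma


def boyer_moore_matcher(t: str, p: str) -> list[int]:
    n, m = len(t), len(p)
    if m == 0:
        return []
    lam = {letter: j for j, letter in enumerate(p, start=1)}  # last occurrence
    gamma = compute_good_suffix(p)
    shifts, s = [], 0
    while s <= n - m:
        j = m
        while j > 0 and p[j - 1] == t[s + j - 1]:  # compare from the right end
            j -= 1
        if j == 0:
            shifts.append(s)
            s += gamma[0]
        else:
            s += max(gamma[j], j - lam.get(t[s + j - 1], 0))
    return shifts


print(boyer_moore_matcher("amnmaaanptaiiptpii", "ptai"))
```

Every entry of `gamma` is at least 1, so the loop always advances even when the bad character term j − λ[...] is zero or negative.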

Chapter II: Searching in Compressed Text (25 Oct 2004)

Searching in Compressed Text (Overview)

• What is Text Compression
  – Definition
  – The Shannon Bound
  – Huffman Codes
  – The Kolmogorov Measure
• Searching in Non-adaptive Codes
  – KMP in Huffman Codes
• Searching in Adaptive Codes
  – The Lempel-Ziv Codes
  – Pattern Matching in Z-Compressed Files
  – Adapting Compression for Searching

What is Text Compression?

• First approach:
  – Given a text s ∈ Σ^n
  – Find a compressed version c ∈ Σ^m such that m < n
  – Such that s can be derived from c
• Formal:
  – A compression function f : Σ* → Σ* is one-to-one (injective) and efficiently invertible
• Fact:
  – Most texts are incompressible
• Proof:
  – There are (|Σ|^(m+1) − 1)/(|Σ| − 1) strings of length at most m
  – There are |Σ|^n strings of length n
  – Of these strings, at most (|Σ|^(m+1) − 1)/(|Σ| − 1) can be compressed to length at most m
  – This is a fraction of at most |Σ|^(m−n+1)/(|Σ| − 1)
  – E.g. for |Σ| = 256 and m = n − 10 this is 8.3 × 10^(−25), which implies that at most a fraction of 8.3 × 10^(−25) of all files of n bytes can be compressed to a string of length at most n − 10
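The counting bound can be evaluated exactly with rational arithmetic. The sketch below uses the concrete lengths n = 20 and m = 10, which are arbitrary values chosen for the example; only the difference n − m = 10 matters for the order of magnitude.

```python
from fractions import Fraction


def compressible_fraction_bound(alphabet_size: int, n: int, m: int) -> Fraction:
    """Upper bound on the share of length-n strings that can be encoded with length at most m."""
    short_strings = Fraction(alphabet_size ** (m + 1) - 1, alphabet_size - 1)
    return short_strings / alphabet_size ** n


bound = compressible_fraction_bound(256, n=20, m=10)
print(float(bound))  # about 8.3e-25
```

The bound follows from injectivity alone: two different texts cannot share the same compressed string, so there are never more compressible texts than short strings.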

Why does Text Compression work?

• Texts usually use letters with different frequencies
  – Relative frequencies of letters in general English plain text, from Cryptographical Mathematics by Robert Edward Lewand: e: 12%, t: 10%, a: 8%, i: 7%, n: 7%, o: 7%, ..., k: 0.4%, x: 0.2%, j: 0.2%, q: 0.09%, z: 0.06%
  – Special characters like $, %, # occur even less frequently
  – Some character encodings are (nearly) unused, e.g. byte code 0 in ASCII
• Text obeys many rules
  – Words are (usually) the same (collected in dictionaries)
  – Not all words can be used in combination
  – Sentences are structured (grammar)
  – Program code uses code words
  – Digitally encoded pictures have smooth areas, where colors change gradually
  – Patterns repeat

Information Theory: The Shannon bound

• C. E. Shannon, in his 1948 paper “A Mathematical Theory of Communication”, derives his definition of entropy
• The entropy rate of a data source is the average number of bits per symbol needed to encode it
• Example text: ababababab
  – Entropy: 1
  – Encoding: use 0 for a, use 1 for b
  – Code: 0101010101
• Huffman codes are a way to come close to the Shannon bound (for sufficiently large texts)
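The entropy of the example can be computed from the letter frequencies. The sketch below uses the empirical (zeroth-order) entropy, which treats the letters as independent draws; that modelling choice and the function name are assumptions of this example.

```python
from collections import Counter
from math import log2


def entropy_bits_per_symbol(text: str) -> float:
    """Empirical entropy: minus the sum of p * log2(p) over the relative letter frequencies p."""
    n = len(text)
    return -sum(c / n * log2(c / n) for c in Counter(text).values())


print(entropy_bits_per_symbol("ababababab"))
print(entropy_bits_per_symbol("amnmaaanptaiiptpii"))
```

For ababababab both letters have frequency 1/2, which gives exactly one bit per symbol and matches the one-bit code above.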

Huffman Code

• Huffman code
  – is adapted for each text (but not within the text)
  – consists of a dictionary, which maps each letter of the text to a binary string, and the code, given as a prefix-free binary encoding
• Prefix-free code
  – uses strings s_1, s_2, ..., s_m of variable length such that no string s_i is a prefix of another string s_j

Letter  Frequency  Code
a       5          10
i       4          01
p       3          111
m       2          000
t       2          001
n       2          110

• Example of Huffman encoding:
  – Text: amnmaaanptaiiptpii
  – Encoding: 10 000 110 000 10 10 10 110 111 001 10 01 01 111 001 111 01 01

Computing Huffman Codes

• Compute the letter frequencies
• Build root nodes labeled with the frequencies
• repeat
  – Build a node connecting the two least frequent unlinked nodes
  – Mark the sons with 0 and 1
  – The father node carries the sum of the frequencies
• until one tree is left
• The path to each letter carries the code

Letter  Frequency  Code
a       5          10
i       4          01
p       3          111
m       2          000
t       2          001
n       2          110

[Figure: Huffman tree over the leaves a, i, n, p, m, t.]
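The repeat-until loop can be sketched with a priority queue. Instead of explicit tree nodes, each heap entry carries the partial codes of the letters below it, which is a simplification made for this example. Ties between equal frequencies may be broken differently than on the slide, so individual codewords can differ while the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Return a prefix-free code {letter: bit string} built from the letter frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct letter still needs one bit
        return {letter: "0" for letter in freq}
    tie = count()  # tie-breaker so the heap never has to compare dictionaries
    heap = [(f, next(tie), {letter: ""}) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)   # the two least frequent unlinked nodes
        f1, _, right = heapq.heappop(heap)
        merged = {letter: "0" + code for letter, code in left.items()}
        merged.update({letter: "1" + code for letter, code in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))  # father carries the sum
    return heap[0][2]


text = "amnmaaanptaiiptpii"
code = huffman_code(text)
print(code)
print(sum(len(code[letter]) for letter in text), "bits")
```

For the example text the encoding takes 45 bits in total, the same as with the code table of the slide.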

Searching in Huffman Codes

• Let u be the size of the compressed text
• Let v be the size of the pattern, Huffman-encoded according to the text dictionary
• KMP can search in Huffman codes in time O(u + v + m), where m is the length of the uncompressed pattern
  – Encoding the pattern takes O(v + m) steps
  – Building the prefix function takes time O(v)
  – Searching the text on the bit level takes time O(u + v)
• Problems:
  – This algorithm works bit-wise, not byte-wise
  – Exercise: Develop a byte-wise strategy
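A bit-level search can be sketched as follows, using the code table from the Huffman Code slide and Python strings of '0' and '1' in place of packed bits. Two points are assumptions of this example and not taken from the slides: `str.find` stands in for the KMP search, and matches are only reported when they start at a codeword boundary, since a bit pattern can also occur across two codewords.

```python
CODE = {"a": "10", "i": "01", "p": "111", "m": "000", "t": "001", "n": "110"}


def encode(text: str, code: dict[str, str] = CODE) -> str:
    return "".join(code[letter] for letter in text)


def codeword_starts(bits: str, code: dict[str, str] = CODE) -> dict[int, int]:
    """Map the bit offset of every codeword to the index of the letter it encodes."""
    words = set(code.values())
    starts, begin, current = {}, 0, ""
    for i, bit in enumerate(bits):
        current += bit
        if current in words:  # prefix-free: the first complete codeword is the right one
            starts[begin] = len(starts)
            begin, current = i + 1, ""
    return starts


def search_huffman_bits(bits: str, pattern: str, code: dict[str, str] = CODE) -> list[int]:
    """Return the letter positions at which pattern occurs in the encoded text."""
    encoded_pattern = encode(pattern, code)
    starts = codeword_starts(bits, code)
    hits = []
    pos = bits.find(encoded_pattern)
    while pos != -1:
        if pos in starts:  # drop matches that straddle codeword boundaries
            hits.append(starts[pos])
        pos = bits.find(encoded_pattern, pos + 1)
    return hits


compressed = encode("amnmaaanptaiiptpii")
print(search_huffman_bits(compressed, "ptai"))
```

As an example of a straddling match, the bits of “ii” are 0101 and contain the codeword 10 of “a”, although the text contains no “a”.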

The Downside of Huffman Codes

• Example: Consider this 128-byte text:
  – abbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabba
    abbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabba
  – It will be encoded using 16 bytes (and an extra byte for the dictionary), i.e. one bit per letter
  – This does not use the full compression possibilities for this text
  – E.g. using (abba)^32 would need only 9 bytes
• The perfect code:
  – A self-extracting program for a string x is a program that, started without input, produces the output x and then halts
  – So the smallest self-extracting program is the ultimate encoding
• The Kolmogorov complexity K(x) of a string x denotes the length of the smallest such self-extracting program for x
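The Kolmogorov complexity itself cannot be computed, but any concrete compressor gives an upper bound on a description length. As a rough illustration only, the sketch below feeds the 128-byte example to Python's `zlib` module (a Lempel-Ziv based compressor, used here as a stand-in and not as the method of the lecture) and prints the sizes.

```python
import zlib

text = b"abba" * 32                  # the 128-byte example text
packed = zlib.compress(text, 9)      # 9 = highest compression level

print(len(text), "bytes before,", len(packed), "bytes after")
assert zlib.decompress(packed) == text   # the encoding is invertible
```

The compressed size includes the container overhead of the zlib format, so it should be read as an upper bound for this particular encoding and not as a statement about K(x).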

Kolmogorov Complexity

• Does the Kolmogorov complexity depend on the programming language?
  – No, as long as the programming language is universal, i.e. can simulate any Turing machine
  – Lemma: Let K_1(x) and K_2(x) denote the Kolmogorov complexity with respect to two arbitrary universal programming languages. Then there is a constant c such that for all strings x: K_1(x) ≤ K_2(x) + c
• Is the Kolmogorov complexity useful?
  – No: Theorem: K(x) is not recursive.

Proof of Lemma

Lemma: Let K_1(x) and K_2(x) denote the Kolmogorov complexity with respect to two arbitrary universal programming languages. Then there is a constant c such that for all strings x: K_1(x) ≤ K_2(x) + c

Proof
• Let M_1 be a smallest self-extracting program for x with respect to the first language
• Let U be a universal program in the second language that simulates a given machine M_1 of the first language
• The output of U(M_1, ε) is x, where ε denotes the empty input
• Then one can find a machine M_2 of length |U| + |M_1| + O(1) that has the same functionality as U(M_1, ε)
  – by using the S-m-n theorem
• Since U is a fixed (constant-sized) machine, this gives K_2(x) ≤ K_1(x) + c. Exchanging the roles of the two languages proves the statement.

Proof of the Theorem

Theorem: K(x) is not recursive.

Proof
• Assume K(x) is recursive.
• For string length n let x_n denote the smallest string of length n such that K(x_n) ≥ |x_n| = n
• We can enumerate x_n
  – Compute for all strings x of size n the Kolmogorov complexity K(x) and output the first string x with K(x) ≥ n
• Let M be the program computing x_n on input n
• We can efficiently encode x_n:
  – Combine M with the binary encoded n: K(x_n) ≤ log n + |M| = log n + O(1)
• For large enough n this is a contradiction to K(x_n) ≥ n

Thanks for your attention
End of 3rd lecture

Next lecture: Mo 8 Nov 2004, am, FU 116
Next exercise class: Mo 25 Oct 2004, 1.15 pm, F0.530, or We 27 Oct 2004, 1.00 pm, E2.316