CSCE 3110 Data Structures & Algorithm Analysis Rada Mihalcea Trees Applications.


1 CSCE 3110 Data Structures & Algorithm Analysis Rada Mihalcea http://www.cs.unt.edu/~rada/CSCE3110 Trees Applications

2 Trees: A Review (again?)
General tree
  one parent, N children
Binary tree ISA General tree
  + max 2 children
Binary search tree ISA Binary tree
  + left subtree < parent < right subtree
AVL tree ISA Binary search tree
  + | height left subtree – height right subtree | ≤ 1
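The ISA chain above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Node` class and the helper names `is_bst`, `height`, `is_balanced`, and `is_avl` are all hypothetical, and keys are assumed to be distinct integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical binary tree node: at most 2 children.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, lo=None, hi=None):
    """Binary search tree: left subtree < parent < right subtree."""
    if node is None:
        return True
    if lo is not None and node.key <= lo:
        return False
    if hi is not None and node.key >= hi:
        return False
    return is_bst(node.left, lo, node.key) and is_bst(node.right, node.key, hi)


def height(node):
    """Height of a subtree; the empty tree has height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """AVL balance: |height(left) - height(right)| <= 1 at every node."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)


def is_avl(node):
    """AVL tree ISA binary search tree + the balance condition."""
    return is_bst(node) and is_balanced(node)


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))  # a BST, but not balanced
print(is_avl(balanced), is_bst(chain), is_avl(chain))
```

Recomputing `height` at every node keeps the sketch short at the cost of extra work; a real AVL implementation would store the height in each node.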

3 Trees: A Review (cont’d)
Multi-way search tree ISA General tree
  + Each node has K keys and K+1 children
  + All keys in child K < key K < all keys in child K+1
2-4 Tree ISA Multi-way search tree
  + All nodes have at most 3 keys / 4 children
  + All leaves are at the same level
B-Tree ISA Multi-way search tree
  + All nodes have at least T keys, at most 2T(+1) keys
  + All leaves are at the same level

4 Tree Applications
Data Compression
  Huffman tree
Automatic Learning
  Decision trees

5 Huffman Code
Very often used for text compression
Do you know how gzip or winzip works? → Compression methods
ASCII code uses codes of equal length for all letters → how many codes?
Today’s alternative to ASCII?
Idea behind Huffman code: use shorter codes for letters that are more frequent

6 Huffman Code
Build a list of letters and frequencies
  “have a great day today”
Build a Huffman tree bottom up, by grouping letters with smaller occurrence frequencies
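The two steps on this slide can be sketched as follows. This is an illustrative sketch rather than the course's own code: the function names `build_huffman_codes`, `encode`, and `decode` are hypothetical, a binary heap (`heapq`) is assumed as the priority queue, and ties between equal frequencies are broken by insertion order, so the exact bit patterns may differ from a tree drawn by hand.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(text):
    """Return a dict mapping each character of text to its Huffman code."""
    freq = Counter(text)                      # step 1: letters and frequencies
    tiebreak = count()                        # keeps heap entries comparable
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:                        # one distinct letter: give it 1 bit
        return {heap[0][2]: "0"}
    # Step 2: build the tree bottom up by repeatedly grouping the two
    # least frequent subtrees. Internal nodes are (left, right) tuples.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                 # leaf: a single character
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Decode by matching prefixes; works because Huffman codes are prefix-free."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


text = "have a great day today"
codes = build_huffman_codes(text)
for sym, code in sorted(codes.items(), key=lambda item: len(item[1])):
    print(repr(sym), code)
print(len(encode(text, codes)), "bits, versus", 8 * len(text), "bits at 8 bits per letter")
```

The same function can be applied to the exercise strings on the next slide; the letter “a”, the most frequent symbol here, ends up with one of the shortest codes.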

7 Huffman Codes
Write the Huffman codes for the strings
  “abracadabra”
  “Veni Vidi Vici”

8 Huffman Code
Running time?
  Suppose N letters in input string, with L unique letters
What is the most important factor for obtaining highest compression?
Compare: [assume a text with a total of 1000 characters]
  I. Three different characters, each occurring the same number of times
  II. 20 different characters, 19 of them occurring only once, and the 20th occurring the rest of the time
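One way to explore the comparison is to compute the encoded size directly from the frequency lists. The sketch below makes several assumptions that are not on the slide: the helper names are hypothetical, case I is approximated as 334 + 333 + 333 because 1000 is not divisible by 3, and the fixed-length baseline uses the smallest whole number of bits that can distinguish the symbols. With a binary heap as used here, counting letters takes O(N) and building the tree takes O(L log L).

```python
import heapq
from math import ceil, log2


def huffman_total_bits(frequencies):
    """Total encoded length in bits for the given symbol frequencies.

    Each merge adds one bit to every letter below the merged node, so the
    sum of the merged weights equals the total encoded length.
    """
    heap = list(frequencies)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def fixed_length_bits(frequencies):
    """Baseline: every symbol gets a code of the same length."""
    bits_per_symbol = max(1, ceil(log2(len(frequencies))))
    return bits_per_symbol * sum(frequencies)


case_1 = [334, 333, 333]      # assumption: "same number of times", rounded
case_2 = [1] * 19 + [981]     # 19 characters once, the 20th the rest of the time

for name, freqs in (("I", case_1), ("II", case_2)):
    print(name, huffman_total_bits(freqs), "bits with Huffman,",
          fixed_length_bits(freqs), "bits with fixed-length codes")
```

The skewed distribution in case II compresses far better relative to its fixed-length baseline, which points at the shape of the frequency distribution as the factor to look at.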

9 One More Application
Heuristic Search
Decision Trees
  Given a set of examples, each with an associated decision (e.g. good/bad, +/-, pass/fail, caseI/caseII/caseIII, etc.)
  Attempt to make a decision automatically when a new example is presented
  Predict the behavior in new cases!

10 Data Records
     Name          A B C D E F G   Class
  1. Jeffrey B.    1 0 1 0 1 0 1   -
  2. Paul S.       0 1 1 0 0 0 1   -
  3. Daniel C.     0 0 1 0 0 0 0   -
  4. Gregory P.    1 0 1 0 1 0 0   -
  5. Michael N.    0 0 1 1 0 0 0   -
  6. Corinne N.    1 1 1 0 1 0 1   +
  7. Mariyam M.    0 1 0 1 0 0 1   +
  8. Stephany D.   1 1 1 1 1 1 1   +
  9. Mary D.       1 1 1 1 1 1 1   +
 10. Jamie F.      1 1 1 0 0 1 1   +

11 Fields in the Record
  A: First name ends in a vowel?
  B: Neat handwriting?
  C: Middle name listed?
  D: Senior?
  E: Got extra-extra credit?
  F: Google brings up home page?
  G: Google brings up reference?

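12 Build a Classification Tree
Internal nodes: features
Leaves: classification
[Figure: classification tree with internal nodes F, A, D, A and branches labeled 0 and 1; leaves hold records 8,9 / 2,3,7 / 1,4,5,6 / 10]
Error: 30%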

13 Different Search Problem
Given a set of data records with their classifications, pick a decision tree: search problem!
Challenges:
  Scoring function?
  Large space of trees.
What’s a good tree?
  Low error on given set of records
  Small
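As a small illustration of a scoring function, the sketch below searches the smallest possible trees: a single internal node that tests one feature, with each branch predicting its majority class. This is an assumed example, not the method from the lecture; the names `RECORDS`, `FEATURES`, and `stump_errors` are hypothetical, and the records are copied from the Data Records slide.

```python
# Records from the "Data Records" slide: feature bits A..G, then the class.
FEATURES = "ABCDEFG"
RECORDS = [
    ("1010101", "-"),  # 1. Jeffrey B.
    ("0110001", "-"),  # 2. Paul S.
    ("0010000", "-"),  # 3. Daniel C.
    ("1010100", "-"),  # 4. Gregory P.
    ("0011000", "-"),  # 5. Michael N.
    ("1110101", "+"),  # 6. Corinne N.
    ("0101001", "+"),  # 7. Mariyam M.
    ("1111111", "+"),  # 8. Stephany D.
    ("1111111", "+"),  # 9. Mary D.
    ("1110011", "+"),  # 10. Jamie F.
]


def stump_errors(feature_index):
    """Score a one-feature tree: records misclassified on the given set.

    Each branch (feature value 0 or 1) predicts its majority class, so the
    minority count in that branch is the number of errors it contributes.
    """
    errors = 0
    for value in "01":
        labels = [label for bits, label in RECORDS if bits[feature_index] == value]
        errors += min(labels.count("+"), labels.count("-"))
    return errors


scores = {name: stump_errors(i) for i, name in enumerate(FEATURES)}
best = min(scores, key=scores.get)
for name, errors in scores.items():
    print(f"split on {name}: {errors}/{len(RECORDS)} records misclassified")
print("lowest-error single-feature tree splits on", best)
```

Searching larger trees multiplies the number of candidates quickly, which is why the slide calls the space of trees large; a practical search would also need a penalty for tree size alongside the error count.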

14 “Perfect” Decision Tree
[Figure: decision tree testing C (middle name?), E (EEC?), B (neat?), and F (Google?), with branches labeled 0 and 1]
Training set error: 0% (can always do this?)
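The slide's figure did not survive as text, so its exact shape is unknown. The sketch below shows one tree over the four features named on the slide (C, E, B, F) that reaches 0% error on the records from the Data Records slide; the branch layout and leaf labels are an assumption reconstructed from the data, and the names `RECORDS` and `classify` are hypothetical.

```python
# Records from the "Data Records" slide: feature bits A..G, then the class.
FEATURES = "ABCDEFG"
RECORDS = [
    ("1010101", "-"), ("0110001", "-"), ("0010000", "-"), ("1010100", "-"),
    ("0011000", "-"), ("1110101", "+"), ("0101001", "+"), ("1111111", "+"),
    ("1111111", "+"), ("1110011", "+"),
]


def classify(bits):
    """Assumed tree layout (not confirmed by the slide): C, then E, then B or F."""
    record = {name: int(bit) for name, bit in zip(FEATURES, bits)}
    if record["C"] == 0:          # middle name not listed
        return "+"
    if record["E"] == 1:          # got extra-extra credit: decide on B (neat?)
        return "+" if record["B"] == 1 else "-"
    # no extra-extra credit: decide on F (Google brings up home page?)
    return "+" if record["F"] == 1 else "-"


mistakes = [i + 1 for i, (bits, label) in enumerate(RECORDS) if classify(bits) != label]
print(f"training error: {len(mistakes)}/{len(RECORDS)}", mistakes)
```

On the slide's parenthetical question: records 8 and 9 have identical features, and a perfect tree exists here only because they also share a class. Two identical records with different classes could not both be classified correctly by any tree.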

15 Search For a Classification
Classify new records
       Name        A B C D E F G   Class
 New1. Mike M.     1 0 1 1 0 0 1   ?
 New2. Jerry K.    0 1 0 1 0 0 0   ?

16 The very last tree for this class

