CompSci 100 9.1 Binary Trees l Linked lists: efficient insertion/deletion, inefficient search  ArrayList: search can be efficient, insertion/deletion.

Slides:



Advertisements
Similar presentations
Binary Trees CSC 220. Your Observations (so far data structures) Array –Unordered Add, delete, search –Ordered Linked List –??
Advertisements

1 abstract containers hierarchical (1 to many) graph (many to many) first ith last sequence/linear (1 to 1) set.
Binary Trees, Binary Search Trees CMPS 2133 Spring 2008.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Binary Trees, Binary Search Trees COMP171 Fall 2006.
Unit 11a 1 Unit 11: Data Structures & Complexity H We discuss in this unit Graphs and trees Binary search trees Hashing functions Recursive sorting: quicksort,
1 abstract containers hierarchical (1 to many) graph (many to many) first ith last sequence/linear (1 to 1) set.
Data Structures Using C++ 2E Chapter 11 Binary Trees and B-Trees.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
04 Trees Part I CS 310 photo ©Oregon Scenics used with permissionOregon Scenics All figures labeled with “Figure X.Y” Copyright © 2006 Pearson Addison-Wesley.
C++ Programming: Program Design Including Data Structures, Third Edition Chapter 20: Binary Trees.
1 Joe Meehean.  Important and common problem  Given a collection, determine whether value v is a member  Common variation given a collection of unique.
Version TCSS 342, Winter 2006 Lecture Notes Trees Binary Trees Binary Search Trees.
CS 146: Data Structures and Algorithms June 18 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
Comp 249 Programming Methodology Chapter 15 Linked Data Structure - Part B Dr. Aiman Hanna Department of Computer Science & Software Engineering Concordia.
Advanced Algorithms Analysis and Design Lecture 8 (Continue Lecture 7…..) Elementry Data Structures By Engr Huma Ayub Vine.
1 Hash Tables  a hash table is an array of size Tsize  has index positions 0.. Tsize-1  two types of hash tables  open hash table  array element type.
Lecture Objectives  To learn how to use a tree to represent a hierarchical organization of information  To learn how to use recursion to process trees.
CPS Wordcounting, sets, and the web l How does Google store all web pages that include a reference to “peanut butter with mustard”  What efficiency.
1 COP 3538 Data Structures with OOP Chapter 8 - Part 2 Binary Trees.
Chapter 19: Binary Trees. Objectives In this chapter, you will: – Learn about binary trees – Explore various binary tree traversal algorithms – Organize.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
CPS 100, Spring Trees: no data structure lovelier?
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
S EARCHING AND T REES COMP1927 Computing 15s1 Sedgewick Chapters 5, 12.
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
CompSci 100e Program Design and Analysis II March 29, 2011 Prof. Rodger CompSci 100e, Spring20111.
Chapter 19: Binary Trees Java Programming: Program Design Including Data Structures Program Design Including Data Structures.
CompSci 100e 6.1 Hashing: Log ( ) is a big number l Comparison based searches are too slow for lots of data  How many comparisons needed for a.
CPS Searching, Maps,Tries (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
CompSci 100e 7.1 From doubly-linked lists to binary trees l Instead of using prev and next to point to a linear arrangement, use them to divide the universe.
CPS Balanced Search Trees l BST: efficient lookup, insertion, deletion  Average case: O(log n) for all operations since find is O(log n) [complexity.
Hashing as a Dictionary Implementation Chapter 19.
CompSci 100e 7.1 Plan for the week l Review:  Union-Find l Understand linked lists from the bottom up and top-down  As clients of java.util.LinkedList.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
CPS Scoreboard l What else might we want to do with a data structure? l What are maps’ limitations? AlgorithmInsertionDeletionSearch Unsorted.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Binary Search Trees (BSTs) 18 February Binary Search Tree (BST) An important special kind of binary tree is the BST Each node stores some information.
Week 10 - Friday.  What did we talk about last time?  Graph representations  Adjacency matrix  Adjacency lists  Depth first search.
Binary Search Trees (BST)
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
Data Structures Using C++ 2E Chapter 11 Binary Trees.
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
Chapter 11 Sets © 2006 Pearson Education Inc., Upper Saddle River, NJ. All rights reserved.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
1 CSE 326: Data Structures Trees Lecture 6: Friday, Jan 17, 2003.
CPS Searching, Maps, Tables (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
1 the BSTree class  BSTreeNode has same structure as binary tree nodes  elements stored in a BSTree are a key- value pair  must be a class (or a struct)
CPS Searching, Maps, Tables l Searching is a fundamentally important operation ä We want to do these operations quickly ä Consider searching using.
CompSci 100e 7.1 Plan for the week l Review:  Boggle  Coding standards and inheritance l Understand linked lists from the bottom up and top-down  As.
CPS Searching, Maps, Tables (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider.
Searching, Maps,Tries (hashing)
Searching, Maps,Tries (hashing)
From doubly-linked lists to binary trees
PFTFBH Binary Search trees Fundamental Data Structure
Binary Trees Linked lists: efficient insertion/deletion, inefficient search ArrayList: search can be efficient, insertion/deletion not Binary trees: efficient.
Binary Trees Linked lists: efficient insertion/deletion, inefficient search ArrayList: search can be efficient, insertion/deletion not Binary trees: efficient.
PFTWBH More examples of recursion and alternatives
What’s the Difference Here?
abstract containers sequence/linear (1 to 1) hierarchical (1 to many)
Binary Trees, Binary Search Trees
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
Binary Trees, Binary Search Trees
Advanced Implementation of Tables
Binary Trees, Binary Search Trees
Data Structures Using C++ 2E
Presentation transcript:

CompSci Binary Trees l Linked lists: efficient insertion/deletion, inefficient search  ArrayList: search can be efficient, insertion/deletion not l Binary trees: efficient insertion, deletion, and search  trees used in many contexts, not just for searching, e.g., expression trees  search in O(log n) like sorted array  insertion/deletion O(1) like list, once location found!  binary trees are inherently recursive, difficult to process trees non-recursively, but possible recursion never required, often makes coding simpler

CompSci From doubly-linked lists to binary trees l Instead of using prev and next to point to a linear arrangement, use them to divide the universe in half  Similar to binary search, everything less goes left, everything greater goes right  How do we search?  How do we insert? “llama” “tiger” “monkey” “jaguar” “elephant” “ giraffe ” “pig” “hippo” “leopard” “koala”

CompSci Basic tree definitions l Binary tree is a structure:  empty  root node with left and right subtrees l terminology: parent, children, leaf node, internal node, depth, height, path link from node N to M then N is parent of M –M is child of N leaf node has no children –internal node has 1 or 2 children path is sequence of nodes, N 1, N 2, … N k –N i is parent of N i+1 –sometimes edge instead of node depth (level) of node: length of root-to-node path –level of root is 1 (measured in nodes) height of node: length of longest node-to-leaf path –height of tree is height of root A B ED F C G

CompSci A TreeNode by any other name… l What does this look like?  What does the picture look like? public class TreeNode { TreeNode left; TreeNode right; String info; TreeNode(String s, TreeNode llink, TreeNode rlink){ info = s; left = llink; right = rlink; } “llama” “tiger” “ giraffe ”

CompSci Printing a search tree in order l When is root printed?  After left subtree, before right subtree. void visit(TreeNode t) { if (t != null) { visit(t.left); System.out.println(t.info); visit(t.right); } l Inorder traversal “llama” “tiger” “monkey” “jaguar” “elephant” “ giraffe ” “pig” “hippo” “leopard”

CompSci Insertion and Find? Complexity? l How do we search for a value in a tree, starting at root?  Can do this both iteratively and recursively, contrast to printing which is very difficult to do iteratively  How is insertion similar to search? l What is complexity of print? Of insertion?  Is there a worst case for trees?  Do we use best case? Worst case? Average case? l How do we define worst and average cases  For trees? For vectors? For linked lists? For arrays of linked-lists?

CompSci See SetTiming code What about ISimpleSet interface  How does this compare to java.util?  Why are we looking at this, what about Java source? l How would we implement most simply?  What are complexity repercussions: add, contains  What about iterating? l What would linked list get us? Scenarios where better?  Consider N adds and M contains operations  Move to front heuristic?

CompSci What does contains look like? public boolean contains(E element) { return myList.indexOf(element) >= 0; } public boolean contains(E element){ returns contains(myHead, element); } private boolean contains(Node list, E element) { if (list == null) return false; if (list.info.equals(element)) return true; return contains(list.next,element); } l Why is there a private, helper method?  What will be different about Tree?

CompSci What does contains look like? public boolean contains(E element){ returns contains(myRoot, element); } private boolean contains(TreeNode root, E element) { if (root == null) return false; if (list.info.equals(element)) return true; if (element.compareTo(root.info) <= 0){ return contains(root.left,element); else return contains(root.right,element); } l What is recurrence? Complexity?  When good trees go bad, how can this happen?

CompSci What does insertion look like? l Simple recursive insertion into tree (accessed by root)  root = insert("foo", root); public TreeNode insert(TreeNode t, String s) { if (t == null) t = new Tree(s,null,null); else if (s.compareTo(t.info) <= 0) t.left = insert(t.left,s); else t.right = insert(t.right,s); return t; } l Note: in each recursive call, the parameter t in the called clone is either the left or right pointer of some node in the original tree  Why is this important?  Why must the idiom t = treeMethod(t,…) be used?

CompSci Removal from tree? l For insertion we can use iteration (see BSTSet)  Look below, either left or right If null, stop and add Otherwise go left when l Removal is tricky, depends on number of children  Straightforward when zero or one child  Complicated when two children, find successor See set code for complete cases If right child, straightforward Otherwise find node that’s left child of its parent (why?)

CompSci Implementing binary trees l Trees can have many shapes: short/bushy, long/stringy  if height is h, number of nodes is between h and 2 h -1  single node tree: height = 1, if height = 3  Java implementation, similar to doubly-linked list public class Tree { String info; TreeNode left; TreeNode right; TreeNode(String s, TreeNode llink, TreeNode rlink){ info = s; left = llink; right = rlink; } };

CompSci Tree functions l Compute height of a tree, what is complexity? int height(Tree root) { if (root == null) return 0; else { return 1 + Math.max(height(root.left), height(root.right) ); } l Modify function to compute number of nodes in a tree, does complexity change?  What about computing number of leaf nodes?

CompSci Tree traversals l Different traversals useful in different contexts  Inorder prints search tree in order Visit left-subtree, process root, visit right-subtree  Preorder useful for reading/writing trees Process root, visit left-subtree, visit right-subtree  Postorder useful for destroying trees Visit left-subtree, visit right-subtree, process root “llama” “tiger” “monkey” “jaguar” “elephant” “ giraffe ”

CompSci Balanced Trees and Complexity l A tree is height-balanced if  Left and right subtrees are height-balanced  Left and right heights differ by at most one boolean isBalanced(Tree root) { if (root == null) return true; return isBalanced(root.left) && isBalanced(root.right) && Math.abs(height(root.left) – height(root.right)) <= 1; }

CompSci What is complexity? l Assume trees are “balanced” in analyzing complexity  Roughly half the nodes in each subtree  Leads to easier analysis l How to develop recurrence relation?  What is T(n)?  What other work is done? l How to solve recurrence relation  Plug, expand, plug, expand, find pattern  A real proof requires induction to verify correctness

CompSci Danny Hillis l The third culture consists of those scientists and other thinkers in the empirical world who, through their work and expository writing, are taking the place of the traditional intellectual in rendering visible the deeper meanings of our lives, redefining who and what we are. (Wired 1998) And now we are beginning to depend on computers to help us evolve new computers that let us produce things of much greater complexity. Yet we don't quite understand the process - it's getting ahead of us. We're now using programs to make much faster computers so the process can run much faster. That's what's so confusing - technologies are feeding back on themselves; we're taking off. We're at that point analogous to when single-celled organisms were turning into multicelled organisms. We are amoebas and we can't figure out what the hell this thing is that we're creating.

CompSci Searching, Maps,Tries (hashing) l Searching is a fundamentally important operation  We want to search quickly, very very quickly  Consider searching using Google, ACES, issues?  In general we want to search in a collection for a key We've searched using trees and arrays  Tree implementation was quick: O(log n) worst/average?  Arrays: access is O(1), search is slower l If we compare keys, log n is best for searching n elements  Lower bound is  (log n), provable  Hashing is O(1) on average, not a contradiction, why?  Tries are O(1) worst-case!! (ignoring length of key)

CompSci From Google to Maps l If we wanted to write a search engine we’d need to access lots of pages and keep lots of data  Given a word, on what pages does it appear?  This is a map of words->web pages l In general a map associates a key with a value  Look up the key in the map, get the value  Google: key is word/words, value is list of web pages  Anagram: key is string, value is words that are anagrams l Interface issues  Lookup a key, return boolean: in map or value: associated with the key (what if key not in map?)  Insert a key/value pair into the map

CompSci Interface at work: MapDemo.java l Key is a string, Value is # occurrences  Code below shows how Map interface/classes work while (it.hasNext()) { String s = it.next();. Counter c = map.get(s); if (c != null) c.increment(); else map.put(s, new Counter()); } What clues are there for prototype of map.get and map.put ?  What if a key is not in map, what value returned?  What kind of objects can be put in a map?

CompSci Replacing Counter with Integer l With autoboxing (and unboxing) do we need class Counter?  What if we access a key that’s not there? while (it.hasNext()) { String s = it.next();. if (map.containsKey(s)){ map.put(s, map.get(s)+1); } else map.put(s,1); } l What is key? What is value?  What if a key is not in map, what value returned?  Is use of get() to determine if key is present a good idea?

CompSci Getting keys and values from a map l Access every key in the map, then get the corresponding value  Get an iterator of the set of keys: keySet().iterator()  For each key returned by this iterator call map.get(key) … Get an iterator over (key,value) pairs, there's a nested class called Map.Entry that the iterator returns, accessing the key and the value separately is then possible  To see all the pairs use entrySet().iterator()

CompSci Margo Seltzer Herchel Smith Professor of Computer Science and Associate Dean at Harvard, CTO of Sleepycat Software “Computer Science is the ultimate combination of Engineering and Science. It provides the tangible satisfaction that comes from building things and seeing them run with the thrill of discovery that is the hallmark of scientific exploration. … I want to … work with people, and help young people become skilled computer scientists and engineers.” “I have always maintained an active life outside work, playing soccer, studying karate, spending time with my family, and socializing as much as my schedule allows. When you have a job, you have plenty of time for these activities, but when you have a career, you have to make the time, and it's an effort well worth making. “

CompSci External Iterator without generics l The Iterator interface access elements  Source of iterator makes a difference: cast required? Iterator it = map.keySet().iterator(); while (it.hasNext()){ Object value = map.get(it.next()); } Iterator it2 = map.entrySet().iterator(); while (it2.hasNext()){ Map.Entry me = (Map.Entry) it.next(); Object value = me.getValue(); }

CompSci External Iterator with generics l Avoid Object, we know what we have a map of  Is the syntax worth it? Iterator it = map.keySet().iterator(); while (it.hasNext()){ Counter value = map.get(it.next()); } Iterator > it2 = map.entrySet().iterator(); while (it2.hasNext()){ Map.Entry me = it.next(); Counter value = me.getValue(); }

CompSci Hashing: Log ( ) is a big number l Comparison based searches are too slow for lots of data  How many comparisons needed for a billion elements?  What if one billion web-pages indexed? l Hashing is a search method: average case O(1) search  Worst case is very bad, but in practice hashing is good  Associate a number with every key, use the number to store the key Like catalog in library, given book title, find the book l A hash function generates the number from the key  Goal: Efficient to calculate  Goal: Distributes keys evenly in hash table

CompSci Hashing details l There will be collisions, two keys will hash to the same value  We must handle collisions, still have efficient search  What about birthday “paradox”: using birthday as hash function, will there be collisions in a room of 25 people? l Several ways to handle collisions, in general array/vector used  Linear probing, look in next spot if not found Hash to index h, try h+1, h+2, …, wrap at end Clustering problems, deletion problems, growing problems  Quadratic probing Hash to index h, try h+1 2, h+2 2, h+3 2, …, wrap at end Fewer clustering problems  Double hashing Hash to index h, with another hash function to j Try h, h+j, h+2j, … n-1

CompSci Chaining with hashing l With n buckets each bucket stores linked list  Compute hash value h, look up key in linked list table[h]  Hopefully linked lists are short, searching is fast  Unsuccessful searches often faster than successful Empty linked lists searched more quickly than non-empty  Potential problems? l Hash table details  Size of hash table should be a prime number  Keep load factor small: number of keys/size of table  On average, with reasonable load factor, search is O(1)  What if load factor gets too high? Rehash or other method

CompSci Hashing problems l Linear probing, hash(x) = x, (mod tablesize)  Insert 24, 12, 45, 14, delete 24, insert 23 (where?) l Same numbers, use quadratic probing (clustering better?) l What about chaining, what happens?

CompSci What about hash functions l Hashing often done on strings, consider two alternatives public static int hash(String s) { int k, total = 0; for(k=0; k < s.length(); k++){ total += s.charAt(k); } return total; } Consider total += (k+1)*s.charAt(k), why might this be better?  Other functions used, always mod result by table size l What about hashing other objects?  Need conversion of key to index, not always simple  Ever object has method hashCode()!

CompSci Trie: efficient search words/suffixes l A trie (from retrieval, but pronounced “try”) supports  Insertion: put string into trie (delete and look up)  These operations are O(size of string) regardless of how many strings are stored in the trie! Guaranteed ! l In some ways a trie is like a 128 (or 26 or alphabet-size) tree, one branch/edge for each character/letter  Node stores branches to other nodes  Node stores whether it ends the string from root to it l Extremely useful in DNA/string processing  Very useful matching suffixes: suffix tree/suffix array

CompSci Trie picture/code (see TrieSet.java) l To add string  Start at root, for each char create node as needed, go down tree, mark last node l To find string  Start at root, follow links If null, not found  Check word flag at end l To print all nodes  Visit every node, build string as nodes traversed l What about union and intersection,iteration? a c p r n s a r a h ao st cd Indicates word ends here

CompSci Guy L. Steele, Jr. Co-invented/developed Scheme, continues to develop Java If, several years ago, with C++ at its most popular, … you had come to me, O worthy opponents, and proclaimed that objects had failed, I might well have agreed. But now that Java has become mainstream, popularizing not only object- oriented programming but related technologies such as garbage collection and remote method invocation, … we may now confidently assert that objects most certainly have not failed.

CompSci RSS Aggregator l What is RSS (Really Simple Syndication)?  What problem does it try to solve? l We want to retrieve and index news articles  How will we search l What does an item have in it? Medicare Change Will Limit Access to Claim Hearing Medicare beneficiaries must now show special circumstances to appear in person before a judge when their claims are denied. By ROBERT PEAR Sun, 24 Apr :00:00 EDT

CompSci What does RSS look like? NYT > Home Page NYT: Breaking News … item node 1 …