CPS 100 5.1 Wordcounting, sets, and the web l How does Google store all web pages that include a reference to “peanut butter with mustard”  What efficiency.

Slides:



Advertisements
Similar presentations
Comp 245 Data Structures Trees. Introduction to the Tree ADT A tree is a non-linear structure. A treenode can point to 0 to N other nodes. There is one.
Advertisements

Binary Trees Chapter 6. Linked Lists Suck By now you realize that the title to this slide is true… By now you realize that the title to this slide is.
Binary Trees, Binary Search Trees CMPS 2133 Spring 2008.
Binary Trees, Binary Search Trees COMP171 Fall 2006.
CS 171: Introduction to Computer Science II
Trees, Binary Trees, and Binary Search Trees COMP171.
Data Structures Data Structures Topic #8. Today’s Agenda Continue Discussing Table Abstractions But, this time, let’s talk about them in terms of new.
Lec 15 April 9 Topics: l binary Trees l expression trees Binary Search Trees (Chapter 5 of text)
Unit 11a 1 Unit 11: Data Structures & Complexity H We discuss in this unit Graphs and trees Binary search trees Hashing functions Recursive sorting: quicksort,
1 abstract containers hierarchical (1 to many) graph (many to many) first ith last sequence/linear (1 to 1) set.
Chapter 12 Trees. Copyright © 2005 Pearson Addison-Wesley. All rights reserved Chapter Objectives Define trees as data structures Define the terms.
Data Structures Using C++ 2E Chapter 11 Binary Trees and B-Trees.
1 Joe Meehean.  Important and common problem  Given a collection, determine whether value v is a member  Common variation given a collection of unique.
Data Structures Using C++1 Chapter 11 Binary Trees.
Version TCSS 342, Winter 2006 Lecture Notes Trees Binary Trees Binary Search Trees.
SIGCSE Tradeoffs, intuition analysis, understanding big-Oh aka O-notation Owen Astrachan
CPS What’s easy, hard, first? l Always do the hard part first. If the hard part is impossible, why waste time on the easy part? Once the hard part.
CPS 100, Spring Analysis: Algorithms and Data Structures l We need a vocabulary to discuss performance and to reason about alternative algorithms.
Chapter 19: Binary Trees. Objectives In this chapter, you will: – Learn about binary trees – Explore various binary tree traversal algorithms – Organize.
CPS 100, Spring Trees: no data structure lovelier?
S EARCHING AND T REES COMP1927 Computing 15s1 Sedgewick Chapters 5, 12.
1 CSE 1342 Programming Concepts Trees. 2 Basic Terminology Trees are made up of nodes and edges. A tree has a single node known as a root. –The root is.
1 Trees A tree is a data structure used to represent different kinds of data and help solve a number of algorithmic problems Game trees (i.e., chess ),
INTRODUCTION TO BINARY TREES P SORTING  Review of Linear Search: –again, begin with first element and search through list until finding element,
A Binary Search Tree Binary Search Trees.
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
CompSci 100e Program Design and Analysis II March 29, 2011 Prof. Rodger CompSci 100e, Spring20111.
Chapter 19: Binary Trees Java Programming: Program Design Including Data Structures Program Design Including Data Structures.
Binary Trees Definition A binary tree is: (i) empty, or (ii) a node whose left and right children are binary trees typedef struct Node Node; struct Node.
Trees, Binary Trees, and Binary Search Trees COMP171.
CompSci 100e 7.1 From doubly-linked lists to binary trees l Instead of using prev and next to point to a linear arrangement, use them to divide the universe.
CompSci 100e 7.1 Plan for the week l Review:  Union-Find l Understand linked lists from the bottom up and top-down  As clients of java.util.LinkedList.
CPS From sets, toward maps, via big-Oh l Consider the MemberCheck weekly problem  Find elements in common to two vectors  Alphabetize/sort resulting.
CSC 143 Trees [Chapter 10].
+ David Kauchak cs312 Review. + Midterm Will be posted online this afternoon You will have 2 hours to take it watch your time! if you get stuck on a problem,
CPS Scoreboard l What else might we want to do with a data structure? l What are maps’ limitations? AlgorithmInsertionDeletionSearch Unsorted.
David Stotts Computer Science Department UNC Chapel Hill.
Week 10 - Friday.  What did we talk about last time?  Graph representations  Adjacency matrix  Adjacency lists  Depth first search.
CMSC 341 Introduction to Trees. 2/21/20062 Tree ADT Tree definition –A tree is a set of nodes which may be empty –If not empty, then there is a distinguished.
Binary Search Trees (BST)
CompSci 100E 15.1 What is a Binary Search?  Magic!  Has been used a the basis for “magical” tricks  Find telephone number ( without computer ) in seconds.
Compsci 100, Fall Analysis: Algorithms and Data Structures l We need a vocabulary to discuss performance  Reason about alternative algorithms.
1 Joe Meehean. A A B B D D I I C C E E X X A A B B D D I I C C E E X X  Terminology each circle is a node pointers are edges topmost node is the root.
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
Search: Binary Search Trees Dr. Yingwu Zhu. Review: Linear Search Collection of data items to be searched is organized in a list x 1, x 2, … x n – Assume.
CPS Solving Problems Recursively l Recursion is an indispensable tool in a programmer’s toolkit  Allows many complex problems to be solved simply.
Trees CSIT 402 Data Structures II 1. 2 Why Do We Need Trees? Lists, Stacks, and Queues are linear relationships Information often contains hierarchical.
Chapter 15 Running Time Analysis. Topics Orders of Magnitude and Big-Oh Notation Running Time Analysis of Algorithms –Counting Statements –Evaluating.
CPS 100, Spring Tools: Solve Computational Problems l Algorithmic techniques  Brute-force/exhaustive, greedy algorithms, dynamic programming,
CompSci 100e 7.1 Plan for the week l Recurrences  How do we analyze the run-time of recursive algorithms? l Setting up the Boggle assignment.
From doubly-linked lists to binary trees
Data Structure and Algorithms
Printing a search tree in order
Analysis: Algorithms and Data Structures
PFTFBH Binary Search trees Fundamental Data Structure
Binary Trees Linked lists: efficient insertion/deletion, inefficient search ArrayList: search can be efficient, insertion/deletion not Binary trees: efficient.
COSC160: Data Structures Binary Trees
Binary Trees Linked lists: efficient insertion/deletion, inefficient search ArrayList: search can be efficient, insertion/deletion not Binary trees: efficient.
What’s the Difference Here?
Tools: Solve Computational Problems
CMSC 341 Introduction to Trees.
COSC160: Data Structures Linked Lists
Binary Search Trees Why this is a useful data structure. Terminology
Binary Trees, Binary Search Trees
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
common code “abstracted out” same code written several times: don’t!
Binary Trees, Binary Search Trees
Binary SearchTrees [CLRS] – Chap 12.
Binary Trees, Binary Search Trees
Data Structures Using C++ 2E
Presentation transcript:

CPS Wordcounting, sets, and the web l How does Google store all web pages that include a reference to “peanut butter with mustard”  What efficiency issues exist?  Why is Google different (better)? How do readsetvec.cpp and readsetlist.cpp differ?  Mechanisms for insertion and search  How do we discuss? How do we compare performance? l If we stick with linked lists, how can we improve search?  Where do we want to find things?  What do thumb indexes do in a dictionary? What about 256 different linked lists? readsetlist2.cpp

CPS Analysis: Algorithms and Data Structures l We need a vocabulary to discuss performance and to reason about alternative algorithms and implementations  What’s the best way to sort, why?  What’s the best way to search, why?  Which is better: readsetvec, readsetlist, readsetlist2, readsettree, readsetstl ? l We need both empirical tests and analytical/mathematical reasoning  Given two methods, which is better? Run them to check. 30 seconds vs. 3 seconds, easy. 5 hours vs. 2 minutes, harder  Which is better? Analyze them. Use mathematics to analyze the algorithm, the implementation is another matter

CPS Multiplying and adding big-Oh l Suppose we do a linear search then we do another one  What is the complexity?  If we do 100 linear searches?  If we do n searches on a vector of size n? l What if we do binary search followed by linear search?  What are big-Oh complexities? Sum?  What about 50 binary searches? What about n searches? l What is the number of elements in the list (1,2,2,3,3,3)?  What about (1,2,2, …, n,n,…,n)?  How can we reason about this?

CPS Helpful formulae l We always mean base 2 unless otherwise stated  What is log(1024)?  log(xy) log(x y ) log(2 n ) 2 (log n) l Sums (also, use sigma notation when possible)  … + 2 k = 2 k+1 – 1 =  … + n = n(n+1)/2 =  a + ar + ar 2 + … + ar n-1 = a(r n - 1)/(r-1)= log(x) + log(y) y log(x) nlog(2) = n 2 (log n) = n k  i=0 2i2i n  i=1 i n-1  i=0 ar i

CPS Different measures of complexity l Worst case  Gives a good upper-bound on behavior  Never get worse than this  Drawbacks? l Average case  What does average mean?  Averaged over all inputs? Assuming uniformly distributed random data?  Drawbacks? l Best case  Linear search, useful?

CPS Recurrences l Counting nodes int length(Node * list) { if (0 == list) return 0; else return 1 + length(list->next); } l What is complexity? justification? l T(n) = time to compute length for an n-node list T(n) = T(n-1) + 1 T(0) = 1 l instead of 1, use O(1) for constant time  independent of n, the measure of problem size

CPS Solving recurrence relations l plug, simplify, reduce, guess, verify? T(n) = T(n-1) + 1 T(0) = 1 T(n) = T(n-k) + k find the pattern! Now, let k=n, then T(n) = T(0)+n = 1+n l get to base case, solve the recurrence: O(n) T(n-1) = T(n-1-1) + 1 T(n) = [T(n-2) + 1] + 1 = T(n-2)+2 T(n-2) = T(n-2-1) + 1 T(n) = [(T(n-3) + 1) + 1] + 1 = T(n-3)+3

CPS Consider merge sort for linked lists l Given a linked list, we want to sort it  Divide the list into two equal halves  Sort the halves  Merge the sorted halves together l What’s complexity of dividing an n-node in half?  How do we do this? l What’s complexity of merging (zipping) two sorted lists?  How do we do this? T(n) = time to sort n-node list = 2 T(n/2) + O(n) why?

CPS sidebar: solving recurrence T(n) = 2T(n/2) + n T(XXX) = 2T(XXX/2) + XXX T(1) = 1 (note: n/2/2 = n/4) T(n) = 2[2T(n/4) + n/2] + n = 4 T(n/4) + n + n = 4[2T(n/8) + n/4] + 2n = 8T(n/8) + 3n =... eureka! = 2 k T(n/2 k ) + kn let 2 k = n k = log n, this yields 2 log n T(n/2 log n ) + n(log n) n T(1) + n(log n) O(n log n)

CPS Complexity Practice l What is complexity of Build ? (what does it do?) Node * Build(int n) { if (0 == n) return 0; Node * first = new Node(n,Build(n-1)); for(int k = 0; k < n-1; k++) { first = new Node(n,first); } return first; } l Write an expression for T(n) and for T(0), solve.

CPS Recognizing Recurrences l Solve once, re-use in new contexts  T must be explicitly identified  n must be some measure of size of input/parameter T(n) is the time for quicksort to run on an n-element vector T(n) = T(n/2) + O(1) binary search O( ) T(n) = T(n-1) + O(1) sequential search O( ) T(n) = 2T(n/2) + O(1) tree traversal O( ) T(n) = 2T(n/2) + O(n) quicksort O( ) T(n) = T(n-1) + O(n) selection sort O( ) l Remember the algorithm, re-derive complexity n log n n log n n n2n2

CPS Binary Trees l Linked lists have efficient insertion and deletion, but inefficient search  Vector/array: search can be efficient, insertion/deletion not l Binary trees are structures that yield efficient insertion, deletion, and search  trees used in many contexts, not just for searching, e.g., expression trees  search is as efficient as binary search in array, insertion/deletion as efficient as linked list (once node found)  binary trees are inherently recursive, difficult to process trees non-recursively, but possible (recursion never required, but often makes coding/algorithms simpler)

CPS From doubly-linked lists to binary trees l Instead of using prev and next to point to a linear arrangement, use them to divide the universe in half  Similar to binary search, everything less goes left, everything greater goes right  How do we search?  How do we insert? “llama” “tiger” “monkey” “jaguar” “elephant” “ giraffe ” “pig” “hippo” “leopard” “koala”

CPS Basic tree definitions l Binary tree is a structure:  empty  root node with left and right subtrees l terminology: parent, children, leaf node, internal node, depth, height, path link from node N to M then N is parent of M –M is child of N leaf node has no children –internal node has 1 or 2 children path is sequence of nodes, N 1, N 2, … N k –N i is parent of N i+1 –sometimes edge instead of node depth (level) of node: length of root-to-node path –level of root is 1 (measured in nodes) height of node: length of longest node-to-leaf path –height of tree is height of root A B ED F C G

CPS Printing a search tree in order l When is root printed?  After left subtree, before right subtree. void visit(Node * t) { if (t != 0) { visit(t->left); cout info << endl; visit(t->right); } l Inorder traversal “llama” “tiger” “monkey” “jaguar” “elephant” “ giraffe ” “pig” “hippo” “leopard”

CPS Insertion and Find? Complexity? l How do we search for a value in a tree, starting at root?  Can do this both iteratively and recursively, contrast to printing which is very difficult to do iteratively  How is insertion similar to search? l What is complexity of print? Of insertion?  Is there a worst case for trees?  Do we use best case? Worst case? Average case? l How do we define worst and average cases  For trees? For vectors? For linked lists? For vectors of linked-lists?

CPS From concept to code with binary trees l Trees can have many shapes: short/bushy, long/stringy  if height is h, number of nodes is between h and 2 h -1  single node tree: height = 1, if height = 3  C++ implementation, similar to doubly-linked list struct Tree { string info; Tree * left; Tree * right; Tree(const string& s, Tree * lptr, Tree * rptr) : info(s), left(lptr), right(rptr) { } };

CPS Tree functions l Compute height of a tree, what is complexity? int height(Tree * root) { if (root == 0) return ; else { return 1 + max(height(root->left), height(root->right) ); } l Modify function to compute number of nodes in a tree, does complexity change?  What about computing number of leaf nodes?

CPS Tree traversals l Different traversals useful in different contexts  Inorder prints search tree in order Visit left-subtree, process root, visit right-subtree  Preorder useful for reading/writing trees Process root, visit left-subtree, visit right-subtree  Postorder useful for destroying trees Visit left-subtree, visit right-subtree, process root “llama” “tiger” “monkey” “jaguar” “elephant” “ giraffe ”

CPS Insertion into search tree l Simple recursive insertion into tree void insert(Tree *& t, const string& s) // pre: t is a search tree // post: s inserted into t, t is a search tree { if (t == 0) t = new Tree(s,0,0); else if (s left) insert(t->left,s); else insert(t->right,s); } l Note: in each recursive call, the parameter t in the called clone is either the left or right pointer of some node in the original tree  Why is this important?  Why must t be passed by reference?  For alternatives see readsettree.cpp

CPS Balanced Trees and Complexity l A tree is height-balanced if  Left and right subtrees are height-balanced  Left and right heights differ by at most one bool isBalanced(Tree * root) { if (root == 0) return true; else { return isBalanced(root->left) && isBalanced(root->right) && abs(height(root->left) – height(root->right)) <= 1; }

CPS What is complexity? l Assume trees are “balanced” in analyzing complexity  Roughly half the nodes in each subtree  Leads to easier analysis l How to develop recurrence relation?  What is T(n)?  What other work is done? l How to solve recurrence relation  Plug, expand, plug, expand, find pattern  A real proof requires induction to verify that pattern is correct

CPS Simple header file (see readsettree.cpp) class TreeSet { public: TreeSet(); bool contains(const string& word) const; void insert(const string& word); private: struct Node { string info; Node * left * right; // need constructor }; Node * insertHelper(Node * root, const string& s); Node * myRoot; };

CPS Helper functions in readsettree.cpp void TreeSet::insert(const string& s) { myRoot = insertHelper(myRoot); } TreeSet::Node * TreeSet::insertHelper(Node * root,const string& s) { // recursive insertion } l Why is helper function necessary? Is it really necessary?  Alternatives for other functions: print/contains, for example  What about const-ness for public/private functions?  What about TreeSet::Node syntax? Why?