Compsci 201 Priority Queues & Autocomplete


Compsci 201 Priority Queues & Autocomplete. Owen Astrachan, Jeff Forbes. November 10, 2017. Compsci 201, Fall 2017, Searching

T is for … TeraFLOPS: Xbox One X @ 6 × 10^12 operations per second. Turing Award: award for highest distinction in CS (2016). Tree: more versions to come. Trie: O(#characters) search.

Plan for the Day: Introduce data structures and algorithms for search. Heaps for priority queues. Towards autocomplete.

What’s in Common?

Priority! Applications of priority queues: shortest path (from Google Maps to Internet routing); event-based simulation (predicting collisions); best-first search, game playing, AI. The Java code on the slide sorts a list. How? Why?
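The slide's code is not in the transcript. As a hedged sketch of one common way a priority queue can sort a list (the class name PQSort is hypothetical), push every element into java.util.PriorityQueue, which is a min-heap by default, and poll them back out in ascending order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PQSort {
    // Sorts by heapifying all elements, then repeatedly removing the minimum.
    // n polls at O(log n) each gives O(n log n) overall.
    static <T extends Comparable<T>> List<T> pqSort(List<T> list) {
        PriorityQueue<T> pq = new PriorityQueue<>(list); // builds a min-heap from the list
        List<T> sorted = new ArrayList<>(list.size());
        while (!pq.isEmpty()) {
            sorted.add(pq.poll()); // poll() always returns the current smallest element
        }
        return sorted;
    }
}
```

For example, pqSort(Arrays.asList(17, 6, 25, 10, 7)) returns [6, 7, 10, 17, 25].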

How are PriorityQueues implemented? A heap is an array-based implementation of a binary tree used for implementing priority queues. It supports insert, findMin, and deleteMin: what are their complexities? Using an array minimizes storage (no explicit pointers) and is faster too, since children are located by index/position in the array. A heap is a binary tree with a shape property and a heap/value property. Shape: the tree is filled at all levels (except perhaps the last), and the last level is filled left to right (a complete binary tree). Value: each node's value is no larger than the values of its children.

Using an array for a Heap. Store “node values” in an array beginning at index 1 (we could also start at index 0). For the node at index k: left child at index 2*k, right child at index 2*k+1, parent at index k/2 (integer division). Why is this structure conducive to maintaining heap shape? What about the value property? Is the heap a search tree? Where is the minimal node? (Figure: the min-heap with level-order values 6, 10, 7, 17, 13, 9, 21, 19, 25 stored at array indices 1 through 9.)
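A minimal sketch of the index arithmetic above, assuming the 1-based layout with slot 0 unused (HeapIndex and isMinHeap are hypothetical names). It also checks the value property by comparing each node with its parent.

```java
class HeapIndex {
    // 1-based layout: the root is at index 1, slot 0 is unused.
    static int left(int k)   { return 2 * k; }
    static int right(int k)  { return 2 * k + 1; }
    static int parent(int k) { return k / 2; } // integer division rounds down

    // Returns true if heap[1..size] satisfies the min-heap value property:
    // no node is smaller than its parent.
    static boolean isMinHeap(int[] heap, int size) {
        for (int k = 2; k <= size; k++) {
            if (heap[k] < heap[parent(k)]) {
                return false; // a child smaller than its parent breaks the property
            }
        }
        return true;
    }
}
```

With the slide's heap stored as {0, 6, 10, 7, 17, 13, 9, 21, 19, 25}, isMinHeap(heap, 9) is true, and the children of 10 (index 2) are 17 and 13 at indices 4 and 5.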

Heap questions. Where is the minimal element? The root; why? Where is the maximal element? http://bit.ly/201f17-heaps-0 What is the complexity of find-max in a min-heap? Why? Where is the second smallest element? Why? http://bit.ly/201f17-heaps-1 (Figure: the same heap, 6, 10, 7, 17, 13, 9, 21, 19, 25, with its array indices.)
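One possible answer to the find-max question, sketched under the same 1-based layout (MinHeapMax is a hypothetical name): in a min-heap every internal node is no larger than its children, so the maximum must be a leaf. The leaves occupy indices size/2 + 1 through size, about half the array, so the scan is still O(n).

```java
class MinHeapMax {
    // Scans only the leaves of a 1-based min-heap heap[1..size].
    static int findMax(int[] heap, int size) {
        if (size < 1) throw new IllegalArgumentException("empty heap");
        int max = heap[size / 2 + 1];               // first leaf
        for (int k = size / 2 + 2; k <= size; k++) {
            max = Math.max(max, heap[k]);
        }
        return max;
    }
}
```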

Adding values to heap. To maintain heap shape, add the new value at the next left-to-right position on the last level. This could violate the heap property, so move the value “up” if it is too small: change places with the parent if the heap property is violated; stop when the parent is smaller, or when the root is reached. Pulling the parent down instead of swapping is an optimization. (Figure: inserting 8 into the heap 6, 10, 7, 17, 13, 9, 21, 19, 25. The 8 is added as the last leaf, below 13, then bubbles up past 13 and 10 and stops below the root 6.) http://bit.ly/201f17-heaps-2

Heap add implementation. (Figure: the heap before and after inserting 8, with array indices, stored in an ArrayList<Integer> list.)
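The slide's code is not in the transcript. This is a hedged sketch of add over an ArrayList<Integer> with slot 0 unused (the class MinHeap and its methods are hypothetical). It uses the “pull parent down” optimization: larger parents shift down one level and the new value is written once, at its final slot.

```java
import java.util.ArrayList;

class MinHeap {
    // 1-based layout: index 0 holds a placeholder so that the children of k
    // are at 2k and 2k+1 and the parent is at k/2.
    private final ArrayList<Integer> list = new ArrayList<>();

    MinHeap() {
        list.add(null); // unused slot 0
    }

    int size() { return list.size() - 1; }

    void add(int value) {
        list.add(value);          // grow by one: the next left-to-right slot
        int k = size();           // start at the new last slot
        while (k > 1 && list.get(k / 2) > value) {
            list.set(k, list.get(k / 2)); // pull the larger parent down
            k /= 2;
        }
        list.set(k, value);       // write the new value once, at its final position
    }

    // Level-order contents without the placeholder, for inspection.
    ArrayList<Integer> toList() {
        return new ArrayList<>(list.subList(1, list.size()));
    }
}
```

Adding 6, 10, 7, 17, 13, 9, 21, 19, 25 and then 8 yields [6, 8, 7, 17, 10, 9, 21, 19, 25, 13], matching the figure.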

Removing extremal element. Where is the minimal element? If we remove it, what changes: shape or property? How can we maintain shape? The “last” element moves to the root. What property is violated? After moving the last element, the subtrees of the root are heaps; why? Move the root down (pull a child up); does it matter which child? When can we stop “re-heaping”? When the value is no larger than both children, or when we reach a leaf. (Figure: removing 6 from the heap 6, 10, 7, 17, 13, 9, 21, 19, 25. The last element 25 moves to the root, then sinks below 7 and then below 9, giving 7, 10, 9, 17, 13, 25, 21, 19.)
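A hedged sketch of deleteMin following the steps above, again assuming an ArrayList<Integer> with slot 0 unused (HeapRemove is a hypothetical name). The smaller child is pulled up at each level, since pulling up the larger child would put it above a smaller sibling.

```java
import java.util.ArrayList;
import java.util.NoSuchElementException;

class HeapRemove {
    // Removes and returns the minimum of a 1-based min-heap in list.
    static int removeMin(ArrayList<Integer> list) {
        int size = list.size() - 1;
        if (size < 1) throw new NoSuchElementException("empty heap");
        int min = list.get(1);
        int last = list.remove(size);   // remove(int index) returns the removed element
        size--;
        if (size == 0) return min;      // the heap held a single element
        int k = 1;
        while (2 * k <= size) {
            int child = 2 * k;                                   // left child
            if (child + 1 <= size && list.get(child + 1) < list.get(child)) {
                child++;                                         // right child is smaller
            }
            if (last <= list.get(child)) break;                  // heap property restored
            list.set(k, list.get(child));                        // pull the smaller child up
            k = child;
        }
        list.set(k, last);              // place the former last element
        return min;
    }
}
```

On the slide's heap this returns 6 and leaves 7, 10, 9, 17, 13, 25, 21, 19, as in the figure.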

Priority Queue implementations. Implementing priority queues: average and worst case.

Implementation                 insert (average)   getMin/deleteMin (worst)
Unsorted ArrayList             O(1)               O(n)
Sorted ArrayList               O(n)               O(1) (minimum kept at the end)
Heap                           O(log n)           O(log n)
Balanced binary search tree    O(log n)           O(log n)

A heap has O(1) find-min (peek, no delete), and a heap can be built from n values in O(n).
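The O(n) build-heap claim comes from bottom-up construction. A minimal sketch, assuming a 1-based int array with slot 0 unused (BuildHeap and sink are hypothetical names): leaves are already heaps, so each internal node is sunk, from index size/2 back to the root. Most nodes are near the bottom and can only sink a few levels, which is why the total is O(n) rather than O(n log n).

```java
class BuildHeap {
    // Rearranges a[1..size] into a min-heap in place.
    static void buildHeap(int[] a, int size) {
        for (int k = size / 2; k >= 1; k--) {   // last internal node back to the root
            sink(a, k, size);
        }
    }

    // Moves a[k] down until it is no larger than its children.
    private static void sink(int[] a, int k, int size) {
        int value = a[k];
        while (2 * k <= size) {
            int child = 2 * k;
            if (child + 1 <= size && a[child + 1] < a[child]) child++; // smaller child
            if (value <= a[child]) break;
            a[k] = a[child];   // pull the smaller child up
            k = child;
        }
        a[k] = value;
    }
}
```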

Problem 1. Keep track of the top M of N elements (N >> M). What is the Big Oh if PriorityQueue is implemented with a heap?
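One standard approach, offered as a sketch rather than the course's answer (TopM is a hypothetical name): keep a min-heap of at most M elements. Its root is the smallest of the current top M, so whenever the heap grows past M the root is evicted. Each add/poll costs O(log M), giving O(N log M) for N elements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopM {
    // Returns the m largest values, largest first.
    static List<Integer> topM(Iterable<Integer> values, int m) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(); // min-heap
        for (int v : values) {
            pq.add(v);
            if (pq.size() > m) {
                pq.poll();   // evict the smallest; it cannot be among the top m
            }
        }
        List<Integer> result = new ArrayList<>(pq);
        Collections.sort(result, Collections.reverseOrder()); // O(M log M), done once
        return result;
    }
}
```

For example, topM(Arrays.asList(5, 1, 9, 3, 7, 2), 3) returns [9, 7, 5].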

Problem 2. Determine the number of elements < target in a heap: lessCount(heap, 13) → 3, lessCount(heap, 8) → 1. Aim for O(k), where k is the number of elements less than target. What if you had a helper method that returns the number of elements in the subheap rooted (starting) at index that are less than target? int lessCount(int[] heap, int index, int target) What’s the initial call to the helper method? http://bit.ly/201f17-heaps-3
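A hedged sketch of the helper, assuming a 1-based array that is exactly full (heap[0] unused, elements at 1 through heap.length - 1, since the signature has no size parameter). If heap[index] >= target, the heap property guarantees the whole subheap is too, so it is skipped. Each counted node causes at most two extra visits, which keeps the work O(k).

```java
class LessCount {
    // Number of elements less than target in the subheap rooted at index.
    static int lessCount(int[] heap, int index, int target) {
        if (index >= heap.length || heap[index] >= target) {
            return 0;   // off the end, or every value below here is >= target
        }
        return 1 + lessCount(heap, 2 * index, target)
                 + lessCount(heap, 2 * index + 1, target);
    }

    // Convenience wrapper: the initial call starts at the root, index 1.
    static int lessCount(int[] heap, int target) {
        return lessCount(heap, 1, target);
    }
}
```

On the earlier example heap {0, 6, 10, 7, 17, 13, 9, 21, 19, 25} (not the slide's Problem 2 heap), lessCount(heap, 13) is 4 and lessCount(heap, 8) is 2.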

What is autocomplete? As the user types in a search box, give potential completions. How? Efficiency is key: 50 ms or go home. Data (terms): possible words/phrases and their weights.

Searching

    public interface Autocompletor {
        // Returns the top k matching terms in descending order of weight.
        public Iterable<String> topMatches(String prefix, int k);

        // Returns the single top matching term.
        public String topMatch(String prefix);

        // Returns the weight of a given term.
        public double weightOf(String term);
    }

The Term class. The Term class encapsulates a Comparable word-weight pair. It includes a completed compareTo method, which sorts lexicographically. You are responsible for implementing three Term Comparators: WeightOrder, which sorts in ascending weight order; ReverseWeightOrder, which sorts in descending weight order; and PrefixOrder, which sorts by the first r characters.

Comparable and Comparator. Both are interfaces; there is no default implementation. Contrast with .equals(): is there a default implementation? Contrast with .toString(): default? Where do we define a Comparator? In its own .java file: nothing wrong with that. If it is private, used for implementation and not public behavior, use a nested class, then decide on static or non-static. A non-static (inner) class is part of an object and can access that object's fields. In the Term class, once written, initialize with new Term.ReverseWeightOrder(); or new Term.PrefixOrder(2);
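A minimal sketch of static nested comparators, assuming a Term with hypothetical fields word and weight (the assignment's Term may differ). Because the nested classes are static, they can be created with new Term.WeightOrder() without an enclosing Term instance.

```java
import java.util.Comparator;

class Term implements Comparable<Term> {
    // Hypothetical fields for illustration.
    private final String word;
    private final double weight;

    Term(String word, double weight) {
        this.word = word;
        this.weight = weight;
    }

    String getWord()   { return word; }
    double getWeight() { return weight; }

    // Natural order: lexicographic by word.
    @Override
    public int compareTo(Term other) {
        return word.compareTo(other.word);
    }

    // Static nested classes may read Term's private fields.
    static class WeightOrder implements Comparator<Term> {
        @Override
        public int compare(Term a, Term b) {
            return Double.compare(a.weight, b.weight);   // ascending weight
        }
    }

    static class ReverseWeightOrder implements Comparator<Term> {
        @Override
        public int compare(Term a, Term b) {
            return Double.compare(b.weight, a.weight);   // descending weight
        }
    }
}
```

Arrays.sort(terms, new Term.ReverseWeightOrder()) then puts the heaviest term first, while Arrays.sort(terms) falls back to compareTo.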

Comparator’s compare. compare(x, y) takes two objects (of the same class) as arguments. It returns a negative value if the 1st is less than the 2nd, zero if the arguments are equal, and a positive value if the 1st is greater than the 2nd. compare will be called by methods such as Arrays.sort or PriorityQueue. Why? x ≤ y is equivalent to compare(x, y) <= 0.

PrefixOrder The goal of PrefixOrder is to sort lexicographically, but only considering the first r characters. e.g. normally we would put “energy” before “entropy” lexicographically. However, PrefixOrder with r = 2 considers them equal (PrefixOrder with r = 3 would still put “energy” before “entropy”, however). If one or both of the words is shorter than r characters, we just use normal lexicographic sorting. For full credit, PrefixOrder’s compare method should take O(r).
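A hedged sketch of an O(r) prefix comparison. To stay self-contained it compares plain Strings (in the assignment the comparator would compare the Terms' words). It reads at most r characters from each string, and when one word runs out first, the shorter one sorts first, as in ordinary lexicographic order.

```java
import java.util.Comparator;

class PrefixOrder implements Comparator<String> {
    private final int r;

    PrefixOrder(int r) {
        if (r < 0) throw new IllegalArgumentException("r must be non-negative");
        this.r = r;
    }

    @Override
    public int compare(String a, String b) {
        int limit = Math.min(r, Math.min(a.length(), b.length()));
        for (int i = 0; i < limit; i++) {
            char ca = a.charAt(i), cb = b.charAt(i);
            if (ca != cb) return ca - cb;   // first differing character decides
        }
        // All compared characters match: compare the lengths of the two
        // r-character prefixes. Equal if both words have at least r characters;
        // otherwise the shorter word comes first.
        return Math.min(a.length(), r) - Math.min(b.length(), r);
    }
}
```

With r = 2, “energy” and “entropy” compare as equal; with r = 3, “energy” comes first.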

BruteAutocomplete. Naïve approach to autocomplete: store the data as a Term array. To find the top k matches, iterate through the array, push all terms starting with the prefix onto a max-priority queue sorted by weight, and return the top k terms off that priority queue. Finding the top match is similar.
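A hedged sketch of the brute-force topMatches described above. BruteSketch and its nested Term (with hypothetical fields word and weight) are stand-ins, not the assignment's classes, and it returns a List rather than implementing Autocompletor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class BruteSketch {
    // Minimal stand-in for the assignment's Term class.
    static class Term {
        final String word;
        final double weight;
        Term(String word, double weight) { this.word = word; this.weight = weight; }
    }

    private final Term[] terms;

    BruteSketch(Term[] terms) { this.terms = terms.clone(); }

    // Scans all n terms, pushes the m matches onto a max-priority queue by
    // weight, then pops at most k of them, heaviest first.
    List<String> topMatches(String prefix, int k) {
        PriorityQueue<Term> pq = new PriorityQueue<>(
            Comparator.comparingDouble((Term t) -> t.weight).reversed());
        for (Term t : terms) {
            if (t.word.startsWith(prefix)) pq.add(t);
        }
        List<String> result = new ArrayList<>();
        while (result.size() < k && !pq.isEmpty()) {
            result.add(pq.poll().word);
        }
        return result;
    }
}
```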

Why is BruteAutocomplete bad? If we have n terms, m of which start with the prefix, then topMatches is O(n + m log m) and topMatch is O(n). Why is this bad? Imagine Google! So we wish to improve upon BruteAutocomplete by only considering those terms that start with the prefix.

Improving BruteAutocomplete. BruteAutocomplete had to iterate through every single term in the array because it had no prior knowledge of where terms starting with the prefix could be located, i.e. the array was unsorted. If we sort the array lexicographically, then all terms that start with the prefix will be adjacent. Sorting takes O(n log n), but we only have to do it once: every call to topMatch or topMatches, regardless of its inputs, can use the same sorted Term array. We still need to locate the terms starting with the prefix quickly. Use binary search.

Binary Search Search for 5? Where to go? How to reduce problem size?

Binary Search How did problem change?

Binary Search
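A minimal sketch of classic iterative binary search on a sorted int array (BinarySearch.indexOf is a hypothetical name). Each comparison with the middle element discards half of the remaining range, so the problem size halves each step and the search takes O(log n) comparisons.

```java
class BinarySearch {
    // Returns an index of key in sorted array a, or -1 if absent.
    static int indexOf(int[] a, int key) {
        int lo = 0, hi = a.length - 1;          // search range is a[lo..hi]
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;       // avoids int overflow of (lo + hi)
            if (a[mid] < key)      lo = mid + 1;   // key can only be to the right
            else if (a[mid] > key) hi = mid - 1;   // key can only be to the left
            else                   return mid;
        }
        return -1;
    }
}
```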

BinarySearchAutocomplete. For autocomplete, find the range of all terms the comparator considers equal to key, e.g., all terms in a that match the prefix “auto”. BinarySearchAutocomplete is the 2nd class you should implement. BinarySearchAutocomplete implements Autocompletor plus: public static int firstIndexOf(Term[] a, Term key, Comparator<Term> comp) public static int lastIndexOf(Term[] a, Term key, Comparator<Term> comp) Use binary search to quickly return, respectively, the first and last index of an element in the input array that the comparator considers equal to key. We specify first and last index because there could be multiple Terms in a that the comparator considers equal to key.
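A hedged sketch of the two range searches. It is written generically over T rather than Term so it runs on its own, and returning -1 when nothing matches is an assumption. On a match, instead of stopping, it records the index and keeps searching to the left (first) or right (last), so the range still halves every step: O(log n) comparisons.

```java
import java.util.Comparator;

class RangeSearch {
    // First index i with comp.compare(a[i], key) == 0, or -1.
    // Assumes a is sorted consistently with comp.
    static <T> int firstIndexOf(T[] a, T key, Comparator<T> comp) {
        int lo = 0, hi = a.length - 1, found = -1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int c = comp.compare(a[mid], key);
            if (c < 0) {
                lo = mid + 1;
            } else {
                if (c == 0) found = mid;   // candidate; an earlier match may exist
                hi = mid - 1;
            }
        }
        return found;
    }

    // Last index i with comp.compare(a[i], key) == 0, or -1.
    static <T> int lastIndexOf(T[] a, T key, Comparator<T> comp) {
        int lo = 0, hi = a.length - 1, found = -1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int c = comp.compare(a[mid], key);
            if (c > 0) {
                hi = mid - 1;
            } else {
                if (c == 0) found = mid;   // candidate; a later match may exist
                lo = mid + 1;
            }
        }
        return found;
    }
}
```

With {"apple", "auto", "automaton", "autumn", "banana"} and a comparator on the first 3 characters, the key "aut" gives first index 1 and last index 3.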

Binary search code

Modifications for BinarySearchAutocomplete