Download presentation
Presentation is loading. Please wait.
1
CompSci 101 Introduction to Computer Science
Nov. 22 , 2016 Prof. Rodger compsci101 fall2016
2
Announcements Reading and RQ due next time
Assignment 7 due next Tuesday Assignment 8 and 9 out soon APT 9 out and due in a week and a half Today: Solving problems How do change how things are sorted? Other than ordering and re-ordering tuple How do Python .sort and sorted() stack up? compsci101 fall2016
3
Clever Hangman - Dictionary
Builds a dictionary of categories Start with list of words of correct size Repeat User picks a letter Make dictionary of categories based on letter New list of words is largest category Category includes already matched letters List shrinks in size each time cps101 fall 2016
4
Clever Hangman Example
Possible scenerio after several rounds From list of words with a the second letter. From that build a dictionary of list of words with no d and with d in different places: Great Dictionary example to solve a problem! Choose “no d”, most words, 147 Only 17 words of this type Only 1 word of this type
5
Clever Hangman How to start? How to modify assignment 5?
compsci101 fall2016
6
Playing go-fish, spades, or …
Finding right card? What helps? Issues here? Describe algorithm: First do this Then do this Substeps ok When are you done? The idea here is that when you have a small scale problem, algorithms don't matter. You don't need to teach kids how to sort, they kind of know. If you're playing gin rummy, do this. Go fish, do this. Hearts? Do this. For sorting at scale maybe we need smarter algorithms? Didn’t explain sorting to your kids, they just figure it out. compsci101 fall2016
7
Problem Solving with Algorithms
Top 100 songs of all time, top 2 artists? Most songs in top 100 Wrong answers heavily penalized You did this in lab, you could do this with a spreadsheet What about top 1,000 songs, top 10 artists? How is this problem the same? How is this problem different Reminder we addressed this in a previous lab with spreadsheets and programs. We'll look at a python program to solve this problem on next slide. Does top 10 artists make a difference over top 2 artists? Doing by hand v automating? When does writing code make sense? It depends? compsci101 fall2016
8
Scale As the size of the problem grows … Search
The algorithm continues to work A new algorithm is needed New engineering for old algorithm Search Making Google search results work Making SoundHound search results work Making Content ID work on YouTube Again, with scale you need smarter algorithms, searching requires order of some sort to be efficient, sorting is one way to order data to facilitate search SoundHound = how to search fast – how do they Content ID = YouTube tags things automatically, how do they do that? Scale problem compsci101 fall2016
9
Python to the rescue? Top1000.py
import csv, operator f = open('top1000.csv','rbU') data = {} for d in csv.reader(f,delimiter=',',quotechar='"'): artist = d[2] song = d[1] if not artist in data: data[artist] = 0 data[artist] += 1 itemlist = data.items() dds = sorted(itemlist,key=operator.itemgetter(1),reverse=True) print dds[:30] We want the top 30 artists, sorted by how many hits, with most hits first in list. We'll unpack this code step-by-step. It's snarfable too These are top 1000 songs and their artist compsci101 fall2016
10
Understanding sorting API
How API works for sorted() or .sort() Alternative to changing order in tuples and then changing back x = sorted([(t[1],t[0]) for t in dict.items()]) x = [(t[1],t[0]) for t in x] x = sorted(dict.items(),key=operator.itemgetter(1)) Sorted argument is key to be sorted on, specify which element of tuple. Must import library operator for this Illustrates that operator.itemgetter() is a convenience. Can always sort tuples, then re-arrange to put tuples back in original order, but with itemgetter we can specify which element of tuple to sort on compsci101 fall2016
11
Sorting from an API/Client perspective
API is Application Programming Interface, what is this for sorted(..) and .sort() in Python? Sorting algorithm is efficient, stable: part of API? sorted returns a list, doesn't change argument sorted(list,reverse=True), part of API foo.sort() modifies foo, same algorithm, API How can you change how sorting works? Change order in tuples being sorted, [(t[1],t[0]) for t in …] Alternatively: key=operator.itemgetter(1) This is simply a summary of the API, so code and ideas are in one place. Reminder that reverse changes order from L2H to H2L, see example about songs two slides ago compsci101 fall2016
12
Beyond the API, how do you sort?
Beyond the API, how do you sort in practice? Leveraging the stable part of API specification? If you want to sort by number first, largest first, breaking ties alphabetically, how can you do that? Idiom: Sort by two criteria: use a two-pass sort, first is secondary criteria (e.g., break ties) [("ant",5),("bat", 4),("cat",5),("dog",4)] [("ant",5),("cat", 5),("bat",4),("dog",4)] When you want to break ties alphabetically, first sort alphabetically, then by number of occurrences (for example). The data here is shown first sorted alphabetically: ant bat cat dog Then by # occurrences or the number. Notice that ant comes before cat and bat comes before dog This is because of STABLE, to be discussed later in class. compsci101 fall2016
13
Two-pass (or more) sorting
Because sort is stable sort first on tie-breaker, then that order is fixed since stable a0 = sorted(data,key=operator.itemgetter(0)) a1 = sorted(a0,key=operator.itemgetter(2)) a2 = sorted(a1,key=operator.itemgetter(1)) data [('f', 2, 0), ('c', 2, 5), ('b', 3, 0), ('e', 1, 4), ('a', 2, 0), ('d', 2, 4)] a0 [('a', 2, 0), ('b', 3, 0), ('c', 2, 5), ('d', 2, 4), ('e', 1, 4), ('f', 2, 0)] Notice that data is jumbled, and that a0 is sorted alphabetically: a b c d e f Why? Itemgetter(0) compsci101 fall2016
14
Two-pass (or more) sorting
a0 = sorted(data,key=operator.itemgetter(0)) a1 = sorted(a0,key=operator.itemgetter(2)) a2 = sorted(a1,key=operator.itemgetter(1)) a0 [('a', 2, 0), ('b', 3, 0), ('c', 2, 5), ('d', 2, 4), ('e', 1, 4), ('f', 2, 0)] a1 [('a', 2, 0), ('b', 3, 0), ('f', 2, 0), ('d', 2, 4), ('e', 1, 4), ('c', 2, 5)] a2 [('e', 1, 4), ('a', 2, 0), ('f', 2, 0), ('d', 2, 4), ('c', 2, 5), ('b', 3, 0)] A0 is sorted by first tuple element, the string: order is a b c d e f A1 is that list, sorted by last element in each triple/tuple: order on last is 0,0,0,4,4,5 A2 is final list by first number, index[1]: e a f d c b Why in a2 is order of tuples with 2 as second entry: a f d c, notice a2[3] here: 4, 0, 0, 4, 5, 0 compsci101 fall2016
15
Answer Questions bit.ly/101f16-1122-1
SortByFreqs APT Sort items by their frequency, then sorted in frequencies. compsci101 fall2016
16
Answer Questions bit.ly/101f16-1122-2
MedalTable APT Sort items by their frequency, then sorted in frequencies. compsci101 fall2016
17
Timingsorts.py, what sort to call?
Simple to understand, hard to do fast and at-scale Scaling is what makes computer science … Efficient algorithms don't matter on lists of 100 or 1000 Named algorithms in 201 and other courses bubble sort, selection sort, merge, quick, … See next slide and TimingSorts.py Basics of algorithm analysis: theory and practice We can look at empirical results, would also like to be able to look at code and analyze mathemetically! How does algorithm scale? compsci101 fall2016
18
New sorting algorithms happen …
timsort is standard on… Python as of version 2.3, Android, Java 7 According to Adaptive, stable, natural mergesort with supernatural performance What is mergesort? Fast and Stable What does this mean? Which is most important? Nothing is faster, what does that mean? Quicksort is faster, what does that mean? N log N is the best we can do for comparison based sorter, so nothing is faster than timsort theoretically. In practice Quicksort is often faster on some data sets, but timsort is stable and timsort is really fast on nearly sorted data. Quicksort doesn't have those properties. compsci101 fall2016
19
TimingSorts.py size create bubble select timsort 1000 0.026 0.127
0.081 0.002 2000 0.045 0.537 0.273 0.001 3000 0.058 1.126 0.646 4000 0.082 2.174 1.208 0.003 5000 0.101 3.521 1.862 6000 0.118 4.617 3.005 0.004 7000 0.168 7.504 4.237 0.005 8000 0.156 9.074 6.152 0.007 9000 0.184 11.611 8.089 10000 0.212 14.502 9.384 0.008 Run TimingSorts.py and notice how slow bubble and select are and how fast system/tim sort is You might change code to go from 10,000 to 100,000 by 10,000 and only call timsort --- just change the list of sorts that are called and see what happens compsci101 fall2016
20
Stable, Stability What does the search query 'stable sort' show us?
Image search explained First shape, then color: for equal colors? Explain stable. First line is jumbled shapes. We first sort based on shape: triangles squares, circles: could be by number of sides, for example Then we sort by color with order yellow < green < red Notice that in each color group the shapes have the same order. Why? Stable sort! compsci101 fall2016
21
Stable sorting: respect re-order
Women before men … First sort by height, then sort by gender Top row sorted by height Then second row by gender, notice that women are in order and all women precede men: assuming women < men compsci101 fall2016
22
How to import: in general and sorting
We can write: import operator Then use key=operator.itemgetter(…) We can write: from operator import itemgetter Then use key=itemgetter(…) From math import pow, From cannon import pow Oops, better not to do that, use dot-qualified names like math.sqrt and operator.itemgetter Import cannon is a joke
23
TimingSorts.py Questions bit.ly/101f16-1122-3
After going over questions, could do sorting APTs ? Should we give those now? Sortbyfreqs Medltable EatDrink? compsci101 fall2016
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.