MapReduce, Dictionaries, List Comprehensions

Presentation transcript:

MapReduce, Dictionaries, List Comprehensions
Special thanks to Scott Shawcroft, Ryan Tucker, and Paul Beck for their work on these slides.
Except where otherwise noted, this work is licensed under: http://creativecommons.org/licenses/by-nc-sa/3.0

Why didn't NaN == NaN?
Last time we found that float("nan") == float("nan") is False.
NaN ("not a number") does not compare equal to anything, not even itself.
To find out whether a value is NaN, you have to use isnan:
>>> from math import isnan
>>> n = float("nan")
>>> isnan(n)
True
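A minimal sketch of the same check in script form. The helper name `drop_nans` and the sample readings are made up for illustration; only `float("nan")` and `isnan` come from the slide.

```python
from math import isnan

n = float("nan")

# NaN never compares equal to anything, including itself.
print(n == n)    # False
print(n != n)    # True
print(isnan(n))  # True


def drop_nans(values):
    """Return a new list without NaN entries (hypothetical helper)."""
    return [v for v in values if not isnan(v)]


readings = [1.5, float("nan"), 2.0]
print(drop_nans(readings))  # [1.5, 2.0]
```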

MapReduce
Framework for processing huge datasets on certain kinds of distributable problems.
Map step:
- The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes.
- A worker node may chop its work into yet smaller pieces and redistribute them again.

MapReduce
Reduce step:
- The master node then takes the answers to all the sub-problems and combines them to get the output.
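The map and reduce steps can be imitated on a single machine. This is only a toy sketch: the names `split_into_chunks`, `map_step`, and `reduce_step` are invented here, and a real MapReduce framework would send each chunk to a different worker node instead of looping over them in one process.

```python
from functools import reduce


def split_into_chunks(data, size):
    """Chop the input into smaller sub-problems of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_step(chunk):
    """Work done by one 'worker': here, sum the numbers in its chunk."""
    return sum(chunk)


def reduce_step(partial_results):
    """Work done by the 'master': combine the workers' answers."""
    return reduce(lambda x, y: x + y, partial_results, 0)


numbers = list(range(1, 11))             # 1..10
chunks = split_into_chunks(numbers, 3)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
partials = [map_step(c) for c in chunks]
total = reduce_step(partials)

print(partials)  # [6, 15, 24, 10]
print(total)     # 55
```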

MapReduce
Problem: given an email, how do you tell if it is spam?
- Count occurrences of certain words. If they occur too frequently, the email is spam.

MapReduce
>>> from functools import reduce
>>> email = ['the', 'this', 'annoy', 'the', 'the', 'annoy']
>>> def inEmail(x):
...     # 1 if the word is "the", 0 otherwise
...     if x == "the":
...         return 1
...     else:
...         return 0
...
>>> list(map(inEmail, email))
[1, 0, 0, 1, 1, 0]
>>> reduce(lambda x, xs: x + xs, map(inEmail, email))
3
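A sketch of how the count could be turned into a spam decision. The word set `suspicious` and the cutoff `threshold` are assumptions made up for this example; the slides do not say which words, or what frequency, should count as spam.

```python
from functools import reduce


def count_hits(words, suspicious):
    """Map each word to 1 or 0, then reduce the results by adding them up."""
    flags = map(lambda w: 1 if w in suspicious else 0, words)
    return reduce(lambda x, y: x + y, flags, 0)


def looks_like_spam(words, suspicious, threshold):
    """Flag the email when suspicious words exceed `threshold` of all words."""
    if not words:
        return False
    return count_hits(words, suspicious) / len(words) > threshold


email = ['the', 'this', 'annoy', 'the', 'the', 'annoy']
suspicious = {'annoy'}  # hypothetical word list

print(count_hits(email, suspicious))             # 2
print(looks_like_spam(email, suspicious, 0.25))  # True, since 2/6 > 0.25
```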

Stack
Information found here: http://docs.python.org/py3k/library/stdtypes.html
Really easy! Use lists to simulate a stack.
- .append(value) to "push" onto the stack
- .pop() to "pop" off the stack
Check this example out!
>>> a = []
>>> a.append(1)
>>> a.append(2)
>>> a.append(3)
>>> a.append(5)
>>> a
[1, 2, 3, 5]
>>> a.pop()
5
>>> a
[1, 2, 3]
>>> a.pop()
3
>>> a
[1, 2]
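As a small illustration of last-in, first-out order, the sketch below reverses a sequence by pushing every item and then popping until the stack is empty. The function name `reverse_with_stack` is invented for this example.

```python
def reverse_with_stack(items):
    """Reverse `items` using a list as a stack."""
    stack = []
    for item in items:
        stack.append(item)          # push
    result = []
    while stack:                    # an empty list is falsy
        result.append(stack.pop())  # pop the most recently pushed item
    return result


print(reverse_with_stack([1, 2, 3, 5]))  # [5, 3, 2, 1]
```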

Queue
Really easy too! Use lists to simulate a queue.
- .insert(0, value) to "add" to the queue
- .pop() to "remove" from the queue
Why does this work? What do insert and pop really do?
>>> a = []
>>> a.insert(0, 2)
>>> a.insert(0, 3)
>>> a.insert(0, 4)
>>> a
[4, 3, 2]
>>> a.pop()
2
>>> a
[4, 3]
>>> a.insert(0, a.pop())
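One way to see why this works: insert(0, value) always puts the new item at the front of the list, and pop() always takes from the back, so the oldest item is the one that comes out. A short sketch (the variable names are only illustrative):

```python
queue = []
for customer in ['first', 'second', 'third']:
    queue.insert(0, customer)   # newest item goes to index 0

print(queue)  # ['third', 'second', 'first']

served = []
while queue:
    served.append(queue.pop())  # oldest item sits at the end of the list

print(served)  # ['first', 'second', 'third']
```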

Queue
OK... the list version is inefficient. insert(0, value) is O(N), because every element already in the list has to shift over. But...
from collections import deque
queue = deque(['blah', 'blah', 'blah'])
- Use append() to "add" to the queue
- Use popleft() to "remove" from the queue
Both work in O(1)! Sweet.
>>> from collections import deque
>>> my_list = ["Jordan", "Zeratul", "Kerrigan", "Raynor", "Artosis"]
>>> my_list
['Jordan', 'Zeratul', 'Kerrigan', 'Raynor', 'Artosis']
>>> q = deque(my_list)
>>> q
deque(['Jordan', 'Zeratul', 'Kerrigan', 'Raynor', 'Artosis'])
>>> q.append("Tasteless")
>>> q
deque(['Jordan', 'Zeratul', 'Kerrigan', 'Raynor', 'Artosis', 'Tasteless'])
>>> q.append("IMMVP")
>>> q
deque(['Jordan', 'Zeratul', 'Kerrigan', 'Raynor', 'Artosis', 'Tasteless', 'IMMVP'])
>>> q.popleft()
'Jordan'
>>> q.popleft()
'Zeratul'
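A rough way to compare the two approaches. The function names `drain_list` and `drain_deque` and the size `n` are invented for this sketch, and the timings depend on the machine, so no expected numbers are shown.

```python
from collections import deque
from timeit import timeit


def drain_list(n):
    """Queue built on a list: insert at the front, pop from the back."""
    q = []
    for i in range(n):
        q.insert(0, i)   # O(N): every existing element shifts right
    return [q.pop() for _ in range(n)]


def drain_deque(n):
    """Queue built on a deque: append at the back, popleft from the front."""
    q = deque()
    for i in range(n):
        q.append(i)      # O(1)
    return [q.popleft() for _ in range(n)]


# Both versions hand items back in the order they were added.
print(drain_list(5) == drain_deque(5) == [0, 1, 2, 3, 4])  # True

n = 20000
print("list :", timeit(lambda: drain_list(n), number=1))
print("deque:", timeit(lambda: drain_deque(n), number=1))
```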

Dictionary
Similar to Java's Map: stores keys and values.
Empty dictionary: m = {}
Pre-populated dictionary (like a Map<String, String>, here mapping names to SCII division):
m2 = { 'jordan' : 'master', 'roy' : 'bronze', 'marty' : 'bronze' }
Add to dictionary: m2['IMMVP'] = 'master'
Retrieve from dictionary: m2['jordan']   # 'master'
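A short sketch of adding, overwriting, and looping over entries. It uses `items()`, which the slides do not cover; it yields each key together with its value. The 'silver' division is made up for the example.

```python
m2 = {'jordan': 'master', 'roy': 'bronze', 'marty': 'bronze'}
m2['IMMVP'] = 'master'   # add a new key
m2['roy'] = 'silver'     # assigning to an existing key overwrites its value

for name, division in m2.items():
    print(name, 'is in', division)

masters = [name for name, division in m2.items() if division == 'master']
print(masters)  # ['jordan', 'IMMVP']
```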

Dictionary
How to get the equivalent of keySet(): m.keys()
Uh oh... we get something called dict_keys. We should convert it to a list: list(m.keys())
How to check if a key is contained in a dictionary:
'marty' in m    # True, because 'marty' is a key in the dictionary
'bronze' in m   # False, because 'bronze' is a value, not a key
How to delete: del m['marty']
Get values with: m.values()
Note that you get a dict_values object, so convert it to a list as well.
>>> m
{'hi': 3, 'hello': 32}
>>> m = {'jordan' : 'master', 'roy' : 'bronze', 'marty' : 'bronze' }
>>> m
{'jordan': 'master', 'roy': 'bronze', 'marty': 'bronze'}
>>> m.keys()
dict_keys(['jordan', 'roy', 'marty'])
>>> a = list(m.keys())
>>> a
['jordan', 'roy', 'marty']
>>> 'marty' in m
True
>>> 'roy' in m
True
>>> 'oy' in m
False
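Dictionaries also fit the earlier word-counting problem. The sketch below is one possible approach, not something from the slides: the function name `count_words` is invented, and it relies on `dict.get`, which returns a default value when the key is missing.

```python
def count_words(words):
    """Build a dictionary mapping each word to how many times it appears."""
    counts = {}
    for word in words:
        # get() returns the second argument when the key is not present yet
        counts[word] = counts.get(word, 0) + 1
    return counts


email = ['the', 'this', 'annoy', 'the', 'the', 'annoy']
counts = count_words(email)
print(counts['the'])          # 3
print('the' in counts)        # True: 'the' is a key
print(3 in counts)            # False: 3 is a value, not a key

del counts['this']
print(sorted(counts.keys()))  # ['annoy', 'the']
```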

Set
s = { 'GLaDOS', 'Cloud', 'Aeris', 'Shepard', 'Cloud', 'Aeris' }
Duplicates are removed.
s2 = set('hello I am soooo happy!!!')
Creates a set out of all the characters. The argument to set() must be iterable, i.e. a string, list, tuple, etc.
Checking for membership is the same as with a dictionary: 'GLaDOS' in s
(Sets are unordered, so the order in which the elements print may differ.)
>>> s = {'a', 'b', 'c', 'a', 'b', 'd'}
>>> s
{'a', 'c', 'b', 'd'}
>>> 'b' in s
True
>>> a = set('hello world i am soooooo happy')
>>> a
{'a', ' ', 'e', 'd', 'i', 'h', 'm', 'l', 'o', 'p', 's', 'r', 'w', 'y'}

Set
Tons of extra functionality.
- Check for subsets: see if set A is contained in set B.
Go here to see more (under 5.7): http://docs.python.org/py3k/library/stdtypes.html#set.issubset
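A few of those extra operations, sketched with made-up sets. The results are passed through `sorted` only so that the printed order is predictable, since sets themselves are unordered.

```python
a = {'GLaDOS', 'Cloud'}
b = {'GLaDOS', 'Cloud', 'Aeris', 'Shepard'}

print(a.issubset(b))  # True: every element of a is also in b
print(b.issubset(a))  # False
print(a <= b)         # True, operator form of issubset

c = {'Cloud', 'Link'}
print(sorted(b & c))  # ['Cloud']                    (intersection)
print(sorted(a | c))  # ['Cloud', 'GLaDOS', 'Link']  (union)
print(sorted(b - a))  # ['Aeris', 'Shepard']         (difference)
```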

List Comprehensions
[expression for element in list]
- Applies the expression to each element in the list.
- After the first for, you can have zero or more additional for or if clauses.
- If the expression evaluates to a tuple, it must be in parentheses.
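To make the syntax concrete, here is a comprehension next to the loop it stands for, followed by one with two for clauses. The variable names are illustrative only.

```python
vec = [2, 4, 6]

# A comprehension and the equivalent explicit loop.
tripled = [3 * x for x in vec if x > 3]

tripled_loop = []
for x in vec:
    if x > 3:
        tripled_loop.append(3 * x)

print(tripled, tripled_loop)  # [12, 18] [12, 18]

# Two for clauses: the second one runs inside the first.
# The tuple expression has to be wrapped in parentheses.
pairs = [(x, y) for x in [1, 2] for y in ['a', 'b']]
print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```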

List Comprehensions
>>> vec = [2, 4, 6]
>>> [3*x for x in vec]
[6, 12, 18]
>>> [3*x for x in vec if x > 3]
[12, 18]
>>> [3*x for x in vec if x < 2]
[]
>>> [[x, x**2] for x in range(4)]
[[0, 0], [1, 1], [2, 4], [3, 9]]
>>> [x, x**2 for x in vec]   # error - parens required for tuples
>>> [(x, x**2) for x in vec]
[(2, 4), (4, 16), (6, 36)]

List Comprehensions
You can do most things that you can do with map, filter and reduce more nicely with list comprehensions.
How can we find out how many times 'a' appears in the list named email?
>>> email = ['once', 'upon', 'a', 'time', 'in', 'a', 'far', 'away']
>>> len([1 for x in email if x == 'a'])
2
Let's explain this: the comprehension produces one 1 for every element equal to 'a', so the length of the resulting list is the number of matches.
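For comparison, the same count written four ways. This is a sketch rather than a recommendation from the slides; `list.count` is a built-in list method that is not mentioned there, and the variable names are invented.

```python
from functools import reduce

email = ['once', 'upon', 'a', 'time', 'in', 'a', 'far', 'away']

# map + reduce: turn each word into 1 or 0, then add everything up
with_map_reduce = reduce(lambda x, y: x + y,
                         map(lambda w: 1 if w == 'a' else 0, email))

# filter: keep only the matching words, then count them
with_filter = len(list(filter(lambda w: w == 'a', email)))

# list comprehension, as on the slide
with_comprehension = len([1 for x in email if x == 'a'])

# list.count does the same job directly
with_count = email.count('a')

print(with_map_reduce, with_filter, with_comprehension, with_count)  # 2 2 2 2
```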