Week 12 - Thursday CS 113
Last time
  What did we talk about last time?
    Big data
    Sample problems
    Challenges
    Activity!
Questions?
Project 4
Final Project
Big Data Paradigms
Paradigm: MapReduce
  Leverage parallelization
  Divide analysis into two parts
    Map task: Given a subset of the data, extract relevant data and obtain partial results
    Reduce task: Receive partial results from each map task and combine into a final result
  Your One Fish Two Fish counting probably followed the idea of MapReduce
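The map and reduce steps above can be sketched as a small word count. This is only an illustration of the idea, not how a real MapReduce system is written: the chunks list and the names map_task and reduce_task are made up for this example, and the map tasks run one after another instead of on separate machines.

```python
from collections import Counter

# Hypothetical input: the text split into chunks, one chunk per map task.
chunks = [
    "one fish two fish",
    "red fish blue fish",
]

def map_task(chunk):
    """Count the words in one chunk and return that partial result."""
    return Counter(chunk.split())

def reduce_task(partial_counts):
    """Combine the partial counts from every map task into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds the counts together
    return total

# In a real system each map task could run on a different machine.
partials = [map_task(chunk) for chunk in chunks]
word_counts = reduce_task(partials)
print(word_counts["fish"])  # 4
```

The reduce task never looks at the original text, only at the partial results, which is what lets the map tasks be split across many machines.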
Paradigm: MapReduce
  Used for Internet search
    Map task: given a part of the index, identify pages containing keywords and calculate relevance
    Reduce task: rank pages based on relevance
  Infrastructure requirements
    Many machines to run map tasks in parallel
    Ability to retrieve and store data
    Coordination of who does what
  MapReduce is at the heart of many of Google's services
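A toy version of the search example might look like the sketch below. Everything in it is an assumption made for illustration: the shards, the page names, and the relevance score (just how many times the keyword appears) are invented, and real search engines use far more elaborate ranking.

```python
# Hypothetical index shards: each maps a page name to the text on that page.
shards = [
    {"page_a": "python objects and methods", "page_b": "big data and python python"},
    {"page_c": "cloud computing", "page_d": "python in the cloud"},
]

def search_map_task(shard, keyword):
    """Return (page, relevance) pairs for the pages in one shard that contain keyword."""
    results = []
    for page, text in shard.items():
        relevance = text.split().count(keyword)  # toy relevance: keyword frequency
        if relevance > 0:
            results.append((page, relevance))
    return results

def search_reduce_task(partial_results):
    """Merge the results from all shards and rank the pages by relevance."""
    merged = []
    for partial in partial_results:
        merged.extend(partial)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

partials = [search_map_task(shard, "python") for shard in shards]
ranking = search_reduce_task(partials)
print(ranking[0])  # ('page_b', 2)
```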
Paradigm: Cloud Computing
  Large collections of processing and storage resources used on demand
  Providers sell resources such as processors and gigabytes of storage for some period of time
  Benefits for users
    Only pay for what you use
      100 servers at $1/hour for 1 hour = $100
      1 server at $1/hour for 100 hours = $100
    Externally managed
  Benefits for cloud providers
    Economies of scale for space and equipment
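The pricing arithmetic above can be checked with a few lines of Python. The function name rental_cost is made up for this sketch, and it assumes a flat hourly price per server with no other fees.

```python
def rental_cost(servers, dollars_per_hour, hours):
    """Total cost of renting some servers at a flat hourly rate (assumed pricing model)."""
    return servers * dollars_per_hour * hours

many_servers = rental_cost(servers=100, dollars_per_hour=1, hours=1)
one_server = rental_cost(servers=1, dollars_per_hour=1, hours=100)
print(many_servers, one_server)  # 100 100
```

The price is the same either way, but if the work can be split evenly across machines, the 100-server option finishes in 1 hour instead of 100.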
Paradigm: Cloud Computing
  Several different models
    Infrastructure as a service
      Virtual machines can be accessed by the user
      The user has complete control of OS and applications
    Platform as a service
      An OS, programming language execution environment, database, and web server are provided
      The user can write any code for it
    Software as a service
      Software is made available to the user
      The user can use it without installing applications or storing data locally
Paradigm: Data mining
  Identify patterns and relationships in data
  Used to rank, categorize, and so on
  Commonly associated with artificial intelligence and machine learning
  We talked about data mining before, but it's important to mention it in the context of Big Data
Paradigm: Visualization
  Visualization might be one of the harder aspects of Big Data to deal with
  There's no "one size fits all" approach
    Conventional: line, bar, pie charts
    Alternative: bubble chart, tree map
    Text: tag cloud, word tree
Objects in Python
Objects
  The idea of an object is to group together data and code
  You have used objects a good bit already
    Strings are objects
    The various shapes in VPython are objects
    Even lists are a special kind of object
Calling methods
  When you have an object, you can call methods on it
  A method is like a function, except that it has access to the details of the object
  To call a method, you type the name of the object, a dot, and the name of the method
  A method call will always have parentheses after it
  Sometimes the parentheses will have arguments that the method uses
Method call examples
  You have called a few methods with strings:
      phrase = "BOOM goes the dynamite!"
      other1 = phrase.lower()     # makes all lowercase
      other2 = phrase.upper()     # makes all uppercase
      words = phrase.split(" ")   # makes a list of words
  You've called at least one method with a list:
      words.sort()                # sorts the list
Other useful string methods
  isalpha(): Returns True if the string contains only letters
      word = "blechh"
      if word.isalpha():          # True
          print("only letters")
  isdigit(): Returns True if the string contains only digits
      text = "8675309"
      if text.isdigit():          # True
          print("only digits")
  find(text): Returns the index where text first appears, or -1 if it is not found
      word = "dysfunctional"
      i = word.find("fun")        # 3
  replace(old, new): Returns a copy with all occurrences of old replaced by new
      word = "banana"
      s = word.replace("an", "od")
      print(s)                    # bododa
  startswith(prefix): Returns True if the string starts with prefix
      s = "aardvark"
      if s.startswith("aa"):      # True
          print("starts with aa")
  endswith(suffix): Returns True if the string ends with suffix
      s = "shell"
      if s.endswith("hell"):      # True
          print("ends with hell")
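As a rough sketch of how these methods can work together, the example below sorts a few strings into categories. The function name describe and the sample tokens are made up for illustration.

```python
def describe(token):
    """Label a string as a number, a word, or a mix of both."""
    if token.isdigit():
        return "number"
    if token.isalpha():
        return "word"
    return "mixed"

tokens = ["8675309", "blechh", "R2D2"]
labels = [describe(token) for token in tokens]
print(labels)  # ['number', 'word', 'mixed']

# find() returns -1 instead of raising an error when the text is missing.
word = "dysfunctional"
found = word.find("fun")
missing = word.find("zebra")
print(found, missing)  # 3 -1
```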
Strings are immutable
  Remember that strings are immutable
  This means that there is no method you can call that changes a string
  If you want a changed string, you have to assign the method's result to a variable with =
      name = "Raymond Luxury Yacht"
      name.lower()          # doesn't change anything!
      name = name.lower()   # raymond luxury yacht
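One way to see immutability for yourself is the short sketch below. The variable names lowered and changed_in_place are invented for this example. It shows that lower() leaves the original alone, and that Python refuses to change a single character in place.

```python
name = "Raymond Luxury Yacht"
lowered = name.lower()   # a new string; name itself is untouched
print(name)              # Raymond Luxury Yacht
print(lowered)           # raymond luxury yacht

# Trying to change one character of a string raises a TypeError.
try:
    name[0] = "r"
    changed_in_place = True
except TypeError:
    changed_in_place = False
print(changed_in_place)  # False
```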
Members
  Members are the data inside of an object
  Although the shapes from VPython don't have any interesting methods, they do have members
  Like methods, you access a member with the name of the object, a dot, and then the name of the member
  Unlike methods, members never have parentheses
    They are values, not functions that do things
Member examples
  As you know, the VPython objects have many useful members like pos and color, and some have a radius
  The pos member is itself an object with x, y, and z members
      ball = sphere()           # creates a default sphere
      ball.radius = 3.7
      ball.color = color.green
      ball.pos.x = 2
      ball.pos.y = 5
      ball.pos.z = -3
Adding members
  Python allows us to add members anytime we want
  Doing so lets us keep extra information in each object
  For example, we could give each VPython object we create a name
      ball = sphere()
      ball.radius = 3.7
      ball.color = color.green
      ball.name = "Susan"
Member adding confusion
  Unfortunately, the feature that allows us to add new members to objects means that unexpected things happen if we misspell an existing member
      ball = sphere()
      ball.raduis = 3.7   # should be radius
  Instead of setting the radius, a new member called raduis has been created
  The value of radius stays the default of 1.0
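The misspelling problem can be tried without VPython installed. The sketch below uses SimpleNamespace from Python's standard library as a stand-in for a sphere. That substitution is an assumption made only so the example runs anywhere; it is not how VPython itself is built.

```python
from types import SimpleNamespace

# Stand-in for a VPython sphere: an object that starts with radius 1.0.
ball = SimpleNamespace(radius=1.0)

ball.raduis = 3.7   # typo: Python quietly creates a brand new member

print(ball.radius)  # 1.0, the real radius never changed
print(ball.raduis)  # 3.7, stored under the misspelled name
```

Python reports no error here, so a shape that stubbornly stays the wrong size is a good hint to check the spelling of the member name.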
Creating entirely new classes
  Adding members to existing objects is cool
  But what if you want to create an entire object from scratch?
  A class is a template for an object
  You can define a class that will allow you to create your own custom objects
  But we'll talk about it next time…
Lab 12
Upcoming
Next time…
  More on classes and objects
  Blown to Bits, Chapter 2
  Changes in travel and communication
Reminders
  Finish Project 4
    Due tonight by midnight!
  No class on Monday!