Week 12 - Monday CS 113
Last time
What did we talk about last time?
More on the Internet
IPv4 and IPv6
Lab 11
Questions?
Project 4
Final Project
Exam 2 Post Mortem
Common Python Mistakes
Okay vs. Huh?
When I look at your code, I usually see mistakes faster than you do.
That's not because I can instantly understand what your code does.
Rather, I know that some things don't make sense.
I'm going to label things that fundamentally don't make sense as "huh?"
I'll label the opposite of "huh?" as "okay."
Something that is okay could still be wrong, but something that is huh? doesn't make sense at all.
Please don't feel bad if you're doing something that is huh?
Not initializing variables
Before you get the value out of a variable, it has to have something in it.
Otherwise, you're asking something that doesn't exist yet what its value is.

    x = 3
    y = x      #okay
    y = z      #huh?
    a += 3     #huh?
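To see what Python actually does with the two "huh?" lines, here is a short runnable check, assuming Python 3. Reading a name that was never assigned raises NameError. Because `a += 3` means `a = a + 3`, it also has to read `a` first, so it fails the same way. The try/except wrappers are only there so the demo keeps running after each error.

```python
# Reading a variable before anything has been stored in it raises NameError.
x = 3
y = x  # okay: x already holds 3

try:
    y = z  # huh? z was never assigned, so y keeps its old value
except NameError as err:
    print("NameError:", err)

try:
    a += 3  # huh? a += 3 means a = a + 3, and a has no value yet
except NameError as err:
    print("NameError:", err)

a = 0   # initialize first...
a += 3  # ...then updating it is okay
print(y, a)  # 3 3
```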
Which way does = point?
Remember that the = operator is like a left-pointing arrow.
Whatever is on the right gets stored into the variable on the left.
It matters which side is which.

    x = 4                    #okay
    4 = x                    #huh?
    x = z                    #huh?
    z = x                    #okay
    things = [3, 4, 5]
    length = len(things)     #okay
    len(things) = length     #huh?
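Python refuses to even start a program that has the arrow pointing the wrong way. The check below, assuming Python 3, uses the built-in `compile()` to test each bad line without running it, so the error can be caught and printed. The variable name `bad_lines` is just for this illustration.

```python
# Lines whose left side is not a variable are rejected before they run.
bad_lines = ["4 = x", "len(things) = length"]

rejected = []
for line in bad_lines:
    try:
        compile(line, "<slide>", "exec")  # parse only, don't execute
    except SyntaxError as err:
        rejected.append(line)
        print(f"{line!r} -> SyntaxError: {err.msg}")

# The okay versions: the variable being filled in is on the left.
x = 4
z = x                 # copies the value in x into z
things = [3, 4, 5]
length = len(things)  # stores 3 in length
print(x, z, length)   # 4 4 3
```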
else has no condition
The purpose of else is to cover the opposite of the condition in the previous if.
So, it doesn't make sense for an else to have a condition.

    if x > 4:
        print("x is big!")
    else:                    #okay
        print("x is small!")

    else x <= 4:             #huh?
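A small runnable version of this slide, assuming Python 3. The hypothetical helper `describe()` shows that a bare `else` already covers every `x <= 4`. Compiling the "huh?" version shows that Python treats a condition after `else` as a syntax error.

```python
def describe(x):
    """Hypothetical helper: the else branch handles everything the if didn't."""
    if x > 4:
        return "x is big!"
    else:  # okay: automatically covers every x <= 4
        return "x is small!"

print(describe(10))  # x is big!
print(describe(2))   # x is small!

# The "huh?" version: a condition after else won't even compile.
bad = 'if x > 4:\n    print("x is big!")\nelse x <= 4:\n    print("x is small!")\n'
try:
    compile(bad, "<slide>", "exec")
    bad_compiles = True
except SyntaxError:
    bad_compiles = False
print("else with a condition compiles?", bad_compiles)  # False
```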
Most functions don't change values
Most functions (and methods) don't change any values.
You have to store whatever they return into a variable (using =) for it to be useful.

    from math import sqrt

    name = "Aragorn"
    length = len(name)       #okay
    len(name)                #huh?
    x = sqrt(5)              #okay
    sqrt(5)                  #huh?
    other = name.lower()     #okay
    name.lower()             #huh?
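The check below, assuming Python 3, shows that calling `name.lower()` on its own leaves `name` exactly as it was. The lowercase copy exists only in the value the method returns, so it is lost unless you store it.

```python
from math import sqrt

name = "Aragorn"
name.lower()   # huh? the lowercase copy is made and immediately thrown away
print(name)    # Aragorn  (unchanged)

other = name.lower()  # okay: keep the returned value
length = len(name)    # okay
x = sqrt(5)           # okay
print(other, length, round(x, 3))  # aragorn 7 2.236
```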
But some functions do change things
Unfortunately, there are a few functions that do change things, and you have to remember them.
We care about sort() and anything that creates a shape.
You should never store the return value of sort() into anything.
You only need to store the return value of a shape if you're going to use it again.
Functions that change the world
Whether or not you store the return value, creating a shape makes a shape on the screen.
However, never store the return value of sort(), since it returns nothing (None).

    box(pos=(1,2,0))               #okay
    shape = box(pos=(1,2,0))       #okay
    values = [3, 9, 2, 1, 8, 7]
    values.sort()                  #okay
    values = values.sort()         #huh? values is now None
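This check uses only the list part of the slide, assuming Python 3, so it runs without a graphics library. `values.sort()` rearranges the list in place and returns `None`, so `values = values.sort()` replaces the list with `None`. If you want a sorted copy and the original left alone, the built-in `sorted()` returns a new list.

```python
values = [3, 9, 2, 1, 8, 7]
values.sort()  # okay: the list itself is now in order
print(values)  # [1, 2, 3, 7, 8, 9]

broken = [3, 9, 2, 1, 8, 7]
broken = broken.sort()  # huh? sort() returns None, so the list is lost
print(broken)           # None

original = [3, 9, 2, 1, 8, 7]
in_order = sorted(original)  # new sorted list; original is unchanged
print(original, in_order)    # [3, 9, 2, 1, 8, 7] [1, 2, 3, 7, 8, 9]
```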
List vs. number
A list is a collection of objects.
A number is a single object.
Most of the things you want to do with a list require you to index into the list (deal with a specific item in it).
Indexing uses square brackets.

    stuff = [5, 8, -4, 10, 4]
    x = stuff[3]             #okay
    y = stuff(3)             #huh?
    z = stuff + 2            #huh?
    a = 4
    b = a[2]                 #huh?
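Each "huh?" line on this slide fails at run time with a TypeError. The sketch below, assuming Python 3, wraps each one in a `lambda` so it only runs inside the `try` block. The dictionary name `attempts` is just for this illustration. The last part shows one way to do what `stuff + 2` was probably aiming for: adding 2 to each item, one at a time.

```python
stuff = [5, 8, -4, 10, 4]
a = 4

x = stuff[3]  # okay: square brackets pick out one item (10)

# Each "huh?" line from the slide, delayed in a lambda so we can catch the error.
attempts = {
    "stuff(3)": lambda: stuff(3),    # a list is not a function
    "stuff + 2": lambda: stuff + 2,  # can't add a number to a whole list
    "a[2]": lambda: a[2],            # a number has no items to index
}
failed = []
for code, attempt in attempts.items():
    try:
        attempt()
    except TypeError as err:
        failed.append(code)
        print(f"{code}: TypeError: {err}")

# To add 2 to every item, go through the list item by item.
bigger = []
for item in stuff:
    bigger.append(item + 2)
print(x, bigger)  # 10 [7, 10, -2, 12, 6]
```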
List vs. string
A list is a collection of objects.
A string is a collection of characters.
The characters in a string can't be changed.
You can call split() on a string but not on a list.

    numbers = [5, 8, -4, 10, 4]
    phrase = "Hello party people"
    item = numbers[4]              #okay
    numbers[3] = 7                 #okay
    letter = phrase[2]             #okay
    phrase[1] = 'a'                #huh?
    words = phrase.split(" ")      #okay
    stuff = numbers.split(" ")     #huh?
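The runnable sketch below, assuming Python 3, contrasts the two types. A list item can be replaced in place, but assigning to a character of a string raises TypeError. To "change" a string you build a new one and store it back. The variable name `numbers` stands in for a list; naming a variable `list` would hide Python's built-in `list`.

```python
numbers = [5, 8, -4, 10, 4]
phrase = "Hello party people"

numbers[3] = 7  # okay: lists can be changed in place
print(numbers)  # [5, 8, -4, 7, 4]

try:
    phrase[1] = 'a'  # huh? strings can't be changed
except TypeError as err:
    print("TypeError:", err)

# To "change" a string, build a new one from slices and store it.
phrase = phrase[:1] + 'a' + phrase[2:]
print(phrase)  # Hallo party people

words = phrase.split(" ")  # okay: ['Hallo', 'party', 'people']
try:
    numbers.split(" ")  # huh? lists have no split() method
except AttributeError as err:
    print("AttributeError:", err)
```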
Search Engines
Blown to Bits, Chapter 4
Hierarchy vs. search
The Internet was originally designed for hierarchical retrieval.
Businesses, government institutions, and academics were its first users.
Like a library, you had to know where to look for particular information.
But the Internet is huge, unstructured, and constantly changing.
The search engine mentality of "search, don't sort" became the only reasonable way to find information.
How any search engine works
1. Gather information
2. Keep copies
3. Build an index
4. Understand the query
5. Determine the relevance of each result
6. Rank the relevant results
7. Present the results
Gather information
First, a search engine has to gather information about all the data it is going to index.
Most search engines use spiders (also called web crawlers).
These programs constantly visit pages on the Internet.
Some domain-specific search engines only visit certain kinds of pages (like law or medicine).
Even Google can't visit everything, and it can only revisit pages so often.
Pages behind logins usually cannot be visited.
A file called robots.txt can ask spiders not to visit a site or parts of it.
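Here is a minimal sketch of a spider, assuming Python 3. To keep it runnable offline, the "web" is a hypothetical dictionary (`WEB`) mapping each URL to the links on that page, and `ROBOTS_TXT` is a made-up robots.txt for the reserved domain example.com. The crawler visits pages breadth-first and uses the standard library's `urllib.robotparser.RobotFileParser` to skip anything the robots.txt rules disallow. The user-agent name `CS113Spider` is also hypothetical.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical miniature web: each URL maps to the links found on that page.
WEB = {
    "https://example.com/": ["https://example.com/about",
                             "https://example.com/private/notes"],
    "https://example.com/about": ["https://example.com/",
                                  "https://example.com/team"],
    "https://example.com/team": [],
    "https://example.com/private/notes": ["https://example.com/private/secret"],
    "https://example.com/private/secret": [],
}

# Hypothetical robots.txt asking every spider to stay out of /private.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
"""

def crawl(start, limit=10):
    """Visit pages breadth-first, skipping URLs that robots.txt disallows."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    visited = []
    queue = deque([start])
    seen = {start}
    while queue and len(visited) < limit:
        url = queue.popleft()
        if not robots.can_fetch("CS113Spider", url):
            continue  # politely skip pages the site asked us not to visit
        visited.append(url)
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("https://example.com/"))
```

A real spider would download each page over the network and pull the links out of its HTML. The `limit` parameter stands in for the fact, noted above, that no spider can visit everything.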
Keep copies
Getting the information is the first step.
Google actually stores a copy of a huge amount of the information on the Internet.
This is called caching.
It is mostly text, but Google provides image-based search tools too.
Search engines keep copies so they can analyze the pages (and also because they can only visit them so often).
Caching allows users to see "deleted" web pages.
The Wayback Machine, run by the Internet Archive, is devoted to such viewing.
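This is a toy sketch of why a cache lets you see "deleted" pages, assuming Python 3. The dictionary `live_web` stands in for pages currently online, and the names `fetch` and `cached_copy` are hypothetical. Each successful fetch saves a copy, so the copy survives after the live page disappears.

```python
# Hypothetical live web: pages that are currently online.
live_web = {"https://example.com/news": "Election results are in."}
cache = {}

def fetch(url):
    """Return the live page (or None) and keep a copy of it if it exists."""
    page = live_web.get(url)
    if page is not None:
        cache[url] = page
    return page

def cached_copy(url):
    """Return the stored copy, even if the live page is gone."""
    return cache.get(url)

fetch("https://example.com/news")
del live_web["https://example.com/news"]       # the site deletes the page
print(fetch("https://example.com/news"))        # None
print(cached_copy("https://example.com/news"))  # Election results are in.
```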
Build an index
For searches to be fast, a search engine has to organize the data.
At Google, there are huge tables of keywords and of websites.
How many websites exist in the world?
There were 644 million active websites in 2012, according to Business Insider.
50 billion pages are indexed by Google.
The index is essentially organized in alphabetical order so that Google can jump to the right part of it to find what is needed.
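A minimal sketch of the idea, assuming Python 3 and a few hypothetical pages (`PAGES`). It builds an inverted index that maps each word to the set of pages containing it. It then keeps the keywords in alphabetical order and uses binary search (`bisect_left`) to jump straight to a word instead of scanning every keyword. A Python dictionary could look the word up directly; the sorted list is there only to mirror the "alphabetical order" idea from the slide.

```python
from bisect import bisect_left

# Hypothetical pages; a real index covers billions of them.
PAGES = {
    "a.html": "wombats dig burrows",
    "b.html": "eggplant stew recipe",
    "c.html": "wombats eat grass and roots",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

index = build_index(PAGES)
keywords = sorted(index)  # keywords in alphabetical order

def lookup(word):
    """Binary-search the sorted keywords, then return the matching pages."""
    i = bisect_left(keywords, word)
    if i < len(keywords) and keywords[i] == word:
        return sorted(index[word])
    return []

print(lookup("wombats"))  # ['a.html', 'c.html']
print(lookup("zebra"))    # []
```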
Understand the query
The prior steps must happen before you can make a query.
Once you make a query, the servers at Google have to figure out what you're asking.
Most queries are interpreted statistically and don't depend on the grammar rules of English sentences.
There are special symbols that can be used in Google searches:
Quotes for an exact phrase: "eggplant stew"
A tilde for synonyms: ~hot (Google has since retired this operator)
A minus sign to exclude results: "mail order" -bride
The site: operator to search only a particular site: wombats site:etown.edu
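A simplified sketch of how such operators could be parsed and applied, assuming Python 3. This is not Google's actual query processing. A regular expression splits the query into quoted phrases, minus-sign exclusions, a `site:` filter, and plain terms. The hypothetical `matches()` function then checks one page using plain substring tests, which is cruder than real matching (for example, "cat" would match inside "catalog"). The tilde operator is left out because it would need a synonym list.

```python
import re
from urllib.parse import urlparse

# One token: an optional leading minus, then a "quoted phrase" or a plain word.
TOKEN = re.compile(r'(?P<neg>-)?(?:"(?P<phrase>[^"]+)"|(?P<word>\S+))')

def parse_query(query):
    """Split a query into plain terms, exact phrases, exclusions, and a site."""
    parsed = {"terms": [], "phrases": [], "exclude": [], "site": None}
    for match in TOKEN.finditer(query):
        text = (match.group("phrase") or match.group("word")).lower()
        if match.group("word") and text.startswith("site:"):
            parsed["site"] = text[len("site:"):]
        elif match.group("neg"):
            parsed["exclude"].append(text)
        elif match.group("phrase"):
            parsed["phrases"].append(text)
        else:
            parsed["terms"].append(text)
    return parsed

def matches(parsed, url, text):
    """Check one page against a parsed query using simple substring tests."""
    text = text.lower()
    host = urlparse(url).hostname or ""
    site = parsed["site"]
    if site and host != site and not host.endswith("." + site):
        return False
    if any(phrase not in text for phrase in parsed["phrases"]):
        return False
    if any(word in text for word in parsed["exclude"]):
        return False
    return all(term in text for term in parsed["terms"])

print(parse_query('"mail order" -bride'))
# {'terms': [], 'phrases': ['mail order'], 'exclude': ['bride'], 'site': None}
print(parse_query("wombats site:etown.edu"))
# {'terms': ['wombats'], 'phrases': [], 'exclude': [], 'site': 'etown.edu'}
```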
Determine the relevance
A good search engine needs to figure out how relevant each page is to your query.
If a page contains many repetitions of your search terms, it is probably more relevant.
There are sophisticated methods that involve context and semantics.
For example, Google associates a query for philadelphia eagles with football.
But what if you're searching for species of eagles that live around Philadelphia?
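A minimal sketch of the simplest relevance measure from this slide, assuming Python 3: score each page by how many times the query's words appear in it. The pages are hypothetical. The result also shows the limitation the slide raises: plain counting puts the football page first, even for someone looking for birds.

```python
import re

# Hypothetical pages for illustration.
PAGES = {
    "eagles.html": "The Philadelphia Eagles won. Eagles fans celebrated in Philadelphia.",
    "birds.html": "Bald eagles nest near the Delaware River.",
}

def words_in(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance(query, text):
    """Score a page by how many times the query's words appear in it."""
    words = words_in(text)
    return sum(words.count(term) for term in words_in(query))

query = "philadelphia eagles"
for url in sorted(PAGES, key=lambda u: relevance(query, PAGES[u]), reverse=True):
    print(url, relevance(query, PAGES[url]))
# eagles.html 4
# birds.html 1
```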
Rank the results
Google can find a long list of relevant pages, but which one goes first?
The secret to Google's initial success was its PageRank algorithm.
Its rankings were significantly better than other search engines' rankings at the time.
PageRank has evolved over time, but its heart is counting how many other pages link to a page, with links from highly ranked pages counting for more.
Many other metrics are also useful, such as the relevance of the pages linked to, quality of spelling, and frequency of updates.
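A minimal sketch of the textbook form of PageRank, assuming Python 3 and a hypothetical four-page link graph. It is not Google's production ranking, which, as the slide notes, combines many other signals. Each page starts with an equal share of rank. On every round, it passes 85% of its current rank (the commonly cited damping factor of 0.85) evenly to the pages it links to. The remaining 15% is spread evenly across all pages. After enough rounds the ranks settle, and the page with the most incoming links, page C here, ends up on top.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}  # everyone starts equal
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
            else:
                # A page with no links spreads its rank over every page.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical web: A, B, and D all link to C; C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```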
Present the results
This step is relatively boring compared to the others.
Most search engines present the data in a list.
They have been adding various bells and whistles:
Previews when you hover over results
Image-based sorting of results
Who pays?
Users could pay
  This model is used for subscriber databases like the journal indexes available at our library.
  It was part of the model for providers like AOL and CompuServe.
Websites could pay
  There are ethical problems here.
  Sponsored links are Google's solution.
Government could pay
  But it usually doesn't pay directly.
Advertisers could pay
  Just like TV, ads pay for most of the Internet.
Other issues
There are things that Google doesn't index:
  For technical reasons, the spiders can't gather information from certain systems or file types.
  It is possible to ask spiders not to index your page.
  Google chooses not to index some pages.
There is an idea of the "deep web" that is not as easy to search.
Google is a big popularity contest.
Countries ban sites.
China required Google to censor the Chinese version of its search engine.
Google eventually stopped agreeing to be censored.
Tracking searches
Google tracks the most popular searches over time.
Twitter does something similar with trending topics.
In terms of privacy, it is possible to track your searches and make inferences about you.
Even without knowing which computer you're at, a lot of personal information can be recovered from searches.
Are there downsides to freely searchable information?
You have grown up with the Internet.
Free speech and widely disseminated ideas are central to both democracy and scientific advancement.
But are there downsides to information being so easy to get?
If you don't find something, you might believe it doesn't exist.
Lies and distortions spread as quickly as truth.
Whoever controls the searching controls information.
People may not try to think through an answer on their own before searching for it.
Upcoming
Next time…
Cloud computing
Amazon, Google, and Microsoft platforms
Reminders
Keep working on Project 4
Due Thursday