
The Bits Bazaar Vast amounts of information scattered across the world. Access within reach of millions of people without editors. Search engines provide.


1 The Bits Bazaar
Vast amounts of information are scattered across the world, within reach of millions of people and without editors. Search engines provide controlled access to the world's data. The web used to be a library, but not any more: it is not organized under an open policy. Web 2.0 is dynamic on both the display side and the content-generation side, and data is now generated too fast to be categorized. Sites are no longer entered through the front door. Search is the new form of control over information: whoever controls the search engine has the power to shape what you see. Gopher, AOL, and Yahoo organized information; now more than 60% of sites are reached through search. ITEC 102

2 The goals of a search engine
Search engines are brokers: their job is to connect data producers with data consumers. What they deliver may not be truthful or correct, only what consumers desire. Search results include both organic and sponsored results, and results can differ between search engines by more than 80%. Internet content is not regulated the way TV and print media are. What are the implications? The fall of hierarchy: URLs used to have meaning, but they do not necessarily have meaning now. The search engine is the modern “memex”, an augmented human memory. Yahoo dates from 1994; the memex dates from World War II, when President Roosevelt asked MIT professor Vannevar Bush to oversee science projects. Bush recognized the problem as too vast and getting worse: there was too much new information to catalog, so a memex was needed.

3 How a search engine works
The engine does NOT search the web in response to a query.
1. Gather information – crawl the web with spiders (bots). More than 90% of the web is not reached.
2. Keep copies – pages are cached. Removing a page does not remove its cached copies. Copyright issues?
3. Build an index – the URLs of visited pages are stored in a form that can be searched FAST.
4. Understand the query – natural language processing and advanced queries (Boolean logic, ~ (synonyms), - (exclude), allinurl, inanchor, site:url, etc.).
5. Determine relevance – a subjective process. Degree of relevance trumps level of recall. The search algorithm assigns each result a number indicating its relevance.
Notes: steps shown in blue on the slide are background processes. Crawling may revisit pages at different intervals depending on engine design, and it needs to avoid going in circles. Unlike a library, removing a page does not remove the copies. Index lookup: binary search vs. linear search; multi-core processing.
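The crawl, cache, index, and query steps can be illustrated with a small sketch. It is a toy under stated assumptions, not how any real engine is built: the `WEB` dictionary, its `*.example` URLs, and the function names `crawl`, `build_index`, and `search` are all hypothetical, and the "web" is an in-memory dictionary rather than real pages. It shows that a visited set keeps the spider from going in circles, that an unlinked page is never reached, and that queries are answered from the index rather than by searching the web.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# a.example and b.example link to each other, so a naive crawler would loop.
WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("spiders crawl and cache pages", ["a.example"]),
    "c.example": ("an index makes search fast", ["b.example"]),
    "d.example": ("unlinked page never reached", []),
}


def crawl(seed):
    """Breadth-first crawl from seed; the visited set prevents endless loops."""
    visited, queue, cache = set(), deque([seed]), {}
    while queue:
        url = queue.popleft()
        if url in visited or url not in WEB:
            continue
        visited.add(url)
        text, links = WEB[url]
        cache[url] = text  # keep a copy of the page
        queue.extend(links)
    return cache


def build_index(cache):
    """Inverted index: each word maps to the set of URLs whose copy contains it."""
    index = defaultdict(set)
    for url, text in cache.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Answer a query from the index only; a leading '-' excludes a word."""
    words = query.lower().split()
    include = [w for w in words if not w.startswith("-")]
    exclude = [w[1:] for w in words if w.startswith("-")]
    if not include:
        return []
    results = set.intersection(*(index.get(w, set()) for w in include))
    for w in exclude:
        results -= index.get(w, set())
    return sorted(results)


cache = crawl("a.example")
index = build_index(cache)
print(search(index, "crawl"))           # pages containing "crawl"
print(search(index, "crawl -spiders"))  # same, minus pages with "spiders"
```

Because `d.example` has no incoming links, the crawl never caches it, which is a small-scale version of the unreached portion of the web mentioned in step 1.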

4 How a search engine works – cont.
6. Determine ranking – critical to making results useful. Ranking involves more than just relevance: past search queries can affect it, and the importance and reputation of a site may play a role. “Freshness”, popularity, links in, links out, whether a keyword appears in a major or secondary heading, quality, original content, etc. are all possible ranking factors. All are a matter of opinion.
7. Present results – a list of results, which may include images.
Notes: Google’s PageRank (named after Larry Page) was an innovation. The “death penalty” and changes to the ranking algorithm affect sites.
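As a rough illustration of how "links in" and "links out" can feed a ranking, here is a minimal sketch of the simplified textbook form of PageRank, computed by repeated iteration. It is not Google's production algorithm. The damping value of 0.85, the iteration count, the function name `pagerank`, and the four-page `LINKS` graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page splits its score among the pages it links to.

    links maps every page to the list of pages it links out to; every link
    target must itself be a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links in.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links out spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))
```

In this graph page "c" collects the most incoming links and ends up ranked first, while "d", which nothing links to, keeps only the baseline score. Real rankings combine many more factors, as the slide notes.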

5 Who pays and for what?
Sorting used to be the number one function computers were used for; now it is searching. More than 90% of online adults use search engines. Who pays for this “utility”? Users? Web sites? Governments? Advertisers? Governments and universities funded most early search innovations. Advertising: pay per click and auctioning keywords; screen real estate is limited. The FTC requested that search engines distinguish between organic and sponsored links. AOL, CompuServe, NY Times, Overture – 1998 – crossed a presumed ethical line: selling rank for money and pay per click (PPC).

6 AdWords – a breakthrough
Google created a viable business model by auctioning keywords: pay per impression vs. pay per click. This greatly affected traditional advertising. Algorithmic searching is biased and can be changed at any time. Most search engines use biased algorithms, making judgment calls on your behalf. There is only about 12% overlap between search engine results. The search engine optimization (SEO) industry is heavily used. The “deep web” is generally not visible via search results. Google in China: illegal and harmful messages are censored. Is that different from the U.S.?
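The difference between paying per impression and paying per click, and the effect of limited ad slots in a keyword auction, can be worked through with a little arithmetic. The sketch below is illustrative only. The advertiser names, bids, prices, and click rate are made-up numbers, and the "highest bids win the slots" rule is a simplifying assumption rather than a description of how AdWords actually ranks or prices ads.

```python
def run_keyword_auction(bids, slots):
    """Return the (advertiser, bid) pairs that win the limited ad slots, highest bid first."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:slots]


def pay_per_impression_cost(impressions, price_per_thousand):
    """Advertiser pays every time the ad is shown, clicked or not."""
    return impressions / 1000 * price_per_thousand


def pay_per_click_cost(impressions, click_rate, price_per_click):
    """Advertiser pays only when a viewer actually clicks the ad."""
    return impressions * click_rate * price_per_click


# Hypothetical bids (dollars per click) on one keyword, with room for two ads.
bids = {"shoes-r-us": 0.40, "sneakerhub": 0.55, "bootbarn": 0.25}
winners = run_keyword_auction(bids, slots=2)
print(winners)

# Hypothetical campaign: 10,000 impressions, 2% of viewers click.
impression_cost = pay_per_impression_cost(10_000, price_per_thousand=5.0)
click_cost = pay_per_click_cost(10_000, click_rate=0.02, price_per_click=0.55)
print(round(impression_cost, 2), round(click_cost, 2))
```

Under pay per click the advertiser's cost tracks how many people respond to the ad, not how many see it, which is one reason the model differs from traditional advertising.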

7 The Search Business
Everything you search for can be stored. The best search engine would understand what you want and provide exactly that. Tracked data influences future search results. Search engines control our access to information. Information access has a higher market value than information creation. Anyone can generate information on the web, so we must weigh the “wisdom of the crowd” against experts.

8 Evolution of Internet Use
Sources: Cisco estimates based on CAIDA publications, Andrew Odlyzko.
Is the web fading away?

