CSC 102 Lecture 12 Nicholas R. Howe

Presentation transcript:

Web Search
CSC 102 Lecture 12, Nicholas R. Howe

Data Collection
- Web crawler/bot/spider: traverses links and collects pages
  - Most of the web is a single clump
  - Some pages are deliberately omitted (databases, etc.)
- How often to update?
  - Google: is once an hour fast enough?
- Archived on huge server farms
  - 2006: 850 TB on 25K servers
  - Library of Congress = 20 TB

Search Model
- User provides a keyword query
- Search provider retrieves and ranks relevant pages
- Critical factors: relevance of results, speed
- Ads are also served based upon relevance
  - Advertisers bid for keywords, pay for clicks
  - Google chooses which ads to display based on expected revenue (expected clicks × price bid)
Q. How to judge relevance automatically?
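The expected-revenue rule for ads can be sketched in a few lines. This is only an illustration of "expected clicks × price bid": the `Ad` class, the advertiser names, the bids, and the click rates are all made up, and a real ad auction involves much more than this.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str          # hypothetical advertiser label
    bid: float         # price the advertiser pays per click
    click_rate: float  # assumed estimate of clicks per time the ad is shown


def expected_revenue(ad: Ad) -> float:
    # Expected clicks per showing times the price bid per click.
    return ad.click_rate * ad.bid


def choose_ads(ads, slots):
    # Show the ads with the highest expected revenue first.
    return sorted(ads, key=expected_revenue, reverse=True)[:slots]


ads = [
    Ad("florist", bid=2.00, click_rate=0.01),
    Ad("garden center", bid=0.50, click_rate=0.10),
    Ad("seed catalog", bid=1.00, click_rate=0.03),
]
chosen = choose_ads(ads, slots=2)
print([ad.name for ad in chosen])
```

Note that the highest bidder (the florist) is not chosen here, because its ad is rarely clicked.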

Ranking: Bag of Words
- Dominant method B.G. (Before Google)
- Concept: pages are ranked by frequency of keyword appearances
  - Context not considered
  - Word order not considered
  - Pages can boost their rank by including keyword lists
- All forms converted to word stem
  - runs, runner, running → run-
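A minimal sketch of bag-of-words ranking follows. The suffix list in `stem` is a toy assumption that happens to handle the slide's example (runs, runner, running); real stemmers are far more careful. The page texts are invented.

```python
import re
from collections import Counter

# Toy suffix list (an assumption for this sketch, not a real stemming algorithm).
SUFFIXES = ("ning", "ner", "ing", "s")


def stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip a suffix if at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    # Count stems; word order and context are thrown away.
    return Counter(stem(w) for w in re.findall(r"[a-z]+", text.lower()))


def score(page_text, query):
    bag = bag_of_words(page_text)
    return sum(bag[stem(term)] for term in query.split())


def rank(pages, query):
    return sorted(pages, key=lambda name: score(pages[name], query), reverse=True)


pages = {
    "training": "Running tips for every runner: how a runner runs faster.",
    "recipes": "A quick dinner that runs under ten minutes.",
    "spam": "run run run run run run run run",
}
print(rank(pages, "running"))
```

The keyword-list page ("spam") comes out on top, which is exactly the weakness the slide points out.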

Query Augmentation
- What about pages with words related to a query?
  - “Sports” vs. “Athletics”
  - “Roses” vs. “Flowers”
- Query augmentation:
  1. Initial retrieval on query (results not shown to user)
  2. Identify common words in top pages and add to query
  3. Display results from augmented query
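The three augmentation steps can be sketched as follows. The scoring (plain term counts), the stopword list, the page texts, and the parameters `top_pages` and `extra_terms` are assumptions chosen to keep the example small.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "and", "of", "in", "for", "to", "is", "are"}


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def retrieve(pages, query_terms):
    """Rank page names by how often the query terms occur; drop non-matching pages."""
    scored = []
    for name, text in pages.items():
        counts = Counter(words(text))
        total = sum(counts[term] for term in query_terms)
        if total > 0:
            scored.append((total, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


def augment(pages, query_terms, top_pages=2, extra_terms=1):
    # Step 1: initial retrieval (these results are not shown to the user).
    first_pass = retrieve(pages, query_terms)[:top_pages]
    # Step 2: find the most common new words in the top pages.
    common = Counter()
    for name in first_pass:
        common.update(
            w for w in words(pages[name])
            if w not in STOPWORDS and w not in query_terms
        )
    return list(query_terms) + [w for w, _ in common.most_common(extra_terms)]


pages = {
    "p1": "sports news: athletics results and sports scores",
    "p2": "college sports and athletics schedule",
    "p3": "athletics club track meeting",
    "p4": "rose garden flowers",
}
augmented = augment(pages, ["sports"])
# Step 3: display results from the augmented query.
print(augmented, retrieve(pages, augmented))
```

Page p3 never mentions "sports", yet it is found once "athletics" has been added to the query.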

Authority-Based Search
- Bag-of-Words is bad at identifying useful pages
  - Blather not useful; keyword lists not useful
  - Need a new way to identify good pages!
- Idea: harness existing human knowledge
  - Useful pages attract many links
  - Authority: many pages point to it
  - Hub: points to many authorities
  - Rerank pages with authorities at top
http://www.prchecker.info/check_page_rank.php
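One way to turn the hub and authority definitions into numbers is to let the two scores reinforce each other over several rounds, an approach usually called HITS. The slide does not spell out a procedure, so the update rule, the normalization, the round count, and the tiny link graph below are all assumptions for illustration.

```python
def hubs_and_authorities(links, rounds=20):
    """links maps each page name to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(rounds):
        # Authority: sum of the hub scores of the pages pointing to it.
        auth = {
            p: sum(hub[q] for q in pages if p in links.get(q, ()))
            for p in pages
        }
        # Hub: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Rescale so the scores stay bounded from round to round.
        auth_total = sum(auth.values()) or 1.0
        hub_total = sum(hub.values()) or 1.0
        auth = {p: v / auth_total for p, v in auth.items()}
        hub = {p: v / hub_total for p, v in hub.items()}
    return hub, auth


# Hypothetical four-page web.
links = {
    "directory": ["guide", "faq"],
    "blog": ["guide"],
    "guide": [],
    "faq": [],
}
hub, auth = hubs_and_authorities(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Here "guide" ends up as the top authority (two pages point to it) and "directory" as the top hub (it points to both authorities).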

PageRank Algorithm
- All pages start with equal PageRank
- Each page keeps a little (15%) and splits the rest among the pages it links to
- After many rounds, well-linked pages have high rank
- Poorly linked pages have low rank
Example values (from the slide's diagram): A 0.15, B 0.15, C 0.15, D 0.58, E 1.85, F 0.58, G 0.70, H 1.30, I 1.00
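The rounds described above can be sketched as a short loop. This sketch assumes one common simplified reading: every page receives a base of 0.15 each round, plus 85% of the rank of each page linking to it, split evenly among that page's links. The link graph is hypothetical (the slide's own diagram is not reproduced here), and pages without outgoing links simply pass nothing on.

```python
def pagerank(links, rounds=50, keep=0.15):
    """links maps each page name to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 for p in pages}  # all pages start with equal rank
    for _ in range(rounds):
        new_rank = {p: keep for p in pages}  # the small base amount
        for page, targets in links.items():
            if targets:
                # Split the remaining 85% evenly among outgoing links.
                share = (1.0 - keep) * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical five-page web: A and B link to D; C and D link to E.
links = {"A": ["D"], "B": ["D"], "C": ["E"], "D": ["E"], "E": []}
rank = pagerank(links)
for page in sorted(rank):
    print(page, round(rank[page], 3))
```

Under this reading, pages that nobody links to settle at exactly 0.15, which is at least consistent with the values shown for A, B, and C.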

Why Newsgroup Spam?
- A link from a site with high PageRank can lift a web site out of obscurity
  - Businesses will pay for higher rankings
  - Consultants raise rankings for pay
- Posts on newsgroup sites can include links
- Defense is the CAPTCHA
  - “Completely Automated Public Turing test to tell Computers and Humans Apart”

The nofollow Attribute
- Google introduced a new value for the rel attribute on links:
  <a href="link.html" rel="nofollow">link</a>
- Indicates that the link should not count for authority analysis
- Newsgroups & discussion boards can add this attribute to all embedded links
  - Apparently many do not
- Also allows link shaping: intentionally emphasizing certain links/sites over others
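A crawler that respects nofollow might separate links roughly like this. The sketch uses Python's standard `html.parser` module and assumes the standard `rel="nofollow"` form of the markup; the class name `LinkCollector` and the sample HTML are invented, and how a real search engine treats such links is not covered.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sorts links into those that count for authority and those that do not."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.ignored.append(href)
        else:
            self.followed.append(href)


html = (
    '<p><a href="article.html">article</a> '
    '<a href="spam.html" rel="nofollow">cheap pills</a></p>'
)
collector = LinkCollector()
collector.feed(html)
print(collector.followed, collector.ignored)
```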

Smart Search
- You’ve probably been doing web searches all your life
- What strategies do you use when faced with a difficult search problem?
[Discuss]

Search Strategies
- General advice
  - Consider type of query in light of goal
  - Switch strategies when an approach is not working
  - Assess credibility of all sources!
- Specific techniques
  - Add keywords (Christ → Christ Smith)
  - Use quotes for phrases (“Carol Christ”)
  - Exclusion/inclusion (Christ -Jesus +Carol)
  - Advanced Search offers many other options

Search Variants
- Local Search is restricted by geography
  - Include zip code or address in search terms
  - Returns hits ranked by relevance and location
- Image Search returns images related to query
  - Based upon surrounding words, not on actual image appearance
  - Query By Image Content is more difficult
  - Data gathering: Google Image Labeler

Wolfram Alpha
- Combination of search engine and automatic almanac
  - Pulls information off the web & reformats it
  - Can compute some answers from others
- Examples: ASCII, blood donation
http://www.wolframalpha.com/

Lab
- Try out the lab on search methods
