

Search Engines and their Public Interfaces: Which APIs are the Most Synchronized?

Frank McCown and Michael L. Nelson
Department of Computer Science, Old Dominion University, Norfolk, Virginia, United States

Researchers: Screen scrape or use the APIs?
Web User Interface (WUI)
Application Programming Interface (API)

5 Month Experiment
Late May to Oct 2006:
1. General search terms. Queried for the top 100 results and total results using 50 popular search terms and 50 computer science (CS) terms.
2. URL backlinks. Queried for the number of backlinks (inlinks) to 100 randomly selected URLs.
3. Pages indexed for a website. Asked how many pages were indexed for 100 randomly selected websites.
4. URL indexing and caching. Queried to see if 100 randomly selected URLs were indexed and/or cached.

Comparing Search Results
1. Overlap (P)
2. Kendall tau for top k results (K) [1]
3. Penalize changes at the top (M) [2]

[1] R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. SIAM Journal on Discrete Mathematics, 17(1):134–160.
[2] J. Bar-Ilan, M. Mat-Hassan, and M. Levene. Methods for comparing rankings of search engine results. Computer Networks, 50(10):1448–1463, July.

Examples
Lists ABCD and EDAF: P = 0.50, K = 0.44, M = 0.14
Lists ABCD and AEDF: P = 0.50, K = 0.56, M = 0.66 (more similar)

Comparing WUI to WUI & API to API on Successive Days
BUT are the APIs pulling from older and smaller indexes?

Terms of Service: “You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers…)”
Terms of Service: “You may not… use any automated process or service to access and/or use the service (such as a BOT, a spider, …)”

“This is the 3rd result out of about 24,100,000.” “No, it’s the 10th result out of about 16,300,000!”
“There are 2,911 pages indexed.” “I see only 1,740.”
“The URL is indexed and cached.” “It’s missing entirely from my index.”
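
The three measures listed under Comparing Search Results can be sketched in a few lines of Python. This is an illustrative reading, not the authors' code: the function names are hypothetical, both lists are assumed to have the same length k with no duplicates, K is taken as Fagin et al.'s top-k Kendall distance with penalty 0 for pairs whose order cannot be inferred, rescaled as 1 - distance / k^2, and M is taken as 1 minus the normalized sum of reciprocal-rank differences, with missing items treated as rank k + 1. These choices were picked because they reproduce the example values on the poster.

```python
from itertools import combinations


def overlap(a, b):
    """P: fraction of items the two top-k lists share."""
    return len(set(a) & set(b)) / len(a)


def kendall_topk(a, b):
    """K: 1 - (top-k Kendall distance, penalty p = 0) / k^2 (assumed scaling)."""
    k = len(a)
    ra = {x: i for i, x in enumerate(a)}
    rb = {x: i for i, x in enumerate(b)}
    penalty = 0
    for i, j in combinations(sorted(set(a) | set(b)), 2):
        i_a, j_a, i_b, j_b = i in ra, j in ra, i in rb, j in rb
        if i_a and j_a and i_b and j_b:
            # Both items ranked in both lists: penalize opposite order.
            if (ra[i] - ra[j]) * (rb[i] - rb[j]) < 0:
                penalty += 1
        elif i_a and j_a and (i_b or j_b):
            # Both in a, one in b: the missing item is implicitly ranked
            # below the present one in b, so a must agree.
            present, absent = (i, j) if i_b else (j, i)
            if ra[absent] < ra[present]:
                penalty += 1
        elif i_b and j_b and (i_a or j_a):
            present, absent = (i, j) if i_a else (j, i)
            if rb[absent] < rb[present]:
                penalty += 1
        elif (i_a and j_a) or (i_b and j_b):
            # Both in one list, neither in the other: penalty p = 0.
            pass
        else:
            # Each item appears in a different list only.
            penalty += 1
    return 1 - penalty / (k * k)


def m_measure(a, b):
    """M: weights rank changes by 1/rank, so changes at the top cost more."""
    k = len(a)
    ra = {x: i + 1 for i, x in enumerate(a)}
    rb = {x: i + 1 for i, x in enumerate(b)}
    total = 0.0
    for x in set(a) | set(b):
        wa = 1 / ra[x] if x in ra else 1 / (k + 1)
        wb = 1 / rb[x] if x in rb else 1 / (k + 1)
        total += abs(wa - wb)
    # Largest possible total: two completely disjoint lists.
    worst = 2 * sum(1 / r - 1 / (k + 1) for r in range(1, k + 1))
    return 1 - total / worst


for other in ("EDAF", "AEDF"):
    a, b = list("ABCD"), list(other)
    print(other, f"P={overlap(a, b):.2f} K={kendall_topk(a, b):.2f} M={m_measure(a, b):.2f}")
# EDAF P=0.50 K=0.44 M=0.14
# AEDF P=0.50 K=0.56 M=0.66
```

Both example pairs share the same two items, so P cannot tell them apart. K and M both rate the second pair as more similar, and M separates them most strongly because A keeps its top position.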

Comparing WUI to API
For all 3 search engines, the WUI & API are most synchronized on the same day.
Yahoo is less synchronized for CS terms.
Google is less synchronized for popular terms.
MSN is mostly synchronized.
Examples for “algorithm”

Loose Disagreements
Total search results: How many total results ARE there?
Total backlinks: Whose backlink counts are correct?
Total indexed pages per website: Google’s API shows fewer pages indexed.

Indexed / Cached Disagreements
Yahoo seems to be confused.
Google & Yahoo might be pulling from smaller indexes.

Other research projects at Old Dominion University:
Lazy Preservation: Reconstructing Websites from the Web Infrastructure
mod_oai: An Apache Module for Efficient, Automatic Web Harvesting

See Also
Frank McCown and Michael L. Nelson. Agreeing to Disagree: Search Engines and their Public Interfaces. ACM IEEE Joint Conference on Digital Libraries (JCDL 2007). To appear.

All graphs:
Complete data set available upon request.