Internet Search Engine freshness by Web Server help Presented by: Barilari Alessandro.

Mining di Dati Web, Alessandro Barilari

Introduction
Search engines are an important source of information, and keeping them up to date yields more accurate answers to search queries. Today, search engines build their databases by probing web servers on a per-URL basis, with little help from the web servers.

Main Problem
There is no standard for pushing updates from web servers to search engines:
– It can take up to six months for a new page to be indexed by popular web search engines;
– The data indexed by the search engines is often stale.

Solution…
Help from web servers in keeping search engines fresh benefits web sites, search engines, and users alike.

…and its problems
The number of updates per second is very large. A scheme must balance:
– The number of interactions between web sites and search engines, and
– The freshness of the search engines.

Page rank impact
Popular pages have higher page ranks:
– Use popularity, in addition to age and freshness, to compute the mismatch between a web site and a search engine.

Summary
– Definitions and Cost Model
– Algorithm
– Analysis
– Practical issues

Some definitions
– Update: an update u to a file f is a modification to f that has been flushed to disk.
– Propagation of an update: an update is said to be propagated when the web site has informed the search engine about it. The search engine may or may not retrieve that update.
– Meta-update propagation: at any time t, let U(t) be the set of unpropagated updates. In a meta-update propagation, the web site informs the search engine about all the updates in U(t).

Some definitions (2)
– Weight of a file: given a content file f, its weight $w_f$ (non-negative) denotes the importance of the file; the weights are chosen such that: [formula missing from the transcript]
– Last_modification_time(u,t): the last time before t at which the file f(u) was updated.

The Cost Model
Components:
– Communication cost;
– Opportunity cost: represents the staleness of the search engine data compared with the data on the web server.
CPU cost is ignored.

Opportunity cost (OC)
Given an unpropagated update u to a content file f(u), the opportunity cost of update u at time t is:
$OC(u,t) = w_{f(u)} \times (t - \mathrm{last\_modification\_time}(u,t))$
where $w_{f(u)}$ is the weight of file f(u).
Definition for meta-update propagation: [formula missing from the transcript]
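The per-update opportunity cost defined above can be illustrated with a minimal Python sketch. The names `Update`, `file_weight`, `last_modified`, and `opportunity_cost` are hypothetical and chosen only for this example; time is assumed to be measured in seconds.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """An unpropagated update u (hypothetical record, not from the slides)."""
    file_weight: float    # weight of the content file f(u), non-negative
    last_modified: float  # last_modification_time(u, t), in seconds


def opportunity_cost(update: Update, t: float) -> float:
    """OC(u, t): weight of f(u) times the time elapsed since the last modification."""
    if update.file_weight < 0:
        raise ValueError("file weights must be non-negative")
    if t < update.last_modified:
        raise ValueError("t must not precede the last modification time")
    return update.file_weight * (t - update.last_modified)


# A file with weight 0.25, last modified at t = 100 and still unpropagated at t = 160.
u = Update(file_weight=0.25, last_modified=100.0)
print(opportunity_cost(u, 160.0))  # 0.25 * 60 = 15.0
```

The cost grows linearly with the time an update stays unpropagated, and faster for files with a larger weight.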

Communication cost (CC)
$\mathrm{size}_{f(u)}(t)$: the size of file f(u) at time t.

Potential Communication cost (PCC)
Represents the communication cost that would be incurred if update u were propagated after time t: [formula missing from the transcript]

The Cost Function
Given that an update u is unpropagated at time t, the cost function for that update at time t is given by: [formula missing from the transcript]

FreshFlow Algorithm
When the total opportunity cost $OC_{tot}$ equals the total potential communication cost $PCC_{tot}$ at any time t, the web server informs the search engine about all the unpropagated updates.
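The trigger rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the slides' formulas for the totals and for PCC are not available here, so the sketch assumes that both totals are plain sums over the unpropagated updates and that the potential communication cost of an update is proportional to the current file size. `PendingUpdate`, `COST_PER_BYTE`, and `should_propagate` are hypothetical names. Because time advances in discrete steps in a program, the test uses "greater than or equal" instead of strict equality.

```python
from dataclasses import dataclass
from typing import List

COST_PER_BYTE = 0.001  # hypothetical constant, not taken from the slides


@dataclass
class PendingUpdate:
    weight: float         # weight of the updated file (non-negative)
    last_modified: float  # last modification time of the file
    size_bytes: int       # current size of the file


def oc_total(pending: List[PendingUpdate], t: float) -> float:
    # Assumption: the total opportunity cost is the sum of per-update costs.
    return sum(u.weight * (t - u.last_modified) for u in pending)


def pcc_total(pending: List[PendingUpdate]) -> float:
    # Assumption: potential communication cost is proportional to file size.
    return sum(COST_PER_BYTE * u.size_bytes for u in pending)


def should_propagate(pending: List[PendingUpdate], t: float) -> bool:
    """True once the accumulated staleness cost has caught up with the
    cost of notifying the search engine about every pending update."""
    if not pending:
        return False
    return oc_total(pending, t) >= pcc_total(pending)


pending = [PendingUpdate(weight=0.5, last_modified=0.0, size_bytes=10_000)]
for t in (10.0, 30.0):
    print(t, should_propagate(pending, t))
```

Waiting until the staleness cost matches the cost of communicating is the same balancing idea used in rent-or-buy style online algorithms, which is what makes a constant competitive ratio plausible.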

Analysis
The cost of the FreshFlow algorithm (FF) is compared with the cost of an optimal off-line algorithm (the adversary, ADV).

Analysis (2)
– Lemma 1: OC(u,t) is monotonically non-decreasing in t.
– Lemma 2: consider an update u to a file f that FF transmits but ADV does not. Then $OC_{ADV}(u,t) \ge OC_{FF}(u,t)$.
– Lemma 3: if the update is transmitted by the adversary (ADV), then $CC_{ADV}(u,t)$ [relation symbol missing from the transcript] $CC_{FF}(u,t)$.

Theorem
FF is 2-competitive: $Cost_{FF}(u,t) \le 2 \times Cost_{ADV}(u,t)$

Practical issues
There are multiple search engines:
– Synchronization effect: pushing the updates to all of them at once would put pressure on the last-hop link to the web server;
– Search engine load: some search engines might refuse to receive updates.

The middleman approach
– Each web server contacts only one middleman to send its updates;
– The middleman could be a group of middlemen.

Benefits
The middleman can solve some additional issues:
– Verifying the trustworthiness of web servers;
– Restricting the rate at which updates are transmitted to search engines.
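One way a middleman might restrict the forwarding rate is sketched below. The slides do not describe a mechanism, so everything here is an assumption made for illustration: the `Middleman` class, its `max_per_tick` limit, and the idea of forwarding notifications in fixed-size batches per time step are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Notification = Tuple[str, str]  # (web server name, updated URL)


class Middleman:
    """Queues update notifications and forwards a bounded number per tick."""

    def __init__(self, max_per_tick: int) -> None:
        if max_per_tick <= 0:
            raise ValueError("max_per_tick must be positive")
        self.max_per_tick = max_per_tick
        self.queue: Deque[Notification] = deque()

    def receive(self, server: str, urls: List[str]) -> None:
        # Web servers push their unpropagated updates to the middleman only.
        for url in urls:
            self.queue.append((server, url))

    def flush_tick(self) -> List[Notification]:
        # Forward at most max_per_tick notifications, oldest first.
        batch: List[Notification] = []
        while self.queue and len(batch) < self.max_per_tick:
            batch.append(self.queue.popleft())
        return batch


m = Middleman(max_per_tick=2)
m.receive("a.example", ["/x", "/y", "/z"])
print(m.flush_tick())  # first two notifications
print(m.flush_tick())  # the remaining one
```

A first-in, first-out queue keeps the oldest updates from being starved; a real deployment would also need the trust checks mentioned in the slide, which this sketch leaves out.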

Limitations
– The algorithm has not been used in practice.
– Search engines need the cooperation of web servers to keep track of updates to their URLs. Whether web servers will incorporate such a service remains to be seen.

Conclusions
– The FreshFlow algorithm improves how up to date search engine data is, while maintaining a high level of efficiency and performance.
– The authors plan to implement the algorithm in a real system (and have a future publication!).