1
The Designing of Web Services to Deliver Web Documents Associated with Historical Links
David Chao
College of Business
San Francisco State University
2
Historical Links
The historical links of a web site include the URLs invalidated due to:
–web site reorganization
–document removal, renaming, or relocation
and links to document snapshots:
–a document’s contents as of a specific point in time
3
Benefits of Maintaining Historical Links
Support applications that require historical data:
–Trend analysis
Audit web page content as of a specific point in time.
Preserve web site content.
Retrieve web pages using old links:
–Bookmarks, favorites, URLs published in books
4
Related Research
Archiving and preserving web sites:
–Internet Archive Wayback Machine
–Date/time-stamped web site snapshots
–Unable to meet users’ need for different snapshot times
Versioning:
–Unable to track organizational changes to a page (renaming, relocation)
Search engines:
–Unable to find unpublished old pages and web page snapshots
5
Objectives of This Research
Track changes to a web page using a log:
–Content changes: insertion, deletion, modification
–Organizational changes: renaming, relocation
Retrieve web pages using historical links.
Deliver the historical documents using web services.
6
Effects of Web Site Reorganization and Web Document Changes
Web site action: effect on current and historical links
–Adding a new document: a new URL is added to the current links.
–Modifying a document: no change to the URL; the old document becomes a snapshot and is archived.
–Deleting a document: the current link is deleted and becomes a historical link; the document is archived.
–Renaming a document: a new URL is added to the current links; the old URL becomes a historical link.
–Relocating a document: a new URL is added to the current links; the old URL becomes a historical link.
–Reorganization: the new URLs of all affected documents are added to the current links; their old URLs become historical links.
7
The M:M Relationship Between A URL And A Web Document
9
Logging Scheme
The log, named TemporalURLLog, is designed to keep the history of changes to web documents. It has five fields:
–URL: the document’s URL
–PublishDate: the time the document was published
–DocExpireDate: the time the document expired
–URLExpireDate: the time the URL expired
–NewURL: the document’s new URL, if any
Note: A document may continue to exist under a new URL after its old URL has expired.
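As an illustration only, one TemporalURLLog entry could be modeled as a small record type. The slides name the fields but not their types, so the Python names, the use of `datetime`, and the `is_current` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogEntry:
    """One row of the TemporalURLLog (field types are assumed)."""
    url: str                                    # URL
    publish_date: datetime                      # PublishDate
    doc_expire_date: Optional[datetime] = None  # DocExpireDate, null while the document is live
    url_expire_date: Optional[datetime] = None  # URLExpireDate, null while the URL is valid
    new_url: Optional[str] = None               # NewURL, set only after a rename or relocation

    def is_current(self) -> bool:
        # Hypothetical helper: a row is current when neither expiry date is set.
        return self.doc_expire_date is None and self.url_expire_date is None


entry = LogEntry(url="/p1.html", publish_date=datetime(2005, 1, 1))
```

A freshly published document has only URL and PublishDate filled in, which matches the "new document" case of the maintenance algorithm.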
10
TemporalURLLog Maintenance Algorithm
New document: An entry is added with its URL and PublishDate; DocExpireDate, URLExpireDate, and NewURL are null.
Deleted document: DocExpireDate and URLExpireDate are set to the time the document is deleted.
Modified document: DocExpireDate is set to the time the document is modified, and URLExpireDate is left null. Then a new entry is added with the same URL and with PublishDate set to the modification time; DocExpireDate, URLExpireDate, and NewURL are null.
Renamed or relocated document: URLExpireDate is set to the time of the renaming or relocation, and NewURL is set to the document’s new URL. Then a new entry is added with the new URL and with PublishDate set to the time the document is relocated or renamed.
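The four maintenance rules can be sketched as methods on an in-memory log. This is a minimal sketch, not the author's implementation: the class and method names are hypothetical, time stamps are plain integers standing in for T0, T1, and so on, and "the entry to update" is assumed to be the row for that URL whose two expiry dates are both null.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogEntry:
    url: str
    publish_date: int
    doc_expire_date: Optional[int] = None
    url_expire_date: Optional[int] = None
    new_url: Optional[str] = None


class TemporalURLLog:
    def __init__(self) -> None:
        self.entries: List[LogEntry] = []

    def _current(self, url: str) -> LogEntry:
        # Assumption: the live row for a URL has both expiry dates null.
        for e in self.entries:
            if e.url == url and e.doc_expire_date is None and e.url_expire_date is None:
                return e
        raise KeyError(f"no current entry for {url}")

    def add(self, url: str, now: int) -> None:
        # New document: only URL and PublishDate are filled in.
        self.entries.append(LogEntry(url, now))

    def delete(self, url: str, now: int) -> None:
        # Deleted document: both expiry dates are set to the deletion time.
        e = self._current(url)
        e.doc_expire_date = now
        e.url_expire_date = now

    def modify(self, url: str, now: int) -> None:
        # Modified document: expire the old version, keep the URL valid,
        # then open a new row for the same URL.
        e = self._current(url)
        e.doc_expire_date = now
        self.entries.append(LogEntry(url, now))

    def rename(self, url: str, new_url: str, now: int) -> None:
        # Renamed or relocated document: expire the old URL, record the new
        # one, then open a row under the new URL.
        e = self._current(url)
        e.url_expire_date = now
        e.new_url = new_url
        self.entries.append(LogEntry(new_url, now))


log = TemporalURLLog()
log.add("P5", 1)            # published at T1
log.modify("P5", 3)         # modified at T3
log.rename("P5", "P7", 4)   # renamed to P7 at T4
```

After these three calls the log holds three rows: the expired first version of P5, the second version of P5 with its URL expired and NewURL set to P7, and the live row for P7.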
12
Archiving Scheme
Deleted document: The deleted document is saved in the Archive with URL + PublishDate as its file name.
Modified document: The old version is saved in the Archive with URL + PublishDate as its file name.
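A possible way to build the archive file name from URL + PublishDate is sketched below. The slides only state that the two values are combined; the percent-encoding of the URL, the underscore separator, and the timestamp format are assumptions chosen so that the result is usable as a file name.

```python
from datetime import datetime
from urllib.parse import quote


def archive_file_name(url: str, publish_date: datetime) -> str:
    """Combine URL and PublishDate into a single archive file name."""
    # Percent-encode every reserved character (including "/") so the URL
    # cannot be mistaken for a directory path.
    safe_url = quote(url, safe="")
    # Compact timestamp; the exact format is an assumption.
    stamp = publish_date.strftime("%Y%m%dT%H%M%S")
    return f"{safe_url}_{stamp}"


name = archive_file_name("/docs/p2.html", datetime(2005, 1, 2, 3, 4, 5))
```

Because a URL can be published more than once over time, including the PublishDate keeps each archived version distinct.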
13
With the scheme we can determine:
–The historical link P1 has been renamed to P8, and the current link P1 points to a new document.
–The URL P2, valid between T0 and T1, has been deleted, and the document it pointed to is in the Archive under the name P2T0.
–The document at URL P3 was modified repeatedly and eventually deleted. All documents associated with P3 can be found in the Archive.
–The old URL P5 has been renamed to P7. Its document was modified at T3, and a copy of the earlier snapshot can be found in the Archive under the name P5T1.
–The URL P12 has never existed in the web site.
14
Design of Web Services to Retrieve Web Documents
Scenario 1: The user submits a URL and gets the current version of the document, but would like to view its previous versions.
Scenario 2: The user submits a URL and gets a file-not-found error. The service should:
–check whether the document ever existed in this web site
–retrieve the document associated with the invalid URL if it now has a new URL
–retrieve deleted documents from the Archive
Scenario 3: The user submits a URL and gets an unrelated document. In this scenario, the URL is now associated with a different document.
15
Web Service for Scenario 1
This web service offers two methods:
–1. RetrieveSnapshotAsOf method: This method takes a valid URL and the snapshot valid time as inputs and returns the link to the snapshot at the specified time.
–2. RetrieveAllSnapshots method: This method retrieves all links to snapshots of a current document.
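The two Scenario 1 methods could be sketched as plain functions over the log. The slides do not give signatures or return formats, so the following are assumptions: time stamps are integers, the archive link is written in the slides' P3T0 style (URL, then "T", then PublishDate), and only rows stored under the submitted URL are considered.

```python
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    url: str
    publish_date: int
    doc_expire_date: Optional[int] = None
    url_expire_date: Optional[int] = None
    new_url: Optional[str] = None


def archive_name(e: LogEntry) -> str:
    # Archive name in the slides' notation, e.g. "P3T0".
    return f"{e.url}T{e.publish_date}"


def retrieve_snapshot_as_of(log: List[LogEntry], url: str, as_of: int) -> Optional[str]:
    """Link to the version of `url` that was live at time `as_of`."""
    for e in log:
        if e.url != url or e.new_url is not None:
            continue  # other URLs and renamed rows are outside this sketch
        if e.publish_date <= as_of and (e.doc_expire_date is None or as_of < e.doc_expire_date):
            # The live version is served from the URL itself; older ones from the Archive.
            return url if e.doc_expire_date is None else archive_name(e)
    return None


def retrieve_all_snapshots(log: List[LogEntry], url: str) -> List[str]:
    """Archive links of all expired versions of `url`, oldest first."""
    rows = [e for e in log if e.url == url and e.doc_expire_date is not None]
    return [archive_name(e) for e in sorted(rows, key=lambda e: e.publish_date)]


# Sample log: P3 published at T0, modified at T2 and T5, still live.
LOG = [
    LogEntry("P3", 0, doc_expire_date=2),
    LogEntry("P3", 2, doc_expire_date=5),
    LogEntry("P3", 5),
]
```

With this sample log, a request for P3 as of T1 resolves to the archived snapshot P3T0, while a request for any time from T5 onward resolves to the live URL.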
16
Web Service for Scenario 2
This web service contains four methods:
–1. IsURLEverExist method
–2. IsURLValidAsOf method
–3. FindCurrentURL method: This method traces the URL’s changes to locate the current URL if the document is still published.
–4. FindLinkToOldDocument method: This method retrieves the old document associated with a historical link valid as of a user-specified time.
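A rough sketch of the four Scenario 2 methods follows. The slides give only the method names, so the signatures are assumptions; in particular, the `as_of` argument of `find_current_url` (the time at which the historical link was valid) is added here to disambiguate reused URLs. Time stamps are integers and archive names use the slides' P5T1 notation.

```python
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    url: str
    publish_date: int
    doc_expire_date: Optional[int] = None
    url_expire_date: Optional[int] = None
    new_url: Optional[str] = None


def _row_at(log: List[LogEntry], url: str, as_of: int) -> Optional[LogEntry]:
    # The row for `url` covering `as_of`; a row ends at whichever expiry date is set.
    for e in log:
        end = e.url_expire_date if e.url_expire_date is not None else e.doc_expire_date
        if e.url == url and e.publish_date <= as_of and (end is None or as_of < end):
            return e
    return None


def is_url_ever_exist(log: List[LogEntry], url: str) -> bool:
    return any(e.url == url for e in log)


def is_url_valid_as_of(log: List[LogEntry], url: str, as_of: int) -> bool:
    return _row_at(log, url, as_of) is not None


def find_current_url(log: List[LogEntry], url: str, as_of: int) -> Optional[str]:
    """Follow modifications and renames forward; None if the document was deleted."""
    row = _row_at(log, url, as_of)
    while row is not None:
        if row.doc_expire_date is None and row.url_expire_date is None:
            return row.url  # still published
        if row.new_url is not None:
            row = _row_at(log, row.new_url, row.url_expire_date)  # renamed
        elif row.url_expire_date is None:
            row = _row_at(log, row.url, row.doc_expire_date)      # modified
        else:
            return None                                           # deleted
    return None


def find_link_to_old_document(log: List[LogEntry], url: str, as_of: int) -> Optional[str]:
    """Link to the document version that `url` pointed to at time `as_of`."""
    row = _row_at(log, url, as_of)
    # A renamed row's content continues in the row opened under the new URL.
    while row is not None and row.doc_expire_date is None and row.new_url is not None:
        row = _row_at(log, row.new_url, row.url_expire_date)
    if row is None:
        return None
    if row.doc_expire_date is None:
        return row.url  # that version is still live
    return f"{row.url}T{row.publish_date}"  # archived version


# Sample log: P2 lived from T0 to T1; P5 was published at T1, modified at T3,
# and renamed to P7 at T4.
LOG = [
    LogEntry("P2", 0, doc_expire_date=1, url_expire_date=1),
    LogEntry("P5", 1, doc_expire_date=3),
    LogEntry("P5", 3, url_expire_date=4, new_url="P7"),
    LogEntry("P7", 4),
]
```

On this sample log, the historical link P5 as of T2 traces forward to the current URL P7, and its old document resolves to the archive name P5T1.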
17
Web Service for Scenario 3
This web service contains three methods:
–1. IsURLValidBefore method
–2. IsURLValidAsOf method
–3. FindLinkToOldDocument method: This method retrieves the old document associated with a historical link.
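Scenario 3 arises when a URL is reused, as with P1 in the earlier example. The sketch below is one possible reading of the methods, since the slides do not define their semantics: `is_url_valid_before` is assumed to mean "the URL was published at some time earlier than the given time", and `find_link_to_old_document` is simplified to follow at most one rename. All names, signatures, and the integer time stamps are assumptions.

```python
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    url: str
    publish_date: int
    doc_expire_date: Optional[int] = None
    url_expire_date: Optional[int] = None
    new_url: Optional[str] = None


def is_url_valid_before(log: List[LogEntry], url: str, before: int) -> bool:
    # Assumed meaning: some row for this URL was published earlier than `before`.
    return any(e.url == url and e.publish_date < before for e in log)


def find_link_to_old_document(log: List[LogEntry], url: str, as_of: int) -> Optional[str]:
    """Link to the document `url` pointed to at `as_of` (single rename hop only)."""
    for e in log:
        end = e.url_expire_date if e.url_expire_date is not None else e.doc_expire_date
        if e.url == url and e.publish_date <= as_of and (end is None or as_of < end):
            if e.doc_expire_date is not None:
                return f"{e.url}T{e.publish_date}"  # archived version
            return e.new_url or e.url               # renamed target, or still live
    return None


# Sample log: P1 published at T0 and renamed to P8 at T2; a different
# document is published under P1 at T3.
LOG = [
    LogEntry("P1", 0, url_expire_date=2, new_url="P8"),
    LogEntry("P8", 2),
    LogEntry("P1", 3),
]
```

Here a bookmark to P1 made at T1 resolves to P8, while the same URL requested as of T5 resolves to the unrelated document now published at P1.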
18
Summary
This paper makes two contributions:
–1. It presents a logging and archiving scheme to track a document’s history of changes.
–2. It designs web services for users to retrieve documents associated with historical links.