Sitemaps IT4102 Presentation Pádraig James O Donovan
What is a sitemap?
- An XML file that lists the URLs of a website, with associated metadata for each URL
- Helps web crawlers crawl the site more efficiently
- Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft
Sitemap format and protocol
- Must be encoded in UTF-8
- Tag requisites
  - Begin with an opening <urlset> tag and end with a closing </urlset> tag
  - Specify the namespace (protocol standard) within the <urlset> tag
  - Include a <url> entry for each URL, as a parent XML tag
  - Include a <loc> child entry for each <url> parent tag
- All other tags are optional
  - <lastmod>
  - <changefreq>
  - <priority>
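The tag rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name build_sitemap, the dictionary keys, and the example.com URLs are assumptions made for the example. The namespace value is the one published for the Sitemap 0.90 protocol.

```python
import xml.etree.ElementTree as ET

# Namespace of the Sitemap 0.90 protocol, declared on <urlset>.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a UTF-8 encoded sitemap (bytes).

    `entries` is a hypothetical input format: a list of dicts that each
    hold a required "loc" key and, optionally, "lastmod", "changefreq"
    and "priority".
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        # One <url> parent per URL, with a mandatory <loc> child.
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        # The remaining tags are optional and only written when supplied.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sitemap = build_sitemap([
        {"loc": "http://www.example.com/", "changefreq": "monthly", "priority": 0.8},
        {"loc": "http://www.example.com/catalog?item=12&desc=vacation"},
    ])
    print(sitemap.decode("utf-8"))
```

ElementTree escapes reserved characters such as & in the text of each tag when it serializes the tree, so the URLs can be passed in unescaped.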
Entity escaping
Character      Symbol  Escape code
Ampersand      &       &amp;
Greater Than   >       &gt;
Less Than      <       &lt;
Quotation      "       &quot;
Apostrophe     '       &apos;
http://www.escapecodes.info/
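As a rough sketch of how the five escape codes could be applied when writing sitemap XML by hand, Python's standard library provides xml.sax.saxutils.escape, which handles &, < and > by default and accepts extra mappings for the two quote characters. The helper name escape_url and the sample URL are assumptions made for this example.

```python
from xml.sax.saxutils import escape

# escape() covers &, < and > on its own; the two quote characters
# are passed in as additional replacements.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_url(url):
    """Return `url` with the five reserved XML characters entity-escaped."""
    return escape(url, QUOTE_ENTITIES)


if __name__ == "__main__":
    print(escape_url("http://www.example.com/catalog?item=12&desc=vacation"))
```

The ampersand is replaced first, so the & inside the other escape codes is not escaped a second time.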
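File compression
- Sitemap files have limitations
  - 50,000 URLs max per file
  - 50MB max per file (52,428,800 bytes), measured uncompressed
- If desired, you may compress sitemap files using gzip to reduce the bandwidth requirements
- Compression does not raise the limits: the file must still fit within 50MB once uncompressed
- https://www.gzip.org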
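A minimal sketch of the limits and the compression step, using Python's built-in gzip module. The function name compress_sitemap and its url_count parameter are hypothetical; the size check is applied to the uncompressed bytes.

```python
import gzip

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, checked before compression


def compress_sitemap(xml_bytes, url_count):
    """Check the per-file limits, then return the gzip-compressed sitemap."""
    if url_count > MAX_URLS:
        raise ValueError(f"{url_count} URLs exceeds the limit of {MAX_URLS}")
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError(f"{len(xml_bytes)} bytes exceeds the limit of {MAX_BYTES}")
    return gzip.compress(xml_bytes)


if __name__ == "__main__":
    # Repetitive placeholder content, standing in for a real sitemap.
    sample = b"<url><loc>http://www.example.com/</loc></url>\n" * 1000
    packed = compress_sitemap(sample, url_count=1000)
    print(len(sample), "bytes before,", len(packed), "bytes after gzip")
```

A site that exceeds either limit would need to split its URLs across several sitemap files rather than rely on compression.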
Disadvantages
- XML syntax is verbose, especially for human readers, compared with other text-based data transmission formats.
- XML namespaces are awkward to use, and namespace support can be difficult to implement correctly in an XML parser.
- The distinction between content and attributes in XML seems unnatural to some and makes designing XML data structures harder.
Bibliography
- Sitemaps.org
- Escapecodes.org
- Techopedia.com
Thank you for listening!