Refactoring HTML
Elliotte Rusty Harold
Why Refactor
What to Refactor To
  XHTML
  CSS
  REST
Move Away From
  Tag soup
  Presentation-based markup
  Stateful applications
XHTML
CSS
REST
  All resources are identified by URLs.
  Safe, side-effect-free operations such as querying or browsing operate via GET.
  Non-safe operations operate via POST.
  Each request is independent of all others.
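The GET/POST split above can be sketched in a few lines. This is only an illustration under stated assumptions: the operation names, the `build_request` helper, and the example.com URL are hypothetical and do not come from the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical operation names: which operations count as "safe" is an
# assumption made for this sketch, not something the slides define.
SAFE_OPERATIONS = {"search", "browse"}

def build_request(base_url: str, operation: str, params: dict) -> Request:
    """Build (but do not send) a request for the given operation."""
    query = urlencode(params)
    if operation in SAFE_OPERATIONS:
        # Safe operation: all state travels in the URL, so it can be
        # bookmarked, cached, and repeated without side effects.
        return Request(f"{base_url}/{operation}?{query}", method="GET")
    # Non-safe operation: the parameters go in the request body.
    return Request(f"{base_url}/{operation}",
                   data=query.encode("ascii"), method="POST")

search = build_request("http://example.com", "search", {"q": "xhtml"})
order = build_request("http://example.com", "order", {"item": "42"})
```

Each request carries everything the server needs, which is what keeps one request independent of the others.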
Tools
The Refactoring Process
  1. Identify the problem.
  2. Fix the problem.
  3. Verify that the problem has been fixed.
  4. Check that no new problems have been introduced.
  5. Deploy the solution.
Things Can Go Wrong
  Backups
  Staging servers
  Source code control
Validators
  W3C Markup Validation Service
  LogValidator
  xmllint
  Editors: Dreamweaver, BBEdit, etc.
Testing
  HTMLUnit
  JsUnit
  HTTPUnit
  jWebUnit
  Fitnesse
  Selenium
Regular Expressions
  Learn them!
  But be cautious
  Prefer parser-based solutions
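One reason to be cautious: a regular expression sees only characters, not document structure. The sketch below is an illustration that uses Python's standard `html.parser` (not a tool named in the slides) and a made-up sample fragment to compare a naive regex with a parser when a link has been commented out.

```python
import re
from html.parser import HTMLParser

# Sample fragment invented for this sketch: one live link, one commented out.
SAMPLE = '<p><a href="page.html">live</a> <!-- <a href="old.html">dead</a> --></p>'

# Naive regex: matches text inside the comment as if it were markup.
regex_links = re.findall(r'<a\s+href="([^"]*)"', SAMPLE)

class LinkCollector(HTMLParser):
    """Collects href values from real <a> start-tags only."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

collector = LinkCollector()
collector.feed(SAMPLE)
collector.close()
parser_links = collector.links
```

The regex reports both `page.html` and `old.html`; the parser reports only `page.html`, because it treats the comment as a comment.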
Tidy
  C (and PHP)
  Custom API
  Can handle most bad markup
  Usually produces well-formed XHTML
  Often produces valid XHTML
  $ tidy -asxhtml -m index.html
TagSoup
  Java and SAX
  Can handle anything
  Always well-formed
  May not be valid
  $ java -jar tagsoup.jar --encoding=ISO index.html
Well-formedness Defined
  Every element has one parent element; no overlap
  Every start-tag has a case-sensitive matching end-tag
  Attribute values are quoted
  Entity references are defined
  + Namespaces
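A quick way to see these rules in action is to hand fragments to any XML parser and watch which ones it rejects. The sketch below uses Python's standard `xml.etree.ElementTree` for this; the `is_well_formed` helper name is an assumption made for illustration, and the check covers well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

checks = {
    "nested":        is_well_formed("<p><b><i>x</i></b></p>"),  # proper nesting
    "overlap":       is_well_formed("<p><b><i>x</b></i></p>"),  # overlapping elements
    "unquoted":      is_well_formed("<p class=intro>x</p>"),    # unquoted attribute value
    "case":          is_well_formed("<P>x</p>"),                # end-tag case mismatch
    "empty_tag":     is_well_formed("<p>a<br>b</p>"),           # missing end-tag
    "empty_element": is_well_formed("<p>a<br />b</p>"),         # empty-element tag
}
```

Only the `nested` and `empty_element` fragments pass; each of the others breaks one of the rules listed above.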
Well-formedness Refactorings
  Make name lower case
  Quote attribute value
  Replace empty tag with empty-element tag
  Add end-tag
  Eliminate overlap
  Convert text to UTF-8
  Escape < and &
  Introduce an XHTML DOCTYPE
  Introduce the XHTML namespace
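Two of these refactorings are small enough to sketch directly. The helper names below are hypothetical, and the sketch assumes the source text is ISO-8859-1; in practice the real source encoding has to be known before converting.

```python
import html

def escape_text(text: str) -> str:
    """Escape &, < and > in character data; quotes are left alone."""
    return html.escape(text, quote=False)

def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes from ISO-8859-1 (assumed source encoding) to UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")

escaped = escape_text("AT&T < IBM")
converted = latin1_to_utf8(b"caf\xe9")
```

Here `escaped` is `AT&amp;T &lt; IBM`, and `converted` holds the two-byte UTF-8 form of the accented letter.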
Validity Defined
  The document has a DOCTYPE
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" " transitional.dtd">
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "/dtds/xhtml1-transitional.dtd">
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "
  The document adheres to constraints expressed in the DTD
  Nothing that’s not in the DTD
  Not as important as well-formedness
Validity Refactorings
  Introduce Transitional DOCTYPE
  Introduce Strict DOCTYPE
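Introducing a DOCTYPE can be sketched as a small text transformation. The declarations below use the canonical W3C system identifiers for XHTML 1.0, which may differ from the ones shown on the original slides; the `introduce_doctype` helper is hypothetical and assumes the document does not begin with an XML declaration.

```python
# Canonical XHTML 1.0 DOCTYPE declarations (W3C-hosted DTD URLs).
DOCTYPES = {
    "transitional": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    ),
    "strict": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ),
}

def introduce_doctype(document: str, flavor: str = "transitional") -> str:
    """Prepend a DOCTYPE unless the document already starts with one.

    Assumption: no XML declaration precedes the root element; a document
    that has one would need the DOCTYPE inserted after it instead.
    """
    if document.lstrip().lower().startswith("<!doctype"):
        return document
    return DOCTYPES[flavor] + "\n" + document

page = "<html><head><title>t</title></head><body></body></html>"
with_doctype = introduce_doctype(page)
```

Starting with Transitional and moving to Strict later means the same helper can be rerun with `flavor="strict"` on a copy that has had its old DOCTYPE removed.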
Transitional
  Eliminate bogons
  Add alt attributes
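Before alt attributes can be added, the images that lack them have to be found. The following is a minimal sketch using Python's standard `html.parser`; the class name and the sample markup are invented for illustration, and a validator would report the same problem.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Records the src of every <img> that has no alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for empty-element tags such as <img ... />.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src"))

finder = MissingAltFinder()
finder.feed('<p><img src="logo.gif" alt="Logo"><img src="chart.gif" /></p>')
finder.close()
missing_alt = finder.missing
```

The result lists only `chart.gif`. Choosing good alt text for each image still takes a human.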
Strict
  Replace center, b, i, font, etc. with CSS
  Nest inline elements in block elements
Layout
  Wrap related information in divs
  Add ID attributes
  Replace table layouts with CSS
  Replace frames with CSS positions
  Put the content first
  Mark up lists as lists
  Replace blockquote/ul indentation with CSS
  Replace spacer GIFs
Accessibility
  Convert images to text
  Add labels to forms
  Standard names for input fields
  Add tab indexes to forms
  Add skip navigation
  Add internal headings
  Provide captions, summaries, and headers for tables
  Identify acronyms
Web Applications
  Replace GET with POST
  Replace POST with GET
  Replace Flash with HTML
  Make web apps cache savvy
  Provide ETags
  Add Web Forms 2.0 types
  Block robots
  Avoid SQL injection
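Of these, avoiding SQL injection is the easiest to show in a few lines. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table, the data, and the function names are made up for illustration. The refactoring is to pass user input as a bound parameter instead of pasting it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))
conn.execute("INSERT INTO users VALUES (?, ?)", ("bob", "bob@example.com"))

def find_email_unsafe(name: str):
    # Vulnerable: the input becomes part of the SQL text itself.
    sql = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(sql).fetchall()

def find_email(name: str):
    # Refactored: the driver binds the input as data, never as SQL.
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()

attack = "x' OR '1'='1"
leaked = find_email_unsafe(attack)   # matches every row
blocked = find_email(attack)         # matches nothing
```

With the crafted input, the unsafe version returns every address in the table, while the parameterized version returns no rows.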
Content
  Check spelling
  Check links
  Restructure sites but keep the URLs
  Remove entry pages
  Hide addresses from spambots
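One common way to hide an address from simple harvesters is to write it as numeric character references, which browsers render normally. This is an assumed technique chosen for illustration, not necessarily the one the talk recommends, and it only deters harvesters that do not decode entities. The helper name and the address are hypothetical.

```python
import html

def obfuscate_address(address: str) -> str:
    """Encode every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)

encoded = obfuscate_address("a@b.c")
# A browser (or html.unescape) turns the references back into the address.
decoded = html.unescape(encoded)
```

The encoded form contains no literal `@`, so a plain-text scan for addresses does not find it.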
Objections To Refactoring
  "We don’t have the time to waste on cleaning up the code."
  "We have to get this feature implemented now!"
  Refactoring saves time in the long run.
  You have more time than you think you do.
Further Reading
  Refactoring HTML: Elliotte Rusty Harold
  Refactoring: Martin Fowler
  Designing with Web Standards: Jeffrey Zeldman
  The Zen of CSS Design: Dave Shea & Molly Holzschlag