Unicode Normalize Engine
Submitted by: Jose Yallouz, Shlomi Ben-Shabat
Supervisor: Maxim Gurevich
Project Goals
Recognize the encoding of web pages.
Translate web pages to UTF-8.
Normalize the web into a single encoding standard, UTF-8.
Translation Decision (diagram). Labels: URL, HTTP Header, HTML, BOM, META HTTP tag, Auto Detection, Unicode Output.
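The decision flow named on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `detect_encoding`, the priority order (BOM, then HTTP header, then META tag, then auto detection), the 4096-byte scan window, and the very simple auto-detection step (a strict UTF-8 trial with a caller-supplied fallback) are all assumptions made for the example.

```python
import codecs
import re

# Byte-order marks, UTF-32 first because its LE mark begins with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

_HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
_META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def _known_codec(label):
    """Return Python's canonical codec name for label, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def detect_encoding(body, content_type=None, fallback="latin-1"):
    """Return (encoding, source) for a page body; the step order is an assumption."""
    # Step 1: a byte-order mark at the start of the body.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, "bom"
    # Step 2: charset parameter of the HTTP Content-Type header.
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match and _known_codec(match.group(1)):
            return _known_codec(match.group(1)), "http-header"
    # Step 3: charset declared in an HTML META tag near the top of the page.
    match = _META_CHARSET.search(body[:4096])
    if match:
        label = match.group(1).decode("ascii")
        if _known_codec(label):
            return _known_codec(label), "meta"
    # Step 4: stand-in for auto detection: accept the body if it is valid UTF-8.
    try:
        body.decode("utf-8")
        return "utf-8", "auto"
    except UnicodeDecodeError:
        return fallback, "fallback"
```

For example, `detect_encoding(page_bytes, "text/html; charset=ISO-8859-1")` would report the header as the source of the decision. A real heuristic for the last step would likely use byte-frequency statistics rather than a single UTF-8 trial.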
Class Diagram
Heuristic For Encoding Detection
ODP analysis Average detection of percent.
Application Usage
Client usage: a client browser can use this system to display web pages in a single encoding, UTF-8.
Server usage: a web server can use this system to translate its stored pages into UTF-8.
Processing usage: web page processing systems, such as search engines, can use the system to convert pages into the standard Unicode encoding.
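All three usages come down to the same conversion step once the source encoding is known. A minimal sketch of that step is shown here; the function name `to_utf8` is hypothetical, and replacing undecodable bytes with U+FFFD is an assumption about error handling, not something the slides specify.

```python
def to_utf8(body: bytes, source_encoding: str) -> bytes:
    """Re-encode a page body from source_encoding to UTF-8."""
    # Decode to Unicode text first; undecodable bytes become U+FFFD
    # instead of raising, so one bad byte does not reject the whole page.
    text = body.decode(source_encoding, errors="replace")
    return text.encode("utf-8")
```

A search-engine pipeline, for instance, could call `to_utf8(page_bytes, "windows-1255")` on a Hebrew page before indexing it.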