Visualization of the Webpage Popularity for Ping Wales
Visualization of the Popularity of the Web Access for Ping Wales

Xiaochuan Huang (George)
Supervised by Dr Markus Roggenbach
Department of Computer Science, University of Wales Swansea
Nov. Gregynog
Overview
  1. A Regular Website Report
  2. Specification
  3. Technologies Involved
  4. A First Approach
1. A Regular Website Report
What the project is about
  - Our customer, Ping Media Ltd
  - The website, Ping Wales
  - What they need
  - The technical infrastructure
Introducing similar tools
  - Log file analyzers
  - AWStats and Analog 6.0
  - Graphic statistics generated by AWStats and Analog
Why this application is necessary
  - The customer's needs
  - The shortcomings of existing applications
  - An extendable project
2. Specification
Components
  - The filter/parser
  - The analyzer
  - Two databases
  - Visualization
Going through the processes
  - Take daily log file -> parse with DB1 -> output filtered result -> write result into DB2
  - Given a specified duration -> access DB2 -> generate the records -> output a visualized report
3. Technologies Involved
The Apache log files
  - Introduction
  - Format
      "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
      [12/Jan/2005:00:12: ] "GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0" " keynote.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/ Epiphany/1.4.4"
  - Log string analysis
      (%h): the IP address of the client
      (%l): the RFC 1413 identity of the client
      (%u): the userid of the requesting person
      (%t) [12/Jan/2005:00:12: ]: the request time
      (\"%r\") "GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0": method, requested page, client protocol
      (%>s) 200: the status code
      (%b) 11020: the size of the object returned to the client
      (\"%{Referer}i\"): the site that the client reports having been referred from
      (\"%{User-agent}i\"): identifying information of the client browser
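The field breakdown above can be sketched in Ruby as a single regular expression with one named group per format directive. This is only a minimal sketch, not the project's actual parser: the names COMBINED_LOG and parse_hit are hypothetical, and the client address, seconds, time zone and referer URL in the sample line are made-up placeholders (only the request, the status 200 and the size 11020 come from the slide).

```ruby
# Regular expression for the Apache "combined" log format; one named
# group per directive. Extended mode (x) allows the inline comments.
COMBINED_LOG = /
  \A(?<host>\S+)\s        # %h   client IP address
  (?<ident>\S+)\s         # %l   RFC 1413 identity
  (?<user>\S+)\s          # %u   userid
  \[(?<time>[^\]]+)\]\s   # %t   request time
  "(?<request>[^"]*)"\s   # %r   method, requested page, protocol
  (?<status>\d{3})\s      # %>s  status code
  (?<bytes>\d+|-)\s       # %b   size of the returned object
  "(?<referer>[^"]*)"\s   # referer reported by the client
  "(?<agent>[^"]*)"       # user agent string
/x

# Returns a hash describing one hit, or nil if the line does not match.
def parse_hit(line)
  m = COMBINED_LOG.match(line.strip)
  return nil unless m

  method, path, protocol = m[:request].split(' ', 3)
  {
    host: m[:host], time: m[:time],
    method: method, path: path, protocol: protocol,
    status: m[:status].to_i,
    bytes: m[:bytes] == '-' ? 0 : m[:bytes].to_i,
    referer: m[:referer], agent: m[:agent]
  }
end

# Sample line: address, seconds, time zone and referer are placeholders.
sample = '192.0.2.1 - - [12/Jan/2005:00:12:00 +0000] ' \
         '"GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0" 200 11020 ' \
         '"http://example.org/keynote.html" ' \
         '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3)"'

hit = parse_hit(sample)
puts hit[:path]    # /hardware/toshiba-small-80gb-hdd.html
puts hit[:status]  # 200
```

Returning nil for lines that do not match lets the caller skip malformed entries instead of aborting on them.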
3. Technologies Involved
The Apache log files
Programming language – Ruby
  An interpreted scripting language for quick and easy object-oriented programming

    % ruby
    puts "Hello, world!"
    ^D
    Hello, world!

    % cd sample
    % ruby eval.rb
    ruby> a = "Hello, world!"
       "Hello, world!"
    ruby> puts a
    Hello, world!
       nil
    ruby> ^D
    %
3. Technologies Involved
The Apache log files
Programming language – Ruby
Database access
  - MySQL
  - The two databases
  - Accessing the databases with Ruby
4. A First Approach
Load the daily log file

Parsing/Filtering
  while not end of file
    read hit, line by line
    for each hit: getIP(%h), getTime(%t), getReq(\"%r\"), getSt(%>s)
    check if even(first(getSt())), then
      go through the articles database looking for getIP()
      if there is, write such hit to database 2, read next
    go to next hit

Analyzing
  specify StartingTime and EndTime; build an array/stack: myArray
  read through the records from database 2; for those within the specified time
    for each hit,
      if getIP() is in myArray, then counter += 1
      otherwise, write this hit to myArray and initialize its counter
  sort myArray according to the counter of each element
  write out the top N results to a file, for visualizing
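The two steps above might look roughly like this in Ruby. It is a sketch under several assumptions rather than the project's code: the two databases are stood in for by a Set (database 1) and an Array (database 2) instead of MySQL, the function names filter_hits and top_n are hypothetical, the sample hits are invented, and a Hash of counters replaces myArray. The slide looks up getIP(); here the lookup key is a parameter, and the example passes :path because the goal is page popularity (pass :host to follow the slide literally).

```ruby
require 'set'
require 'time'

# Time layout used by Apache for %t, without the brackets.
APACHE_TIME = '%d/%b/%Y:%H:%M:%S %z'

# Parsing/filtering step: keep a hit when the first digit of its status
# code is even (the check on the slide; note that it admits 4xx as well
# as 2xx) and its key is found in database 1.
def filter_hits(hits, db1, key:)
  hits.select do |hit|
    hit[:status].to_s[0].to_i.even? && db1.include?(hit[key])
  end
end

# Analyzing step: count the stored hits per key within [from, to], sort
# by counter in descending order, and return the top n [key, count] pairs.
def top_n(db2, from, to, n, key:)
  counters = Hash.new(0)
  db2.each do |hit|
    time = Time.strptime(hit[:time], APACHE_TIME)
    counters[hit[key]] += 1 if time.between?(from, to)
  end
  counters.sort_by { |k, count| [-count, k] }.first(n)
end

# Invented sample data: database 1 holds hypothetical article paths.
db1 = Set['/a.html', '/b.html']
hits = [
  { path: '/a.html', status: 200, time: '12/Jan/2005:00:12:00 +0000' },
  { path: '/b.html', status: 200, time: '12/Jan/2005:00:13:00 +0000' },
  { path: '/a.html', status: 200, time: '12/Jan/2005:00:14:00 +0000' },
  { path: '/a.html', status: 500, time: '12/Jan/2005:00:15:00 +0000' },
  { path: '/x.gif',  status: 200, time: '12/Jan/2005:00:16:00 +0000' },
  { path: '/b.html', status: 200, time: '13/Jan/2005:09:00:00 +0000' }
]

db2 = filter_hits(hits, db1, key: :path)       # stand-in for database 2
from = Time.utc(2005, 1, 12)                   # StartingTime
to = Time.utc(2005, 1, 12, 23, 59, 59)         # EndTime
report = top_n(db2, from, to, 2, key: :path)

# One tab-separated line per page, ready to hand to a visualizing tool.
report.each { |page, count| puts "#{page}\t#{count}" }
```

With this sample data the report lists /a.html with 2 hits and /b.html with 1: the 500 response and the image request are filtered out, and the hit on 13 January falls outside the period.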
Water flow model
  - Take daily log file -> parse with DB1 -> output filtered result -> write result into DB2
  - Given a specified duration -> access DB2 -> generate the records -> output a visualized report
[Diagram components: Daily Log File, Filter, Database 1, Database 2, Analyzer (period entry, records), Visualization Tool, Graphic Report]
Summary
  - What I have done so far
  - What I am planning to do next
End… hey, wake up, there he ends!! LOL
George