Quality Data: Fresno State's Analytics Strategy Rob Robinson Web Developer for Fresno
Basic Organization
  Our Web Communications team is responsible for the entire campus web presence.
    –Except specific applications such as PeopleSoft Portal and Blackboard.
  Maintaining a large set of pages gives us a much bigger picture of usage trends.
  We can see campus-wide trends over time as well as current usage in real time.
Basic Infrastructure
  Single physical Dell machine hosted with Rackspace
    –Our centralized web team is responsible for the server
  Centralized Google Analytics
    –Our centralized web team is responsible for all Google Analytics accounts
Some Stats
  Total HTTP requests per day (avg)
    –.html: 620,000
    –All files: 2,400,000
  Total pages on server
    –.html: 70,002
  Total pages in CMS: 19,781
  We will be moving to a fully responsive template this summer
Not Just Web Analytics
  Web Analytics
    –Who is viewing? How are they viewing?
  Server Analytics
  User / Staff Analytics
    –From OU Campus Users “Custom Report”
  Page Freshness
    –From OU Campus Pages “Custom Report”
    –Page age vs. page views?
Problems to be Solved
  Where are our major entry points?
    –(page views / entry pages)
  What are people doing on our pages?
    –(searches / events)
  Given that information, can we optimize our entry points for proper navigation?
  What types of devices are being used?
Problems to be Solved
  Volume of requests over time
  Previous year or term usage (especially the 1st week of classes)
    –Preferably predictive indicators
Entry Points and Page Views
  Data sources:
    –Apache access log data
    –Google Analytics
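One way to approximate entry points from the Apache access log is to count successful requests whose referrer is empty or comes from another host. The sketch below assumes the log uses Apache's "combined" format; the host name `www.example.edu` in the usage note is a placeholder, and this definition of "entry" is a simplification, not Google Analytics' session-based one.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed layout: Apache "combined" log format (host, time, request, status, size, referrer, agent).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def entry_pages(lines, own_host):
    """Count 200 responses whose referrer is empty ("-") or on a different host."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("status") != "200":
            continue
        # urlparse("-").hostname is None, so direct hits count as entries too.
        ref_host = urlparse(m.group("referrer")).hostname
        if ref_host != own_host:
            counts[m.group("path")] += 1
    return counts
```

Calling `entry_pages(open(path), "www.example.edu").most_common(10)` would list the ten most common landing pages under these assumptions.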
Searches
  Apache access logs
    –We can see searches if the referrer was our Google Search Appliance, along with the page the user landed on
    –Regular search terms from Google are now hidden
  Google Analytics (sometimes)
    –GA does provide some searches
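Pulling search terms out of referrer URLs could look roughly like this. It is a sketch under assumptions: the search host name is a placeholder, and the query is assumed to travel in a `q` parameter, which should be checked against the actual search appliance URLs.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_terms(referrers, search_host, param="q"):
    """Count search terms found in referrer URLs that point at search_host."""
    counts = Counter()
    for ref in referrers:
        url = urlparse(ref)
        if url.hostname != search_host:
            continue  # not our search appliance (or an empty "-" referrer)
        # parse_qs decodes "+" and %XX escapes for us.
        for term in parse_qs(url.query).get(param, []):
            counts[term.strip().lower()] += 1
    return counts
```

Pairing each term with the requested path from the same log line would show which page the search led to.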
Searches
  Which page did the user land on?
  What is the user searching for?
  Did the user click on what we wanted them to click on?
  Search vs. navigation?
    –Nielsen Norman Group says year olds display search-dominant behavior.
    –Converting Search Into Navigation
Event Tracking
  JavaScript / DOM events captured by GA
    –Catalog Tabs
      $(document).ready(function(){
        window.setTimeout(function() {
          var maxLen = $('#tabsaccordion-0-tab-0').parent().children().length;
          for ( var i = 0; i < maxLen; i += 1 ) {
            $('#tabsaccordion-0-tab-0').parent().children().eq(i).on('click', function(){
              ga('send','event','Tab Click',location.href.replace("
            });
          }
        },800);
      });
      ms.html#courses
Event Tracking
  JavaScript / DOM events captured by GA
    –Map Checkboxes
      $("[id$=-cb]").each( function(){
        $(this).change( function(){
          // Label is the checkbox id plus its new checked state
          var sFormattedMessage = $(this).attr('id') + " " + $(this).is(':checked');
          // Classic GA event; the final `true` marks it as non-interaction
          _gaq.push(['_trackEvent', 'CheckBox', 'Use', sFormattedMessage, null, true]);
        });
      });
Event Tracking - Errors
  DOM errors with Google Analytics
    Classic:
      window.onerror = function(message, file, line) {
        var formattedMessage = '[' + file + ' (' + line + ')] ' + message;
        _gaq.push(['_trackEvent', 'Exceptions', 'Application', formattedMessage, null, true]);
      };
    Universal:
      window.onerror = function(message, file, line) {
        var formattedMessage = '[' + file + ' (' + line + ')] ' + message;
        ga('send','event','Exceptions','Application',formattedMessage);
      };
Errors and Such
  Top error pages / documents
    –From Apache error log
  Large images embedded in pages
    –Python and Bash
  Large images stored on server
    –find /var/www/htdocs/ -size +10M -exec ls -lah {} \;
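A Python counterpart to the `find` command above might look like the following. It is only a sketch: the function name and the 10 MiB default are illustrative, and it lists every oversized file rather than images only.

```python
import os

def large_files(root, min_bytes=10 * 1024 * 1024):
    """Yield (path, size_in_bytes) for files under root larger than min_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > min_bytes:
                yield path, size
```

Filtering on `name.lower().endswith((".jpg", ".png", ".gif"))` would narrow the result to images.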
Mobile Users
  Our IT Strategic Plan states that we should ensure our infrastructure (hardware, software, network and support services) is adequate to sustain the widening use of smartphones, tablets, and laptops.
  How much of our traffic is actually coming from mobile devices?
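For a rough answer from the server side, user-agent strings in the access log can be bucketed by device class. The keyword lists below are hypothetical and deliberately small; they will misclassify some devices (Android tablets land in "mobile", for example), so treat the output as an estimate to compare against Google Analytics.

```python
from collections import Counter

# Hypothetical keyword lists, checked against the lowercased user-agent string.
TABLET_HINTS = ("ipad", "tablet")
MOBILE_HINTS = ("iphone", "ipod", "android", "windows phone", "mobile")

def device_class(user_agent):
    """Return 'tablet', 'mobile', or 'desktop' using simple substring checks."""
    ua = user_agent.lower()
    # Tablets first: iPad user agents also contain the word "Mobile".
    if any(hint in ua for hint in TABLET_HINTS):
        return "tablet"
    if any(hint in ua for hint in MOBILE_HINTS):
        return "mobile"
    return "desktop"

def device_share(user_agents):
    """Return the fraction of requests per device class."""
    counts = Counter(device_class(ua) for ua in user_agents)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}
```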
Mobile Users -- Homepage
Mobile Users – Help Center
Mobile Users – Student Affairs
Mobile Users -- Catalog
Mobile Users – Map
Server Analytics
  CPU load
  Incoming and outgoing bandwidth
  Outgoing mail
  Breach attempts
  Concurrent connections sampling
Established Connections
  Get a count of all established connections to your Apache web server:
    –netstat -pant | grep httpd | grep -c ESTAB
  Get a count of all connections that are in a waiting state:
    –netstat -pant | grep httpd | grep -c WAIT
  Every 5 minutes, each of the previous counts is written to a JSON file named with today’s date
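The sampling step could be sketched as below. This is not the production script: the function names, the JSON layout (a list of samples per day), and the `YYYY-MM-DD.json` file name are assumptions, and the parser expects the column layout of Linux `netstat -pant` output.

```python
import json
import os
import time
from collections import Counter

def count_states(netstat_output, process="httpd"):
    """Count TCP states for sockets owned by `process` in `netstat -pant` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) >= 7 and process in fields[6]:
            counts[fields[5]] += 1
    return counts

def sample(netstat_output, now=None):
    """Build one sample: established count plus all *WAIT* states combined."""
    counts = count_states(netstat_output)
    return {
        "time": time.strftime("%H:%M", time.localtime(now)),
        "established": counts["ESTABLISHED"],
        "waiting": sum(n for state, n in counts.items() if "WAIT" in state),
    }

def append_sample(directory, entry, now=None):
    """Append entry to the JSON list stored in <directory>/<YYYY-MM-DD>.json."""
    name = time.strftime("%Y-%m-%d", time.localtime(now)) + ".json"
    path = os.path.join(directory, name)
    samples = []
    if os.path.exists(path):
        with open(path) as fh:
            samples = json.load(fh)
    samples.append(entry)
    with open(path, "w") as fh:
        json.dump(samples, fh)
    return path
```

A cron entry running every 5 minutes could feed it the text from `subprocess.run(["netstat", "-pant"], capture_output=True, text=True).stdout`.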
Google Analytics
  Popular pages / entry points
  Unique page views
  Device usage
  Bounce rates
  Click / event tracking
  window.onerror event tracking
Real-Time Server Analytics
  # during the time period of 10:01 central time, what were the top 10 referrers
    –grep '2014:10:01' /logs/web/apache/ | awk -F\" '{print $4}' | sort | uniq -c | sort -nr | head -10
  # top 10 referrers from the last 1000 requests
    –tail -n 1000 /logs/web/apache/ | awk -F\" '{print $4}' | sort | uniq -c | sort -nr | head -10
  # top 10 visited pages of the last 1000 requests
    –tail -n 1000 /logs/web/apache/ | awk -F\" '{print $2}' | sort | uniq -c | sort -nr | head -10
  # top 10 most requested jpg files of the last 1000 requests
    –tail -n 1000 /logs/web/apache/ | grep 'jpg' | awk -F\" '{print $2}' | sort | uniq -c | sort -nr | head -10
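The same split-on-quotes idea carries over to Python when the counts need to land in a script rather than a terminal. A minimal sketch, assuming the combined log format, where quote-delimited field 2 is the request line and field 4 the referrer; the function name is illustrative.

```python
from collections import Counter

def top_values(lines, field, n=10):
    """Return the n most common values of a quote-delimited field (1-based)."""
    # Rough equivalent of: awk -F'"' '{print $field}' | sort | uniq -c | sort -nr | head -n
    # Unlike awk, lines with too few fields are skipped instead of counted as empty.
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) >= field:
            counts[parts[field - 1]] += 1
    return counts.most_common(n)
```

To mirror `tail -n 1000`, pass `collections.deque(open(path), maxlen=1000)` as `lines`.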
Our Home Grown Dashboard
  Uses:
    –JSON, aggregated and collected daily
    –jQuery async calls to RESTful “Web Services”
    –Highcharts for graphing
  Uses GAPI for accessing Google Analytics data via web service (nothing active now…)
  Still an active prototype
Our GA Dashboard
Our GA Dashboard
Where do we go from here?
  Combining data from OU Campus “Custom Reporting” and our server analytics…
  What pages are not being used?
  Pattern detection…
    –“big data”?
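The "what pages are not being used" question reduces to comparing the CMS page list against view counts. A minimal sketch, assuming the page paths have already been exported from the CMS report and the view counts come from the access log or Google Analytics; both inputs and the function name are hypothetical.

```python
def unused_pages(cms_pages, view_counts, threshold=0):
    """Return CMS page paths with at most `threshold` views, sorted by path."""
    # Pages missing from view_counts are treated as having zero views.
    return sorted(p for p in cms_pages if view_counts.get(p, 0) <= threshold)
```

Raising `threshold` turns the same check into a "rarely used" report.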
Questions ???
References
  http://net.educause.edu/ir/library/pdf/ELI30 26.pdf
  http:// plan/documents/IT%20Strategic%20Plan %20-%20Final.pdf
  12/05/16/stop-redesigning-start-tuning- your-site/