What is Webometrics?
Mike Thelwall
Statistical Cybermetrics Research Group, University of Wolverhampton, UK
Virtual Knowledge Studio (VKS) Information Studies
1. Introduction
□Webometrics is concerned with gathering data on and measuring aspects of the Web:
□web sites
□web pages
□hyperlinks
□web search engine results
□YouTube video commenter networks
□MySpace Friend networks
□…for very varied social science purposes
New problems: Web-based phenomena
□Webometrics can be applied to understanding web-based phenomena
□Why do web sites interlink?
□Which web sites interlink?
□What interlinking patterns exist?
□What topics are frequently blogged about?
Old problems: Offline phenomena reflected online
□Some offline phenomena have measurable online reflections
□International communication
□Inter-university collaboration
□University-business collaboration
□The impact or spread of ideas
□Public opinion
2. Examples
Blog searching: blogpulse.com
Example: Identifying and tracking public science concerns in blogs
□Over 100,000 blogs and other sources tracked daily via RSS feeds
□Objective: to identify and track public concerns about science
□E.g., “Schiavo” was identified and tracked as a potential public science concern
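A minimal sketch of how a term such as “Schiavo” might be tracked over time: for each day, compute the share of collected posts that mention the term. The function name `daily_mention_share` and the sample posts are hypothetical, and fetching the RSS feeds is assumed to have happened already. The slide does not describe the actual method used.

```python
from collections import Counter
from datetime import date


def daily_mention_share(posts, term):
    """Return {day: fraction of that day's posts that mention `term`}.

    `posts` is an iterable of (day, text) pairs; matching is a simple
    case-insensitive substring test (an assumption made for illustration).
    """
    totals = Counter()
    hits = Counter()
    needle = term.lower()
    for day, text in posts:
        totals[day] += 1
        if needle in text.lower():
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Hypothetical sample data, not taken from the study.
posts = [
    (date(2005, 3, 20), "Thoughts on the Schiavo case"),
    (date(2005, 3, 20), "My cat photos"),
    (date(2005, 3, 21), "Schiavo and medical ethics"),
    (date(2005, 3, 21), "More on Schiavo"),
]
shares = daily_mention_share(posts, "schiavo")
for day, share in shares.items():
    print(day, f"{share:.0%}")
```

Using a share rather than a raw count keeps days with many collected posts from dominating the series.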
Example: The online impact of research groups (NetReAct)
Example: Links between EU universities
[Figure: network diagram of normalised linking between countries, smallest countries removed; geopolitically connected. Nodes: Sweden, Finland, Norway, UK, Germany, Austria, Switzerland, Poland, Italy, Belgium, Spain, France, NL]
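The slide does not say how the link counts were normalised. One plausible approach, sketched here purely as an assumption, divides each raw inter-country link count by the product of the two countries' sizes (for example, their numbers of university web pages). The function name `normalise_links` and all figures below are hypothetical.

```python
def normalise_links(link_counts, sizes):
    """Divide each raw link count by the product of the two countries' sizes.

    link_counts: {(source, target): raw number of links}
    sizes:       {country: size measure, e.g. number of pages} (assumed)
    """
    return {
        (src, dst): count / (sizes[src] * sizes[dst])
        for (src, dst), count in link_counts.items()
    }


# Hypothetical figures for illustration only.
link_counts = {("UK", "Germany"): 1200, ("Sweden", "Finland"): 300}
sizes = {"UK": 100, "Germany": 80, "Sweden": 20, "Finland": 15}

normalised = normalise_links(link_counts, sizes)
for pair, value in normalised.items():
    print(pair, round(value, 3))
```

With these made-up numbers the Sweden-Finland pair has fewer raw links but a higher normalised value, which is the kind of effect normalisation is meant to expose.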
International biofuels research network
Example: MySpace age profiles
Percentage of profiles containing swearing
             Moderate    Strong      Very strong   Sample size
US males     [lost]%     47%         2%            1,530
US females   [lost]%     38%         2%            1,287
UK males     [lost]%     8% [strong or very strong; column unclear]   171
UK females   [lost]%     38%         3%            130
(typical sample size for non-web swearing research)
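Sample size affects how much weight a percentage can bear. As a hedged illustration, the sketch below computes a 95% Wilson score interval for the same proportion (38%, a figure that appears in the table for samples of 1,287 and 130). Interval estimation is not part of the slide; it is added here only to illustrate the sample-size remark, and `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts are back-calculated from the rounded percentage, so they are approximate.
large = wilson_interval(round(0.38 * 1287), 1287)
small = wilson_interval(round(0.38 * 130), 130)
print("n=1287:", large)
print("n=130: ", small)
```

The interval for the smaller sample is noticeably wider, so differences of a few percentage points between the smaller groups should be read with caution.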
Emphatic adverb/adjective OR adverbial booster OR premodifying intensifying negative adjective (36% of swearing)
□and we r guna go to town again n make a ryt fuckin nyt of it again lol
□see look i'm fucking commenting u back
□lol and stop fucking tickleing me!!
□Thanks for the party last night it was fucking good and you are great hosts.
□That 50's rock and roll weekender was fucking mint!
□Fuckin my space, my arse
□1/2 d ppl cudnt even speak fuckin english!
□yeah so me and sarah broke up and everythings fucking shit
YouTube – Video poster ages
YouTube friend network
Online impact - Keywords in web pages mentioning IWRM
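A rough sketch of how keywords in a set of pages might be tallied: count, for each word, the number of pages in which it appears, ignoring a small stop-word list. The stop-word list, the function name `keyword_page_counts` and the sample page texts are all assumptions for illustration. The slide does not describe the tool or method actually used.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "for"}


def keyword_page_counts(pages):
    """Return a Counter mapping each word to the number of pages containing it."""
    counts = Counter()
    for text in pages:
        # A set ensures each word is counted at most once per page.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - STOP_WORDS)
    return counts


# Hypothetical page texts, not real data.
pages = [
    "IWRM is the management of water resources",
    "Water policy and IWRM in river basins",
    "Training for IWRM",
]
counts = keyword_page_counts(pages)
print(counts.most_common(3))
```

Counting pages rather than total occurrences prevents a single long page from dominating the keyword list.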
Data Gathering/Processing Tools
□Blogpulse.com – blog network diagrams
□LexiURL Searcher – links, web text, YouTube, Flickr, Technorati
□Issue Crawler, Google TouchGraph – links
Discussion points for online data
□Validity: is the underlying meaning of the text, video or picture readily apparent to the researcher?
□Possibly not to any great degree for teenagers’ MySpace comments or very personal YouTube videos
□Reliability: do search engines reliably return the correct results?
□Google blog search shows unreliability: its results vary considerably over time
□Researchers can triangulate, either by comparing similar search engines or by repeating the same search over time, to test reliability
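One simple way to quantify variation over time, offered as a sketch rather than an established webometric procedure: repeat the same query on several days, record the reported hit counts, and compute the coefficient of variation (standard deviation divided by mean). The hit counts below are invented, and how they would be collected from a search engine is left out.

```python
from statistics import mean, pstdev


def coefficient_of_variation(hit_counts):
    """Population standard deviation divided by the mean of the hit counts."""
    avg = mean(hit_counts)
    if avg == 0:
        raise ValueError("mean hit count is zero; variation is undefined")
    return pstdev(hit_counts) / avg


# Hypothetical hit counts for the same query on four different days.
stable_engine = [1000, 1010, 990, 1005]
erratic_engine = [1000, 200, 1800, 50]

print("stable: ", round(coefficient_of_variation(stable_engine), 3))
print("erratic:", round(coefficient_of_variation(erratic_engine), 3))
```

A value near zero suggests consistent results. A large value signals that counts from that source should not be compared across dates without further checks.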
Discussion points for online data
□Coverage: to what extent are all the phenomena of interest covered by the source used (e.g., a search engine)?
□Sample bias: are certain types of people over-represented (e.g., the more literate, the more vocal, the more politically active, youth, educated, creative types…)?
Summary
□The web contains a wide variety of interesting web and “web 2.0” content posted by many different people in many different formats
□Webometric methods can give insights into this data
Books
□Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
□Rogers, R. (2005). Information politics on the Web. Cambridge, MA: MIT Press.
Important considerations
□Data accuracy
□Data cleaning
□Context to help interpret results
□Report results carefully
Example: Analysis of the accuracy of search engine results
Live Search results analysis