What’s on Wikipedia, and What’s Not…? Completeness of Information on the Online Collaborative Encyclopedia. Cindy Royal, Ph.D., Assistant Professor.


What’s on Wikipedia, and What’s Not…? Completeness of Information on the Online Collaborative Encyclopedia
Cindy Royal, Ph.D., Assistant Professor, Texas State University, School of Journalism and Mass Communication
Deepina Kapila, Graduate Student, Texas State University, School of Journalism and Mass Communication

Introduction - Wikipedia
- Wikipedia, deemed “the free encyclopedia,” was launched on the web in […].
- Since then, it has become the Web’s 3rd most popular news and information source.
- It uses the Wiki software format, which allows a community of users to develop and monitor content.
- Wikipedia operates under the assumption that the public will act as a policing force, keeping content reliable and up to date.

Introduction - Research
- Denning et al. (2005) listed the risks inherent in Wikipedia’s model: accuracy, motives, uncertain expertise, volatility, coverage, and sources.
- Bopp and Smith (2001) state that coverage in an encyclopedia should be “even across all subjects.”
- Shoemaker and Reese (1995) identified the individual as a news influencer. Web users and content creators tend to be young.
- Tankard and Royal (2005) found inherent biases in Web content, based on systematic searches.

Research Questions
This project measures the content of Wikipedia against various indexes or standards of completeness to identify and uncover potential inherent biases. We are asking:
1. Are there systematic gaps or biases in the overall presentation of information made available on Wikipedia?
2. Is recency (or currency) a predictor of the amount of information on Wikipedia?
3. Is importance of information a predictor of the amount of information on Wikipedia?
4. Is population a predictor of the amount of information about particular countries on Wikipedia?
5. Is economic power a predictor of the amount of information about individual corporations on Wikipedia?

Method
- Using predictors of recency, importance, country population, and economic power, several systematic searches were conducted on Wikipedia.
- Each article for each topic was visited, the relevant content was highlighted, and the words in the selection were counted.
- Word counts were captured in a spreadsheet, and items were plotted on charts in two orderings: ascending order of word count, and order of the predictor variable.
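The counting-and-ordering step can be sketched in a few lines. This is only an illustration, not the authors' procedure (the slides describe manual highlighting and a spreadsheet); the function name `count_words`, the tokenization rule, and the sample texts are all assumptions.

```python
import re

def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit.

    Assumed counting rule; the slides do not say how words were counted.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))

# Hypothetical article selections (placeholders, not data from the study).
selections = {
    "topic_a": "A short stub.",
    "topic_b": "A somewhat longer entry with several more words in it.",
    "topic_c": "Medium length entry here.",
}

# Word count per topic, as would be captured in a spreadsheet column.
word_counts = {topic: count_words(text) for topic, text in selections.items()}

# "Ascending order" view: topics sorted from shortest to longest article.
ascending = sorted(word_counts.items(), key=lambda item: item[1])
```

Sorting the same records by a predictor (year, population, revenue) instead of by word count would give the second chart ordering.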

Topics Covered
- Years ([…])
- Academy Award Winning Films
- Time Magazine’s Person of the Year
- #1 Song on Billboard Top 100 ([…])
- Encyclopedia Terms
- Countries in the United Nations
- Fortune 1000 companies

Results - Years
(Charts: ascending order; chronological order)
- Backward L-shaped curve.
- Clear progression of article length with year; dramatic increase in years after […].
- Years in the future displayed understandably shorter word counts.
- Spearman correlation between variables: 0.79.
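For readers unfamiliar with the statistic reported on these slides, the Spearman correlation is the Pearson correlation computed on ranks rather than raw values. The sketch below is a minimal pure-Python version under that definition, with tied values given their average rank. The function names and the toy numbers are hypothetical and are not the study's data; the slides do not say which software was used.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy example: predictor (e.g. year index) against made-up word counts.
toy_years = [1, 2, 3, 4, 5]
toy_counts = [120, 90, 300, 280, 900]
rho = spearman(toy_years, toy_counts)  # approximately 0.8 for this toy data
```

Because only ranks matter, a few very long articles do not dominate the statistic the way they would with a Pearson correlation on raw word counts.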

Results - Films
(Charts: ascending order; chronological order)
- Backward L-shaped curve is apparent.
- With few exceptions (e.g., Gone with the Wind, 1939, and Casablanca, 1943), the results show a progression favoring more current films. Recency is important, but certain films transcend time and are deemed important for other reasons.
- Average word count for films since 2001 was 80% higher than the word count before […].
- Spearman correlation between variables: 0.49; increased to 0.62 simply by removing 2 outliers.

Results - Person of the Year
(Charts: ascending order; chronological order)
- Softer backward L-shaped curve.
- The even distribution shows that any bias is unrelated to recency and would be measured by another variable of importance.
- Spearman correlation between variables: 0; there was no relationship with time.

Results - Billboard Top 100
(Charts: ascending order; chronological order)
- Backward L-shaped curve.
- Although average word count was 32% higher for artists since 1990, the distribution shows a trend similar to movies in that some artists transcend time.
- Spearman correlation between variables: 0.40 (after eliminating 2 outliers).

Encyclopedia Terms
(Chart: ascending order)
- Comparison between Encyclopedia Britannica and Wikipedia articles.
- Backward L-shaped distribution apparent.
- Spearman correlation used to compare inches of content in Encyclopedia Britannica with word count in Wikipedia: 0.26.
- Of 100 terms, 14 were not represented in Wikipedia.

Results - UN Countries
(Charts: ordered by population; ascending order)
- Backward L-shaped curve; although fairly evenly distributed, a sharp increase appears for the top 22 countries.
- Gradual upward curve in the 2nd chart shows that as population increases, so does word count.
- Average word count for the top 10% of countries was 63% higher than for the rest of the list.
- Spearman correlation between variables: 0.55.
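A comparison such as "the top 10% averaged X% higher than the rest" can be computed as follows. This is a sketch under assumptions: the function name `top_decile_lift`, the cutoff rule (integer tenth of the list, at least one item), and the sample records are hypothetical, and the slides do not state how the top 10% was delimited.

```python
from statistics import mean

def top_decile_lift(records):
    """Percent by which the mean word count of the top 10% exceeds the rest.

    records: list of (predictor, word_count) pairs, e.g. (population, words).
    Needs at least two records so that both groups are non-empty.
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    k = max(1, len(ranked) // 10)  # assumed cutoff: a tenth, at least one item
    top = [words for _, words in ranked[:k]]
    rest = [words for _, words in ranked[k:]]
    return (mean(top) / mean(rest) - 1) * 100

# Made-up records: ten "countries" with populations 1..10 (arbitrary units).
toy_records = [(pop, 200) for pop in range(1, 10)] + [(10, 300)]
lift = top_decile_lift(toy_records)  # 50.0 for this toy data
```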

Results - Fortune 1000
(Charts: ascending order; ordered by revenue)
- Backward L-shaped curve.
- Sharp increase for the top 10% of companies by revenue.
- The top 10% of companies by revenue accounted for 30% of the total word count on companies.
- Spearman correlation between variables: 0.49.
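The share of total word count held by the highest-revenue companies is a simple ratio. A minimal sketch, assuming records of (revenue, word count) pairs; the function name `top_share` and the sample values are hypothetical and are not the Fortune 1000 data.

```python
def top_share(records, fraction=0.10):
    """Share of the total word count held by the top `fraction` by predictor.

    records: list of (predictor, word_count) pairs, e.g. (revenue, words).
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))  # assumed cutoff rule
    total = sum(words for _, words in ranked)
    return sum(words for _, words in ranked[:k]) / total

# Made-up records: ten "companies" with revenues 1..10 (arbitrary units).
toy_companies = [(rev, 100) for rev in range(1, 10)] + [(10, 400)]
share = top_share(toy_companies)  # 400 of 1300 words held by the top company
```

If word counts were spread evenly, the top 10% would hold about 10% of the total, so a share well above that indicates concentration among the largest entities.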

Conclusion
- Information on Wikipedia is volatile, dynamic, and constantly changing over time.
- Wikipedia’s purpose is to serve as a general reference source, but the content is weighted due to its contributors’ demographics.
- In each search performed for the dimensions, strong biases were evident and strong correlations were observed:
  - Currency/Recency: the more current topics were covered the most.
  - Random Selection: Encyclopedia terms showed a clear bias towards more common or popular terms.
  - Relevancy: Wikipedia’s word count correlates to inches in a traditional encyclopedia, showing a strong agenda by each publication.
  - Population: the larger a country’s population, the higher the word count.
  - Revenue: the larger the revenue, the higher the word count.