Archiving Newspaper Websites: A Case Study of the Chicago Tribune
Kalev Leetaru



Archiving Newspaper Websites
• Little hard data on how much content goes up on a newspaper website daily and how hard it is to archive on a regular basis.
• Case study of the Chicago Tribune to explore this in detail.

What’s New(s)?
• Hardest part is getting an inventory of what gets posted to the site each day.
• No easy master inventory list of the URLs of new articles each day.
• Tribune itself uses multiple content management systems and doesn’t have a single point of overview of its site.

RSS / Site Maps
• Some sites like CNN offer strong date-sorted RSS feeds. Already in machine-friendly format. Just download every 30 minutes and you have a complete list of all new content on the site.
• Google News Sitemaps service allows news sites to provide a list of all new content to select users like Google News.
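
The polling approach described for date-sorted feeds can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample feed, the example.com URLs, and the function names `parse_rss_items` and `new_links` are hypothetical and not from the talk. Fetching the feed every 30 minutes is left out, so the sketch shows only the parse-and-diff step.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_rss_items(rss_xml):
    """Return (link, published) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue
        pub = item.findtext("pubDate")
        # pubDate is optional in RSS 2.0 and uses an RFC 822 style date.
        published = parsedate_to_datetime(pub.strip()) if pub else None
        items.append((link, published))
    return items


def new_links(rss_xml, seen):
    """Return links not seen in earlier polls and record them in `seen`."""
    fresh = []
    for link, _ in parse_rss_items(rss_xml):
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


# Hypothetical feed standing in for one 30-minute poll.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>A</title><link>http://news.example.com/a</link>
        <pubDate>Wed, 15 Sep 2010 09:00:00 GMT</pubDate></item>
  <item><title>B</title><link>http://news.example.com/b</link>
        <pubDate>Wed, 15 Sep 2010 09:20:00 GMT</pubDate></item>
</channel></rss>"""

seen_links = set()
first_poll = new_links(SAMPLE_FEED, seen_links)   # both items are new
second_poll = new_links(SAMPLE_FEED, seen_links)  # nothing new on a repeat poll
```

Keeping the `seen` set across polls is what turns repeated downloads of the same feed into a running inventory of new URLs.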

Gateway Pages
• Tribune has neither (has RSS feeds, but they are poor).
• Must manually identify the primary gateway pages for the site (main pages of each topic). There are 105 for the Tribune as of October.
• Each gateway page lists only the most recent X articles, so the pages must be downloaded at a short interval: every 30 minutes, or you miss articles for some sections.
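
A minimal sketch of the gateway-page approach: pull every link out of one downloaded page and keep only the ones not seen in earlier snapshots. The class and function names, the sample HTML, and the example-paper.com URLs are assumptions made for illustration; the talk does not describe the actual tooling used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the gateway page's own URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def diff_snapshot(html, base_url, seen):
    """Return links that appear for the first time in this snapshot."""
    fresh = []
    for link in extract_links(html, base_url):
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


# Hypothetical gateway page and two snapshots taken 30 minutes apart.
GATEWAY = "http://www.example-paper.com/news/local/"
SNAPSHOT_1 = '<a href="/news/local/story-1.html">One</a>'
SNAPSHOT_2 = ('<a href="/news/local/story-1.html">One</a>'
              '<a href="story-2.html">Two</a>')

seen_urls = set()
new_at_first = diff_snapshot(SNAPSHOT_1, GATEWAY, seen_urls)
new_at_second = diff_snapshot(SNAPSHOT_2, GATEWAY, seen_urls)
```

In practice the same diff would run for each gateway page on each 30-minute download, with one shared `seen` set for the whole site.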

Exploring the Tribune
• Downloaded all 105 Tribune gateway pages every 30 minutes from 9/15/2010 to 10/19/2010: total of 136,605 snapshots.
• 83% of links are to the DoubleClick.net advertising network, with just 11% of links pointing to Tribune pages.
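
One way to arrive at a breakdown like the one above is to bucket each extracted link by its host name. The sketch below is illustrative only: the `chicagotribune.com` site domain, the sample URLs, and the function names are assumptions, and the talk does not say how links were actually classified.

```python
from collections import Counter
from urllib.parse import urlsplit


def classify_link(url, site_domain, ad_domain="doubleclick.net"):
    """Label a link 'ad', 'site', or 'other' based on its host name."""
    host = (urlsplit(url).hostname or "").lower()
    if host == ad_domain or host.endswith("." + ad_domain):
        return "ad"
    if host == site_domain or host.endswith("." + site_domain):
        return "site"
    return "other"


def link_shares(urls, site_domain):
    """Return the fraction of links falling in each category."""
    counts = Counter(classify_link(u, site_domain) for u in urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("ad", "site", "other")}


# Hypothetical links from one snapshot; the site domain is an assumption.
sample_urls = [
    "http://ad.doubleclick.net/click;h=1",
    "http://ad.doubleclick.net/click;h=2",
    "http://www.chicagotribune.com/news/",
    "http://example.org/",
]
shares = link_shares(sample_urls, "chicagotribune.com")
```

Matching on the host suffix rather than a substring avoids counting a page as an ad link merely because "doubleclick" appears somewhere in its path.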

Tribune Findings
• Articles stay up, but the links to those articles last from 18 hours to 7 days, with an average lifespan of 56 hours.
• Roughly 39% of articles are linked for less than a day: if you miss downloading the gateway page in that timespan, you won’t ever know that article ever existed.
• Average of 735 new articles posted daily.
• Thursdays have the most new articles added, Sundays the fewest.
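
Link lifespan figures of this kind can be derived from timestamped snapshots by tracking when each URL was first and last seen. The following is a sketch under assumptions: the function names and the synthetic snapshot data are hypothetical, and the talk does not state the exact definition of lifespan it used. Here lifespan is the time between the first and last snapshot containing the link, so it is only accurate to the 30-minute polling interval.

```python
from datetime import datetime, timedelta


def link_lifespans(snapshots):
    """Map each URL to hours between its first and last sighting.

    `snapshots` is an iterable of (timestamp, urls) pairs.
    """
    first_seen, last_seen = {}, {}
    for taken_at, urls in snapshots:
        for url in urls:
            if url not in first_seen or taken_at < first_seen[url]:
                first_seen[url] = taken_at
            if url not in last_seen or taken_at > last_seen[url]:
                last_seen[url] = taken_at
    return {
        url: (last_seen[url] - first_seen[url]).total_seconds() / 3600
        for url in first_seen
    }


def summarize(lifespans):
    """Return (mean lifespan in hours, share of links seen for under a day)."""
    hours = list(lifespans.values())
    if not hours:
        return None
    mean_hours = sum(hours) / len(hours)
    under_a_day = sum(1 for h in hours if h < 24) / len(hours)
    return mean_hours, under_a_day


# Synthetic data: three days of 30-minute snapshots of one gateway page.
start = datetime(2010, 9, 15)
snapshots = []
for i in range(144):
    urls = set()
    if i <= 36:   # "/a" is linked for the first 18 hours
        urls.add("/a")
    if i <= 96:   # "/b" is linked for the first 48 hours
        urls.add("/b")
    snapshots.append((start + timedelta(minutes=30 * i), urls))

lifespans = link_lifespans(snapshots)
mean_hours, under_a_day = summarize(lifespans)
```

A link that is still present in the final snapshot has a lifespan that is cut off by the end of the collection window, so a real analysis would need to decide how to treat those cases.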

Tribune Findings
Figure 3 - Number of new Tribune links seen by day

Tribune Findings
• Content sections are highly stratified: some have just a few articles added a day, while others add new content 24/7 at a very high rate.

Tribune Findings
Table 1 - Gateway pages ordered by average link lifespan
Columns: Section, Total, Tribune, %Tribune, DoubleClick, Lifespan
Sections:
/business/
/sports/
/entertainment/celebrity/
/features/horoscopes/
/
/technology/deals/
/news/nationworld/
/news/local/chicago/
/sports/football/bears/
/sports/college/
/health/
/news/education
/news/opinion/blogs/
/news/opinion/share/
/entertainment/
/news/politics/
/news/local/
/news/opinion/
/news/columnists/all/
/news/columnists/all/
/news/corrections/
/sports/baseball/whitesox/
/sports/highschool/

Conclusions
• News sites can offer high-quality RSS and Sitemaps: this benefits libraries, and benefits the news sites themselves through increased consumer awareness of their content.
• Otherwise, newspapers really need to engage with libraries and provide them with content lists, because external monitoring is too hard. But newspapers themselves don’t even know their own content, due to fragmented content management infrastructures.

Thank You
• Kalev Leetaru
• I-CHASS (Institute for Computing in the Humanities, Arts, and Social Sciences)
• NCSA (National Center for Supercomputing Applications)
• University of Illinois