Advanced Archive-It Application Training: Archiving Social Networking and Social Media Sites

Presentation transcript:

1 Advanced Archive-It Application Training: Archiving Social Networking and Social Media Sites

2 Agenda
– Overview of Social Networking/Media sites
– Why archive these sites?
– Typical Challenges
– Best Practices: Twitter, Facebook, YouTube, Flickr
– Looking toward the future…
– Questions/Discussion

3 Why Archive These Sites?
– State Agencies: An increasing number have decided that the content on these sites is a record and needs to be archived. "A tweet is a record."
– University libraries: Used to share information with students and alumni; these sites contain important records about a school's culture, student body and campus events.
– Non-Government, Non-Profit Organizations: Used to record online presence and impact
– Researchers: Used to preserve valuable social reactions and change on topics of interest

4 Archive-It and Social Media Overview
– Capturing social media sites is becoming more necessary for Archive-It partners
– Still focused on: Flickr, Facebook, Twitter, and YouTube
– On our radar: Vimeo, LinkedIn, others?
– Join the Archive-It social media listserv to hear breaking news, including fixes and adjustments within Archive-It

5 Social Media Crawling Notes
– Content behind log-ins cannot currently be archived (feature in the 4.8 release, April 2013)
– Some parts of sites are not “archive-friendly” (e.g., complicated JavaScript)
– These sites tend to change both their technical structure and policy quickly and often.

6 Scoping Social Media Sites
Because of the way many of these sites are structured, scoping crawls correctly is very important if you are archiving these sites.
– Each site has its own unique structure
– Not scoping correctly can result in crawling far more than you intend, or not capturing the content you want to archive.

7 Scoping - Overall Approaches
– Trial and Error: Try to harvest with a variety of settings and a variety of seeds
– Quality Review: Review archived content thoroughly
– Collaborate: Compare approaches and results with other Archive-It users
– Document detailed instructions, lessons learned, and best practices for other partners

8 Best Practices
Best practices for various social networking and social media sites are documented on the Archive-It Help Wiki: Archiving+Social+Networking+Sites+with+Archive-It

9 Best Practices
– Be specific with your seed URLs: list only the page you would like to archive as a seed.
– Do NOT use the larger site as a seed (for example, do NOT use or as seeds). DO use:
– Double-check your seed: Do you need an ending slash (/)?
– Ignore robots.txt as needed: some sites block content using robots.txt
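
The seed checks on this slide can be sketched as a small pre-flight script. This is only an illustration, not an Archive-It feature: the function name `check_seed`, the `SOCIAL_HOSTS` list, and the example URLs are all assumptions made for the sketch.

```python
from urllib.parse import urlsplit

# Hosts treated as "the larger site". An illustrative list, not an official one.
SOCIAL_HOSTS = {"twitter.com", "facebook.com", "youtube.com", "flickr.com"}


def check_seed(seed):
    """Return a list of warnings for a candidate seed URL (hypothetical helper)."""
    warnings = []
    parts = urlsplit(seed)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path
    if host in SOCIAL_HOSTS and path in ("", "/"):
        # The seed is the site root, which would scope in the entire site.
        warnings.append("seed is the whole site; use a specific page instead")
    elif path and not path.endswith("/") and not parts.query:
        # Only a reminder to verify; some sites need the slash and some do not.
        warnings.append("no ending slash; confirm which form the site serves")
    return warnings


if __name__ == "__main__":
    for seed in ("https://twitter.com/", "https://twitter.com/example_user"):
        print(seed, check_seed(seed))
```

A check like this only flags seeds for a second look; a test crawl is still the way to confirm what a seed actually captures.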

10 Best Practices ALWAYS run a test crawl when first setting up these seeds to avoid using more of your document budget than expected. You may need to run more than one until you get it right.

11 Best Practices
After your first crawl…
– Review post-crawl reports (did you crawl too much?)
– Review archived content in Wayback
  – Did you capture all the areas you expected?
  – Are there any display issues?
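
One rough way to answer "did you crawl too much?" is to tally captured URLs by host. The sketch below assumes you have a plain list of crawled URLs (for example, copied out of a report); the function name `hosts_by_count` and the sample URLs are hypothetical, and this does not read any Archive-It report format directly.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosts_by_count(crawled_urls):
    """Count crawled URLs per host, most frequent first."""
    counts = Counter(urlsplit(u).netloc.lower() for u in crawled_urls)
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "http://twitter.com/example_user",
        "http://twitter.com/example_user/status/1",
        "http://t.co/abc",
    ]
    for host, count in hosts_by_count(sample):
        print(f"{count:6d}  {host}")
```

Hosts with unexpectedly high counts, or hosts you did not intend to capture at all, are the first places to look when adjusting scope.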

12 Reviewing Scoping Rules To the web app!

13 Twitter – Sample URLs
– Individual user feeds
– Searches pd
– Lists
– A specific tweet

14 Twitter - Scoping
Expand Scope (using SURTs) to capture dynamically loading content:
– Individual Twitter feed: +
– Multiple Twitter feeds: +
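
The actual SURT rules from this slide are missing from the transcript. As a general illustration of what a SURT looks like, the sketch below reverses the host labels of a URL into the usual `scheme://(tld,domain,)/path` shape. The function name `to_surt_prefix` and the example account are assumptions, and the sketch ignores ports, credentials, and any normalization a real crawler applies.

```python
from urllib.parse import urlsplit


def to_surt_prefix(url):
    """Convert a URL to a simplified SURT-style prefix (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Reverse the dotted host labels and join them with commas,
    # e.g. twitter.com -> com,twitter,
    reversed_host = ",".join(reversed(host.split("."))) + ","
    return f"{parts.scheme}://({reversed_host}){parts.path}"


if __name__ == "__main__":
    print(to_surt_prefix("http://twitter.com/example_user"))
```

Because the host is written from the most general label to the most specific, a shorter SURT prefix matches a broader set of URLs, which is why an overly short rule can expand a crawl well beyond one feed.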

15 Links in Tweets
Can I archive a URL linked to using a ‘URL shortener’?
– Yes! Use an Expand Scope rule for http://t.co/ - all URLs posted on Twitter redirect through that domain
– Note: just the one page that the URL shortener link points to will be archived (plus embedded content)

16 Twitter Examples of Archived Pages

17 Facebook – Sample URLs
– Individual User Profiles - Timeline view
– Pages - Timeline view
– Events
– Albums &type=3

18 Facebook - Scoping
– Ignoring robots.txt: fbcdn.net akamaihd.net
– Document limit on www.facebook.com (recommended 2000 for each seed)
– Note, you cannot limit to *just* capture content from one Facebook account
– Expand Scope:
  – SURT +

19 Facebook
– Currently we can capture the initial content on a Facebook timeline. However, the dynamically loading content can be difficult to capture because Facebook frequently changes the way that content is served.
– Our engineers are working on keeping up to date with these changes, and we are also investigating alternate methods for capturing Facebook pages.

20 Facebook Examples of Archived Pages

21 YouTube - Sample URLs
– Channel/User pages
– Watch pages: individual videos
– Uploaded Document RSS Feed ads/
– Embedded YouTube Videos on other sites: video/video/2013/01/29/president-obama-speaks-comprehensive-immigration-reform

22 YouTube - Scoping
For all YouTube content, ignore robots.txt for:
– youtube.com
– ytimg.com
For Watch pages (individual videos):
– Use “One Page Only” Seed Type
For Channel/User pages:
– Crawl with a document limit or using RSS/News Feed seed type
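
The scoping choices above amount to a small decision rule, sketched here. The sample URLs from the previous slide are missing from this transcript, so the `/watch?v=` pattern used to recognize a Watch page is an assumption, as are the function name `suggest_youtube_settings` and the example URLs. The returned strings simply echo the slide's wording and are not Archive-It setting identifiers.

```python
from urllib.parse import parse_qs, urlsplit

# Hosts the slide says to ignore robots.txt for.
ROBOTS_IGNORE_HOSTS = ["youtube.com", "ytimg.com"]


def suggest_youtube_settings(seed):
    """Suggest a seed type for a YouTube seed URL (hypothetical helper)."""
    parts = urlsplit(seed)
    # Assumed pattern: a Watch page has path /watch and a "v" query parameter.
    if parts.path == "/watch" and "v" in parse_qs(parts.query):
        seed_type = "One Page Only"
    else:
        seed_type = "document limit or RSS/News Feed"
    return {"seed_type": seed_type, "ignore_robots_for": ROBOTS_IGNORE_HOSTS}


if __name__ == "__main__":
    print(suggest_youtube_settings("https://www.youtube.com/watch?v=abc123"))
    print(suggest_youtube_settings("https://www.youtube.com/user/example_channel"))
```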

23 YouTube
Viewing YouTube videos:
– YouTube videos for Watch pages and most embedded YouTube videos will play back normally in Wayback
– For Channel/User pages or other pages where videos are not playing back within the page, view videos from the video report or the public video page for that seed.

24 YouTube Examples of Archived Pages

25 Flickr
What types of pages can be archived?
– Photo streams Ex:
– Individual photos Ex: in/photostream

26 Flickr Examples of Archived Pages

27 Other Sites
Can sites other than those already mentioned be archived?
– Yes! There are many more sites out there that can be archived. Please send us sites you are interested in archiving.
– Other sites currently mentioned by partners are Google+, LinkedIn, Vimeo, and SlideShare.

28 Moving Forward
– These best practices will change as the sites themselves make changes. Please be sure to check the Help Wiki page for updates.
– We continue to focus on working with our partners to improve the capture and display of archived social networking sites.
– The Archive-It team is exploring other capture mechanisms besides using a traditional crawler resource (Heritrix):
  – Headless browsers
  – Hybrid architecture
  – API
  – Partnering with third party software
– Enhance the display and search capabilities

29 Thank you! Questions? Discussion? Please take our quick survey: