Web Archiving: Avery Fisher Center for Music & Media Rhiannon Bettivia, Zack Lischer-Katz, Samantha Losben & Erica Wilson November 29, 2010 Digital Preservation.

Slides:



Advertisements
Similar presentations
Panel: What Changes With Digital? Web Archiving ARL Forum 2009 Tracy Seneca – California Digital Library.
Advertisements

1 Advanced Archive-It Application Training: Quality Assurance October 17, 2013.
1 What is the Internet Archive We are a Digital Library Mission Statement: Universal access to human knowledge Founded in 1996 by Brewster Kahle in San.
The UM Libraries’ Frost Concert Archive Documenting the Performance History of the University of Miami Frost School of Music Amy Strickland University.
Looking Ahead Archive-It Partner Meeting November 12, 2013.
A centre of expertise in data curation and preservation MIS Seminar :: University of Edinburgh :: 2 October 2006 Funded by: This work is licensed under.
Toulouse School of Graduate Studies Theses and Dissertations ETDs - Why We Do them –We at UNT believe that electronic theses and dissertations enhance.
StatCat Building a Statistical Data Finder ssrs.yale.edu/statcat Steven Citron-Pousty Ann Green Julie Linden Yale University.
University Archives University Archives & Archive-It WebCom
Rutgers University Libraries What is RUcore? o An institutional repository, to preserve, manage and make accessible the research and publications of the.
The FDLP Web Archive Dory Bower Archive-It Partner Meeting November 18, 2014.
Web archiving at the NLA ‘ Archiving the music web’ Music Council of Australia Annual Assembly 28 September 2009 Paul Koerbin Manager Digital Archiving.
How to Establish a Blog. What is a Blog A blog is a collection of informational articles/ideas intended to update a viewer on new information associated.
1 Archiving and Preserving the Web Kristine Hanna Internet Archive April 2006.
The capture and preservation of websites at the National Library of New Zealand Gillian Lee Alexander Turnbull Library.
1 Archive-It Training University of Maryland July 12, 2007.
Microsoft SharePoint 2010 Upgrade Preview FSU SharePoint Users Group Presents: Thursday, December 1 st, 2011.
Born Digital Project of the California Digital Newspaper Collection.
Archive-It collection on “Occupy Movement 2011/2012” Archiving Web Content.
Web site archiving by capturing all unique responses Kent Fitch, Project Computing Pty Ltd Archiving the Web Conference Information Day National Library.
Joanne Archer University of Maryland Kate Odell Archive-It Abbie Grotke Library of Congress Tessa Fallon Columbia University Creating and Maintaining Web.
Copyright Law: Facts and FAQs By Mr. Joel Free Career and Technical Education Troutman Middle School.
Computer Science : Information Systems Design and Development Unit Web Sites - National 4 / 5 St Andrew’s High School-Revised January 2013 Slide 1 St Andrew’s.
1 Archiving and Preserving the Web Dan Avery Kristine Hanna Merrilee Proffitt Internet Archive RLG April 2006.
How to Face the Challenges of Web Archiving? The experiences of a small library on the edge. Chloe Martin, Internet Memory Catherine Ryan, National Library.
The Web is a Mess: or How I Learned to Stop Worrying and Love Web Archiving Lori Donovan, Internet Archive.
5-7 November 2014 ADLSN - ADLC Practical Digital Content Management from Digital Libraries & Archives Perspective.
Web Capture team Office of strategic initiatives February 27, 2006 Selecting Content from the Web: Challenges and Experiences of the Library of Congress.
Build a Free Website1 Build A Website For Free 2 ND Edition By Mark Bell.
Understanding the Web Site Development Process. Understanding the Web Site Development You need a good project plan Larger projects need a project manager.
Gathering and Analyzing Web Use Statistics: A Practical Tutorial for Archivists Michael Szajewski, Ball State University, Archivist for Digital Development.
5-7 November 2014 DR Workflow Practical Digital Content Management from Digital Libraries & Archives Perspective.
Annick Le Follic Bibliothèque nationale de France Tallinn,
Human Rights Archives and Documentation, CHRDR Conference 4- 6 October 2007 Issues in Human Rights Web Archiving Robert Wolven Columbia University Libraries.
WHS joined Archive-It in the fall of 2010 Began capturing state information with the capture of Governor Jim Doyle’s websites at the end of the administration.
Bayu Priyambadha, S.Kom Teknik Informatika Universitas Brawijaya.
ECHO DEPository Project: Highlight on tools & emerging issues The ECHO DEPository Project is a 3-year digital preservation research and development project.
WAS to Archive-It Metadata Migration March 11, 2015.
Can we be doing more? Beth Tillinghast University of Hawaii at Manoa October 19, 2011 Archive-It Partner Meeting ACCESS TO OUR ARCHIVED WEBSITE COLLECTIONS.
Preserving Digital Culture: Tools & Strategies for Building Web Archives : Tools and Strategies for Building Web Archives Internet Librarian 2009 Tracy.
Using Copyrighted Works Do I need permission to use this? Slides produced by the Copyright Education & Consultation Program.
From here to perpetuity: challenges (and a few confessions) in preserving web-based AV content ASRA Conference 2011 Paul Koerbin Manager Web Archiving.
Web Archiving for Music History : The Archive of Contemporary Composers’ Websites at New York University Kent Underwood & Robin Preiss Music Library Association.
Curator wishes for the roadmap november 2011 updates.
If you build it will they come? The LAIRAH Study: Quantifying the Use of Online Resources in the Arts and Humanities through Statistical Analysis of User.
February 12, 2015 FAIR USE IN THE VISUAL ARTS. Why fair use is important in the visual arts Fair use: the basics Why a code of best practices? What’s.
Copyright Presentation Table of Contents 1.Understanding Copyright Tips for Identifying the License (March 6, 2014: updated from Sept 19, 2013) 2.Applying.
Uganda Scholarly Digital Library (USDL) Makerere University’s Institutional Repository By Margaret Nakiganda URL:
Preservation Program Digital Preservation Program Digital Preservation Services: Extending tools to meet campus needs Patricia Cruse, Director, Digital.
Uncovering the Invisible Web. Back in the day… Students used to research using resources hand-picked by librarians and teachers. These materials were.
“Help, we started a journal!” Adventures in supporting open access publishing using Open Journal Systems Anna Craft Metadata Cataloger The University of.
Institutional Repositories: the DSpace Experience Ann J. Wolpert Director of Libraries Massachusetts Institute of Technology.
Building Collections on the Web BCWeb. What’s BCWeb ? BCWeb was developped entirely by the BnF for the content curators to replace its old selection tools.
One Library’s Successful Venture in Providing Comprehensive Streaming Media Services Charleston Conference 2015 Saturday, November 7 10:45am - 11:15am.
Web Web 1.0 Web 2.0 Yellimar Borges Serrano Edwin Sánchez Bonilla Keyla La Santa Díaz Yellimar Borges Serrano Edwin Sánchez Bonilla Keyla La Santa Díaz.
Libraries of Course: integrating library content and services into the e-learning environment. Brian Flaherty Digital Services Manager University of Auckland.
1 Advanced Archive-It Application Training: Reviewing Reports and Crawl Scoping.
Sarah Beaubien Scholarly Communications Outreach Coordinator Grand Valley State University Libraries Open Education Materials: Collecting,
Strategies for archiving the Danish web space Bjarne Andersen Head of Digital Resources State and University Library, Aarhus
Attributes and Values Describing Entities. Metadata At the most basic level, metadata is just another term for description, or information about an entity.
CopyRight or CopyWrong? Fair Use and Faculty Reserves
The Writing Process.
Joanne Archer University of Maryland Libraries
Creating Web Collections with Archive-It
An Ounce of Different is Worth A Pound of Same ~ Sustaining rich collections by adapting what we know & learning skills we need.
MSC photo:  It was taken some time in the late 1930s, but we don’t have an exact date.  The college was known as MSC from 1925 until 1955 when we became.
Attributes and Values Describing Entities.
MSC photo:  It was taken some time in the late 1930s, but we don’t have an exact date.  The college was known as MSC from 1925 until 1955 when we became.
The Annual Copyright License User Guide
Presentation transcript:

Web Archiving: Avery Fisher Center for Music & Media Rhiannon Bettivia, Zack Lischer-Katz, Samantha Losben & Erica Wilson November 29, 2010 Digital Preservation

Avery Fisher Center for Music & Media Music Librarian - Kent Underwood

Kent Meeting No. 1  The Avery Fisher Center did not have any internal websites in need of crawling  Suggested 3 types of music websites to pursue: artist sites, commercial sector sites, and music journalism sites  Potential issues included decisions on how to narrow potential subjects, issues of proprietary materials and permission for capturing copyrighted objects, and sites that linked to social networking venues that are password protected

Test Crawl of AFC Website After initial meeting with Kent Underwood: He informed us that he didn’t actually have any internal web projects to be archived. Music library has one small page as part of the general library website The Scope was set to crawl once a day for a month Yielded very little information or information that is worth capturing

First day of crawl Statistics Started October 22, :45:07 PM Completed October 22, :45:37 PM Status Finished Average Doc Rate 2.14 urls/sec Average KB Rate 23.0 KB/s Total Documents Crawled* 60 Total Data Crawled KB New Documents Archived 0 New Data Archived* 0.0 bytes Last Crawl Statistics Started November 25, :16:04 AM Completed November 25, :58:24 AM Status Finished Average Doc Rate 1.34 urls/sec Average KB Rate KB/s Total Documents Crawled* 3,409 Total Data Crawled 1.8 GB New Documents Archived 3,409 New Data Archived* 5.2 Reset scope to once a Week / still nothing of great importance captured Feel that now it could be set to once a year

We thought the project would entail Cursory crawl of the Avery Fisher site Test crawls of 9 artists’ websites recommended by Kent that were chosen as they steered clear of Facebook and MySpace Creation of a permission letter to send to artists

The letter, parts 1,2 and 3 Format came from Columbia University rmissions/requesting-permission/model- forms/#letter4 Adapted it to be about the Internet Archive Edited it for length

Kent Meeting No. 2 Showed him the letter and our test crawls Kent offered to make changes to the letter BOMBSHELL: Kent’s secret ambitions and goals for the project

What the project became The creation of a permission letter that would encompass the scope of the new project Obtaining clearance on letter from Kent, then Howard, then Kent again

Final Letter Dear ________________, The New York University library is working with graduate students in the Digital Preservation course on a pilot project to create an archive of music-related web pages. Your website is of particular interest because of the NYU library’s long established commitment to collecting contemporary music. With your permission, we would like to add your website to our growing collection which consists of personal websites maintained by independent musicians and composers. We are using software developed by the Internet Archive called Archive-It, which works by crawling websites and archiving copies of each page within a domain. Curated groups of archived websites become available as research collections through the Archive-It website: The one and only goal of this initiative is to preserve historical documentation for research. Web crawling and archiving begins by taking a snapshot of a website in its current state; then by repeating the process periodically, it tracks the site’s evolution as it changes over time. We are mindful of your business concerns in an increasingly difficult market, and we want to assure you that our archiving effort is entirely scholarly in nature and that no profit is to be derived from this effort. No commercial use or any changes whatsoever will be made by us to your site. We urge you to look at for information about the Internet Archive, and for specific information about web site archiving. We would also emphasize that none of this would require any effort on your part; we and the software would be doing all the work. The success of this project rests on our ability to crawl your website, but of course we would not want to go ahead without your permission. Once the feasibility of the pilot has been established and all the technical and administrative questions have been settled, the NYU library is prepared to commit to preserving the archive in perpetuity. We are eager to proceed with this project, which we see as fundamental to our scholarly mission as archivists in preserving our digital heritage. Thank you for your consideration. If you have any questions, please feel free to contact us. With best regards, _________________on behalf of Kent and students in the Fall 2010 Digital Preservation class

Seed URLs

Crawl Frequency Avery Fisher Center Site Site changed on: 10/26, 11/4, 11/11, 11/25 Our subject librarian is currently only interested in one-time crawls.

Quality Assessment Avery Fisher Center Site Display Problems

Common File Formats text/html4003 URLs41.1MB image/jpeg3657 URLs232.2MB image/gif747 URLs11.2MB text/xml329 URLs5.6MB application/pdf322 URLs796.0MB audio/mpeg189 URLs1.6GB

Site Complexity and Size Hannahlash.com –33 URLs –126MB Nicomuhly.com –3,191 URLs –120.2MB Nicolascollins.com –1,493 URLs –1.7GB

Permissions The html code can give us clues about how the content creator feels about crawling: –Seanshepherd.com = “no robots!” –Hannahlash.com = “yes robots!” –

Sustainability Issues Server side behavior is difficult to archive: –Avery Fisher Site –NYU-Bobst CMS = OmniUpdate? –NYU-TSOA CMS = iOn Other difficult formats to archive –Flash –Javascript

Recommendations Letter as Template Partnership with Internet Archive Expedient need for archiving - use ARCHIVE-IT