Panel on Web Archiving Government Information: LAC’s Program Update

Government Information Day 2018
Tom J. Smyth
Manager, Digital Integration
Digital Preservation and Migration Division
tom.smyth@canada.ca
@smythbound

Five Main Web Archiving Activities
- Domain crawl of the federal web presence
- Curation of thematic and social media research collections
- Reactionary, events-based collections
- Preservation archiving of resources at risk (e.g., TBS WRI)
- Acquisitions linked to existing library collections or archival fonds

Thematic Research Collections
Political Collections
- 142 seeds, including documentation of cabinet shuffles and recurring collection of all federal political party websites
- Collection of #cdnpoli and #canpoli using Twarc
  - Ongoing collection started on 29 January 2018
  - 7.1+ million tweets collected, about 58 GB of data
  - Tweet capture rate: over 700,000 per month (23,000+ per day)
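The capture-rate figures above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the total and the start date come from the slide, while the end date is a hypothetical snapshot date chosen for the example, and `capture_rates` is a helper name introduced here.

```python
from datetime import date


def capture_rates(total_tweets: int, start: date, end: date) -> tuple[float, float]:
    """Return (tweets per day, tweets per 30-day month) over an inclusive date range."""
    days = (end - start).days + 1
    per_day = total_tweets / days
    return per_day, per_day * 30


# Total and start date are from the slide; the end date is an assumed snapshot date.
per_day, per_month = capture_rates(7_100_000, date(2018, 1, 29), date(2018, 11, 30))
print(f"{per_day:,.0f} tweets per day, {per_month:,.0f} per 30-day month")
```

Under that assumed end date the daily rate lands a little above 23,000, which is in line with the slide's figures.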

Curation of Thematic Research Collections
Legalization of Cannabis in Canada
- 212 seeds, broad coverage of the media
- Changes to legislation and frameworks (federal and provincial; includes bills and parliamentary and legislative assembly debates where possible)
- Economic impacts (job creation, expected revenue, black markets)
- Health (public and mental health)
- Advocacy (e.g., advocacy supporting small growers, legal pardons for previous possession offenders)
- Business and retail (including provincial retail sites and corporate sites)
- From 10 to 22 October 2018, we collected 48k tweets from the #legalizationday hashtag via Twarc

Curation of Thematic Research Collections
First World War Commemoration
- 1,000 seeds, with emphasis on regimental and soldier histories, commemoration activities, and documentation of major battles in which Canadians took part
- From March to April 2017, we collected 122k tweets from the #vimy100 hashtag via Twarc
150th Anniversary of Canadian Confederation
- 1,640 seeds on the official themes of Diversity and Inclusion, Engaging and Inspiring Youth, Indigenous Reconciliation, and the Environment
- From May to December 2017, we collected 7.87 million tweets from the #canada150 hashtag via Twarc

Curation of Thematic Research Collections (2)
National Inquiry into Missing and Murdered Indigenous Women and Girls
- 111 seeds, focusing on advocacy and resources addressing violence against Indigenous women; research and data (on the scope, nature, and underlying causes of the problem); and commentary on the national inquiry itself
Official Publications Seedlist (ask me about it!)
- 124 seeds, all the critical documentation: Publications.gc.ca, PARL.gc.ca, all courts and their decisions, all PCO’s and TBS’ resources, Budget, Gazette, the GG, Elections, Officers of Parliament, Ombudspersons, Stats Can resources, Canada.ca, and a partridge in a pear tree
- Seeking to delta crawl these seeds on a regular schedule

Preservation Web Archiving
Web Renewal ended in December 2017, but:
- Specialized crawls of Canada.ca and its subdomains continue (they’re complicated)
- 16.5 terabytes of federal web content collected in TBS Web Renewal contexts since 2014
  - ~2.5 terabytes of which were collected this fiscal year
- Delta crawls of domains still in *.gc.ca, looking at RSS feeds to capture updates
- 1,730 seeds / resources to date (362 this fiscal year)
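The idea of using RSS feeds to drive delta crawls can be sketched roughly as follows. This is a minimal illustration, not LAC's actual tooling: it assumes a standard RSS 2.0 feed with `pubDate` elements, and the feed content, the `example.org` URLs, and the function name `updated_links` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed; a real crawl would fetch this from the site.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example department news</title>
  <item><title>Old notice</title><link>https://example.org/old</link>
    <pubDate>Mon, 01 Oct 2018 12:00:00 +0000</pubDate></item>
  <item><title>New notice</title><link>https://example.org/new</link>
    <pubDate>Thu, 01 Nov 2018 12:00:00 +0000</pubDate></item>
</channel></rss>"""


def updated_links(feed_xml: str, last_crawl: datetime) -> list[str]:
    """Return links of feed items published after the previous crawl."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        link = item.findtext("link")
        if not pub or not link:
            continue  # skip items that cannot be dated or located
        if parsedate_to_datetime(pub) > last_crawl:
            links.append(link.strip())
    return links


# Only items newer than the assumed last crawl date become new seeds.
last_crawl = datetime(2018, 10, 15, tzinfo=timezone.utc)
print(updated_links(SAMPLE_FEED, last_crawl))
```

The links returned would then be handed to the crawler as seeds, so that only changed resources are recaptured rather than the whole domain.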

Full-Text Search is Coming!
- Full-text search is running internally! (beta)
- Search by:
  - Keyword (“Tom’s Federal Budget”)
  - Resource type (e.g., PDF)
  - Domain (e.g., publications.gc.ca)
- Date-range filtering is in progress; once it is done, search will be launched publicly
- You’ll be able to search back to 2005!
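To illustrate how the three search facets combine, here is a small sketch that filters in-memory records by keyword, resource type, and domain. It is an assumption-laden toy: the `Capture` record, its fields, and the `search` function are invented for this example, and a production system would use a full-text index rather than a linear scan.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Capture:
    """Hypothetical record for one archived resource."""
    url: str
    mime_type: str
    text: str


def search(captures, keyword=None, mime_type=None, domain=None):
    """Return captures matching every facet that was supplied."""
    results = []
    for c in captures:
        if keyword and keyword.lower() not in c.text.lower():
            continue
        if mime_type and c.mime_type != mime_type:
            continue
        if domain and urlparse(c.url).hostname != domain:
            continue
        results.append(c)
    return results


# Made-up sample records for demonstration only.
captures = [
    Capture("https://publications.gc.ca/a.pdf", "application/pdf", "Federal Budget overview"),
    Capture("https://publications.gc.ca/b.html", "text/html", "Federal Budget speech"),
    Capture("https://example.org/c.pdf", "application/pdf", "Unrelated report"),
]
hits = search(captures, keyword="federal budget",
              mime_type="application/pdf", domain="publications.gc.ca")
print([c.url for c in hits])
```

Each facet is optional and the facets are combined with AND, which matches the way the slide presents them as independent ways to narrow a search.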

~48 TB of Web Archives since 2005

Tom J. Smyth
Web Archiving @ LAC
Manager, Digital Integration and Program Lead, Web Archiving
tom.smyth@canada.ca
@smythbound
Questions most welcome!