Richard Marciano Professor, University of Maryland iSchool Affiliate Professor, Computer Science Director, Digital Curation Innovation Center (DCIC) University.

Presentation transcript:

Richard Marciano
Professor, University of Maryland iSchool
Affiliate Professor, Computer Science
Director, Digital Curation Innovation Center (DCIC), University of Maryland

Bill Underwood
Research Faculty, University of Maryland iSchool
Digital Curation Innovation Center (DCIC), University of Maryland

Pop-Up Session #311: “Archival Records in the Age of Big Data”
Friday, August 5, 9:30 a.m. – 10:45 a.m.
ARCHIVES*RECORDS: COSA & SAA 2016 annual conference

Archival Research Issues:
- How to manage the speed of growth?
- How to archive?
- What to archive?
- How to access over-abundance?

Census 1940 Products: Maps, Enumeration District Descriptions, Schedules (per Martin NARA)
- 3.25 million images
- TIFF file format for master files: 300 PPI at original size, 8-bit grayscale for legibility
- JPEG2000 chosen as access file format, allowing end users to zoom and pan
- A typical uncompressed 300 PPI TIFF image is 38.7MB, so 3.25 million images result in terabytes of data.
[Image: Birmingham, Francis; Son; Mother]
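The storage estimate on this slide can be worked through as a short calculation. This is a minimal sketch, not the presenters' method: the image count and per-image size come from the slide, while the function name `total_terabytes` and the use of decimal units (1 TB = 1,000,000 MB) are assumptions made for illustration.

```python
# Inputs taken from the slide.
IMAGE_COUNT = 3_250_000      # 3.25 million master images
MB_PER_IMAGE = 38.7          # typical uncompressed 300 PPI TIFF

# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def total_terabytes(image_count, mb_per_image, mb_per_tb=MB_PER_TB):
    """Return the total size in TB for image_count images of mb_per_image MB each."""
    return image_count * mb_per_image / mb_per_tb


census_tb = total_terabytes(IMAGE_COUNT, MB_PER_IMAGE)
print(f"Estimated master-file storage: {census_tb:.1f} TB")
```

Under these assumptions the uncompressed master files alone come to roughly 126 TB, before any JPEG2000 access copies are counted.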

Historical City Directories (Internet Archives)
1 GB / directory / NC city / year
50 cities → 50GB
50 years → 250TB
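The same kind of back-of-the-envelope estimate can be sketched for the city directories. The three inputs (1 GB per directory, 50 cities, 50 years) come from the slide; the variable names and the decimal conversion (1 TB = 1,000 GB) are assumptions. Taken literally, these inputs multiply out to 2,500 GB, so the slide's 250TB total presumably reflects a further scaling factor that the transcript does not record.

```python
# Inputs taken from the slide.
GB_PER_DIRECTORY = 1   # one scanned directory per NC city per year
CITIES = 50
YEARS = 50

# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1_000

per_year_gb = GB_PER_DIRECTORY * CITIES   # all cities, one year
total_gb = per_year_gb * YEARS            # all cities, all years
total_tb = total_gb / GB_PER_TB

print(f"Per year: {per_year_gb} GB")
print(f"All years: {total_gb} GB ({total_tb} TB)")
```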

Computational social science: The application of computer science and big data techniques to social science research. E.g.:
- Social network analysis
- Crowds
- Markets
- Political discourse

Computational journalism: Nick Diakopoulos (UMD)
Finding and telling news stories, with, by, or about algorithms (praxis about integrating data, modeling, simulation, programming into journalistic norms, goals, and epistemology).
Conceptualization and application of computational and data-driven approaches to journalism practice. Methods from text analysis, social computing, automated news production, simulation / prediction / modeling, algorithmic accountability, and content analytics are applied to real journalistic scenarios.

Examples:
1. Automated Writing Pipeline (how to write a bot)
2. Panama Papers

1. Automated Writing Pipeline

2. Panama Papers
Power of big data analysis in April 2016: 2.6TB and 11.5M documents of leaked financial data on offshore accounts for some of the world’s highest public officials. 370 investigative journalists in 100 media organizations and 76 countries worked together for one year.

TWO DCIC WORKSHOPS:

a. Recent: “COMPUTATIONAL ARCHIVAL SCIENCE” symposium, April
Themes: (1) Archives / museums / libraries, (2) Crowds, citizens, and communities, (3) Digital methods
Participants: DCIC (Marciano/Kurtz/Underwood), KCL (Hedges/Blanke), TACC (Esteva), UBC (Lemieux)

Computational archival science: An interdisciplinary field concerned with the application of computational methods and resources to records/archives processing, analysis, storage, long-term preservation, and access, with the aim of improving efficiency, productivity, and precision in support of appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival materials. This suggests that computational archival science is also a blend of computational thinking with archival thinking.

b. Upcoming: “Digital Records in the Age of Big Data”
IEEE International Conference on Big Data (IEEE Big Data 2016): Workshop in DC, Dec. 8, 2016 (Oct. 3 paper due date)
- Analytics in support of archival processing, including appraisal, arrangement and description
- Scalable services for archives, including identification, preservation, metadata generation, integrity checking, normalization, reconciliation, linked data, entity extraction, anonymization and reduction
- New forms of archives, including Web, social media, audiovisual archives, and blockchain
- Cyber-infrastructures for archive-based research and for development and hosting of collections
- Big data and archival theory and practice
- Digital curation and preservation
- Crowdsourcing and archives
- Big data and the construction of memory and identity
- Specific big data technologies (e.g. NoSQL databases) and their applications
- Corpora and reference collections of big archival data
- Linked data and archives
- Big data and provenance
- Constructing big data research objects from archives