Published by Marshall Atkins. Modified over 8 years ago.
1
Richard Marciano
Professor, University of Maryland iSchool
Affiliate Professor, Computer Science
Director, Digital Curation Innovation Center (DCIC), University of Maryland

Bill Underwood
Research Faculty, University of Maryland iSchool
Digital Curation Innovation Center (DCIC), University of Maryland

http://dcic.umd.edu

Pop-Up Session #311: “Archival Records in the Age of Big Data”
Friday, August 5, 9:30 a.m. to 10:45 a.m.
ARCHIVES*RECORDS COSA & SAA 2016 annual conference
2
Archival Research Issues:
- How to manage the speed of growth?
- How to archive?
- What to archive?
- How to access over-abundance?
3
Census 1940 Products: Maps, Enumeration District Descriptions, Schedules (per Martin Jacobson @ NARA)
- 3.25 million images
- TIFF file format for master files
- 300 PPI at original size
- 8-bit grayscale for legibility
- JPEG2000 chosen as access file format, allowing end users to zoom and pan
A typical uncompressed 300 PPI TIFF image is 38.7 MB, so 3.25 million images come to about 125.8 terabytes.
[Image: census schedule excerpt; legible labels include "Birmingham, Francis", "Son", "Mother"]
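The storage figure on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes decimal units (1 TB = 1,000,000 MB), which is the convention under which the slide's numbers agree.

```python
# Back-of-the-envelope check of the 1940 Census storage estimate.
# Figures come from the slide; names and the decimal-unit convention
# (1 TB = 1,000,000 MB) are assumptions made for this sketch.

IMAGE_COUNT = 3_250_000      # master images, per the slide
TIFF_SIZE_MB = 38.7          # typical uncompressed 300 PPI TIFF, per the slide
MB_PER_TB = 1_000_000        # decimal terabyte (assumption)


def total_terabytes(image_count: int, size_mb: float) -> float:
    """Return the total size in decimal terabytes for a set of equal-size images."""
    return image_count * size_mb / MB_PER_TB


total_tb = total_terabytes(IMAGE_COUNT, TIFF_SIZE_MB)
print(f"{total_tb:.1f} TB")
```

Under the stated assumption this yields roughly 125.8 TB, matching the slide. Using binary units instead would give a smaller number, so the unit convention is worth stating whenever such estimates are quoted.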
4
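Historical City Directories (Internet Archive)
- 1 GB / directory / NC city / year
- 50 cities: 50GB per year
- 50 years: 250TB as given on the slide (50GB per year over 50 years works out to 2,500GB, i.e. 2.5TB)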
5
Computational social science: The application of computer science and big data techniques to social science research. E.g.:
- Social network analysis
- Crowds
- Markets
- Political discourse

Computational journalism: Nick Diakopoulos (UMD)
Finding and telling news stories with, by, or about algorithms (praxis about integrating data, modeling, simulation, and programming into journalistic norms, goals, and epistemology).
Conceptualization and application of computational and data-driven approaches to journalism practice. Methods from text analysis, social computing, automated news production, simulation / prediction / modeling, algorithmic accountability, and content analytics are applied to real journalistic scenarios.
Examples:
1. Automated Writing Pipeline (how to write a bot)
2. Panama Papers
6
1. Automated Writing Pipeline
7
2. Panama Papers
Power of big data analysis in April 2016 (https://panamapapers.icij.org/):
- 2.6TB and 11.5M documents of leaked financial data on offshore accounts of some of the world’s highest-ranking public officials.
- 370 investigative journalists in 100 media organizations and 76 countries worked together for one year.
8
TWO DCIC WORKSHOPS:
a. Recent: “COMPUTATIONAL ARCHIVAL SCIENCE” symposium, April 2016
http://dcicblog.umd.edu/cas/
(1) Archives / museums / libraries, (2) Crowds, citizens, and communities, (3) Digital methods
DCIC (Marciano/Kurtz/Underwood), KCL (Hedges/Blanke), TACC (Esteva), UBC (Lemieux)
===================================================================
Computational archival science: an interdisciplinary field concerned with the application of computational methods and resources to records/archives processing, analysis, storage, long-term preservation, and access, with the aim of improving efficiency, productivity, and precision in support of appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival materials.
This suggests that computational archival science is also a blend of computational thinking with archival thinking.
9
b. Upcoming: “Digital Records in the Age of Big Data”
http://dcicblog.umd.edu/cas/ieee_big_data_2016_cas-workshop/
2016 IEEE International Conference on Big Data (IEEE Big Data 2016): http://cci.drexel.edu/bigdata/bigdata2016/
Workshop in DC, Dec. 8, 2016 (Oct. 3 paper due date)
- Analytics in support of archival processing, including appraisal, arrangement and description
- Scalable services for archives, including identification, preservation, metadata generation, integrity checking, normalization, reconciliation, linked data, entity extraction, anonymization and reduction
- New forms of archives, including Web, social media, audiovisual archives, and blockchain
- Cyber-infrastructures for archive-based research and for development and hosting of collections
- Big data and archival theory and practice
- Digital curation and preservation
- Crowdsourcing and archives
- Big data and the construction of memory and identity
- Specific big data technologies (e.g. NoSQL databases) and their applications
- Corpora and reference collections of big archival data
- Linked data and archives
- Big data and provenance
- Constructing big data research objects from archives