HATHI TRUST A Shared Digital Repository Delivering Data For New Generations of Research Strategies and Challenges Jeremy York NISO/BISG Forum ALA 2010.

Slides:



Advertisements
Similar presentations
Almaden Research Center © 2006 IBM Corporation IOP 06 Open Source Intelligence Lesson Learned.
Advertisements

Beyond the Google Book: the Future of the Digital Library Cory Snavely Library IT Core Services manager University of Michigan April 20, 2010.
HATHI TRUST A Shared Digital Repository Building A Future By Preserving Our Past The Preservation Infrastructure of HathiTrust Digital Library Jeremy York.
Building the Universal Library: The Promise and Challenges of HathiTrust John Wilkin 2 April 2009.
HathiTrust Sharing a Federal Print Repository: Issues and Opportunities May 25, 2011 Heather Christenson.
What is HathiTrust and How Can it Make a Difference? Sourcing and Scaling brought to the collective collection.
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
1 Integrating user environments and data liquidity to improve the research experience.
Moving Forward With Digital Preservation at the Library of Congress Laura Campbell Associate Librarian for Strategic Initiatives Library of Congress.
Digital Preservation A Matter of Trust. Context * As of March 5, 2011.
Tools for Unstructured Text
The Alliance for Data Archive Technologies: Looking towards a Common Future Myron Gutmann, ICPSR Ben Evans, ASSDA Deborah Mitchell, ASSDA Kevin Schürer,
HATHITRUST A Shared Digital Repository HathiTrust current work, challenges, and opportunities for public libraries Creating a Blueprint for a National.
HATHI TRUST A Shared Digital Repository Digital Repositories for Preservation and Access Digital Directions 2013 Jeremy York July 22, 2013 Unless otherwise.
University of Illinois Visualizing Text Loretta Auvil UIUC February 25, 2011.
Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
Microsoft Academic Search: An Overview and Future Directions Lee Dirks Director, Portfolio Strategy Microsoft Research Connections Developing Data Attribution.
University of Illinois OCR Workshop Loretta Auvil UIUC October 18, 2011.
Planning for Flexible Integration via Service-Oriented Architecture (SOA) APSR Forum – The Well-Integrated Repository Sydney, Australia February 2006 Sandy.
Jiten Bhagat University of myExperiment A Social VRE for Research Objects JISC Roadshow | February.
University of Illinois Role of Mashups, Cloud Computing, and Parallelism for Visual Analytics Loretta Auvil.
SEASR Overview Loretta Auvil, Boris Capitanu National Center for Supercomputing Applications University of Illinois at Urbana-Champaign
Overview of Search Engines
WORKFLOWS IN CLOUD COMPUTING. CLOUD COMPUTING  Delivering applications or services in on-demand environment  Hundreds of thousands of users / applications.
Knowledge Science & Engineering Institute, Beijing Normal University, Analyzing Transcripts of Online Asynchronous.
The Hathi Trust Research Center and tool builders John Unsworth (with Beth Plale, Scott Poole, Robert McDonald, and others) Project Bamboo Corpora Space.
SEASR Analytics and Zotero University of Illinois at Urbana-Champaign.
Digital Library Architecture and Technology
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation SEASR Overview Loretta Auvil and Bernie Acs National.
January, 23, 2006 Ilkay Altintas
Moving forward our shared data agenda: a view from the publishing industry ICSTI, March 2012.
DuraCloud A service provided by Sandy Payette and Michele Kimpton.
HATHITRUST A Shared Digital Repository HathiTrust: Putting Research in Context HTRC UnCamp September 10, 2012 John Wilkin, Executive Director, HathiTrust.
DuraCloud Managing durable data in the cloud Michele Kimpton, Director DuraSpace.
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation SEASR Overview Loretta Auvil and Bernie Acs National.
Rule-Based Data Management Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar {moore, schroede, mwan, {moore, schroede, mwan,
SEASR Applications and Future Work National Center for Supercomputing Applications University of Illinois at Urbana-Champaign.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
Interoperability through Library APIs Library Technology Services Open House 7/30/15.
Richard MarcianoChien-Yi Hou Caryn Wojcik University of University of State of Michigan North Carolina North Carolina Records Management ServicesSALT DCAPE.
SEASR Applications and Future Work University of Illinois at Urbana-Champaign.
PLoS ONE Application Journal Publishing System (JPS) First application built on Topaz application framework Web 2.0 –Uses a template engine to display.
Installation and Development Tools National Center for Supercomputing Applications University of Illinois at Urbana-Champaign The SEASR project and its.
SEASR Analytics for Zotero Loretta Auvil Automated Learning Group Data-Intensive Technologies and Applications, National Center for.
HATHI TRUST RESEARCH CENTER Building Collections and Analyzing Data Stacy Kowalczyk.
HathiTrust’s Past, Present and Future. Short- and Long-term Functional Objectives Short-term Page turner mechanism (and Mobile!) Branding (overall initiative;
L JSTOR Tools for Linguists 22nd June 2009 Michael Krot Clare Llewellyn Matt O’Donnell.
SEASR Analytics Loretta Auvil Automated Learning Group Data-Intensive Technologies and Applications, National Center for Supercomputing.
Technical Update 2008 Sandy Payette, Executive Director Eddie Shin, Senior Developer April 3, 2008 Open Repositories 2008, Fedora User Group.
Tools and Deployment University of Illinois at Urbana-Champaign.
Deepcarbon.net Xiaogang Ma, Patrick West, John Erickson, Stephan Zednik, Yu Chen, Han Wang, Hao Zhong, Peter Fox Tetherless World Constellation Rensselaer.
Visualizations, Mashups and Dashboards University of Illinois at Urbana-Champaign.
DuraCloud Open technologies and services for managing durable data in the cloud Michele Kimpton, CBO DuraSpace.
SEASR Overview Loretta Auvil, Boris Capitanu University of Illinois at Urbana-Champaign
HTRC Loretta Auvil, Boris Capitanu University of Illinois at Urbana-Champaign
SEASR Analytics and Zotero University of Illinois at Urbana-Champaign.
MENDELEY INSTITUTIONAL EDITION POWERED BY SWETS Tolga Bakkaloglu, Electronic Product Manager, Swets 21 st Panhellenic Conference of Academic Libraries,
HATHITRUST A Shared Digital Repository HathiTrust Large Digital Libraries: Beyond Google Books Modern Language Association January 5, 2012 Jeremy York,
Fedora Commons Overview and Background Sandy Payette, Executive Director UK Fedora Training London January 22-23, 2009.
Deployment of Flows Loretta Auvil
SEASR & Meandre for Second Generation Digital Libraries
Packaging Specification Package Ingest Service
An Overview of Data-PASS Shared Catalog
SEASR Overview Loretta Auvil, Boris Capitanu
Flexible Extensible Digital Object Repository Architecture
Flexible Extensible Digital Object Repository Architecture
University of Illinois at Urbana-Champaign
DPubS: An Open Source Electronic Publishing System
Dataverse for citing and sharing research data
AI Discovery Template IBM Cloud Architecture Center
Presentation transcript:

HATHI TRUST A Shared Digital Repository Delivering Data For New Generations of Research Strategies and Challenges Jeremy York NISO/BISG Forum ALA 2010

Introduction Digital Repository – Initial focus on digitized book and journal content – Light archive Collections and Collaboration – Comprehensive collection – Shared strategies – Local services – Public Good

Content Distribution 6,173,575 – Total 1,177,667 – Public Domain * As of June 15, 2010

Language Distribution (1) * As of June 15, 2010

Language Distribution (2) The next 40 languages make up ~13% of total * As of June 15, 2010

Originating Institution * As of June 15, 2010

Content over time * As of June 15, 2010

Content Growth

Data Distribution & APIs OAI-PMH Metadata files Bibliographic API Data API

Extended Services Community Development Environment Non-Google Ingest Non-Book/Non-Journal Ingest Computational Research

Strategies for Computational Research Data distribution Protocol-based access Research Center

SEASR Architecture Components Virtualization Infrastructure Meandre Infrastructure Visualization Component Repository Component Discovery Meandre Data-Intensive Flows Apps Services Plugins Web Apps Analytics Data Developer Tools Repositories Data Analysis Components Flows Repositories Data Analysis Components Flows User Interfaces Cloud Computing Visualizations Meandre Workbench

Work – Tag Cloud Count tokens Filter options supported Stem words

Work – Entity Mash-up Entity Extraction with OpenNLP or Stanford NER Locations viewed on Google Map Dates viewed on Simile Timeline

Work – Entities To Network Identify entities Define relationships between entities within same sentence

Work – Text Clustering Clustering of Text by token counts Filtering options for stop words, Part of Speech Dendogram Visualization

Work – Audio Analysis NEMA: Executes a SEASR flow for each run –Loads audio data –Extracts features for every 10 sec moving window of audio –Loads and applies the models –Sends results back to the WebUI NESTER: Annotation of Audio via Spectral Analysis

Work – Zotero Plugin to Firefox Zotero manages the collection Launch SEASR Analytics –Citation Analysis uses the JUNG network importance algorithms to rank the authors in the citation network that is exported as RDF data from Zotero to SEASR –Zotero Export to Fedora through SEASR –Saves results from SEASR Analytics to a Collection Launch MONK Processing –MONK DB Ingestion WorkflowMONK DB Ingestion Workflow

Work – Emotion Tracking Goal is to have this type of Visualization to track emotions across a text document (Leveraging flare.prefuse.org)

Sentiment Analysis: Visualization

Person Extraction: Scott's Waverley, Ivanhoe, and The Heart of Midlothian.

Location Extraction: Top: Walter Scott's WaverleyBottom: Maria Edgeworth's Castle Rackrent

Thank you!