Glynn Edwards SAA – August 22, 2015 Director, ePADD Project Archival Stewardship of Email using ePADD Software.


Developed and funded by:

ePADD program

Appraisal Module

ePADD Technical Information

ePADD is written in Java and JavaScript and runs on Apache Tomcat (v7.0), using the Java EE Servlet API (v3.x) and JavaMail (v1.4.2). Text and metadata extraction, indexing, and retrieval are performed by Apache Lucene (v4.7) and Apache Tika (v1.8). Charting and visualization use the D3-based reusable chart library (v0.4.10). Oracle's Java Application Bundler and Launch4j are used for packaging on Mac and Windows respectively. Other Apache Java libraries (Lang, Commons, CLI, IO, Logging, etc.) are also used. JSON formatting is handled by the org.json and Gson libraries.

ePADD implements its own natural language processing (NLP) toolkit for named entity extraction, disambiguation, and other tasks. This toolkit supplants the Apache OpenNLP used in earlier beta versions of the software: OpenNLP proved insufficient for our needs (at least for name recognition), and after several rounds of customization we built our own named entity recognizer. The toolkit draws on external datasets such as Wikipedia/DBpedia, Freebase, GeoNames, OCLC FAST, and the LC Subject Headings/LC Name Authority File. We continue to use Muse as an internal library within ePADD.

The project is developed in IDEs such as IntelliJ IDEA and Eclipse, built with Apache Maven, Ant, and custom shell scripts, and managed with Git for source control and issue tracking. The ePADD client is browser-based and compatible with Chrome and Firefox. It is optimized for Windows 7 and OS X 10.9/10.10 machines running Java 7 or 8.
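To make "indexing and retrieval" concrete, here is a toy inverted index in plain Java. It is only a sketch of the general idea behind a search library such as Lucene; it does not use Lucene's API, and the class name `ToyMessageIndex`, its methods, and the sample messages are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: an illustration of the concept, not Lucene and not ePADD code.
public class ToyMessageIndex {
    // term -> ids of the messages that contain it
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Index one message body under a caller-chosen id.
    public void add(int messageId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(messageId);
        }
    }

    // Ids of messages that contain every term of the query (AND semantics).
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> hits = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(hits);
            else result.retainAll(hits);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyMessageIndex index = new ToyMessageIndex();
        index.add(1, "Draft of the grant proposal attached");
        index.add(2, "Lunch on Friday?");
        index.add(3, "Comments on the grant budget");
        System.out.println(index.search("grant"));        // prints [1, 3]
        System.out.println(index.search("grant budget")); // prints [3]
    }
}
```

A production index adds analysis (stemming, stop words), relevance scoring, and on-disk storage, which is the work the slide attributes to Lucene and Tika.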

Correspondents: Resolving multiple accounts into single entry
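One way to picture resolving multiple accounts into a single entry is to merge every (display name, address) pair that shares either the name or the address. The sketch below does this with a small union-find structure. It is an assumption about how such merging could work, not ePADD's actual algorithm; `CorrespondentMerger`, `observe`, and `correspondents` are hypothetical names, and the sample addresses are made up.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: group addresses that are linked through a shared display name.
public class CorrespondentMerger {
    // Union-find parent links over keys such as "name:jane doe" or "addr:jd@example.edu".
    private final Map<String, String> parent = new HashMap<>();

    private String find(String key) {
        parent.putIfAbsent(key, key);
        String root = key;
        while (!parent.get(root).equals(root)) root = parent.get(root);
        parent.put(key, root); // shortcut this key straight to its root
        return root;
    }

    private void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    // Trim, lowercase, and collapse runs of whitespace.
    private static String norm(String s) {
        return s.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Record that a message header paired this display name with this address.
    public void observe(String displayName, String address) {
        union("name:" + norm(displayName), "addr:" + norm(address));
    }

    // One sorted set of addresses per resolved correspondent.
    public Collection<TreeSet<String>> correspondents() {
        Map<String, TreeSet<String>> groups = new TreeMap<>();
        for (String key : new ArrayList<>(parent.keySet())) {
            if (key.startsWith("addr:")) {
                groups.computeIfAbsent(find(key), k -> new TreeSet<>())
                      .add(key.substring("addr:".length()));
            }
        }
        return groups.values();
    }

    public static void main(String[] args) {
        CorrespondentMerger merger = new CorrespondentMerger();
        merger.observe("Jane Doe", "jdoe@example.edu");
        merger.observe("Jane Doe", "jane@example.com");
        merger.observe("J. Doe", "jane@example.com");
        merger.observe("Sam Lee", "sam@example.org");
        // Two correspondents: Jane's two addresses together, Sam's on its own.
        System.out.println(merger.correspondents());
    }
}
```

Merging on exact name matches will over-merge common names, so a real tool would still need a manual review step of the kind the slide shows.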

Actions: do not transfer, restrict, reviewed

Processing Module

Disambiguation of names

Discovery & Delivery (Access)

Query generator

1.1 release - August 2015: New features
Upload of CSV files of addresses for matching with existing archive
Search by Date and Date Range
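The two 1.1 features can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not ePADD's code: the CSV is assumed to hold the address in its first column with no quoted commas, and `ArchiveFilters` and its method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helpers for address matching and date filtering.
public class ArchiveFilters {
    // Collect addresses from CSV text; the first column is assumed to hold the address.
    // Naive parsing: quoted fields containing commas are not handled.
    public static Set<String> addressesFromCsv(String csv) {
        Set<String> out = new TreeSet<>();
        for (String line : csv.split("\\R")) {
            String first = line.split(",", -1)[0].trim().toLowerCase(Locale.ROOT);
            if (first.contains("@")) out.add(first); // skips header rows and blank lines
        }
        return out;
    }

    // Uploaded addresses that also occur among the archive's correspondent addresses.
    public static Set<String> matchAgainstArchive(Set<String> uploaded, Set<String> archiveAddresses) {
        Set<String> matched = new TreeSet<>();
        for (String address : archiveAddresses) {
            String normalized = address.trim().toLowerCase(Locale.ROOT);
            if (uploaded.contains(normalized)) matched.add(normalized);
        }
        return matched;
    }

    // True when the sent date lies in [from, to], both ends inclusive.
    // A single-date search is the special case from == to.
    public static boolean inRange(LocalDate sent, LocalDate from, LocalDate to) {
        return !sent.isBefore(from) && !sent.isAfter(to);
    }
}
```

A caller would pass the uploaded file's text to `addressesFromCsv`, intersect the result with the archive's addresses via `matchAgainstArchive`, and apply `inRange` to each message's sent date to filter by period.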

Future Roadmap
Enhance Natural Language Processing Capability
Enhance the Processing Module Features
Enhance the Discovery/Delivery Module Features
Recommend and Test Preservation Strategy
Collaboration with other Platforms & Services
Explore Sustainability Model
Add Restriction Management/Annotation Functions
Enhance the Error Handling Capability

ts/epadd Glynn Edwards Peter Chan Josh Schneider d/collections