Advanced Decision Support for Archival Processing of Presidential E-Records: Results and Demonstration William Underwood, P.I. Georgia Tech Research Institute.

Presentation transcript:

Advanced Decision Support for Archival Processing of Presidential E-Records: Results and Demonstration
William Underwood, P.I.
Georgia Tech Research Institute, Atlanta, Georgia
This research was sponsored by the Army Research Laboratory and NARA under Army Research Office Cooperative Agreement W911NF (Sept 22, 2006-Sept 21, 2009).

Overview
- Document Type Recognition
- Metadata Extraction
- Item Description
- Speech Act Recognition
- Decision Support for Archival Review
- File Format Identification
- Demonstrations

Document Types, Metadata and Archival Description
- In responding to FOIA requests, archivists need to search collections of records with high precision and recall.
- At the time they respond to FOIA requests, however, archivists have not read all of the records, so they cannot index the records or search on attributes such as person, organization and location names, topics, dates, author and addressee names, and document types.
- Archivists cannot describe a collection until it has been manually read and reviewed. With increasing volumes of electronic records, it may be decades or even centuries before new acquisitions are described.
- Item descriptions are needed in the results of FOIA searches.

Method for Recognizing Document Types
1. Document Reader
2. English Tokenizer
3. Wordlist Lookup + enhanced wordlists
4. Sentence Splitter
5. Hepple POS Tagger + lexicon
6. Semantic Tagger + Named Entity Rules
7. Intellectual Element Annotator + Intellectual Element Rules (DER)
8. SUPPLE Parser/Interpreter + Document Type Grammars augmented with Semantics
9. Extract Metadata
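The nine steps above form a pipeline in which each stage adds annotations that later stages consume. The following is a minimal sketch of that chaining idea only. Every function here is a toy stand-in written for illustration: the regular expressions, the `elements` and `doctype` keys, and the "TO/FROM/SUBJECT" rule are assumptions, not the tokenizer, tagger, intellectual element rules, or SUPPLE grammars used in the project.

```python
import re
from typing import Callable, Dict, List

Doc = Dict[str, object]  # shared annotation store passed from stage to stage


def tokenize(doc: Doc) -> Doc:
    # Toy tokenizer: runs of word characters, or single punctuation marks.
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc


def split_sentences(doc: Doc) -> Doc:
    # Toy splitter: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["sentences"] = [p for p in parts if p]
    return doc


def annotate_elements(doc: Doc) -> Doc:
    # Hypothetical intellectual-element rules: labelled header lines of a memo.
    elements = {}
    for label in ("TO", "FROM", "SUBJECT", "DATE"):
        match = re.search(rf"^{label}:\s*(.+)$", doc["text"], flags=re.MULTILINE)
        if match:
            elements[label] = match.group(1).strip()
    doc["elements"] = elements
    return doc


def recognize_type(doc: Doc) -> Doc:
    # Toy stand-in for grammar-based parsing: a memo needs TO, FROM and SUBJECT.
    required = {"TO", "FROM", "SUBJECT"}
    doc["doctype"] = "Memorandum" if required <= set(doc["elements"]) else "Unknown"
    return doc


PIPELINE: List[Callable[[Doc], Doc]] = [
    tokenize, split_sentences, annotate_elements, recognize_type,
]


def run_pipeline(text: str) -> Doc:
    doc: Doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


sample = (
    "MEMORANDUM\n"
    "TO: SAM SKINNER\n"
    "FROM: EDE HOLIDAY\n"
    "SUBJECT: California Earthquake\n"
    "\n"
    "Please review. Thank you."
)
result = run_pipeline(sample)
print(result["doctype"], result["elements"])
```

Keeping each stage as a function over one shared dictionary makes it easy to swap a toy stage for a real component without touching the others.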

Documentary Form: Intellectual Element Recognition

Grammar for Documentary Form of a Memorandum

Parse Tree and Semantics of the Document

Extracted Metadata and Item Description in Manifest
DOCTYPE = White House Memorandum
DATE = April 27, 1992
AUTHOR = EDE HOLIDAY
ADDRESSEE = SAM SKINNER
TOPIC = California Earthquake
DESCRIPTION = Memorandum dated April 27, 1992 from EDE HOLIDAY to SAM SKINNER regarding California Earthquake
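The DESCRIPTION field above can be read as a template filled from the other manifest fields. Here is a small sketch of that idea. The function names `parse_manifest` and `item_description` are hypothetical. Using the last word of DOCTYPE as the generic form ("Memorandum") is an assumption inferred from this single example, not a documented rule.

```python
def parse_manifest(text: str) -> dict:
    """Parse 'KEY = value' lines into a dictionary (hypothetical helper)."""
    meta = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            meta[key.strip()] = value.strip()
    return meta


def item_description(meta: dict) -> str:
    """Fill a fixed description template from extracted metadata."""
    # Assumption: the generic documentary form is the last word of DOCTYPE.
    form = meta["DOCTYPE"].split()[-1]
    return (
        f"{form} dated {meta['DATE']} from {meta['AUTHOR']} "
        f"to {meta['ADDRESSEE']} regarding {meta['TOPIC']}"
    )


manifest = """\
DOCTYPE = White House Memorandum
DATE = April 27, 1992
AUTHOR = EDE HOLIDAY
ADDRESSEE = SAM SKINNER
TOPIC = California Earthquake
"""

print(item_description(parse_manifest(manifest)))
```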

Speech Acts and Record Description
Actions are a part of item descriptions:
- Signature Memorandum from Boyden Gray to the President recommending the nomination of Ronald B. Leighton to be a US District Judge.
- Letter from President Bush to President Mikhail Gorbachev suggesting an informal meeting.
- Memorandum from President Bush to Boyden Gray requesting an analysis of the War Powers Resolution.
- Letter from Susan Black to President Bush expressing appreciation for nomination and commitment to serve.

Speech Acts and Archival Review
- Archival review in response to FOIA requests requires recognition of the actions expressed in records.
- Presidential Records Act restriction on disclosure a(5), Confidential Advice: "confidential communications requesting or submitting advice, between the President and his advisors, or between his advisors"
- Example of an action expressing confidential advice: "I further recommend that the President look for opportunities to speak at an appropriate event indicating his knowledge of and interest in this issue, …"
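One way to picture how a recognized speech act could feed a review decision is a simple flagging rule. The sketch below is illustrative only. The role labels, the `ADVICE_ACTS` set, and the function `may_be_confidential_advice` are assumptions, and the rule is not the review logic of the Archival Review Assistant. It only flags candidates for an archivist to examine.

```python
from dataclasses import dataclass

# Hypothetical set of acts that request or submit advice.
ADVICE_ACTS = {"recommend", "advise", "suggest", "request_advice"}

# Hypothetical role labels for the parties named in the restriction.
INNER_CIRCLE = {"president", "advisor"}


@dataclass
class ActRecord:
    author_role: str      # "president", "advisor" or "other"
    addressee_role: str   # same label set
    act: str              # recognized speech act, e.g. "recommend"


def may_be_confidential_advice(record: ActRecord) -> bool:
    """Flag acts of advice exchanged between the President and advisors,
    or between advisors, as candidates for the a(5) restriction."""
    return (
        record.act in ADVICE_ACTS
        and record.author_role in INNER_CIRCLE
        and record.addressee_role in INNER_CIRCLE
    )


memo = ActRecord(author_role="advisor", addressee_role="president", act="recommend")
letter = ActRecord(author_role="other", addressee_role="president", act="thank")
print(may_be_confidential_advice(memo), may_be_confidential_advice(letter))
```

A flag of this kind is decision support only: whether the communication was in fact confidential still requires human judgment.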

Explicit & Implicit Speech Acts
- Every complete sentence carries out a speech act.
- Performative sentences express explicit speech acts. A performative verb is a verb whose action is accomplished merely by saying it or writing it. Example: "I recommend that you attend the conference."
- Declarative, imperative and interrogative sentences express implicit speech acts:
  Declarative (state): "You completed the report."
  Imperative (request): "Please, complete the report."
  Interrogative (ask): "Did you complete the report?"
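The distinction above can be illustrated with a toy classifier. This is a sketch under strong assumptions: the verb list is a small hypothetical sample, explicit acts are detected only in the pattern "I [adverb] VERB", and imperatives are detected only when they begin with "please". Real sentences need the parsing described elsewhere in this presentation.

```python
import re

# Hypothetical sample of performative verbs (not the project's lexicon).
PERFORMATIVE_VERBS = {"recommend", "request", "suggest", "advise", "promise", "thank"}

# "I", an optional adverb from a short assumed list, then the candidate verb.
PERFORMATIVE_PATTERN = re.compile(r"^I\s+(?:(?:hereby|further|also)\s+)?(\w+)")


def classify_speech_act(sentence: str) -> tuple:
    """Return (kind, act), where kind is 'explicit' or 'implicit'."""
    s = sentence.strip()
    match = PERFORMATIVE_PATTERN.match(s)
    if match and match.group(1).lower() in PERFORMATIVE_VERBS:
        return ("explicit", match.group(1).lower())
    if s.endswith("?"):
        return ("implicit", "ask")        # interrogative
    if s.lower().startswith("please"):
        return ("implicit", "request")    # imperative (toy test)
    return ("implicit", "state")          # declarative by default


for example in (
    "I recommend that you attend the conference.",
    "You completed the report.",
    "Please, complete the report.",
    "Did you complete the report?",
):
    print(classify_speech_act(example), example)
```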

A Method for Recognizing Speech Acts in E-Records
Input: Textual document and metadata from the manifest
1. Read author and addressee metadata from the manifest
2. Information extraction
3. Parse sentences in the document
4. Speech Act Transducer
   - Annotate explicit speech acts
   - Annotate implicit speech acts
   - Annotate speech acts indicated by text structure
   - Annotate indirect speech acts
   - Annotate the primary speech acts
Output: [document(e1), author(e1, S), addressee(e1, H), act(e1, F(P))]
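The output form above can be sketched as a small data structure that renders the four terms. The slide does not define the symbols, so this sketch reads S and H as author and addressee, F as the illocutionary force and P as the propositional content, following the usual F(P) notation of speech act theory. The class name `SpeechActRecord` and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechActRecord:
    event: str      # event identifier, e.g. "e1"
    author: str     # S: who performs the act
    addressee: str  # H: to whom the act is directed
    force: str      # F: illocutionary force, e.g. "recommend"
    content: str    # P: propositional content

    def to_terms(self) -> List[str]:
        """Render the record as the list of terms shown in the Output line."""
        return [
            f"document({self.event})",
            f"author({self.event}, {self.author})",
            f"addressee({self.event}, {self.addressee})",
            f"act({self.event}, {self.force}({self.content}))",
        ]


record = SpeechActRecord(
    event="e1",
    author="boyden_gray",
    addressee="president",
    force="recommend",
    content="nominate(ronald_b_leighton, us_district_judge)",
)
print("[" + ", ".join(record.to_terms()) + "]")
```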

Decision Support for Archival Review
- FOIA (and systematic) review of Presidential records for PRA and FOIA restrictions on disclosure requires page-by-page review of the records.
- Due to the increasing volume of records in all branches of Government, and especially the EOP, decision support is needed to assist archivists in review.

Potential Benefits of an Archival Review Assistant
- Reducing the risk of opening a document or passage of a record whose access should be restricted.
- A tutoring tool during training of review archivists.
- A tool that novice reviewers could use to check their work.
- Providing additional evidence where a reviewer's judgment was uncertain, or pointing out uncertainties where the reviewer thought the decision was certain.
- Supporting estimation of FOIA review workload in terms of the number and types of restrictions likely to apply.
- Supporting reviews of Federal records for FOIA exemptions.
- Extending the technology to support declassification of security-classified records.

Components of an Archival Review Assistant

File Format Identification
A capability to identify file formats is needed by ERA for:
- Ensuring compliance with the Record Transmittal Agreement
- Viewing/playing files
- Conversion to current or standard file formats
- Archive extraction
- Password recovery and decryption
- Repair of damaged files

Linux File Command & Magic File

Extensions of File Command and Magic File
- Magic for individual file formats
- Output of file command/magic file is File Format ID
- Rewriting file command code for identifying characteristics of text files and document types
- Defined approx. 800 file format signatures
- Collected examples of approx. 500 of the file format types
- Created File Signature Database
- Verified that File Format Identifier with magic file correctly identifies approx. 500 file types
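Signature-based identification, the idea behind the file command and its magic file, can be sketched in a few lines. The table below holds a handful of widely documented leading-byte signatures and stands in for the much larger File Signature Database described above. The names `SIGNATURES`, `identify_bytes` and `identify_file` are hypothetical. This sketch checks only bytes at offset 0, whereas real magic entries can also test other offsets and values.

```python
from typing import List, Tuple

# (leading bytes, format name): a tiny illustrative signature table.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive"),
]


def identify_bytes(data: bytes) -> str:
    """Return the first format whose signature matches the start of data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Identify a file by reading only its first few bytes."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))


print(identify_bytes(b"%PDF-1.4\n%..."))
print(identify_bytes(b"plain text, no signature"))
```

Note that container formats share signatures: many office document formats are ZIP archives, so a match on `PK\x03\x04` identifies the container rather than the document type inside it.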

Demonstrations
1. Document Type Recognition, Metadata Extraction & Item Description
2. Automatic Recognition and Interpretation of Performative Sentences
3. Decision Support for Archival Review
4. File Format Library & File Format Identifier

Additional Information
1. W. Underwood et al. Advanced Decision Support for Archival Processing of Presidential E-records, TR ITTL/CSITD 09-01, Georgia Tech Research Institute, Sept
2. W. Underwood & S. Laib. Automatic Recognition of Documentary Forms, Technical Report ITTL/CSITD 08-02, GTRI, May
3. W. Underwood. Recognizing Speech Acts in Presidential E-records, TR ITTL/CDITD 08-03, GTRI, Oct 2008