
Manuscript Transcription Assistant Initiative


1 Manuscript Transcription Assistant Initiative
Shaun Calhoun Anthony Tiscia Terrence Turner

2 History of Project
- Transcribers of ancient manuscripts have no custom application for preserving manuscripts and maintaining their formatting.
- 60 km of rotting manuscripts exist in Venice alone.

3 Previous Project Work
Completed Goals
- Preserved manuscript formatting.
- Stored metadata information along with the transcription.
- Created an initial transcription application.
- Established a framework for future projects.
Current Goals
- Automatic box creation around words.
- Improve the previous project's functionality.
- A system to encourage collaboration on manuscript transcription.

4 System Overview
Definitions
- Repository (Blue): Location of transcription and manuscript files, maintained by an archivist (e.g., AAS, Univ. of Madrid).
- Software (Yellow): Used by transcribers to create new transcriptions.
- Submission Incentive (Green): Interface between the Software and the Repository.

5 Current Project Work
- Server and Manuscript Repository
- Website and Submission Incentive
- Stand-Alone Application
[Diagram labels: MTA, MML, XPG, USER RANK, $$]

6 Manuscript Transcription Software
- The previous version did not automatically generate boxes around words and offered minimal functionality; in many ways, a word processor was still easier to use.
- Current work seeks to give the user an interface that makes transcribing easier.

7

8 Manuscript Software Improvements
- The application now automatically generates boxes around words, greatly reducing the time needed to complete a transcription.
- It uses a modified block segmentation:
  - The image is binarized, separating text from background.
  - Smoothing is applied to ease the search for words.
  - Connected components are located and boxes are placed around them.
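The slides name the segmentation steps but not the exact algorithm. The sketch below assumes the "smoothing" is a run-length smear in the style of RLSA: binarize with a fixed threshold, fill short background gaps vertically and then horizontally so neighboring letters merge into one blob, then box each 4-connected component. The threshold, gap sizes, and function names are illustrative assumptions, not the MTA's actual parameters.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def smear_rows(bw, max_gap):
    """Run-length smoothing along rows: fill background runs of at most
    max_gap pixels that lie between two ink pixels."""
    out = [row[:] for row in bw]
    for row in out:
        last_ink = None
        for x, px in enumerate(row):
            if px:
                if last_ink is not None and x - last_ink - 1 <= max_gap:
                    for gx in range(last_ink + 1, x):
                        row[gx] = 1
                last_ink = x
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def smear_cols(bw, max_gap):
    """Vertical smear: the same smoothing applied down each column."""
    return transpose(smear_rows(transpose(bw), max_gap))


def bounding_boxes(bw):
    """Label 4-connected ink components with BFS; return (top, left, bottom, right) boxes."""
    h = len(bw)
    w = len(bw[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not bw[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and bw[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    # Tiny two-row page: two "letters" at columns 1 and 3, a second word at column 8.
    page = [[255, 0, 255, 0, 255, 255, 255, 255, 0, 255] for _ in range(2)]
    bw = binarize(page)
    print(bounding_boxes(bw))                                # [(0, 1, 1, 1), (0, 3, 1, 3), (0, 8, 1, 8)]
    print(bounding_boxes(smear_rows(smear_cols(bw, 1), 2)))  # [(0, 1, 1, 3), (0, 8, 1, 8)]
```

Without smoothing every letter gets its own box; after the smear, the gap inside the word is filled while the wider gap between words survives, so one box per word remains. Choosing `max_gap` between the typical letter gap and the typical word gap is what makes this work.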

9 [Figure panels: Image Converted to B/W; Vertical Smear; Final Smear]

10

11 Boxing Limitations
- Has problems separating text from background in certain images.
- Extreme skew prevents words from being boxed properly.
- Skew detection is not yet implemented.
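Skew detection is listed as missing. One common technique, not something the slides describe, is a projection-profile search: try candidate angles, de-rotate the ink pixels, and keep the angle whose row histogram is most sharply peaked. This is a minimal sketch; the angle range, step size, and sum-of-squares score are assumptions.

```python
import math


def estimate_skew(bw, max_deg=10.0, step_deg=0.5):
    """Projection-profile skew estimate for a binary image (1 = ink).

    Returns the candidate angle in degrees whose de-rotated row histogram is
    most sharply peaked; positive means lines descend to the right (y grows down).
    """
    ink = [(x, y) for y, row in enumerate(bw) for x, px in enumerate(row) if px]
    steps = int(round(max_deg / step_deg))
    best_angle, best_score = 0.0, -1
    for i in range(-steps, steps + 1):
        angle = i * step_deg
        s = math.sin(math.radians(angle))
        c = math.cos(math.radians(angle))
        profile = {}
        for x, y in ink:
            # Row index of this pixel after rotating the page by -angle.
            r = round(y * c - x * s)
            profile[r] = profile.get(r, 0) + 1
        # Sum of squared counts is largest when the ink piles into few rows.
        score = sum(n * n for n in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The estimated angle could then be used to rotate the image (or the boxes) before segmentation; very large skews would need a wider `max_deg`.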

12 More Improvements
- Interactive cursor similar to a word processor's.
- Able to select and format multiple bounding boxes at the same time.
- Cut, copy, paste, etc.
- Image zooming for a better view of the text.
- Attempts to preserve document layout by fitting text to boxes.
- User interface improvements.
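How the application fits text to boxes is not specified. A minimal sketch, assuming each box holds one word and that glyph width is a fixed fraction of font size (a hypothetical monospace approximation), picks the largest integer size that fits both the box's width and its height:

```python
import math


def fit_font_size(word, box_width, box_height, char_width_ratio=0.5,
                  min_size=4, max_size=72):
    """Largest integer font size (in pixels) at which `word` fits in its box.

    Assumes every glyph is char_width_ratio * size wide and `size` tall,
    a crude approximation; a real renderer would measure the glyphs.
    The result is clamped to [min_size, max_size], so tiny boxes may overflow.
    """
    by_width = box_width / (len(word) * char_width_ratio) if word else math.inf
    size = math.floor(min(by_width, box_height))
    return max(min_size, min(max_size, size))
```

For example, a 10-letter word in a 100 x 40 pixel box is limited by width (size 20), while a 4-letter word in a 100 x 30 box is limited by height (size 30).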

13 Website and Incentives
- Acts as the interface between the Repository and the end-user software.
- Provides incentives for end users to download the software and participate in the transcription process.

14 Website Subsystems
- Account Subsystem: Manages creation, retrieval, updating, and deletion of user accounts.
- Moderation Subsystem: Manages the relevance of transcriptions based on explicit and implicit metrics.
- File Management Subsystem: Manages transcription and manuscript files and their metadata.
- Data Subsystem: Manages the databases of all other subsystems.
[Diagram labels: Account, Moderation, SQL DB, Data, Repository, File Management]

15 Account Management
- The account subsystem is a two-tiered distributed system.
- Server-side scripts interface with the user for display and with the data subsystem for account queries.
[Diagram: End-User with Web Browser; Server (Query, Display Page); Server-Side Processes (Queries DB, Return Result); SQL Database Module]

16 User Login Example
Case Study #1: Perfect Day
- The user comes to the transcription website and decides to log in.
- The user enters login data and submits it to the website's back end.
- The website's back end queries the SQL database for the user's information and preferences.
- The SQL database returns the preferences to the website's back end.
- The website's back end dynamically writes the web page to the user's browser and writes a cookie to the user's computer.
[Diagram: User with Web Browser; Website Back-end; SQL Database]
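The back-end side of this login flow could be sketched with Python's standard `sqlite3`, `hashlib`, and `http.cookies` modules. The table layout, the salted PBKDF2 password hashing, and the session-token cookie are assumptions; the slides do not say how credentials are stored, and this sketch does not address the missing secure connection (that would require HTTPS).

```python
import hashlib
import hmac
import os
import secrets
import sqlite3
from html import escape
from http.cookies import SimpleCookie

ITERATIONS = 100_000  # PBKDF2 work factor (assumed value)


def hash_password(password, salt):
    """Derive a salted hash so plaintext passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_schema(db):
    db.execute("CREATE TABLE accounts ("
               "username TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB, prefs TEXT)")


def register(db, username, password, prefs=""):
    salt = os.urandom(16)
    db.execute("INSERT INTO accounts VALUES (?, ?, ?, ?)",
               (username, salt, hash_password(password, salt), prefs))


def login(db, username, password):
    """Query the account, check the password, then build the page and cookie.
    Returns (page_html, set_cookie_header), or None if login fails."""
    row = db.execute("SELECT salt, pw_hash, prefs FROM accounts WHERE username = ?",
                     (username,)).fetchone()
    if row is None:
        return None
    salt, stored_hash, prefs = row
    if not hmac.compare_digest(stored_hash, hash_password(password, salt)):
        return None
    cookie = SimpleCookie()
    cookie["session"] = secrets.token_hex(16)  # not stored server-side in this sketch
    cookie["session"]["httponly"] = True
    page = f"<p>Welcome, {escape(username)}. Preferences: {escape(prefs)}</p>"
    return page, cookie.output(header="Set-Cookie:")
```

Because the token is generated but never recorded on the server, this sketch, like the system described, would not yet maintain state between pages; a sessions table keyed by the token would be the natural next step.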

17 Moderation Subsystem
- Provides accuracy and relevance management in an emergent environment.
- High-ranking transcriptions become a “gold standard” through consistent ratings.
- Users who provide exceptional transcriptions can access more of the Repository's completed transcriptions.

18 Moderation Example
Case Study #1: Perfect Day
- The user retrieves and reads a transcription page.
- The user assigns a weight to the article and submits it to the website's back end.
- The back end interfaces with the SQL database.
- The database returns the result of the function.
- The back end returns the user to the previous page.
[Diagram: User with Web Browser; Website Back-end; SQL Database]
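A sketch of this rating round trip, assuming a hypothetical `ratings` table keyed by transcription and user. The score is the arithmetic mean of the moderation points, as the slides state; the "gold standard" test (a minimum number of ratings, a minimum mean, and a small population standard deviation standing in for "consistent ratings") uses thresholds that are assumptions.

```python
import sqlite3
import statistics


def create_schema(db):
    # One row per (transcription, user): re-rating replaces the earlier weight.
    db.execute("CREATE TABLE ratings (transcription_id INTEGER, username TEXT, "
               "points INTEGER, PRIMARY KEY (transcription_id, username))")


def rate(db, transcription_id, username, points):
    """Store the user's weight and return the transcription's updated mean score."""
    db.execute("INSERT OR REPLACE INTO ratings VALUES (?, ?, ?)",
               (transcription_id, username, points))
    (mean,) = db.execute("SELECT AVG(points) FROM ratings WHERE transcription_id = ?",
                         (transcription_id,)).fetchone()
    return mean


def is_gold_standard(db, transcription_id, min_ratings=5, min_mean=4.0, max_spread=0.5):
    """Hypothetical rule: enough ratings, a high mean, and low-spread (consistent) scores."""
    points = [p for (p,) in db.execute(
        "SELECT points FROM ratings WHERE transcription_id = ?", (transcription_id,))]
    if len(points) < min_ratings:
        return False
    return statistics.mean(points) >= min_mean and statistics.pstdev(points) <= max_spread
```

Keying ratings by user means a repeat rating replaces the earlier one, which blocks one simple form of ballot stuffing but not others, such as multiple accounts.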

19 Account and Moderation Limitations
- The account subsystem does not provide a secure connection for logging in to the service.
- The account subsystem does not maintain state between pages.
- The moderation subsystem cannot handle “cheating” when transcripts are moderated.
- The moderation equation is simply the arithmetic mean of the moderation points.

20 File Management System
- Interface between the system and the hard disk.
- Main purpose: security.
- Requirement for the collaboration system: gain permission to access documents in the archive by first submitting your own work.
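Since this layer's main purpose is security, a minimal sketch of its gatekeeping might combine the submit-first rule with a check that a requested path cannot escape the archive directory. The `FileManager` class, the submission-count dictionary, and the one-submission threshold are hypothetical.

```python
from pathlib import Path


class FileManager:
    """Hypothetical gatekeeper between the website and the archive directory."""

    def __init__(self, root, submissions, required_submissions=1):
        self.root = Path(root).resolve()
        self.submissions = submissions  # username -> number of accepted submissions
        self.required = required_submissions

    def open_document(self, username, relative_name):
        # Collaboration rule: contribute a transcription before reading the archive.
        if self.submissions.get(username, 0) < self.required:
            raise PermissionError(f"{username} must submit a transcription first")
        # Resolve symlinks and '..' so a request cannot escape the archive root.
        path = (self.root / relative_name).resolve()
        if not path.is_relative_to(self.root):  # Python 3.9+
            raise PermissionError("requested path is outside the archive")
        return path.read_text(encoding="utf-8")
```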

21 Documents
- Each document has “metadata”: information used to describe it, e.g., author, title, date, subject, language.
- MTA software has its own format:
  - A combination of two popular formats.
  - Compatible with both formats.

22 Metadata Database
- SQL database.
- Stores the most important metadata of every document in the archive.
- Metadata is added automatically upon submission.
- Purpose: view document information without accessing the hard disk, and quick, easy searching.

23 Searching
- The only reasonable means of locating documents.
- Searches run over the metadata elements in the database, e.g., a user searches for all manuscripts written by a certain author in a specific time period.
- Much faster than searching through the hard disk.
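The author-and-period search can be expressed as one indexed query against the metadata table. A minimal sketch with `sqlite3`, assuming hypothetical column names and dates stored as integer years:

```python
import sqlite3


def create_schema(db):
    db.execute("CREATE TABLE metadata (doc_id TEXT PRIMARY KEY, title TEXT, "
               "author TEXT, year INTEGER, subject TEXT, language TEXT)")
    # Composite index so author + period lookups avoid a full table scan.
    db.execute("CREATE INDEX idx_author_year ON metadata (author, year)")


def search_by_author(db, author, start_year, end_year):
    """All documents by `author` dated within [start_year, end_year], oldest first."""
    return db.execute(
        "SELECT doc_id, title, year FROM metadata "
        "WHERE author = ? AND year BETWEEN ? AND ? ORDER BY year",
        (author, start_year, end_year)).fetchall()
```

Only the small metadata rows are read; the transcription files themselves stay on disk until a user opens one.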

24 Transcription Submission Example
- The user submits a transcription to the archive via the software or the website.
- The file management system extracts the metadata and adds it to the metadata database.
- The file management system adds the transcription to the repository.
- The file management system adds credit to the user's account.
[Diagram: MTA software; File Management; Repository; MD Database; Account System]
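The four submission steps could be sketched as a single database transaction, so the metadata row and the user's credit are committed together or not at all. The MTA file format is not specified here, so `extract_metadata` parses a hypothetical "Key: value" header as a stand-in; the table and function names are also assumptions.

```python
import sqlite3
from pathlib import Path


def create_schema(db):
    db.execute("CREATE TABLE metadata (doc_id TEXT PRIMARY KEY, title TEXT, "
               "author TEXT, year INTEGER)")
    db.execute("CREATE TABLE accounts (username TEXT PRIMARY KEY, "
               "credits INTEGER NOT NULL DEFAULT 0)")


def extract_metadata(text):
    """Hypothetical stand-in for the MTA format: 'Key: value' lines up to the first blank line."""
    meta = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, _, value = line.partition(":")
        meta[key.strip().lower()] = value.strip()
    return meta


def submit(db, repo_dir, username, doc_id, text, credit=1):
    """Extract metadata, index it, store the file, and credit the user."""
    if not doc_id.isalnum():
        raise ValueError("doc_id must be alphanumeric")  # keeps the filename inside repo_dir
    meta = extract_metadata(text)
    year = int(meta["year"]) if "year" in meta else None
    # Database changes commit on success and roll back if any step raises
    # (the file write itself is not undone).
    with db:
        db.execute("INSERT INTO metadata VALUES (?, ?, ?, ?)",
                   (doc_id, meta.get("title"), meta.get("author"), year))
        (Path(repo_dir) / f"{doc_id}.txt").write_text(text, encoding="utf-8")
        cur = db.execute("UPDATE accounts SET credits = credits + ? WHERE username = ?",
                         (credit, username))
        if cur.rowcount != 1:
            raise LookupError(f"unknown user {username!r}")
    return meta
```

A flat one-credit-per-submission rule like this is about as simple as a credit algorithm gets; weighting credit by moderation results would be one way to tie it to quality.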

25 Archive Limitations
- Lacks security.
- Search GUI is confusing.
- Minimal display.
- “Credit” algorithm is simplistic.
- Lacks a means of measuring quality.

26 Conclusions
- Developed the framework for the collaboration system defined in previous project work.
- The MML file specification is required for significant future progress.
- Individual subsystems in the MTA application and the server need refinement.

27 Questions?
Special thanks to: Tom Knoles, Prof. Stanley Selkow, Prof. Fabio Carrera

