National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Discovery of Relationships between 2D Engineering Drawings and.

Presentation transcript:

National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
Discovery of Relationships between 2D Engineering Drawings and 3D CAD Models
Presented by: Peter Bajcsy
- Research Scientist at NCSA
- Associate Director of I-CHASS, I3 Institute
- Adjunct Assistant Professor, CS & ECE, UIUC

Acknowledgement
This research was partially supported by a National Archives and Records Administration (NARA) supplement to NSF PACI cooperative agreement CA #SCI , ONR TRECC, and NCSA Industrial Partners. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Archives and Records Administration, or the U.S. government.
Contributions by: Peter Bajcsy, Kenton McHenry, Rob Kooper, Michal Ondrejcek, Jason Kastner and Luigi Marini
Imaginations unbound

Outline
- Introduction
- Previous Related Work to Relationship Discovery
- A Discovery of Relationships Among 2D and 3D Digital File Collections (file2learn)
- Summary

Introduction

Discovering Relationships: Introduction
Problem statement: How should one establish relationships among electronic records coming
- from disparate sources,
- before and after processing,
- from the same source at multiple time instances?
Note: Sources = digital information providers (reports from agencies, complex measurement instruments, simulations by computers, Twitter, …)

Discovering Relationships: Introduction
- File content changes over time and space
  - Motivation: need for relating new files to existing files
  - Applications: change detection, search, appraisal, social networking, …
- Additional cost of redundant bytes
  - Motivation: need to minimize the cost
  - Applications: content repository management, processing, preservation, …

Discovering Relationships: Challenges
- How to ingest content coming in many file formats and many data representations?
- Which files are of interest to me?
- How to describe file content?
- How to extract descriptors of file content?
- How to represent and store file content descriptors?
- What methods and metrics to use for comparing descriptors of two files?
- How to support end applications with comparisons/relationships?
Overarching requirements: automation and scalability

Previous Related Work to Relationship Discovery
- Open source software
- Web services for communities

A Stack of Prototype Solutions
- Conversion Engine (Polyglot)
- Conversion Software Registry (CSR)
- Content Management (Medici)
- 3D-2D Relationship Discovery (File2Learn)
- Software Reuse (AutoScript)
- Document Appraisal (Doc2Learn)
- Image Provenance Evaluation (IP2Learn)
- Image Understanding (Im2Learn)
- Geospatial Image Understanding (GeoLearn, SP2Learn)
- Comparison (Versus)
- Distributed Computation (Cyberintegrator)
- 3D File Comparison (Model Browser)
Other diagram labels: Closed Source Software, Electronic Records, Hardware, Computational Scalability Testbed

Conversion Software Registry (CSR) Conversion Engine (Polyglot) How to ingest content coming in many file formats and many data representations?

Conversion Software Registry

File Format Conversion Engine: Polyglot

Which files are of interest to me? Multimedia content management (Medici)

3D Comparison Example (ModelBrowser)
- Software: Adobe 3D Reviewer
- Original file: WRL
- Converted files: STP, STL, IGS, U3D
- Comparison method: Light Fields [Chen, 2003], which compares silhouettes from various viewing angles around the objects
- Files shown: heart.wrl, heart.stp, heart.stl
- Conclusion: Information loss (WRL → STP) = Information loss (WRL → STL) > 0

Comparison of Complex Files Content Based Comparison of Files in the Same Format: Given hundreds of versions of the ‘same’ Adobe PDF file, which file version(s) are similar?

Multiple Object Comparisons (Doc2Learn) Adobe PDF documents ~ {text, images, vector graphics, ….}

What methods and metrics to use for comparing descriptors of two files?

Comparison Framework: Versus

Information Loss Evaluation: Computational Requirements
- Files: one file in STP file format
- Software: Adobe 3D Reviewer, Cyberware PlyTool
- Comparison method: Light Fields [Chen, 2003]
- Number of closed paths: 10 (28 individual conversions)
- Phase I: Find
- Phase II: Execute
- Phase III: Compare
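
One way to picture Phase I (Find) is as a search for closed conversion paths: chains of format conversions that start from the original format and return to it, so that the round-tripped file can be compared with the original in Phase III. The sketch below rests on that reading of "closed path", which the slide does not spell out. The conversion graph, the `closed_paths` helper, and the `max_len` limit are all hypothetical and do not describe the Polyglot or Versus implementations.

```python
def closed_paths(graph, start, max_len=3):
    """Enumerate simple closed paths start -> ... -> start.

    graph maps a format to the formats it can be converted to.
    max_len bounds the number of individual conversions per path.
    """
    found = []

    def walk(node, path):
        for nxt in graph.get(node, ()):
            if nxt == start and len(path) >= 2:
                # Closing the loop: path already holds at least one intermediate format.
                found.append(path + [start])
            elif nxt not in path and len(path) < max_len:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return found


# Hypothetical conversion graph; real converters support different format pairs.
conversions = {
    "stp": ["stl", "igs"],
    "stl": ["stp"],
    "igs": ["stp", "stl"],
}

paths = closed_paths(conversions, "stp")
total_conversions = sum(len(p) - 1 for p in paths)
for p in paths:
    print(" -> ".join(p))
print(len(paths), "closed paths,", total_conversions, "individual conversions")
```

Each closed path would then be executed (Phase II) and its end file compared with the original (Phase III), so the cost grows with the total number of individual conversions rather than with the number of paths alone.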

A Discovery of Relationships Among 2D and 3D Digital File Collections

Relationships Among 2D and 3D Data Types
- Example data: Torpedo Weapon Retriever, existing 2D image drawings and N>22 3D CAD models
- Problem: How to establish relationships among the 3D CAD models and 2D image drawings during a product lifecycle?
- Figure: Hypothetical distribution of 3D CAD models for TWR 841

Methodology
- File Identification
- Information Extraction from
  - File System
  - File Content
- Information Organization
  - Taxonomy (classification)
  - Ontology (relationships)
- Information Representation, Integration and Storage
  - XML
  - RDF
- Relationship Discovery

File Identification and File System Analyses
- File identification
  - What is the file format? Is the file format well formed?
  - Approach: use DROID, built on top of the PRONOM File Registry, with additional NCSA support for 3D file identification
- Metadata extraction about a file system
  - Where is the file located? What is the file size, time stamp, etc.?
  - Approach: use any file system information extraction software, such as Aperture (cross-platform, open source, actively developed), Google Desktop, or OS-specific solutions (e.g., Apple Spotlight, Linux, MS Search)
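
The two questions on this slide (what is the format, and what does the file system know about the file) can be sketched with the Python standard library alone. This is a stand-in for illustration, not DROID or Aperture: the signature table is a tiny hand-picked subset rather than the PRONOM registry, and the helper names `identify_format` and `file_system_metadata` are hypothetical.

```python
import os
from datetime import datetime, timezone

# Leading byte signatures for a few formats mentioned in the slides.
# Illustrative subset only; a registry such as PRONOM covers far more.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),        # little-endian TIFF
    (b"MM\x00*", "TIFF"),        # big-endian TIFF
    (b"%PDF-", "PDF"),
    (b"ISO-10303-21;", "STEP"),  # STEP Part 21 exchange file
]


def identify_format(path):
    """Return a format label by matching the first bytes of the file, or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def file_system_metadata(path):
    """Collect location, size, modification time, and detected format for one file."""
    st = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": st.st_size,
        "modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "format": identify_format(path),
    }
```

Calling `file_system_metadata` on every file found by `os.walk` over a collection yields one descriptor record per file. Signature matching only answers the identification question; checking that a file is well formed requires a format-specific parser.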

Content Analyses: Automation?
[Diagram labels: OCR; File Descriptors (part name, author, software, date, …); Relationship Discovery]

Content Analyses: Optical Character Recognition (OCR) of 2D Drawings
[Figure labels: Reference Block; Title Block; MMC Block (Marinette Marine Corporation)]

‘Standard’ Title Blocks: Organization and Ontology Examples of title blocks used on drawings prepared by Naval Construction Battalion and Naval Construction Regiment TEMPLATES

Title Block: Ontology and Metadata Representation
Ontology for sub-fields:
- A – Record of preparation ( )
- B – Drawing title ( )
- C – Preparing Activity
- F – Code identification number ( )
- G – Drawing size ( )
- H – Drawing number ( )
- J – Scale ( )
- K – Specification number ( )
- L – Sheet number ( )
Resource Description Framework (RDF) metadata representation: subject – predicate – object
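
The subject, predicate, object form can be illustrated by turning a few title block sub-fields into triples and printing them as N-Triples lines. This is a minimal sketch in plain Python: the namespace, the predicate names, and the field values are all made up for illustration, since the slides do not specify the vocabulary used by File2Learn.

```python
# Hypothetical namespace and predicate names; the slides do not specify a vocabulary.
BASE = "http://example.org/titleblock#"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal.

    Newlines and other control characters are not handled in this sketch.
    """
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def title_block_triples(drawing_uri, fields):
    """Return one (subject, predicate, object) triple per title block sub-field."""
    return [(drawing_uri, BASE + name, value) for name, value in fields.items()]


def to_ntriples(triples):
    """Serialize triples whose objects are plain literals as N-Triples lines."""
    return [f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in triples]


# Illustrative values only; not taken from an actual drawing.
fields = {
    "drawingTitle": "EXAMPLE BULKHEAD",
    "drawingNumber": "0000000",
    "scale": "1:10",
    "sheetNumber": "1",
}
triples = title_block_triples("http://example.org/drawing/0000000", fields)
for line in to_ntriples(triples):
    print(line)
```

Here the drawing is the subject, each sub-field is a predicate, and the OCR result is the object, so descriptors from different files can be stored and compared predicate by predicate.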

MMC and Reference Blocks: Organization
MMC block inconsistencies:
- The list varies in length
- The notation is not standardized

Summary of OCR Based Analyses
- Manually encoded block coordinates for 784 files in PNG format (converted from the original LZW-compressed TIFF files)
- Automated OCR and executed it on 700 title blocks, 150 reference blocks, a dozen revisions and lists of materials, and about 200 additional areas with drawing numbers (MMC DWG. NO.)
- Performance benchmark: full OCR of the title (TB), MMC, and reference (RF) blocks for about 50 image files (105 blocks) took about 6 hours on a quad-core machine

Content Based Extraction from STEP Files
3D CAD models in STEP file format are searched for any ASCII strings that match an English dictionary and follow the STEP metadata specification.

Example metadata for TWR841 ship deck

STEP metadata specification:
FILE_DESCRIPTION( /* description */ (''), /* implementation_level */ '2;1');
FILE_NAME( /* name */ '', /* time_stamp */ '', /* author */ (''), /* organization */ (''), /* preprocessor_version */ ' ', /* originating_system */ '', /* authorization */ ' ');

Expected STEP metadata:
FILE_DESCRIPTION((''), /* implementation_level */ '2;1');
FILE_NAME( '120 TORPEDO WEAPONS RETRIEVER, TRANSVERSE BULKHEADS BELOW, MAIN DECK', ' ', ('LDOBSON'), ('NAVAL SEA SYSTEMS COMMAND'), ' ', 'IDA-STEP', ' ');

Parsed STEP metadata:
FILE_DESCRIPTION((''), '2;1');
FILE_NAME( 'D:\\NARA\\Archieve_data_samples\\BHD_FR12\\ U2110_BHD12_2007_05_09.stp', ' T13:45:37', ('rakowpj'), (''), 'Autodesk Inventor 11', '');
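
The FILE_NAME header entity shown above lends itself to a small parser. The sketch below is one possible approach, not the File2Learn extractor: it strips `/* ... */` comments, isolates the FILE_NAME argument list, and maps the quoted strings onto the field names from the specification. It assumes single-valued author and organization lists and no `);` sequence inside a string. The `parse_file_name` helper and the sample header are hypothetical.

```python
import re

# Field order of the FILE_NAME header entity, as listed in the specification above.
FILE_NAME_FIELDS = [
    "name", "time_stamp", "author", "organization",
    "preprocessor_version", "originating_system", "authorization",
]


def parse_file_name(step_text):
    """Extract the FILE_NAME header entity of a STEP file as a dict of strings."""
    # Drop /* ... */ comments (would also drop comment-like text inside strings).
    no_comments = re.sub(r"/\*.*?\*/", "", step_text, flags=re.DOTALL)
    match = re.search(r"FILE_NAME\s*\((.*?)\)\s*;", no_comments, flags=re.DOTALL)
    if match is None:
        return {}
    # Quoted strings; a doubled apostrophe inside a string stands for one apostrophe.
    values = re.findall(r"'((?:[^']|'')*)'", match.group(1))
    values = [v.replace("''", "'").strip() for v in values]
    return dict(zip(FILE_NAME_FIELDS, values))


# Hypothetical header written for this example; not one of the TWR 841 files.
SAMPLE = """ISO-10303-21;
HEADER;
FILE_DESCRIPTION((''), '2;1');
FILE_NAME(
  /* name */ 'example_part.stp',
  /* time_stamp */ '2000-01-01T00:00:00',
  /* author */ ('jdoe'),
  /* organization */ ('Example Org'),
  /* preprocessor_version */ 'Example CAD 1.0',
  /* originating_system */ 'Example System',
  /* authorization */ '');
ENDSEC;
"""

metadata = parse_file_name(SAMPLE)
for field, value in metadata.items():
    print(f"{field}: {value!r}")
```

Empty strings in the result correspond to fields that the authoring software left blank, which matches the gap between expected and parsed metadata shown on the slide.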

Exploratory Framework – User Interface Overview
[Screenshot labels: Files; Filter for Files; Preview of Selected Data; Graph of Relationships Between Selected Files]

Exploratory Framework – User Interface Overview
[Screenshot labels: Table of Relationships Between Selected Files; Additional Import/Export and Preference Options]

Exploratory Framework: Modes of Operations
- Detection of discrepancies/anomalies in file descriptors
  - OCR results: view 2D drawings and OCR results, and then edit OCR descriptors
  - 3D model: view 3D model and content-based extraction, and then edit descriptors
- Comparison of pairs of files
  - Pairs of 2D drawings
  - Pairs of 3D models
  - Pairs of (2D drawing, 3D model)
- Establish file relationships
  - Insert logical links to relate a pair of files

Detection of Anomalies in OCR Results

Comparison of Files
Color encoding:
- Predicates and values match
- Predicates match
- Predicate occurs only in one file
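
The three categories in this color encoding map directly onto a comparison of two descriptor sets. The sketch below assumes each file's descriptors are held as a predicate-to-value dictionary. The category labels, the `compare_descriptors` helper, and the sample descriptors are hypothetical names chosen for illustration.

```python
def compare_descriptors(a, b):
    """Classify every predicate found in either descriptor set.

    'match'          : predicate present in both files with equal values
    'predicate_only' : predicate present in both files, values differ
    'single_file'    : predicate present in only one of the two files
    """
    result = {}
    for predicate in sorted(set(a) | set(b)):
        if predicate in a and predicate in b:
            result[predicate] = "match" if a[predicate] == b[predicate] else "predicate_only"
        else:
            result[predicate] = "single_file"
    return result


# Illustrative descriptors for a 2D drawing and a 3D model; values are made up.
drawing = {"partName": "BULKHEAD 12", "author": "jdoe", "scale": "1:10"}
model = {"partName": "BULKHEAD 12", "author": "asmith", "software": "Example CAD 1.0"}

for predicate, category in compare_descriptors(drawing, model).items():
    print(f"{predicate}: {category}")
```

Exact string equality is the simplest possible metric. OCR noise in the 2D descriptors would likely call for a tolerant comparison, such as normalized or approximate string matching.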

Establish File Relationships

Establish File Relationships: Logical Link

Summary
- In general, finding relationships is still an open problem
- Automation and computational scalability are the keys to keeping files current, dealing with the quantities, and finding information
- Preservation solutions
  - Forward-looking solutions to preservation are based on standards
  - Contemporary solutions to preservation are based on understanding historical electronic records

Software Summary
- File2Learn software: the work is still in progress
- The technologies are documented at
- Most of the software is available for downloading at
- Feedback is very welcome
- Questions: Peter Bajcsy –