1 On Embedding Machine-Processable Semantics into Documents Krishnaprasad Thirunarayan Department of Computer Science & Engineering Wright State University.

Presentation transcript:

1 On Embedding Machine-Processable Semantics into Documents Krishnaprasad Thirunarayan Department of Computer Science & Engineering Wright State University Dayton, OH-45435, USA

2 Talk Outline Background and Motivation (Why?) Goals (What?) Details (How?) Conclusions

3 Background and Motivation

4 Heterogeneous Doc.Spec. Defn. Rep. Content Extraction: Formalize doc, using controlled vocabulary

5 Problems with this approach to content extraction: Archiving the spec (for human comprehension) separately from its formalization is not conducive to traceability. Manual extraction from the spec (from scratch) for each use is labor intensive, time consuming, and prone to typographical errors.

6 Observation Conceptually, every piece of information in an extraction owes its existence to a phrase in spec, and possibly, controlled vocabulary. So, explore techniques to maintain correspondence between a spec fragment and its formalization.
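One way to picture the correspondence described in this observation is a record that stores each formalized fact together with the span of spec text that supports it. The following is only a minimal sketch: the class name `TracedFact`, its fields, and the sample sentence are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracedFact:
    """A formalized fact plus the span of spec text it was derived from."""
    predicate: str
    args: tuple
    start: int  # offset of the supporting phrase in the spec text
    end: int    # offset one past the end of the phrase

    def source_phrase(self, spec_text: str) -> str:
        """Return the spec fragment that justifies this fact."""
        return spec_text[self.start:self.end]


# Hypothetical spec sentence, invented for illustration only.
spec_text = "Sheet 0.50 mm and under shall have a tensile strength of 165 ksi."
phrase = "tensile strength of 165 ksi"
start = spec_text.index(phrase)
fact = TracedFact("tensileStrength", (0.50, 165), start, start + len(phrase))
```

Calling `fact.source_phrase(spec_text)` recovers the supporting phrase, so a reader of the formalization can always trace a fact back to the text it came from.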

7 Goal

8 General Problem: Embed domain-specific mark-up (annotations) into a human-sensible document to make explicit the semantics of “content” text and complex data, and to augment an interpretation in a modular fashion. Document text: human comprehensible. Semantic mark-up: machine processable.

9 Details (How?)

10 Nature of Specs: Semi-structured. Heterogeneous: text, tables, images. Constrained technical vocabulary. Available as MS Word document.

11 Pre-processing Spec: Abstract content from the spec document by removing display-oriented information: save text; save tabular data, preserving grid layout; retain links to images; … Note: the “Save As text” option in MS Word is inadequate.
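The pre-processing step can be sketched in Python, assuming the Word document has already been converted to a simple XML form. The element names (`doc`, `p`, `b`, `table`, `tr`, `td`, `img`) and the sample content are assumptions made for illustration; they are not the actual Majix output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML standing in for a converted spec document.
SAMPLE = """<doc>
  <p>Minimum <b>tensile</b> properties are listed below.</p>
  <table>
    <tr><td>Thickness (mm)</td><td>Tensile Strength (ksi)</td></tr>
    <tr><td>0.50 and under</td><td>165</td></tr>
  </table>
  <img src="figure1.png"/>
</doc>"""


def abstract_content(xml_text):
    """Split a document into plain text, table grids, and image links."""
    root = ET.fromstring(xml_text)
    text, tables, images = [], [], []
    for node in root:
        if node.tag == "p":
            # itertext() flattens inline display markup such as <b>.
            text.append(" ".join("".join(node.itertext()).split()))
        elif node.tag == "table":
            # Keep the grid layout as a list of rows, each a list of cells.
            grid = [["".join(cell.itertext()).strip() for cell in row.iter("td")]
                    for row in node.iter("tr")]
            tables.append(grid)
        elif node.tag == "img":
            images.append(node.get("src"))
    return text, tables, images


text, tables, images = abstract_content(SAMPLE)
```

Display markup is dropped from the text, each table keeps its rows and columns, and images survive only as links, which mirrors the three "save/retain" items on the slide.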

12 Heterogeneous Document

13 XML generated by Majix

14 ASCII Output

15 Annotating Pre-processed Spec: Embedding Machine-Processable Semantics. Recognizing and tagging text using controlled vocabulary (by-product of: Document Indexing and Semantic Search). Tagging tabular data to make its semantics explicit: same grid layout, but different interpretation and dependencies based on headings. Explore: the XML-based programming language Water for defining data and its behavior (semantics).

16 Locating Controlled Vocabulary Terms
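A minimal sketch of locating controlled-vocabulary terms and tagging them in place, assuming the vocabulary is a flat list of phrases. The `<term>` tag name, the vocabulary entries, and the sample sentence are illustrative assumptions, not the markup used in the talk.

```python
import re

# Hypothetical controlled vocabulary.
VOCABULARY = ["tensile strength", "yield strength", "thickness"]


def tag_terms(text, vocabulary=VOCABULARY):
    """Wrap every occurrence of a vocabulary term in <term>...</term>."""
    # Longest terms first so multi-word terms win over shorter overlaps.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "<term>" + m.group(0) + "</term>", text)


tagged = tag_terms("Tensile strength depends on thickness.")
```

The surrounding text is left untouched, so the tagged document stays readable while the terms become machine-locatable.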

17 Example Table Thickness (mm) Tensile Strength (ksi) Yield Strength (ksi) 0.50 and under – –

18 Example of Tagged Table Thickness (mm) Tensile Strength (ksi) Yield Strength (ksi) table and under table table table....

19 Example of Processing Code /> <set rows= table.rows. />/> …

20 (cont’d) ….1/> temp..0/> /> > table.rows..2 />

21 (cont’d) … fluid. <try /> TensileStrength > "TABLE: out of range error occurred"
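The behavior that the processing code attaches to the table (map a thickness to its strength values, or report an out-of-range error) can be approximated in Python. This is not the Water code from the slides. The three rows are the ones listed in the Prolog rendition later in the talk; treating the lower bound as exclusive and the upper bound as inclusive is an assumption based on the "0.50 and under" row label, and the names `ROWS`, `TableRangeError`, and `lookup` are hypothetical.

```python
# (lower bound, upper bound, tensile strength in ksi, yield strength in ksi)
ROWS = [
    (0.0, 0.50, 165, 155),
    (0.50, 1.00, 160, 150),
    (1.00, 1.50, 155, 145),
]


class TableRangeError(ValueError):
    """Raised when a thickness falls outside every table row."""


def lookup(thickness, rows=ROWS):
    """Return (tensile, yield) strengths for a thickness in mm."""
    for lower, upper, tensile, yield_strength in rows:
        # Assumed interval convention: lower < thickness <= upper.
        if lower < thickness <= upper:
            return tensile, yield_strength
    raise TableRangeError("TABLE: out of range error occurred")
```

For example, `lookup(0.6)` selects the second row, while a thickness beyond the last row raises the range error instead of returning a silent default.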

22 Water: XML-based OO scripting language. Facilitates creating Web services: run methods remotely via a web browser. Generalizes dynamic typing to constraint checking: conformance of actuals to formals.
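The idea of generalizing dynamic typing to constraint checking can be illustrated with a Python analogy in which each formal parameter carries a predicate that the actual argument must satisfy. This is a sketch of the concept only, not Water's mechanism or syntax; the decorator `constrained` and the function `describe` are hypothetical names.

```python
import functools
import inspect


def constrained(**constraints):
    """Check each named argument against a predicate before the call."""
    def decorate(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Match actual arguments to formal parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, predicate in constraints.items():
                if not predicate(bound.arguments[name]):
                    raise ValueError(f"argument {name!r} violates its constraint")
            return func(*args, **kwargs)
        return wrapper
    return decorate


# The constraint covers both the type and the admissible range.
@constrained(thickness=lambda t: isinstance(t, (int, float)) and 0 < t <= 1.5)
def describe(thickness):
    return f"thickness {thickness} mm"
```

A plain type check would accept any number; the constraint also rejects values outside the table's range, which is the sense in which conformance of actuals to formals goes beyond typing.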

23 Pros and Cons. Encoding improvement: the amount of tagging can be controlled by suitably delimiting table data and annotating it with the corresponding “string-processing” method. Master copy update: changes to the spec require manual modification of the archived annotated version. Irregular tables in specs: different units, etc.

24 Some Related Work Microsoft Smart Tags Recognize “controlled” words in Office 2003 documents and associate predefined list of actions with each occurrence SHOE Table data in a declarative (logic) language

25 Prolog rendition

% strengthTableRow(Lower, Upper, TensileStrength, YieldStrength)
strengthTableRow(0, 0.50, 165, 155).
strengthTableRow(0.50, 1.00, 160, 150).
strengthTableRow(1.00, 1.50, 155, 145).
% ....

% A row applies when Lower < Thickness =< Upper.
strengthTable(Thickness, TensileStrength, YieldStrength) :-
    strengthTableRow(L, U, TensileStrength, YieldStrength),
    L < Thickness, Thickness =< U.

thicknessToTensileStrength(Thickness, TensileStrength) :-
    strengthTable(Thickness, TensileStrength, _).

thicknessToYieldStrength(Thickness, YieldStrength) :-
    strengthTable(Thickness, _, YieldStrength).

?- thicknessToYieldStrength(0.6, YS).

26 Conclusions

27 A Step towards the Holy Grail: Ultimately, enable authoring and/or extracting the human-comprehensible and machine-processable parts of a document “hand in hand”, and keep them “side by side”.