TeX2Star A System for Converting TeX to OpenOffice By Jeffrey Starr.

Overview
● Why does conversion matter?
● Why has it not already been done?
  – Why is it difficult?
● Proposal: TeX->OpenOffice
● Proposal: TeX->DVI->OpenOffice
● Solution
● Unsolved problems

What is OpenOffice?
● Open Source office suite
● Based on StarOffice, currently owned by Sun Microsystems
● Cross-Platform
● XML based, standards driven
● Semantic-based format

What is TeX?
● Written by Donald E. Knuth
● Solution to declining standards in mathematical typography
● Heavily used in mathematics and physics
● Both a program and a programming language
● Presentation-based format

Why Bother to Convert?
● TeX is rare outside mathematical circles
● Conflicts with publishing software
● Does not fit within the current word processing model
● TeX's purpose is to produce journal-quality typography, not to facilitate editing of content.

Aside: Editable Output
● TeX has many presentation outputs:
  – DVI
  – PostScript
  – PDF
  – PNG
  – TIFF
  – Fax
● TeX has no direct editable outputs.

Solution: TeX->OpenOffice
● Why use the outputs? Read the original document.
● Perfect knowledge of content and (presentational) intent
● Write a program that reads TeX and outputs OpenOffice, instead of DVI

Problems with TeX->OpenOffice
● TeX is a large system
  – Eight years of development
  – Too large for a semester
● Irregular
● Non-Balanced
● Many special cases

TeX is Irregular
● An irregular language is one in which typical rules of processing are violated
● Irregular '\atop': (TeX)
  – {numerator \atop denominator}
● Regular '\frac': (LaTeX)
  – \frac{numerator}{denominator}
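
To illustrate why the infix form is harder to process, here is a minimal Python sketch. It is not part of TeX2Star; the function name and the tuple layout are hypothetical. A prefix command such as \frac announces itself before its arguments, whereas \atop is only discovered after the numerator has already been read, so the reader has to go back and regroup what it has collected.

```python
def regroup_atop(tokens):
    """Regroup the tokens of one brace group that may use infix \\atop.

    Returns ("atop", numerator_tokens, denominator_tokens) when the group
    contains \\atop; otherwise returns the token list unchanged.
    """
    seen = []  # material that has already been read
    for i, tok in enumerate(tokens):
        if tok == r"\atop":
            # Only now do we learn that everything read so far was the numerator.
            return ("atop", seen, tokens[i + 1:])
        seen.append(tok)
    return seen


group = ["numerator", r"\atop", "denominator"]
result = regroup_atop(group)
```

The sketch handles a single, flat group; nested groups would need the same treatment applied recursively.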

TeX is not balanced
● A language that is balanced will have an explicit beginning and end to each grouping
● Non-balanced font commands: (TeX)
  – \bf this is bold \rm this is normal, roman text
● Balanced font commands: (LaTeX)
  – \textbf{this is bold} this is back to normal

TeX has many special cases
● \par may either:
  – explicitly end a paragraph
  – do nothing (if in math mode)
  – do nothing (if in restricted horizontal mode)
  – tell TeX to build the current page
● \par is also irregular (it acts on material already processed, in the reverse direction) and unbalanced (it may or may not be preceded by \indent, a primitive to start a paragraph)

Solution: TeX->DVI->OpenOffice
● Let TeX deal with TeX
● Run TeX on the original text
● Read the resultant DVI output
● Process the DVI output to OpenOffice

Problem: Lack of semantic data
● DVI contains font definitions, text stream, and description of black boxes
● Fonts contain characters, but do not say what those characters are
  – Especially a problem with kerning and ligatures: “ff” (two characters) vs. “ﬀ” (a single ligature glyph)
  – Also a problem with bold and italic text: bold and italic are their own fonts
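
A small Python sketch of what recovering this information might involve. The lookup tables and the function name are assumptions for illustration, not TeX2Star's actual data: the ligature slots are those of the Computer Modern text fonts in the OT1 encoding, and the style table simply records that the style is implied by which font is selected. All other character codes are treated as ASCII, which holds for letters and digits only.

```python
# Ligature slots in the Computer Modern text fonts (OT1 encoding).
LIGATURES = {0x0B: "ff", 0x0C: "fi", 0x0D: "fl", 0x0E: "ffi", 0x0F: "ffl"}

# Hypothetical font table: the style is a property of the font itself.
FONT_STYLES = {
    "cmr10": set(),          # roman
    "cmbx10": {"bold"},      # bold extended
    "cmti10": {"italic"},    # text italic
}


def decode_glyph(font_name, code):
    """Return (text, styles) for one character code set in the given font."""
    text = LIGATURES.get(code, chr(code))  # assumption: non-ligature codes are ASCII
    styles = FONT_STYLES.get(font_name, set())
    return text, styles


# The DVI stream for "offer" holds four codes, not five letters.
codes = [ord("o"), 0x0B, ord("e"), ord("r")]
word = "".join(decode_glyph("cmr10", c)[0] for c in codes)
```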

Solution: Add Annotations
● Use interpositioning and the TeX primitive '\special' to send extra information to the DVI file
● \special leaves comments that can be read later
● Reading the DVI with proper annotation allows the text to retain some level of semantic information
● Difference between knowing that the next character is smaller and raised versus knowing that the next character is a superscript
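
A minimal Python sketch of how annotations could be consumed on the reading side. Everything here is an assumption made for illustration: the "tex2star:begin ..." and "tex2star:end" payload strings are invented, and the input is taken to be a DVI stream that has already been decoded into simple (kind, value) pairs.

```python
def annotate(events):
    """Attach the innermost open annotation to each run of text.

    `events` is a hypothetical, already-decoded DVI stream: a list of
    ("special", payload) and ("text", string) pairs.
    """
    open_tags = []  # stack of annotations currently in effect
    out = []
    for kind, value in events:
        if kind == "special" and value.startswith("tex2star:begin "):
            open_tags.append(value.split(" ", 1)[1])
        elif kind == "special" and value == "tex2star:end":
            if open_tags:
                open_tags.pop()
        elif kind == "text":
            out.append((value, open_tags[-1] if open_tags else None))
    return out


events = [
    ("text", "x"),
    ("special", "tex2star:begin superscript"),
    ("text", "2"),
    ("special", "tex2star:end"),
    ("text", " + y"),
]
annotated = annotate(events)
```

With the annotation present, the "2" is reported as a superscript rather than merely as a smaller, raised character.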

Problem: Unbalanced Tags
● Some primitives are balanced, but many are not
● Tags may affect the document for an arbitrary length of time or are local to a paragraph or specific block of text

Solution: Balancing
● Algorithm:
  – Given: database of tags
    ● start tag, end tag, 'insert end tag' tags
  – Go through list of tags, find one that needs help balancing
  – Go forward along list, finding nearest tag that closes the previous tag, or end of document
  – Insert end of tag into the list of tags
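
The steps above can be sketched in Python as follows. This is one reading of the slide, not the TeX2Star implementation: the tag names, the "text:" prefix for content items, and the contents of the tag database are hypothetical.

```python
# Hypothetical tag database:
# start tag -> (end tag, tags that force the end tag to be inserted)
TAG_DB = {
    "bf": ("/bf", {"rm", "it", "par"}),
    "it": ("/it", {"rm", "bf", "par"}),
}


def balance(tags):
    """Insert a missing end tag for every unbalanced start tag in `tags`."""
    result = list(tags)
    i = 0
    while i < len(result):
        entry = TAG_DB.get(result[i])
        if entry is not None:
            end_tag, closers = entry
            j = i + 1
            # Go forward to the nearest explicit end tag, closing tag,
            # or the end of the document.
            while (j < len(result)
                   and result[j] != end_tag
                   and result[j] not in closers):
                j += 1
            # No explicit end tag was found there, so insert one.
            if j == len(result) or result[j] != end_tag:
                result.insert(j, end_tag)
        i += 1
    return result


balanced = balance(["bf", "text:this is bold", "rm", "text:this is normal"])
```

Here the \rm that follows the bold text acts as an 'insert end tag' tag for \bf, so the end tag is placed immediately before it.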

Post Document Editing
● Further balancing and insertion of tags may be necessary after first sweep through file
● Tables:
  – OpenOffice format requires number of columns to be specified
  – We don't know how many columns will be needed until after we read the entire table
  – Solution: After processing, go back and insert the needed information
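
A Python sketch of the "go back and insert" idea for tables. It is a simplified illustration under stated assumptions: the function name is hypothetical, and the markup is a reduced form of OpenOffice table XML (real cells would wrap their content in paragraph elements and escape special characters).

```python
def build_table(rows):
    """Emit simplified table markup, filling in the column count last."""
    lines = ["<table:table>", None]  # placeholder for the column declaration
    max_cols = 0
    for row in rows:
        # The widest row seen so far decides the column count.
        max_cols = max(max_cols, len(row))
        cells = "".join(f"<table:table-cell>{c}</table:table-cell>" for c in row)
        lines.append(f"<table:table-row>{cells}</table:table-row>")
    lines.append("</table:table>")
    # Post-processing step: go back and insert what was unknown while reading.
    lines[1] = f'<table:table-column table:number-columns-repeated="{max_cols}"/>'
    return "\n".join(lines)


markup = build_table([["a", "b"], ["c", "d", "e"]])
```

Reserving a placeholder and patching it afterwards avoids buffering the table twice; the column declaration still ends up ahead of the rows in the output.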

Unsolved Problems
● Footnotes:
  – Defined by position in the page
  – Automatic positioning conflicts with paragraph detection tool
  – Unable to distinguish among a footnote, an extra paragraph, a header, or a footer
● Non-English alphabets

Conclusion
● Semantics of document are lost in TeX itself, so no hope of recovery
● Overt presentation can be recovered for editing
● Method works to translate an irregular, non-well-formed language into a regular, well-formed language (XML)