HTML5 ETDs Edward A. Fox, Sung Hee Park, Nicholas Lynberg, Jesse Racer, Phil McElmurray Digital Library Research Laboratory Virginia Tech ETD 2010, June.


HTML5 ETDs Edward A. Fox, Sung Hee Park, Nicholas Lynberg, Jesse Racer, Phil McElmurray Digital Library Research Laboratory Virginia Tech ETD 2010, June 16-18, Austin, TX

Contents
– Introduction
– Background
– Algorithm & Implementation
– Discussion
– Conclusion

Introduction
Computing & Technological Environment Changes
– Emerging mobile web
– HTML5 standard for the mobile web: the latest revision of HTML; reduces the need for proprietary plug-in technologies (e.g., Adobe Flash and Microsoft Silverlight)
Preservation in digital libraries (DL)
– Long-term preservation via archiving
– Migration for better access to the mobile web

An Example of ETD Title Page

ETD “Splash” Page
[Screenshot labels: ETD Metadata (Type of Document, Author, …); Files (Filename, Size, Approximate Download Time, 28.8 Modem, …)]

Identifying links among files
[Diagram: the ETD files Afront.pdf, Ch1.pdf, Ch2.pdf, Ch3.pdf, Ch4.pdf, and Refs.pdf, together with the multimedia files Ch3_result.mp3 and Ch4_result.avi, shown before and after linking]

Issues for migration strategy
– How is conversion to HTML5 conducted?
– Which browsers support HTML5?
– Which video file formats are supported by current browsers?
– Which video file format converters support conversion into different file types?
– Which pdf2txt extractors are effective?
– How will HTML5 ETDs work on mobile devices (e.g., Android phone, iPod, iPad)?

Algorithm
[Diagram: conversion pipeline]
– PDF ETD → PDF2Text/HTML converter → TXT/HTML
– TXT/HTML → ETD structure analyzer → Tagged TXT
– HTML → Multimedia file source extractor → Tagged MM Source
– Tagged MM Source, TXT/HTML → Multimedia file link extractor → Tagged TXT
– HTML5 tag set, Tagged TXT, Text/Grammar → HTML5 converter → HTML5 ETD

PDF2TXT/HTML
Convert a presentation format (e.g., PDF) into an intermediate format (plain text) or a semi-presentation format (HTML) to find link candidates and add useful HTML5 tags (e.g., video, audio).
PDFBox
– An open library to parse PDF and extract text
– PDFParser class to parse the entire document
– PDFTextStripper class to extract the PDF's text
[Diagram: PDF ETD → PDF2Text/HTML converter (using PDFBox) → TXT/HTML ETD]

ETD Structure Analyzer
Parse the ‘Table of Contents’ section
Analyze inter-structure between
– logical page structure (e.g., ii, iii, …, 1, 2, …)
– logical structure (e.g., Abstract, …, Chapter 1, …)
Information used to insert HTML5 tags
– header, article, section
"Table of contents analysis for ETD structuring"
– segmentation of headings, logical pages
– from table of contents
– using regular expressions
[Diagram: TXT/HTML → ETD structure analyzer → Tagged TXT]
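
The table-of-contents step above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the assumed line shape (heading, then spaces or dot leaders, then a roman or arabic page label) and the names TOC_LINE, TocEntry, and parse_toc are hypothetical choices made for this example.

```python
import re
from typing import List, NamedTuple

# Assumed TOC line shape: heading, then spaces and/or dot leaders, then a page
# label that is either a roman numeral (front matter) or an arabic number (body).
TOC_LINE = re.compile(
    r"^\s*(?P<heading>.*?\S)[\s.]+(?P<page>[ivxlcdm]+|\d+)\s*$",
    re.IGNORECASE,
)


class TocEntry(NamedTuple):
    heading: str      # logical structure, e.g. "Abstract", "Chapter 1 Introduction"
    page_label: str   # logical page, e.g. "ii" or "3"
    numbering: str    # "roman" or "arabic"


def parse_toc(toc_text: str) -> List[TocEntry]:
    """Segment TOC lines into headings and logical page labels."""
    entries = []
    for line in toc_text.splitlines():
        match = TOC_LINE.match(line)
        if match is None:
            continue  # skip lines that do not end in a page label
        page = match.group("page")
        numbering = "arabic" if page.isdigit() else "roman"
        entries.append(TocEntry(match.group("heading"), page, numbering))
    return entries


if __name__ == "__main__":
    sample = "Abstract ........ ii\nChapter 1 Introduction ........ 1\n1.1 Background .... 3"
    for entry in parse_toc(sample):
        print(entry)
```

Each entry pairs a logical-structure heading with its logical page label; a later step could map these onto header, article, and section tags.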

‘Table of Contents’

Inter-structuring (Example)
[Diagram: ETD pages (cover pages, table of contents, …) and their lines (title, …), related through the physical page structure, the logical page structure, and the logical structure]

Result of Structure Analyzer (1/2) Logical page structure Physical page structure Logical structure

Result of Structure Analyzer (2/2) Analyzed structure and the first 3 items of the ETD

Multimedia Link Source Extractor
Source information for multimedia files
– E.g., URL, file names
– 'src' property in the 'video' or 'audio' tags
Algorithm in Perl script
[Diagram: HTML ETD Title Page → Multimedia file source extractor → Tagged MM Source]
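
The slide states that the extractor is a Perl script. As a rough illustration only, the same idea can be sketched in Python with the standard-library HTML parser. The extension table MEDIA_TYPES, the class MediaLinkCollector, and the function extract_media_sources are assumptions made for this example, as is the premise that the title page lists files as ordinary anchor links.

```python
import posixpath
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urlparse

# Assumed mapping from file extension to multimedia type.
MEDIA_TYPES = {".avi": "video", ".ogv": "video", ".mp3": "audio", ".wav": "audio"}


class MediaLinkCollector(HTMLParser):
    """Collect (href, file name, media type) for links to multimedia files."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[Tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        path = urlparse(href).path
        extension = posixpath.splitext(path)[1].lower()
        if extension in MEDIA_TYPES:
            self.sources.append((href, posixpath.basename(path), MEDIA_TYPES[extension]))


def extract_media_sources(title_page_html: str) -> List[Tuple[str, str, str]]:
    """Return the multimedia link sources found on an ETD title page."""
    collector = MediaLinkCollector()
    collector.feed(title_page_html)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    page = '<a href="Ch4.pdf">Ch4.pdf</a> <a href="Ch4_result.avi">Ch4_result.avi</a>'
    print(extract_media_sources(page))
```

The href value is what would later populate the 'src' property, and the bare file name is what the link-candidate step searches for in the ETD text.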

ETD Files in the ETD Title Page (Multimedia Link Sources)
[Screenshot label: Video files (.avi)]

Multimedia Link Candidates Extractor (1/2)
Process
– Input: multimedia link sources
– Extract link candidates from the plain ETD text
– Find matches in the plain text
– Output: a tagged text file with multimedia type attributes (e.g., video or audio or …)
[Diagram: Tagged MM Source → Multimedia file link extractor → Tagged TXT]

Multimedia Link Candidates Extractor (2/2)
Implemented in Perl
– simple string match between multimedia link sources (e.g., list of file names) and candidate links
– code integrated into the main graphical user interface of the HTML5 converter, written in Java with SWT
[Diagram: Tagged MM Source → Multimedia file link extractor → Tagged TXT]
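
The string-matching step is implemented in Perl according to the slides. A minimal Python sketch of the same idea follows. The marker format `<mm type="...">` and the function name tag_link_candidates are hypothetical, since the slides do not show the tagged-text format.

```python
import re
from typing import Dict


def tag_link_candidates(text: str, sources: Dict[str, str]) -> str:
    """Wrap each occurrence of a known multimedia file name in a marker.

    sources maps a file name (e.g. "Ch4_result.avi") to its media type
    ("video" or "audio"). The <mm> marker is a made-up intermediate tag.
    """
    if not sources:
        return text
    # Longest names first, so a name that contains another is matched whole.
    names = sorted(sources, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def mark(match: "re.Match[str]") -> str:
        name = match.group(0)
        return f'<mm type="{sources[name]}">{name}</mm>'

    return pattern.sub(mark, text)


if __name__ == "__main__":
    plain = "The animation is in Ch4_result.avi and the recording in Ch3_result.mp3."
    print(tag_link_candidates(plain, {"Ch4_result.avi": "video", "Ch3_result.mp3": "audio"}))
```

Exact matching like this misses file names that were broken by PDF text extraction (for example "Ch4_result. avi"), so a real tool would likely need some normalization first.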

Multimedia Link Candidates in the PDF ETD
[Screenshot label: Link candidates in context: video file names (.avi)]

HTML5 Conversion (1/2)
Combines all information for producing an HTML5 document
– Useful HTML5 tags, such as those named on earlier slides (video, audio, header, article, section), etc.
– a plain text ETD with link candidate tags
– link sources (e.g., file names, URL)
– structure information of ETD (e.g., header, footer, chapter, section)
[Diagram: HTML5 tag set, Tagged TXT, Text/Grammar → HTML5 Converter → HTML5 ETD]

HTML5 Conversion (2/2)
Key part of the conversion
– Outputting the text during the first step, PDF2TXT, sets up header, body, and other tags.
More interesting part of the conversion
– video insertion and tagging with source information
[Diagram: HTML5 tag set, Tagged TXT, Text/Grammar → HTML5 Converter → HTML5 ETD]
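
To make the video insertion step concrete, here is a small Python sketch of how structure information and link sources might be combined into an HTML5 page. It is an illustration under assumptions, not the converter described in the slides: the functions media_element and to_html5, the input shapes, and the exact page layout are all invented for this example.

```python
import html
import re
from typing import Dict, List, Tuple


def media_element(kind: str, src: str) -> str:
    """Build a <video> or <audio> element with its 'src' attribute set."""
    return f'<{kind} controls src="{html.escape(src, quote=True)}"></{kind}>'


def to_html5(title: str,
             sections: List[Tuple[str, str]],
             media: Dict[str, Tuple[str, str]]) -> str:
    """Render (heading, text) sections as HTML5, adding a media element
    after each mention of a known file name.

    media maps a file name found in the text to (kind, src), where kind is
    "video" or "audio" and src is the URL or file to play.
    """
    names = sorted(media, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(n) for n in names) + ")") if names else None

    def render(text: str) -> str:
        if pattern is None:
            return html.escape(text)
        pieces = []
        # With one capturing group, split() alternates plain text and matches.
        for index, part in enumerate(pattern.split(text)):
            pieces.append(html.escape(part))
            if index % 2 == 1:
                kind, src = media[part]
                pieces.append(" " + media_element(kind, src))
        return "".join(pieces)

    lines = [
        "<!DOCTYPE html>",
        "<html>",
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>',
        "<body>",
        f"<header><h1>{html.escape(title)}</h1></header>",
        "<article>",
    ]
    for heading, text in sections:
        lines.append(f"<section><h2>{html.escape(heading)}</h2><p>{render(text)}</p></section>")
    lines += ["</article>", "</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_html5("Sample ETD",
                   [("Chapter 4", "Results are shown in Ch4_result.avi.")],
                   {"Ch4_result.avi": ("video", "Ch4_result.ogv")}))
```

In the usage example, the src points at an .ogv file rather than the original .avi, mirroring the video conversion mentioned in the discussion slides.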

Main Screen of HTML5 Converter

Browsing HTML5 ETD

Viewing Page Source
Note: Video file extensions (.ogg) were edited manually for the purpose of demonstration.

Discussion – Problems (1/2)
1. How to migrate from PDF files to HTML5 files
2. Which PDF2txt extraction tools are most effective
3. How to avoid loss of formatting information (size, color, font, etc.) when the text comes from PDF
4. How to avoid multiple image parts stacking (some of the images from the PDF file appear stacked on top of one another)

Discussion – Problems (2/2)
Which browsers support HTML5, esp. video / audio?
– No: Internet Explorer, Opera
– Yes: Mozilla Firefox, Google Chrome, Safari
Which mobile devices can view HTML5 video?
– No: cell phones (Android 2.1, BlackBerry)
– Yes: iPod touch, iPhone, iPad

Discussion – Solutions
PDFBox was best for extracting from PDF
Problem with multiple parts for one image:
– no real solution yet
– something to do with the created image type
Problem with file types: convert video to ogv
Problem with the browser type:
– use a browser which supports it, or
– use HTML5 embed tag for a standalone media player, e.g., Windows Media Player, Flash

Discussion – Mobile Adaptation in Digital Libraries
ETD sustainability
Adapt structure to mobile computing environment
System-oriented adaptation to
– browsers
– small-size display
– wireless network
User-oriented adaptation to
– beginners vs. experts, handicapped
– tasks: learning, collaboration
Case of HTML5 ETDs
– accessed by general users through mobile web browser from wireless networks

Conclusion
HTML5 Converter S/W tool prototype
HTML5 ETDs converted semi-automatically
Future work
– Adapt to mobile web and semantic web
– Serve: individual human needs, mobile web browsers, small screens on mobile devices
– Adapt to semantic web to create machine-readable content, using Microdata and RDFa
Questions & Answers