MAKING BIG FILES SMALL AND SMALL FILES TINY
LT Bruce Hill

XML and JSON
● JavaScript Object Notation (JSON) is a common alternative to XML in web applications
● JSON is a plaintext data-interchange format derived from JavaScript object-literal syntax
● JSON has compact binary encodings analogous to Efficient XML Interchange (EXI):
  ○ CBOR (Concise Binary Object Representation)
  ○ BSON (Binary JSON)
● Research Question: Is EXI more compact than CBOR and BSON?
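To make the CBOR side of the comparison concrete, here is a minimal encoder for a subset of CBOR (RFC 8949: integers, floats, strings, arrays, maps, booleans, null). It is an illustrative sketch, not the encoder used in the study, and the weather-style record is hypothetical. It shows where a binary encoding saves space relative to JSON text: quotes, colons and commas disappear, and each item carries a compact type/length header instead.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """CBOR initial byte (major type in the top 3 bits) plus its argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24]) + struct.pack(">B", n)
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)


def cbor_encode(value) -> bytes:
    """Encode a JSON-like Python value as CBOR (subset of RFC 8949)."""
    # Check booleans and None first: bool is a subclass of int in Python.
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        # Major type 0 for n >= 0, major type 1 encodes -1 - n for negatives.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # always 64-bit here
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        parts = [_head(5, len(value))]
        for key, item in value.items():
            parts.append(cbor_encode(key))
            parts.append(cbor_encode(item))
        return b"".join(parts)
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    record = {"city": "Monterey", "temp": 15.5, "humidity": 82}  # hypothetical
    as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
    print(len(as_json), len(cbor_encode(record)))
```

For this record the compact JSON is 45 bytes and the CBOR is 40. Production encoders following RFC 8949's preferred serialization would also pick the shortest float width that preserves the value, which would shrink `15.5` further; this sketch always writes 64-bit floats for simplicity.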

EXI for Large XML Files
● W3C and previous Naval Postgraduate School (NPS) research measured EXI performance on XML files up to 100 MB
● Large data dumps can easily exceed that
● Research Question: How does EXI perform on files from 100 MB to 4 GB? (EXI only; CBOR and BSON were not evaluated at this scale)

Methods
Use Case Focus
● Compression results across multiple use cases look different from results for multiple files within a single use case
● Select a few use cases and study them in depth
Configuration Focus
● EXI has many configuration options that affect:
  ○ Compactness
  ○ Processing speed
  ○ Memory footprint
  ○ Fidelity
● XML Schema affects EXI compression as well

When in doubt, try every possible combination of options
[Figure: Encodings Compared, with panels for Small Files and Large Files]
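"Try every combination" is a grid search. A minimal sketch, assuming you wrap your EXI codec in a function `encode(doc, **options)` that returns the encoded bytes and raises `ValueError` for combinations it rejects. The option names below echo EXI 1.0 header options but are only an illustrative subset, the schema file name is hypothetical, and `fake_encode` is a stand-in for demonstration, not real EXI behaviour.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical option grid: keys echo EXI 1.0 header options (illustrative subset).
GRID: Dict[str, list] = {
    "alignment": ["bit-packed", "byte-alignment", "pre-compression", "compression"],
    "strict": [False, True],
    "preserve_prefixes": [False, True],
    "schema": [None, "schema.xsd"],  # hypothetical schema file name
}


def sweep(doc: bytes, encode: Callable[..., bytes],
          grid: Dict[str, list] = GRID) -> List[Tuple[int, Dict[str, object]]]:
    """Encode `doc` under every option combination; return (size, options), smallest first."""
    names = list(grid)
    results: List[Tuple[int, Dict[str, object]]] = []
    for values in product(*(grid[name] for name in names)):
        options = dict(zip(names, values))
        try:
            encoded = encode(doc, **options)
        except ValueError:
            continue  # the codec rejected this combination; skip it
        results.append((len(encoded), options))
    results.sort(key=lambda item: item[0])
    return results


def fake_encode(doc: bytes, alignment: str, strict: bool,
                preserve_prefixes: bool, schema) -> bytes:
    """Stand-in codec for the demo only; NOT real EXI behaviour."""
    if strict and preserve_prefixes:
        raise ValueError("illustrative rejection rule")
    return doc[: len(doc) // 2] if schema else doc


if __name__ == "__main__":
    for size, options in sweep(b"<a/>" * 100, fake_encode)[:3]:
        print(size, options)
```

With a real codec, the wrapper would also record encode/decode time and peak memory, since compactness is only one of the dimensions (speed, memory footprint, fidelity) the configuration affects.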

Small-file Use Cases (B to KB)
● OpenWeatherMap
● GPS Exchange Format (GPX)
● Automatic Identification System (AIS)

AIS Use Case
EXI is smaller than CBOR/BSON; aggregating data helps
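Why aggregation helps small messages can be illustrated with any general-purpose compressor. The sketch below uses Python's zlib (DEFLATE) as a stand-in for an EXI codec, and the AIS-style report format is hypothetical. It compares compressing each report on its own with compressing one concatenated batch: individually, every message pays fixed header/trailer overhead and starts with no knowledge of the repeated element and attribute names.

```python
import zlib
from typing import List


def make_reports(n: int) -> List[bytes]:
    # Hypothetical AIS-style position reports serialized as small XML elements.
    return [
        (f'<report mmsi="3669997{i % 100:02d}" lat="36.6{i % 10}" '
         f'lon="-121.8{i % 7}" sog="{i % 15}.0"/>').encode("ascii")
        for i in range(n)
    ]


def sent_individually(reports: List[bytes]) -> int:
    """Total compressed bytes when every report is compressed on its own."""
    return sum(len(zlib.compress(r, 9)) for r in reports)


def sent_aggregated(reports: List[bytes]) -> int:
    """Compressed bytes when all reports are concatenated and compressed once."""
    return len(zlib.compress(b"".join(reports), 9))


if __name__ == "__main__":
    reports = make_reports(50)
    print("raw:", sum(map(len, reports)))
    print("individually:", sent_individually(reports))
    print("aggregated:", sent_aggregated(reports))
```

Aggregation trades latency for size: reports wait to be batched, which is why the decision of when to send has to be balanced against operational requirements.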

AIS Use Case
A well-designed XML Schema improves performance
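One reason a schema helps: once it types a field, the encoder can write the value in binary instead of as characters. A minimal sketch of the Unsigned Integer representation from the EXI 1.0 specification (7 bits per octet, least-significant group first, high bit set when more octets follow), assuming the schema declares a 9-digit AIS MMSI as a non-negative integer type. Without such typing, the same value would travel as a 9-character string plus a length prefix. The MMSI value is hypothetical.

```python
def exi_unsigned_int(n: int) -> bytes:
    """EXI 1.0 Unsigned Integer: 7 bits per octet, least-significant group first,
    high bit set on every octet except the last."""
    if n < 0:
        raise ValueError("unsigned integers only")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more octets follow
        else:
            out.append(group)         # final octet
            return bytes(out)


if __name__ == "__main__":
    mmsi = 366999712  # hypothetical 9-digit MMSI
    print(len(str(mmsi)), len(exi_unsigned_int(mmsi)))
```

For this value the typed form is 5 octets against 9 characters. A schema can do better still for fields with a small bounded range, which is one reason the schema itself is worth tuning per application.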

Large-file Use Cases (KB to GB)
● Digital Forensics XML (DFXML)
● OpenStreetMap
● Packet Details Markup Language (PDML)

PDML Use Case
EXI performs well on large files; aggregation benefits plateau
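A rough way to see why aggregation benefits level off, again using zlib's DEFLATE as a stand-in rather than EXI: once a batch is large enough, fixed per-message overhead is amortized and the compressor has already learned the repeated structure (DEFLATE only looks back 32 KB), so bigger batches buy little. The packet records below are hypothetical and far shorter than real PDML.

```python
import zlib
from typing import List


def make_records(n: int) -> List[bytes]:
    # Hypothetical PDML-like packet elements (real PDML is much more verbose).
    return [
        (f'<packet><proto name="ip"><field name="ip.src" show="10.0.0.{i % 256}"/>'
         f'<field name="ip.len" show="{60 + i % 1400}"/></proto></packet>').encode("ascii")
        for i in range(n)
    ]


def bytes_per_record(records: List[bytes], batch: int) -> float:
    """Average compressed bytes per record when records are compressed in batches."""
    total = 0
    for start in range(0, len(records), batch):
        total += len(zlib.compress(b"".join(records[start:start + batch]), 9))
    return total / len(records)


if __name__ == "__main__":
    recs = make_records(10_000)
    for batch in (1, 10, 100, 1_000, 10_000):
        print(f"batch={batch:>6}: {bytes_per_record(recs, batch):.1f} bytes/record")
```

The steep drop happens between the first few batch sizes; beyond that the per-record cost changes only slightly.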

EXI and MS Office
● Microsoft Office is ubiquitous in the Navy/DoD
● Since Office 2007, the default file format has been a zipped archive of many small XML files
● Since 2006, the file format has been an open standard (ECMA-376)
● Since Office 2013, MS Office can save in the strict, standard-compliant format
● Tools such as NXPowerLite shrink these files by reducing excess image resolution and removing metadata
● EXI can target the remainder...
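To judge how much of a given Office file is XML (the part EXI could target once images are handled), you can open the package with Python's standard `zipfile` module and tally part sizes. A minimal sketch, assuming a `.docx`/`.xlsx`/`.pptx` package; the file name in the usage line is hypothetical, and only `.xml` and `.rels` parts are counted as XML here.

```python
import io
import zipfile
from typing import Dict


def xml_share(package: bytes) -> Dict[str, int]:
    """Tally uncompressed bytes of XML parts versus other parts in an OOXML (ZIP) package."""
    totals = {"xml_parts": 0, "xml_bytes": 0, "other_bytes": 0}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Relationship parts (.rels) are XML as well.
            if info.filename.endswith((".xml", ".rels")):
                totals["xml_parts"] += 1
                totals["xml_bytes"] += info.file_size
            else:
                totals["other_bytes"] += info.file_size  # media, fonts, etc.
    return totals


if __name__ == "__main__":
    with open("briefing.docx", "rb") as f:  # hypothetical file name
        print(xml_share(f.read()))
```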

Microsoft Office Use Case
(Images removed from all files)

Conclusions
Tuning data, XML schema and EXI codec on a per-application basis maximizes benefits
● When to send?
  ○ Aggregating data improves performance
  ○ Balance with operational requirements
● EXI configurations are significant
● XML Schema is significant
  ○ Previously a tool for data validation, now a tool for compression
● EXI is generally more compact than JSON-based binary encodings
● EXI performs well on large files

Next Steps
● Holistic Profiling
  ○ Optimizing EXI encodings is a multi-dimensional problem
● Need for Best Practices
  ○ How do we make sure we’re getting the best possible EXI performance?
  ○ Rethinking XML and associated schemas is a must
● Expanding EXI to the Open Web Platform
  ○ HTML5, CSS, JavaScript, JSON, and SVG are the building blocks of tomorrow’s applications, distributed over networks
  ○ All are targets for EXI-like compression techniques
● Fleet Adoption
  ○ An open-source EXI codec on every desktop and server