Microsoft Office and XML – Making the data work for YOU! Mark Johnston Developer & Platform Group, Microsoft Ltd
Agenda
Open XML file formats
Solution development using Open XML
Document Information Panel
Content Controls
Custom XML
Bringing it all together
The vision! Office 2007, Business Processes, SharePoint, InfoPath, Databases, 3rd Party Tools
File formats “The nuts and bolts”
Evolution of File Formats
Microsoft Office 97
– Existing binary file formats, designed in 1994 and launched in Microsoft Office 97
Microsoft Office 2000: Early Innovation
– XML document properties
Microsoft Office XP: First XML Format
– Spreadsheet XML
Microsoft Office 2003: Breakthrough XML Support
– WordML, SpreadsheetML
– Custom-defined schema
2007 Microsoft Office System: New XML Formats
– XML file format as the default
– XML PowerPoint format
Open XML File Formats
Default file format
More efficient storage
Program against full document contents
Easy document assembly, conversion & integration
Backward compatibility & legacy support
Submitted to Ecma International for standardization
Open XML Formats Architecture
User view: a single file (Questionnaire.docx)
Developer view: a modular file made of document parts (file container, document properties, WordML/SpreadsheetML etc., comments, custom-defined XML, charts, images, video, sound, embedded code/macros)
– Most parts are XML
– Each XML part is a discrete, compressed component
– Individual parts can be added, extracted and modified without using Office programs
– Corruption or absence of an individual part need not prevent the rest of the file from being opened
Components of the New Formats
Package – the ZIP container
Part – the “files” inside the ZIP
Content Types – enforced when the file is opened
Example parts in a workbook: Document Properties, Application Properties, Custom Document Properties, Workbook, Sheet1, Sheet2, Sheet3, Styles, Chart, Strings, …
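To illustrate how content types are resolved, here is a minimal sketch in Python using only the standard library (zipfile and ElementTree), not System.IO.Packaging. It assumes the usual packaging rule: an Override entry in [Content_Types].xml for a part's exact name wins, otherwise the Default entry for the part's file extension applies. The function name part_content_types is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CT = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def part_content_types(package):
    """Map every part name in an Open XML package to its declared content type."""
    with zipfile.ZipFile(package) as zf:
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        # The content-types stream itself is not a part; skip folder entries too
        names = [n for n in zf.namelist()
                 if n != "[Content_Types].xml" and not n.endswith("/")]
    # Default entries apply by file extension (matched case-insensitively)
    defaults = {d.get("Extension").lower(): d.get("ContentType")
                for d in types.findall(CT + "Default")}
    # Override entries apply to one exact part name such as /word/document.xml
    overrides = {o.get("PartName").lower(): o.get("ContentType")
                 for o in types.findall(CT + "Override")}
    result = {}
    for name in names:
        base = posixpath.basename(name)
        # ".rels" has no stem, so split on the last dot rather than using splitext
        ext = base.rsplit(".", 1)[1].lower() if "." in base else ""
        # An Override wins; otherwise fall back to the Default (None if neither)
        result[name] = overrides.get("/" + name.lower(), defaults.get(ext))
    return result
```

A part that maps to None has no content type at all, which is one of the errors a validation library would be expected to report.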
Open XML File Formats
Tools for Accessing Data in Office Open XML Files
ZIP manipulation
– Compressed Folders in Windows?
– Third-party ZIP libraries
– Microsoft’s packaging APIs
XML editing
– Notepad?
– System.Xml makes this easier
Office Open XML Resource Kit
– Code snippets in C# and VB.NET
– Validation library: parses a file and reports schema and relationship errors and warnings
– Serialization/deserialization library: flattens a package into a single file for ease of development in simple construction scenarios
System.IO.Packaging
Part of .NET Framework 3.0
Allows you to:
– Create / open packages
– Create and delete parts and relationships
– Read and write part streams
– Iterate through collections of parts and relationships
Key classes: Package; PackagePartCollection of PackagePart; PackageRelationshipCollection of PackageRelationship; PackUriHelper
System.IO.Packaging.Package
The Package class provides methods to create, enumerate and delete the following entities:
– Package
– Package relationships
– PackageProperties
– Parts
Diagram: package relationships lead to the common package parts (Core Properties, Thumbnail, Digital Signatures) and to the officeDocument XML part; format-specific XML parts each carry their own part relationships (Part Rels).
System.IO.Packaging.PackageRelationship
Relationships tie the parts together
Relationships are required to find parts (part names are not guaranteed)
Iterate through a PackageRelationshipCollection by type or ID
PackageRelationship properties:
– Id
– Package
– RelationshipType
– SourceUri
– TargetMode
– TargetUri
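Because part names are not guaranteed, code should locate the main document part through the package relationships rather than by name. A minimal Python sketch (standard library only, not the .NET API), assuming the transitional relationship-type URI that Office 2007 writes; the function name find_main_part is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/"
                   "2006/relationships/officeDocument")


def find_main_part(package):
    """Return the name of the main document part, located via _rels/.rels."""
    with zipfile.ZipFile(package) as zf:
        rels = ET.fromstring(zf.read("_rels/.rels"))
    for rel in rels.findall(RELS + "Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            # Package-level targets resolve against the package root
            return posixpath.normpath(rel.get("Target").lstrip("/"))
    return None  # not an Office document package
```

The lookup is by relationship type, so it still works if a producer stores the main part under an unusual name such as /content/main.xml.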
System.IO.Packaging.PackagePart
Parts are the data objects within the package
PackagePart supports creating, enumerating and deleting part relationships
Part data is read and written as a Stream
PackagePart properties:
– CompressionOption
– ContentType
– Package
– Uri
Example XML parts: <w:body> The Quick Brown Fox jumped over the river. … and <w:body> The Cow jumped over the moon. …
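Reading a part's data as a stream and walking its <w:body> can be sketched in Python with zipfile and ElementTree. The default part name word/document.xml is an assumption for brevity (robust code would resolve it through the officeDocument relationship), and the helper name body_paragraphs is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def body_paragraphs(package, part_name="word/document.xml"):
    """Return the plain text of each paragraph (w:p) in the document body."""
    with zipfile.ZipFile(package) as zf:
        with zf.open(part_name) as stream:  # the part's data as a stream
            root = ET.parse(stream).getroot()
    body = root.find(W + "body")
    paragraphs = []
    for p in body.iter(W + "p"):
        # Paragraph text is split across runs (w:r), each holding w:t elements
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs
```

Joining the w:t elements matters because Word often splits a single sentence across several runs, for example where formatting or revision marks change.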
Scenarios “What to do?”
Document Interrogation Scenarios
When you need metadata about Office files on a server
Building reports from data in files
Workflow and content management scenarios
– Validate compliance
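For document interrogation, the core properties are usually the first metadata to read. A minimal Python sketch that finds the core-properties part through its package relationship and returns a few Dublin Core fields; the choice of fields and the function name core_properties are illustrative assumptions.

```python
import zipfile
import xml.etree.ElementTree as ET

RELS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
CORE_REL = ("http://schemas.openxmlformats.org/package/2006/"
            "relationships/metadata/core-properties")
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}


def core_properties(package):
    """Return title, creator and lastModifiedBy from an Open XML package."""
    with zipfile.ZipFile(package) as zf:
        rels = ET.fromstring(zf.read("_rels/.rels"))
        # Find the core-properties part by relationship type, not by name
        target = next((r.get("Target") for r in rels.findall(RELS + "Relationship")
                       if r.get("Type") == CORE_REL), None)
        if target is None:
            return {}
        core = ET.fromstring(zf.read(target.lstrip("/")))
    fields = {"title": "dc:title", "creator": "dc:creator",
              "lastModifiedBy": "cp:lastModifiedBy"}
    # Missing elements come back as None
    return {key: core.findtext(path, default=None, namespaces=NS)
            for key, path in fields.items()}
```

Calling it for every file in a folder, for example over pathlib.Path(folder).glob("*.docx"), yields the raw rows for a metadata report.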
Document Assembly Scenarios
Useful when documents need to be generated from structured data
– Auto-generate reports in Excel from data in a database
– Create documents for users from form data
– Repurpose existing data (slide libraries)
Recommendation: start from a template
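The recommendation is to start from a template; purely to show which parts a WordprocessingML package cannot do without, here is a minimal Python sketch that assembles a .docx from a list of strings. It assumes the commonly cited three-part minimal package ([Content_Types].xml, _rels/.rels and word/document.xml), and build_docx is a hypothetical name.

```python
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>')

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/'
    '2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>')


def build_docx(path, lines):
    """Write a minimal .docx (path or file object) with one paragraph per line."""
    # escape() protects &, < and >; xml:space keeps leading/trailing spaces
    paragraphs = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(line)}</w:t></w:r></w:p>'
        for line in lines)
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f'<w:body>{paragraphs}</w:body></w:document>')
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", PACKAGE_RELS)
        zf.writestr("word/document.xml", document)
```

Starting from a template instead means copying its parts (styles, headers, numbering) unchanged and only regenerating word/document.xml.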
Document Sanitization Scenarios
Security
– Remove active content (VBA, ActiveX)
Privacy
– Remove comments, revisions, hidden text
– Remove or alter document properties
Legal
– Insert copyrights, watermarks, images
Run as part of workflow, publishing and compliance scenarios
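One privacy step, removing author information from document properties, can be sketched in Python as a copy-and-rewrite of the core-properties part, because zipfile does not edit archives in place. The part name docProps/core.xml is assumed for brevity (it should really be resolved through the package relationship), and strip_authors is a hypothetical name. The conventional prefixes are registered so that attribute values such as xsi:type="dcterms:W3CDTF", which refer to a prefix by name, stay valid after ElementTree re-serializes the XML.

```python
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
# Keep the conventional prefixes on output (this changes ElementTree's global prefix map)
for prefix, uri in {"cp": CP, "dc": DC, "dcterms": "http://purl.org/dc/terms/",
                    "dcmitype": "http://purl.org/dc/dcmitype/",
                    "xsi": "http://www.w3.org/2001/XMLSchema-instance"}.items():
    ET.register_namespace(prefix, uri)

PRIVATE_FIELDS = ["{%s}creator" % DC, "{%s}lastModifiedBy" % CP]


def strip_authors(src, dst, core_part="docProps/core.xml"):
    """Copy package src to dst, emptying author fields in the core properties part."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == core_part:
                root = ET.fromstring(data)
                for tag in PRIVATE_FIELDS:
                    for element in root.iter(tag):
                        element.text = ""
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            zout.writestr(item, data)  # every other part is copied unchanged
```

The same copy loop extends to other privacy rules, such as dropping a comments part, but removing a part also means removing its relationship and any content-type Override, or the package may no longer open cleanly.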
Solution Development
Pseudo Code/Workflow for Sanitizer
GetImages()
– Open the package
– Grab presentation.xml
– Create an XmlDocument
– Load the part stream
– Select the image node list using XPath
– Count and return
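The GetImages() workflow can be sketched in Python, with zipfile and ElementTree standing in for System.IO.Packaging and XmlDocument. One assumption departs from the pseudocode: in PresentationML, pictures are normally stored on the individual slide parts rather than in presentation.xml, so this sketch follows the relationships from presentation.xml to each slide and counts p:pic elements with an XPath-style query. It counts picture shapes on slides only (not on layouts or masters), and all function names are hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Transitional (Office 2007) namespace and relationship-type URIs
RELS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
REL_BASE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"
PML = {"p": "http://schemas.openxmlformats.org/presentationml/2006/main"}


def rels_name(part):
    """ppt/presentation.xml -> ppt/_rels/presentation.xml.rels; '' -> _rels/.rels."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def targets(zf, source, rel_type):
    """List internal part names that source points to with the given relationship type."""
    rels = ET.fromstring(zf.read(rels_name(source)))
    base = posixpath.dirname(source)
    found = []
    for rel in rels.findall(RELS + "Relationship"):
        if rel.get("Type") != REL_BASE + rel_type or rel.get("TargetMode") == "External":
            continue
        target = rel.get("Target")
        if target.startswith("/"):
            found.append(target.lstrip("/"))  # absolute part name
        else:
            found.append(posixpath.normpath(posixpath.join(base, target)))
    return found


def get_images(package):
    """Count picture shapes (p:pic) on all slides of a .pptx package."""
    with zipfile.ZipFile(package) as zf:
        # IndexError here means the package has no officeDocument relationship
        presentation = targets(zf, "", "officeDocument")[0]
        count = 0
        for slide in targets(zf, presentation, "slide"):
            slide_root = ET.fromstring(zf.read(slide))
            # ".//" also finds pictures nested inside group shapes
            count += len(slide_root.findall(".//p:pic", PML))
        return count
```

Counting p:pic elements measures picture shapes, not distinct image files: two shapes can share one image part, so a sanitizer that removes images would also need to follow each picture's relationship to its media part.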
Structure and control “do you not trust your end-users?”
Document Information Panel
Customisable form displayed in the client application
Allows users to enter document properties (metadata) while working on the document
SharePoint properties appear as metadata in the DIP
Document Information Panel
Custom Information Panel
The Document Information Panel uses InfoPath technology
This technology can be used to create business logic around any custom XML data
– All the power of InfoPath in Word, etc.
– Data connections, declarative rules engine, etc.
Custom Information Panel using InfoPath technology
The Role of XML Reference and Custom-defined Schemas
Custom-defined schemas
– Data-oriented business information (e.g. Price, Invoice)
– Enable system integration
XML reference schemas
– Display-oriented (Bold, Italics, Tables, Paragraphs, Styles, …)
– Document format
– Enable archival and file-format interoperability
The Role of XML Reference and Custom-defined Schemas
XML reference schemas (example: <w:p> John Doe, Health Agency </w:p>)
– Display-oriented (for example, Bold, Italics, Tables, Paragraphs, Styles)
– Document format
– Enable archival and file-format interoperability
The Role of XML Reference and Custom-defined Schemas
Custom-defined schemas (example: <ConferenceReport> 3/24/2004 … Health Agency … 25% …)
– Data-oriented business information (for example, Price, Invoice)
– Enable system integration
Content Controls (Word)
Make structured documents more robust, with much less Word-specific code needed
– Content restrictions, grouping & locking
– Code can be used for business logic!
End-user-friendly, layout-independent exposure of structured content
No XML schema required
– Evolution of customer-defined XML, with custom XML mapping capabilities
Content Controls
Office XML Data Store
Customer-defined XML (including WSS/Office properties) is stored separately from WordprocessingML, as its own part in the Open XML package
Any XML can be stored (with or without an XML schema)
XML data is available within Word as an editable tree (using the familiar DOM)
Can be populated on a server using .NET Framework 3.0, or on a client using the Word object model
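Populating the data store on a server amounts to replacing the contents of a custom XML part. A minimal Python sketch (standard library only, standing in for .NET Framework 3.0): it finds the custom XML part related to the main document part whose root element matches the new data, then writes a copy of the package with that part replaced. In this model Word is expected to refresh mapped content controls from the data store when the document is opened; the function names and the match-by-root-element rule are assumptions for illustration.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
CUSTOM_XML = ("http://schemas.openxmlformats.org/officeDocument/"
              "2006/relationships/customXml")


def rels_name(part):
    """Relationships part for a part, e.g. word/_rels/document.xml.rels."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def find_custom_xml(zf, main_part, root_tag):
    """Return the custom XML part related to main_part whose root element is root_tag."""
    rels = ET.fromstring(zf.read(rels_name(main_part)))
    for rel in rels.findall(RELS + "Relationship"):
        if rel.get("Type") != CUSTOM_XML:
            continue
        # Targets are relative to the source part's folder, e.g. ../customXml/item1.xml
        part = posixpath.normpath(
            posixpath.join(posixpath.dirname(main_part), rel.get("Target"))).lstrip("/")
        if ET.fromstring(zf.read(part)).tag == root_tag:
            return part
    return None


def populate_data_store(src, dst, main_part, new_xml):
    """Copy src to dst with the matching custom XML part replaced by new_xml."""
    root_tag = ET.fromstring(new_xml).tag
    with zipfile.ZipFile(src) as zin:
        target = find_custom_xml(zin, main_part, root_tag)
        if target is None:
            raise KeyError(f"no custom XML part with root element {root_tag}")
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for item in zin.infolist():
                data = new_xml if item.filename == target else zin.read(item.filename)
                zout.writestr(item, data)
    return target
```

Only the data part changes; the document body and its mappings are copied untouched, which is the data/view separation the slides describe.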
XML Mapping
Link content controls to nodes in the XML data store
Provides a true data/view separation model in Word/Excel/PowerPoint
Mappings are created using standard XPath expressions
Mappings can be set up to ‘auto-attach’ to incoming data
Out-of-the-box support for mapping to Office properties
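In the file, a mapping is stored as a w:dataBinding element inside a content control's properties (w:sdtPr), carrying the XPath expression and the ID of the data store item. A minimal Python sketch that lists these mappings; the default part name word/document.xml and the function name list_bindings are assumptions.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_bindings(package, part_name="word/document.xml"):
    """Return (tag, xpath, storeItemID) for each data-bound content control."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read(part_name))
    bindings = []
    for sdt in root.iter(W + "sdt"):  # every content control, however deeply nested
        props = sdt.find(W + "sdtPr")
        if props is None:
            continue
        binding = props.find(W + "dataBinding")
        if binding is None:
            continue  # content control not mapped to the data store
        tag = props.find(W + "tag")
        # WordprocessingML attributes are namespace-qualified (w:val, w:xpath, ...)
        bindings.append((
            tag.get(W + "val") if tag is not None else None,
            binding.get(W + "xpath"),
            binding.get(W + "storeItemID"),
        ))
    return bindings
```

A report like this is a quick way to check that every mapped control points at an XPath the custom XML part can actually satisfy.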
Content Controls + Custom XML
<Attendees> Health Agency …</Attendee>
Dynamic Documents
Putting it all together
Start with a template
Document assembly
– System.IO.Packaging
– Delete / create parts & relationships
Document manipulation
– System.IO.Packaging
– Word -> Content Controls + Custom XML
– Excel -> System.Xml + SpreadsheetML
– PowerPoint -> System.Xml + PresentationML
Server-side Demo
CONCLUSION
Open XML enables complete and open access to Office files
– System.IO.Packaging
Document Information Panels streamline input of data
– Power of InfoPath in Word
Custom XML and Content Controls allow structured editing and true data/view separation
Additional Resources Open XML File Formats – – General – – –
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.