Content Challenges for Open Government
Dale Waldt, Sr. Analyst / Consultant

Content Challenges for Open Government
  - High volume / aggregation
  - Complexity / heterogeneous formats
  - Complex content integration and delivery
  - Timing / updating / currency of information

High-Volume / Heterogeneous Content
  A federal agency in the US maintains an extremely large records archive
  - Petabytes of content, constantly updated from hundreds of sources
  - Diverse formats / document types
  - Mix of structured / unstructured content
  - HTML, PDF, Word, CSV, binaries, RDBMS, etc.
  - Not feasible / allowed to "normalize" into a consistent format for storage, indexing, searching, delivery

High-Volume / Heterogeneous Content
  [Diagram. Labels: 100 PB; DB; Search Indexing; Index Metadata; Crawler; Metadata & References; 100 GB; Query Handler]

High-Volume / Heterogeneous Content
  Challenges:
  - Difficult to index for search
  - Diverse data formats / lack of transparency
  Opportunities:
  - XQuery provides flexible access into diverse content
  - Crawlers harvest metadata / build index into content
  - Web applications access standard metadata / WS API
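
The crawler opportunity above can be sketched in a few lines. This is a minimal illustration, not the agency's system: it assumes the content sits on an ordinary file system, and the names `harvest`, `build_index`, and `query` are hypothetical. The point it shows is that only metadata and references are collected, so the content itself is never converted or "normalized".

```python
import os
from collections import defaultdict


def harvest(root):
    """Walk `root` and return one metadata record per file.

    Only file-system metadata is read; file content is never opened or converted.
    """
    records = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            extension = os.path.splitext(name)[1].lstrip(".").lower()
            records.append({
                "ref": path,                       # reference back to the original item
                "format": extension or "unknown",  # assumed proxy for document type
                "bytes": stat.st_size,
                "modified": stat.st_mtime,
            })
    return records


def build_index(records):
    """Build a small index that maps each format to the references of that format."""
    index = defaultdict(list)
    for record in records:
        index[record["format"]].append(record["ref"])
    return index


def query(index, fmt):
    """Hypothetical query handler: return references for one format, sorted."""
    return sorted(index.get(fmt.lower(), []))
```

A web application would call `query(build_index(harvest(root)), "pdf")` and follow the returned references to the stored items. The index stays small compared with the archive because it holds metadata and references only.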

Content Integration / Delivery
  Mass.gov
  - 300,000 pages
  - 240+ contributors
  - Centralized production team of 8+
  - Specialized audience views
  - Urgent news / news feed
  - Task-specific information
  - Content by agency

Content Integration / Delivery
  - Metadata enrichment to automate creation of views
  - Easy to enforce a taxonomy
  - Feeds automated search/query processes
  - Aids dynamic assembly

Content Integration / Delivery
  Challenges:
  - Maintaining consistent look and feel, navigation
  - Manual build lists maintained for each "view" are not scalable
  - Infrequent contributors cannot master complex editing tools
  Opportunities:
  - Metadata to support dynamic assembly / search views
  - Controlled vocabulary to organize information
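
To make the contrast with manual build lists concrete, here is a small sketch of metadata-driven view assembly. Everything in it is assumed for illustration: the vocabulary terms, the field names `audience` and `agency`, the sample pages, and the functions `validate` and `assemble_view` are hypothetical and do not describe the actual Mass.gov taxonomy or tooling.

```python
# Hypothetical controlled vocabulary: each metadata field has a fixed set of terms.
VOCABULARY = {
    "audience": {"residents", "businesses", "visitors"},
    "agency": {"revenue", "health", "transportation"},
}

# Hypothetical sample pages, each tagged with metadata by its contributor.
PAGES = [
    {"title": "File your income tax", "audience": "residents", "agency": "revenue"},
    {"title": "Register a business", "audience": "businesses", "agency": "revenue"},
    {"title": "Find a clinic", "audience": "residents", "agency": "health"},
    {"title": "Untitled draft", "audience": "everyone", "agency": "health"},
]


def validate(page):
    """Return a list of vocabulary problems for one page (empty if the page is valid)."""
    problems = []
    for field_name, allowed in VOCABULARY.items():
        value = page.get(field_name)
        if value not in allowed:
            problems.append(f"{field_name}={value!r} is not in the controlled vocabulary")
    return problems


def assemble_view(pages, **criteria):
    """Select titles of valid pages whose metadata matches every criterion.

    The view is computed from metadata on demand, so no per-view list is maintained.
    """
    return [
        page["title"]
        for page in pages
        if not validate(page)
        and all(page.get(key) == value for key, value in criteria.items())
    ]
```

With these sample pages, `assemble_view(PAGES, audience="residents")` yields the two resident-facing pages and `assemble_view(PAGES, agency="revenue")` yields the two revenue pages. The draft tagged `everyone` is left out of every view until its metadata is corrected.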

Real-Time Updating
  Iowa Legislature
  - Bills / amendments / laws / statutes content
  - Tracking info / dates
  - Real-time updates
  - Links to related content
  - Historical information & versions

Real-Time Updating
  How a Bill Becomes a Law…

Real-Time Updating
  Challenges:
  - Automated processes needed to support volume / real-time updates
  - Aging tools need updating
  Opportunities:
  - Metadata to integrate related content
  - Workflow designed to capture / report actions / content versions
  - Query tools for accessing real-time reporting information / content
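
A workflow that captures actions and content versions, as listed above, could look roughly like the following. This is a sketch under stated assumptions, not the Iowa Legislature's system: the `Action` and `BillLog` classes, the `new_text` flag, and the bill identifier `BILL-1` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Action:
    bill: str
    action: str
    version: int   # version of the bill text in effect when the action was recorded
    at: datetime


@dataclass
class BillLog:
    """Append-only log of workflow actions (hypothetical structure)."""
    actions: list = field(default_factory=list)

    def record(self, bill, action, new_text=False, at=None):
        """Log an action; the version increases only when the bill text changes."""
        previous = self.history(bill)
        version = previous[-1].version if previous else 0
        if new_text or not previous:
            version += 1  # the first action on a bill creates version 1
        entry = Action(bill, action, version, at or datetime.now(timezone.utc))
        self.actions.append(entry)
        return entry

    def history(self, bill):
        """All recorded actions for one bill, oldest first."""
        return [a for a in self.actions if a.bill == bill]

    def status(self, bill):
        """Most recent action for one bill, or None if nothing has been recorded."""
        past = self.history(bill)
        return past[-1] if past else None


log = BillLog()
log.record("BILL-1", "introduced")
log.record("BILL-1", "amended in committee", new_text=True)
log.record("BILL-1", "passed first chamber")
```

Because each action is recorded as it happens, `log.status("BILL-1")` reports current status without a separate update step, and `log.history("BILL-1")` supplies the historical trail with the text version attached to each action.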

Lessons Learned
  - Robust data architecture enables robust information delivery
  - Legacy data / systems need updating
  - Search tools need metadata for custom views
  - Process needs automation for scalability
  - Users need simple tools that produce rich content

The Role of Standards
  - Data models for content processing & validation
  - Taxonomies for classification & reorganization
  - Interoperability of shared content repositories
  - Transformation & rendering of content
  - Processes & policies for consistency & governance
  "Standards leverage and communicate the work of others to reduce development time and increase accuracy of content."

Resources
  Gilbane publications (download for free):
  - Enabling the Promise of Open Government: Addressing Large-Scale Integration, Storage, and Access of Complex Information
  - Content Management Interoperability Services (CMIS): Addressing Contemporary Requirements for Content Integration

Questions?