Presentation is loading. Please wait.

Presentation is loading. Please wait.

Michael Rys Principal Lead Program Manager Microsoft Corp. Session Code: DAT321.

Similar presentations


Presentation on theme: "Michael Rys Principal Lead Program Manager Microsoft Corp. Session Code: DAT321."— Presentation transcript:

1

2 Michael Rys Principal Lead Program Manager Microsoft Corp. Session Code: DAT321

3 Beyond Relational Data Dealing with relational and non-relational data platforms Growth in application complexity and duplicated functionality Compensating for unavailable services Pain Points Goals Reduce the cost of managing all types of data Simplify the development of applications which use relational and non-relational data Extend services currently available for relational data to non-relational data Provide non-relational services to relational data

4 Customer Quote “The non-relational features available in SQL Server 2008 allowed us to manage our large datasets composed of millions of images and associated metadata as single entities and eliminated our need to maintain CLR functions for geospatial calculations. Switching to SQL Server 2008 reduced our operational costs, increased our team productivity, and allowed us to focus on our core business of collecting and processing geospatial data.” – Telmo Sampaio, CTO – Blue Dasher Technologies

5 Session Objectives and Takeaways SQL Server 2008 has added improved support for a variety of non-relational data management scenarios You will understand what SQL Server 2008 will offer in addition to SQL Server 2005’s functionality to Store and manage unstructured data Store and manage semi-structured data You will better understand when to use which of the presented technologies

6 Beyond Relational Feature Overview SQL Server 2005 SQL Server 2008 Large UDTs Sparse Columns Wide Tables/Column Set Filtered Indices HierarchyID Relational BR Support User Defined Types Full Text Indexing Documents & Multimedia Remote BLOB Store API Filestream Integrated FTS Spatial Fully supported Geometry and Geography data types and Functions XML Data Type and Functions XML Upgrades XML

7 Street-level, location-based imagery Telmo Sampaio Chief Technology Officer Blue Dasher Technologies

8 Blue Dasher Application Overview Capture and provide street-level imagery for all public roads in the 50 top metropolitan areas in the U.S. Manage data with SQL Server 2008: Metadata: 70GB of data Image data: 50TB of data Metadata: Uses spatial GEOGRAPHY Type to pinpoint the location of the images, Spatial Index defined Image data: 3 sizes small (7 Kb) in file system medium (60 Kb) and large (1.3 Mb) stored using filestream Medium used for navigation

9 Blue Dasher-SQL Server benefits Geography data type and functions No need to manage.NET assemblies - use of STIntersects(), STDistance(), STBuffer() Ability to save different shapes in single GEOGRAPHY column Manageability Single backup/restore operations for metadata and images Replication support Performance Streaming files to client application faster using Win32 API

10 Documents and Multimedia Low cost per GB Streaming Performance Complex application development and deployment Integration with structured data Advantages Challenges Integrated management Data-level consistency Poor data streaming support File size limitations Highest cost per GB Lower cost/GB at scale Scalability & Expandability Complex application development and deployment Separate management Enterprise-scales only Example Windows File Servers NetApp NetFiler EMC Centera Fujitsu Nearline SQL Server VARBINARY(MAX) Documents & Multimedia Use File Servers DB ApplicationApplication BLOBs Dedicated BLOB Store DB ApplicationApplication BLOBs Store BLOBs in Database DB ApplicationApplication BLOBs

11 SQL Server 2008 BLOBs Remote BLOB Storage FILESTREAM Storage SQL BLOB Documents & Multimedia Use File Servers DB ApplicationApplication BLOB Dedicated BLOB Store DB ApplicationApplication BLOB Store BLOBs in Database DB ApplicationApplication BLOB Store BLOBs in DB + File System ApplicationApplication BLOB DB

12 Filestream Storage Attribute on VARBINARY(MAX) Works with integrated FTS Unstructured data stored directly in the file system (requires NTFS) Dual Programming Model TSQL (Same as SQL BLOB) Win32 Streaming APIs with T-SQL transactional semantics Data Consistency Integrated Manageability Back Up/Restore Administration Size limit is the file system volume size SQL Server Security Stack Documents & Multimedia Store BLOBs in DB + File System ApplicationApplication BLOB DB

13 FILESTREAM API // New TSQL Function: // Get_filestream_transaction_context() // SELECT Get_filestream_transaction_context() // New TSQL Function : // PathName() // SELECT ClaimImage.PathName() FROM Insurancedb..Claims

14 Managed SqlFileStream: READ // New SqlFileStream Class in VS08 SP1 // SqlFileStream sfs = new SqlFileStream(path, txnId, System.IO.FileAccess.Read); // output file to read into // System.IO.FileStream fs = new System.IO.FileStream ( "c:\\output2.jpg", System.IO.FileMode.Create); { byte[] buffer = new byte[512 * 1024]; int cbBytesRead = buffer.Length; while (cbBytesRead == buffer.Length) { cbBytesRead = sfs.Read(buffer, 0, buffer.Length); fs.Write(buffer, 0, cbBytesRead); }

15 Managed SqlFileStream: WRITE sfs = new SqlFileStream(path, txnId, System.IO.FileAccess.Write, 0); using (System.IO.Stream res = Pictures.GetResourceStream(HealthCare.MRI.JoeSmith)) //input { byte[] buffer = new byte[512 * 1024]; // 512KB int cbBytesRead = buffer.Length; while (cbBytesRead == buffer.Length) { cbBytesRead = res.Read(buffer, 0, buffer.Length); sfs.Write(buffer, 0, cbBytesRead); } // commit SQL transaction and close SQL connection. //

16 FileStream

17 Write Performance (Remote)

18 Read Performance (Remote)

19 Updates Using FILESTREAM File I/O Access “In place partial updates” not supported Update through Win32 handle (Write access) Done by creating a new file (0-bytes) New content replaces old (committed) content on Transaction Commit (example: high resolution pictures) For the new file to contain the committed BLOB content first use FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT Useful for LOB append

20 Some Tips… Reading bigger buffers gives better performance Volumes hosting FILESTREAM data should have 8.3 name generation and LastAccessTime disabled FILESTREAM containers to reside on dedicated volumes Have one volume per FILESTREAM container enables space management at volume level “Magic” SMB buffer size = ~60KB Another “good” value is 480KB ROWGUID unique index for aligned partitioning AntiVirus programs should be configured not to delete infected files but to quarantine them If using compressed volumes, use cluster size 4 KB

21 Migration SQL BLOBs to FILESTREAM Conceptual walkthrough Create one (or more) FILESTREAM (filegroup) Alter table to add FILESTREAM column For each existing row, update data in new row with empty FILESTREAM value Write LOB data (File I/O or T-SQL access) Optional: drop Varbinary(MAX) column BLOB migration utilities support (intended for SQL Server 2008 R2)

22 Customer Application Remote Blob Store Architecture Customer applications can transparently support different BLOB stores Each Remote Blob Store vendor responsible for delivering their own providers NetApp lib IBM lib Centera lib SQL RBS API NetApp IBM Centera SQL DB Provider API RBS Services Create Fetch GC Delete Documents & Multimedia

23 RBS Workflow ApplicationApplication RBS Client Library BLOB Store Provider Library BLOB Store SQL Server ClaimIDClaimDatePhotoRef43906/5/2007<Binary(20)> 1 2 3 1 Write BLOB(Photo) Return Blob ID Write Blob ID to PhotoRef field 2 3 Machine Boundary

24 RBS Fundamentals Most useful in environments where interoperability is required No restrictions on back-end store Back-end can change with no app change Looser (link level) consistency guarantees SQL Server handles link consistency and garbage collection

25 Unstructured Storage In SQL08 File Stores / External Blob Stores (CAS) SQL BLOBsRemote Blob API FILESTREAM Streaming Performance Depends on external store Link Level Consistency Data Level Consistency Integrated Management Non-local Windows File Servers n/a External Blob Stores n/a

26 Full Text Search Challenges Indexes stored outside SQL Server lead to manageability challenges Mixed query performance suffers from having to pull over complete full-text result set Scaling issues on big boxes Documents & Multimedia See also DAT303 – Answering the Queries your users really want to ask, with Full-Text Search in SQL Server 2008

27 Full Text Search Improvements Full-Text Engine and Indexes fully integrated Catalog, index and stopword lists now inside the database Better performance in many common scenarios Make mixed queries perform and scale Optimizer has knowledge about FT index X= selectivity (%) of rows that match the relational condition IFTS FTQuery time : (SQL 2005 FTQuery time) x (X/100) Exposing the FT Index content and any given WB behaviours New word breakers (WB) Better supportability tools From 23 to over 40 word breakers/locales Documents & Multimedia SELECT * FROM candidates WHERE CONTAINS(resume,’”SQL Server”’) AND ZipCode = ‘98052’

28 Doc_table iddoc_typedocument T-SQL Parser Algebrizer SQL Algeb. FTS Algeb. Algebrizer SQL Algeb. FTS Algeb. Bind Language Module FTS Algeb. Parse Ranking Func. Integration SQL SERVER processFDHOST process Shared Memory WB Stemmer iFilters STOPLIST THESAURUS WB client QO QE SQL/FTS integrated query tree FTLogicalOperator Cardinality FTExecutionOperator QUERY Execution Plan Full-Text Index Results IFTS Query Architecture CREATE FULLTEXT INDEX ft_idx ON Doc_table(document TYPE COLUMN doc_type) KEY INDEX doc_id_idx; Documents & Multimedia

29 Indexing Performance Populating an index of 20 million rows of 1K data on identical hardware (time in minutes) Documents & Multimedia 2 min1 min

30 Spatial Data Overview Storage and retrieval of spatial data using standard SQL syntax New Spatial Data Types (geometry, geography) New Spatial Methods (intersects, buffer, etc.) New Spatial Indexes Offers full set of Open Geospatial Consortium components (OGC/SQL MM, ISO 19125) Spatial Builder Interface SSMS Visualization Integration with Virtual Earth Spatial See also: DAT324 WiE: Building Location-Aware Services with Microsoft SQL Server DAT03-HOL Integrating Microsoft SQL Server 2008 Spatial Support with Microsoft Virtual Earth

31 XML Improvements Improved XML Schema Validation Support for storing and validating Office 12 Document formats Support for lax validation Full xs:dateTime support Support for values without timezone timezone preservation Improved support for lists and union types Added support for let-clause in XQuery Added fn:upper-case()/fn:lower-case() Added support for insert sql:variable(“@xml”) into /a/b XML

32 XML Improvements

33 Semi-Structured Data Scenarios Property bag scenarios Distinct customized property sets associated with data Large number of unique properties, user annotations Examples Document management systems (SharePoint) Media stores Databases with heterogeneous record types in a table Type specific properties, inherited properties in a type hierarchy Examples Product catalogs (Commerce Server), Location/business specific properties(VE), etc… Relational BR Support

34 Semi-Structured Data Characteristics Large number of sparsely populated properties Distinct property sets Heterogeneous structures Sets, nested structures Relational BR Support

35 Semi-Structured Data Requirements Efficient storage for sparse properties Efficient relational access to sparse properties: Query, DML, Indexing Ability to get/set sparse property sets generically Index subsets of relevant properties for a property set Retrieve and analyze hierarchical data Relational BR Support

36 Semi-Structured Data SQL Server 2008 features Sparse Columns: Optimized storage for sparse columns Column Sets/Wide Tables: Support thousands of sparse columns Filtered Indexes: Ability to index a subset of rows in a table Hierarchy ID: System CLR type for hierarchical organization of data XML: For fast dynamic evolution (open schema), lists and tree objects Relational BR Support

37 Hierarchical Data Scenarios Forum and mailing list threads Business organization charts Content management categories Product categories Files/folders management SQL Server 2005 Parent/Child column XML datatype New in SQL Server 2008 HierarchyID type Relational BR Support

38 HierarchyID Key properties A system data type with variable length CLR UDT Microsoft.SqlServer.Types Varbinary encoding ( < 900 bytes) To represent position in a hierarchy Logically encodes the path from the root of the tree to a node Rich built-in methods for manipulating hierarchies Simplifies storage and querying of hierarchical data Comparison a<b is in depth-first order Support for arbitrary insertions and deletions / /1/ /1/1/ /1/1/1/ /1/1/2/ /1/2/ /2//3/ /3/1//3/2/ Relational BR Support See also: DAT04-INT Using the HIERARCHYID Datatype in Microsoft SQL Server 2008 to Maintain and Query Hierarchies

39 Sparse Columns “Sparse” as a storage attribute on a column Storage Optimization: 0 bytes stored for a NULL value Co-location of data: Performance benefits NULL Compression in the TDS layer No change in Query/DML behavior Relational BR Support

40 Sparse Columns Wide Tables/Column Sets Large number of sparse columns allowed in a table (30,000 Columns, 1000 indexes, 30,000 statistics) Requires defining a “Sparse Column Set” An un-typed XML column, with a published format Logical grouping for all sparse columns in a table Select * returns all non-sparse-columns, sparse column set (XML) Allows generic retrieval/update of all sparse columns as a set Relational BR Support

41 Sparse Columns and Filtered Indexes

42 Sparse Column Storage 0 Bytes stored for NULL Values ~20% CPU overhead for non-null value access Additional 2- 4 bytes for non-null values Sparse columns are beneficial when space savings >40% PKPK C1C1 1A(sc1,sc9)(1,9) 2B(sc2,sc4)(2,4) 3C(sc6,sc7)(6.7) 4D(sc1,sc5)(1,5) 5E(sc4,sc8)(4,8) 6F(sc3,sc9)(3,9) 7G(sc5,sc7)(5,7) 8H(sc2,sc8)(2,8) 9I(sc3,sc6)(3,6) sc 1 sc 2 sc 3 sc 4 sc 5 sc 6 sc 7 sc 8 sc 9 PKc1sc 1 sc 2 sc 3 sc 4 sc 5 sc 6 sc 7 sc 8 sc 9 1A19 2B24 3C67 4D15 5E48 6F39 7G57 8H28 9I36 Non-sparse storage Sparse storage Relational BR Support

43 Filtered Indexes Filtered Indexes and Statistics Indexing a portion of the data in a table Filtered/co-related statistics creation and usage Query/DML Optimization to use filtered indexes and statistics Restrictions Simple limited grammar for the predicate Only on non-clustered indexes Benefits Lower storage and maintenance costs for large number of indexes Query/DML performance benefits: IO only for qualifying rows Relational BR Support

44 Summary SQL Server 2008 will make it easier to create information-centric applications that require Unstructured documents XML Semi-structured information Combine the above with relational data by: Reducing the cost of managing all types of data Simplifying the development of applications which use relational and non-relational data Extending services currently available for relational data to non-relational data

45 Future of Beyond Relational Rich unstructured data Enable existing BR apps to store data in SQL Server Win 32 File I/O API compatibility Better integration of FileStream and RBS programming models Better scalability of FileStream Property Search and promotion iFTS improvements in functionality and scale/performance Deep Spatial More functionality Across BI components

46 Resources WebLog http://sqlblog.com/blogs/michael_rys/ Whitepapers & Videos FileStream/RBS Whitepapers: http://msdn.microsoft.com/en-us/library/cc949109.aspx http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-manage- unstructured.aspx What’s new for XML in SQL Server 2008: http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-whats-new- xml.aspx http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-whats-new- xml.aspx iFTS: http://msdn.microsoft.com/en-us/library/cc721269.aspxhttp://msdn.microsoft.com/en-us/library/cc721269.aspx Semistructured data: To be published soon (see blog for announcement) PDC Presentation on Beyond Relational: http://channel9.msdn.com/pdc2008/BB16/ http://channel9.msdn.com/pdc2008/BB16/ And of course Books Online! SQL Server 2008 Business Value Calculator: http://www.moresqlserver.com

47 SQL Server Community Resources Become a FREE PASS Member: www.sqlpass.org/RegisterforSQLPASS.aspxwww.sqlpass.org/RegisterforSQLPASS.aspx Learn more about the PASS organization www.sqlpass.org/www.sqlpass.org/ Additional Community Resources SQL Server Community Center www.microsoft.com/sqlserver/2008/en/us/community-center.aspxwww.microsoft.com/sqlserver/2008/en/us/community-center.aspx TechNet Community for IT Professionals http://technet.microsoft.com/en-us/sqlserver/bb671048.aspx Developer Center http://msdn.microsoft.com/en-us/sqlserver/bb671064.aspx SQL Server 2008 Learning Portal http://www.microsoft.com/learning/sql/2008/default.mspx http://www.microsoft.com/learning/sql/2008/default.mspx Connect: Local Chapters, Special Interest Groups, Online Community Share: PASSPort Social Networking, Community Connection Event Learn: PASS Summit Annual Conference, Technical Articles, Webcasts More about the PASS organization www.sqlpass.org/www.sqlpass.org/ Connect: Local Chapters, Special Interest Groups, Online Community Share: PASSPort Social Networking, Community Connection Event Learn: PASS Summit Annual Conference, Technical Articles, Webcasts More about the PASS organization www.sqlpass.org/www.sqlpass.org/ The Professional Association for SQL Server (PASS) is an independent, not-for-profit association, dedicated to supporting, educating, and promoting the Microsoft SQL Server community.

48 SQL Server Word of the Day POLICY-BASED MANAGEMENT Monday, May 11 *Game cards may be picked up at the SQL Server booths in the TLC

49 mrys@microsoft.com

50 Related Content DAT324 WiE: Building Location-Aware Services with Microsoft SQL Server, Ed Katibah, Olivier Meyer Fri 5/15 | 1:00 PM-2:15 PM | Room 402 DAT320 Optimizing Microsoft SQL Server 2008 Applications Using Table Valued Parameters, XML, and MERGE, Tobias Ternstrom Fri 5/15 | 9:00 AM-10:15 AM | Room 153 DAT303 Answering the Queries Your Users Really Want to Ask, with iFTS in Microsoft SQL Server 2008, Greg Low Tue 5/12 | 4:30 PM-5:45 PM | Room 404 DAT403 What's New in Microsoft SQL Server 2008, Stephen Forte Mon 5/11 | 1:00 PM-2:15 PM | Room 151

51 Related Content DAT04-INT Using the HIERARCHYID Datatype in Microsoft SQL Server 2008 to Maintain and Query Hierarchies, Itzik Ben-Gan Mon 5/11 | 1:00 PM-2:15 PM | Blue Thr 1 PAN56 SQL Server Server Programmability Q & A Tue 5/12 | 10:00AM | 501C DAT03-HOL Integrating Microsoft SQL Server 2008 Spatial Support with Microsoft Virtual Earth

52 www.microsoft.com/teched Sessions On-Demand & Community http://microsoft.com/technet Resources for IT Professionals http://microsoft.com/msdn Resources for Developers www.microsoft.com/learning Microsoft Certification & Training Resources Resources

53 Complete an evaluation on CommNet and enter to win!

54 © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


Download ppt "Michael Rys Principal Lead Program Manager Microsoft Corp. Session Code: DAT321."

Similar presentations


Ads by Google