Database Farming for Improved Performance
  Presented by: Russell Yong
  Supervisor: Prof Wentworth
Problem at Hand
  Database solution for a large corporation
  Expensive software (Oracle Database Enterprise Edition, approx. US $40k)
  Top-end hardware
  Microsoft’s SQL Server 2000
    Not the same level of confidence
Solution
  Adapt the popular technique of backend server farming
  Apply it to databases to create a high-performance database web service
  Backend setup is invisible to the user
Hypothesis
  Technique will create a more cost-effective database farm
  Eradicate some problems associated with dealing with large databases
Our Plan Standard 3-Tier Model
Our Plan Adapted 3-Tier Model
Conceptually
  [Diagram: Clients, Web Service (on the web server), and Farm of Databases. Labels: http request, DataSet (shown twice), pool of connections, multiple threads of execution.]
DataSet Object
  In-memory cache of data
  Comparable to a mini-database
    Multiple tables
    Relationships
    Constraints
DataSet Object
  Easily serialized into and back out of XML
  Structure (tables, columns, etc.) described in an XML schema
  View and manipulate using either relational or XML methods (unified programming model)
  Compatible with other XML-speaking applications
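The XML round trip described on this slide can be illustrated with a small sketch. It is written in Python with the standard library rather than with the .NET DataSet class, and the table name `ping`, the column names, and the helper names `to_xml` and `from_xml` are all assumptions made for illustration. It only shows the idea of rows going out to XML and coming back unchanged; it does not reproduce the DataSet's actual XML format or schema handling.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumption: a table is modelled as a list of column-name -> value rows.
Table = List[Dict[str, str]]


def to_xml(name: str, rows: Table) -> str:
    """Serialize one table to an XML string, one <row> element per row."""
    root = ET.Element(name)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for column, value in row.items():
            ET.SubElement(row_el, column).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text: str) -> Table:
    """Rebuild the rows from the XML produced by to_xml."""
    root = ET.fromstring(xml_text)
    return [{cell.tag: cell.text or "" for cell in row_el} for row_el in root]


# Hypothetical sample data, not taken from the talk.
rows = [
    {"ip": "10.0.0.1", "latency_ms": "12"},
    {"ip": "10.0.0.2", "latency_ms": "40"},
]
xml_text = to_xml("ping", rows)
restored = from_xml(xml_text)
```

Because the serialized form is plain XML, any other XML-speaking application could read `xml_text` without knowing how the rows were produced.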
DataSet Object
  Disconnected model
  Sub-queries fill individual DataSets
  Collector object
    Collects and merges the individual sub-query results
    Result is returned to the client
Typed DataSet
  Has an implicit schema
  Allows for more efficient filling
  Faster access
  Created via the Form Designer, programmatically, or at run time via XSD
XSD File
Implications for Web-Applications
  Resource-sensitive approach
  “Bulk” approach to communication
  Access local cache
  Ideal for non-volatile data
Implications for Web-Applications
  Optimistic concurrency model
    Most applications?
    Improved performance (no locking)
  No persistent connection required (resources)
  Minimize required server resources
  Connections used more effectively
  Exceptions are dealt with accordingly
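The optimistic concurrency idea on this slide (no locks are held; a conflicting write is detected at update time and handled as an exception case) can be sketched as follows. This is an illustration in Python with an in-memory SQLite database, not the mechanism used in the talk. The `version` column, the table layout, and the function name `update_if_unchanged` are assumptions; a version counter is one common way to detect a stale write, and the slides do not say which check was used.

```python
import sqlite3


def update_if_unchanged(conn, row_id, expected_version, new_value):
    """Apply the update only if the row still has the version the client read."""
    cur = conn.execute(
        "UPDATE ping SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, row_id, expected_version),
    )
    conn.commit()
    # Zero rows touched means another writer changed the row first.
    return cur.rowcount == 1


# Assumption: a toy table with a version counter, created only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ping (id INTEGER PRIMARY KEY, value TEXT, version INTEGER)")
conn.execute("INSERT INTO ping VALUES (1, 'old', 0)")

# Both clients read the row at version 0 while disconnected.
first = update_if_unchanged(conn, 1, 0, "from client A")   # version still matches
second = update_if_unchanged(conn, 1, 0, "from client B")  # stale version, rejected
stored = conn.execute("SELECT value FROM ping WHERE id = 1").fetchone()[0]
```

No connection or lock is held between the read and the write; the second client simply learns that its copy was stale and can re-read and retry.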
Our Database
  In excess of 10 million records
  Network traffic information
  Partitioned into 10 segments
  Initial difficulty
  Distributed over 3 machines (SQL Server 2000)
  Simulating a completely distributed environment
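The slide gives the counts (10 segments, 3 machines) but not how segments were assigned to machines. As a sketch under that gap, the following Python snippet shows one simple possibility, round-robin placement. The machine names and the function name `assign_partitions` are hypothetical, and the actual placement used in the project may have been different.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, machines: List[str]) -> Dict[str, List[int]]:
    """Spread partition numbers over the machines in round-robin order."""
    placement: Dict[str, List[int]] = {machine: [] for machine in machines}
    for partition in range(num_partitions):
        machine = machines[partition % len(machines)]
        placement[machine].append(partition)
    return placement


# Assumption: hypothetical machine names; the counts come from the slide.
placement = assign_partitions(10, ["db1", "db2", "db3"])
```

With 10 segments and 3 machines the split cannot be even, so one machine carries four segments and the other two carry three each.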
Data Providers
  SQL Server .NET Data Provider
    SqlConnection
    SqlDataAdapter
    SqlCommand
  OLE DB .NET Data Provider
  ODBC .NET Data Provider (separate download)
Data Providers
Our Framework
MyQueryHandler
  Farming layer
  One instance for each individual user query
  Distributor (spawns threads)
  Collector, merging DataSets as they return
  All-encompassing DataSet
  Pluggable
MyThreadHandler
  Represents individual threads
  Fills a separate DataSet for each of the partitions in the farm
  Returns its DataSet to MyQueryHandler
  Pluggable
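The division of labour between the two handlers (one distributes sub-queries to threads, each thread fills a result for its partition, and the results are merged) can be sketched in Python. This is not the original .NET code: the in-memory lists standing in for database partitions, the row shape, and the function names `query_partition` and `run_query` are assumptions. The sketch also merges results in submission order for a predictable outcome, whereas the slide describes merging DataSets as they return.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Set, Tuple

Row = Tuple[int, str]  # Assumption: (id, ip) as a hypothetical row shape

# Assumption: three tiny in-memory partitions stand in for the database farm.
PARTITIONS: List[List[Row]] = [
    [(1, "10.0.0.1"), (2, "10.0.0.2")],
    [(3, "10.0.0.3"), (4, "10.0.0.1")],
    [(5, "10.0.0.2"), (6, "10.0.0.4")],
]


def query_partition(partition: List[Row], wanted_ips: Set[str]) -> List[Row]:
    """Thread-handler role: run the sub-query against a single partition."""
    return [row for row in partition if row[1] in wanted_ips]


def run_query(wanted_ips: Set[str]) -> List[Row]:
    """Query-handler role: distribute sub-queries, then collect and merge."""
    merged: List[Row] = []
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        futures = [pool.submit(query_partition, part, wanted_ips) for part in PARTITIONS]
        for future in futures:  # submission order, so the merged result is deterministic
            merged.extend(future.result())
    return merged


result = run_query({"10.0.0.1", "10.0.0.2"})
```

Swapping `query_partition` for a function that talks to a real database is the only change the distributor would need, which is one reading of "pluggable" on these slides.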
Specifying Queries
  A couple of queries are hard-coded
  Defined according to a parameter
  Future extensions…
Testing and Results
  Ran queries 100 times
    Gauge the mean
    Filter out any possible influencing factors
  Influencing factors
    Network traffic
    Active machines
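The measurement procedure on this slide (repeat a query 100 times and take the mean so that one-off disturbances average out) might look like the following Python sketch. The function name `time_runs` and the stand-in workload are assumptions; a real run would call the farm or the single database instead of summing numbers.

```python
import statistics
import time
from typing import Callable, List


def time_runs(run_query: Callable[[], object], repetitions: int = 100) -> List[float]:
    """Return the wall-clock duration in seconds of each repetition."""
    durations: List[float] = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


# Assumption: a trivial stand-in workload, used only so the sketch is runnable.
durations = time_runs(lambda: sum(range(1000)), repetitions=100)
mean_seconds = statistics.mean(durations)
```

Keeping the individual durations, rather than only the mean, also makes it possible to spot runs disturbed by network traffic or other active machines.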
Testing and Results
  Simple query
    “SELECT * FROM ping WHERE (ip = ) OR (ip = ) OR (ip = ' ') OR (ip = ' ')”
  Returning rows
  Farming method: averaged 35 seconds
  Normal method: averaged 94 seconds
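The two averages on this slide can be turned into a speedup figure with simple arithmetic. The short Python snippet below uses only the two reported means; the variable names are chosen for illustration.

```python
farming_seconds = 35.0  # mean reported on the slide for the farming method
normal_seconds = 94.0   # mean reported on the slide for the normal method

speedup = normal_seconds / farming_seconds          # how many times faster
time_saved = 1 - farming_seconds / normal_seconds   # fraction of run time removed

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.0%}")
# prints "speedup: 2.69x, time saved: 63%"
```

For this query the farm is therefore roughly 2.7 times faster, which is a little less than the factor of 3 that a perfectly even split over three machines would give.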
Testing and Results “SELECT * FROM ping WHERE (ip = ) OR (ip = ) OR (ip = ' ') OR (ip = ' ')”
Hypothesis
  Technique will create a more cost-effective database farm
  Eradicate some problems associated with dealing with large databases
Possible Extensions
  Full access to DB via HTTPS
  Front-end
    Query construction wizard
  Investigate partitioning techniques
  “Intelligent” querying
Questions?