Key Challenges in Information Processing James Hamilton Microsoft SQL Server 2002.03.01.

Key Challenges in Information Processing James Hamilton JamesRH@microsoft.com Microsoft SQL Server 2002.03.01

2 Unsolved Challenges
1. Availability shows only incremental progress
2. Security broken & too hard to manage
3. Weakly structured data poorly supported or exploited
4. Writing multi-tiered apps too hard
   - Data-intensive mid-tiers need more DB help
5. Scalability over perf & big iron

3 Availability: Largely unsolved problem
- 1985 Tandem study (Gray):
  - Administration: 42% downtime
  - Software: 25% downtime
  - Hardware: 18% downtime
- 1990 Tandem study (Gray):
  - Software: 62%
  - Administration: 15%
  - Most studies have admin contribution much higher
- Observations:
  - H/W downtime contribution trending to zero
  - Software & admin costs dominate & growing
  - We’re still looking at 10 to 15 year-old research

4 Availability: Cost in dollars/hour

  Operation                        Cost ($/hour)
  Brokerage operations                 6,450,000
  Credit card authorization            2,600,000
  Ebay (1 outage 22 hours)               225,000
  Amazon.com                             180,000
  Package shipping services              150,000
  Home shopping channel                  113,000
  Catalog sales center                    90,000
  Airline reservation center              89,000
  Cellular service activation             41,000
  On-line network fees                    25,000
  ATM service fees                        14,000

From Dave Patterson talk at HPTS 2001 -- Sources: InternetWeek 4/3/2000 + Fibre Channel: A Comprehensive Introduction, R. Kembel 2000, p.8. "... survey done by Contingency Planning Research."
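As a rough illustration of how a per-hour figure translates into the cost of a single incident, the sketch below multiplies an hourly cost by an outage duration. The function name `outage_cost` is hypothetical, the eBay inputs are the ones on the slide, and treating cost as linear in duration is a simplifying assumption rather than something the cited survey states.

```python
def outage_cost(cost_per_hour: float, hours: float) -> float:
    """Estimated cost of one outage, assuming cost scales linearly with duration."""
    return cost_per_hour * hours


# Figures from the slide: eBay at $225,000/hour, one 22-hour outage.
ebay_total = outage_cost(225_000, 22)
print(f"${ebay_total:,.0f}")  # prints $4,950,000
```

Under that assumption, the single eBay outage listed would come to roughly $5 million.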

5 Availability: Admin still the problem
- Administrators expensive
  - Admin costs dominate H/W & S/W costs (5x or more)
- Administrators make mistakes
  - Admin #1 or #2 cause of downtime
- Big problem yet little research focus:
  - Still few data points available:
    - Most systems houses won’t publish... need research
  - No benchmarks:
    - Benchmarks drive industry & systems research
- Goal: Server appliance model:
  - Auto-tuning, pluggable server-side resources
  - IBM SMART, Microsoft index tuning wizard, etc.
  - Dave Patterson, Aaron Brown, Armando Fox,...
  - More help needed

6 Availability: the S/W is broken
- Even server-side software is BIG:
  - Windows2000: over 50 mloc
  - DB: 1.5+ mloc
  - SAP: 37 mloc (4,200 S/W engineers)
- Tester to developer ratios above 1:1
- Quality per unit line only incrementally improving
- Current massive testing investment not solving problem
- New approach needed:
  - Assume S/W failure inevitable
  - Redundant, self-healing systems right approach
  - Tandem process-pair work good but getting fairly old... progress?

7 Security: Securing systems too hard
- “Less than 0.0025% of corp revenue invested in security” – Richard Clarke, Special security advisor to president
- Data loss, intentional data & systems corruption
  - Clearly under-reported problem
- S/W vulnerabilities rampant:
  - Buffer overruns, stack smashing, code insertion, SQL insertion, elevation of privs,...
  - Programmers being more careful doesn’t solve problem
- Most systems misconfigured:
  - Security systems too complex & hard to admin
- Research needed: Autonomous threat detection
  - Better tools to detect, correct, & prevent S/W security vulnerabilities
  - Monitor all measurable system metrics:
    - Detecting new threats & misconfigurations
  - Track execution profiles: detect changes: drive alerts, auto-config, reports to vendor, upgrade S/W,...

8 Unstructured Data: Mostly not stored in DB
- All data has some schema but not always fully known nor affordable to pre-declare:
  - Most data in unstructured stores with text search
  - DB community is losing
- Much research work on XML focused upon:
  - Mapping XML to relational schemas
    - Leverages existing relational IQ but not as flexible
  - New, non-relational (native XML) stores
    - Storing natively doesn’t leverage DB investment
  - Mostly mid-tier data integration servers
- Research potential:
  - Native stores leveraging existing infrastructure, esp. cost-based optimizers, storage engines, & utilities
  - IR work progressing but little integration into DB
  - Integrating IR work into DB W/O required schema, ability to exploit if there, ability to discover/infer if not

9 Multi-tiered apps: we’re not helping
- Many high-scale multi-tiered apps still hand crafted
  - Needed: Object access layer, data cache, queuing, query compiler & optimizer, data directed routing, security,...
  - Problem not adequately solved by industry
- Integration with server-tier DB advantages:
  - ACID relaxation driven by attributes on apps or data
    - Relaxed models with auto-cache population & mgmt
  - Query parsing for data directed routing
    - Want to parse once & accept same lang as backend
  - Exploit optimizer: model full mid-tier to back-end costs
    - Where to run joins, functions, aggs, etc.
  - Need security integration W/O fully provisioning backend
- Data-intensive mid-tiers are a DB & TP problem:
  - Solve with DB tech & integrate with backend DB
  - Componentized DB for mid-tier use one approach
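To make "data directed routing" concrete, here is a minimal sketch of a mid-tier choosing a backend from a key carried by the request. Everything in it is an assumption for illustration: the range-partition map, the backend names, and the `route` function are hypothetical, and the query-parsing step the slide mentions is left out (the key is assumed to have been extracted already).

```python
import bisect

# Hypothetical partition map: backend i owns customer ids up to and
# including UPPER_BOUNDS[i].
UPPER_BOUNDS = [1000, 2000, 3000]
BACKENDS = ["db-a", "db-b", "db-c"]


def route(customer_id: int) -> str:
    """Return the backend that owns customer_id under the range map above."""
    idx = bisect.bisect_left(UPPER_BOUNDS, customer_id)
    if idx == len(BACKENDS):
        raise KeyError(f"no backend owns customer id {customer_id}")
    return BACKENDS[idx]


print(route(1000))  # prints db-a
print(route(1001))  # prints db-b
```

A real mid-tier would derive the key by parsing the query in the same language the backend accepts, which is the "parse once" point on the slide.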

10 Scalability: perf not the problem
- Focus still on performance rather than scalability:
  - Clusters only “nearly” work
  - Must buy biggest iron & get most from it
- Research goal: Server appliances
  - Gray’s servers by the brick
    - Brick includes disk, memory, & CPU resources
  - Only admin actions required:
    - Add brick to, or defect from, cluster
  - Data redundancy (potentially) on geo-scale:
    - Adapts to access patterns & available bandwidth
- If zero-admin clusters actually worked & scaled:
  - Performance would be a secondary issue
  - The admin problem would nearly go away
  - The S/W quality problem greatly simplified
    - Heisenbugs solved via retry and redundancy
  - Would shift investment dollars from H/W & admin to S/W (where it belongs)
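The "retry and redundancy" idea can be sketched as follows: try an operation a few times on one replica and, if the transient fault persists, fail over to the next. This is only an illustration under stated assumptions; `TransientFailure`, `call_with_redundancy`, and `make_flaky` are hypothetical names, and the flaky replicas simulate Heisenbug-style faults rather than model any real cluster.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class TransientFailure(Exception):
    """Stands in for a non-deterministic (Heisenbug-style) fault."""


def call_with_redundancy(
    replicas: Sequence[Callable[[], T]], retries_per_replica: int = 2
) -> T:
    """Retry on each replica in turn; fail over when retries are exhausted."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        for _ in range(retries_per_replica):
            try:
                return replica()
            except TransientFailure as exc:
                last_error = exc  # remember the fault, then retry or fail over
    raise RuntimeError("all replicas failed") from last_error


def make_flaky(failures_before_success: int, result: str) -> Callable[[], str]:
    """Build a simulated replica that fails a fixed number of times first."""
    remaining = [failures_before_success]

    def op() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFailure("transient fault")
        return result

    return op


# The first brick fails both attempts; the second succeeds on its retry.
bricks = [make_flaky(5, "brick-a"), make_flaky(1, "brick-b")]
answer = call_with_redundancy(bricks)
print(answer)  # prints brick-b
```

The caller never sees the transient faults, which is the sense in which redundancy would simplify the S/W quality problem.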
