Jim Gray Researcher Microsoft Research

Presentation transcript:

Microsoft Research
CERN-Pasadena at 1 GBps (8 Gbps)
World Wide Telescope
Jim Gray, Researcher, Microsoft Research

Microsoft Research
Founded in 1991 to pursue technologies that are of strategic importance to Microsoft's future.
Organizational goal: advance the state of the art
Staff of more than 700 in over 55 areas
Labs in US, Europe, Asia
Internationally recognized research teams
University organizational model
  Flat structure, critical mass groups
Open research environment
  Aggressive publication in peer-reviewed literature
  Frequent visitors, daily seminars
Close ties to universities
  Nearly 15% of basic research budget directly invested in universities
  Lab grants, research grants, fellowships, etc.
  Hundreds of interns and visitors
Close working relations with development: we hire the best and the brightest, and we're all deeply dedicated to working closely with Microsoft product groups.

My Research Goal
Information at your fingertips
Bring all scientific literature and data online
Focus on large database issues and scalable servers.
[Diagram labels: Experiments & Instruments, Simulations, Literature, Other Archives; facts, questions, answers]
Data ingest
Managing a petabyte
Common schema
How to organize it? How to reorganize it? How to coexist with others?
Query and Vis tools
Support/training
Performance
  Execute queries in a minute
  Batch query scheduling

Challenge: Move Data from CERN to Remote Centers @ 1 GBps
Disk-to-disk gigabyte/second data rates
80 TB/day
30 petabytes by 2008
1 exabyte by 2014
[Diagram labels: Experiment, Filter, CERN, Tier 1 (INP3, RAL, INFN, FNAL, …), Tier 2, Tier 3, Institute, Tier 4, Physics data cache, Workstations; rates ~PBps, ~5 GBps, ~1 GBps, .1 GBps; OC192 = 9.9 Gbps]
Graphics courtesy of Harvey Newman @ Caltech
Harvey: We developed a four-tiered architecture to support these collaborations. Data processed at CERN is distributed among the Tier 1 national centers for further analysis. The Tier 1 sites act as a distributed data warehouse for the lower tiers, and refine the data by applying the physicists' latest algorithms and calibrations. The lower tiers are where most of the analysis gets done. They are also a massive source of simulated events.
Jim: As a database guy, I am scared of managing 20 petabytes; that's hundreds of thousands of disks.
Harvey: Yes, and as the LHC intensity increases, the accumulation rate will increase, and we expect to reach an exabyte stored by 2013-15. All the flows in this picture are designated in gigabytes per second, so it's clear why we need a reliable gigabyte-per-second network. And our bandwidth demand is accelerating, along with many other fields of science, growing by a factor of two each year or 1000-fold per decade, much faster than Moore's Law. We have to innovate each year, learning to use the latest technologies effectively just to keep up.
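The two figures on this slide can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, and decimal units (1 TB = 1000 GB) are an assumption the slide does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def terabytes_per_day(gigabytes_per_second: float) -> float:
    """Daily volume of a sustained transfer, assuming 1 TB = 1000 GB."""
    return gigabytes_per_second * SECONDS_PER_DAY / 1000


def growth_factor(years: int, annual_factor: float = 2.0) -> float:
    """Compound growth after `years` at a fixed yearly multiplier."""
    return annual_factor ** years


# A sustained 1 GBps link moves a little over 80 TB in a day.
daily_tb = terabytes_per_day(1.0)

# Doubling every year compounds to roughly 1000-fold in a decade.
decade_growth = growth_factor(10)

print(f"{daily_tb:.1f} TB/day, {decade_growth:.0f}x per decade")
```

A sustained 1 GBps works out to a bit over the 80 TB/day quoted, and ten annual doublings give 2^10 = 1024, which is the "1000-fold per decade" in the notes.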

Current Status: CERN → Pasadena
Multi Stream tcp/ip: 7.1 Gbps, ~900 MBps
New speed record @ http://ultralight.caltech.edu/lsr-winhec/
Single Stream tcp/ip: 6.5 Gbps, ~800 MBps
File Transfer Speed: ~450 MBps
[Chart: speed in Mbps (axis 1,000 to 7,000) by year, 2000 to 2005]
Jim: Internet2 is a consortium of universities building the Next Generation Internet. They defined a speed contest to recognize work on high-speed networking. Microsoft entered and won the first round of this contest, but we have not entered since. Indeed, Harvey's team at Caltech has been setting the records over the last two years.
Harvey: To set the records, including the one just certified by Internet2, we used the network shown on the previous slide and out-of-the-box tcp/ip to move data from CERN to Pasadena. Using Windows on Itanium2 servers with Intel and S2io 10 Gigabit Ethernet cards, and Cisco switches, we reached 6.25 gigabits per second, or just under 800 megabytes per second.
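The slide mixes gigabits per second (Gbps) and megabytes per second (MBps). A small conversion helper, sketched here with a hypothetical function name and assuming decimal prefixes and 8 bits per byte, shows how the rounded MBps figures relate to the Gbps ones.

```python
def gbps_to_megabytes_per_second(gigabits_per_second: float) -> float:
    """Convert Gbps to MBps, assuming decimal prefixes and 8 bits per byte."""
    return gigabits_per_second * 1000 / 8


# Rates quoted on the slide and in the speaker notes.
for label, gbps in [("multi stream", 7.1), ("single stream", 6.5), ("notes", 6.25)]:
    print(f"{label}: {gbps} Gbps is about {gbps_to_megabytes_per_second(gbps):.0f} MBps")
```

The 6.25 Gbps in the notes comes out just under 800 MBps, matching Harvey's wording, and the title slide's 8 Gbps is exactly 1 GBps.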

World Wide Telescope
Premise: Most Astronomy data is online
The Internet is the world's best telescope
  It has data on every part of the sky
  In every measured spectral band: as deep as the best instruments
  It is up when you are up.
  The "seeing" is always great (no working at night, no clouds no moons no..).
  It's a smart telescope: links objects and data with literature.

SkyServer.SDSS.org
Built with Johns Hopkins U.
A modern archive
  Raw data in file servers
  Catalog data (derived objects) in Database
  10 billion records, 2 TB
Online query to any and all
Also used for education
  150 hours of online Astronomy
  Implicitly teaches data analysis
Interesting things
  Based on Web Services
  Spatial data search
  Cloned by other surveys (a design template)

Service Oriented Architecture: Data Federations of Web Services
Massive datasets live near their owners:
  Near instrument software pipeline, apps
  Near data knowledge and curation
Each Archive publishes a web service
  Schema: documents the data
  Methods on objects (queries)
Uniform access to multiple Archives
  A common global schema
Scientists get "personalized" extracts
[Diagram: several archive databases (DB) federated behind web services]

Federation: SkyQuery.Net
Combines 15 archives
Send query to portal; portal joins data from archives.
Problem: want to do multi-step data analysis (not just single query).
Solution: allow personal databases on portal
Problem: some queries are monsters
Solution: "batch scheduler" on portal server deposits answer in personal database.
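The two solutions on this slide, personal databases and a batch scheduler, can be sketched together. This is a toy illustration, not SkyQuery's implementation: the `Portal` class, its method names, the two-column table layout, and the use of in-memory SQLite are all assumptions made for the example.

```python
import sqlite3
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Row = Tuple[int, float]  # hypothetical layout: (object id, value)


class Portal:
    """Toy portal: queues long queries and stores answers per user."""

    def __init__(self) -> None:
        self.personal_dbs: Dict[str, sqlite3.Connection] = {}
        self.batch_queue: Deque[Tuple[str, str, Callable[[], List[Row]]]] = deque()

    def personal_db(self, user: str) -> sqlite3.Connection:
        # One private in-memory database per user, created on first use.
        if user not in self.personal_dbs:
            self.personal_dbs[user] = sqlite3.connect(":memory:")
        return self.personal_dbs[user]

    def submit(self, user: str, table: str, query: Callable[[], List[Row]]) -> None:
        # A "monster" query is queued instead of being run interactively.
        self.batch_queue.append((user, table, query))

    def run_batch(self) -> None:
        # Run queued queries in order and deposit each answer in the
        # submitting user's personal database. Table names are trusted
        # here; a real portal would have to validate them.
        while self.batch_queue:
            user, table, query = self.batch_queue.popleft()
            db = self.personal_db(user)
            db.execute(f"CREATE TABLE {table} (obj_id INTEGER, value REAL)")
            db.executemany(f"INSERT INTO {table} VALUES (?, ?)", query())
            db.commit()


portal = Portal()
portal.submit("alice", "bright_objects", lambda: [(1, 14.2), (2, 15.9), (3, 13.1)])
portal.run_batch()

# Second analysis step: query the stored answer instead of the archives.
step_two = portal.personal_db("alice").execute(
    "SELECT obj_id FROM bright_objects WHERE value < 15 ORDER BY obj_id"
).fetchall()
print(step_two)
```

Keeping the answer in a personal database is what makes multi-step analysis possible: the second step runs against the stored extract rather than going back to the archives.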

SkyQuery Structure
Each SkyNode publishes
  Schema Web Service
  Data Query Web Service
Portal
  Plans Query (2 phase)
  Integrates answers
  Is itself a web service
[Diagram: SkyQuery Portal connected to 2MASS, INT, SDSS, FIRST, and Image Cutout]
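The slide does not say what the two planning phases are. One plausible reading, assumed here, is a count phase (ask every SkyNode how many objects qualify) followed by an execution phase that starts at the smallest node and carries the partial answer to the next. In this sketch the `SkyNode` class, its methods, and the match on a shared object id are hypothetical stand-ins; a real cross-match would compare sky positions.

```python
from typing import Dict, List


class SkyNode:
    """Toy archive: object id -> magnitude."""

    def __init__(self, name: str, objects: Dict[int, float]) -> None:
        self.name = name
        self.objects = objects

    def count(self, max_mag: float) -> int:
        # Phase 1 call: a cheap count of qualifying objects.
        return sum(1 for mag in self.objects.values() if mag <= max_mag)

    def select(self, max_mag: float) -> Dict[int, float]:
        # Phase 2 call: return the qualifying objects themselves.
        return {oid: mag for oid, mag in self.objects.items() if mag <= max_mag}


def plan(nodes: List[SkyNode], max_mag: float) -> List[SkyNode]:
    """Phase 1: order nodes by how few objects qualify."""
    return sorted(nodes, key=lambda node: node.count(max_mag))


def execute(ordered: List[SkyNode], max_mag: float) -> Dict[int, Dict[str, float]]:
    """Phase 2: chain through the nodes, keeping objects found in all of them."""
    matches: Dict[int, Dict[str, float]] = {}
    for position, node in enumerate(ordered):
        rows = node.select(max_mag)
        if position == 0:
            matches = {oid: {node.name: mag} for oid, mag in rows.items()}
        else:
            matches = {
                oid: {**found, node.name: rows[oid]}
                for oid, found in matches.items()
                if oid in rows
            }
    return matches


sdss = SkyNode("SDSS", {1: 17.0, 2: 18.5, 3: 21.0, 4: 16.0})
twomass = SkyNode("2MASS", {1: 15.0, 4: 15.5, 9: 14.0})
first = SkyNode("FIRST", {4: 12.0})

ordered = plan([sdss, twomass, first], max_mag=19.0)
answer = execute(ordered, max_mag=19.0)
print([node.name for node in ordered], answer)
```

Starting with the node that has the fewest candidates keeps the intermediate result small, so less data moves between archives.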

Summary
Microsoft Research is active inside and outside Microsoft.
10 Gbps networking is coming, x64 is coming, and we are investing to make them real.
World Wide Telescope is coming
  Exemplifies service oriented architecture
  Built with web services and databases
  Has interesting spatial database algorithms
Details on my website: http://research.microsoft.com/~Gray