Presentation is loading. Please wait.

Presentation is loading. Please wait.

Jim Gray Researcher Microsoft Research

Similar presentations


Presentation on theme: "Jim Gray Researcher Microsoft Research"— Presentation transcript:

1 Jim Gray Researcher Microsoft Research
Microsoft Research CERN-Pasadena at 1 GBps (8 Gbps) World Wide Telescope Jim Gray Researcher Microsoft Research

2 Microsoft Research Organizational goal: Advance state of the art
More than 700 staff, 55 areas Labs in US, Europe, Asia Internationally recognized teams University organizational model Open research environment Close ties to universities Close working relations with development. Founded in 1991 to pursue technologies that are of strategic importance to MS’s future. We hire the best and the brightest and we’re all deeply dedicated to working closely with MS product groups Staff of over 700 in over 55 areas Internationally recognized research teams Organizational goal: Advance state of the art University organizational model Flat structure, critical mass groups Open research environment Aggressive publication in peer-reviewed literature Frequent visitors, daily seminars Strong ties to University Research Nearly 15% of basic research budget directly invested in Universities Lab grants, research grants, fellowships, etc. Hundreds of interns and visitors

3 ? My Research Goal Information at your fingertips
Bring all scientific literature and data online Focus on large database issues, and scalable servers. Experiments & Instruments Simulations facts answers questions ? Literature Other Archives Data ingest Managing a petabyte Common schema How to organize it? How to reorganize it How to coexist with others Query and Vis tools Support/training Performance Execute queries in a minute Batch query scheduling

4 Challenge: Move Data from CERN to Remote Centers @ 1GBps
Disk-to-Disk gigabyte / second data rates 80TB/day 30 petabytes by 2008 1 exabyte by 2014 ~5 GBps CERN Filter Tier 2 Tier 3 Tier 1 INP3 RAL INFN FNAL Institute Tier 4 Experiment ~1 GBps ~PBps .1 GBps Physics data cache Workstations Harvey: We developed a four tiered architecture to support these collaborations. Data processed at CERN is distributed among the Tier1 national centers for further analysis. The Tier1 sites act as a distributed data warehouse for the lower tiers, and refine the data by applying the physicists’ latest algorithms and calibrations. The lower tiers are where most of the analysis gets done. They are also a massive source of simulated events. Jim: As a database guy, I am scared of managing 20 petabytes, that’s hundreds of thousands of disks. Yes, and as the LHC intensity increases, the accumulation rate will increase, and we expect to reach an Exabyte stored by All the flows in this picture are designated in gigabytes per second; so it’s clear why we need a reliable gigabyte per second network; And our bandwidth demand is accelerating, along with many other fields of science; growing by a factor of two each year or 1000-fold per decade; Much faster than Moore’s Law. . We have to innovate each year, learning to use the latest technologies effectively just to keep up. OC192 = 9.9 Gbps Graphics courtesy of Harvey Caltech

5 Current Status: CERN → Pasadena
Multi Stream tpc/ip 7.1 Gbps ~900 MBps New speed Single Stream tpc/ip 6.5 Gbps ~800 MBps File Transfer Speed ~450 MBps 7,000 6,000 5,000 4,000 Jim: Internet2 is a consortium of universities building the Next Generation Internet. They defined a speed contest to recognize work on high-speed networking. Microsoft entered and won the first round of this contest, but we have not entered since. Indeed, Harvey’s team at Caltech has been setting the records over the last two years. Harvey: To set the records, including the one just certified by Internet2, we used the network shown on the previous slide and out-of-the-box tcp/ip to move data from CERN to Pasadena. Using Windows on Itanium2 servers with Intel and S2io 10 Gigabit Ethernet cards, and Cisco switches, we reached 6.25 gigabits per second – or just under 800 megabytes per second. mbps per second 3,000 2,000 1,000 2000 2001 2002 2003 2004 2005

6 World Wide Telescope Premise: Most Astronomy data is online
The Internet is the world’s best telescope It has data on every part of the sky In every measured spectral band: As deep as the best instruments It is up when you are up. The “seeing” is always great (no working at night, no clouds no moons no..). It’s a smart telescope: links objects and data with literature.

7 SkyServer.SDSS.org Built with Johns Hopkins U.
A modern archive Raw data in file servers Catalog data (derived objects) in Database 10 billon records, 2 TB Online query to any and all Also used for education 150 hours of online Astronomy Implicitly teaches data analysis Interesting things Based on Web Services Spatial data search Cloned by other surveys (a design template)

8 Service Oriented Architecture Data Federations of Web Services
Massive datasets live near their owners: Near instrument software pipeline, apps Near data knowledge and curation Each Archive publishes a web service Schema: documents the data Methods on objects (queries) Uniform access to multiple Archives A common global schema Scientists get “personalized” extracts DB DB DB DB DB

9 Federation: SkyQuery.Net
Combines 15 archives Send query to portal, portal joins data from archives. Problem: want to do multi-step data analysis (not just single query). Solution: Allow personal databases on portal Problem: some queries are monsters Solution: “batch scheduler” on portal server, Deposits answer in personal database.

10 SkyQuery Structure Each SkyNode publishes Schema Web Service
Data Query Web Service Portal Plans Query (2 phase) Integrates answers Is itself a web service 2MASS INT SDSS FIRST SkyQuery Portal Image Cutout

11 Summary Microsoft Research is active inside and outside Microsoft.
10Gbps Networking is coming, x-64 is coming and we are investing to make them real. World Wide Telescope is coming Exemplifies service oriented architecture Built with web services and databases Has interesting spatial database algorithms Details on my website:

12


Download ppt "Jim Gray Researcher Microsoft Research"

Similar presentations


Ads by Google