Data-Intensive Science: Addressing common needs with shared tools Christopher Stubbs Professor Department of Physics Department of Astronomy

1 Data-Intensive Science: Addressing common needs with shared tools
Christopher Stubbs, Professor, Department of Physics and Department of Astronomy
cstubbs@fas.harvard.edu

2 Storing, analyzing, and exploiting large data sets
- Searching for dark matter and dark energy
- Searching for new elementary particles
- Detailed imaging of brain function

3 Some common threads
- Ambitious instruments, copious data: e.g., tens of TB per night from imminent astronomy surveys
- Loosely coupled computing: no need for a linked analysis that uses all images
- Diverse applications from common data
- Simulations are an integral aspect
- Build apparatus here, run it elsewhere
- International collaborations
- Computer science aspects: the world's largest non-proprietary databases; clustering, data mining, file system optimization…
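
Because each image can be reduced without reference to the others, the work maps naturally onto independent jobs. A minimal Python sketch, assuming a hypothetical per-image step `reduce_image` (here just a count of pixels above a threshold) and using a standard-library thread pool as a stand-in for a batch cluster:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Image = Sequence[float]  # a flattened image; real pipelines would use arrays


def reduce_image(pixels: Image, threshold: float = 100.0) -> int:
    """Hypothetical per-image step: count pixels above a detection threshold.

    It needs only its own image, which is what makes the workload loosely coupled.
    """
    return sum(1 for p in pixels if p > threshold)


def reduce_all(images: List[Image], workers: int = 4) -> List[int]:
    """Apply reduce_image to every image independently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reduce_image, images))


if __name__ == "__main__":
    night = [[50.0, 150.0, 300.0], [10.0, 20.0], [101.0, 99.0, 200.0]]
    print(reduce_all(night))  # [2, 0, 2]
```

Since no task reads another's output, the same per-image function could instead be dispatched by a process pool or a cluster batch system; only the dispatch layer would change.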

4

5 27 km ring at CERN, outside Geneva

6 Seriously Big Toys
Harvard involvement in the ATLAS detector:
- J. DaCosta and G. Brandenberg are at CERN now, for the detector shakedown
- Built muon chambers here
- J. Huth plays a leadership role in scientific computing for the LHC

7 Event Simulations
- More than 30 million simulated events are typical
- Pick an interaction
- Propagate it through a model of the detector
- Measure detection efficiencies
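
The three steps above form a Monte Carlo efficiency estimate. The Python sketch below uses a deliberately toy detector model (an assumption, not ATLAS geometry): a particle counts as detected if its randomly drawn polar angle falls inside a hypothetical acceptance window and a hypothetical fixed hit probability is satisfied. The efficiency is the detected fraction, reported with its binomial standard error.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EfficiencyResult:
    efficiency: float
    std_error: float
    n_generated: int


def pick_interaction(rng: random.Random) -> float:
    """Toy interaction: one particle with polar angle drawn uniformly in cos(theta)."""
    return math.acos(rng.uniform(-1.0, 1.0))


def propagate_through_detector(theta: float, rng: random.Random,
                               theta_min: float = 0.2,
                               theta_max: float = math.pi - 0.2,
                               hit_prob: float = 0.95) -> bool:
    """Toy detector model (hypothetical): geometric acceptance times a fixed hit probability."""
    in_acceptance = theta_min < theta < theta_max
    return in_acceptance and rng.random() < hit_prob


def measure_efficiency(n_events: int, seed: int = 1) -> EfficiencyResult:
    """Generate n_events interactions and return the detected fraction."""
    rng = random.Random(seed)
    detected = sum(
        propagate_through_detector(pick_interaction(rng), rng) for _ in range(n_events)
    )
    eff = detected / n_events
    # Binomial standard error on a fraction: sqrt(p (1 - p) / N)
    err = math.sqrt(eff * (1.0 - eff) / n_events)
    return EfficiencyResult(eff, err, n_events)
```

For this toy model the true efficiency is cos(0.2) × 0.95 ≈ 0.931. For scale, with the slide's 30 million events the binomial error on an overall efficiency near 0.9 would be about 5 × 10⁻⁵; real analyses split the sample across many kinematic bins, each with far fewer events.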

8 On-the-fly event reconstruction
- Find tracks, and trigger/store the event if interesting
- Precise track determination
- Aggregate event statistics
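
Precise track determination is, at its core, a fit of a track model to measured hit positions. A simplified Python sketch, assuming straight-line tracks in one projection with per-hit uncertainties (real charged tracks in a magnetic field curve and are fit with helices, often via Kalman filtering, so this is a swapped-in simplification):

```python
from typing import Sequence, Tuple


def fit_straight_track(z: Sequence[float], y: Sequence[float],
                       sigma: Sequence[float]) -> Tuple[float, float, float]:
    """Weighted least-squares fit of y = intercept + slope * z to detector hits.

    z: hit positions along the beam-like axis; y: measured transverse positions;
    sigma: per-hit measurement uncertainties. Returns (intercept, slope, chi2).
    """
    w = [1.0 / s**2 for s in sigma]
    s0 = sum(w)
    sz = sum(wi * zi for wi, zi in zip(w, z))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    szz = sum(wi * zi * zi for wi, zi in zip(w, z))
    szy = sum(wi * zi * yi for wi, zi, yi in zip(w, z, y))
    det = s0 * szz - sz**2
    if det == 0:
        raise ValueError("need at least two hits at distinct z positions")
    slope = (s0 * szy - sz * sy) / det
    intercept = (szz * sy - sz * szy) / det
    # Goodness of fit: weighted sum of squared residuals
    chi2 = sum(wi * (yi - intercept - slope * zi) ** 2
               for wi, zi, yi in zip(w, z, y))
    return intercept, slope, chi2
```

The chi-squared returned alongside the parameters is what lets a reconstruction step reject poorly fitting hit combinations.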

9 ATLAS computing
- 5 million lines of code
- 200 developers, worldwide
- 200 collision events per second
- Automated event selection in firmware
- Selected subset of events to disk
- These selected events are distributed worldwide to a hierarchy of data centers.
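
Event selection acts as a filter on a stream of events before anything is stored. A toy Python sketch, assuming a hypothetical event record with a summed transverse energy and a muon count; the thresholds are illustrative only, and real ATLAS triggers use far richer criteria:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Event:
    event_id: int
    sum_et_gev: float  # hypothetical summary quantity used by the toy trigger
    n_muons: int


def trigger(events: Iterable[Event], et_threshold: float = 50.0) -> Iterator[Event]:
    """Toy trigger: keep events with large summed transverse energy or any muon."""
    for ev in events:
        if ev.sum_et_gev > et_threshold or ev.n_muons > 0:
            yield ev


def write_selected(events: Iterable[Event]) -> List[int]:
    """Stand-in for 'selected subset to disk': collect the ids that would be written."""
    return [ev.event_id for ev in trigger(events)]
```

Writing the trigger as a generator lets it consume an unbounded stream without holding rejected events in memory, which mirrors the point of selecting before storage.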

10 Sky Surveys in Astronomy
- Optical: PanSTARRS, a 1.4 Gpix camera on a 1.8 m telescope
- Radio: Mileura Wide-Field Array, a 1 km array of 8000 custom antennas; a 128 gigabit/s computing challenge
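
To put the quoted rates in perspective, a short Python helper converts a sustained bit rate into daily data volume and estimates the size of one raw image. The 2 bytes per pixel (16-bit readout) is an assumption for illustration; actual formats and compression differ.

```python
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 86_400


def daily_volume_bytes(rate_bits_per_s: float) -> float:
    """Data volume accumulated in one day at a sustained bit rate."""
    return rate_bits_per_s / BITS_PER_BYTE * SECONDS_PER_DAY


def raw_image_bytes(n_pixels: float, bytes_per_pixel: int = 2) -> float:
    """Size of one uncompressed exposure; 2 bytes/pixel (16-bit) is an assumption."""
    return n_pixels * bytes_per_pixel


mwa_per_day = daily_volume_bytes(128e9)   # 128 gigabit/s, from the slide
panstarrs_image = raw_image_bytes(1.4e9)  # 1.4 Gpix camera, from the slide
print(f"MWA at 128 Gbit/s: {mwa_per_day / 1e15:.2f} PB per day")
print(f"One 1.4 Gpix exposure: {panstarrs_image / 1e9:.1f} GB")
```

This prints about 1.38 PB per day for the radio array and 2.8 GB per optical exposure under the 16-bit assumption.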

11 Our View of the Expanding Universe
(Figure: from close and recent to far and ancient)
- Expansion history can be mapped by measuring both distances and redshifts
- Expansion causes stretching of light, “redshift”
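
The stretching can be stated compactly. With $\lambda_{\mathrm{emit}}$ the wavelength at emission, $\lambda_{\mathrm{obs}}$ the observed wavelength, and $a(t)$ the cosmic scale factor (standard notation, not defined on the slide), the redshift $z$ is:

```latex
1 + z = \frac{\lambda_{\mathrm{obs}}}{\lambda_{\mathrm{emit}}}
      = \frac{a(t_{\mathrm{obs}})}{a(t_{\mathrm{emit}})}
```

Measuring $z$ gives the expansion factor since the light left the source; pairing $z$ with an independently measured distance for many objects maps out the distance-redshift relation, whose shape depends on the expansion history.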

12 Supernovae are powerful cosmological probes
- Distances to ~6% from brightness
- Redshifts from features in spectra
(Image: Hubble Space Telescope, NASA)
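
The ~6% distance figure can be related to a brightness uncertainty. Using the standard distance modulus (not written on the slide), with $m$ the apparent magnitude, $M$ the absolute magnitude, and $d_L$ the luminosity distance, propagating an uncertainty $\sigma_\mu$ gives:

```latex
\mu = m - M = 5\log_{10}\!\left(\frac{d_L}{10\,\mathrm{pc}}\right)
\quad\Longrightarrow\quad
\frac{\sigma_{d_L}}{d_L} = \frac{\ln 10}{5}\,\sigma_\mu \approx 0.46\,\sigma_\mu
\qquad
\frac{\sigma_{d_L}}{d_L} = 0.06
\;\Rightarrow\;
\sigma_\mu \approx \frac{0.06}{0.46} \approx 0.13\ \mathrm{mag}
```

So a 6% distance per supernova corresponds to roughly 0.13 mag of scatter in the standardized brightness.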

13 Schmidt et al., High-z SN Team

14 Near Earth Asteroids
- The inventory of the solar system is incomplete
- R = 1 km asteroids are dinosaur killers
- R = 300 m asteroids in the ocean wipe out a coastline
- Demanding project: requires mapping the sky down to 24th magnitude every few days, with individual exposures not to exceed ~20 s
- PanSTARRS will detect NEAs down to ~400 m
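
The "every few days" requirement is a budget of fields times time per field. A rough Python estimate in which the sky area, field of view, per-exposure overhead, and usable hours per night are illustrative assumptions; only the ~20 s exposure comes from the slide:

```python
import math


def nights_per_sky_pass(sky_area_deg2: float, fov_deg2: float,
                        exposure_s: float, overhead_s: float,
                        night_hours: float) -> float:
    """Nights needed to visit every field once, ignoring weather and repeat visits."""
    n_fields = math.ceil(sky_area_deg2 / fov_deg2)
    total_s = n_fields * (exposure_s + overhead_s)
    return total_s / (night_hours * 3600.0)


# Assumed values (hypothetical): ~30,000 deg^2 of accessible sky, ~7 deg^2 field,
# 10 s readout/slew overhead, 10 h usable per night. The 20 s exposure is from the slide.
estimate = nights_per_sky_pass(30_000, 7.0, 20.0, 10.0, 10.0)
print(f"{estimate:.1f} nights per single pass")
```

Under these assumptions a single pass takes about 3.6 nights. Longer exposures would reach fainter limits but stretch the cycle proportionally, and moving-object detection typically needs more than one visit per field per night, which tightens the budget further.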

15 Cosmic Cinematography: Challenges
- The “static” sky: optimal co-adding of images, database issues
- The transient sky: variability classification, asteroid association and orbits, light curve analysis, fusion with other data sets
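
Two of these challenges have compact textbook cores. The Python sketch below shows inverse-variance weighted co-addition of aligned exposures and a chi-squared test for variability against a constant-flux model. It assumes the images are already registered and share a PSF; fully optimal co-addition also accounts for PSF differences between exposures, which this simpler weighting ignores.

```python
import numpy as np


def coadd_inverse_variance(images: np.ndarray, variances: np.ndarray):
    """Weighted mean of registered exposures stacked along axis 0.

    images, variances: arrays of shape (n_exposures, ny, nx).
    Returns (coadd, coadd_variance); weights are 1/variance per pixel.
    """
    weights = 1.0 / variances
    wsum = weights.sum(axis=0)
    coadd = (weights * images).sum(axis=0) / wsum
    return coadd, 1.0 / wsum


def variability_chi2(flux: np.ndarray, sigma: np.ndarray):
    """Chi-squared of a light curve against its weighted-mean (constant) flux.

    Returns (chi2, dof); chi2/dof well above 1 suggests a variable source.
    """
    w = 1.0 / sigma**2
    mean = np.sum(w * flux) / np.sum(w)
    chi2 = float(np.sum(w * (flux - mean) ** 2))
    return chi2, flux.size - 1
```

Inverse-variance weighting makes noisy exposures count less, and the co-add variance shrinks as exposures accumulate; the same weighted mean reappears in the variability test as the best-fit constant flux.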

16 A New Approach to Radio Astronomy Hardware

17 A Brief History of the Universe
(Figure labels: culmination of structure formation; first luminous structures; turning point after the Dark Ages; Era of Reionization; ionized → neutral (H) → ionized; z ~ 6.2; “The Gap”)

18 BOOLARDY

19 Lincoln Greenhill (CfA): MWA project

20 IIC affords us the opportunity to share resources, tools, and know-how
- Shared hardware maximizes effectiveness
- Shared archival data storage, managed cooperatively
- Reap the benefits of sophisticated system administrators and database professionals
  - People are quantized: a full-time specialist is unaffordable for a single group
- Learn from each other on technical topics of common interest
  - There are often large discrepancies across subfields; IIC raises all boats.

21 8K × 8K pixel array
- 16 independent amplifiers
- Each is a 1024 × 2048 subimage

22

