Experience of Using Fuzzy Distributed Computing (Cloud Computing) in Geoinformatics. M.N. Zhizhin, Geophysical Center and Space Research Institute.

1 Experience of Using Fuzzy Distributed Computing (Cloud Computing) in Geoinformatics. M.N. Zhizhin, Geophysical Center and Space Research Institute RAS

2 New technologies and innovations: long-term preservation with metadata and lineage (Virtual Observatories); parallel/distributed data storage with interactive data query and network transfer of large datasets (MapReduce); relational -> object -> XML -> array databases (SciDB); HPC data processing and modeling algorithms (Grid); event detection, interrelation and data mining (AlphaSearch); web technologies for visualization of different data types with geolocation (Neogeography); collaborative data visualization (Videowalls); scalable virtualization of CPU/network/storage resources (Cloud Computing)

3 Multiplets of regional earthquakes

4 Downhole multipoint measurement at Soultz geothermal reservoir

5 Global Lambda Integrated Facility: Available Advanced Network Resources. GLIF is a consortium of institutions, organizations, consortia and country National Research & Education Networks who voluntarily share optical networking resources and expertise to develop the Global LambdaGrid for the advancement of scientific collaboration and discovery. Visualization courtesy of Bob Patterson, NCSA; data compilation by Maxine Brown, UIC. www.glif.is Source: Joe Mambretti

6 GLORIAD: 10Gb Worldwide Ring Source: Natalia Bulashova

7 USA-Russia Lightpath for Fast Data Transfer of Terabyte-sized Scientific Datasets. The National Center for Data Mining (NCDM) at the University of Illinois at Chicago, the Geophysical Center RAS and the Space Research Institute RAS have successfully moved 1.4 TB of data in 4.5 hours over a 1 Gbps lightpath between Chicago and Moscow as part of the Teraflow Network initiative. Using NCDM's open-source UDP-based Data Transfer protocol (UDT), we were able to transfer the MS SQL database with the SDSS astronomy catalog. The 2.5 TB database dump was compressed to 1.4 TB, split into 60 files, transferred over a 1 Gbps lightpath, and then decompressed in Moscow and loaded back into MS SQL Server. The SkyServer portal and the SDSS database were developed by Jim Gray at MSR and Alex Szalay at JHU. A Russian-language mirror now resides at www.skyserver.ru in Moscow. A direct lightpath link from IKI in Moscow to NGDC NOAA in Boulder has been successfully tested.
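
The transfer figures on this slide imply an effective throughput that is easy to check. The sketch below is only a back-of-the-envelope calculation: it assumes decimal terabytes (10^12 bytes) and that the 4.5 hours covers the network transfer alone, neither of which the slide states. The function name is illustrative.

```python
def effective_throughput_mbps(terabytes, hours, bytes_per_tb=10**12):
    """Average throughput in megabits per second for a bulk transfer.

    Assumes decimal terabytes (10**12 bytes); the slide does not say
    whether decimal or binary units were meant.
    """
    bits = terabytes * bytes_per_tb * 8
    seconds = hours * 3600
    return bits / seconds / 10**6


LINK_MBPS = 1000  # nominal capacity of the 1 Gbps lightpath

rate_mbps = effective_throughput_mbps(1.4, 4.5)
utilization = rate_mbps / LINK_MBPS      # fraction of the link actually used
compression_ratio = 1.4 / 2.5            # compressed size / original dump size

print(f"average rate: {rate_mbps:.0f} Mbps")
print(f"link utilization: {utilization:.0%}")
print(f"compressed to {compression_ratio:.0%} of the original dump")
```

Under these assumptions the average rate comes out at roughly two thirds of the nominal link capacity.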

8 Russian Skyserver mirror: www.skyserver.ru

9 Past Observations + Predictive Model = Reanalysis
1. Direct observations in the past: raw and processed data, e.g. from a meteorological station or satellite, 10^5 observations of the atmosphere every 6 h
2. Predictive numerical model: "knows" physics and uses direct observations as boundary values, e.g. a Global Circulation Model, 360 lon × 180 lat × 20 levels × 100 parameters = 1.3 × 10^8 data values every 6 hours
3. Reanalysis: accumulated output of the numerical model forecasts, each corrected for the available direct observations, over a long time period, e.g. 50 years at a 6 h time step
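
The grid arithmetic on this slide can be reproduced directly. This is a rough sizing sketch, not a figure from the talk: the 4 bytes per value and the 365.25-day year are assumptions added here to turn the value count into a storage estimate.

```python
# Grid dimensions quoted on the slide
LON, LAT, LEVELS, PARAMS = 360, 180, 20, 100

values_per_step = LON * LAT * LEVELS * PARAMS   # 129,600,000, about 1.3e8

# 50 years of output at a 6-hour time step
YEARS = 50
DAYS_PER_YEAR = 365.25          # assumption: average year length
STEPS_PER_DAY = 24 // 6
time_steps = int(YEARS * DAYS_PER_YEAR * STEPS_PER_DAY)

total_values = values_per_step * time_steps

BYTES_PER_VALUE = 4             # assumption: single-precision floats
total_tb = total_values * BYTES_PER_VALUE / 10**12

print(f"values per 6 h step: {values_per_step:.3e}")
print(f"time steps in {YEARS} years: {time_steps}")
print(f"total values: {total_values:.3e}")
print(f"uncompressed size at {BYTES_PER_VALUE} B/value: {total_tb:.1f} TB")
```

With these assumptions the full reanalysis archive is on the order of tens of terabytes, which is the scale the storage and transfer slides are concerned with.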

10 Why OGSA-DAI service container? Standard tool in the Grid community. Supports distributed workflow (in version 3.*). Built-in support for asynchronous transactions. Compatible with Web (Axis) and Grid (OMII, UNICORE, GT4). Looked at alternatives like OPeNDAP, WCS, …; documentation of our analysis is available. Problem 1: it is very complex. Solution: REST wrapper. Problem 2: supports only File, SQL and XML data types and queries. Solution: implement additional data sources and functions for data in multidimensional arrays.

11 Web technologies for visualization of different data types with geolocation: KML & GeoRSS; Web-services for CDM data sources; OGC Web Map Services WMS/WFS/WCS; MS Virtual Earth; Google Maps

12 TerraServer tile server by Jim Gray in 1998. http://terraserver.microsoft.com Large database on the Web (3 TB). Operational since June 1998. Public access to USGS topo maps and aerial images. Low resolution images. No global coverage. GPS market not ready.

13 Core box set image pre-processing. At the core warehouse, images are acquired for the whole box set. To visualize them, we split them into separate samples. Figure labels: original box sets; processed.

14 New ways to mashup raster data

15 Above the Clouds: A Berkeley View of Cloud Computing. Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services. The services themselves have long been referred to as Software as a Service (SaaS). The datacenter hardware and software is what we call a Cloud. When a Cloud is made available in a pay-as-you-go manner to the public, we call it a Public Cloud; the service being sold is Utility Computing: Amazon Web Services, Google AppEngine, and Microsoft Azure. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html

16 Amazon AWS / Microsoft Azure / Google AppEngine
VM
  Amazon AWS: x86 32 and 64 bit architecture via Xen VM. Computation elasticity allows scalability, but the developer must build the machinery, or a third party must provide it.
  Microsoft Azure: Microsoft Common Language Runtime (CLR) VM. Automatic load balancing.
  Google AppEngine: Predefined application in Python. Persistent state stored in MegaStore. Automatic scaling.
Storage
  Amazon AWS: Range of models from block store (EBS) to augmented key/blob store (SimpleDB). Scaling varies from no scaling (EBS) to fully automatic (SimpleDB, S3). APIs vary from standardized (EBS) to proprietary (S3).
  Microsoft Azure: SQL Data Services (restricted view of SQL Server). Azure storage service.
  Google AppEngine: MegaStore/BigTable.
Network
  Amazon AWS: Declarative specification of topology. Security Groups. Availability zones. Elastic IP addresses provide persistent name.
  Microsoft Azure: Automatic based on roles.
  Google AppEngine: Fixed topology for 3-tier webapps. Automatic scaling.

17 How to deploy SPIDR in Cloud? Single instance. Diagram labels: S3; EBS; EC2; SPIDR webapp & web services; MySQL databases; Database dump; File system snapshot; VM snapshot; bundle VM image

18 Can we support multiple SPIDRs? In different Amazon cloud regions? Yes! Launch several instances of the SPIDR VM. Configure DNS round-robin for load balancing. Run the MySQL master on the first instance and MySQL slaves on the others, or use third-party high-availability products for the Amazon cloud, such as RightScale.
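
DNS round-robin balances load by rotating the order of address records between successive answers, so that clients taking the first record are spread across the instances. The sketch below only simulates that rotation. It is not the SPIDR deployment itself: the addresses are placeholders from the documentation range 192.0.2.0/24, and the function name is illustrative.

```python
from collections import deque


def round_robin_answers(addresses, queries):
    """Simulate the record order a round-robin DNS server returns.

    Each successive answer lists the same addresses rotated left by one,
    so the first record (the one most clients use) changes every query.
    """
    pool = deque(addresses)
    answers = []
    for _ in range(queries):
        answers.append(list(pool))
        pool.rotate(-1)  # move the current head to the end of the list
    return answers


# Hypothetical addresses standing in for three SPIDR VM instances
INSTANCES = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

answers = round_robin_answers(INSTANCES, 6)
first_choice = [answer[0] for answer in answers]

for address in INSTANCES:
    print(address, "chosen first", first_choice.count(address), "times")
```

Round-robin of this kind spreads requests evenly but does not detect failed instances, which is one reason the slide also mentions third-party high-availability products.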

19 Clouds above Grid: Cumulus Nimbus experiment in SKIF-Grid, fall 2009

20 Cloud VMs managed as Grid jobs

21 Condor Grid deployed in Cloud

