Published by Avery Springs. Modified over 9 years ago.
1
San Diego Supercomputer Center
National Partnership for Advanced Computational Infrastructure
Data and Knowledge Grids
Chaitan Baru
Co-Director, Data and Knowledge Systems, SDSC
2
Introduction
SDSC is the leading-edge site of NPACI
SDSC is one of the nodes in the TeraGrid
SDSC, via NPACI thrust areas, works with a number of applications: Earth System Science, Neuroscience, Molecular Biology, Digital Sky, …
SDSC works on a number of non-NPACI (including industry) projects
The DAKS program receives 80% of its funding from non-NPACI sources
The SDSC DAKS Program co-leads the data activities in Cal-(IT)2 via the SDSC/Cal-(IT)2 Data and Knowledge Engineering Lab
3
Introduction
The SDSC Data and Knowledge Systems (DAKS) program is unique in the nation. It supports:
  Computer Science R&D
  Applications-driven research
  Development of robust software systems
  Production data and visualization systems
Involved in Grid-based computing…
  (very) High speed networking, fewer, high-performance nodes, “big”, possibly complex, data
…also, Internet-based computing
  Web clients, Web databases and mediation, Web services, e.g. the Information Integration Testbed (I2T) Project
Web-based grid computing
4
DAKS Technology Layers
[Diagram; labels as extracted]
Storage hardware
Networked Storage (SAN)
Grid Storage (Curated Database)
Filesystems, Database Systems
High speed networking
Sensornets (real-time data, video streams)
ROADNet
ActiveCampus
Monitoring Health of Civil Infrastructure
Data Mining, Simulation Modeling, Analysis, Data Fusion
Knowledge-Based Integration
Advanced Query Processing
Visualization
Applications: Ecoinformatics, environmental science…
5
Information Integration Testbed (NSF Digital Government/ITR grants)
[Diagram; labels as extracted]
Clients
XML-based Mediator
XML queries
UDDI
SOAP, WSDL, XML
Sociology Workbench
Oracle DBMS
Java Servlets
XML Metadata files
Stats Server
“Parameterized” views
Resource discovery
Service discovery
Mediation of geospatial information
Accuracy, resolution issues
6
Community Grid Projects
GriPhyN: Grid Physics Network (NSF ITR)
NVO: National Virtual Observatory (NSF ITR)
BIRN: Biomedical Informatics Research Network (NCRR/NIH)
GEON: GEOsciences Network
7
GriPhyN: The LIGO Project
Use of COTS DBMS
[Diagram; labels as extracted]
1000 channels of data, every 2-3 seconds
Filtering
Store raw data and basic “products”
Request for data Channels/Time (GB-TB)
Data Analysis
Result
Request for “full sweep” of data (10’s-100’s TB)
Recalibrate data
8
Digital Sky Projects
National Virtual Observatory (NVO)
[Diagram; labels as extracted]
Digital images
Image Analysis
Sky Catalogs
Load into DBMS
Catalog A
Catalog B
Correlate across Catalogs
Data mining
Result
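The "Correlate across Catalogs" step on this slide can be illustrated with a positional cross-match. The following is only a minimal sketch under stated assumptions: the slide does not say how NVO correlates catalogs, so the nearest-neighbour match within a fixed radius, the function names, and the sample rows are all hypothetical.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(catalog_a, catalog_b, radius_deg):
    """Pair each catalog_a source with its nearest catalog_b source within radius_deg."""
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1]))
    return matches

# Made-up rows: (source id, right ascension in degrees, declination in degrees).
catalog_a = [("A1", 10.0000, 20.0000), ("A2", 150.0000, -5.0000)]
catalog_b = [("B1", 10.0002, 20.0001), ("B2", 200.0000, 45.0000)]

# Match radius of 2 arcseconds, expressed in degrees (an arbitrary choice).
matches = cross_match(catalog_a, catalog_b, radius_deg=2.0 / 3600.0)
for id_a, id_b, sep in matches:
    print(id_a, id_b, f"{sep * 3600:.2f} arcsec")
```

The nested loop is quadratic in catalog size; catalogs loaded into a DBMS, as the slide describes, would normally rely on a spatial index rather than a full scan.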
9
BIRN
Integrating data from different brain mapping research sites
  UCSD, UCLA, Caltech, Duke, Mass General, Harvard
  Mouse and human brain
BIRN Data/Knowledge Grid
  High-speed networking
  Access to distributed data
  Semantic mediation
  Intra-species and inter-species queries
  Visualization and analysis tools
10
Example of BIRN Federation
Are there changes in axon diameter, and/or number, in the optic nerve of EAE animals, before the development of gross structural changes?
[Diagram; labels as extracted]
Integrated View
Integrated View Definition
Mediator
Wrapper
MRI
Electron microscopy
Histology
Web
CaBP, Expasy
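The mediator/wrapper arrangement named on this slide can be sketched as follows. This is an illustrative sketch only, not the BIRN mediator: the `Wrapper` and `Mediator` classes, the field names, and the records are hypothetical, and the "integrated view" here is simply a merge on a shared key.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

class Wrapper:
    """Presents one source as an iterable of flat records (hypothetical interface)."""
    def __init__(self, name: str, fetch: Callable[[], Iterable[Record]]):
        self.name = name
        self.fetch = fetch

class Mediator:
    """Builds an integrated view by merging wrapped sources on a shared key."""
    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def integrated_view(self, key: str) -> List[Record]:
        merged: Dict[object, Record] = {}
        for wrapper in self.wrappers:
            for record in wrapper.fetch():
                row = merged.setdefault(record[key], {key: record[key]})
                for field, value in record.items():
                    if field != key:
                        # Prefix each field with its source so provenance stays visible.
                        row[f"{wrapper.name}.{field}"] = value
        return list(merged.values())

# Made-up records standing in for an MRI source and an electron microscopy source.
mri = Wrapper("mri", lambda: [
    {"animal_id": "a1", "lesion_visible": False},
    {"animal_id": "a2", "lesion_visible": True},
])
em = Wrapper("em", lambda: [
    {"animal_id": "a1", "axon_diameter_um": 0.8},
])

view = Mediator([mri, em]).integrated_view("animal_id")

# Animals with electron microscopy measurements but no lesion visible on MRI yet.
early = [row for row in view
         if row.get("mri.lesion_visible") is False and "em.axon_diameter_um" in row]
print(early)
```

A question like the one on the slide spans several imaging modalities, so it can only be answered against the merged rows, not against any single source.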
11
BIRN Layered Architecture
Network Layer
Computational Grid
Virtual Data Grid (SRB): provides file and collection-level access to any data from any source
Presentation/Visualization/Application Layer: allows exploration and manipulation of images and volumes
Data Integration Layer (Mediator): allows query access to descriptive and computed information from multiple sources
12
GEON
An outcome of the Geoinformatics community workshops
GEON Geoscience Research Themes
  Earth's Surface: The Critical Interface Among Humans, Water, the Atmosphere, and Tectonics
  Biodiversity: Geoscience and Evolution
  Exploring the 4D Architecture of Continents
GEON Information Technology Research
  GEON “Deep” Data Modeling and Semantic Mediation of 4D data sets
  4D Visualization and Augmented Reality
  Data grids and distributed computing
13
GEON Participants
Geosciences
  R. Arrowsmith, Arizona State University
  N. Christensen, University of Wisconsin
  M. Crawford, Bryn Mawr
  C. Duffy, Pennsylvania State University
  C. Flessa, University of Arizona
  A. Gary, University of Utah
  B. Huber, Smithsonian Institution
  R. Keller, University of Texas El Paso
  A. Levander, Rice University
  M. Liu, University of Missouri
  C. Marshall, Harvard University
  D. McLaughlin, Massachusetts Institute of Technology
  C. Meertens, UNAVCO
  J. Oldow, University of Idaho
  D. Seber, Cornell University
  A.K. Sinha, Virginia Tech
  W. Snyder, Boise State University
  H. Staudigel, Scripps Institution of Oceanography
  H. Wang, University of Wisconsin
Information Technology
  M. Bailey, San Diego Supercomputer Center
  C. Baru, San Diego Supercomputer Center
  B. Ludaescher, San Diego Supercomputer Center
  P. Papadopoulos, San Diego Supercomputer Center
  Y. Papakonstantinou, University of California San Diego
  T. Smith, University of California Santa Barbara
Education and Outreach
  M. Marlino, Digital Library for Earth System Education (DLESE)
14
GEON Partners
Government: USGS, NASA, NOAA, NGDC, State Geologists Association
Academia: IRIS, Cal-(IT)2
Industry: ESRI, Oracle, Sun, Panoram
15
GEON Information Integration Example
Biodiversity: The Paleobiology Database
Charles Marshall, Harvard
[Diagram; labels as extracted]
“Locality”: Where, When, Who, Species A, Species B
Selection Criteria
Paleoenvironment
Sequence Stratigraphy
Geochemistry
Lithology
Tectonic Setting
Paleogeography
Paleolatitude
Synonymy
Biological Attributes
Museum holdings
Phylogeny
Mineralogy
Body Mass
International Timescale #1
International Timescale #2
When
Where
16
Complex Multiple-World Integration Scenarios
Current database integration issues only address
  Structural/Schema Conflicts
    common semistructured data model (XML)
    schema transformations/integration (XML queries & transforms)
  Limited Query Capabilities
    capability-based rewriting (e.g., TSIMMIS)
These scenarios are “one-world” (e.g. electronic parts catalogs) or simple multiple world (e.g. “home buyer”)
Problem: Semantic mediation in complex multiple worlds
  complex, disjoint, seemingly unrelated data
  “hidden semantics” in complex, indirect relationships
17
Augmented Reality Facility (ARF)
Simulation of database information overlaid on ground reality
(Photograph of San Elijo Lagoon, San Diego County, CA)
18
Scaling the “Network”
Technology: hardware, software
Disseminating “best practices”
Keeping technologies and technological skills up to date
19
A Common Opportunity
Creating the Data Institute
Common distributed cyberinfrastructure for science communities
  Much commonality in IT problems across domains
Support for training of scientists and data managers (“wetware”)
  Training in DBMS, GIS, Web, Wireless, Taxonomic DB, Metadata
  IT state-of-the-art moves quickly
Dedicated, funded center to develop/modify existing technology
  Some requirements of science applications are not directly addressed by commercial technology
“Riding the market”
  Leverage industry linkages and commercial technology
20
A Common Opportunity
Creating the Data Institute
Information clearinghouse/digital library
  Leverage what SDSC/Cal-(IT)2 is already doing
Long-term preservation/sustenance of data and software tools
  Leverage SDSC’s work with the National Archives and Records Administration (NARA), Library of Congress (LoC), and California Digital Library (CDL)
  National Ecological Data Archive
Create sustained community services
  E.g. Science UDDI (Universal Description, Discovery, and Integration)
21
GEON
[Architecture diagram; labels as extracted, duplicates removed]
GEOSCIENCES Community
GEON Discovery Center (Portal)
GEON Interdisciplinary Themes: 4D Continental Architecture, Biodiversity, Earth's Surface
Thematic Views
Disciplinary Views: Geophysics, Petrology, Tectonics, Geology, Paleontology,...
Mediation Teams (View Providers)
Knowledge-Based Integration / Semantic Mediation: Domain maps, process maps
GEON Collections
Virtual Collections
Integrated Views
GEON Data Grid Services: Authentication, distributed data management, persistent archives
Services: Visualization, Digital Library, Collaboration
GEON Participants
Dataset Providers
Collection Providers
Tool Providers
DLESE, USGS, ADEPT/ADL, NSDL, IRIS, UNAVCO, NGDC/NOAA, NASA
Storage / Networks / Computers: SAN, HPSS, Linux Clusters
22
Structural vs. Model-Based Mediation
[Diagram; labels as extracted]
Structural mediation
  Raw Data, XML Elements, XML Models
  Integrated-DTD := XQuery(Src1-DTD,...)
  Structural Constraints (DTDs), Parent, Child, Sibling,...
  A = (B*|C),D B =...
  No Domain Constraints
Model-based mediation
  (XML) Objects, Conceptual Models
  Integrated-CM := CM-QL(Src1-CM,...)
  DOMAIN MAP: Classes, Relations, is-a, has-a,...
  C1, C2, C3, R
  Logical Domain Constraints: IF THEN
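The contrast on this slide can be made concrete with a small sketch. It is not the mediator's actual CM-QL or XQuery machinery. It assumes a hypothetical domain map made only of is-a edges, with made-up concept names and source objects, and compares matching by element name alone (structural) against matching through the domain map (model-based).

```python
# Hypothetical domain map: child concept -> parent concept (is-a edges only).
IS_A = {
    "purkinje_cell": "neuron",
    "pyramidal_cell": "neuron",
    "neuron": "cell",
    "astrocyte": "cell",
}

def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or reaches it via is-a edges."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False

# Made-up source objects, each tagged with the class name its source uses.
objects = [
    {"source": "Src1", "cls": "purkinje_cell"},
    {"source": "Src2", "cls": "pyramidal_cell"},
    {"source": "Src2", "cls": "astrocyte"},
]

def structural_match(objs, name):
    """Structural view: only the literal class (element) name is compared."""
    return [o for o in objs if o["cls"] == name]

def model_based_match(objs, concept):
    """Model-based view: the domain map relates source classes to the query concept."""
    return [o for o in objs if is_a(o["cls"], concept)]

print(structural_match(objects, "neuron"))   # no source uses the name "neuron"
print(model_based_match(objects, "neuron"))  # both neuron subclasses are found
```

Neither source labels anything "neuron", so the name-based query finds nothing, while the domain map supplies the relationship that connects the two sources.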
23
SDSC Storage Resource Broker & Metadata Catalog
[Diagram; labels as extracted]
Application
C, C++, Linux I/O
Unix Shell
Java, NT Browsers
Web
Prolog Predicate
SRB
MCAT
HRM
Dublin Core
Resource, User Defined
Application Meta-data
Remote Proxies
DataCutter
Third-party copy
Archives: HPSS, ADSM, UniTree, DMF
Databases: DB2, Oracle, Sybase
File Systems: Unix, NT, Mac OSX
24
TeraGrid: 13.6 TF, 6.8 TB memory, 79 TB internal disk, 576 TB network disk
Sites
  NCSA: 6+2 TF, 4 TB memory, 240 TB disk
  SDSC: 4.1 TF, 2 TB memory, 225 TB SAN
  Caltech: 0.5 TF, 0.4 TB memory, 86 TB disk
  ANL: 1 TF, 0.25 TB memory, 25 TB disk
[Network diagram; labels as extracted, link counts omitted]
Storage: HPSS 300 TB, UniTree
Networks: ESnet, HSCC, MREN/Abilene, Starlight, vBNS, Abilene, MREN, Calren, NTON
Links: OC-3, OC-12, OC-12 ATM, OC-48, GbE
Switching: Chicago & LA DTF Core Switch/Routers, Juniper M160, Cisco 65xx Catalyst Switch (256 Gb/s Crossbar), Extreme Blk Diamond, Myrinet
Systems: 574p IA-32 Chiba City, 128p Origin, HR Display & VR Facilities, 256p HP X-Class, 128p HP V2500, 92p IA-32, 1176p IBM SP Blue Horizon (1.7 TFLOPs), 2 x Sun E10K, 15xxp Origin, 1024p IA-32, 320p IA-64, Sun Server
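The headline totals on this slide can be checked against the per-site figures with a few lines of arithmetic. The sketch below only re-adds the numbers as they appear on the slide; the dictionary layout and the `total` helper are illustrative.

```python
# Per-site figures as listed on the slide (TF = teraflops, TB = terabytes).
sites = {
    "NCSA":    {"tflops": 6 + 2, "memory_tb": 4.0,  "disk_tb": 240},
    "SDSC":    {"tflops": 4.1,   "memory_tb": 2.0,  "disk_tb": 225},
    "Caltech": {"tflops": 0.5,   "memory_tb": 0.4,  "disk_tb": 86},
    "ANL":     {"tflops": 1.0,   "memory_tb": 0.25, "disk_tb": 25},
}

def total(field):
    """Sum one field across all sites, rounded to suppress floating-point noise."""
    return round(sum(site[field] for site in sites.values()), 2)

print("compute (TF):", total("tflops"))
print("memory (TB): ", total("memory_tb"))
print("disk (TB):   ", total("disk_tb"))
```

The compute and disk sums reproduce the headline 13.6 TF and 576 figures. The listed memory figures add up to 6.65 TB, slightly under the 6.8 TB headline; the slide does not explain the difference.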
25
SDSC “node” configured to be best site for data-oriented computing in the world
Sites
  NCSA: 8 TF, 4 TB memory, 240 TB disk
  SDSC: 4.1 TF, 2 TB memory, ~25 TB internal disk, ~225 TB network disk
  Caltech: 0.5 TF, 0.4 TB memory, 86 TB disk
  Argonne: 1 TF, 0.25 TB memory, 25 TB disk
[Diagram; labels as extracted]
TeraGrid Backbone (40 Gbps)
Myrinet Clos Spine
HPSS 300 TB
Blue Horizon IBM SP (1.7 TFLOPs)
2 x Sun E10K
Sun
Networks: vBNS, Abilene, Calren, ESnet