Tieniu TAN Deputy Secretary-General Chinese Academy of Sciences (CAS) 29 Mar. 2010, Irvine, USA The 4th China-US Roundtable on Scientific Data Cooperation.


1 Advanced Cyber-infrastructure for Scientific Data Applications in CAS  Tieniu TAN, Deputy Secretary-General, Chinese Academy of Sciences (CAS)  The 4th China-US Roundtable on Scientific Data Cooperation, 29 Mar. 2010, Irvine, USA

2 Outline  Background  Advanced Cyber-Infrastructure in CAS  Typical Data Intensive e-Science Applications in CAS  Conclusion

3 Scientific Data Deluge  Scientists face a data deluge –Vast volume of scientific data captured by large scientific facilities, ubiquitous sensors, new instruments and computer models  Science and engineering research have become increasingly data-intensive –New scientific opportunities are emerging from increasingly effective data organization, access and usage (NSF, 2007)

4 Data-intensive scientific discovery: e-Science  The fourth paradigm: data-intensive scientific discovery (Microsoft, 2009) –A Transformed Scientific Method  e-Science is synthesis of information technology and science, giving priority to scientific data lifecycle and data exploration (Jim Gray) –data captured by instruments or generated by simulator; processed by software; information/knowledge stored in computer; scientist analyzes database / files; using data management and statistics

5 China National Scientific Data Sharing Initiatives  Ministry of Science and Technology (MOST) started the implementation of Scientific Data Sharing Program (SDSP) in 2002 –Supporting almost 20 projects to promote scientific data sharing  National Science & Technology Infrastructure (NSTI) was launched in 2005 by MOST and Ministry of Finance ( Http://www.escience.gov.cn ) –Supporting 38 projects for promoting Science and Technology Resources, data and information sharing and Open Access –Total funding ~2 billion RMB

6 Advanced CI for Data Lifecycle in CAS  Generation & Collection: 1. Field observation stations 2. Large scientific facilities 3. Others  Transmission: High Speed Network (CSTNET, CSTNET-CNGI, GLORIAD)  Storage & Curation: Data Centers (storage and preservation, curation, sharing and service)  Computing & Analysis: Supercomputing Grid (computing, analysis, mining, visualization)  Application: Data intensive e-Science Applications  [Diagram: a data and information stream connects the stages]

7 Data generation  Large scientific facilities produce huge volumes of data –20+ in operation –20+ under construction  Long-term field observation stations –100+ stations covering Ecology, Environment, Space, etc.  Other research data, including experiments, modeling, computing, etc. –100 institutes, more than 50,000 researchers in CAS

8 Network Field Observation  Network expanded to link field observations –Real Time Data Collection  CERN (Chinese Ecosystem Research Network)  Disaster and Environment Observation  Astronomy and space observation

9 Meridian Space Weather Monitoring Program  More than 10TB of data will be generated and transmitted to Beijing per year  Data analysis needs 20Tflops  A data system and processing infrastructure is being built

10 Cosmic-ray observatory: ARGO/ASγ  Cosmic-ray observatory at Yangbajing in Tibet: –ARGO: China-Italy –ASγ: China-Japan  ~200TB raw data per year.  Data transferred from YBJ-ARGO and processed at IHEP and INFN  Reconstructed data accessible by collaborators.
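
The yearly volumes quoted on slides 9 and 10 can be turned into an average sustained network rate with simple arithmetic. The following is a minimal sketch, not part of the original presentation; it assumes decimal terabytes (1 TB = 10^12 bytes), a 365-day year, and perfectly uniform transfer, and the function name is hypothetical.

```python
# Sketch only: converts a yearly data volume into an average sustained rate.
# Assumptions (not from the slides): 1 TB = 10**12 bytes, 365-day year,
# data is transferred evenly around the clock with no protocol overhead.

SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mbps(terabytes_per_year: float) -> float:
    """Return the average rate in megabits per second for a yearly volume."""
    bits_per_year = terabytes_per_year * 1e12 * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e6


if __name__ == "__main__":
    # Volumes taken from the slides: 10 TB/year and ~200 TB/year.
    for name, tb in [("Meridian program", 10), ("Yangbajing observatory", 200)]:
        print(f"{name}: {average_rate_mbps(tb):.1f} Mb/s on average")
```

Under these assumptions, 10 TB per year averages about 2.5 Mb/s and 200 TB per year about 51 Mb/s; real links need headroom above the average because data arrives in bursts.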

11 BEPCII / BESIII  BEPC: Beijing Electron-Positron Collider –Upgrade: BEPCII/BESIII, operational in 2008 –2.0 ~ 4.6 GeV/C –(3~10)×10^32 cm^-2 s^-1 –36 institutions from China, US, Germany, Russia, and Japan –4000+ KSI2K for data processing and physics analysis –5+ PB in five years

12 Data Transmission: High Speed Network  China Science and Technology Network (CSTNet)  Non-profit academic and research network in China supporting advanced science applications and research on the next generation Internet  Connects some 200 institutes and 1,000,000 end users

13 CSTNET Backbone  [Figure: backbone map. Nodes: Beijing, Lanzhou, Xinjiang, Xian, Shenyang, Changchun, Chengdu, Kunming, Wuhan, Guangzhou, Shanghai, Hefei, Lasa, Qingdao, Haerbin, Xining, Dalian, Guiyang, Yangbajing, Xishuangbanna, Changsha, TianJin, HongKong, Taiwan, Shenzhen, Fuzhou, Ningbo, Nanjing, Shanxi, Shijiazhuang. Link speeds in legend: 2.5Gb/s, 1Gb/s, 155Mb/s, < 155Mb/s]
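
To give a feel for the backbone link speeds in the legend (155 Mb/s, 1 Gb/s, 2.5 Gb/s), the sketch below estimates how long a bulk transfer would take at each speed. It is an illustration added here, not something from the slides: it assumes decimal units, and the `efficiency` parameter (fraction of nominal bandwidth actually achieved) and the function name are hypothetical.

```python
# Sketch only: idealized bulk-transfer time over a link of a given speed.
# Assumptions (not from the slides): 1 TB = 10**12 bytes, 1 Mb/s = 10**6 bit/s,
# and `efficiency` is a made-up knob for usable share of nominal bandwidth.


def transfer_hours(terabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Return hours needed to move `terabytes` over a `link_mbps` link."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Link speeds as listed in the backbone figure legend.
    for label, mbps in [("155 Mb/s", 155), ("1 Gb/s", 1000), ("2.5 Gb/s", 2500)]:
        print(f"1 TB over {label}: {transfer_hours(1, mbps):.1f} h")
```

At full nominal speed, 1 TB takes roughly 14 hours over 155 Mb/s, about 2.2 hours over 1 Gb/s, and under an hour over 2.5 Gb/s; lowering `efficiency` scales these times up proportionally.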

14 Interconnecting with other Networks  [Figure: CSTNET interconnection diagram. Labels: Gloriad, Russia, Netherlands, USA, KISTI (Korea), NICT (Japan), AS (Hongkong), GOOGLE (Hongkong), HKIX (Hongkong), CUHK (Hongkong), HKOEP, China169 (China Unicom), ChinaNet (TELECOM), CERNET, BJ NAP, Internet, Beijing, Hongkong. Link speeds shown: 10G, 2.5G, 2G, 1G, 700M, 155M]

15 CSTNET-CNGI  An IPv6 network for science based on CSTNET; construction will start this year  10Gbps; International Link 10G  100+ Institutes  40+ Field stations and big science facilities  Computing facilities and storage facilities  [Figure: network map. Nodes: Beijing, Shanghai, Jilin, Liaoning, Guangzhou, Lanzhou, XinJiang, Yangbajing, Chengdu, Xi'an, Kunming, WuHan, Hefei, Nanjing]

16 Data Storage and Curation  A General Scientific Data Center –Common data infrastructure construction, operation –Data archive and preservation  Some domain specific scientific data centers –Discipline data curation and sharing service  A CAS scientific data app project –Multi-discipline data sharing and applications  A series of domain-based scientific data sharing systems and institute level data sharing infrastructure

17 Data Resource Center  A General Scientific Data Center  A new organization responsible for data preservation, curation and access service in CAS  [Diagram labels: Mass data backup; Data online service; Mass data analysis and process; Long-term preservation of important data; Technology service; Network; storage space; system environment; Application service; mass data Management system; collaborator; staff]

18 Massive Storage System in Data Resource Center  Massive Storage System –Scientific data archive system (5PB tape) –Online data storage system (1PB disk array)  Internet-based service (Cloud Service) –Data backup –Archiving and curation –on-line data access and analysis

19 Domain Specific Scientific Data Centers  World Data Center ( World Data System ) in CAS –Natural Resource Environment Data Center –Astronomy Data Center –Space Data Center –Geophysics Data Center –Glacier and Frozen Earth Data Center

20 Scientific Databases (SDB)  A Long-term mission started in 1986 which was funded by CAS –data from research, for research  Collecting multi-discipline research data and promoting data sharing –More than 350 research databases and 400 datasets by 61 institutes –Over 60TB data available to open access and download http://www.csdb.cn

21 Scientific Databases (cont.)  8 Resource databases –Geo-Science –Biodiversity –Chemistry –Astronomy –Space Science –Micro biology and virus –Material science –Environment  2 Reference databases –China Species –compound  4 Application-Oriented databases –High Energy (ITER) –Western Environment Research –Ecology research –Qinghai Lake Research

22 Scientific Data Grid  CAS Scientific Data Grid  Integrating distributed scientific data into a comprehensive service and application environment  Linking all data centers as a data net  [Diagram layers: Scientific Data and databases; Scientific Data Grid Middleware; Scientific Data Grid Applications (Bioscience Gateway, Geosciences Gateway, Chemistry Gateway, Other Gateways)]

23 Scientific Computing Grid  SCCAS: 120+ Tflops computing capacity  8+ branches: 50 Tflops common computing capacity  Institute computing resource: 50 Tflops common computing capacity  Lenovo 7000, peak: 143 TeraFLOPS  [Diagram labels: Access through network; Local/Remote User; Resource Abstracting; Cooperation; Resource Interconnection; Other network resource and environment; Database, e-Science, ARP, website, science, TRP; CNGRID & environment; Super Computing Grid; Application service and Technical supporting System; Uniform System operating, Supporting & Service; Uniform Regulations]

24 Scientific Computing Grid HPC, Cluster, Workstation, Storage Windows / Linux Clients Web Portal Grid Middleware

25 HEP Grid in China  Access to the LHC data for scientific research: A grid computing system is built in CAS  WLCG MoU signed with CERN in 2006 to build a Tier-2 center at IHEP for both the ATLAS and CMS experiments. IHEP PKU SDU USTC NJU

26 Tier-2 site at IHEP  WLCG site based on EGEE/gLite  Associated with CC-IN2P3 in Lyon  Worker nodes with 1600 cores  400 TB disk space

27 Typical data intensive e-Science Applications  Developing a series of pilot e-Science applications –Most are data intensive

28 HEP Grid Applications: ATLAS MC Study  [Figure panels: Pt>20 GeV/c tracks; ttH(2l2b4j2ν) full simulation event display; ttH-2L selection; ttbar mimic to ttHWW]

29 HEP Grid application: protein prediction  Explore the non-natural protein sequence space  Set up a massive protein structure prediction environment  Develop web tools for the biology community  Result of EUChinaGrid project (EU FP6 project)  [Figure: Rosetta early/late stage; sample sequence KWCWPFASHNDLKVQSQ WYVEPPDTIPPYNKYGTN FIKHCQYIAHMQGDTHFF NRVRMHQLWKIIVDCAY]

30 ChinaFLUX Built in 2002 for climate change and environment research

31 ChinaFLUX e-Science Environment  [Diagram components: Observation system; Data transmission; Data System; Modeling and visualization]

32 Cyberinfrastructure for data collection  Real data flows from sensors to field stations, then to institutes, and finally to data centers for processing and sharing

33 Data intensive application environment  Data synthesis and integration  Data analysis and modeling  visualization

34 Conclusion: Open Science Cloud  IaaS: Network Service, Computing Service, Storage Service …  PaaS: Data intensive application environment …  SaaS: Software and tools for data curation, analysis, mining and visualization …  DaaS: Scientific data and databases Service  Building an Open Science Cloud serving not only CAS researchers, but also the wider scientific community!

35 Thank you !

