
1 Astronomical Data Processing & Workflow Scheduling in the Cloud
China-VO: Astronomical Data Processing & Workflow Scheduling in the Cloud -- Big-Data Oriented Research. Zhao Qing, Tianjin University of Science & Technology (天津科技大学). China-VO, Shanghai, 17 May 2017. Hello everyone, I am a teacher from ...

2 Team Introduction
Tianjin University of Science & Technology, a young group of the China-VO family. Today I would like to share with you the work we are researching. I would be very glad to get your advice, which would be very important to us.

3 Contents
Big-data oriented astronomical data processing based on Hadoop & Spark: footprint generation; cross-match. Task scheduling strategies of astronomical workflows in the cloud.
This is the outline. One line of our research uses high-performance computing technologies such as Hadoop and Spark to solve astronomical applications, especially data-intensive applications; the first application is footprint generation and the second is cross-match. The other line of research is task scheduling strategies for astronomical workflows in the cloud, since the scientific workflow is one of the most common application models in astronomy.

4 1. Astronomical data processing based on Hadoop & Spark
Footprint Generation
Sky coverage: an important piece of information about astronomical observations.
Applications: intersections, unions, and other logical operations based on the geometric coverage of regions of the sky; cross-match.
Multi-order coverage HEALPix maps generated on the Hadoop & Spark platforms.
Sky coverage is one of the most important pieces of information about astronomical observations. It is very useful for many purposes, such as computing intersections, unions, and other logical operations based on the geometric coverage of regions of the sky, and it is also useful for cross-match. The footprint is represented as a multi-order coverage HEALPix map. Since the catalogs are big, parallel methods based on Hadoop and Spark have been developed and tested.
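For intuition, here is a minimal single-machine sketch of the footprint idea: map each catalog position to a HEALPix pixel and keep the set of occupied pixels as the coverage map. It uses the healpy package; the column names and the chosen order are assumptions for illustration, not the actual pipeline configuration.

```python
# Minimal footprint sketch: occupied HEALPix pixels of a set of positions.
import numpy as np
import healpy as hp

def footprint_pixels(ra_deg, dec_deg, order=10):
    """Return the sorted set of NESTED HEALPix pixels covered by the sources."""
    nside = hp.order2nside(order)  # nside = 2**order
    pix = hp.ang2pix(nside, ra_deg, dec_deg, nest=True, lonlat=True)
    return np.unique(pix)

# Example with a few made-up positions (degrees)
ra = np.array([10.68, 10.69, 187.28])
dec = np.array([41.27, 41.26, 2.05])
print(footprint_pixels(ra, dec, order=10))
```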

5 Footprint generation based on Spark
This is the basic flow of Spark-based footprint generation. The algorithm involves iterative operations, so the experimental results on Spark are better than on Hadoop. Larger-scale experiments are coming soon.
Data: 2MASS, 12.6 GB
Environment: dual-core nodes with 4 GB memory; Spark-2.0.2, Hadoop-2.7.3
Node number:  4    8
Time (s):     138  69
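The following is a hedged PySpark sketch of the flow described above: read the catalog, map each source to a HEALPix pixel, and reduce to the distinct set of pixels. The input path, column layout, and HEALPix order are assumptions for illustration only.

```python
# PySpark sketch of footprint generation (assumed paths and schema).
import healpy as hp
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("footprint-generation").getOrCreate()
sc = spark.sparkContext

ORDER = 10
NSIDE = 2 ** ORDER

def to_pixel(line):
    # assume a CSV catalog whose first two columns are ra, dec in degrees
    ra, dec = map(float, line.split(",")[:2])
    return hp.ang2pix(NSIDE, ra, dec, nest=True, lonlat=True)

pixels = (sc.textFile("hdfs:///data/2mass.csv")   # hypothetical input path
            .map(to_pixel)
            .distinct()
            .collect())

print("footprint contains %d HEALPix pixels at order %d" % (len(pixels), ORDER))
spark.stop()
```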

6 Hadoop-based cross-match
This is the astronomical cross-match based on Hadoop.
Step 1: data distribution (1 Map + 1 Reduce)
Step 2: distance calculation (1 Map)
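Below is a hedged Hadoop Streaming sketch of the two-step scheme named above. The zone-based partitioning, the zone height, and the 5-arcsecond match radius are illustrative assumptions, not the exact algorithm of the slides.

```python
# Step 1 mapper: key every record (from both catalogs) by its declination
# zone so the shuffle/reduce phase co-locates potential neighbours.
# (Objects near zone edges would also need to be duplicated into the
# neighbouring zone; that boundary handling is omitted here.)
import sys, math, itertools

ZONE_HEIGHT_DEG = 0.05        # assumed zone height
RADIUS_DEG = 5.0 / 3600.0     # assumed 5-arcsecond match radius

def step1_mapper():
    for line in sys.stdin:
        catalog, obj_id, ra, dec = line.rstrip("\n").split(",")
        zone = int((float(dec) + 90.0) / ZONE_HEIGHT_DEG)
        print("%d\t%s,%s,%s,%s" % (zone, catalog, obj_id, ra, dec))

def step1_reducer():
    # Streaming delivers lines sorted by key; pack each zone's records
    # into one line "zone<TAB>rec;rec;...": this is the redistributed data.
    lines = (l.rstrip("\n").split("\t", 1) for l in sys.stdin)
    for zone, group in itertools.groupby(lines, key=lambda kv: kv[0]):
        print("%s\t%s" % (zone, ";".join(rec for _, rec in group)))

def angular_dist_deg(ra1, dec1, ra2, dec2):
    # great-circle distance via the spherical law of cosines
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cosd = math.sin(d1) * math.sin(d2) + math.cos(d1) * math.cos(d2) * math.cos(r1 - r2)
    return math.degrees(math.acos(min(1.0, max(-1.0, cosd))))

def step2_mapper():
    # Step 2: compare catalog A records with catalog B records inside each zone.
    for line in sys.stdin:
        zone, packed = line.rstrip("\n").split("\t", 1)
        recs = [r.split(",") for r in packed.split(";")]
        cat_a = [r for r in recs if r[0] == "A"]
        cat_b = [r for r in recs if r[0] == "B"]
        for a in cat_a:
            for b in cat_b:
                if angular_dist_deg(float(a[2]), float(a[3]),
                                    float(b[2]), float(b[3])) <= RADIUS_DEG:
                    print("%s\t%s" % (a[1], b[1]))
```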

7 Experimental results
Data: SDSS, 100,106,811 records; 2MASS, 470,992,970 records
These are the experimental results. We used 4, 8, 16, 32, and 64 PCs for these experiments and obtained a good speedup.
Node number:  4    8    16   32   64
Time (s):     273  136  69   38   25

8 Spark-based cross-match
Integrated with footprint generation; typical rich-BoT workflows; further optimization via scientific workflow scheduling research.
At present, we only use some Spark programming techniques to improve its performance. It can be modeled as a rich bag-of-tasks (BoT) workflow, since there are many batches of independent tasks. In the future, we will also apply the results of our task scheduling research to further optimize its performance.
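As a companion to the Hadoop sketch, here is a hedged PySpark sketch of a cross-match keyed by HEALPix pixel, reusing the footprint idea. The input paths, column layout, HEALPix order, and match radius are assumptions; pairs split across pixel borders are not handled in this simplified version.

```python
# PySpark cross-match sketch: join two catalogs on their HEALPix pixel,
# then filter candidate pairs by angular distance.
import math
import healpy as hp
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-crossmatch").getOrCreate()
sc = spark.sparkContext

NSIDE = 2 ** 12
RADIUS_DEG = 5.0 / 3600.0

def keyed(line):
    obj_id, ra, dec = line.split(",")[:3]
    ra, dec = float(ra), float(dec)
    return hp.ang2pix(NSIDE, ra, dec, nest=True, lonlat=True), (obj_id, ra, dec)

def close(a, b):
    (_, ra1, dec1), (_, ra2, dec2) = a, b
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cosd = math.sin(d1)*math.sin(d2) + math.cos(d1)*math.cos(d2)*math.cos(r1 - r2)
    return math.degrees(math.acos(min(1.0, max(-1.0, cosd)))) <= RADIUS_DEG

sdss  = sc.textFile("hdfs:///data/sdss.csv").map(keyed)    # hypothetical paths
tmass = sc.textFile("hdfs:///data/2mass.csv").map(keyed)

matches = (sdss.join(tmass)                       # co-locate objects sharing a pixel
               .filter(lambda kv: close(kv[1][0], kv[1][1]))
               .map(lambda kv: (kv[1][0][0], kv[1][1][0])))   # pair of object ids

print(matches.take(5))
spark.stop()
```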

9 2. Task scheduling strategies of astronomical workflows in the cloud
China-VO and Alibaba Cloud: the China-VO cloud will provide users with data, software, and computing resources.
The scientific workflow is one of the most commonly used application models in astronomy.
[Figure: example workflow DAG with tasks t1-t5 and datasets d1-d8]
One of the big events this year for China-VO is the cooperation with Alibaba Cloud. The China-VO cloud will provide not only data and software, but also computing resources for various astronomical applications. The scientific workflow is one of the most commonly used application models in astronomy, so its task scheduling research in the cloud is valuable.
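For readers unfamiliar with the model, here is a minimal sketch of how such a workflow can be represented for scheduling research: tasks with input and output datasets, from which the task dependencies follow. The task and dataset names mirror the figure's labels (t1-t5, d1-d8), but the specific edges below are made up for illustration.

```python
# A toy workflow description: each task lists the datasets it reads and writes.
workflow = {
    "t1": {"inputs": ["d1", "d2"], "outputs": ["d3"]},
    "t2": {"inputs": ["d3", "d4"], "outputs": ["d6"]},
    "t3": {"inputs": ["d3", "d5"], "outputs": ["d7"]},
    "t4": {"inputs": ["d6"],       "outputs": ["d8"]},
    "t5": {"inputs": ["d7", "d8"], "outputs": []},
}

def predecessors(task):
    """Tasks that produce at least one input dataset of `task`."""
    needed = set(workflow[task]["inputs"])
    return [t for t, spec in workflow.items() if needed & set(spec["outputs"])]

for t in workflow:
    print(t, "depends on", predecessors(t))
```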

10 What is worthy of concern: running efficiency, rental cost, energy consumption
How to achieve these goals? Data placement, resource allocation, and task scheduling, aiming at high performance, low cost, and low energy consumption.
[Figure: a workflow (tasks t1-t5, datasets d1-d8) deployed in the cloud; the user uploads data and launches the application]
There are multiple optimization objectives, but how do we achieve them? We research strategies for data placement, resource allocation, and task scheduling in the cloud. A sketch of how such objectives can be compared is shown below.
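The sketch below shows one common way to compare candidate schedules against several objectives at once: normalize each metric by a reference schedule and combine them with weights. The weighted-sum form and the weights are illustrative assumptions, not the specific optimization model used in this work.

```python
# Toy multi-objective score for comparing candidate workflow schedules.
from dataclasses import dataclass

@dataclass
class ScheduleMetrics:
    makespan_s: float     # workflow running time (seconds)
    rental_cost: float    # cloud rental cost
    energy_kwh: float     # energy consumption

def weighted_score(m, ref, w_time=0.4, w_cost=0.3, w_energy=0.3):
    """Lower is better; each objective is normalized by a reference schedule."""
    return (w_time   * m.makespan_s  / ref.makespan_s +
            w_cost   * m.rental_cost / ref.rental_cost +
            w_energy * m.energy_kwh  / ref.energy_kwh)

baseline  = ScheduleMetrics(3600, 12.0, 5.0)
candidate = ScheduleMetrics(3000, 14.0, 4.2)
print(weighted_score(candidate, baseline))   # < 1.0 means better than baseline
```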

11 Characteristics of astronomical workflow applications
Data-intensive & compute-intensive; rich-BoT structures; task execution time difficult to estimate; complex network structure; heterogeneous machines.
Our contributions:
1. Task and data clustering based on data correlation (a sketch of the correlation idea follows below)
2. Cloud environment modeling and a heuristic-rule-based task scheduling method
3. Dynamic multi-layer deadline decomposition
4. Multi-objective optimization
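The following is a hedged illustration of the "data correlation" idea behind contribution 1: two datasets are correlated when many tasks use both of them, so co-locating strongly correlated datasets in the same data center reduces cross-node transfers. The task-to-dataset mapping and the greedy pairing below are generic illustrations, not the actual published algorithm.

```python
# Toy data-correlation computation for data placement.
from itertools import combinations

# which datasets each task reads (hypothetical example)
task_inputs = {
    "t1": {"d1", "d2"},
    "t2": {"d2", "d3"},
    "t3": {"d1", "d2", "d4"},
    "t4": {"d3", "d4"},
}

def correlation(a, b):
    """Number of tasks that need both datasets."""
    return sum(1 for inputs in task_inputs.values() if {a, b} <= inputs)

datasets = sorted({d for s in task_inputs.values() for d in s})
pairs = sorted(combinations(datasets, 2), key=lambda p: correlation(*p), reverse=True)

for a, b in pairs:
    print(a, b, "correlation =", correlation(a, b))
# Strongly correlated pairs (e.g. d1 and d2 here) are candidates for
# co-location in the same data center.
```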

12 Main publications
A new energy-aware task scheduling method for data-intensive applications in the cloud, Journal of Network and Computer Applications, 2016, 59: 14-27. (SCI)
A Data Placement Algorithm for Data Intensive Applications in Cloud, International Journal of Grid and Distribution Computing, 2016, 9(2). (EI)
A data placement strategy for data-intensive scientific workflows in cloud, 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015. (EI)
Heuristic Data Placement for Data-Intensive Applications in Heterogeneous Cloud, Journal of Electrical & Computer Engineering, 2016, 2016(13): 1-8. (EI)
Qing Zhao, Haonan Dai, Congcong Xiong, Peng Wang, Heuristic Data Layout for Heterogeneous Cloud Data Centers, 2015 International Symposium on Information Technology Convergence.
Qing Zhao, Jizhou Sun, Ce Yu, Jian Xiao, Chenzhou Cui, Xiao Zhang, Improved parallel processing function for high-performance large-scale astronomical cross-matching, Transactions of Tianjin University, 2011, 17(1): 62-67. (EI)
These are our main publications.

13 Main publications (continued)
Qing Zhao, Congcong Xiong, An Improved Data Layout Algorithm Based on Data Correlation Clustering in Cloud, 2014 International Symposium on Information Technology Convergence, 2014.
Qing Zhao, Jizhou Sun, Ce Yu, Chenzhou Cui, Liqiang Lv, Jian Xiao, A paralleled large-scale astronomical cross-matching function, 9th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2009. (EI)
Qing Zhao, Jizhou Sun, Ce Yu, Chenzhou Cui, Jian Xiao, Big data oriented paralleled astronomical cross-match, Journal of Computer Application, 2010, 30(8).
Qing Zhao, Jizhou Sun, Jian Xiao, Ce Yu, Chenzhou Cui, Xu Liu, Ao Yuan, Distributed astronomical cross-match based on MapReduce, Journal of Computer Application Research, 2010, 27(9).
Thank you! We need your advice so that we can better understand what the VO needs most and what we can do better for the VO in the future.

