Joint Laboratory of Astronomical Information Technology: New Progress on Astronomical Cross-Match Research, Zhao Qing.

1 Joint Laboratory of Astronomical Information Technology. New Progress on Astronomical Cross-Match Research. Zhao Qing

2 Contents
Our Previous Function
New Improvements and Attempts
–Discussion of Adaptability on HTM-Indexed Data
–New function based on Boundary Growing Model
–Cross-match in distributed environment based on MapReduce model
Plan & Discussion

4 Our Previous Function
PHIXmatch: Paralleled Healpix-Indexing Xmatch
Test dataset: SDSS (100 million) × 2MASS (470 million)
Function: spatial join
Results:
SDSS_ID             Twomass_ID        Distance
587731512617271364  02595905+0000200  5.243e-05
587731512617271365  02595905+0000200  6.55e-05
587731513154076828  02593768+0012219  3.2e-05
587731513154077269  02593768+0012219  0.0025043169
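
To make the "spatial join" on the slide above concrete, here is a minimal brute-force sketch. It is not the PHIXmatch implementation: the function names, the tuple layout `(id, ra, dec)`, and the use of degrees are assumptions made for illustration, and the slide does not state the unit of its Distance column.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions
    given in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def spatial_join(catalog_a, catalog_b, radius_deg):
    """Brute-force join: return (id_a, id_b, distance) for every pair of
    objects closer than radius_deg. Cost is O(len(a) * len(b))."""
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        for id_b, ra_b, dec_b in catalog_b:
            d = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if d <= radius_deg:
                matches.append((id_a, id_b, d))
    return matches
```

Comparing every pair is only workable for toy inputs; at the catalog sizes quoted on the slide, a sky index such as HEALPix is what restricts the comparison to nearby objects.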

5 HEALPix Index Function HEALPix: Hierarchical Equal Area isoLatitude Pixelization of a sphere. Quadtree pixel numbering.
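
The quadtree pixel numbering mentioned above can be sketched as follows for the HEALPix NESTED scheme, where each pixel splits into four children and the sphere starts from 12 base pixels. The helper names are hypothetical and are not taken from the talk or from any HEALPix library.

```python
def parent(pixel):
    """Index of the parent pixel one level coarser (drop the last two bits)."""
    return pixel >> 2


def children(pixel):
    """Indices of the four child pixels one level finer (append two bits)."""
    return [(pixel << 2) | k for k in range(4)]


def pixels_at_order(order):
    """Total pixel count at a given order: 12 base pixels, each split 4**order times."""
    return 12 * 4 ** order
```

Because a child index is the parent index with two extra low bits, moving between resolution levels is a shift rather than a table lookup.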

6 What we have resolved
Resolved the border-block problem
A fast bitwise algorithm to deduce the index numbers of neighboring blocks
Parallel cross-match computation in a multi-core environment
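
The slide does not give the authors' neighbor algorithm, so the following is only a sketch of the general idea under stated assumptions: inside one base face, a NESTED index is the bit-interleaving of an (x, y) grid position, so neighbors can be found by de-interleaving, stepping by one, and re-interleaving. All function names are hypothetical, and neighbors that lie across a base-face boundary are deliberately not handled here.

```python
def interleave(x, y, order):
    """Interleave the low `order` bits of x (even positions) and y (odd positions)."""
    idx = 0
    for bit in range(order):
        idx |= ((x >> bit) & 1) << (2 * bit)
        idx |= ((y >> bit) & 1) << (2 * bit + 1)
    return idx


def deinterleave(idx, order):
    """Inverse of interleave: recover (x, y) from the in-face index."""
    x = y = 0
    for bit in range(order):
        x |= ((idx >> (2 * bit)) & 1) << bit
        y |= ((idx >> (2 * bit + 1)) & 1) << bit
    return x, y


def neighbors_in_face(idx, order):
    """In-face indices of up to 8 neighbors of `idx`.
    Neighbors that would fall outside the face are skipped, not resolved."""
    side = 1 << order
    x, y = deinterleave(idx, order)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < side and 0 <= ny < side:
                result.append(interleave(nx, ny, order))
    return sorted(result)
```

For example, `neighbors_in_face(0, 1)` covers the other three cells of a 2×2 face. Cells on a face edge return fewer than eight neighbors, and a complete neighbor routine has to look those up in the adjacent base face.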

7 Results & Performance Analysis
Results:
Function            Table A         Data amount of A  Table B         Data amount of B  Time     Finished per sec
PHIXmatch function  SDSS            100,106,811       2MASS           470,992,970       25 min   52,139
GaoDan's Function   Part of GSC2.3  295,832           Part of GSC2.3  295,832           5.6 min  880
Conclusion: PHIXmatch markedly outperforms previous functions and is applicable to large-scale cross-matching on multi-core systems.
Paper: Qing Zhao, Jizhou Sun, Ce Yu, Chenzhou Cui, Liqiang Lv, and Jian Xiao, A Paralleled Large-Scale Astronomical Cross-Matching Function, International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP) 2009, LNCS 5574, pp. 604-614

9 Adaptability Research on HTM-Indexed Data. HTM: Hierarchical Triangular Mesh. Resolve the border-data problem in HTM.

10 Results of HTM version Xmatch: 42 min. Why is the result poor compared with the HEALPix version? Answer: the triangle shape!

12 New function based on Boundary Growing Model. Database read operations are too time-consuming, especially for the border data!

14 MapReduce
A software framework introduced by Google to support distributed computing on large data sets on clusters of computers.
–Huge datasets
–Distributable application
–Data stored either in a filesystem (unstructured) or within a database (structured)
Map step & Reduce step
–Map: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem and passes the answer back to its master node.
–Reduce: The master node then takes the answers to all the sub-problems and combines them to get the output, that is, the answer to the problem it was originally trying to solve.

15 Map Step & Reduce Step. [Diagram: Input → chop/replicate → Map → shuffle/sort → Reduce → Result]
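
The map, shuffle/sort, and reduce stages above can be imitated in a single process to show how a cross-match might be phrased in this model. This is a sketch under assumptions, not the authors' Hadoop job: records are hypothetical tuples `(catalog, id, ra, dec)`, the key is a flat grid cell rather than a HEALPix index, and distances use a flat-sky approximation.

```python
import math
from collections import defaultdict


def map_step(record, cell_size_deg):
    """Map: emit (cell_key, record) so nearby objects share a key."""
    _, _, ra, dec = record
    key = (int(ra // cell_size_deg), int(dec // cell_size_deg))
    return key, record


def shuffle(pairs):
    """Shuffle/sort: group all mapped records by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(records, radius_deg):
    """Reduce: within one cell, pair catalog "A" objects with catalog "B" objects."""
    cat_a = [r for r in records if r[0] == "A"]
    cat_b = [r for r in records if r[0] == "B"]
    out = []
    for _, id_a, ra_a, dec_a in cat_a:
        for _, id_b, ra_b, dec_b in cat_b:
            # Flat-sky approximation, adequate only for small separations.
            d = math.hypot((ra_a - ra_b) * math.cos(math.radians(dec_a)),
                           dec_a - dec_b)
            if d <= radius_deg:
                out.append((id_a, id_b))
    return out


def cross_match(records, cell_size_deg, radius_deg):
    """Run map, shuffle, and reduce in sequence and collect all matches."""
    groups = shuffle(map_step(r, cell_size_deg) for r in records)
    matches = []
    for key in sorted(groups):
        matches.extend(reduce_step(groups[key], radius_deg))
    return matches
```

Pairs that straddle a cell boundary are missed by this sketch, which is the border-block problem raised earlier in the talk. One common workaround is to have the map step also emit border objects under the keys of neighboring cells.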

16 Apache Hadoop
A Java software framework inspired by Google's MapReduce and Google File System papers.
What does it provide? Easy programming, automatic scheduling, error detection & correction.
Who uses Hadoop?
–Yahoo!: web search; advertising businesses
–Amazon: S3, EC2
–IBM & Google: computing platform for universities
–Institute of Computing Technology, Chinese Academy of Sciences: PBminer
Page links: 1 T
Output: over 300 TB, compressed!
Number of cores in a job: over 10,000
Disk in the cluster: over 5 P

17 Hadoop Architecture

18 Why use MapReduce for Xmatch
Near-linear speedup, compared with an MPI cluster
Suitable for data-intensive, compute-intensive applications; low-cost!
Has been used in many data mining applications; may be useful for more complex cross-match functions.

19 Plan & Discussion
Service for larger data sets (TB) and various catalogs such as…
–Interfaces for more kinds of catalogs
–Additional measures to deal with TB-level data
–Parallelizing other cross-match functions

20 Joint Laboratory of Astronomical Information Technology. Thank you! We need your help!

