Published by Leonard Lawrence. Modified over 9 years ago.
1
Joint Laboratory for Astronomical Information Technology (天文信息技术联合实验室) New Progress On Astronomical Cross-Match Research Zhao Qing
2
Contents Our Previous Function New Improvements and Attempts –Discussion of Adaptability on HTM-Indexed Data –New function based on Boundary Growing Model –Cross-match in distributed environment based on MapReduce model Plan & Discussion
3
Contents Our Previous Function New Improvements and Attempts –Discussion of Adaptability on HTM-Indexed Data –New function Plan & Discussion
4
Our Previous Function
PHIXmatch: Paralleled Healpix-Indexing Xmatch
Test dataset: SDSS (100 million) × 2MASS (470 million)
Function: Spatial Join
Results:
SDSS_ID             Twomass_ID        Distance
587731512617271364  02595905+0000200  5.243e-05
587731512617271365  02595905+0000200  6.55e-05
587731513154076828  02593768+0012219  3.2e-05
587731513154077269  02593768+0012219  0.0025043169
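The slide does not show how the spatial join computes its Distance column, so the following is only a minimal sketch of the idea: a brute-force join that pairs every object in one catalog with every object in another and keeps pairs closer than a given radius. The function names, the `(id, ra, dec)` tuple layout, and the use of degrees are assumptions made for illustration; the units of the Distance column above are not stated in the source.

```python
import math


def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two sky positions given in degrees.

    Uses the haversine form, which stays numerically stable for the very
    small separations that matter in cross-matching.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def spatial_join(catalog_a, catalog_b, radius_deg):
    """Brute-force join: return (id_a, id_b, distance) for pairs within radius_deg.

    Each catalog is a list of (object_id, ra_deg, dec_deg) tuples
    (a hypothetical layout, not the schema used by PHIXmatch).
    """
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        for id_b, ra_b, dec_b in catalog_b:
            d = angular_distance_deg(ra_a, dec_a, ra_b, dec_b)
            if d <= radius_deg:
                matches.append((id_a, id_b, d))
    return matches
```

This version compares every pair, so its cost grows with the product of the two catalog sizes. Spatial indexing such as HEALPix exists to avoid exactly that by restricting comparisons to objects in the same or neighboring blocks.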
5
HEALPix Index Function HEALPix: Hierarchical Equal Area isoLatitude Pixelization of a sphere. Quadtree pixel numbering
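The quadtree numbering can be illustrated with a few lines of integer arithmetic. This sketch assumes the HEALPix nested numbering scheme, in which each pixel at one resolution level splits into four children whose indices are consecutive; the function names are illustrative and do not come from the slides.

```python
def npix(order):
    """Total number of pixels at a resolution order: 12 base pixels, each split order times."""
    return 12 * 4 ** order


def children(pixel):
    """Indices of the four child pixels one order finer (nested scheme)."""
    return [4 * pixel + k for k in range(4)]


def parent(pixel):
    """Index of the parent pixel one order coarser: drop the last two bits."""
    return pixel >> 2


def ancestor_at_order(pixel, order, target_order):
    """Index of the enclosing pixel at a coarser target_order (target_order <= order)."""
    return pixel >> (2 * (order - target_order))
```

Because moving between levels is only a two-bit shift, objects indexed at a fine order can be grouped into coarser blocks without recomputing any geometry.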
6
What we have resolved
Resolved the border-block problem
A fast bitwise algorithm to deduce the index numbers of neighboring blocks
Parallel cross-match computation in a multi-core environment
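The slides do not give the bitwise neighbor algorithm itself, so the sketch below is not the authors' method. It only illustrates why bit operations suffice for the simple case: in the HEALPix nested scheme, the index of a pixel inside one base face is the bit-interleaving of its (x, y) grid position, so neighbors can be found by de-interleaving, stepping by one, and re-interleaving. It deliberately ignores neighbors that lie across a base-face boundary, which a full solution must handle. All function names are hypothetical.

```python
def spread_bits(v, order):
    """Place bit i of v at bit position 2*i (zeros in between)."""
    out = 0
    for i in range(order):
        out |= ((v >> i) & 1) << (2 * i)
    return out


def compress_bits(v, order):
    """Inverse of spread_bits: collect the bits found at even positions of v."""
    out = 0
    for i in range(order):
        out |= ((v >> (2 * i)) & 1) << i
    return out


def in_face_neighbors(pixel, order):
    """Neighbors of a nested-scheme pixel that lie in the same base face.

    Neighbors across a face boundary are skipped in this sketch.
    """
    nside = 1 << order
    face = pixel >> (2 * order)                # which of the 12 base faces
    local = pixel & ((1 << (2 * order)) - 1)   # index within that face
    x = compress_bits(local, order)            # x uses the even bits
    y = compress_bits(local >> 1, order)       # y uses the odd bits
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < nside and 0 <= ny < nside:
                code = spread_bits(nx, order) | (spread_bits(ny, order) << 1)
                result.append((face << (2 * order)) | code)
    return sorted(result)
```

A production version would replace the per-bit loops with mask-and-shift or table-lookup tricks and would add the face-crossing rules; the structure of the computation stays the same.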
7
Results & Performance Analysis
Function            Table A         Data amount of A  Table B         Data amount of B  Time     Finished amount/sec
PHIXmatch function  SDSS            100,106,811       2MASS           470,992,970       25 min   52,139
GaoDan’s function   Part of GSC2.3  295,832           Part of GSC2.3  295,832           5.6 min  880
Conclusion: PHIXmatch shows markedly better performance than previous functions and is applicable to large-scale cross-matching on multi-core systems.
Paper: Qing Zhao, Jizhou Sun, Ce Yu, Chenzhou Cui, Liqiang Lv, and Jian Xiao, A Paralleled Large-Scale Astronomical Cross-Matching Function, International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP) 2009, LNCS 5574, pp. 604-614
8
Contents Our Previous Function New Improvements and Attempts –Discussion of Adaptability on HTM-Indexed Data –New function Plan & Discussion
9
Adaptability Research on HTM-Indexed Data HTM: Hierarchical Triangular Mesh. Resolve the border-data problem in HTM
10
Results of the HTM version of Xmatch: 42 min. Why is the result poor compared with the HEALPix version? Answer: the triangle shape!
11
Contents Our Previous Function New Improvements and Attempts –Discussion of Adaptability on HTM-Indexed Data –New function Plan & Discussion
12
New function based on Boundary Growing Model Database read operations are too time-consuming, especially for the border data!
13
Contents Our Previous Function New Improvements and Attempts –Discussion of Adaptability on HTM-Indexed Data –New function Plan & Discussion
14
MapReduce
A software framework introduced by Google to support distributed computing on large data sets on clusters of computers.
–Huge datasets
–Distributable application
–Data stored either in a filesystem (unstructured) or within a database (structured)
Map step & Reduce step
–Map: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. Each worker node processes its smaller problem and passes the answer back to its master node.
–Reduce: The master node then takes the answers to all the sub-problems and combines them to produce the output, the answer to the problem it was originally trying to solve.
15
Map Step & Reduce Step
[Figure: data flow from Input through Chop/replicate, Map, Shuffle/Sort, and Reduce to Result]
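To make the map, shuffle, and reduce stages concrete for cross-matching, here is a single-process sketch in plain Python. It does not use Hadoop, and it is not the implementation described in this talk. The square grid cell key, the `CELL_DEG` size, the record layout, and the flat-sky distance approximation are all assumptions chosen to keep the example short; an index such as HEALPix would normally supply the key. Objects near a cell edge whose partner falls in the adjacent cell are missed here, which is the border problem mentioned earlier.

```python
import math
from collections import defaultdict

CELL_DEG = 1.0  # hypothetical cell size used as the shuffle key


def cell_of(ra, dec):
    """Assign a sky position (degrees) to a square grid cell."""
    return (int(ra // CELL_DEG), int(dec // CELL_DEG))


def map_record(catalog, obj_id, ra, dec):
    """Map step: emit (cell, record) so nearby objects share a key."""
    yield cell_of(ra, dec), (catalog, obj_id, ra, dec)


def shuffle(pairs):
    """Shuffle/sort step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_cell(records, radius_deg):
    """Reduce step: join catalog A against catalog B inside one cell."""
    side_a = [r for r in records if r[0] == "A"]
    side_b = [r for r in records if r[0] == "B"]
    for _, id_a, ra_a, dec_a in side_a:
        for _, id_b, ra_b, dec_b in side_b:
            # Flat-sky approximation, adequate only for small separations.
            dra = (ra_a - ra_b) * math.cos(math.radians((dec_a + dec_b) / 2.0))
            d = math.hypot(dra, dec_a - dec_b)
            if d <= radius_deg:
                yield id_a, id_b, d


def cross_match(records, radius_deg):
    """Run map, shuffle, and reduce in sequence over (catalog, id, ra, dec) records."""
    pairs = [kv for rec in records for kv in map_record(*rec)]
    groups = shuffle(pairs)
    matches = []
    for cell in sorted(groups):
        matches.extend(reduce_cell(groups[cell], radius_deg))
    return matches
```

In a real cluster the framework performs the shuffle and runs many reduce calls in parallel on different nodes; each reducer only ever sees the records of its own cell, which is what makes the join distributable.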
16
Apache Hadoop
A Java software framework inspired by Google’s MapReduce and Google File System papers.
What does it provide? Easy programming, automatic scheduling, error detection & correction.
Who uses Hadoop?
–Yahoo! – web search; advertising businesses
–Amazon – S3, EC2
–IBM & Google – computing platform for universities
–Institute of Computing Technology, Chinese Academy of Sciences – PBminer
Page links: 1 T; output: over 300 TB, compressed! Number of cores in a job: over 10,000; disk in the cluster: over 5 P
17
Hadoop Architecture
18
Why use MapReduce for Xmatch
Near-linear speedup, compared with an MPI cluster
Suitable for data-intensive, compute-intensive applications; low cost!
Has been used in many data mining applications; may be useful for more complex cross-match functions.
19
Plan & Discussion Service for larger data sets (TB) and various catalogs such as… –Interfaces for more kinds of catalogs –Additional measures to deal with TB-level data –Parallelizing other cross-match functions
20
Joint Laboratory for Astronomical Information Technology (天文信息技术联合实验室) Thank you! We need your help!