1
PhD Prelim Oral Exam
Parallelizing Eigen-Value Computation Based Exact Spatial Auto-Regression Model Solution
Baris M. Kazar
Advisors: Dr. Shashi Shekhar, Dr. David J. Lilja
AHPCRC, Dept. of Electrical & Computer Eng’g., University of Minnesota
kazar@ece.umn.edu
http://www.cs.umn.edu/~kazar
2
09/20/2004 Parallelizing Exact SAR Model Solution
Biography
Education
–M.S., Electrical and Computer Engineering, UMN-TC, 2000
–Took WPE, started PhD thesis
–Ph.D. Candidate, Electrical and Computer Engineering, UMN-TC (expected 2005)
Research Interests
–Data and knowledge engineering, spatial database management, spatial data mining, parallel processing, geographic information systems, and spatial statistics
3
Biography
Publications
–High Performance Spatial Data-Mining, B. M. Kazar, S. Shekhar, and D. J. Lilja, AHPCRC Tech Report no. 2003-125 (Poster at ACM SIGPLAN Principles and Practice of Parallel Programming (PPoPP) Conference, June 2003)
–A Parallel Formulation of the Spatial Auto-Regression Model for Mining Large Geo-Spatial Databases, B. M. Kazar, S. Shekhar, D. J. Lilja, and D. Boley, AHPCRC Tech Report no. 2004-103, March 2004 (International Workshop on High Performance and Distributed Mining (HPDM) at SIAM Data Mining Conference, April 2004)
–Comparing Exact and Approximate Spatial Auto-Regression Model Solutions for Spatial Data Analysis, B. M. Kazar, S. Shekhar, D. J. Lilja, and K. Pace, AHPCRC Tech Report no. 2004-126 (GIScience 2004 Conference, October 2004)
–Scalable Parallel Approximate Formulations of Multi-Dimensional Spatial Auto-Regression Models for Spatial Data Mining, S. Shekhar, B. M. Kazar, and D. J. Lilja, to appear as a summary paper in the 24th Army Science Conference
4
Outline
Spatial Data-Mining (SDM)
–Motivation
–Spatial Autocorrelation (SA)
–SDM Provides Better Model
–But, Computation Costs are Much Higher
–Computational Challenge
Problem Definition
Our Approach
Algebraic Analysis
Experimental Results
Summary & Future Work
5
Motivation
Widespread use of spatial databases
Mining spatial patterns
–The 1855 Asiatic Cholera in London [Griffith]
–Fair Lending [NYT, R. Nader]: correlation of bank locations with loan activity in poor neighborhoods
–Retail Outlets [NYT, Walmart, McDonald's, etc.]: determining locations of stores by relating neighborhood maps with customer databases
–Crime Hot Spot Analysis [NYT, NIJ CML]: explaining clusters of sexual assaults by locating addresses of sex offenders
–Ecology [Uygar]: explaining location of bird nests based on structural environmental variables
6
Spatial Auto-correlation (SA)
Random Distributed Data (no SA): spatial distribution satisfying assumptions of classical data
Cluster Distributed Data: spatial distribution NOT satisfying assumptions of classical data
[Figure panels: pixel property with independent identical distribution; random nest locations; pixel property with spatial auto-correlation; cluster nest locations]
7
Linear Regression → SAR
Spatial auto-regression (SAR) model has higher accuracy and removes the IID assumption of linear regression
SDM Provides Better Model!
8
But, Computation Costs are Much Higher
Linear regression takes 2 seconds for a problem size of 10000 on IBM Regatta
Stage A is the bottleneck; Stages B and C contribute very little to the response time
9
Computational Challenge
Maximum-Likelihood Estimation = MINimizing the ML Function
Solving SAR Model: y = ρWy + xβ + ε
–ρ = 0 → Least Squares Problem
–β = 0, ε = 0 → Eigen-value Problem
–General case → Computationally expensive due to the log-det term in the ML Function
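The ML function on this slide was an image and did not survive extraction. As a point of reference only, the block below sketches a standard textbook form of the concentrated objective for a row-normalized W, not necessarily the exact expression on the slide. The symbols λ_i (eigenvalues of W), SSE(ρ), and M are notation assumed here. The second line is the identity that turns the log-det term into a sum over eigenvalues, which is why computing all eigenvalues of W once makes every later evaluation of the objective cheap.

```latex
\begin{aligned}
y &= \rho W y + x\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I) \\
\ln\lvert I - \rho W\rvert &= \sum_{i=1}^{n} \ln(1 - \rho\lambda_i) \\
\hat\rho &= \arg\min_{\lvert\rho\rvert < 1}
  \left[ -\frac{2}{n}\sum_{i=1}^{n}\ln(1-\rho\lambda_i) + \ln \mathrm{SSE}(\rho) \right] \\
\mathrm{SSE}(\rho) &= \lVert M (I - \rho W)\, y \rVert^{2},
  \qquad M = I - x\,(x^{T}x)^{-1}x^{T}
\end{aligned}
```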
10
Outline
Spatial Data-Mining (SDM)
Problem Definition
–Key Concept: Neighborhood Matrix (W)
–Related Work
–Our Contributions
Our Approach
Algebraic Analysis
Experimental Results
Summary & Future Work
11
Problem Definition
Given: A sequential solution procedure: “Serial Dense Matrix Approach” for one-dimensional geo-spaces [Li, 1996]
Find: Parallel formulations for multi-dimensional geo-spaces
Objective: Scalable and efficient software; maximize speedup (T_serial / T_parallel)
Constraints:
–Size of W (large vs. small and dense vs. sparse)
–ε ~ N(0, σ²I), IID
–Reasonably efficient parallel implementation in multi-dimensional geo-spaces
–Parallel platform
–Memory limitations
12
Key Concept: Neighborhood Matrix (W)
Given: spatial framework, attributes
W allows other neighborhood definitions
–distance-based
–8-neighbors
[Figure: space + 4-neighborhood; 6th row of binary W; 6th row of row-normalized W]
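The construction of W can be sketched in a few lines. The example below is a minimal illustration, assuming a 4-by-4 grid with cells numbered row by row and the 4-neighbor definition; the function names `neighborhood_matrix` and `row_normalize` are hypothetical and NumPy stands in for the Fortran 77 code used in the study. It prints the 6th row of the binary and of the row-normalized W, the two rows the slide's figure highlights.

```python
import numpy as np


def neighborhood_matrix(rows, cols):
    """Binary 4-neighbor W for a rows-by-cols grid, cells numbered row by row."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            # north, south, west, east neighbors, skipping cells off the grid
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i, rr * cols + cc] = 1.0
    return W


def row_normalize(W):
    """Divide each row by its sum so that every row adds up to 1."""
    return W / W.sum(axis=1, keepdims=True)


W_bin = neighborhood_matrix(4, 4)
W_row = row_normalize(W_bin)
print(W_bin[5])  # 6th row of binary W: four neighbors marked with 1
print(W_row[5])  # 6th row of row-normalized W: the same four entries, each 1/4
```

Other neighborhood definitions (distance-based, 8-neighbors) would change only the offsets tested in the inner loop.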
13
Related Work
Related work: Li, 1996
–Solved 1-D problem
–Used CMSSL linear algebra library on CM-5 supercomputers, which are no longer available
Limitations:
–Not applicable to 2-D, 3-D geo-spaces
–Not portable
14
Our Contributions
Parallel solutions for very large 2-D and 3-D (multi-dimensional) problems
Scalable and efficient software
–Fortran 77
–An application of hybrid parallelism (MPI & OpenMP)
By the final exam we will also have:
Other alternative solutions for SAR
Determining which solution dominates when
Ranking of solutions with respect to:
–ρ and β scaling, which affects accuracy
–Computational complexity
–Memory requirement
15
Outline
Spatial Data-Mining (SDM)
Problem Definition
Our Approach
–Dimensions of Design Space & Details: Implementation Platform, Implemented Algorithm & Operation Count, Parallel Formulation, Load-Balancing
Algebraic Analysis
Experimental Results
Summary & Future Work
16
Our Approach
Parallel formulations of SAR model solutions for multi-dimensional (2-D and 3-D) geo-spaces
Scalability and efficiency
Design space dimensions:
–Implementation Platform
–Implemented (Four) Algorithms
–Parallel Formulation
–Load-Balancing
17
Implementation Platform
Four options for this dimension:
C with OpenMP API
C++ with OpenMP API
Java with OpenMP API
Fortran 77/90/95 with OpenMP API
–Column-major programming language
–Thinking in terms of vectors
18
Implemented Algorithm
Compute Eigen-values (Stage A), due to the log-det term
–Produces dense W neighborhood matrix, forms synthetic data y (optional), makes W symmetric
–Householder transformation: convert dense symmetric matrix to tri-diagonal matrix
–QL transformation: compute all eigen-values of tri-diagonal matrix
[Diagram: Stage A (Compute Eigenvalues) → eigen-values of W → Stage B (Golden Section Search, Calculate ML Function) → Stage C (Least Squares)]
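The three stages can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the study's Fortran 77 code: `np.linalg.eigvalsh` is swapped in for the hand-written Householder and QL routines, the ML objective is a standard textbook form for a row-normalized W, and the names `grid_adjacency`, `golden_section_min`, and `solve_sar` are hypothetical. The synthetic data uses a 20-by-20 grid, much smaller than the problem sizes in the slides, with ρ = 0.203 and β = 3.75 as the generating values.

```python
import numpy as np


def grid_adjacency(rows, cols):
    """Binary, symmetric 4-neighbor adjacency for a rows-by-cols grid."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:                      # east neighbor
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < rows:                      # south neighbor
                A[i, i + cols] = A[i + cols, i] = 1.0
    return A


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a (assumed unimodal) function f on [lo, hi]."""
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                               # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                                     # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def solve_sar(y, X, A):
    """Estimate rho and beta of y = rho*W*y + X*beta + eps, W = row-normalized A."""
    n = len(y)
    deg = A.sum(axis=1)
    W = A / deg[:, None]
    # Stage A: W is similar to the symmetric matrix D^(-1/2) A D^(-1/2),
    # so a symmetric eigen-solver returns the eigenvalues of W.
    lam = np.linalg.eigvalsh(A / np.sqrt(np.outer(deg, deg)))
    # Residuals after projecting out the columns of X.
    Q, _ = np.linalg.qr(X)
    e0 = y - Q @ (Q.T @ y)
    Wy = W @ y
    e1 = Wy - Q @ (Q.T @ Wy)

    def ml(rho):
        logdet = np.sum(np.log(1.0 - rho * lam))  # log-det term via eigenvalues
        e = e0 - rho * e1
        return -2.0 / n * logdet + np.log(e @ e)

    # Stage B: one-dimensional search over rho.
    rho = golden_section_min(ml, -0.99, 0.99)
    # Stage C: least squares for beta with rho fixed.
    beta, *_ = np.linalg.lstsq(X, y - rho * Wy, rcond=None)
    return rho, beta


rng = np.random.default_rng(0)
A = grid_adjacency(20, 20)
n = A.shape[0]
W = A / A.sum(axis=1, keepdims=True)
X = rng.normal(size=(n, 1))
rho_true, beta_true = 0.203, 3.75
eps = 0.1 * rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ np.array([beta_true]) + eps)

rho_hat, beta_hat = solve_sar(y, X, A)
print(rho_hat, beta_hat)
```

The eigenvalues are computed once and reused in every evaluation of `ml`, so Stage B costs only O(n) per step after Stage A, which matches the slides' observation that Stage A dominates the response time.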
19
Operation Counts for Exact SAR Model Soln
Householder transformation, i.e., reduction to tri-diagonal form, is the most complex operation
[Table: computation cost and communication cost]
20
Parallel Formulation
Function partitioning: each processor works on the same data with different instructions
Data partitioning: each processor works on different data with the same instructions
[Diagram: serial run vs. parallel run; under function partitioning, PE #1, PE #2, and PE #3 run Function 1, Function 2, and Function 3 on data sets DS1, DS2, DS3; under data partitioning, PE #1, PE #2, and PE #3 each run Function 1 on DS1, DS2, and DS3 respectively]
21
Data Parallel Formulation
Allows finer granularity parallelism, i.e., loop-level parallelism
Can be implemented by dividing data into chunks:
–Column-wise (for column-major programming languages)
–Row-wise (for row-major programming languages)
–Checker-board-wise
Loop parallelization
–Multiple loops in each stage
–Each box handles a set of columns
Timing each thread in the parallel region
–to see load imbalance via guide77 tool on IBM Regatta
[Diagram: Start Program → Stage 1 (Computing Eigenvalues) → Stage 2 (Golden Section Search) → Stage 3 (Least Squares) → End Program, with synchronization points between serial and parallel regions]
22
Load-Balancing Techniques
The chunk size B << n/p, such as 4, 8, and 16 in our study
Dynamic: threads are assigned chunks on a “first-come, first-do” basis
Affinity: two levels of chunks; threads may execute other threads’ partitions (chunk stealing)
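The "first-come, first-do" rule can be illustrated with a small simulation. This is only a sketch of the scheduling idea, not OpenMP itself: it assumes each chunk's cost is known, ignores scheduling overhead entirely, and uses a hypothetical function name `simulate_dynamic`. The workload of 16 rows with costs 16 down to 1 is an assumption chosen to mimic a triangular loop.

```python
import heapq


def simulate_dynamic(costs, p, chunk):
    """Return the work each of p threads ends up doing under dynamic scheduling.

    Iterations are grouped into chunks of `chunk` consecutive items, and the
    thread that becomes free first takes the next chunk.
    """
    chunks = [sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk)]
    heap = [(0, t) for t in range(p)]   # (time at which thread is free, thread id)
    heapq.heapify(heap)
    busy = [0] * p
    for c in chunks:
        free_at, t = heapq.heappop(heap)
        busy[t] += c
        heapq.heappush(heap, (free_at + c, t))
    return busy


costs = list(range(16, 0, -1))          # triangular workload: 16, 15, ..., 1
small_chunks = simulate_dynamic(costs, p=4, chunk=1)
large_chunks = simulate_dynamic(costs, p=4, chunk=4)
print(small_chunks, max(small_chunks))
print(large_chunks, max(large_chunks))
```

With chunk size equal to n/p each thread receives exactly one chunk and the imbalance of the triangular workload is preserved, which is one way to see why the chunk size is kept well below n/p. In a real run, smaller chunks also mean more scheduling events, a cost this sketch leaves out.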
23
Load-Balancing: Which Data to Partition?
y = ρWy + xβ + ε, where y is n-by-1, W is n-by-n, ρ is 1-by-1, x is n-by-k, β is k-by-1, and ε is n-by-1
Candidates: y, W, x, β, ε
W is partitioned across processors
24
Load-Balancing: How to Partition? (Small Scale Example)
4 processors are used and chunk size can be determined by the user
W is 16-by-16 and partitioned across processors
Only the lower half of W is used, so the units 16, 15, ..., 1 carry decreasing amounts of work
Workload per processor (round-robin with chunk size 1 vs. contiguous):
–P1: 40 vs. 58
–P2: 36 vs. 42
–P3: 32 vs. 26
–P4: 28 vs. 10
[Figure: 16-by-16 row-normalized W with nonzero entries 1/2, 1/3, and 1/4, shown twice: units 16 to 1 assigned to P1 P2 P3 P4 in rotation (round-robin with chunk size 1) and in four contiguous blocks (contiguous)]
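The per-processor workloads quoted on this slide can be reproduced with a short calculation. The sketch below assumes that the work units are numbered 16 down to 1 and that unit k costs k (because only the lower half of W is used); the function name `workloads` and the scheme labels are hypothetical.

```python
def workloads(costs, p, scheme, chunk=1):
    """Total cost assigned to each of p processors under a static scheme."""
    n = len(costs)
    totals = [0] * p
    block = -(-n // p)                       # ceil(n / p), used by "contiguous"
    for i, cost in enumerate(costs):
        if scheme == "round_robin":
            owner = (i // chunk) % p         # deal chunks out in rotation
        elif scheme == "contiguous":
            owner = i // block               # one block of neighbors per processor
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        totals[owner] += cost
    return totals


costs = list(range(16, 0, -1))               # units 16, 15, ..., 1
round_robin = workloads(costs, 4, "round_robin", chunk=1)
contiguous = workloads(costs, 4, "contiguous")
print(round_robin)   # [40, 36, 32, 28]
print(contiguous)    # [58, 42, 26, 10]
```

Round-robin leaves a gap of 12 units between the heaviest and lightest processor, while contiguous leaves a gap of 48, which is the imbalance the slide's figure illustrates.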
25
Outline
Spatial Data-Mining (SDM)
Problem Definition
Our Approach
Algebraic Analysis
–Cost Model
–Ranking of Load-Balancing Techniques
Experimental Results
Summary & Future Work
26
Algebraic Cost Model for Exact SAR Model
ξ_static >> ξ_static-rr > ξ_dynamic > ξ_affinity
[Diagram: serial region, SLB, process local work on each processor, DLB, SLB, serial region]
27
Ranking of Load-Balancing Techniques
The experimental work for the parallel exact SAR model, which took ~2 CPU years, and the algebraic cost model agree with each other.
[Figure: round-robin, contiguous, guided, affinity, and dynamic placed by load balance (poor, medium, good) and synchronization cost (low to high); partial ranking and total ranking from best to worst]
28
Outline
Spatial Data-Mining (SDM)
Problem Definition
Our Approach
Algebraic Analysis
Experiment Design & Results
–Goals & Experiment Design Questions
–Experimental Design-1 with Synthetic Dataset
–Experimental Design-2 with Real Dataset
–Summary of Experiment Results
Summary & Future Work
29
Goals
Hypothesis: The new parallel formulations proposed in this study will outperform the previous parallel implementation in terms of:
–Speedup (S)
–Scalability and efficiency
–Problem size (PS)
–Memory requirement
Experiment Design answers:
1. Which load-balancing method provides best speedup?
2. How does problem size impact speedup?
3. How does chunk size affect speedup?
4. How does # of processors affect speedup?
5. Which data-partitioning method provides best speedup?
30
Experimental Design
Synthetic datasets with sizes: 2500, 6400, 10K (2-D w/ 4-neighbors)
ρ = 0.203, β = 3.75
Predicted model parameters: ρ = 0.2033, β = 3.7485
Parallel (OpenMP+MPI in f77) eigen-value based exact SAR model solution with load-balancing techniques
Performance measurement on IBM Regatta w/ 47.5 GB main memory; 32 1.3 GHz Power4 processors
Model quality (accuracy) for learning data only
[Diagram: Generator/Splitter → Learning Data and Testing Data → Build Model → Evaluate Model → Analyze/Summarize Performance Data]
31
Experimental Results: Effect of Load Balancing
Fixed: PS to 10000
Best results for each class of load-balancing technique are presented
Affinity resulted in the best speedup & efficiency
Guided resulted in the worst speedup & efficiency
32
Experimental Results: Effect of Problem Size
Fixed: 8 processors, load-balancing technique to affinity
Interesting trend for affinity with chunk size 1
Memory limitations
33
Experimental Results: Effect of Chunk Size
Fixed: 8 processors
There is a critical value of the chunk size at which the speedup reaches its maximum.
This value is higher for dynamic scheduling to compensate for the scheduling overhead.
The workload is more evenly distributed at the critical chunk size value.
34
Experimental Results: Effect of # of Processors
Fixed: PS to 10000
Average speedup across all scheduling techniques is 3.43 for the 4-processor case and 5.91 for the 8-processor case
Affinity scheduling shows the best speedup, on average 7 times on 8 processors
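The average speedups above can be turned into parallel efficiencies with the usual definition, efficiency = speedup / number of processors. The snippet below is a small worked example of that arithmetic; the function name `efficiency` is hypothetical and the formula is the standard definition rather than something stated on the slide.

```python
def efficiency(speedup, processors):
    """Parallel efficiency: the fraction of ideal linear speedup achieved."""
    return speedup / processors


avg_eff_4 = efficiency(3.43, 4)   # average over all scheduling techniques, 4 processors
avg_eff_8 = efficiency(5.91, 8)   # average over all scheduling techniques, 8 processors
print(round(avg_eff_4, 4), round(avg_eff_8, 4))
```

Average efficiency drops from roughly 0.86 to roughly 0.74 when going from 4 to 8 processors, in line with the slides' remark that efficiency falls as more PEs are used.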
35
Problem Size 10000 on 16 PEs
Serial Times (sec)
mixed: 2885.4
cont: 2917.1
rr04: 3515.1
dyn1: 3401.7
dyn4: 3326.2
afc4: 3246.9
gui4: 3291.1
gui8: 3198.6
36
Summary of Results
Speed-ups
–Best schemes achieve a speed-up of about 10.5 for 16 processors
–An order of magnitude improvement over serial solutions
Efficiency
–0.6 for best load-balancing scheme with 16 PEs
–0.93 for best load-balancing scheme with 8 PEs
–We did not use machine-specific optimizations
–Trade-off between portability and efficiency
Speed-up Ranking
–Affinity with chunk size 1 gives best speedup
–Static round-robin with chunk size 4 is next
–Contiguous and guided schemes provide least speedup
–Contiguous fails due to non-uniform workload
–Guided and dynamic round robin have higher run-time cost
37
Outline
Spatial Data-Mining (SDM)
Problem Definition
Our Approach
Algebraic Analysis
Experimental Results
Summary & Future Work
–Future Work
–Acknowledgments
38
Future Work
Efficiency
–Identify reasons for inefficiency at larger # of PEs and fix them
–Sparse eigen-value computation needed
Hybrid implementations to use more processors
Other alternative solutions for SAR
–Scaling exact SAR model solution by applying direct sparse algorithms such as sparse LU decomposition
Determining the conditions when a solution dominates, i.e., ranking of all SAR solutions
–ρ and β scaling, which affects accuracy
–Computational complexity
–Memory requirement
Response of solutions to different inputs, i.e., visualization of SAR model solutions by varying:
–Degree of auto-correlation
–Regression coefficients
39
Acknowledgments
AHPCRC
Minnesota Supercomputing Institute
Spatial Database Group Members
ARCTiC Labs Group Members
Dr. Dan Boley
Dr. Sanjay Chawla
Dr. Vipin Kumar
Dr. James LeSage
Dr. Kelley Pace
Dr. Paul Schrater
Dr. Pen-Chung Yew
THANK YOU VERY MUCH
Questions?
40
Operation Counts for Exact SAR Model Soln
41
Data Partitioning & Synchronization
A: Contiguous for rectangular loops & round-robin with chunk size 4
B: Contiguous
C: Contiguous
The arrows are also synchronization points for the parallel solution
There are synchronization points within the boxes as well
[Diagram: Stage A (Compute Eigenvalues) → eigen-values of W → Stage B (Golden Section Search, Calculate ML Function) → Stage C (Least Squares)]
42
Portability
To show portability, run scripts from different machines are illustrated
SGI Origin
–Compilation: f77 -64 -O3 -mp.f
–Run: time ./a.out
IBM SP
–Compilation: xlf_r -O3 -qstrict -q64 -qsmp=omp.f
–Run: time ./a.out
IBM Regatta
–Compilation: xlf_r -O3 -qstrict -q64 -qsmp=omp.f
–Run: time ./a.out
SGI Altix (OpenMP has problems in terms of speedup)
–Compilation: efc -O3 -openmp -Vaxlib.f
–Run: dplace -x6 ./a.out
43
Discussion
Hard to compute all of the eigenvalues of a dense/sparse matrix both in serial and in parallel
–G. H. Golub, H. A. van der Vorst, “Numerical Progress in Eigenvalue Computation in the 20th Century”, Working Document
–J. W. Demmel, “Trading off Parallelism and Numerical Stability”, CRPC-TR92422, 1992, Center for Research on Parallel Computation, Rice University
NAG libraries are LAPACK-based
–Our eigenvalue solver scales similarly to or better than NAG routines
99% of total serial response time is spent computing eigenvalues
44
Long-term Future Work
Spatial Outliers
Markov Random Fields (MRF-BC)
Spatially Aware Wavelets (SAW):
–Using wavelets for spatial prediction
–Incorporating an auto-correlation parameter into the wavelet representation of the observed variable y
–Expressing the log-det term in terms of wavelets
Spatial Co-location Mining
A HP-SDM Toolbox