Strengthening deflation implementation for large scale LQCD inversions Claude Tadonki Mines ParisTech / LAL-CNRS-IN2P3 Review Meeting / PetaQCD LAL / Paris-Sud.

Presentation transcript:

Strengthening deflation implementation for large scale LQCD inversions
Claude Tadonki
Mines ParisTech / LAL-CNRS-IN2P3
Review Meeting / PetaQCD LAL / Paris-Sud University, September 27-28, 2012

PRELIMINARY REMARKS
We are dealing with (very) large-scale linear systems whose principal matrix is implicit.
Because the Dirac matrix is only available in implicit form, we use iterative methods exclusively.
For some parameters (e.g. light masses), the Dirac matrix has a poor condition number.
One way to tackle ill-conditioned cases is to use the so-called deflation method.
The deflation method attempts to « isolate » the low-modes space.
The deflation method is a two-stage iterative method.
To be efficient, both qualitatively and quantitatively, the deflation method needs good calibration and a lot of programming effort.
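For reference, the two stages can be written down in the notation commonly used for deflated solvers in lattice QCD. This is a sketch, not taken from the slides: the symbols $W$ (matrix whose columns are the deflation vectors), $A$ (the little operator), $P_L$, $P_R$ and $\chi$ are assumptions introduced here, and $A$ is assumed to be invertible.

```latex
\begin{align}
  A   &= W^\dagger D W, \\
  P_L &= 1 - D W A^{-1} W^\dagger, \qquad
  P_R  = 1 - W A^{-1} W^\dagger D, \qquad
  P_L D = D P_R, \\
  P_L D \chi &= P_L b, \qquad
  x = P_R \chi + W A^{-1} W^\dagger b .
\end{align}
```

Substituting back gives $D x = P_L D \chi + (1 - P_L) b = b$. The outer iteration solves the projected system for $\chi$, and every application of $P_L$ requires a solve with the small matrix $A$, which is the second stage.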

TECHNICAL OBSERVATIONS
We need to approximate the (near-)kernel subspace.
Our approximation of the low-modes space concerns both the dimension and the basis.
An over- or under-dimensioned approximation degrades the global performance.
The quality of the approximations of the «low» eigenvectors also matters (despite the common belief that it is not important).
The second-stage subsystem is also solved iteratively.
The number of iterations of the global system and that of the inner system are correlated.
With large gauge configurations and low masses, the number of iterations is huge and each iteration is costly, so we are facing a large-scale computation problem.

APPROXIMATION OF THE LOW MODES SUBSPACE (also called DEFLATION SUBSPACE)
The dimension of the deflation subspace is given as a user parameter (N_s).
We choose to compute each of the N_s vectors by targeting a real eigenvector (a heavy computation).
The more accurate the deflation subspace, the shorter the path to the solution of the global system is expected to be. This is true in theory, but in practice the behaviour varies.
Since our system is large, we need a way to extend the deflation basis at a reasonable cost.
We choose to extend the deflation subspace by means of canonical block projections.
The computation of the deflation subspace basis (i.e. N_s approximated eigenvectors) is very costly.
A commonly accepted argument for paying the price of a good deflation subspace is that it will serve for several inversions.
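The extension by canonical block projections can be illustrated with a small sketch. The slides do not spell out the construction, so the following assumes the usual one: each global vector is restricted to each block (zero elsewhere) and the pieces are orthonormalised within the block. The function names and the 1-D contiguous block layout are hypothetical and chosen only for illustration.

```python
import math

def inner(u, v):
    """Hermitian inner product <u, v> (conjugate-linear in u)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def block_extend(vectors, n_blocks):
    """Split each global vector into n_blocks contiguous pieces and
    orthonormalise the pieces block by block (Gram-Schmidt).

    Hypothetical helper: a 1-D, contiguous block layout is assumed.
    Returns a list of global vectors, each supported on a single block.
    """
    n = len(vectors[0])
    assert n % n_blocks == 0, "lattice size must be divisible by n_blocks"
    size = n // n_blocks
    basis = []
    for b in range(n_blocks):
        lo, hi = b * size, (b + 1) * size
        block_basis = []
        for v in vectors:
            # Canonical projection: keep the entries of block b, zero elsewhere.
            w = [v[i] if lo <= i < hi else 0j for i in range(n)]
            # Remove components along the vectors already kept in this block.
            for q in block_basis:
                c = inner(q, w)
                w = [wi - c * qi for wi, qi in zip(w, q)]
            norm = math.sqrt(inner(w, w).real)
            if norm > 1e-12:  # drop pieces that are numerically dependent
                block_basis.append([wi / norm for wi in w])
        basis.extend(block_basis)
    return basis

# Two global vectors on 8 sites, extended over 2 blocks.
vectors = [[complex(1, 0)] * 8, [complex(i, 0) for i in range(8)]]
extended = block_extend(vectors, 2)
```

With N_s input vectors and N_b blocks this yields at most N_s × N_b basis vectors for the cost of N_s eigenvector computations, and the span of the extended basis contains every original vector.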

BLOCK DECOMPOSITION AND THE LITTLE DIRAC
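The content of this slide is not preserved in the transcript. As a hedged illustration only, the little Dirac operator is commonly defined as the restriction of the Dirac operator to the deflation basis, A[k][l] = <w_k, D w_l>, which can be built from operator-vector products alone. The sketch below assumes that definition; `apply_toy_operator` is a hypothetical stand-in (a 1-D periodic Laplacian plus a mass term), not the Dirac operator.

```python
def inner(u, v):
    """Hermitian inner product <u, v> (conjugate-linear in u)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def little_dirac(apply_d, basis):
    """Return A with A[k][l] = <w_k, D w_l>, using only products D*v."""
    d_basis = [apply_d(w) for w in basis]  # one operator call per basis vector
    return [[inner(wk, dwl) for dwl in d_basis] for wk in basis]

def little_dirac_times(a, coeffs):
    """Product of the little operator with a coefficient vector."""
    return [sum(a_kl * c for a_kl, c in zip(row, coeffs)) for row in a]

def apply_toy_operator(v, mass=0.1):
    """Toy stand-in for D: 1-D periodic Laplacian plus a mass term."""
    n = len(v)
    return [(2.0 + mass) * v[i] - v[(i - 1) % n] - v[(i + 1) % n]
            for i in range(n)]

# Two orthonormal, block-constant vectors on 8 sites (2 blocks of 4 sites).
w0 = [0.5] * 4 + [0.0] * 4
w1 = [0.0] * 4 + [0.5] * 4
A = little_dirac(apply_toy_operator, [w0, w1])
y = little_dirac_times(A, [1.0, 1.0])
```

The matrix A has one row and one column per basis vector, so it is tiny compared with the full operator and can be stored explicitly.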

TECHNICAL REMARKS ABOUT BLOCK DECOMPOSITION
Initial status of the block decomposition:
It was performed only on the z axis (i.e. b = (0, 0, 0, k)).
The number of blocks was fixed (and hard-coded) to 2.
Block parameters apply to one computing node, so the total number of blocks is proportional to the number of processors.
We cannot increase the number of blocks as desired unless we use more processors.
More processors means more blocks, which is not necessarily wanted.
The consequences of the two previous statements are:
We are missing other decomposition possibilities.
We need a generic block decomposition, i.e. multidimensional and with no restriction on the number of blocks per axis.

OUR WORK ON THE BLOCK DECOMPOSITION
We have implemented a generic block decomposition (multidimensional and with no restriction on the number of blocks per axis) within the tmLQCD package.
For this purpose, we adapted the following modules:
The input parameters list and associated I/O routines.
The initialization routines.
The extension of the initial pseudo-eigenvectors using block decomposition.
The preparation of the little Dirac components.
The checking routines (to make sure we have good projectors).
The computation of the little Dirac and its product by a vector.
MPI communication according to the decomposition.
A mechanism to store (and later read) the generated initial pseudo-eigenvectors.
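To make the idea of a generic decomposition concrete, here is a minimal sketch of the site-to-block mapping for an arbitrary number of blocks per axis. It is not the tmLQCD code: the function names, the parameter names and the (t, x, y, z) axis order are assumptions made for illustration, and each local extent is assumed to be divisible by its block count.

```python
from itertools import product

def block_of_site(coord, local_dims, blocks_per_axis):
    """Return the linear block index of a lattice site.

    coord, local_dims and blocks_per_axis are tuples with one entry
    per axis, assumed here to be ordered (t, x, y, z).
    """
    index = 0
    for x, n, nb in zip(coord, local_dims, blocks_per_axis):
        if n % nb != 0:
            raise ValueError("each local extent must be divisible by its block count")
        # Position of the site's block along this axis, folded into a linear index.
        index = index * nb + x // (n // nb)
    return index

def block_sites(local_dims, blocks_per_axis):
    """Group all local sites by block index."""
    n_blocks = 1
    for nb in blocks_per_axis:
        n_blocks *= nb
    groups = [[] for _ in range(n_blocks)]
    for coord in product(*(range(n) for n in local_dims)):
        groups[block_of_site(coord, local_dims, blocks_per_axis)].append(coord)
    return groups

# The initial setting (2 blocks along z only) and a multidimensional one.
z_only = block_sites((4, 4, 4, 4), (1, 1, 1, 2))
generic = block_sites((4, 4, 4, 4), (2, 2, 1, 2))
```

In this sketch the number of blocks per node is the product of the per-axis counts, so it can be changed without changing the number of processors.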

EXPERIMENTAL RESULTS

PERSPECTIVES
We are considering the following perspectives:
Having a block decomposition that is not correlated with the number of computing nodes.
Trying other forms of decomposition (not necessarily canonical projections).
Considering direct approaches for solving the little Dirac, since it is a constant matrix.
Trying a factorization approach for the little Dirac (LU for instance).
Studying the possibility of a dynamic reconfiguration of the deflation parameters.
Extending the timing (and memory) profile report to include the part from the deflation.
Examining carefully the memory footprint of the deflation part.
Seeing what we can really do for the deflation subspace generation.
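The factorization perspective can be sketched as follows: because the little Dirac matrix does not change between outer iterations, it can be factorized once and each later inner solve reduces to two triangular substitutions. This is a dense, pure-Python illustration with hypothetical function names, not the project's implementation; a production code would more likely call an optimized library routine.

```python
def lu_factor(a):
    """LU factorization with partial pivoting. Returns (lu, perm).

    lu holds the unit lower factor below the diagonal and the upper
    factor on and above it; row k of lu corresponds to row perm[k] of a.
    """
    n = len(a)
    lu = [list(row) for row in a]
    perm = list(range(n))
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if abs(lu[pivot][k]) == 0:
            raise ZeroDivisionError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm

def lu_solve(lu, perm, b):
    """Solve a x = b using a factorization returned by lu_factor."""
    n = len(lu)
    y = [b[p] for p in perm]          # apply the row permutation to b
    for i in range(n):                # forward substitution (unit lower factor)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):      # back substitution (upper factor)
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y

# Factor once, then reuse the factors for several right-hand sides.
little = [[0.6, -0.5], [-0.5, 0.6]]
factors, order = lu_factor(little)
x1 = lu_solve(factors, order, [0.1, 0.1])
x2 = lu_solve(factors, order, [1.1, -1.1])
```

The factorization costs O(n³) once and each solve O(n²), which is attractive only while the little matrix stays small enough to be stored densely.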
