Next: Microkernel Technology Support to Parallel Applications
Up: Research Subfields
Previous: ParallelizationData Distribution and Loop Transformations
Index: Contents Page


Algorithms and Architectures for Linear Algebra and Sparse Matrix computations

We are working in multilevel block algorithms applied to dense and general sparse matrices. First of all, it has been studied, proposed, developed and evaluated multilevel block algorithms. The main goal of multilevel block algorithms is to optimize the locality exploitation of data in linear algebra operations. The target architectures of multilevel block algorithms are those with several levels in the memory hierarchy. The different forms of the algorithms have been classified and we have proved that one of them, that we call MOB, is optimal in performance and can be easily designed. The performance improvements obtained with the MOB algorithms are outstanding.

In the design of the multilevel block algorithms, in addition to the conflicts that appear in cache lines, we consider TLB misses and page faults. The evaluation of MOB algorithms is done theoretically on a memory hierarchy model and is validated with trace-driven simulations and experimental measurements on two new computers that use high performance microprocessors.

  • Juan J. Navarro, Elena García, Josep-L. Larriba-Pey, and Toni Juan. Block Algorithms for Sparse Matrix Computations on High Performance Workstations. In Proccedings of the 10th ACM International Conference on Supercomputing (ICS'96) , pp. 301-308, Philadelphia (USA), May 1996.

  • Juan J. Navarro, Toni Juan, and Tomás Lang. MOB Forms: A Class of Multilevel Block Algorithms for Dense Linear Algebra Operations. In Proccedings of the 8th ACM International Conference on Supercomputing (ICS'94) , pp. 325-334, Manchester (United Kingdom), July 1994.

  • Juan J. Navarro, Toni Juan, Mateo Valero, José M. Llabería, and Tomás Lang. Multilevel Orthogonal Blocking for Dense Linear Algebra Computations. IEEE Technical Committee on Computer Architecture Newsletter , pp. 10-14, 1993.

For the case of sparse linear systems, parallel algorithms have been implemented.

  • José M. Cela, Jesús Labarta, and Juan J. Navarro. Performance on Distributed Memory Multicomputers of Domain Decomposition Solvers. In 7th SIAM Conference on Parallel Processing for Scientific Computing , pp. 391-392, San Francisco (USA), February 1995.

  • José M. Cela, Albert Pujolar, and Juan J. Navarro. Standard PDEs Solvers vs. Domain Decomposition Solvers on Vector Multiprocessors. In Proceedings of the IASTED International Symposium on Applied Informatics , pp. 230-235, Annecy (France), May 1993.

The basic approach of this piece of work is to extract the coarse grain parallelism that those applications have. In order to do so, domain decomposition techniques have been used and a message passing programming model on a distributed memory multicomputer have been assumed. All this work is related to project ESPRIT-6753 (IDENTIFY) in collaboration with the British Rutherford Appleton Laboratory and the French company Bertin et Cie. In particular, the developed software can be applied to Fluid Dynamics problems. The role of our research group in the project was to implement the linear solvers for the project. Some of the methods developed are: i) Element by element methods, ii) Iterative methods on the global matrix, iii) Iterative methods on the Schur Complement matrix. In addition, we have also implemented the corresponding software for the assembly of the matrices which is necessary for methods (ii) and (iii). This latter software allows the assembly both from a finite volume discretization and from a finite element discretization.

With the objective of improving the efficiency of the parallel solvers, different reordering algorithms have been included in the code. Those algorithms have the aim of reducing the band and increment the parallelism. Some of those algorithms process elements within submatrices and others process submatrices within the global matrix.

Within the field of sparse linear systems, one can find banded linear systems. This type of problems appears in the discretization of partial differential equations. The size of the band can vary from one only subdiagonal (bidiagonal matrices) to tens of diagonals. The work developed has focussed on three different parts. In a first part, algorithms have been studied for narrow banded systems (bidiagonal and tridiagonal). Those methods have been studied for their execution on vector processors in:

  • Josep-L. Larriba-Pey, Juan J. Navarro, Angel Jorba, and Oriol Roig. Review of general and Toeplitz Bidiagonal Solvers. Parallel Computing , vol. 22, no. 8, pp. 1091-1125, 1996.

  • Josep-L. Larriba-Pey, Juan J. Navarro, Angel Jorba, and Oriol Roig. A generalized vision of some parallel bidiagonal systems solvers. In Proccedings of the 8th ACM International Conference on Supercomputing (ICS'94) , pp. 404-411, Manchester (United Kingdom), July 1994.

In the second part, the methods studied above have been analyzed for strictly diagonal dominant systems of equations. New methods have been proposed for the solution of those systems. The study has a mathematical component to prove the accuracy of the methods and an architectural component to study the methods on specialized architectures:

  • Josep-L. Larriba-Pey, Angel Jorba, and Juan J. Navarro. A generalized criterion for the early termination of R-Cyclic Reduction and Divide and Conquer for recurrences. In 7th SIAM Conference on Parallel Processing for Scientific Computing , pp. 448-453, San Francisco (USA), February 1995.

  • Josep-L. Larriba-Pey, Angel Jorba, and Juan J. Navarro. A Parallel Tridiagonal Solver for Vector Uniprocessors. In 6th SIAM Conference on Parallel Processing for Scientific Computing , pp. 590-597, Norfolk, VA (USA), March 1993.

  • Josep-L. Larriba-Pey, Angel Jorba, and Juan J. Navarro. Overlapped Partitions versus Tricyclic Reduction: Races on the C-3400. In Convex User Group Worlwide Conference , pp. 150-155, Richardson, TX (USA), March 1993.

  • Josep-L. Larriba-Pey, Angel Jorba, and Juan J. Navarro. Spike Algorithm with savings for strictly diagonal dominant tridiagonal systems. Microprocessing and Microprogramming , vol. 39, pp. 125-128, October 1993.

Finally, a study of the algorithms for different applications has been carried out:

  • Josep-L. Larriba-Pey, Juan J. Navarro, and Angel Jorba. Vector and parallel interpolation by Natural Cubic Splines and B-Splines. In Proceedings of the Parallel and Distributed Processing Symposium , pp. 385-392, Braga (Portugal), January 1996.

  • Josep-L. Larriba-Pey, M. Mascagni, Angel Jorba, and Juan J. Navarro. An Analysis of the Parallel Computation of Arbitrarily Branched Cable Neuron Models. In 7th SIAM Conference on Parallel Processing for Scientific Computing , pp. 373-378, San Francisco (USA), February 1995.

Next: Microkernel Technology Support to Parallel Applications
Up: Research Subfields
Previous: ParallelizationData Distribution and Loop Transformations
Index: Contents Page


 

Home | Presentation | Studies | Research | Research Centers | News Top

Last update: November 12, 2004
Copyright © 2000-2005 Departament d'Arquitectura de Computadors