Next: Algorithms and Architectures for Linear Algebra and Sparse Matrix computations
Up: Research Subfields
Previous: Instruction-Level Parallelism
Index: Contents Page


Parallelization, Data Distribution and Loop Transformations

In this area, the group first focused on the exploitation of fine-grain parallelism in loops with tight recurrences for both shared and distributed-memory multiprocessor systems. The static estimation of loop performance metrics and the implementation of our proposals (Parafrase-2 and Alliant FX- 8) have also been tackled. The most relevant publications are the following:

  • Cristina Barrado, Jesús Labarta, Eduard Ayguadé, and Mateo Valero. Generation a Periodic Pattern for VLIW. In 5th Workshop on Compilers for Parallel Computers , pp. 485-502, Málaga (Spain), June 1995.

  • Cristina Barrado, Jesús Labarta, Eduard Ayguadé, and Mateo Valero. Automatic Generation of Loop Scheduling for VLIW. In IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'95) , pp. 306-309, Limassol (Cyprus), June 1995.

  • Cristina Barrado, Jesús Labarta, and Patricia Borensztejn. Implementation of GTS. In Proceedings of Parallel Architectures and Languages Europe (PARLE'94) , pp. 565-576, Athens (Grece), June 1994. Lecture Notes in Computer Science #817.

  • Cristina Barrado, Jesús Labarta, and Eduard Ayguadé. An Efficient Scheduling of DOACROSS Loops. In Proceedings of Parallel and Distributed Computing and Systems , pp. 303-307, Washington DC (USA), October 1994.

  • Patricia Borensztejn, Jesús Labarta, and Cristina Barrado. Measures of Parallelism at Compile Time. In Euromicro Workshop on Parallel and Distributed Processing , pp. 253-258, Gran Canaria (Spain), January 1993.

  • Jesús Labarta, Eduard Ayguadé, Jordi Torres, Mateo Valero, and José M. Llabería. Balanced Loop Partitioning Using GTS , chapter 19, pp. 298-312. Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science #589. Springer-Verlag, 1992.

  • Eduard Ayguadé, Jesús Labarta, Jordi Torres, José M. Llabería, and Mateo Valero. Parallelism Evaluation and Partitioning of Nested Loops for Shared Memory Multiprocessors , chapter 11, pp. 220-242. Advances in Languages and Compilers for Parallel Computing, Research Monographs in Parallel and Distributed Computing. Pitman/MIT Publishers, 1991.

  • Eduard Ayguadé, Jesús Labarta, Jordi Torres, José M. Llabería, and Mateo Valero. Nested-Loop Partitioning for Shared-Memory Multiprocessor Systems. In 2nd Workshop on Compilers for Parallel Computers , pp. 378-385, Paris (France), December 1990.

More recently, the group focused on the automatic data distribution and loop parallelization for both distributed and coherent cache-based shared memory multiprocessor systems (like Convex Exemplar or SGI Power Challenge). The analysis is done based on an estimation of the trade-offs between data movement and reduction of execution time due to parallel execution. DDT (Data Distribution Tool, on top of ParaScope) is the result of this research and generates directives either from the HPF, PFA from Silicon or Exemplar from Convex programming models. The most relevant publications are the following:

  • Jordi Garcia, Eduard Ayguadé, and Jesús Labarta. Using a 0-1 Integer Programming Model for Automatic Static Data Distribution. Parallel Processing Letters , vol. 6, no. 1, pp. 159-171, 1996.

  • Eduard Ayguadé, Jesús Labarta, Jordi Garcia, Merce Girones, and Mateo Valero. Analyzing Reference Patterns in Automatic Data Distribution Tools. International Journal of Parallel Programming , vol. 23, no. 6, pp. 515-535, December 1995.

  • Jordi Garcia, Eduard Ayguadé, and Jesús Labarta. A Novel Approach towards Automatic Data Distribution . In Supercomputing'95 , San Diego (USA), December 1995. Also presented at Workshop on Automatic Data Layout and Performance Prediction, Rice University, April 1995.

  • Eduard Ayguadé, Jordi Garcia, Merce Girones, M. Luz Grande, and Jesús Labarta. Data Redistribution in an Automatic Data Distribution Tool . In 8th International Workshop on Programming Languages and Compilers for Parallel Computing , pp. 271-287, Columbus, Ohio (USA), August 1995. Lecture Notes in Computer Science #1033.

  • Eduard Ayguadé, Jordi Garcia, Merce Girones, Jesús Labarta, Jordi Torres, and Mateo Valero. Detecting and Using Affinity in an Automatic Data Distribution Tool . In 7th International Workshop on Programming Languages and Compilers for Parallel Computing , pp. 61-75, Ithaca, NY (USA), August 1994. Lecture Notes in Computer Science #892.

  • Eduard Ayguadé, Jesús Labarta, Jordi Garcia, Merce Girones, and Mateo Valero. Detecting Affinity for Automatic Data Distribution . In 2nd International Workshop on Massive Parallelism: hardware, Software and Applications , pp. 32-46, Capri (Italy), October 1994.

  • Eduard Ayguadé, Jesús Labarta, Jordi Garcia, Merce Girones, and Mateo Valero. A Study of Data Sets and Affinity in the Perfect Club. In 4th Workshop on Compilers for Parallel Computers , Delft (The Netherlands), December 1994.

We have also proposed a methodology based on linear loop transformations to systematically transform sequential loops in order to enable their parallelization. The methodology is targeted towards distributed memory multicomputers starting from the sequential program.

  • Agustín Fernández, José M. Llabería, Juan J. Navarro, and Miguel Valero-García. Transformation of Systolic Algorithms for Interleaving Partitions. In Application Specific Array Processors (ASAP'91) , pp. 56-71, Platja d'Aro (Spain), September 1991.

  • Agustín Fernández, José M. Llabería, Juan J. Navarro, and Miguel Valero-García. Interleaving Partitions of Systolic Algorithms for Programming Distributed Memory Multiprocessors. In The 2nd European Distributed Memory Computing Conference EDMCC2 , pp. 90-99, Munich (Germany), April 1991. Lecture Notes in Computer Science #487.

The use of different linear loop transformations oriented towards the exploitation of parallelism for shared memory multiprocessors has also been subject of research. Either the use of an alignment component or a different unimodular transformation for each statement in the loop body have been the main contributions of this work, published in the following papers:

  • Eduard Ayguadé, Peter Knijnenburg, and Jordi Torres. Multi-Transformations: Definition and Usefulness. In XV International Conference of the Chilean Computer Science Society , pp. 37-47, Arica (Chile), November 1995.

  • Jordi Torres, Eduard Ayguadé, Jesús Labarta, and Mateo Valero. Loop Parallelization: Revisiting Framework of Linear Loop Transformations. In Proceedings of the Parallel and Distributed Processing Symposium , pp. 420-427, Braga (Portugal), January 1996. Also in 5th Workshop on Compilers for Parallel Computers, Málaga (Spain), June 1995.

  • Eduard Ayguadé and Jordi Torres. Partitioning the Statement per Iteration Space Using Non-singular Matrices. In Proccedings of the 7th ACM International Conference on Supercomputing (ICS'93) , pp. 407-415, Tokyo (Japan), July 1993.

  • Jordi Torres, Eduard Ayguadé, Jesús Labarta, and Mateo Valero. ALIGN and DISTRIBUTE-Based Linear Loop Transformations. In 6th International Workshop on Programming Languages and Compilers for Parallel Computing , pp. 321-339, Portland, OR (USA), August 1993. Lecture Notes in Computer Science #768.

Also we are working on a new unified method for simultaneously tiling the register and cache levels of the memory hierarchy. We will focus on the code transformation phase of tiling. Our algorithm uses strip-mining and loop interchange on all memory hierarchy levels to determine the tiles as usual, and, afterwards, and due to the special characteristics of the register level, we apply index set splitting, fully unrolling and unnecessary load/store elimination. We propose a technique to perform the loop interchange in non-convex iteration spaces that computes the loop bounds exactly and we also present an order in which to perform index set splitting that guaranties that each loop in the nest will be processed only once and also avoids code explosion.

The performance of the memory hierarchy obtained with our method is substantially higher than the performance obtained by commercial preprocessors capable of restructuring code to exploit the memory hierarchy. The main contributions of this work are published in the following papers:


Next: Algorithms and Architectures for Linear Algebra and Sparse Matrix computations
Up: Research Subfields
Previous: Instruction-Level Parallelism
Index: Contents Page


 

Inici | Presentació | Docència | Recerca | Centres de Recerca | Novetats Inici

Darrera actualització: 12 de novembre del 2004
Copyright © 2000-2005 Departament d'Arquitectura de Computadors