Next: Memory Management and Organization
Up: Research Subfields
Previous: Research Subfields
Index: Contents Page


Processor architecture

Branch instructions. Branch instructions severely limit the exploitation of instruction-level parallelism in pipelined processors. We have proposed a mechanism for executing them effectively: branch instructions are executed in parallel with the rest of the instructions, so they require no additional time.
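The effect of removing the branch cost can be illustrated with a toy cycle count. This is only a minimal sketch under strong assumptions, not the mechanism described in the papers below: it assumes an idealized single-issue pipeline, a trace encoded as a list of "op" and "br" markers, and a fixed stall penalty per branch. The function names and the trace encoding are hypothetical.

```python
def conventional_cycles(trace, branch_penalty):
    """Idealized single-issue pipeline (assumption): every instruction
    uses one issue slot and every branch adds a fixed stall penalty."""
    return sum(1 + (branch_penalty if kind == "br" else 0) for kind in trace)


def parallel_branch_cycles(trace):
    """Toy model of branches executed in parallel with other instructions:
    only non-branch instructions consume issue slots."""
    return sum(1 for kind in trace if kind != "br")


# Hypothetical trace: six ordinary instructions and two branches.
trace = ["op", "op", "br", "op", "br", "op", "op", "op"]
baseline = conventional_cycles(trace, branch_penalty=2)
overlapped = parallel_branch_cycles(trace)
```

In this model the gap between the two counts grows with both the branch frequency and the per-branch penalty, which is why the cost of branches matters more as pipelines get deeper.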

  • Antonio González. Design and Evaluation of an Instruction Cache for Reducing the Cost of Branches. Performance Evaluation, vol. 20, no. 1-3, pp. 83-96, May 1994.

  • Antonio González. A Survey of Branch Techniques in Pipelined Processors. Microprocessing and Microprogramming, vol. 36, no. 5, pp. 243-257, September 1993.

  • Antonio González and José M. Llabería. Reducing Branch Delay to Zero in Pipelined Processors. IEEE Transactions on Computers, vol. 42, no. 3, pp. 363-371, March 1993.

  • Antonio González and José M. Llabería. Instruction Fetch Unit for Parallel Execution of Branch Instruction. In Proceedings of the 3rd ACM International Conference on Supercomputing (ICS'89), pp. 417-426, Crete (Greece), June 1989.

  • Antonio González, José M. Llabería, and Jordi Cortadella. A Mechanism for Reducing the Cost of Branches in RISC Architectures. Microprocessing and Microprogramming, vol. 24, no. 1-5, pp. 565-572, August 1988.

  • Antonio González, José M. Llabería, and Jordi Cortadella. Zero-Delay Cost Branches in RISC Architectures. In Proceedings of the IASTED International Symposium on Applied Informatics, pp. 24-27, Grindelwald (Switzerland), February 1988.

Vector processors. Our research on vector architectures has focused on the memory pipeline utilization problem. We started by developing a set of tracing and simulation tools able to extract accurate performance data from vectorized programs. This data clearly showed that the latency tolerance of vector architectures is not as good as expected. Our research therefore focused on three techniques aimed at improving vector performance:

  • decoupling
  • out-of-order execution and register renaming
  • multithreading

The first two target single-application performance, while the third targets throughput. We have looked at decoupled vector architectures and have shown that they can greatly improve performance and that they tolerate main memory latency much better than a traditional vector architecture. We have also looked at out-of-order execution and register renaming for vector machines. Again, the performance advantage was substantial, and the technique also tolerated large memory latencies. Moreover, register renaming allowed us to introduce precise exceptions in a vector machine, which in turn allows for the easy implementation of virtual memory. The multithreaded technique executes several independent vector programs on a machine that has four sets of vector registers and shares all other functional units. The performance of this type of architecture is also much better than that of a traditional vector machine, and the technique is able to almost saturate the memory port.
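Why decoupling helps with latency can be seen in a very crude timing model. The sketch below is an illustration under stated assumptions, not the simulator or the results from the papers that follow: it assumes a loop of n vector iterations, each with one load and one compute step, a coupled machine that waits out the full memory latency on every iteration, and a decoupled machine whose memory stream runs ahead so that the latency is paid only once. All function names and parameter values are hypothetical.

```python
def coupled_time(n, mem_latency, compute):
    """Assumed coupled machine: every iteration waits for its load
    to return before the compute step can start."""
    return n * (mem_latency + compute)


def decoupled_time(n, mem_latency, compute, mem_busy):
    """Assumed decoupled machine: the memory stream runs ahead, so the
    latency is exposed once and the slower stream then sets the pace."""
    return mem_latency + n * max(compute, mem_busy)


# Hypothetical parameters: 10 iterations, 8 compute cycles and
# 4 memory-port cycles per iteration, at two memory latencies.
slow_coupled = coupled_time(10, 100, 8)
fast_coupled = coupled_time(10, 50, 8)
slow_decoupled = decoupled_time(10, 100, 8, 4)
fast_decoupled = decoupled_time(10, 50, 8, 4)
```

In this model, raising the latency by 50 cycles adds 50 cycles per iteration to the coupled machine but only 50 cycles in total to the decoupled one, which is the qualitative behaviour the text describes as better latency tolerance.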

  • Roger Espasa and Mateo Valero. Multithreaded Vector Architectures. In Proceedings of the Third International Symposium on High Performance Computer Architecture (HPCA'97), San Antonio, TX (USA), February 1997.

  • Roger Espasa and Mateo Valero. Decoupled Vector Architectures. In Proceedings of the Second International Symposium on High Performance Computer Architecture (HPCA'96), pp. 281-290, San José, CA (USA), February 1996.


Last update: February 2, 2001
Copyright © 2000-2005 Departament d'Arquitectura de Computadors