Manycore Architectures and SIMD Optimizations for High Performance Computing

  1. Horro Varela, Marcos
Supervised by:
  1. Juan Touriño (Co-director)
  2. Gabriel Rodríguez (Co-director)

Defence university: Universidade da Coruña

Defence date: 10 May 2022

Committee:
  1. Javier Díaz Bruguera (Chair)
  2. Margarita Amor (Secretary)
  3. Arturo González Escribano (Committee member)
Department:
  1. Computer Engineering

Type: Thesis

Teseo: 722291

Abstract

For the past fifty years, computer architecture has been driven by the ability to etch more transistors onto a single die, following Moore’s Law. This trend has changed in the past decade. Parallelism is now a key factor in modern designs: on the hardware side through scaling the number of cores, and on the software side through exploiting the capabilities available in the architecture, with an emphasis on SIMD units. In this thesis we focus on two orthogonal dimensions: the analysis and optimization of coherence traffic in modern manycores, and the development of SIMD optimizations. For the first dimension, we develop static and dynamic techniques for enhancing core-to-data affinity in manycores featuring mesh interconnection networks. Our approach reduces contention on these meshes by improving data locality according to the physical layout. For the second dimension, we develop a source-to-source compiler for vectorizing codes with irregular access patterns. We present two main contributions: strategies for gathering data from non-contiguous memory addresses, and the fusion of independent reductions. We have also developed an SMT-based system that generates alternatives for packing random operands from memory based on the host ISA, and a profiling framework for characterizing platforms. Our evaluation shows promising speedups when applying these SIMD optimizations to such codes.
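
The two SIMD ideas named in the abstract, gathering operands from non-contiguous addresses and fusing independent reductions, can be illustrated with a small scalar emulation. This is only a sketch under stated assumptions, not the output of the thesis's compiler: it models a 4-lane register as a Python list, and the names `gather`, `hadd` and `fused_reduce` are hypothetical, chosen for illustration. The pairwise add is modelled on the behaviour of a 128-bit horizontal-add instruction.

```python
# Scalar emulation of 4-lane SIMD registers using plain lists.
# All function names are hypothetical and exist only for this illustration.

LANES = 4


def gather(memory, indices):
    """Pack elements at arbitrary (non-contiguous) indices into one 'register'."""
    assert len(indices) == LANES
    return [memory[i] for i in indices]


def hadd(a, b):
    """Pairwise horizontal add of two registers:
    [a0+a1, a2+a3, b0+b1, b2+b3]."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def fused_reduce(r0, r1, r2, r3):
    """Reduce four independent partial-sum registers together.

    Instead of reducing each register to a scalar separately, three
    horizontal adds produce one register whose lane k holds sum(rk).
    """
    t0 = hadd(r0, r1)    # [r0 lanes 0+1, r0 lanes 2+3, r1 lanes 0+1, r1 lanes 2+3]
    t1 = hadd(r2, r3)    # same layout for r2 and r3
    return hadd(t0, t1)  # [sum(r0), sum(r1), sum(r2), sum(r3)]


# Example: four independent reductions over irregularly placed operands.
memory = list(range(0, 160, 10))  # 16 values: 0, 10, ..., 150
rows = [[0, 5, 9, 14], [1, 2, 3, 4], [15, 0, 7, 8], [6, 6, 6, 6]]
registers = [gather(memory, idx) for idx in rows]
sums = fused_reduce(*registers)
```

Here `sums` holds the four row totals in a single register, so the results could be written back with one store rather than four scalar ones.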