Diego Andrade Canosa
Associate Professor
Basilio Bernardo Fraguela Rodríguez
Full Professor
Publications by the researcher in collaboration with Basilio Bernardo Fraguela Rodríguez (40)
2024
-
STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning
IEEE Access, Vol. 12, pp. 70581-70599
2023
-
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
2022
-
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs
Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
-
The New UPC++ DepSpawn High Performance Library for Data-Flow Computing with Hybrid Parallelism
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
2021
-
A software cache autotuning strategy for dataflow computing with UPC++ DepSpawn
Computational and Mathematical Methods
-
High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn
Journal of Supercomputing, Vol. 77, No. 7, pp. 7676-7689
-
OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA
Mathematics, Vol. 9, No. 17
-
ScalaParBiBit: scaling the binary biclustering in distributed-memory systems
Cluster Computing, Vol. 24, No. 3, pp. 2249-2268
2020
-
An automatic optimizer for heterogeneous devices
Future Generation Computer Systems, Vol. 106, pp. 572-584
2019
-
A Fast Solver for Large Tridiagonal Systems on Multi-Core Processors (Lass Library)
IEEE Access, Vol. 7, pp. 23365-23378
-
Easy dataflow programming in clusters with UPC++ DepSpawn
IEEE Transactions on Parallel and Distributed Systems, Vol. 30, No. 6, pp. 1267-1282
2018
-
Guiding the Optimization of Parallel Codes on Multicores Using an Analytical Cache Model
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
-
Heterogeneous distributed computing based on high-level abstractions
Concurrency Computation
2017
-
Facilitating the development of stencil applications using the Heterogeneous Programming Library
Concurrency Computation, Vol. 29, No. 12
-
High productivity multi-device exploitation with the Heterogeneous Programming Library
Journal of Parallel and Distributed Computing, Vol. 101, pp. 51-68
2016
-
Towards a High Level Approach for the Programming of Heterogeneous Clusters
Proceedings of the International Conference on Parallel Processing Workshops
-
Writing a performance-portable matrix multiplication
Parallel Computing, Vol. 52, pp. 65-77
2015
-
Developing adaptive multi-device applications with the Heterogeneous Programming Library
Journal of Supercomputing, Vol. 71, No. 6, pp. 2204-2220
-
Improving OpenCL programmability with the Heterogeneous Programming Library
Procedia Computer Science