Journal: Informatica
Volume 17, Issue 4 (2006), pp. 535–550
Abstract
Matrix transpose in parallel systems typically involves costly all-to-all communications. In this paper, we provide a comparative characterization of various efficient algorithms for transposing small and large matrices on the popular symmetric multiprocessor (SMP) architecture, which carries a relatively low communication cost due to its large aggregate bandwidth and low-latency inter-process communication. We analyze the cost of sending and receiving data and the memory requirements of these matrix-transpose algorithms. We then propose an adaptive algorithm that can minimize the overhead of matrix transpose operations given parameters such as the data size, the number of processors, the start-up time, and the effective communication bandwidth.
Journal: Informatica
Volume 15, Issue 2 (2004), pp. 203–218
Abstract
A quick matrix multiplication algorithm is presented and evaluated on a cluster of networked workstations consisting of Pentium hosts connected by Ethernet segments. The results confirm the feasibility of using networked workstations to provide fast, low-cost solutions to many computationally intensive applications, such as large linear algebraic systems. The paper also presents and verifies an accurate timing model that predicts the performance of the proposed algorithm on arbitrary clusters of workstations. With this model, the viability of the proposed algorithm can be assessed without the extra effort of carrying out real testing.