Journal:Informatica
Volume 17, Issue 4 (2006), pp. 535–550
Abstract
Matrix transpose in parallel systems typically involves costly all-to-all communications. In this paper, we provide a comparative characterization of various efficient algorithms for transposing small and large matrices on the popular symmetric multiprocessor (SMP) architecture, which carries a relatively low communication cost due to its large aggregate bandwidth and low-latency inter-process communication. We analyze the cost of sending and receiving data and the memory requirements of these matrix-transpose algorithms. We then propose an adaptive algorithm that can minimize the overhead of matrix transpose operations given parameters such as the data size, the number of processors, the start-up time, and the effective communication bandwidth.
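To illustrate how parameters such as the data size, number of processors, start-up time, and bandwidth can drive the choice of a transpose strategy, the following is a minimal sketch using the standard textbook start-up-plus-bandwidth model of message cost. It is not the paper's algorithm: the two strategies ("direct" pairwise exchange and a store-and-forward "hypercube" exchange), the function names, and the assumption that the processor count is a power of two are all illustrative assumptions.

```python
import math


def direct_exchange_cost(n, p, elem_bytes, startup, bandwidth):
    """Assumed model: each process sends p-1 blocks of (n/p) x (n/p) elements."""
    block_bytes = (n / p) ** 2 * elem_bytes
    return (p - 1) * (startup + block_bytes / bandwidth)


def hypercube_exchange_cost(n, p, elem_bytes, startup, bandwidth):
    """Assumed model: log2(p) steps, each forwarding half of the local data."""
    steps = math.log2(p)  # assumes p is a power of two
    bytes_per_step = n * n * elem_bytes / (2 * p)
    return steps * (startup + bytes_per_step / bandwidth)


def choose_strategy(n, p, elem_bytes, startup, bandwidth):
    """Return the name of the strategy with the lower modeled cost."""
    costs = {
        "direct": direct_exchange_cost(n, p, elem_bytes, startup, bandwidth),
        "hypercube": hypercube_exchange_cost(n, p, elem_bytes, startup, bandwidth),
    }
    return min(costs, key=costs.get)
```

Under this model, small matrices with a high start-up time favor the strategy with fewer messages, while large matrices favor the strategy that moves fewer bytes in total; the values passed in (in seconds and bytes per second) would have to be measured on the target machine.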
Journal:Informatica
Volume 17, Issue 3 (2006), pp. 309–324
Abstract
Three parallel algorithms for solving a 3D problem with a nonlocal boundary condition are considered. The forward and backward Euler finite-difference schemes and the LOD scheme are typical representatives of three general classes of parallel algorithms used to solve multidimensional parabolic initial-boundary value problems. All algorithms are modified to take the additional nonlocal boundary condition into account. The algorithms are implemented using the parallel array object tool ParSol, so that a parallel algorithm follows semi-automatically from the serial one. Results of computational experiments are presented, and the accuracy and efficiency of the parallel algorithms are tested.
Journal:Informatica
Volume 16, Issue 3 (2005), pp. 317–332
Abstract
The conjugate gradient method is an iterative technique used to solve systems of linear equations. The paper analyzes the performance of parallel preconditioned conjugate gradient (PPCG) algorithms. First, a theoretical model is proposed for estimating the complexity of the PPCG method, and a scalability analysis is carried out for three different data decomposition cases. Computational experiments are performed on an IBM SP4 computer and some results are presented. It is shown that the theoretical predictions agree well with the computational results.
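For readers unfamiliar with the method, the following is a minimal serial sketch of a preconditioned conjugate gradient iteration for a symmetric positive definite system. The Jacobi (diagonal) preconditioner, the dense list-of-lists matrix, and the function name `pcg_jacobi` are assumptions made for illustration; the abstract does not say which preconditioner or data layout the paper uses, and no parallel decomposition is shown here.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product."""
    return [dot(row, v) for row in A]


def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (assumed)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual b - A x with x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:          # converged
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x
```

In a parallel version, the matrix-vector product and the inner products are the steps that require communication, which is why a complexity model of the kind described above typically depends on the data decomposition.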
Journal:Informatica
Volume 7, Issue 3 (1996), pp. 295–310
Abstract
In this paper we consider the problem of solving 3D diffusion problems on distributed-memory computers. We present a parallel algorithm that is suitable when the number of processors is at most 8. A pipelining method is used to extend the number of processors to 64. A computational grid decomposition method that preserves load balance among the computers is proposed for heterogeneous clusters of workstations. Numerical results for two clusters of workstations are given.
Journal:Informatica
Volume 7, Issue 3 (1996), pp. 281–294
Abstract
This paper deals with load balancing of parallel algorithms for distributed-memory computers. The parallel versions of BLAS subroutines for matrix-vector product and LU factorization are considered. Two task partitioning algorithms are investigated and speed-ups are calculated. The cases of homogeneous and heterogeneous collections of computers/processors are studied, and special partitioning algorithms for heterogeneous workstation clusters are presented.
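As a rough illustration of task partitioning for a heterogeneous collection of processors, the sketch below divides the rows of a matrix in proportion to assumed relative processor speeds, using largest-remainder rounding so that every row is assigned. The function name `partition_rows` and the idea of a single scalar speed per processor are assumptions for illustration, not the partitioning algorithms studied in the paper.

```python
def partition_rows(n_rows, speeds):
    """Split n_rows among processors in proportion to their relative speeds.

    speeds: positive numbers, one per processor (hypothetical measurements).
    Returns a list of row counts that sums to n_rows.
    """
    total = sum(speeds)
    ideal = [n_rows * s / total for s in speeds]   # fractional shares
    counts = [int(q) for q in ideal]               # round each share down
    leftover = n_rows - sum(counts)
    # Hand the remaining rows to the processors with the largest remainders.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

With equal speeds this reduces to an even split, which corresponds to the homogeneous case; for example, `partition_rows(100, [1, 1, 2])` gives the processor that is twice as fast half of the rows.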