Journal:Informatica
Volume 16, Issue 3 (2005), pp. 317–332
Abstract
The conjugate gradient method is an iterative technique for solving systems of linear equations. This paper analyzes the performance of parallel preconditioned conjugate gradient (PPCG) algorithms. First, a theoretical model is proposed for estimating the complexity of the PPCG method, and a scalability analysis is carried out for three different data decomposition cases. Computational experiments are then performed on an IBM SP4 computer, and the theoretical predictions are shown to agree well with the computational results.
Journal:Informatica
Volume 14, Issue 2 (2003), pp. 167–180
Abstract
This work describes a realistic performance prediction tool for the parallel block LU factorization algorithm. It takes into account the computational workload, communication costs, and the overlap of communication with useful computation. Estimation of the tool parameters and benchmarking are also discussed. Using this tool, we develop a simple heuristic for scheduling LU factorization tasks. Results of numerical experiments are presented.
Journal:Informatica
Volume 9, Issue 4 (1998), pp. 437–448
Abstract
This paper presents a parallel version of a Generalized Conjugate Gradient algorithm proposed by Liu and Story in which the search direction considers the effect of the inexact line search. We describe the implementation of this algorithm on a parallel architecture and analyze the related speedup ratios. Numerical results are given for a shared memory computer (Cray C92).
Journal:Informatica
Volume 7, Issue 3 (1996), pp. 295–310
Abstract
In this paper we consider the solution of 3D diffusion problems on distributed-memory computers. We present a parallel algorithm that is suitable when the number of processors is at most 8. Pipelining is used to extend the number of processors up to 64. A computational grid decomposition method that preserves the load balance among computers is proposed for heterogeneous clusters of workstations. Numerical results for two clusters of workstations are given.
Journal:Informatica
Volume 7, Issue 3 (1996), pp. 281–294
Abstract
This paper deals with load balancing of parallel algorithms for distributed-memory computers. The parallel versions of BLAS subroutines for matrix-vector product and LU factorization are considered. Two task partitioning algorithms are investigated and speed-ups are calculated. The cases of homogeneous and heterogeneous collections of computers/processors are studied, and special partitioning algorithms for heterogeneous workstation clusters are presented.