COMPUTER JOURNAL, vol.48, no.3, pp.300-314, 2005 (SCI-Expanded)
A heterogeneous computing (HC) system is composed of a suite of geographically distributed high-performance machines interconnected by a high-speed network, thereby providing high-speed execution of computationally intensive applications with diverse demands. In HC systems, however, machines and network links can fail, and such failures can adversely affect the applications running on the system. To decrease the impact of failures on an application, matching and scheduling algorithms must be devised that minimize not only the execution time but also the failure probability of the application. Because these two objectives conflict, however, it is not possible to minimize both at the same time. The goal of this paper is therefore to develop matching and scheduling algorithms that account for both the execution time and the failure probability of an application and can trade one off against the other. To this end, a biobjective scheduling problem is first formulated, and two algorithms are then developed: the biobjective dynamic level scheduling algorithm and the biobjective genetic algorithm. Unique to both algorithms is the expression used to compute the failure probability of an application with precedence constraints. Simulation results confirm that the proposed algorithms can produce task assignments in which execution time is weighed against failure probability.
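The conflict between the two objectives can be illustrated with a small sketch. This is not the paper's biobjective dynamic level scheduling algorithm, its genetic algorithm, or its failure-probability expression. It is a brute-force enumeration over a hypothetical three-task graph and two hypothetical machines, `fast` and `safe`. It assumes independent machines with constant failure rates (an exponential failure model), ignores network failures and communication delays, and uses made-up numbers throughout. Each assignment is scored by schedule length and failure probability, and only the Pareto-optimal (non-dominated) assignments are kept.

```python
import math
from itertools import product

# Hypothetical data (not from the paper): execution time of each task on each machine.
EXEC_TIME = {
    "t1": {"fast": 2.0, "safe": 4.0},
    "t2": {"fast": 3.0, "safe": 5.0},
    "t3": {"fast": 2.0, "safe": 3.0},
}
PREDS = {"t1": [], "t2": ["t1"], "t3": ["t1"]}  # precedence: t2 and t3 depend on t1
FAILURE_RATE = {"fast": 0.05, "safe": 0.01}     # assumed constant failures per time unit
TOPO_ORDER = ["t1", "t2", "t3"]                 # a topological order of the task graph


def schedule_length(assignment):
    """Makespan of a list schedule that runs tasks in TOPO_ORDER.

    Communication delays between machines are ignored (simplifying assumption).
    """
    machine_free = {m: 0.0 for m in FAILURE_RATE}
    finish = {}
    for task in TOPO_ORDER:
        m = assignment[task]
        ready = max((finish[p] for p in PREDS[task]), default=0.0)
        start = max(ready, machine_free[m])
        finish[task] = start + EXEC_TIME[task][m]
        machine_free[m] = finish[task]
    return max(finish.values())


def failure_probability(assignment):
    """Probability that some machine fails while executing its assigned tasks.

    Assumes independent machines with exponentially distributed time to failure,
    so P(success) = exp(-sum of failure_rate * busy_time); network links are ignored.
    """
    exposure = sum(FAILURE_RATE[m] * EXEC_TIME[t][m] for t, m in assignment.items())
    return 1.0 - math.exp(-exposure)


def dominates(a, b):
    """True if objective pair a = (time, p_fail) is no worse than b in both and differs."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front():
    """Enumerate every task-to-machine assignment and keep the non-dominated ones."""
    machines = list(FAILURE_RATE)
    points = []
    for choice in product(machines, repeat=len(TOPO_ORDER)):
        assignment = dict(zip(TOPO_ORDER, choice))
        points.append((choice, (schedule_length(assignment), failure_probability(assignment))))
    front = [(c, obj) for c, obj in points
             if not any(dominates(other, obj) for _, other in points)]
    return sorted(front, key=lambda item: item[1][0])


for choice, (length, p_fail) in pareto_front():
    print(choice, length, round(p_fail, 3))
```

With these made-up numbers, the shortest schedule on the front (length 5) has the highest failure probability among the kept assignments, while placing every task on `safe` gives the lowest failure probability at the cost of the longest schedule (length 12). Exhaustive enumeration is only feasible for toy graphs; heuristics such as those named in the abstract are what make the trade-off tractable at realistic sizes.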