Complexity analysis of scheduling algorithms for parallel computing
industrial collaborators: ARM Ltd
academic collaborators: University of Nottingham
initiated : 2009/10/03
last updated: 2010/04/27

selected page:

The problem

OpenCL is an open framework for parallel programming in heterogenous systems. It was developed by AMD, Apple, ARM, Intel, Motorola, Nokia, nVidia, Oracle, Qualcomm, Samsung, Sony, Texas Instruments, and a number of others companies. The specification envisions the bulk of demanding computations being handled by a multi-core accelerator, instead of the (single) central processing unit. The work load comes from a number of applications, each with a given priority. Each application utilises a number of OpenCL-accelerated functions, or kernels. We denote a single execution of a single kernel as a job. Each application also has a number of interfaces to the accelerator, or queues. When a job is to be run, it needs to be submitted to a queue, together with a specification of pre-requisite jobs (i.e. jobs that need to finish before the submitted job can start). All jobs in a queue are passed to the accelerator concurrently when the queue is submitted to the accelerator. The accelerator can schedule the execution of submitted jobs arbitrarily, except for the constraints imposed by pre-requisite jobs. Processing times of jobs are known precisely only upon completion. Details of future queue submissions are unknown. It is desirable to sequence the execution of jobs in the accelerator to maximise throughput, without incurring much overhead in the scheduler.

Modern wireless standards, both in wireless LAN (IEEE 802.11n, 802.16) and cellular networks (LTE, LTE Advanced), use multiple transmit and multiple receive antennas operating concurrently in the same frequency band (MIMO). Due to advances in channel estimation, it is possible to detect the input signal from what is received by multiple receive antennas, yielding considerable gains in channel capacity compared with a single pair of antennas. We have focused on methods for decoding quadrature amplitude modulation (QAM) symbols using a given channel estimate.

The approach

In scheduling jobs in OpenCL accelerators, we have first modelled the problem as stochastic scheduling with pre-requisites. We established the complexity of one model of the problem, building upon the work of Papadimitriou and Tsitsiklis. Building upon related work of Papadimitriou, Tsitsiklis, and Humblet on the stability of certain networks of queues, we have suggested a family of scheduling policies. Finally, we have designed fully-dynamic data structures for maintaining the details of data dependencies among jobs and answering the queries required by the policies.

In MIMO decoding, we have proposed that a least squares relaxation of integer least squares can be viewed as an instance of second-order cone programming. We then iteratively strengthen the solution by the addition and resolution of violated inequalities. The run-time of the interior point method is dominated the complexity of a Cholesky decomposition of an O(m) x O(n) matrix, where m and n are the numbers of transmitting and receiving antennas, respectively. This extends the notion of real-time convex programming to real-time integer convex programming. We believe this will lead to the development of low-power integrated circuits with near-Maximum Likelihood performance.


related resources:
  Complexity analysis of scheduling algorithms for parallel computing
» Technical details
  Download Case Study as PDF
 
other projects:
[Find other Information and Communication Technology projects]