Using a Graphical Processing Unit in oilfield reservoir simulation
industrial collaborators: Roxar
academic collaborators: Bournemouth University
initiated: 2009/09/07
last updated: 2010/01/05


The problem

This project evaluates the use of GPUs to accelerate the calculation of fluid flow in an oilfield reservoir. The flow is obtained by solving a large set of non-linear equations. Newton-Raphson iteration reduces this to solving a sequence of sparse linear systems, one per iteration.

Newton-Raphson iteration and the solution of sparse linear systems are standard numerical techniques used in many fields. In oilfield reservoir simulation, the bulk of the computational time is spent solving the linear equations.
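To illustrate how Newton-Raphson iteration turns a non-linear problem into a sequence of linear solves, here is a minimal C++ sketch on a hypothetical two-equation system (a circle intersected with a line, not a reservoir model). Each iteration solves the Jacobian system J dx = -F(x) and updates x. Here that system is only 2x2 and is solved directly with Cramer's rule; in a reservoir simulator the same step is the large sparse linear solve where most of the time goes. The function names are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;

// Hypothetical non-linear system F(x, y) = 0:
//   f0 = x^2 + y^2 - 4   (circle of radius 2)
//   f1 = x - y           (the line y = x)
Vec2 residual(const Vec2& v) {
    return Vec2{v[0] * v[0] + v[1] * v[1] - 4.0, v[0] - v[1]};
}

// Newton-Raphson: repeatedly solve J(v) dv = -F(v), then update v += dv.
Vec2 newton_raphson(Vec2 v, int max_iter = 50, double tol = 1e-12) {
    for (int it = 0; it < max_iter; ++it) {
        Vec2 f = residual(v);
        // Stop once the Euclidean norm of the residual is small enough.
        if (std::hypot(f[0], f[1]) < tol) break;
        // Jacobian of F evaluated at v.
        double j00 = 2.0 * v[0], j01 = 2.0 * v[1];
        double j10 = 1.0, j11 = -1.0;
        double det = j00 * j11 - j01 * j10;
        assert(det != 0.0);
        // Linear solve J dv = -F (Cramer's rule, adequate for a 2x2 system).
        double dx = (-f[0] * j11 + f[1] * j01) / det;
        double dy = (-f[1] * j00 + f[0] * j10) / det;
        v[0] += dx;
        v[1] += dy;
    }
    return v;
}
```

Starting from (1, 0.5), the iteration converges to x = y = sqrt(2), the intersection in the positive quadrant.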

The aim of the project is to accelerate the solution of sparse linear systems using GPUs. A key component is sparse matrix-vector multiplication. This operation is limited by memory bandwidth rather than arithmetic speed, because each stored matrix entry is read from memory once and used in only a single multiply-add. A typical high-end CPU chip has a memory bandwidth of about 30 GB/s, whereas a typical high-end GPU has a memory bandwidth of about 100 GB/s. We would therefore expect a significant improvement in performance running on the GPU compared with a CPU. Indications suggest that future GPUs will increase memory bandwidth faster than CPUs.
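The project's own kernels are not reproduced here, so the following is only a minimal serial C++ sketch of sparse matrix-vector multiplication in the CSR (Compressed Sparse Row) format benchmarked in Table 1, assuming double-precision values; the struct and function names are hypothetical. The inner loop makes the bandwidth argument concrete: each stored value and column index is loaded once and used in a single multiply-add, and x is reached indirectly through col_idx.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) storage of an n x n sparse matrix.
struct CsrMatrix {
    std::size_t n;                     // number of rows (and columns)
    std::vector<std::size_t> row_ptr;  // size n+1; row i occupies [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index of each stored non-zero
    std::vector<double> values;        // value of each stored non-zero
};

// y = A * x. Every stored value and column index is read exactly once and used
// in a single multiply-add, so for large matrices memory traffic dominates.
std::vector<double> csr_spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.n + 1 && x.size() == a.n);
    std::vector<double> y(a.n, 0.0);
    for (std::size_t i = 0; i < a.n; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            sum += a.values[k] * x[a.col_idx[k]];  // indirect, data-dependent access to x
        y[i] = sum;
    }
    return y;
}
```

For example, the 3x3 matrix with 2 on the diagonal and -1 on the two off-diagonals is stored as row_ptr = {0, 2, 5, 7}, col_idx = {0, 1, 0, 1, 2, 1, 2} and values = {2, -1, -1, 2, -1, -1, 2}; multiplying it by (1, 1, 1) gives (1, 0, 1).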


The approach

In this project, we implemented and evaluated several sparse matrix representations. For coupled linear equations we also tested block and strip implementations of the sparse matrix. The basic representation was then used in a block tridiagonal solver on the GPU. This required translating an existing C++ algorithm implemented on the CPU into the GPU programming language CUDA.
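The project's CPU algorithm and its CUDA translation are not shown in this summary. As a rough illustration of what a block tridiagonal solver does, here is a generic serial block Thomas algorithm (block Gaussian elimination without pivoting) in C++. It assumes, hypothetically, 2x2 blocks (for example two coupled unknowns per cell) and a well-conditioned system such as a block diagonally dominant one; all names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat2 = std::array<double, 4>;  // row-major 2x2 block: {a00, a01, a10, a11}
using Vec2 = std::array<double, 2>;

Mat2 inv(const Mat2& m) {
    double det = m[0] * m[3] - m[1] * m[2];
    assert(std::fabs(det) > 0.0);
    return Mat2{m[3] / det, -m[1] / det, -m[2] / det, m[0] / det};
}
Mat2 mul(const Mat2& a, const Mat2& b) {
    return Mat2{a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
                a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}
Vec2 mul(const Mat2& a, const Vec2& v) {
    return Vec2{a[0] * v[0] + a[1] * v[1], a[2] * v[0] + a[3] * v[1]};
}
Mat2 sub(const Mat2& a, const Mat2& b) {
    return Mat2{a[0] - b[0], a[1] - b[1], a[2] - b[2], a[3] - b[3]};
}
Vec2 sub(const Vec2& a, const Vec2& b) { return Vec2{a[0] - b[0], a[1] - b[1]}; }

// Solves lower[i] x[i-1] + diag[i] x[i] + upper[i] x[i+1] = rhs[i], i = 0..n-1
// (lower[0] and upper[n-1] are ignored) using the block Thomas algorithm.
std::vector<Vec2> block_thomas(const std::vector<Mat2>& lower,
                               const std::vector<Mat2>& diag,
                               const std::vector<Mat2>& upper,
                               const std::vector<Vec2>& rhs) {
    const std::size_t n = diag.size();
    assert(n > 0 && lower.size() == n && upper.size() == n && rhs.size() == n);
    std::vector<Mat2> c(n);  // eliminated upper blocks
    std::vector<Vec2> d(n);  // eliminated right-hand sides
    Mat2 m_inv = inv(diag[0]);
    c[0] = mul(m_inv, upper[0]);
    d[0] = mul(m_inv, rhs[0]);
    // Forward elimination: remove each lower block using the previous row.
    for (std::size_t i = 1; i < n; ++i) {
        m_inv = inv(sub(diag[i], mul(lower[i], c[i - 1])));
        c[i] = mul(m_inv, upper[i]);
        d[i] = mul(m_inv, sub(rhs[i], mul(lower[i], d[i - 1])));
    }
    // Back substitution from the last block row upwards.
    std::vector<Vec2> x(n);
    x[n - 1] = d[n - 1];
    for (std::size_t i = n - 1; i-- > 0;)
        x[i] = sub(d[i], mul(c[i], x[i + 1]));
    return x;
}
```

Each elimination step depends on the one before it, so this recurrence is inherently sequential; how the project mapped the solver onto the GPU is not described here.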

Our approach was to benchmark an existing MPI-parallel C++ application using both a single core and all 8 cores of a high-end CPU-based system. These results were then compared with results on the latest NVIDIA GPUs.

System                                                   Time
Nehalem Xeon X5560 (2.80 GHz x 2, 24 GB RAM), 1 core     14.7 ms
Nehalem Xeon X5560 (2.80 GHz x 2, 24 GB RAM), 8 cores    4.4 ms
GTX 295 GPU (using a single GPU)                         5.94 ms

Table 1: Matrix-vector multiplication using the Compressed Sparse Row (CSR) matrix representation.

System                                                   Time
Nehalem Xeon X5560 (2.80 GHz x 2, 24 GB RAM), 1 core     14.7 ms
Nehalem Xeon X5560 (2.80 GHz x 2, 24 GB RAM), 8 cores    4.4 ms
GTX 295 GPU (using a single GPU)                         1.12 ms

Table 2: Matrix-vector multiplication using the Diagonal Matrix Representation (DIA).

The initial results were disappointing (Table 1): using all cores of the two-chip CPU system gave performance similar to the GPU, and was in fact slightly faster (4.4 ms versus 5.94 ms). When the data representation was changed to DIA, the GPU was able to make full use of its memory bandwidth and was significantly faster (Table 2).
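The summary does not give the project's DIA kernel, so the following is a minimal C++ sketch of the conventional DIA layout, with hypothetical names and double-precision values assumed. Only the diagonal offsets are stored, not a column index per entry, and the values of each diagonal are kept contiguous by row. On a GPU that assigns one thread per row, neighbouring threads then read neighbouring addresses, which is the kind of memory layout the GPU handles best.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Diagonal (DIA) storage of an n x n matrix whose non-zeros lie on a few diagonals.
struct DiaMatrix {
    std::size_t n;              // number of rows (and columns)
    std::vector<long> offsets;  // diagonal offsets: 0 = main, -1 = sub, +1 = super, ...
    std::vector<double> data;   // size offsets.size() * n; data[d * n + i] = A(i, i + offsets[d])
};

// y = A * x. The column index is implied by the row and the diagonal offset.
std::vector<double> dia_spmv(const DiaMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.n && a.data.size() == a.offsets.size() * a.n);
    const long n = static_cast<long>(a.n);
    std::vector<double> y(a.n, 0.0);
    for (long i = 0; i < n; ++i) {  // on a GPU this loop would become one thread per row
        double sum = 0.0;
        for (std::size_t d = 0; d < a.offsets.size(); ++d) {
            long j = i + a.offsets[d];
            if (j >= 0 && j < n)    // skip padding entries that fall outside the matrix
                sum += a.data[d * a.n + static_cast<std::size_t>(i)] *
                       x[static_cast<std::size_t>(j)];
        }
        y[static_cast<std::size_t>(i)] = sum;
    }
    return y;
}
```

The same 3x3 matrix with 2 on the diagonal and -1 off the diagonal is stored as offsets = {-1, 0, 1} and data = {0, -1, -1, 2, 2, 2, -1, -1, 0}, where the two zeros are padding. DIA suits banded matrices such as those arising from structured grids; if the non-zeros are scattered across many diagonals, the padding wastes both memory and bandwidth.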

With the Diagonal Matrix Representation (DIA), the benchmark shows the GPU running about 4 times faster than the 8-core CPU system (1.12 ms versus 4.4 ms).
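The speed-up figures follow directly from the timings in Tables 1 and 2, taking the 8-core CPU time of 4.4 ms (the same in both tables) as the baseline; values are rounded:

```latex
\text{CSR: } \frac{t_{\text{8 cores}}}{t_{\text{GPU}}} = \frac{4.4\ \text{ms}}{5.94\ \text{ms}} \approx 0.74,
\qquad
\text{DIA: } \frac{t_{\text{8 cores}}}{t_{\text{GPU}}} = \frac{4.4\ \text{ms}}{1.12\ \text{ms}} \approx 3.9,
\qquad
\text{GPU, CSR to DIA: } \frac{5.94\ \text{ms}}{1.12\ \text{ms}} \approx 5.3
```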

The results show that a single GPU can run significantly faster than multiple high-end CPUs. However, good performance depends heavily on the memory layout of the data, and GPU performance is much more sensitive than CPU performance to how the algorithm is implemented.

The internship has enabled Roxar to evaluate the value and the development cost of supporting GPUs in our applications. The work has given a better understanding of how numerical mathematics is mapped onto computer hardware. Finally, the intern Ehtzaz has been exposed to problems outside his academic field and to the issues involved in developing commercial products.

