Free and Open Source Software for Geomatics Conference FOSS4G 2010 Barcelona

Selected Presentations





Most computers today are equipped with multicore CPUs, which increase performance while keeping power consumption manageable, and since NVIDIA released CUDA (Compute Unified Device Architecture), there has been great interest in using GPUs for scientific computing. Unfortunately, unlike in the single-core era, when rising clock speeds made existing programs faster without any changes, application software such as GRASS GIS must now redesign its algorithms to realize the expected performance gain. In this work, we aim to tune the speed of GRASS GIS for multicore and Graphics Processing Unit (GPU) architectures. The resulting speed-up will greatly improve the productivity of GRASS GIS users.

We start with principal component analysis (PCA). PCA is an image transformation algorithm, and the current i.pca implementation is slow in terms of execution time because it pays little attention to the principles and practice of high-performance computing.


Our preliminary work reveals that three of the major components of i.pca, namely covariance matrix generation, eigen-decomposition, and reprojection with the eigenvectors, share a high ratio of computation to communication, which makes them well suited to achieving close-to-peak performance on modern CPUs and GPUs. In our experiments, carefully redesigning the algorithm in i.pca so that covariance matrix generation is formulated as rank-n updates achieved a speed-up of more than 100x for that step on a quad-core Intel CPU. A similar technique can be applied to the eigenvector reprojection, and many good high-performance implementations of eigen-decomposition already exist. Furthermore, since a GPU offers even more parallelism than a multicore CPU, the rank-n update will be implemented in CUDA C as a 3D kernel. Combined with the remaining two parts, which will also run on the GPU, a GPU-based i.pca implementation will be presented and compared with the multicore CPU version.
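
As a rough illustration of what formulating covariance matrix generation as rank-n updates can look like, the following Python sketch accumulates the band cross-product matrix one block of pixels at a time. It is not the i.pca code or the authors' implementation: the function name, the `block_size` parameter, and the choice of dividing by n - 1 are assumptions made for this example. In a tuned version, each block update would typically be a single call to a BLAS symmetric rank-k routine (SYRK) rather than Python loops.

```python
def covariance_by_block_updates(bands, block_size=4):
    """Covariance matrix of image bands, accumulated block by block.

    bands: list of B equal-length lists, one list of pixel values per band.
    block_size: number of pixels per update (hypothetical tuning parameter).
    Returns a B x B covariance matrix as a list of lists.
    """
    n_bands = len(bands)
    n_pixels = len(bands[0])
    sums = [0.0] * n_bands                              # per-band sums
    cross = [[0.0] * n_bands for _ in range(n_bands)]   # running X * X^T

    for start in range(0, n_pixels, block_size):
        stop = min(start + block_size, n_pixels)
        block = [band[start:stop] for band in bands]    # B x k slice
        for i in range(n_bands):
            sums[i] += sum(block[i])
            # Rank-k update of the upper triangle: cross += block * block^T
            for j in range(i, n_bands):
                cross[i][j] += sum(a * b for a, b in zip(block[i], block[j]))

    # Turn raw cross-products into covariances.
    # Assumption: sample covariance, i.e. division by (n_pixels - 1).
    cov = [[0.0] * n_bands for _ in range(n_bands)]
    for i in range(n_bands):
        for j in range(i, n_bands):
            value = (cross[i][j] - sums[i] * sums[j] / n_pixels) / (n_pixels - 1)
            cov[i][j] = cov[j][i] = value
    return cov
```

Because every block contributes an independent update to the same small B x B matrix, the work per pixel loaded is high, which is the computation-to-communication property described above. Blocks can also be handed to different cores and their partial matrices summed at the end.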


Peng Du - University of Tennessee Knoxville
Shih-Lung Shaw - University of Tennessee Knoxville
