A CUDA Library for Data Mining Algorithms

Team

Lakshitha Madushan
Pahan Madusha
Darshi Dineshika
Hasindu Gamaarachchi

Supervisors

Prof. Roshan G. Ragel

Description

General-purpose computing on graphics processing units (GPGPU) has enabled inexpensive high-performance computing for general-purpose applications. Compute Unified Device Architecture (CUDA) is a programming model that provides a platform for exploiting the parallel computing power of GPUs using C/C++. CUDA Dynamic Parallelism (CDP), a feature introduced with NVIDIA's Kepler architecture, enables a CUDA kernel to create and synchronize new nested work. The long computation time of data mining processes is a significant challenge for data scientists, and one solution is to use the massive parallelism of GPUs. This project provides a CUDA library that lets programmers run data mining tasks efficiently on NVIDIA GPUs. The library provides three popular data mining algorithms: Apriori, K-means clustering and Random Forest classification. It not only accelerates the data mining process but also provides an API that developers can use easily without much knowledge of CUDA. The researchers' main focus has been on building the library for developers and on improving the performance of the Apriori algorithm using data structures suitable for GPU computation and CDP. The experimental results show how CPU and GPU performance differ, and how GPU-suitable data structures and CDP give better performance for the algorithms in the library.

Keywords: Graphics Processing Unit (GPU), CUDA, Data mining, Apriori algorithm, Random Forest algorithm, K-means algorithm, Dynamic Parallelism
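To make the Apriori focus mentioned in the description more concrete, the following is a minimal host-side C++ sketch of the support-counting step, the part of Apriori that is usually the most expensive. It is an illustration only, not the library's implementation: the bitmap layout (one bit per transaction, one bitmap per item) is an assumed example of a "data structure suitable for GPU computation", and the names `Bitmap`, `build_bitmaps`, `popcount64` and `support` are hypothetical. The count for each candidate itemset is independent of every other candidate, which is the property a GPU version could exploit by assigning candidates to threads or blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (not taken from the library): one bitmap per item,
// one bit per transaction. Bit t of bitmaps[i] is set when
// transaction t contains item i.
using Bitmap = std::vector<std::uint64_t>;

// Build per-item bitmaps from transactions given as lists of item ids.
std::vector<Bitmap> build_bitmaps(
    const std::vector<std::vector<int>>& transactions, int num_items) {
    const std::size_t words = (transactions.size() + 63) / 64;
    std::vector<Bitmap> bitmaps(static_cast<std::size_t>(num_items),
                                Bitmap(words));  // zero-initialized words
    for (std::size_t t = 0; t < transactions.size(); ++t) {
        for (int item : transactions[t]) {
            assert(item >= 0 && item < num_items);
            bitmaps[item][t / 64] |= std::uint64_t{1} << (t % 64);
        }
    }
    return bitmaps;
}

// Count the set bits in a 64-bit word (portable, no intrinsics).
int popcount64(std::uint64_t w) {
    int n = 0;
    while (w) {
        w &= w - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Support of a candidate itemset: the number of transactions that
// contain every item in the candidate. Computed by AND-ing the item
// bitmaps word by word and counting the surviving bits.
int support(const std::vector<Bitmap>& bitmaps,
            const std::vector<int>& candidate) {
    assert(!candidate.empty());
    const std::size_t words = bitmaps[candidate[0]].size();
    int count = 0;
    for (std::size_t w = 0; w < words; ++w) {
        std::uint64_t acc = ~std::uint64_t{0};
        for (int item : candidate) acc &= bitmaps[item][w];
        count += popcount64(acc);
    }
    return count;
}
```

With the five transactions {0,1,2}, {0,1}, {1,2}, {0,2} and {0,1,2} over three items, `support` returns 4 for the itemset {0}, 3 for {0,1} and 2 for {0,1,2}. The word-wise AND and bit count map naturally onto GPU threads, which is why a bitmap is a common choice for this kind of workload; whether the library uses this exact layout is not stated in the description.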

Tags: GPU Computing