Abstract:
There is tremendous growth in data generated across industries, e.g., health, agriculture, and engineering, and consequently a growing demand for processing power. Compared to central processing units (CPUs), general-purpose graphics processing units (GPUs) are rapidly emerging as a promising solution for achieving high performance and energy efficiency in various computing domains. However, the multiple forms of parallelism and the complexity of memory access have posed challenges in developing a GPU-based algorithm for Random Forest (RF), a popular and robust machine learning algorithm. In this paper, coarse-grained and dynamic parallelism approaches on GPU are integrated into RF (dpRFGPU). Experimental results of dpRFGPU are compared with sequential execution of RF (seqRFCPU) and with RF trees parallelised on GPU (parRFGPU). Results show an improvement in average speedup from 1.62 with parRFGPU to 3.57 with dpRFGPU. Acceleration is also evident when RF is configured with 32 or more trees, for both dpRFGPU and parRFGPU, on low-dimensional datasets. Moreover, larger datasets yield greater time savings than smaller datasets on GPU, with dpRFGPU saving more time than parRFGPU. The dpRFGPU approach thus significantly accelerates RF trees on GPU, optimizing their parallelization by reducing training time.