Online localization of neuronal sources in the human brain from single-trial electroencephalography (EEG) and magnetoencephalography (MEG) data has recently come into the focus of neuroscience as a means to clarify brain function, identify mental states, and improve online applications such as brain-computer interface (BCI) systems. Recursively Applied and Projected Multiple Signal Classification (RAP-MUSIC) is a scalable source localization algorithm that is well suited to many-core processors such as graphics processing units (GPUs). Using the GPU and NVIDIA's Compute Unified Device Architecture (CUDA), we developed a performance-optimized RAP-MUSIC algorithm. Modifying and precalculating components of the subspace correlation reduces the computational cost by about 50 percent. Together with Powell's Conjugate Gradient Method, these modifications speed up the search process enough to allow online source localization. We demonstrated the robustness of the localization in simulations, in which we analyzed the influence of the signal-to-noise ratio (SNR) and of different dipole orientations on the calculation time. The presented algorithm achieves up to ten localizations per second on a single GPU (Tesla C2050) with a typically sized lead field matrix.
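To illustrate why precalculation can pay off in a subspace-correlation scan, the following is a minimal CPU sketch in plain Python and not the CUDA implementation described above. It assumes the simplest case: one fixed-orientation dipole per grid point, so that each candidate source is a single gain vector, and a signal subspace given by orthonormal basis vectors. Under these assumptions the subspace correlation is the cosine of the angle between the gain vector and the subspace, and the normalization of the gain vectors depends only on the lead field, so it can be done once in advance. The abstract does not state which components are actually precalculated; all function names and the toy data are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def precompute_unit_gains(gains):
    """Done once per lead field: the result does not depend on the data."""
    return [normalize(g) for g in gains]


def subspace_correlation(unit_gain, signal_basis):
    """Cosine of the angle between a unit gain vector and the signal subspace.

    signal_basis is assumed to be a list of orthonormal vectors.
    """
    return math.sqrt(sum(dot(b, unit_gain) ** 2 for b in signal_basis))


def scan(unit_gains, signal_basis):
    """Return the index of the best-matching grid point and all correlations."""
    corrs = [subspace_correlation(g, signal_basis) for g in unit_gains]
    best = max(range(len(corrs)), key=corrs.__getitem__)
    return best, corrs


# Toy example (assumed data): 3 sensors, 3 candidate grid points.
gains = [[0.0, 2.0, 0.0], [3.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
signal_basis = [[1.0, 0.0, 0.0]]  # one-dimensional signal subspace

unit_gains = precompute_unit_gains(gains)   # offline step
best, corrs = scan(unit_gains, signal_basis)  # per-trial step
print(best)  # prints 1
```

In this sketch the per-trial work reduces to a few inner products per grid point. Free-orientation dipoles or dipole pairs would need an orthonormal basis per candidate and a singular value decomposition instead of a single vector norm, which is outside the scope of this example.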