
GPUs power modern deep learning models because these models rely on tensor operations, which parallelize efficiently across a GPU's thousands of cores. But tensor math is not the whole story: these models also depend on random numbers, for example to initialize weights, apply dropout, sample data, and drive stochastic gradient descent.

So the question arises: how do frameworks like PyTorch generate random numbers in parallel on GPU devices? If random number generation becomes a bottleneck, it can significantly slow down the entire training or inference pipeline.

The answer lies in a clever algorithm called Philox, a counter-based parallel random number generator. In this article, we'll explore:

- Why traditional random number generators don't parallelize well
- How Philox works and what makes it different
- How to parallelize random number generation using Philox
- PyTorch's implementation of Philox, by dissecting its C++ and CUDA code

By the end, you'll understand how…
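To preview the core idea of a counter-based generator before we dig into Philox itself: instead of advancing hidden state sequentially, each output is a pure function of a key (the seed) and a counter, so thread *i* can compute draw *i* without touching any shared state. A minimal illustrative sketch (the `mix` rounds below are made up for illustration; they are *not* the real Philox round function):

```python
def mix(key: int, counter: int) -> int:
    """Toy counter-based RNG: a pure function of (key, counter).

    The constants and rounds here are arbitrary stand-ins, chosen only to
    scramble bits; real Philox uses carefully derived multipliers and a
    64-bit multiply that produces high and low 32-bit halves.
    """
    x = counter & 0xFFFFFFFF
    k = key & 0xFFFFFFFF
    for _ in range(10):  # fixed number of mixing rounds, like Philox4x32-10
        x = (x * 0xD256D193) & 0xFFFFFFFF   # multiply by an odd constant
        x ^= k                               # inject the round key
        x = ((x << 13) | (x >> 19)) & 0xFFFFFFFF  # 32-bit rotate
        k = (k + 0x9E3779B9) & 0xFFFFFFFF    # bump the key per round
    return x

# "Parallel" draws: worker i simply evaluates mix(seed, i) independently.
draws = [mix(42, i) for i in range(4)]
```

Because the output depends only on `(key, counter)`, the same seed always reproduces the same stream, and counters can be partitioned across threads with no locking or state hand-off. That property is exactly what makes this family of generators GPU-friendly.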
