When a function is not differentiable in the classical sense, there are multiple ways to compute a generalized derivative. This post will look at three generalizations of the classical derivative, each applied to the ReLU (rectified linear unit) function. The ReLU function is a commonly used activation function for neural networks. It’s also called the ramp function, for obvious reasons. The function is simply

    r(x) = max(0, x).

Pointwise derivative

The pointwise derivative would be 0 for x < 0, 1 for x > 0, and undefined at x = 0. So except at 0, the pointwise derivative of the ramp function is the Heaviside function H. In a real analysis course, you’d simply say r′(x) = H(x), because functions are only defined up to equivalence modulo sets of measure zero, i.e. the definition at x = 0 doesn’t matter.

Distributional derivative

In distribution theory you’d identify the function r(x) with the distribution whose action on a test function φ is

    ⟨r, φ⟩ = ∫ r(x) φ(x) dx,

where the integral runs over the real line. Then the derivative of r would be the distribution whose action on a test function φ is

    ⟨r′, φ⟩ = −⟨r, φ′⟩.
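Carrying out the integration by parts makes the result concrete. Since r(x) = 0 for x < 0 and r(x) = x for x ≥ 0, and the boundary term vanishes because test functions have compact support,

    ⟨r′, φ⟩ = −⟨r, φ′⟩ = −∫₀^∞ x φ′(x) dx
            = [−x φ(x)]₀^∞ + ∫₀^∞ φ(x) dx
            = ∫₀^∞ φ(x) dx
            = ⟨H, φ⟩.

So the distributional derivative of the ramp function is again the Heaviside function, now with no need to worry about the value at a single point.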
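As a side illustration (a minimal numerical sketch, not from the post itself), here are the ramp function and its pointwise derivative in Python. The parameter name `at_zero` is mine; it makes explicit that the value assigned at x = 0 is a convention, reflecting the measure-zero remark above.

```python
import numpy as np

def relu(x):
    """Ramp function r(x) = max(0, x)."""
    return np.maximum(0.0, x)

def relu_prime(x, at_zero=0.0):
    """Pointwise derivative of the ramp function: the Heaviside function,
    with the classically undefined value at x = 0 chosen by convention."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, at_zero))

print(relu(np.array([-2.0, 0.0, 3.0])))           # [0. 0. 3.]
print(relu_prime([-2.0, 0.0, 3.0]))               # [0. 0. 1.]
print(relu_prime([-2.0, 0.0, 3.0], at_zero=1.0))  # [0. 1. 1.]
```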