
Universal Approximation in Deep Learning

In several previous posts we have talked about function approximation and curve fitting. Now we have found something interesting in [1] that reads like a conclusion to that discussion.

If we consider function approximation to be the goal of a neural network, then it has been proved that a feed-forward network with a single hidden layer can approximate virtually any function of interest to any desired accuracy, provided the layer has enough hidden units. This is the universal approximation theorem.
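Stated a bit more precisely, in its classical continuous-function form due to Cybenko and Hornik (paraphrased here for reference, not quoted from [1]): for any continuous function $f$ on a compact set $K \subset \mathbb{R}^n$, any squashing activation $\sigma$ (such as the sigmoid), and any $\varepsilon > 0$, there exist a width $N$ and parameters $v_i, w_i, b_i$ such that the single-hidden-layer network

$$F(x) = \sum_{i=1}^{N} v_i \, \sigma\left(w_i^\top x + b_i\right)$$

satisfies $\sup_{x \in K} |F(x) - f(x)| < \varepsilon$.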

A more detailed discussion can be found in [1]. The functions to be approximated should be Borel measurable; judging by the definition, almost every function one would want to approximate in practice is Borel measurable, so measurability is not a real concern. Goodfellow et al., however, discuss three important caveats (a small numerical illustration follows the list):

  1. A deep network can often represent the same function with far fewer parameters than a single wide hidden layer, which makes depth preferable in practice.
  2. Even when a network can represent the desired function, there is no guarantee that the training algorithm will find parameters that actually minimize the cost function.
  3. No free lunch theorem: there is no single universally best approximation; a model chosen to fit the training data may still fail on unobserved test samples.
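To make the single-hidden-layer claim concrete, here is a minimal NumPy sketch of our own (an illustration, not code from [1]): a network with one tanh hidden layer is trained by full-batch gradient descent to fit sin(x). The width n_hidden, the learning rate, and the step count are arbitrary choices.

```python
import numpy as np

# Minimal sketch: a single-hidden-layer tanh network fitting sin(x)
# on [-pi, pi] with full-batch gradient descent on mean-squared error.
rng = np.random.default_rng(0)

x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)  # inputs
y = np.sin(x)                                       # targets

n_hidden = 20                      # width of the single hidden layer
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0, (n_hidden, 1))
b2 = np.zeros(1)
lr, n = 0.1, len(x)

for step in range(10000):
    # Forward pass: h = tanh(x W1 + b1), y_hat = h W2 + b2.
    h = np.tanh(x @ W1 + b1)
    y_hat = h @ W2 + b2
    err = y_hat - y

    # Backward pass for L = mean(err ** 2).
    dW2 = h.T @ err * (2 / n)
    db2 = err.sum(0) * (2 / n)
    dh = err @ W2.T * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z) ** 2
    dW1 = x.T @ dh * (2 / n)
    db1 = dh.sum(0) * (2 / n)

    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

mse = np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2)
print(f"final MSE: {mse:.5f}")  # should be small once training converges
```

Widening the hidden layer drives the approximation error down further, which is what the theorem promises; points 2 and 3 above are the reminder that gradient descent actually finding such a fit, and that fit generalizing, are separate questions.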

References:

[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
