We show that using dropout in a network can be interpreted as a form of data augmentation that requires no domain knowledge. We present an approach to projecting the dropout noise within a network back into the input space, thereby generating augmented versions of the training data, and we show that training a deterministic network on the augmented samples yields results similar to those obtained with dropout. We furthermore propose an explanation for the increased sparsity levels that can be observed in networks trained with dropout and show how this is related to data augmentation. Finally, we detail a random dropout noise scheme based on our observations and show that it improves dropout results without adding significant computational cost.
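To make the idea of projecting dropout noise back into the input space concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes a single ReLU hidden layer, and it assumes the projection is found by plain gradient descent on the squared error between the hidden activations of a candidate input and the dropout-masked activations of the original input. The names `project_dropout_to_input`, `hidden_error`, `keep_prob`, `steps`, and `lr` are hypothetical and chosen only for this example.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def hidden_error(x, W, b, target):
    """Squared error between the hidden activations of x and a target vector."""
    return float(np.sum((relu(W @ x + b) - target) ** 2))


def project_dropout_to_input(x, W, b, keep_prob=0.5, steps=200, lr=0.1, seed=0):
    """Search for an input whose clean hidden activations approximate the
    dropout-masked activations of x (assumed objective, see lead-in)."""
    rng = np.random.default_rng(seed)

    # Hidden activations of the original input, then one sampled dropout mask.
    h_clean = relu(W @ x + b)
    mask = rng.random(h_clean.shape) < keep_prob
    target = h_clean * mask  # noisy activations we want to reproduce

    # Start from the original input and descend on 0.5 * ||relu(Wx + b) - target||^2.
    x_aug = x.astype(float)  # astype returns a copy, so x is left untouched
    for _ in range(steps):
        z = W @ x_aug + b
        # Gradient w.r.t. the pre-activations; ReLU passes gradient only where z > 0.
        grad_z = (relu(z) - target) * (z > 0)
        x_aug -= lr * (W.T @ grad_z)

    return x_aug, target
```

Calling `project_dropout_to_input(x, W, b)` returns an augmented input together with the noisy target it was fitted to; `hidden_error` can then be used to compare how well the original and augmented inputs reproduce that target. In this sketch the step size `lr` must be small relative to the squared spectral norm of `W` for the error to decrease reliably.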