Faster Training by Selecting Samples Using Embeddings (2019)
Long training times have increasingly become a burden for researchers by slowing down the pace of innovation, with some models taking days or weeks to train. In this paper, a new, general technique is presented that aims to speed up the training process by using a thinned-down training dataset. By leveraging autoencoders and the unique properties of embedding spaces, we are able to filter training datasets to include only the samples that matter the most. Through evaluation on a standard CIFAR-10 image classification task, this technique is shown to be effective. With this technique, training times can be reduced with a minimal loss in accuracy. Conversely, given a fixed training time budget, the technique was shown to improve accuracy by over 50%. This intelligent dataset sampling technique is a practical tool for achieving better results with large datasets and limited computational budgets.
In Proceedings of the 2019 International Joint Conference on Neural Networks, 1-7, Budapest, Hungary, July 2019.

Santiago Gonzalez Ph.D. Alumni slgonzalez [at] utexas edu
Risto Miikkulainen Faculty risto [at] cs utexas edu