
Representation Learning

[Figure] A comparison of the eigenbasis retrieved through PCA and a non-linear VAE architecture on a synthetic task.

Members

  • Empirical Inference, Autonomous Learning
  • Senior Research Scientist
  • Autonomous Learning
  • Empirical Inference
  • Guest Scientist

Publications

Conference Paper: Demystifying Inductive Biases for (Beta-)VAE Based Architectures
Zietlow, D., Rolinek, M., Martius, G.
In Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Proceedings of Machine Learning Research 139:12945-12954, July 2021 (Published)
The performance of Beta-Variational-Autoencoders and their variants on learning semantically meaningful, disentangled representations is unparalleled. On the other hand, there are theoretical arguments suggesting the impossibility of unsupervised disentanglement. In this work, we shed light on the inductive bias responsible for the success of VAE-based architectures. We show that in classical datasets the structure of variance, induced by the generating factors, is conveniently aligned with the latent directions fostered by the VAE objective. This builds the pivotal bias on which the disentangling abilities of VAEs rely. By small, elaborate perturbations of existing datasets, we hide the convenient correlation structure that is easily exploited by a variety of architectures. To demonstrate this, we construct modified versions of standard datasets in which (i) the generative factors are perfectly preserved; (ii) each image undergoes a mild transformation causing a small change of variance; (iii) the leading VAE-based disentanglement architectures fail to produce disentangled representations whilst the performance of a non-variational method remains unchanged.
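The core idea — that PCA-like methods and VAEs latch onto the variance structure induced by the generating factors, and that a factor-preserving perturbation can hide that cue — can be illustrated with a small NumPy toy. This is a hedged sketch of the general principle, not the paper's actual dataset construction: the "perturbation" here is simply an invertible rescaling that equalizes the factor variances.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two independent generative factors with distinct variances (std 2 vs 1):
# the variance structure is conveniently aligned with the factor axes.
x = rng.normal(size=(n, 2)) * np.array([2.0, 1.0])

# PCA via SVD of the centered data.
_, s, vt = np.linalg.svd(x - x.mean(0), full_matrices=False)
print(np.abs(vt[0]))   # leading direction ~ [1, 0]: aligned with factor 1

# "Perturbed" dataset: rescale so both factors have equal variance.
# The generative factors are perfectly preserved (the map is invertible),
# but the variance cue that PCA-like objectives exploit is gone.
x_pert = x / np.array([2.0, 1.0])
_, s2, _ = np.linalg.svd(x_pert - x_pert.mean(0), full_matrices=False)
print(s2[0] / s2[1])   # ~ 1: no preferred direction left to latch onto
```

With equal singular values, the recovered basis is an essentially arbitrary rotation of the factor axes — the analogue of the disentanglement failure the paper induces with much milder, image-level transformations.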
Links: arXiv · PDF · Spotlight video @ ICML 2021 · BibTeX

Conference Paper: Variational Autoencoders Pursue PCA Directions (by Accident)
Rolinek, M., Zietlow, D., Martius, G.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 12406-12415, June 2019
The Variational Autoencoder (VAE) is a powerful architecture capable of representation learning and generative modeling. When it comes to learning interpretable (disentangled) representations, VAE and its variants show unparalleled performance. However, the reasons for this are unclear, since a very particular alignment of the latent embedding is needed but the design of the VAE does not encourage it in any explicit way. We address this matter and offer the following explanation: the diagonal approximation in the encoder together with the inherent stochasticity force local orthogonality of the decoder. The local behavior of promoting both reconstruction and orthogonality matches closely how the PCA embedding is chosen. Alongside providing an intuitive understanding, we justify the statement with full theoretical analysis as well as with experiments.
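The "reconstruction plus orthogonality picks out PCA" argument can be made concrete in the linear case with a small NumPy sketch. This is an illustrative toy, not the paper's analysis: for a fixed linear decoder, the best reconstruction is the projection onto its column space, so reconstruction error alone cannot distinguish the PCA basis from any invertible mixing of it — but only the orthogonal (PCA) choice yields a decorrelated code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# 3-D data whose principal axes lie along a known random orthogonal basis.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
x = (rng.normal(size=(n, 3)) * np.array([3.0, 2.0, 0.5])) @ q.T
xc = x - x.mean(0)

_, _, vt = np.linalg.svd(xc, full_matrices=False)
pca = vt[:2].T                       # (3, 2): top-2 PCA directions as columns

def err(decoder):
    # Best linear reconstruction through a fixed decoder = projection
    # onto its column space.
    proj = decoder @ np.linalg.pinv(decoder)
    return np.mean(np.sum((xc - xc @ proj.T) ** 2, axis=1))

# Any invertible mixing of the PCA columns spans the same subspace,
# so reconstruction alone cannot tell it apart from PCA ...
mixed = pca @ np.array([[1.0, 0.7], [0.0, 1.0]])
print(np.isclose(err(pca), err(mixed)))             # True

# ... but only the orthogonal (PCA) decoder gives a decorrelated code —
# the local property the VAE objective implicitly promotes.
code_pca = xc @ np.linalg.pinv(pca).T
code_mix = xc @ np.linalg.pinv(mixed).T
print(abs(np.corrcoef(code_pca.T)[0, 1]) < 0.05)    # True: decorrelated
print(abs(np.corrcoef(code_mix.T)[0, 1]) > 0.3)     # True: entangled
```

In this linear setting, resolving the mixing ambiguity by demanding (local) orthogonality of the decoder is exactly what singles out the PCA directions — the mechanism the paper attributes to the diagonal posterior approximation and the inherent stochasticity of the VAE.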
Links: arXiv · BibTeX