N1H111SM's Miniverse

2020/06/10

Materials

Method Description

Given a pair of images sharing some attributes, we aim to learn a low-dimensional representation split into two parts: a shared representation that captures the information common to both images, and an exclusive representation that contains the information specific to each image.

Training proceeds in two stages. First, the shared representation is learned via cross mutual information estimation and maximization. Second, the exclusive representation is learned by mutual information maximization, while the mutual information between the shared and exclusive representations is minimized.
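The two-stage objective can be sketched on toy data. This is a minimal NumPy sketch, not the paper's implementation: the dot-product critic, the InfoNCE-style bound, and the weight `lam` are all simplifying assumptions standing in for the paper's learned critic networks and actual estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_softmax_of_positives(a, b):
    """Score every row of `a` against every row of `b` with a dot-product
    critic (a toy stand-in for a learned critic network) and return the mean
    log-softmax of the matched (positive) pairs. Adding log(batch_size) to
    this value gives an InfoNCE-style lower bound on I(A; B)."""
    scores = a @ b.T                                          # (batch, batch)
    log_norm = np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(np.diagonal(scores - log_norm).mean())

# Toy "representations" for a batch of 8 image pairs, 4 dimensions each.
s_x = rng.normal(size=(8, 4))               # shared part of image X
s_y = s_x + 0.1 * rng.normal(size=(8, 4))   # shared part of Y (correlated)
e_x = rng.normal(size=(8, 4))               # exclusive part of X (independent)

# Stage 1: maximize the cross MI bound between the two shared representations.
stage1 = log_softmax_of_positives(s_x, s_y)

# Stage 2: maximize the exclusive representation's MI bound while penalizing
# MI between the shared and exclusive parts (lam is a hypothetical weight).
lam = 1.0
stage2 = log_softmax_of_positives(e_x, e_x) - lam * log_softmax_of_positives(s_x, e_x)

print(stage1, stage2)
```

Because `s_y` is built to be correlated with `s_x` while `e_x` is independent noise, the stage-1 score comes out much higher than the shared/exclusive cross term that stage 2 penalizes.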

Experiments

Question

Dear Eduardo Hugo Sanchez,

After reading your wonderful paper “Learning Disentangled Representations via Mutual Information Estimation”, I have one question about the setup of your training procedure:

How do you decide which two images the MI is maximized between during training? Take Colorful MNIST: if you pair images of the same digit and follow your objective, can I conclude that you are effectively telling the model to learn a (linear, in some cases) separable representation with respect to digit classification? Below is how I come to this conclusion:
If we are maximizing the MI between all the images containing the same digit, then during MI maximization we shuffle the whole batch of data to form the negative samples, which are fed into the critic function. Since the critic function has to distinguish joint samples like (X, Y) = (black7, red7) from shuffled samples like (X, Y') = (black7, yellow0), we are explicitly telling the model to classify the digits.
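The shuffling argument above can be made concrete with a toy sketch. The digit codes and the equality critic below are illustrative assumptions, not the paper's networks; `np.roll` stands in for one particular batch shuffle.

```python
import numpy as np

# Toy batch: each "image" is summarized by its digit label; positive pairs
# (X, Y) are constructed so that both images show the same digit.
digits_x = np.array([7, 3, 5, 1, 9, 4])
digits_y = digits_x.copy()            # paired image shares the digit

# Negatives: re-pair X with a shuffled Y. np.roll applies one fixed
# permutation of the batch, standing in for a random shuffle.
digits_y_neg = np.roll(digits_y, 1)

def digit_match_critic(d_x, d_y):
    # A critic that fires iff the digits agree. If it separates joint pairs
    # from shuffled pairs, it has in effect learned to compare digit
    # identity, i.e. the shared representation must encode the digit.
    return (d_x == d_y).astype(float)

pos_scores = digit_match_critic(digits_x, digits_y)
neg_scores = digit_match_critic(digits_x, digits_y_neg)
print(pos_scores, neg_scores)
```

With this pairing, a critic that simply compares digit identity already separates positives from shuffled negatives perfectly, which is the sense in which the pairing pushes the model toward digit-discriminative features.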
If we train the whole network in a totally unsupervised way, i.e., training pairs like (X, Y) = (black7, yellow0) show up at random and the model is asked to learn the shared information between them, then how the method manages to learn disentangled representations really confuses me…

Furthermore, did you run an ablation study on the “cross mutual information maximization” technique? How did it go?