N1H111SM's Miniverse

2020/06/22

Materials

# Methods

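The goal of BYOL is to learn an encoder $f_\theta$ that maps the original input to a representation space.
The online network is parameterized by $\theta$ and consists of three stages. A view is first sampled from the original input image (the view is determined by the data augmentation scheme); it then passes in turn through the encoder $f_\theta$, the projector $g_\theta$, and the predictor $q_\theta$.
The target network has the same architecture as the online network (the predictor is applied only on the online branch), but its parameters $\xi$ are an exponential moving average of the online parameters $\theta$. Given a target decay rate $\tau \in [0,1]$, $\xi$ is updated after each training step as follows:

$$\xi \leftarrow \tau \xi + (1 - \tau)\theta$$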
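To make the moving-average update of the target parameters concrete, here is a minimal sketch in plain Python. It assumes the parameters are stored as a dictionary of flat float lists; the `Params` alias, the `ema_update` helper, and the parameter names are hypothetical and chosen only for illustration, not taken from the BYOL code.

```python
from typing import Dict, List

# Hypothetical container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(target: Params, online: Params, tau: float) -> Params:
    """Return new target parameters xi <- tau * xi + (1 - tau) * theta."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if target.keys() != online.keys():
        raise ValueError("target and online must hold the same parameter names")
    return {
        name: [
            tau * xi + (1.0 - tau) * theta
            for xi, theta in zip(target[name], online[name])
        ]
        for name in target
    }


# Toy values, not real network weights.
theta = {"encoder.w": [1.0, 2.0], "projector.w": [0.5]}
xi = {"encoder.w": [0.0, 0.0], "projector.w": [0.0]}

# One update step with tau = 0.5 moves xi halfway towards theta.
xi = ema_update(xi, theta, tau=0.5)
```

With $\tau = 1$ the target never moves, and with $\tau = 0$ it is overwritten by the online parameters at every step; values of $\tau$ close to 1 make the target a slowly changing average of the online network.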

# Experiments

## Ablations

Compared to contrastive learning (CL) methods, BYOL is more robust to smaller batch sizes and to reduced sets of image augmentations.

# Conclusion

Nevertheless, BYOL remains dependent on existing sets of augmentations that are specific to vision applications. To generalize BYOL to other modalities (e.g., audio, video, text, ...) it is necessary to obtain similarly suitable augmentations for each of them. Designing such augmentations may require significant effort and expertise. Therefore, automating the search for these augmentations would be an important next step to generalize BYOL to other modalities.