
Week 8 #

Topic: A Brief Summary of Self-Supervised Learning & the Masked Image Modeling (MIM) Family in Self-Supervised Learning

Keynote Speakers: Shengjie Niu, Zebin Yun

Time: Aug 10, 19:30 - 21:30

Venue: Lecture Hall 3, 302 (SUSTech)

Online Link: TencentMeeting

Compendium #

I. Stage-1: Versatile developments in the field of SSL

  • InstDisc proposed a new pretext task of instance-level discrimination.
  • CPC applied a predictive pretext task to SSL, enabling wide applicability across modalities.
  • InvaSpread utilized negative pairs drawn from samples within the same mini-batch.
  • CMC first approached contrastive learning from a multiview perspective.
  • Stage conclusion in terms of objective functions, pretext tasks, and the construction of negative pairs (a minimal loss sketch follows this list).
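The methods above all optimize a contrastive objective of the InfoNCE form. As a rough illustration only (not code from the seminar), a minimal PyTorch sketch of an InfoNCE loss over a batch of paired-view embeddings might look like this; the function name and batch layout are assumptions for exposition.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_query, z_pos, temperature=0.07):
    """Minimal InfoNCE: each query's positive is its paired view;
    the other samples in the batch serve as negatives."""
    z_query = F.normalize(z_query, dim=1)        # (N, D) query embeddings
    z_pos = F.normalize(z_pos, dim=1)            # (N, D) positive-view embeddings
    logits = z_query @ z_pos.t() / temperature   # (N, N) cosine-similarity matrix
    labels = torch.arange(z_query.size(0), device=z_query.device)
    return F.cross_entropy(logits, labels)       # diagonal entries are the positives
```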

II. Stage-2: Two prominent methodologies led the trend

  • MoCo proposed a momentum encoder and a queue-based dictionary for contrastive Self-Supervised Learning (a momentum-update sketch follows this list).
  • SimCLR proposed a simple but effective framework, demonstrating the importance of augmentation strategies and the projection head.
  • MoCo v2 and SimCLR v2 followed as incremental improvements of the two frameworks.
  • SeLa first compared instances with clustering prototypes instead of negative pairs, and proposed a new prediction-head component.
  • SwAV achieved promising performance with a swapped-prediction variant of SeLa's clustering idea, leading to the subsequent trend of learning without negative pairs.
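The core of MoCo is a key encoder updated as an exponential moving average of the query encoder, with encoded keys pushed into a fixed-size queue that serves as the negative dictionary. Below is a minimal sketch of the momentum update and queue maintenance, with assumed names (`encoder_q`, `encoder_k`, `queue`); it illustrates the idea rather than reproducing the official implementation.

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """Key-encoder parameters follow the query encoder as an exponential moving average."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

@torch.no_grad()
def enqueue_dequeue(queue, keys):
    """Append the newest keys and drop the oldest so the dictionary
    keeps a fixed size; queue: (K, D), keys: (N, D)."""
    return torch.cat([queue, keys], dim=0)[keys.size(0):]
```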

III. Stage-3: Learning without negative pairs

  • BYOL: the first Self-Supervised Learning work to complete its training entirely without negative pairs.
  • SimSiam summarized and analysed the role of the various components, resulting in a structurally simple but effective model (a stop-gradient sketch follows this list).
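Both methods avoid collapse without negatives; SimSiam attributes this largely to the stop-gradient on the target branch. A minimal sketch of a SimSiam-style symmetric loss, assuming an `encoder` (backbone + projector) and a `predictor` module, is given below purely for illustration.

```python
import torch.nn.functional as F

def simsiam_loss(encoder, predictor, x1, x2):
    """Negative cosine similarity between predictions and stop-gradient targets,
    symmetrized over the two augmented views x1 and x2."""
    z1, z2 = encoder(x1), encoder(x2)        # projections from the backbone + projector
    p1, p2 = predictor(z1), predictor(z2)    # predictions from the small MLP head
    # stop-gradient (.detach) on the target branch is what prevents collapse
    loss = -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2
    return loss
```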

Significant Timepoint: (2020.10) The Vision Transformer (ViT) had a major impact on the computer vision field and promoted the progress of ViT-based generative vision models.

IV. Masked Image Modeling (MIM) family in Self-Supervised Learning

  • BEiT trained a transformer-based model to reconstruct the input as token-level labels (requires a pre-trained tokenizer).
  • MAE trained a transformer-based model to reconstruct the input at the pixel level (a masking sketch follows this list).
  • SimMIM surveyed several MIM models and identified a series of simple but effective components.
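A key ingredient shared by these methods is random masking of image patches before reconstruction. A minimal sketch of MAE-style per-sample random masking over a sequence of patch tokens is shown below; the shapes and mask ratio are illustrative assumptions, not the papers' exact code.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """Keep a random subset of patch tokens per sample (MAE-style).
    patches: (N, L, D) -> visible: (N, L_keep, D), mask: (N, L) with 1 = masked."""
    N, L, D = patches.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(N, L, device=patches.device)   # random score per patch
    ids_shuffle = noise.argsort(dim=1)                # ascending: keep the lowest scores
    ids_keep = ids_shuffle[:, :len_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(N, L, device=patches.device)
    mask.scatter_(1, ids_keep, 0)                     # 0 = kept, 1 = masked
    return visible, mask
```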

Material #

Slides & Source Code for Week-8 seminar from Shengjie Niu.

Big Picture from Shengjie Niu.

Reference #

  1. Z. Wu et al., Unsupervised Feature Learning via Non-Parametric Instance Discrimination.

  2. A. Oord et al., Representation Learning with Contrastive Predictive Coding.

  3. M. Ye et al., Unsupervised Embedding Learning via Invariant and Spreading Instance Feature.

  4. Y. Tian et al., Contrastive Multiview Coding.

  5. K. He et al., Momentum Contrast for Unsupervised Visual Representation Learning.

  6. T. Chen et al., A Simple Framework for Contrastive Learning of Visual Representations.

  7. X. Chen et al., Improved Baselines with Momentum Contrastive Learning.

  8. T. Chen et al., Big Self-Supervised Models are Strong Semi-Supervised Learners.

  9. Y. Asano et al., Self-labelling via simultaneous clustering and representation learning.

  10. M. Caron et al., Unsupervised Learning of Visual Features by Contrasting Cluster Assignments.

  11. J. Grill et al., Bootstrap your own latent: A new approach to self-supervised Learning.

  12. X. Chen et al., Exploring Simple Siamese Representation Learning.

  13. M. Caron et al., Emerging Properties in Self-Supervised Vision Transformers.

  14. H. Bao et al., BEiT: BERT Pre-Training of Image Transformers.

  15. K. He et al., Masked Autoencoders Are Scalable Vision Learners.

  16. Z. Xie et al., SimMIM: A Simple Framework for Masked Image Modeling.