DL Normalization

date
Jul 18, 2022
slug
DL1
author
status
Public
tags
DL
summary
type
Post
thumbnail
updatedAt
May 7, 2023 02:55 AM
A vital issue in deep neural networks, called covariate shift, was first defined by Shimodaira [1]; it describes the situation where the data distribution changes. Such a change happens in almost every dataset, especially real-world ones. To deal with this problem and make the training process more efficient, normalization is a technique you should know. The following sections introduce several common normalization techniques.
 

1. Batch Normalization (BN)[2]

BN first normalizes the input distribution x to a standard normal distribution, then uses two trainable parameters to let the network choose its preferred distribution. Ioffe and Szegedy show that BN can significantly ease the training process. Although it was designed to accelerate the training of discriminative networks, it was later found to be effective in generative models as well.
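Below is a minimal sketch of what BN computes for a 4-D activation tensor (PyTorch is assumed here, and the variable names are illustrative): the per-channel statistics are taken over the whole batch, and the two trainable parameters gamma and beta rescale the normalized output.

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    # Statistics are computed per channel, over the batch and spatial dimensions.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)  # normalize to zero mean, unit variance
    return gamma * x_hat + beta                 # let the network pick its preferred distribution

x = torch.randn(8, 3, 32, 32)        # (N, C, H, W)
gamma = torch.ones(1, 3, 1, 1)       # trainable scale
beta = torch.zeros(1, 3, 1, 1)       # trainable shift
y = batch_norm(x, gamma, beta)
```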

2. Instance Normalization (IN)

IN is similar to BN; the only difference is that IN computes the mean and variance per individual sample, whereas BN computes them over the whole batch. Although the authors show that IN can bring a significant improvement, the paper only has a few hundred citations.
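As a sketch (again assuming PyTorch), the only change from the BN snippet above is the reduction axes: the statistics are taken per sample and per channel, over the spatial dimensions only.

```python
import torch

def instance_norm(x, eps=1e-5):
    # Statistics are computed per sample AND per channel; only H and W are reduced.
    mean = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)
```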

3. Conditional Instance Normalization (CIN)

Although CIN has only about 700 citations, its idea is interesting to know. CIN can be thought of as several INs in parallel; in other words, it keeps several independent sets of IN parameters. Surprisingly, these INs behave differently even though they receive the same input. We can view these different INs as styles; in other words, they represent the same content but make it show up differently.
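A minimal sketch of this idea (PyTorch assumed; the module and parameter names are mine, not from the original paper): one shared instance normalization, plus a separate pair of affine parameters per style, selected by an integer style index.

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm(nn.Module):
    def __init__(self, num_channels, num_styles):
        super().__init__()
        # One normalization shared across styles, but per-style (gamma, beta) pairs.
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.gamma = nn.Parameter(torch.ones(num_styles, num_channels))
        self.beta = nn.Parameter(torch.zeros(num_styles, num_channels))

    def forward(self, x, style_idx):
        x = self.norm(x)
        # The style index picks which affine transform (i.e. which style) to apply.
        g = self.gamma[style_idx].view(1, -1, 1, 1)
        b = self.beta[style_idx].view(1, -1, 1, 1)
        return g * x + b

cin = ConditionalInstanceNorm(num_channels=3, num_styles=4)
x = torch.randn(8, 3, 32, 32)
y_style0 = cin(x, 0)  # same input, different style index,
y_style1 = cin(x, 1)  # different output
```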

4. Adaptive Instance Normalization (AdaIN)

AdaIN contains no trainable parameters; instead, it adjusts one distribution to match another. Specifically, AdaIN makes the switch between different styles automatic. Although it has no trainable parameters, style statistics are still needed; they are usually provided by another network, so it requires more computation than CIN.
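A minimal sketch of AdaIN (PyTorch assumed): the content features are normalized with their own per-channel statistics and then rescaled with the statistics of the style features, so there is nothing to train.

```python
import torch

def adain(content, style, eps=1e-5):
    # Replace the per-channel mean/std of the content features with those of the
    # style features; no trainable parameters are involved.
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.var(dim=(2, 3), unbiased=False, keepdim=True).add(eps).sqrt()
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.var(dim=(2, 3), unbiased=False, keepdim=True).add(eps).sqrt()
    return s_std * (content - c_mean) / c_std + s_mean

content = torch.randn(4, 512, 32, 32)  # e.g. encoder features of the content image
style = torch.randn(4, 512, 32, 32)    # encoder features of the style image
stylized = adain(content, style)
```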

Summary

From my point of view, BN treats all the data in a batch as sharing a similar style, so normalizing them jointly speeds up training. IN argues that each sample should be treated separately. CIN agrees with IN's perspective and shows that a single sample can take on different styles under different views. AdaIN focuses on the application of switching styles automatically. Unfortunately, none of these methods actually solves covariate shift, so distribution shift remains a weakness of deep learning.
  1. Shimodaira, Hidetoshi. "Improving predictive inference under covariate shift by weighting the log-likelihood function." Journal of Statistical Planning and Inference, 90(2):227–244, October 2000.
  2. Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International Conference on Machine Learning. PMLR, 2015.
  3. Huang, Xun, and Serge Belongie. "Arbitrary style transfer in real-time with adaptive instance normalization." Proceedings of the IEEE International Conference on Computer Vision. 2017.