Normalization: e.g., zero-center the data and scale each feature, so that different features do not sit on wildly different units or ranges.
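
A minimal sketch of this preprocessing step (assuming a NumPy array `X` of shape `(N, D)` holding N examples with D features; the array and scales here are hypothetical):

```python
import numpy as np

# Hypothetical data: 100 examples, 3 features on very different scales.
X = np.random.randn(100, 3) * np.array([1.0, 100.0, 0.01])

# Zero-center each feature, then scale to unit variance.
X_centered = X - X.mean(axis=0)
X_normalized = X_centered / (X_centered.std(axis=0) + 1e-8)  # epsilon guards against zero variance
```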

Weight initialization

  1. Small random numbers

Works okay for small networks, but can lead to non-homogeneous distributions of activations across the layers of a network.

  1. Neither a scale of 1 nor 0.01 works well in deep networks: too large a scale saturates the activations, too small a scale shrinks the signal toward zero layer by layer, so the data "disappear" (see the sketch after this list).
  2. Batch normalization!
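
As a rough illustration (not from the original notes), the sketch below initializes each layer with small random numbers and tracks the standard deviation of activations through a deep tanh network; the layer sizes, depth, and nonlinearity are assumptions chosen for the demo:

```python
import numpy as np

np.random.seed(0)
hidden_size = 500
num_layers = 10
x = np.random.randn(1000, hidden_size)  # hypothetical input batch

for scale in [0.01, 1.0]:
    h = x
    stds = []
    for _ in range(num_layers):
        W = scale * np.random.randn(hidden_size, hidden_size)  # "small random numbers"
        h = np.tanh(h @ W)
        stds.append(h.std())
    # With scale 0.01 the std shrinks toward 0 (the signal "disappears");
    # with scale 1.0 tanh saturates near +/-1 instead.
    print(f"scale={scale}: layer stds {[round(s, 4) for s in stds]}")
```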

Consider a batch of activations at some layer. To make each dimension unit Gaussian, apply:

$$\widehat{x}^{(k)}=\frac{x^{(k)}-\mathrm{E}\left[x^{(k)}\right]}{\sqrt{\operatorname{Var}\left[x^{(k)}\right]}}$$
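
A minimal NumPy sketch of this normalization step (assuming `x` is a batch of activations of shape `(N, D)`; the learnable scale-and-shift parameters of full batch normalization are omitted here, and `eps` is a small constant for numerical stability):

```python
import numpy as np

def batchnorm_forward(x, eps=1e-5):
    # x: activations of shape (N, D); normalize each dimension k over the batch.
    mean = x.mean(axis=0)   # E[x^(k)]
    var = x.var(axis=0)     # Var[x^(k)]
    x_hat = (x - mean) / np.sqrt(var + eps)
    return x_hat

# Usage: each column of x_hat now has roughly zero mean and unit variance.
x = np.random.randn(64, 100) * 5.0 + 3.0
x_hat = batchnorm_forward(x)
print(x_hat.mean(axis=0)[:3], x_hat.std(axis=0)[:3])
```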