Layerwise normalization…
After computing the layer output, W_t * h_t, get vector x_t
Compute the mean and variance of it then
Where g and b are 2 trainable vectors