Normalization

Why
In typical neural networks, the activations of each layer can vary drastically, leading to issues like exploding or vanishing gradients, which slow down training.
- Primary goal: stabilize the training process, mitigate internal covariate shift, and accelerate convergence.
- Secondary effect: it may also indirectly improve generalization (this is not the primary purpose).
Normalization is often used to:
- increase the speed of training convergence,
- reduce sensitivity to variations and feature scales in input data,
- reduce overfitting,
- and produce better model generalization to unseen data.
What
An operation on the data distribution: normalize the distribution of a layer's inputs or outputs (zero mean, unit variance).
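For instance, the following NumPy sketch (the array values, shapes, and epsilon are illustrative assumptions, not from the original note) standardizes each feature of a small batch to zero mean and unit variance:

```python
import numpy as np

# Toy batch of activations: 4 samples, 3 features with very different scales.
x = np.array([[1.0, 200.0, -3.0],
              [2.0, 180.0, -1.0],
              [0.5, 220.0, -2.0],
              [1.5, 210.0, -4.0]])

mean = x.mean(axis=0)   # per-feature mean over the batch
var = x.var(axis=0)     # per-feature variance over the batch
eps = 1e-5              # small constant for numerical stability

x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature

print(x_hat.mean(axis=0))  # ~0 for each feature
print(x_hat.std(axis=0))   # ~1 for each feature
```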
Example
- Batch Normalization: normalize over the batch dimension.
- Layer Normalization: normalize over the features for each individual data point.
- Others: InstanceNorm, GroupNorm, …
Comparison of normalization methods (a code illustration follows the table):

Method | Normalized over | Typical use cases | How it helps stabilize training
---|---|---|---
Batch Norm | Each feature channel of a layer's input, across the mini-batch | CNNs and other fixed-size networks | Mitigates covariate shift, allows larger learning rates
Layer Norm | All features of a single sample | RNNs / Transformers | Handles variable-length sequences, stabilizes gradients
Weight Norm | The weight vectors | Reinforcement learning, GANs | Directly constrains the parameter scale
Group Norm | Feature channels split into groups | Small-batch settings (e.g., object detection) | Replaces Batch Norm when batches are small
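To make the comparison concrete, here is a minimal PyTorch sketch (the tensor shape, channel count, and group count are illustrative assumptions) that applies several of these layers to the same 4-D activation tensor; the output shape is always unchanged, and the only difference is which axes the statistics are computed over:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 6, 16, 16)  # (batch N, channels C, height H, width W)

bn = nn.BatchNorm2d(6)               # stats over (N, H, W) for each channel
ln = nn.LayerNorm([6, 16, 16])       # stats over (C, H, W) for each sample
inorm = nn.InstanceNorm2d(6)         # stats over (H, W) per sample and channel
gn = nn.GroupNorm(num_groups=3, num_channels=6)  # stats over (C/3, H, W) per sample

for name, layer in [("BatchNorm", bn), ("LayerNorm", ln),
                    ("InstanceNorm", inorm), ("GroupNorm", gn)]:
    y = layer(x)
    print(name, tuple(y.shape))  # shape unchanged; only the normalization axes differ
```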
How
Batch Normalization
The normalization process involves calculating the mean and variance of each feature in a mini-batch and then scaling and shifting the features using these statistics (a code sketch follows the steps below).
- Compute the Mean and Variance of the Mini-Batch: for a mini-batch of activations $\{x_1, \dots, x_m\}$, compute the mean $\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$ and the variance $\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$.
- Normalization: each activation is normalized using the mini-batch statistics, $\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$, where $\epsilon$ is a small constant for numerical stability.
- Scale and Shift the Normalized Activations: $y_i = \gamma \hat{x}_i + \beta$, where $\gamma$ and $\beta$ are learnable parameters that allow the optimal scaling and shifting of the normalized activations, giving the network additional flexibility.
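Putting the three steps together, here is a minimal NumPy sketch of the training-time forward pass (the function name `batch_norm_forward`, the epsilon value, and the toy shapes are assumptions made for illustration):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch norm for a mini-batch x of shape (m, features)."""
    mu = x.mean(axis=0)                     # mini-batch mean, per feature
    var = x.var(axis=0)                     # mini-batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize
    return gamma * x_hat + beta             # scale and shift (learnable gamma, beta)

x = np.random.randn(32, 4)   # mini-batch of 32 samples, 4 features
gamma = np.ones(4)           # learnable scale, initialized to 1
beta = np.zeros(4)           # learnable shift, initialized to 0
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0), y.var(axis=0))  # ~0 and ~1 with this gamma/beta
```

At inference time, the mini-batch statistics are replaced by running estimates collected during training (see the caveats below).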
Layer Normalization
The normalization process involves calculating the mean and variance over the features of each individual sample and then scaling and shifting the features using these statistics (a code sketch follows the steps below).
- Compute the Mean and Variance over the Features: for each input, $\mu = \frac{1}{H}\sum_{i=1}^{H} x_i$ and $\sigma^2 = \frac{1}{H}\sum_{i=1}^{H} (x_i - \mu)^2$, where $H$ is the number of features (neurons) in the layer.
- Normalize the Input: $\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}$.
- Apply Scaling and Shifting: $y_i = \gamma \hat{x}_i + \beta$, with learnable parameters $\gamma$ and $\beta$.
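Analogously, a minimal NumPy sketch of the layer-norm forward pass (the function name `layer_norm_forward`, the epsilon value, and the toy shapes are assumptions):

```python
import numpy as np

def layer_norm_forward(x, gamma, beta, eps=1e-5):
    """Layer norm over the last axis: statistics are per sample, over H features."""
    mu = x.mean(axis=-1, keepdims=True)    # mean over the features of each sample
    var = x.var(axis=-1, keepdims=True)    # variance over the features of each sample
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize each sample independently
    return gamma * x_hat + beta            # elementwise scale and shift

x = np.random.randn(2, 5, 8)   # e.g. (batch, sequence length, hidden size H=8)
gamma = np.ones(8)
beta = np.zeros(8)
y = layer_norm_forward(x, gamma, beta)
print(y.mean(axis=-1))  # ~0 for every position of every sample
```

Because the statistics are computed per sample, the result does not depend on the batch size, which is why Layer Norm suits variable-length sequences.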
Caveats
- Not a silver bullet:
  • Normalization can hurt performance on some tasks (e.g., constraining the output distribution in generative models).
  • Statistics must be fixed at inference time (e.g., Batch Norm uses the global running mean and variance; see the sketch below).
- Coordination with initialization:
  Normalization is usually combined with Xavier/Kaiming initialization to jointly control the scale of signal propagation.
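As an illustration of the fixed-statistics point above, PyTorch's `BatchNorm1d` switches from mini-batch statistics to its stored running statistics when the module is put into evaluation mode (the layer size and input shape below are arbitrary):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
x = torch.randn(16, 4)

bn.train()        # training mode: normalizes with the current mini-batch statistics
y_train = bn(x)   # also updates bn.running_mean / bn.running_var

bn.eval()         # inference mode: normalizes with the fixed running statistics
y_eval = bn(x)

print(bn.running_mean)  # the global statistics used at inference time
```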
Reference
https://en.wikipedia.org/wiki/Normalization_(machine_learning)
https://www.geeksforgeeks.org/what-is-batch-normalization-in-deep-learning/
https://www.geeksforgeeks.org/what-is-layer-normalization/