Regularization (正则化)

Why

  • Primary goal: prevent overfitting by explicitly or implicitly constraining model complexity, improving generalization.
  • Secondary goal: may also speed up training (e.g., L2 stabilizes optimization).

In mathematics, statistics, finance,[1] and computer science, particularly in machine learning and inverse problems, regularization is a process that converts the answer to a problem into a simpler one. It is often used in solving ill-posed problems or to prevent overfitting.[2]

What

Explicit Regularization

Explicit regularization directly modifies the loss function or the model structure.

Example

  • L1/L2 regularization: add a penalty term on the parameters to the loss function.
  • Elastic net regularization: combines the L1 and L2 penalties.
  • Dropout: during training, randomly "drop" a subset of neurons (and their connections) by setting their outputs to zero, sampling a different subset at each iteration.

Implicit Regularization

Implicit regularization constrains the model indirectly through the training strategy.

Example

  • Early stopping (a minimal sketch follows this list)
  • Data augmentation
  • Weight decay
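
A minimal sketch of early stopping: monitor the validation loss and halt once it stops improving. `train_one_epoch` and `evaluate` are hypothetical placeholders standing in for a real training step and validation pass:

```python
def fit_with_early_stopping(model, train_data, val_data,
                            max_epochs=100, patience=5):
    best_val_loss = float("inf")
    epochs_since_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)    # hypothetical: update weights
        val_loss = evaluate(model, val_data)  # hypothetical: validation loss
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_since_improvement = 0      # still generalizing better
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                break                         # validation loss has plateaued
    return model
```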

How

L1 and L2 Regularization

  • L1 regularization (also called LASSO) leads to sparse models by adding a penalty based on the absolute values of the coefficients.

$$Cost = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\sum_{i=1}^{m} |w_i|$$

  • L2 regularization (also called ridge regression) encourages smaller, more evenly distributed weights by adding a penalty based on the squares of the coefficients.[4]

$$Cost = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\sum_{i=1}^{m} w_i^2$$
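
As a quick illustration, a minimal NumPy sketch of both penalized costs, following the two formulas above (function and variable names are illustrative, not from any particular library):

```python
import numpy as np

def l1_cost(y, y_hat, w, lam):
    # Mean squared error plus lambda * sum(|w_i|)  (LASSO)
    return np.mean((y - y_hat) ** 2) + lam * np.sum(np.abs(w))

def l2_cost(y, y_hat, w, lam):
    # Mean squared error plus lambda * sum(w_i^2)  (ridge)
    return np.mean((y - y_hat) ** 2) + lam * np.sum(w ** 2)

# Example: the L1 penalty grows linearly with the weights,
# the L2 penalty quadratically.
y, y_hat = np.array([1.0, 2.0]), np.array([1.1, 1.9])
w = np.array([0.5, -0.3, 0.0])
print(l1_cost(y, y_hat, w, lam=0.1), l2_cost(y, y_hat, w, lam=0.1))
```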

Elastic net regularization

Elastic net regression combines both the L1 and L2 penalties, weighted by a mixing parameter α.

$$Cost = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda\left((1-\alpha)\sum_{i=1}^{m} |w_i| + \alpha\sum_{i=1}^{m} w_i^2\right)$$
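
A matching NumPy sketch of this cost, with α applied exactly as in the formula above. (Note that libraries parameterize the mix differently; for instance, scikit-learn's `ElasticNet` uses an `l1_ratio` that weighs the L1 term, the reverse of this α.)

```python
import numpy as np

def elastic_net_cost(y, y_hat, w, lam, alpha):
    # (1 - alpha) weighs the L1 term and alpha weighs the L2 term,
    # matching the formula above.
    l1 = np.sum(np.abs(w))
    l2 = np.sum(w ** 2)
    return np.mean((y - y_hat) ** 2) + lam * ((1 - alpha) * l1 + alpha * l2)
```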

Dropout

Dropout is a regularization technique for reducing overfitting in neural networks by preventing complex co-adaptations on training data. It is an efficient way of performing model averaging with neural networks.[3] It works by randomly "dropping out" (omitting) units (both hidden and visible) during training. (DropConnect is a related variant that drops individual weights rather than whole units.)

$$\hat{w}_j = \begin{cases} w_j & \text{with probability } c \\ 0 & \text{otherwise} \end{cases}$$
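
A minimal sketch of "inverted" dropout, the common formulation in which surviving activations are rescaled by the keep probability (playing the role of c above) so nothing changes at test time:

```python
import numpy as np

def dropout(x, keep_prob=0.8, training=True, seed=None):
    # Zero each unit independently with probability 1 - keep_prob and
    # rescale survivors by 1 / keep_prob so the expected activation is
    # unchanged; at test time the input passes through untouched.
    if not training:
        return x
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

# Example: roughly 20% of activations are zeroed during training.
h = np.ones((2, 5))
print(dropout(h, keep_prob=0.8, seed=0))
```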

The Difference from Normalization

  • Regularization is like traffic rules: it forces the model to follow simple rules (e.g., speed limits) and avoid dangerous behavior (overfitting).
  • Normalization is like vehicle maintenance: it keeps the engine (gradients) running smoothly, but does not care whether the driver speeds (overfits).

Tuning priority: first make training stable (normalization), then control overfitting (regularization).

References

https://en.wikipedia.org/wiki/Regularization_(mathematics)
https://www.geeksforgeeks.org/regularization-in-machine-learning/
https://en.wikipedia.org/wiki/Dilution_(neural_networks)
[1] Kratsios, Anastasis (2020). "Deep Arbitrage-Free Learning in a Generalized HJM Framework via Arbitrage-Regularization". Risks. 8 (2): 40. doi:10.3390/risks8020040. hdl:20.500.11850/456375. "Term structure models can be regularized to remove arbitrage opportunities."
[2] Bühlmann, Peter; Van De Geer, Sara (2011). Statistics for High-Dimensional Data. Springer Series in Statistics. p. 9. doi:10.1007/978-3-642-20192-9. ISBN 978-3-642-20191-2. If p > n, the ordinary least squares estimator is not unique and will heavily overfit the data. Thus, a form of complexity regularization will be necessary.
[3] Hinton, Geoffrey E.; Srivastava, Nitish; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan R. (2012). “Improving neural networks by preventing co-adaptation of feature detectors”. arXiv:1207.0580 [cs.NE].