Regularization (正则化)

Why
- Primary goal: prevent overfitting by explicitly or implicitly constraining model complexity, improving generalization.
- Secondary goal: it can also speed up training (e.g., L2 stabilizes optimization).
In mathematics, statistics, finance,[1] and computer science, particularly in machine learning and inverse problems, regularization is a process that converts the answer to a problem into a simpler one. It is often used in solving ill-posed problems or to prevent overfitting.[2]
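In its most common explicit form, this means adding a penalty term to the training loss. A standard sketch of the regularized objective, where λ ≥ 0 controls the penalty strength:

```latex
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_w(x_i),\, y_i\big) \;+\; \lambda\, R(w)
```

Choosing R(w) = ‖w‖₁ gives L1/LASSO and R(w) = ‖w‖₂² gives L2/ridge; both are covered under How below.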
What
Explicit Regularization
Directly modifies the loss function or the model structure.
Example
- L1/L2 regularization: add a parameter penalty term to the loss function.
- Elastic net regularization: combines the L1 and L2 penalties.
- Dropout: during training, temporarily “drop” a random subset of neurons (and their connections) by zeroing their outputs, sampling a different subset at each iteration.
Implicit Regularization
Constrains the model indirectly through the training strategy.
Example
- Early stopping (see the sketch after this list)
- Data augmentation
- Weight decay
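A minimal early-stopping sketch in Python; the `train_one_epoch` and `validate` callables and the `max_epochs` and `patience` values are illustrative assumptions, not details from the source:

```python
def fit_with_early_stopping(train_one_epoch, validate, max_epochs=100, patience=5):
    """Stop training once the validation loss has not improved for `patience` epochs."""
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()          # one pass over the training set
        val_loss = validate()      # loss on a held-out validation set
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0   # improvement: reset patience
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break              # stopping early implicitly limits model capacity
    return best_val_loss
```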
How
L1 and L2 Regularization
- L1 regularization (also called LASSO) leads to sparse models by adding a penalty based on the absolute value of the coefficients; see the sketch after this list.
- L2 regularization (also called ridge regression) encourages smaller, more evenly distributed weights by adding a penalty based on the square of the coefficients.[4]
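A minimal sketch contrasting the two penalties with scikit-learn's Lasso and Ridge estimators; the synthetic data and alpha values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_coef = np.zeros(20)
true_coef[:3] = [4.0, -2.0, 1.5]           # only 3 of 20 features matter
y = X @ true_coef + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)         # L1 penalty: alpha * ||w||_1
ridge = Ridge(alpha=1.0).fit(X, y)         # L2 penalty: alpha * ||w||_2^2

# L1 drives irrelevant coefficients exactly to zero (sparsity);
# L2 only shrinks them toward zero.
print("nonzero Lasso coefs:", np.sum(lasso.coef_ != 0))
print("nonzero Ridge coefs:", np.sum(ridge.coef_ != 0))
```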
Elastic net regularization
Elastic net regression is a combination of both L1 and L2 regularization, trading off the sparsity of L1 against the shrinkage of L2.
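A sketch using scikit-learn's ElasticNet, where l1_ratio blends the two penalties; the data and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] * 4.0 - X[:, 1] * 2.0 + 0.1 * rng.normal(size=200)

# Combined penalty: alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("nonzero ElasticNet coefs:", np.sum(enet.coef_ != 0))
```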
Dropout
Dropout is a regularization technique for reducing overfitting in neural networks by preventing complex co-adaptations on training data. It is an efficient way of performing model averaging with neural networks.[3] It works by randomly “dropping out”, or omitting, units (both hidden and visible) during the training process of a neural network. (DropConnect is a closely related variant that drops individual weights rather than whole units.)
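A minimal numpy sketch of inverted dropout at training time; the function name and rate value are illustrative assumptions, and at test time the layer is simply the identity:

```python
import numpy as np

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate`, then rescale
    survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return activations        # identity at test time
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(activations.shape) >= rate   # keep each unit w.p. 1 - rate
    return activations * mask / (1.0 - rate)

h = np.ones((2, 4))              # a toy batch of hidden activations
print(dropout(h, rate=0.5))      # about half the entries zeroed, the rest scaled to 2.0
```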
The Difference from Normalization
- Regularization is like traffic rules: it forces the model to follow simple constraints (e.g., a speed limit) and avoid dangerous behavior (overfitting).
- Normalization is like vehicle maintenance: it keeps the engine (the gradients) running smoothly, but does not stop the driver from speeding (overfitting).
Tuning priority: first make training stable (normalization), then control overfitting (regularization).
References
https://en.wikipedia.org/wiki/Regularization_(mathematics)
https://www.geeksforgeeks.org/regularization-in-machine-learning/
https://en.wikipedia.org/wiki/Dilution_(neural_networks)
[1] Kratsios, Anastasis (2020). “Deep Arbitrage-Free Learning in a Generalized HJM Framework via Arbitrage-Regularization”. Risks. 8 (2): 40. doi:10.3390/risks8020040. hdl:20.500.11850/456375. Term structure models can be regularized to remove arbitrage opportunities.
[2] Bühlmann, Peter; Van De Geer, Sara (2011). Statistics for High-Dimensional Data. Springer Series in Statistics. p. 9. doi:10.1007/978-3-642-20192-9. ISBN 978-3-642-20191-2. If p > n, the ordinary least squares estimator is not unique and will heavily overfit the data. Thus, a form of complexity regularization will be necessary.
[3] Hinton, Geoffrey E.; Srivastava, Nitish; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan R. (2012). “Improving neural networks by preventing co-adaptation of feature detectors”. arXiv:1207.0580 [cs.NE].