He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian :: Deep Residual Learning for Image Recognition
Residual refers to the part of a mapping other than the original input. If the mapping is \(H(x)\), where \(x\) is the input vector and \(H(x)\) is a vector of the same dimension, then \(F(x) = H(x) - x\) is the residual function: the original mapping \(H(x)\) with the input \(x\) cut out.
For example, the residual of \(H(x) = 5x\) is \(F(x) = 5x - x = 4x\).
1. mechanism
essentially, add an identity-mapping shortcut every few layers in a deep network, so the stacked layers learn only the residual \(F(x)\) while the block as a whole still represents the same mapping \(H(x) = F(x) + x\).
If the optimal mapping is the identity, you get it for free from the shortcut (the layers just push \(F(x)\) toward zero); if it is not, the shortcut does no harm and the layers learn the residual on top of it.
Either way, the shortcut carries the input forward unchanged, adding information for the layers to fit against.
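A minimal sketch of such a residual block in PyTorch (my own illustration, not code from the paper; batch normalization and downsampling shortcuts are omitted for brevity, and the channel count is made up):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block sketch: output = ReLU(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions with a ReLU in between
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.conv2(self.relu(self.conv1(x)))  # F(x)
        return self.relu(residual + x)                   # add the identity shortcut, then ReLU

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```

The identity shortcut adds no parameters; when input and output shapes differ, the paper uses zero-padding or a projection instead, which this sketch leaves out.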
2. why it is important
2.1. phenomenon : a deeper network sometimes works worse than a shallower one
2.2. theorem : this should not be
we can make a deep network (say 100 layers) behave identically to a shallower network (say 10 layers) by stacking 90 identity-mapping layers on top of the shallower network.
The 10+90 network should then work exactly the same as the 10-layer network, so by construction a deeper network should never do worse.
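A tiny numerical check of this construction (a sketch with made-up sizes; `nn.Identity` stands in for the appended identity layers):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# stand-in for the shallower network (the "10-layer" net)
shallow = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))

# the "10+90" network: the same net plus 90 identity layers stacked on top
deep = nn.Sequential(shallow, *[nn.Identity() for _ in range(90)])

x = torch.randn(4, 8)
print(torch.equal(shallow(x), deep(x)))  # True: the deeper net matches the shallower one exactly
```

The paper's point is that solvers struggle to find even this trivial solution when training a plain deep network end to end.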
2.3. hypothesis : maybe the non-linear operations/mappings make the identity mapping hard to learn
this is fair: ReLU, at least, drops the negative part of its input, so a stack of non-linear layers cannot easily represent an exact identity.
2.4. solved problem: the shortcut avoids the difficulty of pushing non-linear layers toward an identity mapping, since the layers only need to drive \(F(x)\) toward zero (see the check below)
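A quick check of that claim, reusing the `ResidualBlock` sketch from section 1 (still my own illustration; it also assumes the incoming activations are non-negative, as they are after an earlier ReLU): with all weights and biases at zero, \(F(x) = 0\) and the block reduces to the identity.

```python
import torch
import torch.nn as nn

block = ResidualBlock(16)   # the sketch class defined in section 1
for p in block.parameters():
    nn.init.zeros_(p)       # drive F(x) to exactly zero

x = torch.relu(torch.randn(1, 16, 32, 32))  # non-negative input, as after a previous ReLU
print(torch.equal(block(x), x))             # True: the block now computes the identity
```

Reaching zero weights is an easy target for the optimizer, whereas making a stack of plain non-linear layers compute the identity is not.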
3. notes
- what is ReLU? It is the activation \(\mathrm{ReLU}(z) = \max(0, z)\), applied element-wise after the addition in \(F(x) + x\), not the sum itself. I think it's of crucial importance to the problem (see the small check after these notes).
- what does a non-linear layer mean? It probably refers to a layer whose output is not just a linear combination \(Wx\): a linear operation followed by a non-linear activation. A convolution, where a kernel filters the input, is by itself still a linear operation (though it can lose information, e.g. when it downsamples); the non-linearity, and further information loss, come from the activation applied after it.
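A small check of the ReLU point (a sketch; the numbers are arbitrary):

```python
import torch

z = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(torch.relu(z))                  # tensor([0.0000, 0.0000, 0.0000, 1.5000]) -- negatives are zeroed
print(torch.equal(torch.relu(z), z))  # False: ReLU alone cannot be the identity on negative inputs
```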
Bibliography
He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. "Deep Residual Learning for Image Recognition." CVPR 2016.