He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian :: Deep Residual Learning for Image Recognition

Table of Contents

[1]

Residual refers to the part of a mapping left over after subtracting the original input. If the mapping is \(H(x)\), where \(x\) is an input vector and \(H(x)\) is a vector of the same dimension, then \(H(x)-x\) is the residual function: the original mapping \(H(x)\) with the input \(x\) cut out.

For example, the residual function of \(y = 5x\) is \(y = 4x = 5x - x\).
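The subtraction above can be checked numerically; a minimal sketch (the vector `x` here is an arbitrary example, not from the paper):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
H = 5 * x      # the full mapping H(x) = 5x
F = H - x      # the residual F(x) = H(x) - x = 4x
print(F)       # [ 4. -8. 12.]
```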

1. mechanism

essentially, add an identity-mapping shortcut every few layers in a deep network, so that each block learns the residual \(F(x) = H(x) - x\) instead of the full mapping \(H(x)\).

If the best mapping is the identity, the shortcut already provides it for free. If it is not, the block can simply learn to ignore the shortcut.

The shortcut adds information (the raw input) that makes fitting easier.
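The mechanism can be sketched as a tiny two-layer residual block in NumPy. This is a simplified illustration, not the paper's exact architecture: the weight matrices `W1`, `W2` and the function names are hypothetical, and convolutions/batch norm are omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """The two-layer branch computes the residual F(x); the shortcut
    adds x back, so the block outputs H(x) = F(x) + x."""
    F = W2 @ relu(W1 @ x)   # residual branch
    return relu(F + x)       # shortcut addition, then activation

# If the residual branch learns to output zero (here W2 = 0), the block
# reduces to the identity mapping (for non-negative x).
x = np.array([1.0, 2.0, 3.0])
W1 = np.random.randn(3, 3)
W2 = np.zeros((3, 3))
print(residual_block(x, W1, W2))  # [1. 2. 3.]
```

This shows the "get identity for free" point: driving the residual branch to zero is easy (push weights to zero), whereas learning an exact identity through non-linear layers is not.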

2. why it is important

2.1. phenomenon : a deeper network sometimes works worse than a shallower one

2.2. theorem : this should not happen

we can make a deep network (say 100 layers) behave identically to a shallower one (say 10 layers) by stacking 90 identity-mapping layers on top of the shallower network.

The 10+90 network should then work exactly the same as the 10-layer network, so a deeper network should never do worse.
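The construction argument can be verified directly. A minimal sketch, where `shallow_net` is a hypothetical stand-in for a trained shallower network:

```python
import numpy as np

def shallow_net(x):
    # stand-in for a trained shallow network (hypothetical)
    return np.tanh(x) * 2.0

def identity_layer(y):
    return y  # each extra layer just copies its input

def deep_net(x, extra_layers=90):
    # shallow network followed by 90 stacked identity layers
    y = shallow_net(x)
    for _ in range(extra_layers):
        y = identity_layer(y)
    return y

x = np.array([0.5, -1.0])
assert np.allclose(deep_net(x), shallow_net(x))  # identical outputs
```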

2.3. hypothesis : maybe the non-linear operations/mappings make the identity mapping hard to learn

this is plausible, as ReLU at least drops part of its input, making an exact identity mapping infeasible.
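The dropping behaviour is easy to see: ReLU zeroes every negative component, so \(\mathrm{relu}(x) \neq x\) whenever \(x\) has a negative entry. A one-line check:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
x = np.array([-1.0, 2.0])
print(relu(x))  # [0. 2.] : the negative component is lost, so relu(x) != x
```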

2.4. solved problem: the shortcut sidesteps the difficulty of learning an identity mapping through non-linear layers

3. notes

  • what is relu? It is used around \(F(x) + x\), so I first assumed some kind of sum function; it is actually the rectified linear unit, the activation \(\max(0, x)\). I think it's of crucial importance to the problem.
  • what does non-linear layer mean? I think it probably refers to ways of providing a value to the activation function other than a linear combination \(wx\), like convolution, where a kernel filters the input, which in my opinion may still be kind of linear, but definitely has information loss.
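The linearity question in the notes above can be tested concretely: a layer \(f(x) = Wx\) is linear because \(f(a+b) = f(a) + f(b)\), and appending ReLU breaks that property. A small sketch (the matrix and vectors are arbitrary examples):

```python
import numpy as np

W = np.array([[2.0, -1.0], [0.5, 3.0]])
a = np.array([1.0, -2.0])
b = np.array([-3.0, 4.0])

linear = lambda v: W @ v                     # plain linear layer
nonlin = lambda v: np.maximum(W @ v, 0.0)    # linear layer + ReLU

print(np.allclose(linear(a + b), linear(a) + linear(b)))  # True: Wx is additive
print(np.allclose(nonlin(a + b), nonlin(a) + nonlin(b)))  # False: ReLU breaks additivity
```

(Convolution, like \(Wx\), passes this additivity test; it is the activation after it that makes the layer non-linear.)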

Bibliography

[1]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition.” arXiv, Dec. 2015. Accessed: Nov. 30, 2023. [Online]. Available: http://arxiv.org/abs/1512.03385

Backlinks

(example)

in He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian :: Deep Residual Learning for Image Recognition, the presence and usage of the identity shortcut is the central tension of the whole discussion. How deep learning, ReLU, and non-linear transformations work is foundational knowledge.

Author: Linfeng He

Created: 2024-04-03 Wed 20:20