Apr 17, 2017

Deep Residual Network (ResNet)

Main idea:
The central idea of the paper itself is simple and elegant. They take a standard feed-forward ConvNet and add skip connections that bypass (or shortcut) a few convolution layers at a time. Each bypass gives rise to a residual block in which the convolution layers predict a residual that is added to the block's input tensor.
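A minimal sketch of such a block (assuming PyTorch; the class name, channel handling, and layer sizes here are illustrative, not taken from the authors' code):

    # Residual block sketch: H(x) = F(x) + x with an identity shortcut
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            # F(x): two 3x3 conv layers with batch norm ("simple design" noted below)
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            f = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # residual F(x)
            return self.relu(f + x)  # H(x) = F(x) + x

The identity shortcut adds no parameters; when a block changes the spatial size or channel count, the paper uses zero-padding or a 1x1 projection on the shortcut instead.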

Reference: K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for Image Recognition", CVPR 2016 (arXiv:1512.03385); the observations on shortcut variants below follow the follow-up paper "Identity Mappings in Deep Residual Networks", ECCV 2016 (arXiv:1603.05027).

Deep plain feed-forward conv nets tend to suffer from an optimization difficulty as depth grows (high training and high validation error). The residual network architecture addresses this by adding shortcut connections whose output is summed with the output of the convolution layers.

Observations:
  • add the block's input 'x' (via a shortcut) to the output of its conv layers; the conv layers learn the residual F(x)
    • H(x) = F(x) + x
    •      = F(x) + I·x  // I is the identity, hence the term identity mapping for the shortcut
  • If x alone is already sufficient, F(.) will learn to drive its filter weights toward zero (so H(x) ≈ x); otherwise it learns weights that produce the needed residual correction
  • Simply stacking a series of plain conv layers gives large training error: a 56-layer plain net has higher training and test error than a 20-layer plain net, i.e. "overly deep" plain nets have higher training error (an optimization problem, not overfitting, since training error itself goes up)
  • Very simple design (series of fixed 3x3 conv layers)
  • If the shortcut mapping is the identity, the forward pass propagates additively and the loss flows back additively as gradient (as opposed to the purely multiplicative gradient propagation of plain stacked layers); see the sketch after this list
  • What if the shortcut mapping h ≠ identity?
    • e.g., a 1x1 conv, a gating function, or constant scaling by 0.5 on the shortcut all increase the error
  • Keep the shortest path (the chain of shortcuts) as clean as possible by
    • using the identity mapping on it
    • letting forward/backward signals flow directly through this path
  • More on ResNet details
  • Update (09/25/21): another good ResNet tutorial 
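To make the last few points concrete, here is a toy autograd check (PyTorch assumed; the scalar "blocks" and the tanh residual branch are made up purely for illustration, not the paper's setup):

    # Toy check of gradient flow through a stack of shortcut blocks
    import torch

    def stack_of_blocks(x, scale, n_blocks=20):
        # Each toy block computes H(x) = scale*x + F(x), with a small residual branch F
        for _ in range(n_blocks):
            x = scale * x + 0.01 * torch.tanh(x)
        return x

    x = torch.tensor(1.0, requires_grad=True)
    stack_of_blocks(x, scale=1.0).backward()        # identity shortcut, h(x) = x
    print("identity shortcut   d(out)/d(in):", x.grad.item())  # stays close to 1

    x = torch.tensor(1.0, requires_grad=True)
    stack_of_blocks(x, scale=0.5).backward()        # scaled shortcut, h(x) = 0.5*x
    print("0.5-scaled shortcut d(out)/d(in):", x.grad.item())  # on the order of 0.5**20

With h(x) = x each block contributes a factor of (1 + dF/dx), so the gradient keeps a direct additive path back to the input; with h(x) = 0.5x the direct term shrinks by 0.5 per block and vanishes over many blocks, which is why non-identity shortcuts increase the error.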
