I am listing out the basic loss functions commonly encountered in computer vision research. Most of the descriptions are quoted from the
PyTorch documentation.
-
- Binary Cross-Entropy (BCE):
- Binary Cross-Entropy with Logits:
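A minimal sketch of the difference between the two PyTorch variants: `BCELoss` expects probabilities, while `BCEWithLogitsLoss` takes raw logits and fuses the sigmoid into the loss for numerical stability. The tensor values below are illustrative.

```python
import torch
import torch.nn as nn

target = torch.tensor([1., 0., 1., 1.])      # ground-truth binary labels
logits = torch.tensor([2.0, -1.0, 0.5, 3.0])  # raw model outputs (logits)

# BCELoss expects probabilities, so apply a sigmoid first.
loss_bce = nn.BCELoss()(torch.sigmoid(logits), target)

# BCEWithLogitsLoss applies the sigmoid internally (more numerically stable).
loss_bce_logits = nn.BCEWithLogitsLoss()(logits, target)

# The two values agree up to floating-point error.
```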
- Regression:
- Mean Squared Error (MSE):
- Average MSE:
- y^{gt} is a binary 0/1 vector of the ground truth; y^{pred} is a binary 0/1 vector of the prediction.
- This normalized MSE computes the overall difference in content between the ground truth and the prediction
-
- Instance MSE:
- y^{gt} is a binary 0/1 vector of the ground truth; y^{pred} is a binary 0/1 vector of the prediction. This normalized MSE emphasizes the foreground prediction, i.e., the 1s in the ground truth vector:
-
- A joint loss (Average MSE + Instance MSE) is better to compute than either term alone
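A sketch of the joint idea above, with hypothetical helper names (`average_mse`, `instance_mse`): the average term measures overall agreement, while the instance term restricts the MSE to foreground positions (1s in the ground truth) so foreground errors are not diluted by a large background. The prediction is treated here as a soft mask.

```python
import torch

def average_mse(pred, gt):
    # Plain MSE over every element of the mask.
    return ((pred - gt) ** 2).mean()

def instance_mse(pred, gt):
    # MSE restricted to foreground positions (1s in the ground truth).
    fg = gt > 0.5
    if fg.sum() == 0:
        return torch.tensor(0.0)
    return ((pred[fg] - gt[fg]) ** 2).mean()

gt = torch.tensor([0., 0., 1., 1., 0., 0., 0., 0.])
pred = torch.tensor([0., 0., 0.5, 1., 0., 0., 0., 0.])

joint = average_mse(pred, gt) + instance_mse(pred, gt)
```

Note how the same 0.5 error on a foreground element contributes 0.25/8 to the average term but 0.25/2 to the instance term.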
- L1 Loss:
- Smooth L1 Loss:
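A quick comparison of the two in PyTorch: Smooth L1 is quadratic for errors below `beta` (default 1.0) and linear above it, so it is less sensitive to outliers than plain MSE while staying smooth near zero. Values are illustrative.

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.2, 2.0, -1.5])
target = torch.zeros(3)

l1 = nn.L1Loss()(pred, target)               # mean of |x - y|
smooth_l1 = nn.SmoothL1Loss()(pred, target)  # 0.5*d^2 if |d| < 1, else |d| - 0.5

# Per-element Smooth L1 terms: 0.5*0.2^2 = 0.02, 2.0-0.5 = 1.5, 1.5-0.5 = 1.0
```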
- Cosine Embedding Loss:
- Contrastive Loss:
- Triplet Loss:
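A minimal sketch of the triplet loss with synthetic embeddings: the anchor and positive are constructed to be close, the negative is unrelated, and the loss pushes the anchor-negative distance to exceed the anchor-positive distance by at least the margin. The embedding dimensions and noise scale are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic unit-norm embeddings: positive is a small perturbation
# of the anchor; negative is an independent random embedding.
anchor = F.normalize(torch.randn(4, 16), dim=1)
positive = F.normalize(anchor + 0.05 * torch.randn(4, 16), dim=1)
negative = F.normalize(torch.randn(4, 16), dim=1)

# Hinge on d(a, p) - d(a, n) + margin, averaged over the batch.
loss = nn.TripletMarginLoss(margin=1.0)(anchor, positive, negative)
```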
-
- Generalized, adaptive robust loss [Barron, A General and Adaptive Robust Loss Function, CVPR'19]
- 3D Reconstruction Losses:
- Voxel-based representation.
- E.g., discretize the space into a 32x32x32 voxel 3D grid
- The output layer applies a sigmoid to each voxel, treating occupancy prediction as a per-voxel classification task.
- Binary cross-entropy for each voxel, comparing the reconstructed voxel's label against the voxel's ground-truth label
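A sketch of the per-voxel BCE under the setup above, with hypothetical tensors standing in for a decoder's output and the ground-truth occupancy grid:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical decoder output: raw logits over a 32x32x32 occupancy grid.
logits = torch.randn(1, 32, 32, 32)
# Ground-truth occupancy: 1 where a voxel is filled, 0 otherwise (random here).
gt = (torch.rand(1, 32, 32, 32) > 0.5).float()

# Binary cross-entropy per voxel, averaged over the whole grid.
loss = nn.BCEWithLogitsLoss()(logits, gt)
```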
- Point-cloud based representation.
- Predict a fixed number of 3D points as a regression task, e.g., a 1024x3 fixed 3D point regression task.
- The loss is defined as the Chamfer distance, which measures point-to-point similarity between two point clouds.
- Reference: Fan et al. [A Point Set Generation Network for 3D Object Reconstruction from a Single Image, CVPR'17]
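The Chamfer distance can be sketched in a few lines of PyTorch (a naive O(N*M) version; dedicated CUDA kernels are used in practice): for each point in one cloud, take the squared distance to its nearest neighbour in the other cloud, then sum the two directions.

```python
import torch

def chamfer_distance(p1, p2):
    # p1: (N, 3), p2: (M, 3) point clouds.
    d = torch.cdist(p1, p2) ** 2  # (N, M) pairwise squared distances
    # Nearest-neighbour term in each direction, averaged over points.
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

torch.manual_seed(0)
a = torch.rand(1024, 3)          # e.g., a 1024x3 predicted point set
loss_self = chamfer_distance(a, a)        # identical clouds -> ~0
loss_shift = chamfer_distance(a, a + 1.0) # shifted cloud -> positive
```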
- Focal Loss:
- Check how to use it for semantic segmentation
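For segmentation, a common starting point is the binary focal loss of Lin et al. (Focal Loss for Dense Object Detection) applied per pixel: the (1 - p_t)^gamma factor down-weights easy pixels so training focuses on hard, misclassified ones, which helps with foreground/background imbalance. A sketch with hypothetical tensor shapes:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Per-pixel BCE, then modulate each term by alpha_t * (1 - p_t)^gamma.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)        # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

torch.manual_seed(0)
logits = torch.randn(2, 64, 64)                    # per-pixel mask logits
targets = (torch.rand(2, 64, 64) > 0.9).float()    # sparse foreground
loss = focal_loss(logits, targets)
```

With gamma = 0 and alpha = 0.5 this reduces to half the plain BCE, which is a handy sanity check.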