PTQ Accuracy Tuning Method - Set bias_correction

Foreword:

Because of model structure and parameter characteristics, such as cross-layer connections between branches with different value distributions, attention mechanisms, and extreme maximum/minimum weight values, post-training quantization accuracy may fall short of expectations. Bias correction is provided as a quantization trick that can reduce quantization error in some of these scenarios. Specifically, quantizing activations and weights introduces noise into the model output. Unlike white Gaussian noise, quantization noise in real scenarios is often not zero-mean, so the mean of the quantized model's output deviates from that of the floating-point model, which degrades quantization accuracy. Bias correction computes the statistical deviation of the quantized model on the calibration data and fine-tunes the bias term of each Conv/Gemm node accordingly, thereby reducing the mean deviation between the quantized model's output and the floating-point model's output.
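The idea can be illustrated with a minimal NumPy sketch: simulate a quantized linear (Gemm-like) layer, measure the per-channel mean deviation of its output from the floating-point output over calibration data, and fold that deviation into the bias. All names and the toy quantizer here are illustrative, not the toolchain's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_quantize(x, n_bits=8):
    """Symmetric per-tensor quantization followed by dequantization (toy)."""
    scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1)
    return np.round(x / scale) * scale

# Toy "Gemm" layer: y = x @ W + b
W = rng.normal(size=(16, 8))
b = np.zeros(8)
calib = rng.normal(size=(256, 16))   # stand-in calibration activations

float_out = calib @ W + b
quant_out = fake_quantize(calib) @ fake_quantize(W) + b

# Quantization noise is not zero-mean: measure the per-channel mean deviation
mean_dev = (quant_out - float_out).mean(axis=0)

# Fold the deviation into the bias so the corrected quantized output
# matches the floating-point output in expectation
b_corrected = b - mean_dev
corrected_out = fake_quantize(calib) @ fake_quantize(W) + b_corrected

print("mean deviation before correction:", np.abs(mean_dev).max())
print("mean deviation after correction: ",
      np.abs((corrected_out - float_out).mean(axis=0)).max())
```

On the calibration set itself, the corrected mean deviation drops to floating-point rounding noise; the practical question is how well the correction generalizes to unseen inputs.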

Note that because bias correction only fine-tunes model parameters, it does not change model performance (such as latency or FPS).

1. Enable the bias correction function

This article explains how to enable bias correction in OpenExplorer toolchain development kits of version V1.1.62 and later.

Bias correction is enabled by setting the optimization parameter in the calibration_parameters group of the YAML file, as shown below:

calibration_parameters:
    optimization: bias_correction
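For context, this option sits alongside the other calibration settings. A hedged sketch of a fuller calibration_parameters block follows; the keys other than optimization are illustrative of typical toolchain YAML and should be verified against your toolchain version's documentation:

```yaml
calibration_parameters:
    cal_data_dir: ./calibration_data   # directory of calibration samples (illustrative)
    calibration_type: default          # calibration mode (illustrative)
    optimization: bias_correction      # run bias correction after calibration
```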

After the optimization parameter is configured, bias correction runs once model calibration completes, and log messages are printed during execution. One calibration sample is then selected to analyze the effectiveness of bias correction: first, the model output similarity before and after correction is computed; then two plots are saved to the bias_correction folder, node_accumulate_err_of_qmodel_cosine-similarity.png and conv_accumulate_err_of_qmodel_mean-diff-error.png.

  • node_accumulate_err_of_qmodel_cosine-similarity.png: the two curves show the cumulative error of the quantized model before and after bias correction. The horizontal axis is the node index in the model, ordered from input to output; the vertical axis is the cosine similarity between each node's output in the quantized model and the corresponding node's output in the floating-point model. When the post-correction curve lies above the pre-correction curve, bias correction has reduced the model's accumulated error.
  • conv_accumulate_err_of_qmodel_mean-diff-error.png: the two curves show the Conv output error before and after bias correction. The horizontal axis is the Conv node index, ordered from input to output; the vertical axis is the mean error, i.e. the average difference between each Conv node's output in the quantized model and the corresponding node's output in the floating-point model. When the post-correction curve is closer to 0 than the pre-correction curve, bias correction has reduced the Conv output error.
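The two metrics behind these plots are straightforward to reproduce conceptually: for each node, compare the quantized model's output with the floating-point model's output on the same sample. A minimal NumPy sketch with randomly generated stand-in outputs (names and data are illustrative, not the toolchain's API):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened node outputs."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mean_diff_error(quant_out, float_out):
    """Average difference between quantized and float node outputs."""
    return float((quant_out - float_out).mean())

# Stand-in per-node outputs of a float model and its quantized counterpart
# on one calibration sample, indexed in topological order (input to output)
rng = np.random.default_rng(0)
float_outputs = [rng.normal(size=(8, 8)) for _ in range(5)]
quant_outputs = [o + rng.normal(scale=0.01, size=o.shape) for o in float_outputs]

# One point per node: these lists correspond to the two plotted curves
cos_curve = [cosine_similarity(q, f) for q, f in zip(quant_outputs, float_outputs)]
mean_curve = [mean_diff_error(q, f) for q, f in zip(quant_outputs, float_outputs)]
```

Plotting cos_curve before and after correction gives the first figure; plotting mean_curve restricted to Conv nodes gives the second.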

2. Accuracy improvement verification

Taking mnasnet_1.0_96 on the Bayes architecture with horizon-nn==0.20.1 as an example, using the default calibration mode, the accuracy before and after enabling bias correction compares as follows:

| Quantization method | Quantization precision loss |
| --- | --- |
| Bias correction disabled | 1.25% |
| Bias correction enabled | 0.24% |

However, in actual development the accuracy gain from bias correction is somewhat unpredictable and can even be negative in some cases, so it should be treated only as an optional tuning technique for models whose quantized accuracy has dropped.

3. Summary

This article has shown how to enable bias correction in the post-training quantization stage and how to read its output. In practice, the output of bias correction can serve as a reference for whether the correction is effective, but the final judgment should be based on the model's accuracy evaluation results.