Directory
- 1. Quick overview of necessary steps
- 2. Explain each step in detail
  - 2.1 Floating-point model preparation
  - 2.2 Data calibration
  - 2.3 Quantization training
  - 2.4 Fixed-point conversion
  - 2.5 Model compilation
- 3. Common Problems
Anyone choosing the QAT solution presumably already has some basic grasp of "quantization". Generally speaking, it is most recommended to try PTQ (post-training quantization) first: compared with QAT, it is easier to get started with, costs less, and is non-intrusive to the floating-point model code. The accuracy of the Horizon PTQ solution is at an industry-leading level, and for most common vision tasks the quantization loss can be controlled within 1%. However, no matter how good an offline quantization algorithm is, it cannot guarantee that the quantized accuracy of every model meets business needs. In addition, PTQ uses only a small amount of calibration data; even if the accuracy metrics on the validation set look similar, the generalization of the model may still suffer from the information loss that quantization inevitably introduces. Quantization-aware training (QAT) treats the quantization error as noise during training, so the model continuously learns to adapt to this noise, and the resulting parameters end up more robust to quantization.

Enough theory; how do we quickly cover the "last mile" from a floating-point model to a QAT model? Since PyTorch 1.8, torch.fx ([official documentation](https://pytorch.org/docs/stable/fx.html#)) can automatically trace a model's forward process, which greatly reduces the difficulty of using QAT. The corresponding quantization scheme, FX Graph Mode Quantization, is far more automated than Eager Mode Quantization and correspondingly less laborious to operate. However, the user needs to adjust the model so that it is "symbolically traceable". Let's look at the specific steps below:
1. Quick overview of necessary steps
The whole QAT solution consists of five steps from floating-point model to deployment model: floating-point model preparation, data calibration, quantization training (optional), fixed-point conversion, and model compilation. The necessary steps and sample code are shown below; for detailed instructions and precautions on each step, refer to the following sections. For a complete example, refer to the fx_mode.py script in the /ddk/samples/ai_toolchain/horizon_model_train_sample/plugin_basic directory of the OE development package. **It is highly recommended to skip the training process and run through the prepare -> convert -> check steps before quantization training (even during the floating-point model design phase) to make sure the model is supported by the hardware.**
```python
import copy

import torch

from horizon_plugin_pytorch.quantization import (
    convert_fx,
    prepare_qat_fx,
    set_fake_quantize,
    FakeQuantState,
    check_model,
    compile_model,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_calib_8bit_fake_quant_qconfig,
    default_qat_8bit_fake_quant_qconfig,
    default_calib_8bit_weight_32bit_out_fake_quant_qconfig,
    default_qat_8bit_weight_32bit_out_fake_quant_qconfig,
)
from horizon_plugin_pytorch.march import March, set_march

# 1. Floating-point model preparation
set_march(March.BAYES)
float_model = load_float_model(pretrain=True)

# 2. Data calibration
calib_model = prepare_qat_fx(
    copy.deepcopy(float_model),
    {
        "": default_calib_8bit_fake_quant_qconfig,
        "module_name": {
            "classifier": default_calib_8bit_weight_32bit_out_fake_quant_qconfig,
        },
    },
)
calib_model.eval()
set_fake_quantize(calib_model, FakeQuantState.CALIBRATION)
calibrate(calib_model)
calib_model.eval()
set_fake_quantize(calib_model, FakeQuantState.VALIDATION)
evaluate(calib_model)
torch.save(calib_model.state_dict(), "calib-checkpoint.ckpt")

# 3. Quantization training (optional)
qat_model = prepare_qat_fx(
    copy.deepcopy(float_model),
    {
        "": default_qat_8bit_fake_quant_qconfig,
        "module_name": {
            "classifier": default_qat_8bit_weight_32bit_out_fake_quant_qconfig,
        },
    },
)
qat_model.load_state_dict(calib_model.state_dict())
qat_model.train()
set_fake_quantize(qat_model, FakeQuantState.QAT)
train(qat_model)
qat_model.eval()
set_fake_quantize(qat_model, FakeQuantState.VALIDATION)
evaluate(qat_model)

# 4. Fixed-point conversion
base_model = qat_model  # use calib_model if calibration accuracy already meets expectations
quantized_model = convert_fx(base_model)
evaluate(quantized_model)

# 5. Model compilation
script_model = torch.jit.trace(quantized_model.cpu(), example_input)
check_model(script_model, [example_input])
compile_model(
    script_model,
    [example_input],
    hbm="model.hbm",
    input_source="pyramid",
    opt=3,
)
```
2. Explain each step in detail
2.1 Floating-point model preparation
a. Before quantization training, train the floating-point model normally with sufficient data until it converges.
b. It is strongly recommended to normalize the input data; this helps floating-point convergence and makes the model more quantization-friendly.
c. Check the supported-operators list during the design phase of the floating-point model, to avoid prepare_qat or compilation errors caused by unsupported operators.
d. If CPU operators are used in the model and you want to compile them into the model, refer to user manual section 4.2.4.4 "Heterogeneous Model Guide" for conversion and compilation.
e. For more instructions on building a quantization-friendly model, refer to user manual section 4.2.4.1 "Requirements for Floating-point Models".
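As a rough illustration of point b, the sketch below shows channel-wise input normalization, `(x - mean) / std`, in plain Python. The statistics here are the commonly used ImageNet values and are only an example; use the statistics of your own dataset in practice.

```python
def normalize(pixels, mean, std):
    """Channel-wise normalization: (x - mean) / std per channel.

    `pixels` holds the per-channel values of one pixel; `mean` and `std`
    are per-channel dataset statistics (illustrative values below).
    """
    return [(x - m) / s for x, m, s in zip(pixels, mean, std)]

# Example ImageNet-style statistics for a 3-channel (RGB) input
mean = [123.675, 116.28, 103.53]
std = [58.395, 57.12, 57.375]

print(normalize(mean, mean, std))  # -> [0.0, 0.0, 0.0]
```

Normalized inputs keep activation ranges compact and symmetric, which is what makes them easier to represent on a low-bit quantization grid.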
Although fx mode is less intrusive to the original floating-point model code than eager mode, some necessary modifications to the floating-point model are still required to support the subsequent quantization operations.
- Insert a QuantStub node before the model input and a DequantStub node after the model output. Note the following:
  - Multiple inputs can share one QuantStub only if they have the same scale; otherwise, define a separate QuantStub for each input.
  - It is recommended to use horizon_plugin_pytorch.quantization.QuantStub, which collects the input scale dynamically by default. If the scale can be computed ahead of time (for example, the homography matrix in BEV models), it is recommended to set the scale manually; note that the corresponding PyTorch interface torch.quantization.QuantStub does not support setting the scale manually.
- It is suggested that the parts of the model that do not need quantization, such as pre/post-processing and loss, not be written in the model's forward function, to avoid fake-quantization nodes being inserted there by mistake, which would affect model accuracy.
- Dynamic control flow and some Python built-in functions are operations that tracing does not support (see the [official description](https://pytorch.org/docs/2.0/fx.html#non-torch-functions)). Such code needs to be factored into a separate method and decorated with wrap; the recommended usage is as follows:

```python
from horizon_plugin_pytorch.utils.fx_helper import wrap as fx_wrap

@fx_wrap()  # wrapped methods are kept as leaf calls during symbolic tracing
def test(self, x):
    if self.training:  # dynamic control flow cannot be symbolically traced
        pass

def forward(self, x):
    ...
    x = self.test(x)
    return x
```
2.2 Data calibration
For some models, the required accuracy can be reached by calibration alone, without time-consuming quantization-aware training. Even when a model cannot meet the accuracy requirement after calibration, this process reduces the difficulty of the subsequent quantization-aware training, shortens the training time, and improves the final training accuracy. For the specific configuration and tuning suggestions of data calibration, refer to the [QAT solution calibration instructions](https://developer.horizon.cc/forumDetail/177840589839214596).
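To build intuition for what calibration does, here is a minimal pure-Python sketch (illustrative only, not the plugin's observer implementation) of how an observer can track an activation range with a moving average controlled by an `averaging_constant` c: `new_stat = (1 - c) * old_stat + c * batch_stat`. With c = 1.0 each batch fully replaces the statistic; with c = 0.0 the statistic is frozen.

```python
def update_range(old_min, old_max, batch, averaging_constant):
    """Moving-average min/max observer update (illustrative sketch)."""
    b_min, b_max = min(batch), max(batch)
    c = averaging_constant
    new_min = (1 - c) * old_min + c * b_min
    new_max = (1 - c) * old_max + c * b_max
    return new_min, new_max

# Start from the range of a first batch, then observe a second batch
print(update_range(-1.0, 1.0, [-2.0, 0.5, 4.0], averaging_constant=1.0))
# -> (-2.0, 4.0): the new batch fully replaces the statistic
print(update_range(-1.0, 1.0, [-2.0, 0.5, 4.0], averaging_constant=0.0))
# -> (-1.0, 1.0): the statistic is frozen
```

This is also the knob referenced by the `averaging_constant` row of the hyperparameter table in section 2.3.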
2.3 Quantization training
Some recommended hyperparameter configurations for quantization training are shown in the following table:
| Hyperparameter | Recommended configuration | Advanced configuration (try if the recommended one does not work) |
| --- | --- | --- |
| LR | StepLR starting from 0.001, with 2 lr decays of scale = 0.1 | 1. Adjust lr between 0.0001 and 0.001, matched with 1-2 lr decays. 2. The lr schedule can also be switched from StepLR to CosLR. 3. If QAT uses AMP, lower the lr appropriately; too large an lr leads to NaN. |
| Epoch | 10% of the floating-point epochs | 1. Depending on the convergence of the loss and metrics, consider whether the epochs need to be extended appropriately. |
| Weight decay | Consistent with floating point | 1. It is recommended to make appropriate adjustments around 4e-5. Too small a weight decay leads to excessively large weight variance, which is especially pronounced at the output layer of tasks with large outputs. |
| optimizer | Consistent with floating point | 1. If floating-point training uses an optimizer that interacts with the lr schedule, such as OneCycle, it is recommended not to stay consistent with floating point and to use SGD instead. |
| transforms (data augmentation) | Consistent with floating point | 1. The QAT stage can weaken augmentation appropriately, e.g. remove the color transforms for classification and reduce the scale range of RandomResizeCrop. |
| averaging_constant (qconfig_params) | 1. When calibration is used, it is recommended to disable activation updates: weight averaging_constant = 1.0, activation averaging_constant = 0.0 | 1. If the calibration result differs greatly from floating point, do not set the activation averaging_constant to 0.0. 2. The weight averaging_constant generally does not need to be set to 0.0 and can be adjusted within (0, 1.0). |
It is highly recommended to try data calibration first, and move on to quantization training only if the accuracy does not meet expectations (be careful to load the weight parameters obtained from data calibration). Tuning suggestions for the quantization training phase can be found in the user manual section [Quantization training accuracy tuning recommendations](https://developer.horizon.cc/api/v1/fileData/horizon_j5_open_explorer_cn_doc/plugin/source/user_guide/debug_precision.html#a-name-recommended-configuration-a).
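The recommended lr schedule from the table above can be sketched in plain Python as follows. The milestone epochs (4 and 8 in a hypothetical 10-epoch QAT run) are illustrative, not prescribed; only the starting lr of 0.001 and the two decays with scale 0.1 come from the recommendation.

```python
def step_lr(epoch, base_lr=0.001, gamma=0.1, milestones=(4, 8)):
    """StepLR-style schedule: multiply lr by `gamma` at each milestone.

    base_lr=0.001 and gamma=0.1 with two decays match the table's
    recommendation; the milestone epochs are hypothetical.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

print([round(step_lr(e), 5) for e in range(10)])
# -> [0.001, 0.001, 0.001, 0.001, 0.0001, 0.0001, 0.0001, 0.0001, 1e-05, 1e-05]
```

In real training this would typically be expressed with `torch.optim.lr_scheduler.StepLR` (or `MultiStepLR`) instead of a hand-rolled function.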
2.4 Fixed-point conversion
**Please note that the fixed-point model is not numerically identical to the pseudo-quantized model, so always report the accuracy of the fixed-point model. If the fixed-point accuracy does not meet the standard, continue quantization training; it is recommended to keep the qat model weights of several epochs, to make it easier to find the checkpoint with the best fixed-point accuracy. (High qat or calibration accuracy does not necessarily mean high fixed-point accuracy, so consider backtracking a few checkpoints to balance the final fixed-point accuracy.)**

Under normal circumstances, the accuracy of the fixed-point model is exactly the same as that of the model deployed on the board, so the fixed-point model can be used to evaluate the final deployment accuracy.
2.5 Model compilation
The model compilation phase consists of the following three steps:
```python
# 1. Trace the fixed-point model into TorchScript
script_model = torch.jit.trace(quantized_model, example_input)
# 2. Check that every operator in the model is supported by the hardware
check_model(script_model.cpu(), [example_input])
# 3. Compile to an hbm file deployable on the board
compile_model(
    script_model,
    [example_input],
    hbm="model.hbm",
    input_source="pyramid",
    opt=3,
)
```
3. Common Problems
**1. Why use deepcopy before prepare?**
A: The prepare_qat_fx and convert_fx interfaces do not support an inplace parameter, and the input and output models of these two interfaces share almost all attributes. It is therefore recommended to use deepcopy to make sure the original input model is not changed. If you do not need to keep the input model and skip the deepcopy, do not make any further changes to the input model.

**2. Why set high-precision output?**
A: As introduced in the background of neural network quantization, the activation values produced by the multiply-accumulate units are int32 and are requantized to int8/int16 so that the next layer's op can continue the computation. Therefore, if the last layer is a conv/linear node, it is recommended to configure high-precision output so the model outputs int32 directly, which is of great benefit to accuracy.
In addition, high-precision output can also be configured by setting `model.classifier.qconfig = default_qat_8bit_weight_32bit_out_fake_quant_qconfig` before prepare_qat_fx; this way of setting the module's qconfig attribute has a higher priority than the qconfig dict passed to prepare.
For plugin versions ≤ v1.6.2, use default_calib_out_8bit_fake_quant_qconfig to configure high-precision output; however, this qconfig will be deprecated in later versions.
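To see why skipping the final requantization helps accuracy, here is a rough pure-Python sketch (illustrative arithmetic, not the toolchain's implementation; all numbers are hypothetical) of requantizing an int32 accumulator value onto an int8 grid and the rounding loss this introduces:

```python
def requantize_to_int8(acc_int32, acc_scale, out_scale):
    """Rescale an int32 accumulator value to an int8 code (sketch).

    The real value represented by the accumulator is acc_int32 * acc_scale.
    Requantization re-expresses it on the coarser int8 grid `out_scale`,
    clamping to [-128, 127]; the rounding step is where precision is lost,
    which is why outputting int32 directly benefits the last conv/linear.
    """
    q = round(acc_int32 * acc_scale / out_scale)
    return max(-128, min(127, q))

acc = 12345                        # hypothetical int32 accumulator value
acc_scale, out_scale = 1e-4, 0.05  # hypothetical scales
q8 = requantize_to_int8(acc, acc_scale, out_scale)
print(q8, q8 * out_scale)  # -> 25 1.25, versus the true value 1.2345
```

With high-precision output configured, the last layer emits `acc_int32 * acc_scale` directly and the final rounding error above never occurs.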
**3. How to understand the states of fake quantize?**
Fake quantize has three states. Use set_fake_quantize to switch the model's fake-quantize nodes to the corresponding state before QAT, calibration, and validation respectively. In the CALIBRATION state, only the statistics of each operator's input and output are observed. In the QAT state, the pseudo-quantization operation is performed in addition to observing the statistics. In the VALIDATION state, statistics are no longer observed and only the pseudo-quantization operation is performed.
```python
class FakeQuantState(Enum):
    QAT = "qat"
    CALIBRATION = "calibration"
    VALIDATION = "validation"
```
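The pseudo-quantization operation applied in the QAT and VALIDATION states can be sketched in plain Python as quantize-then-dequantize on the int8 grid (an illustrative sketch, not the plugin's implementation): the value stays floating point but carries the rounding and clamping error of int8 quantization, which during QAT acts as the training noise the model learns to tolerate.

```python
def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Quantize-then-dequantize on the int8 grid (illustrative sketch).

    The result is still a float, but only values on the grid
    {qmin*scale, ..., qmax*scale} are representable.
    """
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale

scale = 0.1
print(fake_quantize(0.234, scale))  # -> 0.2 (rounded onto the grid)
print(fake_quantize(99.0, scale))   # clamped to the int8 max code, 127 * scale
```

In the CALIBRATION state this operation is skipped and only the observers run, which is why calibration leaves the forward numerics identical to floating point.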