1 Introduction
Before describing how the plugin is used, let's define the terms heterogeneous and non-heterogeneous to avoid ambiguity: a heterogeneous model runs partly on the BPU and partly on the CPU when deployed, while a non-heterogeneous model runs entirely on the BPU.
Horizon's PyTorch-based quantization training tool, horizon_plugin_pytorch (to be released to XJ3 users with the OE development kit in early 2023), supports both Eager and fx modes. fx mode has been supported since version plugin-1.0.0; compared with the Eager solution, the two modes differ as follows:
The user manual recommends the eager mode; see section 4.2 Quantization-Aware Training (complete quick-start examples are in 4.2.2 Quick Start, and 4.2.3 User Guide gives details). The following description of the heterogeneous and non-heterogeneous solutions is based on fx mode; for the fx-mode API, refer to section 4.2.3.4.2 Description of Major Interface Parameters in the user manual.
2 Use of heterogeneous and non-heterogeneous solutions
The advantages and disadvantages of heterogeneous and non-heterogeneous solutions are shown in the following figure:
In general, we only use heterogeneous scenarios in the following two situations:
The model contains operators that are not supported by BPU.
The quantization accuracy loss of the model is too large, and some operators need to run on the CPU for high-precision computation.
2.1 Non-Heterogeneous Scheme
Since the performance of BPU operators is much higher than that of CPU operators, from the perspective of performance optimization, it is recommended that you use pure BPU operators to build models as much as possible.
The steps of using non-heterogeneous solutions are roughly as follows:
1. Floating-point model preparation: complete operator replacement (refer to the operator support list) and insert quantization and dequantization nodes
2. Set the hardware architecture
3. Calibration (optional)
4. Model quantization
a. Set qconfig (it is recommended to set the global qconfig to get_default_qat_qconfig() first and then adjust as needed; generally, you only need to set a separate qconfig for int16 ops and high-precision output ops)
b. Convert to the QAT model
5. Quantization-aware training & accuracy verification
6. Convert to the quantized (fixed-point) model & accuracy verification
7. Model compilation
Reference code:
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub

from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.quantization import (
    get_default_calib_qconfig,
    get_default_qat_qconfig,
    get_default_qat_out_qconfig,
    prepare_qat_fx,
    convert_fx,
    check_model,
    compile_model,
)
from horizon_plugin_pytorch.utils.onnx_helper import export_to_onnx


class ConvBNReLU(nn.Sequential):
    def __init__(self, channels=3):
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )


class ConvBNAddReLU(nn.Module):
    def __init__(self, channels=3):
        super(ConvBNAddReLU, self).__init__()
        self.conv = nn.Conv2d(channels, channels, 1)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()
        # Use FloatFunctional instead of a bare "+" so the add can be quantized
        self.add = torch.nn.quantized.FloatFunctional()

    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        return self.relu(self.add.add(x, out))


class ConvAddReLU(nn.Module):
    def __init__(self, channels=3):
        super(ConvAddReLU, self).__init__()
        self.conv = nn.Conv2d(channels, channels, 1)
        self.relu = nn.ReLU()
        self.add = torch.nn.quantized.FloatFunctional()

    def forward(self, x):
        out = self.conv(x)
        return self.relu(self.add.add(x, out))


class NonHybridModel(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        # Quantize the input at the model entry ...
        self.quant = QuantStub()
        self.layer0 = ConvBNReLU()
        self.layer1 = ConvAddReLU()
        self.layer2 = ConvBNAddReLU()
        self.conv1 = nn.Conv2d(channels, channels, 1)
        # ... and dequantize the output at the model exit
        self.dequant = DeQuantStub()

    def forward(self, input):
        x = self.quant(input)
        x = self.layer0(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.conv1(x)
        return self.dequant(x)


data_shape = [1, 3, 224, 224]
data = torch.rand(size=data_shape)
model = NonHybridModel()
float_res = model(data)

# Set the hardware architecture
set_march(March.BAYES)

# Calibration (optional)
calibration_model = prepare_qat_fx(
    model,
    {
        "": get_default_calib_qconfig(),
    },
)
calibration_model.eval()
for i in range(5):
    calibration_model(torch.rand(size=data_shape))

# Convert to the QAT model; set a high-precision output qconfig for the tail conv
qat_model = prepare_qat_fx(
    calibration_model,
    {
        "": get_default_qat_qconfig(),
        "module_name": [("conv1", get_default_qat_out_qconfig())],
    },
)
qat_res = qat_model(data)

# Convert to the quantized (fixed-point) model
quantize_model = convert_fx(qat_model)
quantize_res = quantize_model(data)

export_to_onnx(
    qat_model, data, "qat.onnx", enable_onnx_checker=True, operator_export_type=None
)

# Check and compile the fixed-point model
traced_model = torch.jit.trace(quantize_model, data)
check_model(quantize_model, data, advice=1)
compile_model(traced_model, [data], opt=3, hbm="./model_output/model.hbm")
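For the accuracy-verification steps above, the float, QAT, and quantized outputs produced by the script (float_res, qat_res, quantize_res) can be compared with a simple similarity metric. The helper below is an illustrative sketch using plain PyTorch, not part of the horizon_plugin_pytorch API:

```python
import torch


def flat_cosine_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Flatten both tensors and return their cosine similarity as a float."""
    a = a.detach().flatten().float()
    b = b.detach().flatten().float()
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()


# e.g. flat_cosine_similarity(float_res, qat_res) should stay close to 1.0
# if quantization has not hurt accuracy
```

A value noticeably below 1.0 between qat_res and quantize_res is a hint that some op may need a separate int16 or high-precision qconfig.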
2.2 Heterogeneous Scheme
The steps for using heterogeneous solutions are as follows:
1. Floating-point model preparation:
a. Complete operator replacement (refer to the PTQ scheme operator support list)
① For non-module operations that need a separate qconfig or that should run on the CPU, wrap the operations in a module; for details, see _SeluModule in the example below
b. Insert quantization and dequantization nodes
① If the first op is a CPU op, there is no need to insert a QuantStub; likewise, if the last op is a CPU op, there is no need to insert a DeQuantStub
2. Set the hardware architecture
3. Calibration (optional)
4. Model quantization
a. Set qconfig: it is recommended to set the global qconfig to get_default_qat_qconfig() first and then adjust as needed; generally, you only need to set a separate qconfig for int16 ops and high-precision output ops
b. Convert to the QAT model: set hybrid=True and specify the nodes to run on the CPU via hybrid_dict
5. Quantization-aware training & accuracy verification
6. Export ONNX
7. Fixed-point accuracy evaluation
8. Use the hb_mapper tool to complete fixed-point conversion & model compilation
Reference code:
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub

from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.quantization import (
    get_default_calib_qconfig,
    get_default_qat_qconfig,
    get_default_qat_out_qconfig,
    prepare_qat_fx,
    convert_fx,
)
from horizon_plugin_pytorch.utils.onnx_helper import export_to_onnx


class ConvReLU(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 1)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x


# Wrap the functional selu in a module so it can be given a separate
# qconfig or placed on the CPU via hybrid_dict
class _SeluModule(nn.Module):
    def forward(self, input):
        return torch.nn.functional.selu(input)


class HybridModel(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        # The first op (selu) runs on the CPU, so no QuantStub is needed
        # self.quant = QuantStub()
        self.layer0 = ConvReLU()
        self.layer1 = ConvReLU()
        self.layer2 = ConvReLU()
        self.selu = _SeluModule()
        self.conv0 = nn.Conv2d(channels, channels, 1)
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.dequant = DeQuantStub()

    def forward(self, input):
        x = self.selu(input)
        x = self.layer0(x)
        x = self.selu(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.conv0(x)
        x = self.conv1(x)
        x = self.selu(x)
        return self.dequant(x)


data_shape = [1, 3, 224, 224]
data = torch.rand(size=data_shape)
model = HybridModel()
float_res = model(data)

# Set the hardware architecture
set_march(March.BAYES)

# Calibration (optional); hybrid_dict specifies the nodes that run on the CPU
calibration_model = prepare_qat_fx(
    model,
    {
        "": get_default_calib_qconfig(),
    },
    hybrid=True,
    hybrid_dict={
        "module_name": ["layer1.conv", "conv0"],
        "module_type": [_SeluModule],
    },
)
calibration_model.eval()
for i in range(5):
    calibration_model(torch.rand(size=data_shape))

# Convert to the QAT model; set a high-precision output qconfig for the tail conv
qat_model = prepare_qat_fx(
    calibration_model,
    {
        "": get_default_qat_qconfig(),
        "module_name": [("conv1", get_default_qat_out_qconfig())],
    },
    hybrid=True,
    hybrid_dict={
        "module_name": ["layer1.conv", "conv0"],
        "module_type": [_SeluModule],
    },
)
qat_res = qat_model(data)

# Export ONNX for the hb_mapper toolchain
export_to_onnx(
    qat_model, data, "qat.onnx", enable_onnx_checker=True, operator_export_type=None
)

quantize_model = convert_fx(qat_model)
quantize_res = quantize_model(data)
8. Use the hb_mapper tool to complete fixed-point conversion & model compilation
Simple config.yaml configuration example:
model_parameters:
  onnx_model: "qat.onnx"
  march: "bayes"
  output_model_file_prefix: 'hybrid'
input_parameters:
  input_type_rt: 'featuremap'
  input_type_train: 'featuremap'
  input_layout_train: 'NCHW'
  input_layout_rt: 'NCHW'
  norm_type: 'no_preprocess'
calibration_parameters:
  calibration_type: 'load'
  run_on_cpu: "Conv_32"
compiler_parameters:
  compile_mode: 'latency'
  debug: false
  optimize_level: 'O3'
Model conversion:
hb_mapper makertbin -c config.yaml --model-type onnx
Using the hb_perf tool to generate the model structure diagram, we can see that high-precision output is not set for the conv at the end of the first and second BPU segments, and is set only for the tail node of the third BPU segment:
3 Other Common problems
Q: When printing qat_model, some extra generated_add nodes appear. Why?
A: This happens because the "+" operator is used directly in the model; the tool automatically replaces it with the newly registered generated_add. It is recommended that you do this operator replacement yourself during the floating-point model preparation phase, because the tool names the add nodes according to execution order, so if you modify the model later, the original checkpoint may fail to load.
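As a concrete sketch of the recommended replacement, the hypothetical module below writes a skip connection with the FloatFunctional already used in the reference code, instead of a bare "+" (the class name SkipBlock is illustrative, not from the plugin):

```python
import torch
from torch import nn


class SkipBlock(nn.Module):
    """Skip connection using FloatFunctional instead of a bare "+",
    so the add op has a stable name and can carry its own qconfig."""

    def __init__(self, channels=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 1)
        # replaces: out = x + self.conv(x)
        self.add = torch.nn.quantized.FloatFunctional()

    def forward(self, x):
        return self.add.add(x, self.conv(x))


block = SkipBlock()
out = block(torch.rand(1, 3, 8, 8))
```

In float mode FloatFunctional.add behaves exactly like "+"; the benefit appears during prepare/convert, when the explicitly named module keeps checkpoints loadable after later model edits.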