Application scenarios and principles
Whether an op can run on the BPU depends on two conditions:
1. The op itself must be supported by the BPU.
2. A quantization threshold must be available for the op.
For some non-compute-intensive ops, the quantization threshold is derived from the featuremap Tensors of the upstream and downstream ops. If such an op (concat, reshape, etc.) sits at the head or tail of the model and you want it to run on the BPU for maximum performance, you can insert a unit_conv before/after it. The unit_conv's featuremap Tensor introduces new quantization-threshold statistics, which ensures that the ops upstream and downstream of the unit_conv can find a quantization threshold and therefore be quantized and run on the BPU.
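The key to this trick is that a unit_conv is a functional no-op in float: a 1x1 depthwise conv initialized with `torch.nn.init.dirac_` passes its input through unchanged, so inserting it only adds a threshold-statistics point without altering the model's output. A minimal standalone check (the tensor shapes here are illustrative, not from the source):

```python
import torch

# 1x1 depthwise conv: one group per channel, no bias.
unitconv = torch.nn.Conv2d(8, 8, kernel_size=1, stride=1, groups=8, bias=False)
# dirac_ with groups=8 writes an identity kernel into each group,
# so the conv maps every channel to itself.
torch.nn.init.dirac_(unitconv.weight.data, groups=8)

x = torch.randn(1, 8, 4, 4)
with torch.no_grad():
    y = unitconv(x)
assert torch.allclose(x, y)  # identity in float; only quantization stats change
```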
Due to hardware characteristics, the Horizon toolchain supports int32 high-precision output only for a conv at the tail of the model. Other operators (concat, reshape, etc.) can only output int8, so a conv that feeds into them loses its high-precision output. Using unit_conv to quantize non-compute-intensive ops may therefore affect model accuracy. If you confirm that accuracy drops, we recommend removing operators such as concat from the model and moving them into the pre- and post-processing.
Usage
To insert a unit_conv into the model, refer to the following code:
import torch
import torch.nn as nn

class unit_conv(nn.Module):
    def __init__(self):
        super(unit_conv, self).__init__()
        # ··· (other layers elided) ···
        self.cat = torch.cat
        # 1x1 depthwise conv; dirac_ init makes it an identity mapping
        self.unitconv = torch.nn.Conv2d(8, 8, 1, 1, groups=8, bias=False)
        torch.nn.init.dirac_(self.unitconv.weight.data, groups=8)

    def forward(self, x):
        # ··· (computation producing a and b elided) ···
        out = self.cat((a, b), dim=1)
        out = self.unitconv(out)
        return out