Application scenarios and principles
Whether an op can run on the BPU depends on two conditions:
1. The op itself must be supported by the BPU.
2. A quantization threshold must be available for the op.
For some non-compute-intensive ops, the quantization threshold is derived from the featuremap Tensors of the upstream and downstream ops. If such an op (concat, reshape, etc.) sits at the head or tail of the model and you want it to run on the BPU for maximum performance, you can insert a unit_conv before/after it. The unit_conv's featuremap Tensor introduces new quantization-threshold statistics, which ensures that the ops upstream and downstream of the unit_conv can find a quantization threshold and therefore be quantized and run on the BPU.
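The key to this trick is that a unit_conv is a functional no-op in float: a 1x1 depthwise conv initialized with `torch.nn.init.dirac_` passes its input through unchanged, so inserting it only adds a threshold-statistics point without altering the model's output. A minimal standalone check (the tensor shapes here are illustrative, not from the source):

```python
import torch

# 1x1 depthwise conv: one group per channel, no bias.
unitconv = torch.nn.Conv2d(8, 8, kernel_size=1, stride=1, groups=8, bias=False)
# dirac_ with groups=8 writes an identity kernel into each group,
# so the conv maps every channel to itself.
torch.nn.init.dirac_(unitconv.weight.data, groups=8)

x = torch.randn(1, 8, 4, 4)
with torch.no_grad():
    y = unitconv(x)
assert torch.allclose(x, y)  # identity in float; only quantization stats change
```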
Due to hardware characteristics, the Horizon toolchain supports int32 high-precision output only for a conv at the tail of the model. Other operators (concat, reshape, etc.) can only output int8, so a conv that feeds into them loses its high-precision output. Using unit_conv to quantize non-compute-intensive ops may therefore affect model accuracy. If you confirm that accuracy drops, we recommend removing operators such as concat from the model and moving them into the pre- and post-processing.
Usage
To insert a unit_conv into the model, refer to the following code:
import torch
import torch.nn as nn

class unit_conv(nn.Module):
    def __init__(self):
        super(unit_conv, self).__init__()
        # ··· (other layers elided) ···
        self.cat = torch.cat
        # 1x1 depthwise conv; dirac_ init makes it an identity mapping
        self.unitconv = torch.nn.Conv2d(8, 8, 1, 1, groups=8, bias=False)
        torch.nn.init.dirac_(self.unitconv.weight.data, groups=8)

    def forward(self, x):
        # ··· (computation producing a and b elided) ···
        out = self.cat((a, b), dim=1)
        out = self.unitconv(out)
        return out