1 Usage scenario
HB_ONNXRuntime is Horizon's x86 ONNX model inference library, built on top of the public ONNXRuntime package. In addition to supporting original ONNX models exported directly from training frameworks such as PyTorch, TensorFlow, and PaddlePaddle [Note 1], it also supports the ONNX models [Note 2] produced at each stage of the Horizon PTQ toolchain conversion process (the .bin model is not supported). Examples of both are given below.
2 Instructions for use
2.1 Original ONNX model inference
Evaluating the accuracy of the original ONNX model exported from each training framework helps us first confirm two things about the ONNX model itself:

- Whether inference runs normally; otherwise the PTQ toolchain will also report errors during subsequent conversion
- Whether the accuracy is correct, i.e. whether the inference results are consistent with (or very close to) those of the original training-framework model, so as to rule out problems in the ONNX export itself

Compared with the public ONNXRuntime (see the reference code in the appendix), we recommend using Horizon's encapsulated `HB_ONNXRuntime` directly for inference: its results and behavior are completely consistent with the public version, and the code can be reused later when testing the PTQ models at each stage. The reference code is as follows:
from horizon_tc_ui import HB_ONNXRuntime

def preprocess(input_name):
    # load and pre-process the data for this input
    return data

def postprocess(model_output):
    pass

def main():
    sess = HB_ONNXRuntime(MODEL_PATH)
    input_names = [input.name for input in sess.get_inputs()]
    output_names = [output.name for output in sess.get_outputs()]
    feed_dict = dict()
    for input_name in input_names:
        feed_dict[input_name] = preprocess(input_name)
    outputs = sess.run_feature(output_names, feed_dict, input_offset=0)
    postprocess(outputs)

if __name__ == '__main__':
    main()
The differences are:

- Import
  - ONNXRuntime: `import onnxruntime as rt`
  - HB_ONNXRuntime: `from horizon_tc_ui import HB_ONNXRuntime`
- Load model files
  - ONNXRuntime: `sess = rt.InferenceSession(MODEL_PATH)`
  - HB_ONNXRuntime: `sess = HB_ONNXRuntime(MODEL_PATH)`
- Inference interface
  - ONNXRuntime: `outputs = sess.run(output_names, feed_dict)`
  - HB_ONNXRuntime: `outputs = sess.run_feature(output_names, feed_dict, input_offset=0)`
2.2 PTQ model inference at each stage
First, we need to understand the inference accuracy of the PTQ model at each stage and what testing it means:

- ***_original_float_model.onnx: floating-point model, with only the pre-processing nodes inserted in front of the original ONNX model
- ***_optimized_float_model.onnx: floating-point model after graph optimization; its inference results are consistent with ***_original_float_model.onnx, so it normally does not need to be tested
- ***_calibrated_model.onnx: mainly used by the precision debugging tool [Note 3]; normally it needs no attention
- ***_quantized_model.onnx: fixed-point model whose inference results are consistent with those of the ***.bin model used for on-board deployment
The inference code for the models at the stages above differs only in **data pre-processing** and the **inference interface**. The basic pre-processing operations are the same as for the original model, but the data type and layout after processing differ slightly, which is related to certain configuration items in the yaml file. The Horizon PTQ toolchain supports inserting a BPU-accelerated pre-processing node at the front of the model, which in turn performs:

- conversion from the `input_type_rt` data type to the `input_type_train` data type;
- data normalization as (data - `mean_value`) * `scale_value`.

Supplementary explanations of these four parameters are as follows:
- `input_type_rt`: the inference data type the model receives when deployed on the board, usually featuremap, nv12, or gray:
  - featuremap: float32, for non-image input
  - nv12: uint8; the XJ3/J5 video channel outputs nv12
  - gray: uint8 grayscale, corresponding to the single Y component of nv12
- `input_type_train`: the data type used during model training, usually rgb/bgr, gray, or featuremap
- `mean_value` & `scale_value`: the normalization operation in the data pre-processing pipeline, which can be folded into the model and accelerated by the BPU; take care to avoid performing it a second time in the inference code when running the PTQ output models at each stage.
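As a small illustration of the last point, the sketch below applies the (data - `mean_value`) * `scale_value` normalization once (what the inserted BPU node computes) and then a second time (the duplicate-normalization bug to avoid in inference code). The `MEAN_VALUE`/`SCALE_VALUE` numbers are hypothetical, not taken from any particular yaml:

```python
import numpy as np

# Hypothetical values mirroring a yaml configuration such as:
#   mean_value: 128.0
#   scale_value: 0.0078125   # i.e. 1/128
MEAN_VALUE = 128.0
SCALE_VALUE = 1.0 / 128.0

def normalize(data):
    """The (data - mean_value) * scale_value operation the in-model node performs."""
    return (data - MEAN_VALUE) * SCALE_VALUE

raw = np.array([0, 128, 255], dtype=np.float32)
once = normalize(raw)    # what the in-model pre-processing node computes
twice = normalize(once)  # the bug: normalizing again in the inference code
print(once)   # -> [-1., 0., 0.9921875]
print(twice)  # shifted and rescaled again: no longer the intended input
```

If normalization is configured in the yaml, the inference code must feed un-normalized data; otherwise the model effectively receives `twice` instead of `once`.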
It should be noted that the .bin model used for on-board deployment has a pre-processing node that performs the complete data-type conversion, whereas the ONNX model lacks the cooperation of some on-board hardware, so it only converts to `input_type_train` from an intermediate type corresponding to `input_type_rt`. The intermediate types corresponding to each `input_type_rt` are shown in the following table, where `_128` indicates uint8 data converted to int8 by subtracting 128.
| input_type_rt | nv12 | yuv444 | rgb | bgr | gray | featuremap |
| --- | --- | --- | --- | --- | --- | --- |
| intermediate type | yuv444_128 | yuv444_128 | RGB_128 | BGR_128 | GRAY_128 | featuremap |
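To make the `_128` convention concrete, here is a minimal sketch of converting a BGR image to the yuv444_128 intermediate type. The BT.601 coefficients below are an assumption for illustration; the exact conversion the toolchain uses may differ slightly:

```python
import numpy as np

def bgr_to_yuv444_128(bgr):
    """Convert a uint8 HWC BGR image to the yuv444_128 intermediate type.

    Assumes a BT.601 full-range RGB->YUV conversion for illustration.
    """
    bgr = bgr.astype(np.float32)
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128.0
    yuv = np.stack([y, u, v], axis=-1)
    yuv = np.clip(np.round(yuv), 0, 255).astype(np.uint8)
    # "_128": shift the uint8 data to int8 by subtracting 128
    return (yuv.astype(np.int16) - 128).astype(np.int8)

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
print(bgr_to_yuv444_128(img).dtype)  # int8
```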
The layout (NCHW or NHWC) after data pre-processing also differs between models, as follows. **To avoid errors, we recommend using the open-source tool Netron [Note 4] to directly visualize the ONNX model to be inferred:**

- For the XJ3 toolchain, the `***_quantized_model.onnx` model of an image-input network is forced to NHWC; the other models stay consistent with the original model (usually NCHW).
- For models with featuremap input, both the XJ3 and J5 toolchains stay consistent with the original model.
- [4] ONNX visualization open-source tool: [Netron](https://github.com/lutzroeder/netron)
To sum up, the differences between the ONNX models at each stage are summarized as follows; the inference code in Section 2.1 can be adapted accordingly for each case.
1) Image input:
All three models (***_original_float_***, ***_optimized_float_***, ***_quantized_***) behave the same except where noted:

- Data type: the intermediate type corresponding to `input_type_rt`, but without subtracting 128; the -128 shift is performed by the inference interface
- Data layout: same as the original model, except for ***_quantized_***, which is NHWC on XJ3 and on J5 with nv12 input, and same as the original model on J5 with non-nv12 input
- Data normalization: configure it in the yaml and remove it from the inference code
- Inference interface: `sess.run(output_names, feed_dict, input_offset=128)`
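The layout handling above can be sketched as follows. This is a minimal numpy example (the 1x3x224x224 shape is hypothetical) of preparing image input for the XJ3 `***_quantized_model.onnx`: the data stays plain uint8, the layout is transposed to NHWC, and the -128 shift is left to the inference interface via `input_offset=128`:

```python
import numpy as np

def to_quantized_layout(yuv444_nchw):
    """Prepare a uint8 yuv444 tensor for the XJ3 quantized model.

    The quantized model of an image-input network expects NHWC, and
    run()'s input_offset=128 handles the -128 shift, so the data itself
    is not shifted here.
    """
    assert yuv444_nchw.dtype == np.uint8
    return np.transpose(yuv444_nchw, (0, 2, 3, 1))  # NCHW -> NHWC

data = np.zeros((1, 3, 224, 224), dtype=np.uint8)  # hypothetical input shape
nhwc = to_quantized_layout(data)
print(nhwc.shape)  # (1, 224, 224, 3)
# feed_dict[input_name] = nhwc
# outputs = sess.run(output_names, feed_dict, input_offset=128)
```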
2) Featuremap input:

All three models (***_original_float_***, ***_optimized_float_***, ***_quantized_***) behave the same:

- Data type: featuremap (float32)
- Data layout: same as the original model
- Data normalization: perform it directly in the pre-processing code; it can be configured via the yaml only when channel=3
- Inference interface: `sess.run_feature(output_names, feed_dict, input_offset=0)`
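For example, with a single-channel featuremap (channel=1, so the yaml cannot carry the normalization), the pre-processing side might look like the sketch below. The `MEAN`/`SCALE` values and the 1x1x8x8 shape are purely illustrative:

```python
import numpy as np

MEAN, SCALE = 0.5, 2.0  # illustrative values, not from any toolchain config

def preprocess_featuremap(fm):
    """Featuremap input stays float32 and keeps the original model's layout;
    normalization is applied directly here in the pre-processing code."""
    fm = fm.astype(np.float32)
    return (fm - MEAN) * SCALE

fm = np.ones((1, 1, 8, 8), dtype=np.float32)  # hypothetical NCHW featuremap
out = preprocess_featuremap(fm)
print(out.dtype, out[0, 0, 0, 0])  # float32 1.0
# feed_dict[input_name] = out
# outputs = sess.run_feature(output_names, feed_dict, input_offset=0)
```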
3) Mixed image + featuremap inputs:

All three models (***_original_float_***, ***_optimized_float_***, ***_quantized_***) behave the same:

- Data type: the intermediate type corresponding to `input_type_rt`; for image inputs, the -128 operation must be performed during data pre-processing
- Data layout: same as the original model
- Data normalization: for images, configure it in the yaml and remove it from the inference code; for featuremaps, perform it directly in the pre-processing code (yaml configuration is possible only when channel=3)
- Inference interface: `sess.hb_session.run(output_names, feed_dict)`
2.3 Supplementary remarks
When verifying model accuracy, we should also understand that nv12 has a smaller data space than RGB/BGR/YUV444, so converting data from RGB/BGR/YUV444 to nv12 introduces small, unavoidable, and irreversible errors. In general, this error has little effect on model accuracy, but there are exceptions. Taking inference with `input_type_rt: nv12` as an example: a JPEG image read with opencv/skimage is BGR/RGB, and to stay consistent with on-board behavior it is first converted to nv12 and then to the intermediate type yuv444. If the model itself is not robust, the error introduced by BGR/RGB -> nv12 may noticeably affect the inference results. Therefore, the robustness of the model should be improved as much as possible during training, so as to reduce the impact of such small perturbations.
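The irreversibility comes from nv12's 4:2:0 chroma subsampling: one U/V pair is stored per 2x2 pixel block. The sketch below round-trips yuv444 -> nv12-style subsampling -> yuv444 with a simple average/replicate scheme (an assumption; real converters may filter differently) and shows that Y survives exactly while chroma does not:

```python
import numpy as np

def yuv444_to_nv12_and_back(yuv444):
    """Round-trip yuv444 -> nv12 -> yuv444 to expose the subsampling error.

    nv12 keeps full-resolution Y but only one U/V pair per 2x2 block, so
    averaging and then replicating the chroma is lossy. Height and width
    are assumed even, as nv12 requires.
    """
    y = yuv444[..., 0].astype(np.float32)
    uv = yuv444[..., 1:].astype(np.float32)
    h, w = y.shape
    # 4:2:0 subsample: average each 2x2 block of U and V
    uv_small = uv.reshape(h // 2, 2, w // 2, 2, 2).mean(axis=(1, 3))
    # upsample back by nearest-neighbour replication
    uv_back = np.repeat(np.repeat(uv_small, 2, axis=0), 2, axis=1)
    back = np.concatenate([y[..., None], uv_back], axis=-1)
    return np.round(back).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (8, 8, 3), dtype=np.uint8)
back = yuv444_to_nv12_and_back(img)
err = np.abs(img.astype(int) - back.astype(int))
print(err[..., 0].max())   # 0: the Y plane is untouched
print(err[..., 1:].max())  # chroma error introduced by the round trip
```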
3 Appendix
The public ONNXRuntime inference reference code for the original ONNX model is as follows:
import onnxruntime as rt

def preprocess(input_name):
    return data

def postprocess(model_output):
    pass

def main():
    sess = rt.InferenceSession(MODEL_PATH)
    input_names = [input.name for input in sess.get_inputs()]
    output_names = [output.name for output in sess.get_outputs()]
    feed_dict = dict()
    for input_name in input_names:
        feed_dict[input_name] = preprocess(input_name)
    outputs = sess.run(output_names, feed_dict)
    postprocess(outputs)

if __name__ == '__main__':
    main()