1 Usage scenario
HB_ONNXRuntime is Horizon's x86 ONNX model inference library, built on top of the public ONNXRuntime package. In addition to supporting original ONNX models exported directly from training frameworks such as PyTorch, TensorFlow, and PaddlePaddle [Note 1], it also supports the ONNX models [Note 2] produced at each stage of the Horizon PTQ toolchain conversion process (the .bin model is not supported). Examples of both are given below.
2 Instructions for use
2.1 Original ONNX model inference
Evaluating the accuracy of the original ONNX model exported from each training framework helps us first confirm two things about the ONNX model itself:

- Whether inference runs normally; otherwise the PTQ toolchain will also report errors during subsequent conversion
- Whether the accuracy is correct, i.e. whether the inference results are consistent with (or very close to) those of the original training-framework model, so as to rule out problems in the ONNX export itself

Compared with the public ONNXRuntime (see the reference code in the appendix), we recommend using Horizon's encapsulated `HB_ONNXRuntime` directly for inference: its results and behavior are completely consistent with the public version, and the code can be reused later when testing the PTQ models at each stage. The reference code is as follows:
from horizon_tc_ui import HB_ONNXRuntime

def preprocess(input_name):
    # load and pre-process the data for this input
    return data

def postprocess(model_output):
    pass

def main():
    sess = HB_ONNXRuntime(MODEL_PATH)
    input_names = [input.name for input in sess.get_inputs()]
    output_names = [output.name for output in sess.get_outputs()]
    feed_dict = dict()
    for input_name in input_names:
        feed_dict[input_name] = preprocess(input_name)
    outputs = sess.run_feature(output_names, feed_dict, input_offset=0)
    postprocess(outputs)

if __name__ == '__main__':
    main()
The differences are:

- Import
  - ONNXRuntime: `import onnxruntime as rt`
  - HB_ONNXRuntime: `from horizon_tc_ui import HB_ONNXRuntime`
- Load model files
  - ONNXRuntime: `sess = rt.InferenceSession(MODEL_PATH)`
  - HB_ONNXRuntime: `sess = HB_ONNXRuntime(MODEL_PATH)`
- Inference interface
  - ONNXRuntime: `outputs = sess.run(output_names, feed_dict)`
  - HB_ONNXRuntime: `outputs = sess.run_feature(output_names, feed_dict, input_offset=0)`
2.2 PTQ model inference at each stage
First, we need to understand the inference accuracy of the PTQ model at each stage and what testing it means:

- ***_original_float_model.onnx: floating-point model, with only the pre-processing nodes inserted in front of the original ONNX model
- ***_optimized_float_model.onnx: floating-point model after graph optimization; its inference results are consistent with ***_original_float_model.onnx, so it normally does not need to be tested
- ***_calibrated_model.onnx: mainly used by the precision debugging tool [Note 3]; normally it needs no attention
- ***_quantized_model.onnx: fixed-point model whose inference results are consistent with those of the ***.bin model used for on-board deployment
The inference code for the models at the stages above differs only in **data pre-processing** and the **inference interface**. The basic pre-processing operations are the same as for the original model, but the data type and layout after processing differ slightly, which is related to certain configuration items in the yaml file. The Horizon PTQ toolchain supports inserting a BPU-accelerated pre-processing node at the front of the model, which in turn performs:

- conversion from the `input_type_rt` data type to the `input_type_train` data type;
- data normalization as (data - `mean_value`) * `scale_value`.

Supplementary explanations of these four parameters are as follows:
- `input_type_rt`: the inference data type the model receives when deployed on the board, usually featuremap, nv12, or gray:
  - featuremap: float32, for non-image input
  - nv12: uint8; the XJ3/J5 video channel outputs nv12
  - gray: uint8 grayscale, corresponding to the single Y component of nv12
- `input_type_train`: the data type used during model training, usually rgb/bgr, gray, or featuremap
- `mean_value` & `scale_value`: the normalization operation in the data pre-processing pipeline, which can be folded into the model and accelerated by the BPU; take care to avoid performing it a second time in the inference code when running the PTQ output models at each stage.
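As a small illustration of the last point, the sketch below applies the (data - `mean_value`) * `scale_value` normalization once (what the inserted BPU node computes) and then a second time (the duplicate-normalization bug to avoid in inference code). The `MEAN_VALUE`/`SCALE_VALUE` numbers are hypothetical, not taken from any particular yaml:

```python
import numpy as np

# Hypothetical values mirroring a yaml configuration such as:
#   mean_value: 128.0
#   scale_value: 0.0078125   # i.e. 1/128
MEAN_VALUE = 128.0
SCALE_VALUE = 1.0 / 128.0

def normalize(data):
    """The (data - mean_value) * scale_value operation the in-model node performs."""
    return (data - MEAN_VALUE) * SCALE_VALUE

raw = np.array([0, 128, 255], dtype=np.float32)
once = normalize(raw)    # what the in-model pre-processing node computes
twice = normalize(once)  # the bug: normalizing again in the inference code
print(once)   # -> [-1., 0., 0.9921875]
print(twice)  # shifted and rescaled again: no longer the intended input
```

If normalization is configured in the yaml, the inference code must feed un-normalized data; otherwise the model effectively receives `twice` instead of `once`.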
It should be noted that the .bin model used for on-board deployment has a pre-processing node that performs the complete data-type conversion, whereas the ONNX model lacks the cooperation of some on-board hardware, so it only converts to `input_type_train` from an intermediate type corresponding to `input_type_rt`. The intermediate types corresponding to each `input_type_rt` are shown in the following table, where `_128` indicates uint8 data converted to int8 by subtracting 128.
| input_type_rt | nv12 | yuv444 | rgb | bgr | gray | featuremap |
| --- | --- | --- | --- | --- | --- | --- |
| intermediate type | yuv444_128 | yuv444_128 | RGB_128 | BGR_128 | GRAY_128 | featuremap |
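To make the `_128` convention concrete, here is a minimal sketch of converting a BGR image to the yuv444_128 intermediate type. The BT.601 coefficients below are an assumption for illustration; the exact conversion the toolchain uses may differ slightly:

```python
import numpy as np

def bgr_to_yuv444_128(bgr):
    """Convert a uint8 HWC BGR image to the yuv444_128 intermediate type.

    Assumes a BT.601 full-range RGB->YUV conversion for illustration.
    """
    bgr = bgr.astype(np.float32)
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128.0
    yuv = np.stack([y, u, v], axis=-1)
    yuv = np.clip(np.round(yuv), 0, 255).astype(np.uint8)
    # "_128": shift the uint8 data to int8 by subtracting 128
    return (yuv.astype(np.int16) - 128).astype(np.int8)

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
print(bgr_to_yuv444_128(img).dtype)  # int8
```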
The layout (NCHW or NHWC) after data pre-processing also differs between models, as follows. **To avoid errors, we recommend using the open-source tool Netron [Note 4] to directly visualize the ONNX model to be inferred:**

- For the XJ3 toolchain, the `***_quantized_model.onnx` model of an image-input network is forced to NHWC; the other models stay consistent with the original model (usually NCHW).
- For models with featuremap input, both the XJ3 and J5 toolchains stay consistent with the original model.
- [4] ONNX visualization open-source tool: [Netron](https://github.com/lutzroeder/netron)
To sum up, the differences between the ONNX models at each stage are summarized as follows; the inference code in Section 2.1 can be adapted accordingly for each case.
1) Image input:
All three models (***_original_float_***, ***_optimized_float_***, ***_quantized_***) behave the same except where noted:

- Data type: the intermediate type corresponding to `input_type_rt`, but without subtracting 128; the -128 shift is performed by the inference interface
- Data layout: same as the original model, except for ***_quantized_***, which is NHWC on XJ3 and on J5 with nv12 input, and same as the original model on J5 with non-nv12 input
- Data normalization: configure it in the yaml and remove it from the inference code
- Inference interface: `sess.run(output_names, feed_dict, input_offset=128)`
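The layout handling above can be sketched as follows. This is a minimal numpy example (the 1x3x224x224 shape is hypothetical) of preparing image input for the XJ3 `***_quantized_model.onnx`: the data stays plain uint8, the layout is transposed to NHWC, and the -128 shift is left to the inference interface via `input_offset=128`:

```python
import numpy as np

def to_quantized_layout(yuv444_nchw):
    """Prepare a uint8 yuv444 tensor for the XJ3 quantized model.

    The quantized model of an image-input network expects NHWC, and
    run()'s input_offset=128 handles the -128 shift, so the data itself
    is not shifted here.
    """
    assert yuv444_nchw.dtype == np.uint8
    return np.transpose(yuv444_nchw, (0, 2, 3, 1))  # NCHW -> NHWC

data = np.zeros((1, 3, 224, 224), dtype=np.uint8)  # hypothetical input shape
nhwc = to_quantized_layout(data)
print(nhwc.shape)  # (1, 224, 224, 3)
# feed_dict[input_name] = nhwc
# outputs = sess.run(output_names, feed_dict, input_offset=128)
```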
2) Featuremap input:

All three models (***_original_float_***, ***_optimized_float_***, ***_quantized_***) behave the same:

- Data type: featuremap (float32)
- Data layout: same as the original model
- Data normalization: perform it directly in the pre-processing code; it can be configured via the yaml only when channel=3
- Inference interface: `sess.run_feature(output_names, feed_dict, input_offset=0)`
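For example, with a single-channel featuremap (channel=1, so the yaml cannot carry the normalization), the pre-processing side might look like the sketch below. The `MEAN`/`SCALE` values and the 1x1x8x8 shape are purely illustrative:

```python
import numpy as np

MEAN, SCALE = 0.5, 2.0  # illustrative values, not from any toolchain config

def preprocess_featuremap(fm):
    """Featuremap input stays float32 and keeps the original model's layout;
    normalization is applied directly here in the pre-processing code."""
    fm = fm.astype(np.float32)
    return (fm - MEAN) * SCALE

fm = np.ones((1, 1, 8, 8), dtype=np.float32)  # hypothetical NCHW featuremap
out = preprocess_featuremap(fm)
print(out.dtype, out[0, 0, 0, 0])  # float32 1.0
# feed_dict[input_name] = out
# outputs = sess.run_feature(output_names, feed_dict, input_offset=0)
```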
3) Mixed image + featuremap inputs:

All three models (***_original_float_***, ***_optimized_float_***, ***_quantized_***) behave the same:

- Data type: the intermediate type corresponding to `input_type_rt`; for image inputs, the -128 operation must be performed during data pre-processing
- Data layout: same as the original model
- Data normalization: for images, configure it in the yaml and remove it from the inference code; for featuremaps, perform it directly in the pre-processing code (yaml configuration is possible only when channel=3)
- Inference interface: `sess.hb_session.run(output_names, feed_dict)`
2.3 Supplementary remarks
When verifying model accuracy, we should also understand that nv12 has a smaller data space than RGB/BGR/YUV444, so converting data from RGB/BGR/YUV444 to nv12 introduces small, unavoidable, and irreversible errors. In general, this error has little effect on model accuracy, but there are exceptions. Taking inference with `input_type_rt: nv12` as an example: a JPEG image read with opencv/skimage is BGR/RGB, and to stay consistent with on-board behavior it is first converted to nv12 and then to the intermediate type yuv444. If the model itself is not robust, the error introduced by BGR/RGB -> nv12 may noticeably affect the inference results. Therefore, the robustness of the model should be improved as much as possible during training, so as to reduce the impact of such small perturbations.
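The irreversibility comes from nv12's 4:2:0 chroma subsampling: one U/V pair is stored per 2x2 pixel block. The sketch below round-trips yuv444 -> nv12-style subsampling -> yuv444 with a simple average/replicate scheme (an assumption; real converters may filter differently) and shows that Y survives exactly while chroma does not:

```python
import numpy as np

def yuv444_to_nv12_and_back(yuv444):
    """Round-trip yuv444 -> nv12 -> yuv444 to expose the subsampling error.

    nv12 keeps full-resolution Y but only one U/V pair per 2x2 block, so
    averaging and then replicating the chroma is lossy. Height and width
    are assumed even, as nv12 requires.
    """
    y = yuv444[..., 0].astype(np.float32)
    uv = yuv444[..., 1:].astype(np.float32)
    h, w = y.shape
    # 4:2:0 subsample: average each 2x2 block of U and V
    uv_small = uv.reshape(h // 2, 2, w // 2, 2, 2).mean(axis=(1, 3))
    # upsample back by nearest-neighbour replication
    uv_back = np.repeat(np.repeat(uv_small, 2, axis=0), 2, axis=1)
    back = np.concatenate([y[..., None], uv_back], axis=-1)
    return np.round(back).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (8, 8, 3), dtype=np.uint8)
back = yuv444_to_nv12_and_back(img)
err = np.abs(img.astype(int) - back.astype(int))
print(err[..., 0].max())   # 0: the Y plane is untouched
print(err[..., 1:].max())  # chroma error introduced by the round trip
```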
3 Appendix
The public ONNXRuntime inference reference code for the original ONNX model is as follows:
import onnxruntime as rt

def preprocess(input_name):
    return data

def postprocess(model_output):
    pass

def main():
    sess = rt.InferenceSession(MODEL_PATH)
    input_names = [input.name for input in sess.get_inputs()]
    output_names = [output.name for output in sess.get_outputs()]
    feed_dict = dict()
    for input_name in input_names:
        feed_dict[input_name] = preprocess(input_name)
    outputs = sess.run(output_names, feed_dict)
    postprocess(outputs)

if __name__ == '__main__':
    main()