Model Accuracy Verification and Tuning Suggestions

0 Recommended accuracy verification process

[Recommended process for precision tuning]

| Step | Purpose | Verification approach | Reference code / other notes |
|---|---|---|---|
| 1 | Ensure that the exported floating-point onnx is valid | The result of a single test sample on the floating-point onnx model should be completely consistent with the inference result of the trained model | See 1.1 Verifying the correctness of inference results of onnx model |
| 2 | Ensure that the yaml configuration file and the pre- and post-processing code are correct | The result of a single test sample on original_float.onnx should be identical to the floating-point onnx inference result (except for the nv12 format, where the lossy nv12 data itself may introduce some differences) | See 1.2 yaml Configuration File and pre-processing code description |
| 3 | Ensure that the graph optimization phase introduces no precision error | The result of a single test sample on optimize_float.onnx should be identical to the inference result of original_float.onnx | Inference code as above. If the results are inconsistent, first try updating the OE development environment to the latest version. If the problem persists, post in the Toolchain section of the Horizon developer community and provide the original floating-point onnx, the configuration file, and the test data |
| 4 | Check whether the quantization accuracy meets expectations | Test the accuracy metric of quantized.onnx | We suggest reusing the floating-point model evaluation code directly: change the model loading/inference part to load and run the onnx model, set the dataloader batch size to match the input shape of the onnx model, and modify the preprocessing code accordingly (see the suggestions above). If the accuracy does not meet expectations, go directly to precision tuning (see 1.4 Precision Tuning Suggestion) |
| 5 | Ensure that the model compilation process and the on-board inference code are correct | Use the hb_model_verifier tool to verify the consistency of quantized.onnx and the .bin model. The model outputs should agree to at least 2-3 decimal places | The ‘hb_model_verifier’ tool currently supports only single-input models. Starting from J5 OE1.1.40 and XJ3 OE2.5.2, the ‘hb_verifier’ tool is provided, which supports consistency checks for multi-input models; see section 1.3 below. In addition, ‘hrt_model_exec infer’ can be used on the board to obtain the raw output of the .bin model; comparing it with quantized.onnx also rules out errors in the engineering code. (If the consistency check between quantized.onnx and the .bin model fails, first try updating the OE development environment to the latest version) |
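Steps 1 to 3 and step 5 all come down to comparing the outputs of two models on the same single input. The following is a minimal sketch of such a comparison, assuming the outputs are already available as NumPy arrays. The helper name `compare_outputs` and its `decimals` parameter are hypothetical and are not part of the toolchain.

```python
import numpy as np

def compare_outputs(ref, test, decimals=3):
    """Compare two model outputs that hold the same number of elements.

    Returns (max absolute difference, cosine similarity, aligned), where
    `aligned` is True if every element differs by less than 10**-decimals.
    """
    ref = np.asarray(ref, dtype=np.float64).ravel()
    test = np.asarray(test, dtype=np.float64).ravel()
    if ref.shape != test.shape:
        raise ValueError("outputs must have the same number of elements")
    max_abs_diff = float(np.max(np.abs(ref - test)))
    denom = np.linalg.norm(ref) * np.linalg.norm(test)
    # Cosine similarity is undefined when one of the outputs is all zeros.
    cosine = float(np.dot(ref, test) / denom) if denom > 0 else float("nan")
    aligned = max_abs_diff < 10.0 ** (-decimals)
    return max_abs_diff, cosine, aligned
```

For steps 1 to 3, where the results are expected to be identical, the maximum absolute difference should be zero or within floating-point noise; for step 5, `decimals=2` or `decimals=3` mirrors the "2-3 decimal places" criterion.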

1 Reference code and precautions

1.1 Verifying the correctness of inference results of onnx model

We recommend HB_ONNXRuntime over the public onnxruntime because the public onnxruntime implements some operators differently from the original training framework, which may lead to inconsistent model inference results.

**1. Verify the correctness of the original floating point onnx model**

This refers specifically to the onnx model exported from the DL framework.

from horizon_tc_ui import HB_ONNXRuntime
import numpy as np
import cv2

# Placeholder: path to the floating-point onnx exported from the DL framework
MODEL_PATH = "model.onnx"

def preprocess(input_name):
    # Build the input for `input_name` exactly as during training, e.g.:
    #   BGR->RGB, Resize, CenterCrop ...
    #   HWC->CHW
    #   normalization
    data = None  # placeholder: replace with the preprocessed numpy array
    return data

def postprocess(model_output):
    # Post-processing
    pass

def main():
    sess = HB_ONNXRuntime(model_file=MODEL_PATH)
    input_names = [inp.name for inp in sess.get_inputs()]
    output_names = [out.name for out in sess.get_outputs()]
    feed_dict = dict()
    for input_name in input_names:
        feed_dict[input_name] = preprocess(input_name)

    outputs = sess.run_feature(output_names, feed_dict, input_offset=0)

    postprocess(outputs)

if __name__ == '__main__':
    main()

**2. Verify the correctness of the conversion tool output**

This applies to original_float.onnx, optimize_float.onnx, and quantized.onnx.

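from horizon_tc_ui import HB_ONNXRuntime
import numpy as np
import cv2

# Placeholder: path to original_float.onnx, optimize_float.onnx or quantized.onnx
MODEL_PATH = "quantized.onnx"

def preprocess(input_name):
    # BGR->RGB, Resize, CenterCrop ...
    # HWC->CHW
    # input_type_train -> intermediate type of input_type_rt (see 1.2.1)
    # normalization (skip it if HzPreprocess already does it, see 1.2.2)
    # -128 (can also be done through input_offset, see 1.2.2)
    data = None  # placeholder: replace with the preprocessed numpy array
    return data

def postprocess(model_output):
    # Post-processing
    pass

def main():
    sess = HB_ONNXRuntime(model_file=MODEL_PATH)
    input_names = [inp.name for inp in sess.get_inputs()]
    output_names = [out.name for out in sess.get_outputs()]
    feed_dict = dict()
    for input_name in input_names:
        feed_dict[input_name] = preprocess(input_name)
    # Two ways to run inference; the second assignment overwrites the first,
    # so keep only the call you need.
    outputs = sess.run_feature(output_names, feed_dict, input_offset=0)
    outputs = sess.hb_session.run(output_names, feed_dict)
    postprocess(outputs)

if __name__ == '__main__':
    main()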
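Step 4 of the recommended process suggests reusing the floating-point evaluation code and swapping only the inference call. The sketch below illustrates the idea for a classification task. `evaluate_top1` and `infer_fn` are hypothetical names: `infer_fn` stands for any callable that wraps the model under test (for instance, a thin wrapper around the session call shown above), and samples are fed one at a time, which corresponds to an onnx model with batch size 1.

```python
import numpy as np

def evaluate_top1(infer_fn, samples):
    """Top-1 accuracy of `infer_fn` over an iterable of (input, label) pairs.

    `infer_fn` takes one preprocessed input and returns the class scores.
    """
    correct = 0
    total = 0
    for data, label in samples:
        scores = np.asarray(infer_fn(data)).ravel()
        correct += int(np.argmax(scores) == label)
        total += 1
    return correct / total if total else 0.0
```

Running the same function once with the floating-point model and once with quantized.onnx gives two directly comparable numbers, since the data, labels, and metric code are shared.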

1.2 yaml Configuration File and pre-processing code description

Model conversion converts a floating-point model into a Horizon mixed heterogeneous model. To make the heterogeneous model run quickly and efficiently on the embedded system, model conversion focuses on input data processing and on model optimization and compilation. This section describes the internal logic of input data processing, to clarify how the preprocessing node and the model's preprocessing code work together.

1.2.1 Parsing of preprocessing nodes

Horizon’s edge AI computing platform provides hardware-level support for certain types of input paths, but the outputs of these paths do not necessarily meet the input requirements of the model. For example, the video path contains a video processing subsystem that provides image cropping, scaling, and other image quality optimization functions for the captured data. The output of these subsystems is often a yuv420 image, whereas algorithm models are usually trained on bgr/rgb and other common image formats.

To reduce the deployment workload on the board, we build several common image format conversions and common image normalization operations into the model. They appear as a preprocessing node, HzPreprocess, inserted after the model input node (you can use the open source tool Netron to inspect the intermediate products of the conversion). Because of HzPreprocess, the preprocessing of the converted model may differ from that of the original model, so it is worth understanding the insertion logic of the preprocessing node in detail first.

When the mapper tool converts a caffe/onnx model, a caffe model is first parsed into onnx format. The HzPreprocess node is then added to the model based on the configuration parameters (input_type_rt, input_type_train, and norm_type) in the yaml configuration file. The preprocessing node appears in all products produced during the conversion.

Ideally, the HzPreprocess node would complete the conversion from ‘input_type_rt’ to ‘input_type_train’. In reality, the type conversion is completed together with the Horizon AI chip hardware, and the ONNX model does not contain the part done in hardware. Therefore, the actual input type of the ONNX model is an intermediate type, namely the result of the hardware processing of ‘input_type_rt’, and the data layout (NCHW/NHWC) remains the same as the input layout of the original floating-point model. Each ‘input_type_rt’ has a specific intermediate type, as shown in the following table:

| input_type_rt | nv12 | yuv444 | rgb | bgr | gray | featuremap |
|---|---|---|---|---|---|---|
| Intermediate type | yuv444_128 | yuv444_128 | RGB_128 | BGR_128 | GRAY_128 | featuremap |

The first row of the table is the data type specified by input_type_rt, and the second row is the corresponding intermediate type, which is the input type of the three onnx models produced as intermediate products of model conversion. The types are explained as follows:

  • yuv444_128/RGB_128/BGR_128/GRAY_128 is the result of subtracting 128 from the corresponding input_type_rt data.
  • featuremap is four-dimensional tensor data (J5 also supports non-four-dimensional tensor data), with each value represented as float32.
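The mapping above can be written down as a small lookup plus the "-128" step. This is only a sketch: the names `INTERMEDIATE_TYPE` and `to_intermediate` are hypothetical, and storing the shifted data as int8 is an assumption of this example rather than something stated above.

```python
import numpy as np

# Intermediate type for each input_type_rt, copied from the table above.
INTERMEDIATE_TYPE = {
    "nv12": "yuv444_128",
    "yuv444": "yuv444_128",
    "rgb": "RGB_128",
    "bgr": "BGR_128",
    "gray": "GRAY_128",
    "featuremap": "featuremap",
}

def to_intermediate(data_uint8):
    """Subtract 128 from uint8 image data, giving values in [-128, 127].

    Assumption: the result is stored as int8. featuremap inputs are float32
    and are not shifted, so they should not be passed to this function.
    """
    data = np.asarray(data_uint8)
    if data.dtype != np.uint8:
        raise TypeError("expected uint8 image data")
    # Widen before subtracting to avoid uint8 wrap-around.
    return (data.astype(np.int16) - 128).astype(np.int8)
```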

To avoid misuse, not all combinations of input_type_rt and input_type_train are supported. Based on actual production experience, the combinations currently open are as follows:

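| input_type_train \ input_type_rt | nv12 | yuv444 | rgb | bgr | gray | featuremap |
|---|---|---|---|---|---|---|
| yuv444 | Y | Y | N | N | N | N |
| rgb | Y | Y | Y | Y | N | N |
| bgr | Y | Y | Y | Y | N | N |
| gray | N | N | N | N | Y | N |
| featuremap | N | N | N | N | N | Y |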
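For a quick sanity check of a yaml configuration, the support matrix above can be encoded as a lookup. The sketch below is illustrative only; `SUPPORTED_RT` and `is_supported` are hypothetical names, and the sets simply restate the Y entries of the table.

```python
# Supported input_type_rt values for each input_type_train, from the table above.
SUPPORTED_RT = {
    "yuv444": {"nv12", "yuv444"},
    "rgb": {"nv12", "yuv444", "rgb", "bgr"},
    "bgr": {"nv12", "yuv444", "rgb", "bgr"},
    "gray": {"gray"},
    "featuremap": {"featuremap"},
}

def is_supported(input_type_train, input_type_rt):
    """Return True if the (input_type_train, input_type_rt) pair is marked Y."""
    return input_type_rt in SUPPORTED_RT.get(input_type_train, set())
```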

In addition, if the norm_type parameter in the yaml file is configured as data_mean, data_scale, or data_mean_and_scale, the preprocessing node also contains the norm operation.

Please note: the mean and scale parameters in the yaml file must be converted from the mean and std used during training.

The calculation formula in the HzPreprocess node is: norm_data = (data - mean) * scale. Taking yolov3 as an example, its pre-processing code during training is as follows:

[Sample code for preprocessing]
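As a worked illustration of the conversion, suppose training normalizes pixels as (x / 255 - mean_train) / std_train. This is a common convention, but it is an assumption of this sketch and is not taken from the yolov3 code referenced above. Rewriting it in the HzPreprocess form (data - mean) * scale gives mean = 255 * mean_train and scale = 1 / (255 * std_train). The function name `train_norm_to_yaml` is hypothetical.

```python
import numpy as np

def train_norm_to_yaml(mean_train, std_train, pixel_max=255.0):
    """Convert training-time (mean, std) on [0, 1] data to yaml (mean, scale).

    Assumes training computes (x / pixel_max - mean_train) / std_train, while
    HzPreprocess computes (x - mean) * scale on data in [0, pixel_max].
    """
    mean_train = np.asarray(mean_train, dtype=np.float64)
    std_train = np.asarray(std_train, dtype=np.float64)
    mean_values = mean_train * pixel_max
    scale_values = 1.0 / (std_train * pixel_max)
    return mean_values, scale_values

if __name__ == "__main__":
    # Example per-channel statistics (illustrative values only).
    mean_values, scale_values = train_norm_to_yaml([0.5, 0.5, 0.5], [0.25, 0.25, 0.25])
    print(mean_values, scale_values)
```

If training does not divide by 255 first, pass `pixel_max=1.0`, in which case the mean is unchanged and the scale is simply 1 / std.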

1.2.2 Pre-processing node and pre-processing code

Because of the HzPreprocess node, the preprocessing of the converted model differs from that of the original model. In general, there are two things to note:

  • When running inference on the intermediate models of the conversion (original_float_model.onnx/optimized_float_model.onnx/quantized_model.onnx), the input data must be processed to the intermediate type of input_type_rt during preprocessing. The -128 operation can be done by configuring the input_offset parameter of the onnx model inference API; for its usage, see any conversion example in the release package or the code in 1.1 above.
  • Do not repeat the norm operation that HzPreprocess already performs.

Examples of model preprocessing at each stage are shown in the following figure:

[Precautions for preprocessing]

The calibration data only needs to be processed to input_type_train, and be careful not to repeat the norm operation.

1.3 Verifying consistency between python and board

1.3.1 hb_model_verifier Tool

This tool has been upgraded: the new tool is hb_verifier (supported by J5 OE1.1.40/XJ3 OE2.5.2 and above). We recommend using the new tool first; hb_model_verifier will be deprecated in a future version.

The hb_model_verifier tool validates the results of a specified fixed-point onnx model against a bin model. Using the specified image (if no image is specified, the tool uses a default image, and for featuremap models it uses randomly generated tensor data), the tool runs fixed-point model inference, bin model inference on the board, and bin model inference on the x86 simulator, then compares the three results and reports whether the check passes.

For bin model inference on the board, make sure that the given ip address can be pinged and that hrt_tools has been installed on the board. If not, use the install.sh script under ddk/package/board in the OE package to install it.

1.3.1.1 Tool Introduction

1. Parameter Description

hb_model_verifier -q ${quanti_model} \
                  -b ${bin_model} \
                  -a ${board_ip} \
                  -i ${input_img} \
                  -d ${digits}

--quanti_model, -q Fixed-point model name.

--bin_model, -b bin model name.

--arm-board-ip, -a ip address of the arm board used for the on-board test.

--input-img, -i Image used for the inference test. If not specified, the default image or a random tensor is used. A binary image file must have the .bin file name extension.

--compare_digits, -d Numerical precision used to compare the inference results. If not specified, the comparison defaults to five decimal places.

2. Output content analysis

The comparison result is displayed in the terminal. The tool compares the results of the ONNX model, the simulator, and the board. If there is no problem, the output is as follows:

Quanti onnx and Arm result Strict check PASSED

When the results of the fixed-point model and the runtime model are inconsistent, the inconsistent results are displayed together with the message “check FAILED”.

1.3.1.2 Example

hb_model_verifier -q quanti.onnx -b model.bin -a 10.10.10.10
hb_model_verifier -q quanti.onnx -b model.bin -a 10.10.10.10 -i data.bin

hb_model_verifier currently supports only single-input models. If the model has multiple outputs, only the results of the first output are compared. Validation of packed *.bin models is not currently supported.

1.3.2 hb_verifier Tool

The hb_verifier tool validates the results of a specified fixed-point onnx model against a bin model. Using the specified image (if no image is specified, the tool uses a default image, and for featuremap models it uses randomly generated tensor data), the tool runs fixed-point model inference, bin model inference on the board, and bin model inference on the x86 simulator, then compares the three results and reports whether the check passes. The tool also supports comparing a .bin model whose Dequantize nodes have been removed with a fixed-point onnx model.

1.3.2.1 Tool Introduction

1. Parameter Description

hb_verifier -m   ${quanti_model},${bin_model} \
            -b   ${board_ip} \
            -s   True / False \
            -i   ${input_img} \
            -c   ${digits}  \
            -r   True / False

--model/-m Fixed-point model name and bin model name. Separate multiple models with ",".

--board-ip/-b ip address of the arm board used for the on-board test.

--run-sim/-s Set whether to use X86 libdnn for bin model inference. The default value is False.

  • When this parameter is set to True, the tool will use the x86 environment libdnn for bin model inference.
  • When this parameter is set to False, the tool will not use the x86 environment libdnn for bin model inference.

--input-img/-i Specifies the image to use for the inference test. If not specified, randomly generated tensor data is used. A binary image file must have the .bin file name extension. For a multi-input model, separate multiple images with ","; there are two ways to pass them:

  • input_name1:image1,input_name2:image2, …
  • image1,image2, …

--compare_digits/-c Sets the numerical accuracy of the comparison inference result (that is, the number of decimal places to compare the value). If not specified, the tool defaults to five decimal places.

--dump-all-nodes-results/-r Sets whether to save the output of each operator in the model and compare the output of the operator with the same name. The default is False.

  • When this parameter is set to True, the tool will take the output of all nodes in the model and match the output names of the nodes for comparison. For performance reasons, the dump function is not supported on the X86 environment.
  • When this parameter is set to False, the tool will only take the final output of the model and compare it.

2. Parse the output

The comparison result is displayed in the terminal. The tool compares the results of the models in the different scenarios. If there is no problem, the output is as follows:

Quanti.onnx and Arm result Strict check PASSED

When the results of the fixed-point model and the runtime model are inconsistent, the inconsistent results are displayed together with the message “check FAILED”.

1.3.2.2 Example

1. Compare quanti.onnx model inference, .bin model inference on the board, and .bin model inference on x86:

hb_verifier -m quanti.onnx,model.bin -b *.*.*.* -s True

2. Compare the inference results of the quanti.onnx model and the .bin model on the board:

hb_verifier -m quanti.onnx,model.bin -b *.*.*.*

3. Compare the inference results of the quanti.onnx model and the .bin model on x86:

hb_verifier -m quanti.onnx,model.bin -s True

4. Compare the inference results of the .bin model on the board and on x86:

hb_verifier -m model.bin -b *.*.*.* -s True

5. Save the output of each operator during quanti.onnx model inference and .bin model inference on the board, and compare the outputs of operators with the same name:

hb_verifier -m quanti.onnx,model.bin -b *.*.*.* -r True
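When the check is run from a script, the command line can be assembled programmatically. The sketch below uses only the short options described above (-m, -b, -s, -i, -c, -r). The function `build_hb_verifier_cmd` and its parameter names are hypothetical, and the function only builds the argument list; actually running it requires the OE environment where hb_verifier is installed.

```python
def build_hb_verifier_cmd(models, board_ip=None, run_sim=False,
                          inputs=None, digits=None, dump_all=False):
    """Build an hb_verifier argument list from the options described above.

    `inputs` is either a list of image paths or a dict {input_name: path},
    matching the two ways of passing images to a multi-input model.
    """
    cmd = ["hb_verifier", "-m", ",".join(models)]
    if board_ip:
        cmd += ["-b", board_ip]
    if run_sim:
        cmd += ["-s", "True"]
    if inputs:
        if isinstance(inputs, dict):
            value = ",".join(f"{name}:{path}" for name, path in inputs.items())
        else:
            value = ",".join(inputs)
        cmd += ["-i", value]
    if digits is not None:
        cmd += ["-c", str(digits)]
    if dump_all:
        cmd += ["-r", "True"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the tool is available.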

1.3.3 hrt_model_exec infer Tool

1.3.3.1 Tool Introduction

The hrt_model_exec infer command runs inference on one frame using user-defined input data. The user specifies the input data path through input_file. If the input is an image, the tool resizes the image according to the model information and organizes the model input. The command also prints the model run time for a single thread running a single frame.

| Optional parameter | Description |
|---|---|
| core_id | Specifies the core id used for model inference. 0: any core, 1: core0, 2: core1. The default value is 0. |
| roi_infer | Enables resizer model inference. If the model input contains a resizer source, set this to true. The default value is false. |
| roi | Takes effect when roi_infer is true. Sets the roi regions required for resizer model inference, separated by semicolons. |
| frame_count | Sets the number of frames that infer runs. Running multiple frames together with enable_dump can be used to verify output consistency. The default value is 1. |
| dump_intermediate | Dumps the input and output data of each layer of the model. The default value is 0. 1: the output file type is bin. 2: the output types are bin and txt, where BPU nodes output aligned data. 3: the output types are bin and txt, where BPU nodes output valid data. |
| enable_dump | Dumps the model output. The default value is false. |
| dump_precision | Controls the number of decimal places of float data output in txt format. The default value is 9. |
| hybrid_dequantize_process | Controls the output of float data in txt format. If the output is fixed-point data, it is dequantized. Currently, only four-dimensional models are supported. |
| dump_format | Type of the dumped model output file. The value can be bin or txt. The default value is bin. |
| dump_txt_axis | Line-wrapping rule for model output dumped in txt format. If the output has n dimensions, the parameter range is [0, n]. The default value is 4. |
| enable_cls_post_process | Enables classification post-processing. Currently, only ptq classification models are supported. The default value is false. |

1.3.3.2 Example

**1. Normal model**

hrt_model_exec infer --model_file=xxx.bin --input_file=xxx.jpg --enable_dump 1 --dump_format txt

**2. resizer model (supported from J5 OE1.1.29 and XJ3 OE2.4.2)**

hrt_model_exec infer --model_file=xxx.bin --input_file=xxx.jpg --roi="2,4,123,125" --roi_infer=true --enable_dump 1 --dump_format txt

**3. Model with the dequantize nodes removed, still outputting dequantized floating-point results (supported from J5 OE1.1.37 and XJ3 OE2.5.2)**

hrt_model_exec infer --model_file=xxx.bin --input_file=xxx.jpg --hybrid_dequantize_process 1 --enable_dump 1 --dump_format txt
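To compare the board output with quantized.onnx, the txt dump has to be read back into an array. The sketch below assumes that the dumped txt file contains only whitespace-separated numbers and that the output shape is known from the model; both are assumptions, and the name of the dump file is not specified here. `load_txt_dump` is a hypothetical helper.

```python
import numpy as np

def load_txt_dump(path, shape):
    """Read a txt dump of whitespace-separated numbers into an array.

    Line breaks are ignored, so the result does not depend on how the
    values were wrapped when the file was written.
    """
    with open(path, "r") as f:
        values = np.array(f.read().split(), dtype=np.float64)
    return values.reshape(shape)
```

The loaded array can then be compared element by element with the corresponding output of quantized.onnx obtained through HB_ONNXRuntime.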

1.4 Precision Tuning Suggestion

Extensive production experience has shown that, if the optimal combination of quantization parameters is selected, Horizon’s conversion tool can keep the accuracy loss within 1% in most cases. Depending on the size of the accuracy loss, the following suggestions can be used to solve the problem:

1.4.1 Obvious loss of accuracy (more than 4%)

If the model accuracy loss is greater than 4%, it is usually caused by improper yaml configuration, an unbalanced validation data set, or similar problems. We recommend checking three aspects: the pipeline, the model conversion configuration, and data processing consistency.

1. Pipeline check

The pipeline is the whole process in which the user completes data preparation, model conversion, model inference, post-processing, and evaluation of the accuracy metric. The recommended verification process in section 0 above helps identify the stage at which the accuracy problem occurs, thus narrowing the scope of investigation.

2. Model conversion yaml configuration check

According to the recommended verification process, when the accuracy problem already appears with original_float.onnx, focus on checking whether the yaml configuration file and the pre- and post-processing code are wrong. There are two common errors in the yaml file:

  • input_type_rt and input_type_train are used to distinguish the data format required by the converted mixed heterogeneous model from that required by the original floating-point model. Carefully check whether they are as expected, especially whether the BGR and RGB channel orders are correct.
  • Check whether norm_type, mean_values, and scale_values are correctly configured. With these three parameters, a preprocessing node can be inserted directly into the model to perform the mean and scale operations. Confirm that the mean and scale operations are not applied a second time to the calibration/test images. In our support experience, repeated preprocessing is an error-prone area.

3. Data processing consistency check

This check is mainly for users who prepare the calibration data and evaluation code by following the examples in the OE development kit. The common errors are as follows:

  • read_mode is specified incorrectly: the --read_mode parameter in 02_preprocess.sh specifies the image reading mode, and opencv and skimage are supported. The image reading mode is also set by the imread_mode parameter in preprocess.py, which needs to be modified as well. Reading an image with skimage gives an RGB channel order, a value range of 0~1, and a float value type; reading with opencv gives a BGR channel order, a value range of 0~255, and a uint8 data type.
  • The storage format of the calibration data set is set incorrectly: at present, we use numpy.tofile to save the calibration data, which does not save the shape or the type information. If input_type_train is not featuremap, the yaml parameter cal_data_type is used to set the data storage type of the binary file. For versions earlier than J5-OE1.1.16 and XJ3-OE1.13.3, the data dtype is determined by whether the data storage path contains f32: if the path contains the f32 keyword, the data is parsed as float32; otherwise, it is parsed as uint8.
  • The transformer implementation is inconsistent: Horizon provides a series of common preprocessing functions, stored in the horizon_model_convert_sample/01_common/python/data/transformer.py file. Some preprocessing operations may be implemented differently from yours. For example, ResizeTransformer uses the opencv default interpolation method (linear); if you use another interpolation method, modify the transformer.py source code directly so that it is consistent with the preprocessing code used during training.
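The storage-format pitfall is easy to reproduce with NumPy alone. The sketch below saves data with `tofile` and shows that the reader must supply both the dtype and the shape; the helper names `save_calibration` and `load_calibration` are hypothetical.

```python
import numpy as np

def save_calibration(data, path):
    """Write raw bytes only; shape and dtype are NOT stored in the file."""
    np.asarray(data).tofile(path)

def load_calibration(path, dtype, shape):
    """Read raw bytes back; dtype and shape must match what was written."""
    return np.fromfile(path, dtype=dtype).reshape(shape)

if __name__ == "__main__":
    sample = np.arange(12, dtype=np.float32).reshape(1, 3, 2, 2)
    save_calibration(sample, "calib_f32.bin")
    ok = load_calibration("calib_f32.bin", np.float32, (1, 3, 2, 2))
    wrong = np.fromfile("calib_f32.bin", dtype=np.uint8)
    # Same file, different dtype: four times as many elements, different values.
    print(ok.size, wrong.size)
```

Reading float32 data as uint8 does not raise an error, which is why a wrong storage type tends to show up only as a large accuracy loss.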

1.4.2 Small loss of accuracy (1.5%~3%)

To reduce the difficulty of model precision tuning, we recommend first setting calibration_type to default. default is an automatic search function: based on the cosine similarity of the output node for the first calibration sample, it selects the best scheme from max, max-percentile 0.99995, KL, and other calibration methods. The calibration method finally selected can be found in the conversion log, in a message such as “Select kl method.”. During the search, schemes such as per-channel quantization and asymmetric quantization are also tried. If per-channel is enabled, the following message is displayed: Perchannel quantization is enabled. If asymmetric quantization is enabled, the following message is displayed: Asymmetric quantization is enabled. If the accuracy of the automatic search result is still not as expected, try the following tuning suggestions:

1. Adjust the calibration mode

  • Manually set calibration_type to mix. (In mix calibration, the kl calibration mode is first used to quantize the model, nodes with a cosine similarity of less than 0.999 are treated as sensitive nodes, these nodes are then calibrated with max and max 0.99995, and the calibration mode with the best cosine similarity is used to obtain the mixed calibration model.)
  • Set calibration_type to max and set max_percentile to different quantiles (the value ranges from 0 to 1). We recommend first trying 0.99999, 0.99995, 0.9999, 0.9995, and 0.999, observing the trend of model accuracy across these five configurations, and finally finding an optimal quantile.
  • Select the scheme with the highest cosine similarity from the previous attempts and try enabling per_channel.
  • Starting with J5 OE1.1.62, the optimization parameter in yaml provides the asymmetric and [bias_correction](https://developer.horizon.cc/forumDetail/177840463137677363) options. Results show that these two options can improve the quantization accuracy in some scenarios.
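To build intuition for how the quantile in max_percentile changes the result, the sketch below applies a textbook symmetric int8 quantize-dequantize step with different clipping thresholds and reports the cosine similarity to the original data. It is a conceptual illustration under stated assumptions, not the toolchain's calibration implementation; all function names are hypothetical, and the data is synthetic.

```python
import numpy as np

def max_threshold(x):
    """Clipping threshold taken as the largest absolute value."""
    return float(np.max(np.abs(x)))

def percentile_threshold(x, q):
    """Clipping threshold at quantile q (0 < q <= 1) of the absolute values."""
    return float(np.quantile(np.abs(x), q))

def fake_quantize_int8(x, threshold):
    """Symmetric int8 quantize-dequantize with the given clipping threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    scale = threshold / 127.0
    q = np.clip(np.round(np.asarray(x) / scale), -127, 127)
    return q * scale

def cosine_similarity(a, b):
    a, b = np.ravel(a), np.ravel(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic activations: Gaussian values plus one large outlier.
    x = np.append(rng.standard_normal(10000), 50.0)
    for q in (1.0, 0.99999, 0.99995, 0.9999, 0.9995, 0.999):
        t = percentile_threshold(x, q)
        print(q, t, cosine_similarity(x, fake_quantize_int8(x, t)))
```

A smaller quantile clips outliers and gives finer resolution to the bulk of the values, at the cost of distorting the clipped ones, which is why the best quantile has to be found by observing the accuracy trend on the real model.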

2. Adjust the calibration data set

  • Try increasing or decreasing the amount of data appropriately (detection scenarios usually require less calibration data than classification scenarios).
  • Observe the missed detections in the model output, and appropriately add calibration data for the corresponding scenes.
  • Do not use abnormal data such as pure black or pure white images, and minimize the use of background images without targets as calibration data. Cover the typical task scenarios as comprehensively as possible, so that the distribution of the calibration data set is similar to that of the training set.

3. Fall back some tail operators to high-precision CPU computation

  • Generally, we only try to fall back 1-2 operators of the output layer at the tail of the model to the CPU, because too many CPU operators greatly affect the final performance of the model. Which operators to fall back can be judged by observing the cosine similarity of the model. (If setting some intermediate nodes to run_on_cpu does not improve accuracy, this is normal: repeated requantization may also bring a greater accuracy loss, so it is usually only recommended to fall back tail nodes to the CPU.)
  • To specify the operators that run on the CPU, use the run_on_cpu or node_info parameter in the yaml file.
  • If an error occurs during model compilation after run_on_cpu is set, contact Horizon technical support.

1.4.3 Precision debug Tool

In the post-training quantization (PTQ) of a model, there are two main causes of precision loss: quantization of sensitive nodes and accumulation of node quantization errors. For both cases, the Horizon Toolchain provides a precision debug tool to help users locate, on their own, the accuracy issues that arise during model quantization. The precision debug tool analyzes the quantization error of the calibration model at node granularity and quickly locates nodes with abnormal accuracy. It provides the following functions: obtaining the quantization sensitivity of nodes, the cumulative error curve of the model, the data distribution of a specified node, and the box plot of the data distribution across input data channels of a specified node. For details, see the community manual.