1 Preface
This article covers the use and deployment of the resizer model on Horizon development boards. If you are a beginner, you may have the following questions:
- What is resizer? What is the resizer model?
- Why use the resizer model?
- How do I obtain a resizer model?
- How is the resizer model deployed?
The specific examples below address each question in turn.

**Answer 1:** The resizer is an on-chip hardware module for image scaling: it crops a specified ROI from an image (nv12) and scales it to a specified size using bilinear interpolation, driven by a BPU functioncall. An algorithm model that uses this module in an AI business scenario is called a resizer model.

**Answer 2:** Take the vehicle detection and recognition scenario **as an example**, as shown in the figure below. An upstream network first detects the locations of the different vehicles in the picture. Before each detected vehicle crop is sent to the smaller recognition network, it must be resized. To support such business scenarios and speed up this resizing, the Horizon chip provides dedicated hardware acceleration for resizer models.
**Answer 3:** A resizer model is generated by setting the compile parameter `input_source` to `{'input_name': 'resizer'}` during model conversion. Note: when the input source is resizer, `input_type_rt` can only be configured as nv12 or gray.

**Answer 4:** Once the resizer model has been compiled, its use and deployment details are covered in Section 3.
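As a minimal, hypothetical sketch of such a conversion configuration (only `input_source` and `input_type_rt` come from the text above; the input name, section layout, and other fields are placeholders and may differ across toolchain versions):

```yaml
# Hypothetical conversion-yaml fragment for a resizer model.
# "data" is a placeholder input name; check your model's actual input name.
input_parameters:
  input_name: "data"
  input_type_rt: "nv12"        # must be nv12 or gray for a resizer input
compiler_parameters:
  input_source: {"data": "resizer"}
```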
**Note:** The directory structures of the user manual and the open_explorer development kit are explained using XJ3 as an example. J5 users can practice in the corresponding directories.
2 resizer model constraints
For resizer model inference, the input is read from DDR: the ROI is first cropped from the original image and then resized to the model input size. The output is stored in dedicated on-chip memory that only the BPU can read, and the read changes the data layout, so the resizer output can only be fed to the BPU for model inference. The constraints in this process are shown below.
**XJ3 chip platform:**
The zoom multiples in both the H and W directions are limited. Below, `src_len` denotes `roi.size.h` or `roi.size.w`, and `dst_len` denotes the output h or w:

step = ((src_len - 1) * 65536 + (dst_len - 1) / 2) / (dst_len - 1)

`step` must lie in the range [0, 262143], which means the scaling ratio is within [1/4, 65536). If the relevant interface has a keep-ratio option and it is set to true, the step limit must be checked against the output size after the ratio is kept:

h_ratio = roi.size.h * 65536 / dest_h
w_ratio = roi.size.w * 65536 / dest_w

If h_ratio > w_ratio, `dest_w` in the formulas above must be adjusted to `roi.size.w * dest_h / roi.size.h`; if w_ratio > h_ratio, `dest_h` must be adjusted to `roi.size.h * dest_w / roi.size.w`.
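The step constraint above can be sketched as a small helper. This is only an illustration of the formula in the text, not Horizon API code:

```cpp
#include <cstdint>

// 16.16 fixed-point step for one axis, per the formula above.
int64_t resize_step(int64_t src_len, int64_t dst_len) {
  return ((src_len - 1) * 65536 + (dst_len - 1) / 2) / (dst_len - 1);
}

// XJ3 constraint: step must lie in [0, 262143].
bool step_valid(int64_t src_len, int64_t dst_len) {
  const int64_t step = resize_step(src_len, dst_len);
  return step >= 0 && step <= 262143;
}
```

For example, scaling a 256-pixel ROI edge down to 128 passes the check, while 1024 down to 128 (an 8x reduction) exceeds the 4x limit and is rejected.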
The resizer model currently supports multiple nv12 inputs, and commonly used resizer output sizes (HxW) are as follows:
- 128x128
- 128x64
- 64x128
- 160x96
3 resizer model usage flow
This section gives a brief walkthrough using the roi_infer example in the `horizon_xj3_open_explorer_v2.6.2b-py38_20230606` development package (OE package for short); its path is `ddk/samples/ai_toolchain/horizon_runtime_sample/code/01_api_tutorial/roi_infer`. The resizer model used is `mobilenetv1_128x128_resizer_nv12.bin`, located at `/open_explorer/ddk/samples/ai_toolchain/model_zoo/runtime/mobilenetv1`.
3.1 Description of the required file structure
The resizer model example is in the horizon_runtime_sample directory. The required directory structure is as follows:
├── code                     # Sample source code, compiled on the development machine
│   ├── 01_api_tutorial      # dnn API sample code
│   │   └── roi_infer        # resizer model inference example
│   ├── build_xj3.sh         # xj3 ARM-side compilation script
│   ├── CMakeLists.txt
│   └── deps
│       └── aarch64          # xj3 ARM-side compilation dependency libraries
├── xj3                      # On-board running directory structure
│   ├── data                 # Input image data files
│   ├── model                # resizer model file
│   │   └── model_name.bin   # mobilenetv1_128x128_resizer_nv12.bin
│   └── script               # ARM-side sample run scripts
│       └── 01_api_tutorial
│           └── roi_infer.sh # resizer on-board run script
└── README.md
When using it for the first time, focus on the two scripts `code/build_xj3.sh` and `xj3/script/01_api_tutorial/roi_infer.sh`.
3.2 Environment preparation
Completing the full use and deployment of the resizer model requires both a development machine and a development board, so environment preparation has two parts. Horizon provides the docker environment deployment script `run_docker.sh` for the development machine and the environment deployment script `install.sh` for the development board.
3.3 Main process of on-board deployment

The main process of deploying a resizer model on the board is shown in the figure below. First, load the resizer hybrid heterogeneous model (model_name.bin) through the API provided by the prediction library, obtain the model's input and output information, and prepare memory for the input and output data. The `hbDNNRoiInfer` interface then runs inference, and the results are post-processed. After these operations complete, the related resources must be released.
In this whole process, the difference from conventional model deployment is the use of the `hbDNNRoiInfer` API for inference. Its parameters and return value are explained in detail below.
hbDNNRoiInfer()
int32_t hbDNNRoiInfer(hbDNNTaskHandle_t *taskHandle,
hbDNNTensor **output,
const hbDNNTensor *input,
hbDNNRoi *rois,
int32_t roiCount,
hbDNNHandle_t dnnHandle,
hbDNNInferCtrlParam *inferCtrlParam);
Parameter description:
- [out] `taskHandle`: task handle pointer.
- [in/out] `output`: output of the inference task.
- [in] `input`: input of the inference task.
- [in] `rois`: ROI box information.
- [in] `roiCount`: number of ROI boxes.
- [in] `dnnHandle`: dnn handle pointer.
- [in] `inferCtrlParam`: parameters controlling the inference task.
Return value:
If 0 is returned, the API is successfully executed; otherwise, the execution fails.
Four parameters deserve attention: output, input, rois, and roiCount. Two examples are given below to aid understanding.
**Example 1**: The downstream network is a single-input, single-output, single-batch model (model_batch=1, i.e. batch_size=1 is fed to the model during actual inference), and the input source is resizer (input_source set to resizer in the conversion yaml). This is the case of the example provided in the OE package. As shown in the figure below, the sensor captures one picture; after the upstream network, two ROI regions (roi0, roi1) are obtained, i.e. the number of data batches the model needs to infer is data_batch=2. After processing by the resizer module, the data is sent to the downstream network, which has a single input, input0 (128x64), and a single output named output0.
In this case, the input of the inference task can be expressed as:
[roi0_input0, roi1_input0]
The output can be expressed as:

[output0]

output0 contains roi0_output0 and roi1_output0, which can be understood as model inference with data_batch=2 and model_batch=1.
**Note**: The input data passed to the hbDNNRoiInfer function is always the full input image of the upstream network.
**Example 2**: The downstream network is a multi-input, multi-output, single-batch model (model_batch=1), and every input source is resizer. As shown in the figure below, the sensor captures one image; after the upstream network, two ROI regions (roi0, roi1) are obtained and processed by the resizer module. The downstream network has three inputs, input0 (128x128), input1 (128x64), and input2 (64x128), and two outputs, named output0 and output1.
input:
[roi0_input0, roi1_input1, roi2_input2, roi3_input0, roi4_input1, roi5_input2]
output:
[output0, output1]
output0 contains data_batch0_output0 and data_batch1_output0; output1 contains data_batch0_output1 and data_batch1_output1. Therefore, when preparing output memory, the allocation must combine the data_batch count with the shape of the corresponding output branch.
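The allocation rule above reduces to one line of arithmetic; the helper below is illustrative only, not part of the libdnn API (the real per-batch size comes from the tensor properties):

```cpp
#include <cstddef>

// Each output tensor holds the outputs of all data batches, so its buffer
// is the single-batch aligned byte size multiplied by data_batch.
size_t output_mem_size(size_t aligned_byte_size, int data_batch) {
  return aligned_byte_size * static_cast<size_t>(data_batch);
}
```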
**Example 3**: Starting from OE 1.1.68, the Horizon toolchain supports resizer models with model_batch > 1, which accelerates multi-data_batch scenarios. As an example, assume the model has 3 input branches (2 resizer input sources, 1 ddr input source) and 1 output branch, and is compiled with model_batch=2. The model needs to process 3 batches of data, for a total of 6 ROIs (i.e. data_batch=3, with 2 ROIs per batch of data).
In this case, inference requires input tensors with independent addresses for the 3 batches of data: 3 input branches x 3 data batches = 9 input_tensors.
Assume that the static information of the model input/output is as follows:
- Model input (model_info):
  - tensor_0_resizer: [2, 3, 128, 128]
  - tensor_1_resizer: [2, 3, 256, 256]
  - tensor_2_ddr: [2, 80, 1, 100]
- Model output (model_info):
  - tensor_out: [2, 100, 1, 56]

Then the dynamic information of the model during inference is:
- Model input (input_tensors):
  - [1x3x128x128, 1x3x256x256, 1x80x1x100, 1x3x128x128, 1x3x256x256, 1x80x1x100, 1x3x128x128, 1x3x256x256, 1x80x1x100]
- Model output (output_tensors):
  - [4x100x1x56]
Because model_batch=2, the underlying BPU processes 2 batches of data per execution. Because data_batch=3, the highest dimension of output_tensor is computed as `ceil(data_batch / model_batch) * model_batch`, i.e. it is always an integer multiple of model_batch. This is a BPU hardware instruction requirement; missing inputs are automatically skipped in the computation. Note: in the model output above, the first three groups (indices 0-2 along the highest dimension) are valid data and the last group is invalid.
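The rounding rule for the highest output dimension can be sketched as follows (illustrative only):

```cpp
// Round data_batch up to the nearest multiple of model_batch; the BPU
// always executes whole model_batch groups, so the extra trailing groups
// in the output hold invalid data.
int padded_batch(int data_batch, int model_batch) {
  return (data_batch + model_batch - 1) / model_batch * model_batch;
}
```

For the example above, `padded_batch(3, 2)` yields 4, matching the output_tensors shape [4x100x1x56].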
3.4 hbDNNRoiInfer call source code analysis

The file `code/01_api_tutorial/roi_infer/src/roi_infer.cc` contains the complete code of the whole process of using `hbDNNRoiInfer` to run the model. Part of the API call code is presented briefly below.
// load model
...
// Step1: get model handle
...
// Step2: set input data to nv12
// In the sample, every batch uses the same image, so one memory block is
// allocated and reused; image_mem holds the image data.
hbSysMem image_mem;
// image input size
int input_h = 0;
int input_w = 0;
{
// Read a single picture; for a multi-input model, set the other input
// data according to the model input properties.
read_image_2_nv12(FLAGS_image_file, &image_mem, &input_h, &input_w);
}
std::vector<hbDNNTensor> input_tensors;
std::vector<hbDNNTensor> output_tensors;
int input_count = 0;
int output_count = 0;
int data_batch = 2;
// Step3: prepare input and output tensor
{
// prepare input tensor
hbDNNGetInputCount(&input_count, dnn_handle);
input_tensors.resize(input_count * data_batch);
prepare_input_tensor(image_mem,
input_h,
input_w,
data_batch,
dnn_handle,
input_tensors.data());
// prepare output tensor
hbDNNGetOutputCount(&output_count, dnn_handle);
output_tensors.resize(output_count);
prepare_output_tensor(data_batch, dnn_handle, output_tensors.data());
// Step4: prepare roi info
/**
* For this model, there is only one resizer input source; to infer 2
* batches of data, the number of ROIs to prepare is also 2.
*/
/**
* In general, for a model with `resizer_count` resizer input sources, the
* number of ROIs to prepare when inferring `data_batch` batches of data is
* `resizer_count` * `data_batch`.
*/
std::vector<hbDNNRoi> rois;
// hbDNNRoi fields are {left, top, right, bottom}; roi_1's coordinates here
// are illustrative placeholders (the OE sample defines its own values).
hbDNNRoi roi_1 = {0, 0, 127, 127};
hbDNNRoi roi_2 = {18, 24, 253, 251};
rois.push_back(roi_1);
rois.push_back(roi_2);
int roi_num = rois.size();
{
hbDNNRoiInfer(&task_handle,
&output,
input_tensors.data(),
rois.data(),
roi_num,
dnn_handle,
&infer_ctrl_param);
hbDNNWaitTaskDone(task_handle, 0);
}
...
...
int prepare_input_tensor(hbSysMem image_mem,
int input_h,
int input_w,
int data_batch,
hbDNNHandle_t dnn_handle,
hbDNNTensor *input_tensor) {
int input_count = 0;
hbDNNGetInputCount(&input_count, dnn_handle);
hbDNNTensor *input = input_tensor;
for (int batch_id = 0; batch_id < data_batch; batch_id++) {
for (int i = 0; i < input_count; i++) {
int tensor_id = batch_id * input_count + i;
hbDNNGetInputTensorProperties(
&input[tensor_id].properties, dnn_handle, i);
/** Tips:
* In the sample, all batches use the same image, so one memory block is
* allocated and reused by every input tensor. If your model has different
* inputs, allocate memory for each of them.
* */
input[tensor_id].sysMem[0] = image_mem;
/** Tips:
* The resizer model must set the input validShape to the input image shape.
* */
input[tensor_id].properties.validShape.dimensionSize[2] = input_h;
input[tensor_id].properties.validShape.dimensionSize[3] = input_w;
/** Tips:
* For an input tensor, the aligned shape should always equal the real
* shape of the user's data. If you set your input data with padding, this
* step is not necessary.
* */
input[tensor_id].properties.alignedShape =
input[tensor_id].properties.validShape;
}
}
return 0;
}
int prepare_output_tensor(int data_batch,
hbDNNHandle_t dnn_handle,
hbDNNTensor *output_tensor) {
int output_count = 0;
hbDNNGetOutputCount(&output_count, dnn_handle);
hbDNNTensor *output = output_tensor;
/**
* For a multi-output resizer model, the number of output tensors equals
* the model's output count, because every output tensor holds the outputs
* of all data batches. With data_batch=2, taking output[0] as an example,
* it contains data_batch0_output0 and data_batch1_output0.
* */
for (int i = 0; i < output_count; i++) {
hbDNNGetOutputTensorProperties(&output[i].properties, dnn_handle, i);
int output_memSize = output[i].properties.alignedByteSize * data_batch;
hbSysAllocCachedMem(&output[i].sysMem[0], output_memSize);
}
return 0;
}
3.5 Project compilation
The reference process for project compilation is as follows:
1. Execute the `build_xj3.sh` script in the `horizon_runtime_sample/code` directory to compile the executable programs and their dependent libraries for the xj3 development board in one step; they are stored in the `bin` and `lib` subdirectories under `xj3/script/aarch64` respectively:
sh build_xj3.sh
2. After compilation completes successfully, the directory structure of `xj3/script/aarch64` is:
aarch64
├── bin
│   └── roi_infer
└── lib
    ├── libdnn.so
    ├── libhbrt_bernoulli_aarch64.so
    └── libopencv_world.so.3.4
3. Transfer the files related to `roi_infer` in the xj3 folder to the xj3 development board. The list of transferred files is as follows:
xj3
├── data
├── model
│   └── model_name.bin
└── script
    ├── aarch64
    └── 01_api_tutorial
        └── roi_infer.sh
scp -r xj3/ root@board_ip:/userdata/
3.6 On-board inference
After compiling correctly, enter the `/userdata/xj3/script/01_api_tutorial/` directory on the development board and execute the `roi_infer.sh` script:
sh roi_infer.sh
results:
I0216 01:20:52.592705 32259 roi_infer.cc:178] read image to nv12 success
I0216 01:20:52.592958 32259 roi_infer.cc:199] prepare input tensor success
I0216 01:20:52.593054 32259 roi_infer.cc:208] prepare output tensor success
I0216 01:20:52.597029 32259 roi_infer.cc:469] batch[0]: TOP 0 result id: 341
I0216 01:20:52.597089 32259 roi_infer.cc:469] batch[0]: TOP 1 result id: 283
I0216 01:20:52.597112 32259 roi_infer.cc:469] batch[0]: TOP 2 result id: 293
I0216 01:20:52.597136 32259 roi_infer.cc:469] batch[0]: TOP 3 result id: 397
I0216 01:20:52.597159 32259 roi_infer.cc:469] batch[0]: TOP 4 result id: 83
I0216 01:20:52.597311 32259 roi_infer.cc:469] batch[1]: TOP 0 result id: 341
I0216 01:20:52.597337 32259 roi_infer.cc:469] batch[1]: TOP 1 result id: 293
I0216 01:20:52.597360 32259 roi_infer.cc:469] batch[1]: TOP 2 result id: 283
I0216 01:20:52.597383 32259 roi_infer.cc:469] batch[1]: TOP 3 result id: 397
I0216 01:20:52.597406 32259 roi_infer.cc:469] batch[1]: TOP 4 result id: 324
At this point, the whole process of resizer model inference and result output is completed.
4 Evaluating the resizer model with on-board tools
4.1 Introduction to the hrt_model_exec tool
hrt_model_exec is a model execution tool that provides three functions: model inference (`infer`), model performance analysis (`perf`), and model information viewing (`model_info`). Details can be found in the Horizon Toolchain Manual, section 5.5.2 "hrt_model_exec tool introduction" (https://developer.horizon.ai/api/v1/fileData/horizon_xj3_open_explorer_cn_doc/runtime/source/tool_introduction/source/hrt_model_exec.html).
Starting from version 1.13.1, the hrt_model_exec tool adds two parameters, roi_infer and roi, to support resizer model inference and performance evaluation. roi_infer is of type bool and enables resizer model inference; roi is of type string and specifies the ROI regions required when inferring resizer models. On the board, the tool version can be viewed with `hrt_model_exec -v` and usage details with `hrt_model_exec -h`. When the model includes a resizer input source, the `infer` and `perf` functions require setting `roi_infer` to true and configuring `input_file` and `roi` parameters that correspond one-to-one with the input sources. For example, if a model has three inputs with input source order `[ddr, resizer, resizer]`, the commands to evaluate **two sets** of input data are as follows:
# infer
hrt_model_exec infer --roi_infer=true --model_file=xxx.bin --input_file="xx0.bin,xx1.jpg,xx2.jpg,xx3.bin,xx4.jpg,xx5.jpg" --roi="2,4,123,125;6,8,111,113;27,46,143,195;16,28,131,183"
# perf
hrt_model_exec perf --roi_infer=true --model_file=xxx.bin --input_file="xx0.bin,xx1.jpg,xx2.jpg,xx3.bin,xx4.jpg,xx5.jpg" --roi="2,4,123,125;6,8,111,113;27,46,143,195;16,28,131,183"
**Note**: input_file entries are separated by commas and must not contain spaces; roi entries are separated by semicolons.
**For resizer model scenarios** where model_batch is greater than 1, pay attention to the configuration of the model_file and input_file parameters. For example, if a model has one input, the input source is resizer, and model_batch is 2, the command to infer data_batch=1 data is as follows:
hrt_model_exec infer --model_file xxx_batch2_resizer.bin --input_file="xxx.jpg" --core_id=0 --roi="6,12,253,253" --roi_infer=true
In this case, the first dimension of the output contains valid data and the second dimension contains invalid data.
The command to infer data_batch=2 data is as follows:
hrt_model_exec infer --model_file mobilenetv1_224x224_nv12_batch2_resizer.bin --input_file="xxx.jpg,xxx.jpg" --core_id=0 --roi="6,12,253,253;27,46,143,195" --roi_infer=true
4.2 On-board measurement
Taking the `mobilenetv1_128x128_resizer_nv12.bin` model from Section 3 as an example, this section shows the process of evaluating a resizer model with the two hrt_model_exec functions `infer` (model inference) and `perf` (model performance analysis). The required files are:
j5
├── zebra_cls.jpg
├── hrt_model_exec
└── mobilenetv1_128x128_resizer_nv12.bin
Infer model command:
./hrt_model_exec infer --model_file=mobilenetv1_128x128_resizer_nv12.bin --input_file="zebra_cls.jpg" --core_id=0 --roi="2,4,123,125" --roi_infer=true
results:
I1116 14:49:44.500945 4860 main.cpp:1199] infer success
I1116 14:49:44.503501 4860 main.cpp:1205] task done
---------------------Frame 0 begin---------------------
Infer time: 4.103 ms
---------------------Frame 0 end---------------------
perf latency command:
./hrt_model_exec perf --model_file=mobilenetv1_128x128_resizer_nv12.bin --input_file="zebra_cls.jpg" --core_id=1 --thread_num=1 --frame_count=1000 --roi="2,4,123,125" --roi_infer=true
Running condition:
Thread number is: 1
Frame count is: 1000
Program run time: 1936.663000 ms
Perf result:
Frame totally latency is: 1839.814087 ms
Average latency is: 1.839814 ms
Frame rate is: 516.352096 FPS

perf FPS command:
./hrt_model_exec perf --model_file=mobilenetv1_128x128_resizer_nv12.bin --input_file="zebra_cls.jpg" --core_id=0 --thread_num=5 --frame_count=1000 --roi="2,4,123,125" --roi_infer=true
Running condition:
Thread number is: 5
Frame count is: 1000
Program run time: 526.053000 ms
Perf result:
Frame totally latency is: 2452.402344 ms
Average latency is: 2.452402 ms
Frame rate is: 1900.949144 FPS