1. Preface
In the X3 developer manual, using the BPU for the resize operation was tested on the board, comparing the time difference between BPU resize and OpenCV resize; the BPU interface can also scale a cropped region at the same time. Previously, compilation was done inside a docker environment, which is a little cumbersome, whereas cpp compilation only depends on the cross-compilation toolchain and the dependency files. By copying the cross-compilation toolchain from /opt/gcc-ubuntu-9.3.0-2020.03-x86_64-aarch64-linux-gnu inside docker to the host, you can compile without relying on docker at all. If compilation fails, you can also obtain the complete dependency files and source code from the Baidu cloud link in this article.
Test code for this article: https://github.com/Rex-LK/ai_arm_learning
Baidu cloud (complete dependency files and source code): https://pan.baidu.com/s/1x34kaRseh8YXMJ-vmFhxEg?pwd=c6cn extraction code: c6cn
2. Simplify the cpp compilation environment
Export the library path of the cross-compilation toolchain locally:
export LD_LIBRARY_PATH=.../ai_arm_learning/x3/data/gcc-ubuntu-9.3.0-2020.03-x86_64-aarch64-linux-gnu/lib/x86_64-linux-gnu  ## replace ... with the actual path
If you do not use the cross-compilation tool (i.e. comment out # SET(tar x3)), the following error will appear when yolo_demo is run on the host:
./yolo_demo: error while loading shared libraries: libhbdk_sim_x86.so: cannot open shared object file: No such file or directory
In that case, simply export the corresponding library path:
export LD_LIBRARY_PATH=.../ai_arm_learning/x3/data/deps/x86/dnn_x86/lib  ## replace ... with the actual path
With the above setup, you can happily cross-compile without relying on docker. Now let's get to the main topic of this article.
3. Resize using the BPU
Resize for three different image formats, namely yuv420, rgb, and bgr, is implemented in x3_inference/sample/resize_demo.cpp; the underlying API called in each case is hbDNNResize.
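The demo below times each call with tools::get_current_time, whose implementation is not shown in this article. A minimal millisecond-timestamp helper (my own sketch based on std::chrono; only the name matches the demo, the body is illustrative and may differ from the repo's version) could look like:

```cpp
#include <chrono>

namespace tools {
// Monotonic millisecond timestamp; subtract two samples to get elapsed time.
// Sketch only -- the actual tools::get_current_time in the repo may differ.
inline long get_current_time() {
    using namespace std::chrono;
    return static_cast<long>(
        duration_cast<milliseconds>(steady_clock::now().time_since_epoch())
            .count());
}
}  // namespace tools
```

steady_clock is used rather than system_clock so that intervals are unaffected by wall-clock adjustments, which is what you want when benchmarking resize calls.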
string image_path = "../../data/images/kite.jpg";
string test_case = argv[1];
int oimg_w = 1920;
int oimg_h = 1080;
auto bgrImg = imread(image_path);
int resized_w = 640;
int resized_h = 640;
resize(bgrImg, bgrImg, Size(oimg_w, oimg_h));
if (test_case == "YUV420") {
    Mat yuvImg;
    cvtColor(bgrImg, yuvImg, COLOR_BGR2YUV_I420);
    BpuResize* resizer = new BpuResize(oimg_w, oimg_h, resized_w, resized_h, imageType::YUV420);
    long t1 = tools::get_current_time();
    float* res = resizer->Resize(yuvImg);
    long t2 = tools::get_current_time();
    cout << "bpu resize:" << t2 - t1 << endl;
    // I420 holds 1.5 bytes per pixel: a full Y plane plus quarter-size U and V planes
    Mat ResizedYuvMat(resized_h * 1.5, resized_w, CV_8UC1);
    memcpy(ResizedYuvMat.data, res, resized_h * resized_w * 1.5);
    Mat ResizedBgrMat;
    cvtColor(ResizedYuvMat, ResizedBgrMat, COLOR_YUV2BGR_I420);
    imwrite("test_resized_yuv.png", ResizedBgrMat);
}
else if (test_case == "BGR") {
    BpuResize* resizer = new BpuResize(oimg_w, oimg_h, resized_w, resized_h, imageType::BGR);
    long t1 = tools::get_current_time();
    float* res = resizer->Resize(bgrImg);
    long t2 = tools::get_current_time();
    cout << "bpu resize:" << t2 - t1 << endl;
    Mat ResizedBgrMat(resized_h, resized_w, CV_8UC3);
    memcpy(ResizedBgrMat.data, res, resized_w * resized_h * 3);
    imwrite("test_resized_bgr.png", ResizedBgrMat);
}
else if (test_case == "RGB") {
    Mat rgbImg;
    cvtColor(bgrImg, rgbImg, COLOR_BGR2RGB);
    BpuResize* resizer = new BpuResize(oimg_w, oimg_h, resized_w, resized_h, imageType::RGB);
    long t1 = tools::get_current_time();
    // the extra argument crops {x1, y1, x2, y2} from the source before resizing
    float* res = resizer->Resize(rgbImg, {0, 0, 200, 2000});
    long t2 = tools::get_current_time();
    cout << "bpu resize:" << t2 - t1 << endl;
    Mat ResizedRgbMat(resized_h, resized_w, CV_8UC3);
    memcpy(ResizedRgbMat.data, res, resized_w * resized_h * 3);
    Mat ResizedBgrMat;
    cvtColor(ResizedRgbMat, ResizedBgrMat, COLOR_RGB2BGR);
    imwrite("test_resized_rgb.png", ResizedBgrMat);
}
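The resized_h * 1.5 factor in the YUV420 branch comes from the I420 layout: a full-resolution Y plane followed by quarter-resolution U and V planes. A small self-contained sketch of the arithmetic (illustrative only, not code from the repo):

```cpp
#include <cstddef>

// Byte sizes of an I420 (planar YUV420) frame: Y is w*h, and U and V are
// (w/2)*(h/2) each. Assumes even width and height, as 4:2:0 subsampling requires.
struct I420Size {
    size_t y;      // luma plane
    size_t u;      // chroma U plane
    size_t v;      // chroma V plane
    size_t total;  // y + u + v == w * h * 3 / 2
};

inline I420Size i420_size(size_t w, size_t h) {
    I420Size s;
    s.y = w * h;
    s.u = (w / 2) * (h / 2);
    s.v = s.u;
    s.total = s.y + s.u + s.v;
    return s;
}
```

For the 640x640 output above this gives 640 * 640 * 1.5 = 614400 bytes, which is exactly the buffer that Mat(resized_h * 1.5, resized_w, CV_8UC1) allocates.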
The bpu_resize.hpp header implements BPU resize and can scale a cropped region, which is convenient for subsequent model inference. Of course, this step could also be performed with the official roiInfer interface, but roiInfer has some restrictions; for example, the height and width of the cropped region must be less than 256.
class BpuResize {
  public:
    BpuResize(const BpuResize& other) = delete;
    BpuResize& operator=(const BpuResize& other) = delete;
    explicit BpuResize(const int input_w, const int input_h, const int output_w, const int output_h, const imageType imgType);
    ~BpuResize();
    void copy_image_2_input_tensor(uint8_t* image_data, hbDNNTensor* tensor);
    float* Resize(Mat ori_img, const std::initializer_list<int> crop);
    int32_t prepare_input_tensor();
    int32_t prepare_output_tensor();
    // ...
};
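The crop argument of Resize is passed as {x1, y1, x2, y2}. As a defensive step before handing a region to the BPU, the rectangle can be clamped to the source image and checked against roiInfer-style size limits. The helpers below are my own hypothetical sketch, not part of bpu_resize.hpp:

```cpp
#include <algorithm>

// Hypothetical crop rectangle in {x1, y1, x2, y2} form, matching the
// initializer list accepted by BpuResize::Resize in the demo above.
struct CropRect {
    int x1, y1, x2, y2;
};

// Clamp the rectangle to [0, img_w] x [0, img_h], keeping x2 >= x1 and y2 >= y1.
inline CropRect clamp_crop(CropRect c, int img_w, int img_h) {
    c.x1 = std::max(0, std::min(c.x1, img_w));
    c.y1 = std::max(0, std::min(c.y1, img_h));
    c.x2 = std::max(c.x1, std::min(c.x2, img_w));
    c.y2 = std::max(c.y1, std::min(c.y2, img_h));
    return c;
}

// True if the region satisfies the roiInfer-style limit mentioned above
// (width and height both below 256).
inline bool fits_roi_infer(const CropRect& c) {
    return (c.x2 - c.x1) < 256 && (c.y2 - c.y1) < 256;
}
```

For example, the demo's {0, 0, 200, 2000} crop on a 1920x1080 source clamps to {0, 0, 200, 1080}; its 1080-pixel height would exceed roiInfer's limit, but is handled fine by the hbDNNResize path used here.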
4. Summary
This test simplifies the cpp compilation process, so some steps can be omitted in subsequent on-board tests. It also exercises the BPU resize interface, including the crop-and-scale function. With an original image of 1920*1080 resized to 640*640, OpenCV resize takes 40+ ms, while the BPU interface takes only 25+ ms. At present, however, OpenCV resize is faster when enlarging small images; I am not sure whether this is normal.