1. Preface
Text detection and recognition is one of the classic tasks in computer vision; it consists mainly of two steps, text box detection and character recognition. This article implements three parts on x3 for readers to choose from: text detection, character recognition, and text detection + character recognition combined, and provides both onnxruntime and pytorch deployment code. It is also the first time a grayscale image is used as model input on x3, so the quantization sections include the corresponding configuration file and calibration data preparation. Hopefully x3 can show its value in more areas.
The text detection model used is dbnet: https://github.com/SURFZJY/Real-time-Text-Detection-DBNet
The character recognition model used is crnn: https://github.com/meijieru/crnn.pytorch
The test code for this article is at https://github.com/Rex-LK/ai_arm_learning
2. dbnet text detection
2.1 Introduction to the model
dbnet is a widely used text detection model. It is essentially a segmentation model, which segments out the text regions of an image. Its model structure is shown below:
dbnet.png
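The "DB" in dbnet stands for differentiable binarization: besides the probability map P, the network predicts a threshold map T, and binarization is approximated by a soft, trainable function (k = 50 in the paper),

\hat{B}_{i,j} = \frac{1}{1 + e^{-k (P_{i,j} - T_{i,j})}}

which is why the inference code in 2.3 reads two maps, prob_map and thres_map, out of a single output tensor.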
2.2 Model quantization
- Export onnx
The code to export the onnx model is provided in x3/ocr/dbnet/predict.py.
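As a minimal sketch of what such an export script does (the model loading and file names here are illustrative; the input/output names match the inference code in 2.3):

import torch

# assuming a whole-model checkpoint; the actual loading lives in predict.py
model = torch.load('dbnet.pth', map_location='cpu')
model.eval()
dummy = torch.randn(1, 3, 640, 640)  # matches the d_size used at inference
torch.onnx.export(model, dummy, 'dbnet.onnx',
                  input_names=['image'], output_names=['output'],
                  opset_version=11)

The '_simp' suffix in the config below suggests the model is then simplified, e.g. with onnx-simplifier: python -m onnxsim dbnet.onnx dbnet_simp.onnx.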
- Export bin
dbnet's quantization follows the usual quantization workflow; the configuration file is as follows:
model_parameters:
  onnx_model: 'dbnet_simp.onnx'
  output_model_file_prefix: 'dbnet_simp'
  march: 'bernoulli2'
input_parameters:
  input_type_train: 'rgb'
  input_layout_train: 'NCHW'
  # 'rgb' / 'nv12' / 'yuv444' / 'bgr'
  input_type_rt: 'nv12'
  norm_type: 'data_scale'
  scale_value: 0.003921568627451
  input_layout_rt: 'NHWC'
calibration_parameters:
  cal_data_dir: './calibration_data_yuv_f32'
  calibration_type: 'max'
  max_percentile: 0.9999
compiler_parameters:
  compile_mode: 'latency'
  optimize_level: 'O3'
  debug: False
  core_num: 2
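On the tool side, a yaml like this is normally fed to Horizon's hb_mapper tool, e.g. hb_mapper makertbin --model-type onnx --config dbnet_config.yaml (the config file name and exact invocation may differ across toolchain versions), which produces the .bin model deployed on the board.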
2.3 dbnet test code
The onnxruntime test code is in x3/ocr/dbnet/infer_onnxruntime.py; the inference code is shown below.
def predict(self, img, d_size=(640, 640), min_area: int = 100):
    img0_h, img0_w = img.shape[:2]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # image preprocessing: resize, scale to [0, 1], HWC -> NCHW
    img = cv2.resize(img, d_size)
    img = img / 255
    img = img.astype(np.float32)
    img = img.transpose(2, 0, 1)
    img = np.ascontiguousarray(img)
    img = img[None, ...]
    preds = self.model.run(["output"], {"image": img})[0]
    preds = torch.from_numpy(preds[0])
    # factor for mapping contours back to the original image size
    scale = (preds.shape[2] / img0_w, preds.shape[1] / img0_h)
    start = time.time()
    prob_map, thres_map = preds[0], preds[1]
    # binarize the probability map with a fixed threshold
    out = (prob_map > self.thr).float() * 255
    out = out.data.cpu().numpy().astype(np.uint8)
    contours, hierarchy = cv2.findContours(out, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # np.int was removed from recent numpy; use np.int32, which cv2 expects
    contours = [(i / scale).astype(np.int32) for i in contours if len(i) >= 4]
    dilated_polys = []
    for poly in contours:
        poly = poly[:, 0, :]
        # dilation offset, formula (10) in the paper: D' = A * r' / L
        D_prime = cv2.contourArea(poly) * self.ratio_prime / cv2.arcLength(poly, True)
        pco = pyclipper.PyclipperOffset()
        pco.AddPath(poly, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
        # offsetting may return zero or several paths; keep clean single-path results
        res = pco.Execute(D_prime)
        if len(res) != 1:
            continue
        dilated_polys.append(np.array(res, dtype=np.int32))
    boxes_list = []
    for cnt in dilated_polys:
        if cv2.contourArea(cnt) < min_area:
            continue
        rect = cv2.minAreaRect(cnt)
        box = cv2.boxPoints(rect).astype(np.int32)
        boxes_list.append(box)
    t = time.time() - start
    boxes_list = np.array(boxes_list)
    return dilated_polys, boxes_list, t
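A short usage sketch of the method above (the wrapper class name and file paths are hypothetical):

import cv2

det = DbnetOnnx('dbnet_simp.onnx')  # hypothetical wrapper exposing predict()
img = cv2.imread('test.jpg')
dilated_polys, boxes_list, t = det.predict(img, d_size=(640, 640))
print('found %d text boxes, post-processing took %.3f s' % (len(boxes_list), t))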
2.4 Results
contour.png
The green closed curves are the text-box regions detected by dbnet.
3. crnn character recognition
3.1 Introduction to the model
crnn is a convolutional recurrent neural network: a CNN extracts image features, an RNN (LSTM) models the character sequence, and CTC decoding produces the final text. The model structure is shown below:
crnn.jpeg
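In its simplest (greedy) form, CTC decoding takes the argmax class at each time step, collapses consecutive repeats, and drops the blank symbol. A minimal self-contained sketch, assuming the crnn.pytorch convention that index 0 is the blank and the alphabet is offset by 1:

import numpy as np

alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'

def ctc_greedy_decode(logits):
    # logits: (T, num_classes) array, class 0 reserved for the CTC blank
    ids = logits.argmax(axis=1)
    chars, prev = [], -1
    for i in ids:
        if i != prev and i != 0:  # collapse repeats, skip blanks
            chars.append(alphabet[i - 1])
        prev = i
    return ''.join(chars)

This is essentially what converter.decode(..., raw=False) does in the test code in 3.3.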
3.2 Model quantization
- Export onnx
x3/crnn/demo.py provides the code to export the onnx model. One thing differs from the other models: crnn takes a grayscale image as input, with dimensions [1, 1, 32, 100].
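As with dbnet, a minimal export sketch (model loading and file names illustrative), the only real difference being the single-channel dummy input:

import torch

model = torch.load('crnn.pth', map_location='cpu')  # assuming a whole-model checkpoint
model.eval()
dummy = torch.randn(1, 1, 32, 100)  # grayscale, NCHW
torch.onnx.export(model, dummy, 'crnn.onnx',
                  input_names=['image'], output_names=['output'],
                  opset_version=11)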
- Prepare grayscale calibration data
import os
import cv2
import numpy as np

src_root = '../../../01_common/calibration_data/coco/'
cal_img_num = 100
dst_root = 'calibration_data'

# pick the first cal_img_num images
num_count = 0
img_names = []
for src_name in sorted(os.listdir(src_root)):
    if num_count > cal_img_num:
        break
    img_names.append(src_name)
    num_count += 1

if not os.path.exists(dst_root):
    os.system('mkdir {0}'.format(dst_root))

def imequalresize(img, target_size, pad_value=127.):
    # resize with the aspect ratio kept, then pad to target_size
    target_w, target_h = target_size
    image_h, image_w = img.shape[:2]
    img_channel = 3 if len(img.shape) > 2 else 1
    scale = min(target_w * 1.0 / image_w, target_h * 1.0 / image_h)
    new_h, new_w = int(scale * image_h), int(scale * image_w)
    resize_image = cv2.resize(img, (new_w, new_h))
    pad_image = np.full(shape=[target_h, target_w, img_channel], fill_value=pad_value)
    dw, dh = (target_w - new_w) // 2, (target_h - new_h) // 2
    pad_image[dh:new_h + dh, dw:new_w + dw, :] = resize_image
    return pad_image

for each_imgname in img_names:
    img_path = os.path.join(src_root, each_imgname)
    img = cv2.imread(img_path)  # BGR, HWC
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # RGB, HWC
    img = imequalresize(img, (100, 32))
    img = img.astype(np.uint8)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    dst_path = os.path.join(dst_root, each_imgname + '.rgbchw')
    print("write:%s" % dst_path)
    img.astype(np.uint8).tofile(dst_path)
print('finish')
- config.yaml
model_parameters:
  onnx_model: 'crnn_simp.onnx'
  output_model_file_prefix: 'crnn_simp'
  march: 'bernoulli2'
input_parameters:
  input_type_train: 'gray'
  input_layout_train: 'NCHW'
  # 'rgb' / 'nv12' / 'yuv444' / 'bgr'
  input_type_rt: 'gray'
  norm_type: 'data_scale'
  scale_value: 0.0078125
  input_layout_rt: 'NHWC'
calibration_parameters:
  cal_data_dir: './calibration_data'
  calibration_type: 'max'
  max_percentile: 0.9999
compiler_parameters:
  compile_mode: 'latency'
  optimize_level: 'O3'
  debug: False
  core_num: 2
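Two things are worth noting compared with the dbnet config: input_type_train and input_type_rt are both 'gray', so the model directly consumes the raw 32x100 grayscale files produced by the script above, and scale_value is 0.0078125, i.e. 1/128, instead of 1/255.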
3.3 crnn test code
The crnn test file is x3/ocr/crnn/infer_onnxruntime.py; the inference code is shown below.
def predict(self, img):
    # grayscale preprocessing: resize to (100, 32) and normalize
    image_input = self.preprocess_gray(img, (100, 32))
    print(image_input.shape)
    preds = self.model.run(["output"], {"image": image_input})[0]
    preds = torch.from_numpy(preds)
    # greedy decode: take the most likely class at every time step
    _, preds = preds.max(2)
    preds = preds.transpose(1, 0).contiguous().view(-1)
    preds_size = Variable(torch.IntTensor([preds.size(0)]))
    # raw=True keeps blanks/repeats, raw=False gives the collapsed string
    raw_pred = self.converter.decode(preds.data, preds_size.data, raw=True)
    sim_pred = self.converter.decode(preds.data, preds_size.data, raw=False)
    return raw_pred, sim_pred
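preprocess_gray is not shown here; a plausible sketch, assuming crnn.pytorch-style normalization ((x - 0.5) / 0.5 after scaling to [0, 1]) and the [1, 1, 32, 100] input from section 3.2:

import cv2
import numpy as np

def preprocess_gray(img, d_size=(100, 32)):
    # accept BGR or already-grayscale input
    if img.ndim == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, d_size)         # d_size is (w, h) = (100, 32)
    img = img.astype(np.float32) / 255.0
    img = (img - 0.5) / 0.5               # crnn.pytorch normalization
    return img[None, None, ...]           # NCHW: (1, 1, 32, 100)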
4. Text detection + character recognition
4.1 Recognition Process
In real projects, text detection and character recognition are usually chained together: the text detection model first locates the text regions, and crnn then recognizes the content inside each region. The following walks through a simple example of this pipeline.
4.2 Test Code
Take the x3 test code as an example: x3/ocr/demo_x3.py
dbnet = dbnet_model(args.dbnet_path)
# the crnn in this repo recognizes digits and lowercase letters
alphabet = '0123456789abcdefghijklmnopqrstuvwxyz'
converter = strLabelConverter(alphabet)
crnn = crnn_model(args.crnn_path, converter)
img0 = cv2.imread(args.image_path)
img_rec = img0.copy()
img0_h, img0_w = img0.shape[:2]
# boxes_list holds all detected text-box regions
contours, boxes_list, t = dbnet.predict(img0)
for i, box in enumerate(boxes_list):
    mask_t = np.zeros((img0_h, img0_w), dtype=np.uint8)
    # extract a single text region with a polygon mask
    cv2.fillPoly(mask_t, [box], (255), 8, 0)
    pick_img = cv2.bitwise_and(img0, img0, mask=mask_t)
    x, y, w, h = cv2.boundingRect(box)
    crnn_infer_img = pick_img[y:y+h, x:x+w, :]
    crnn_infer_img = cv2.cvtColor(crnn_infer_img, cv2.COLOR_BGR2GRAY)
    # crnn recognition
    raw_pred, sim_pred = crnn.predict(crnn_infer_img)
    print('%-20s => %-20s' % (raw_pred, sim_pred))
    if args.output_folder:
        cv2.putText(img_rec, sim_pred, (x, y+20), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 1)
if args.output_folder:
    img_det = img0[:, :, ::-1]
    imgc = img_det.copy()
    cv2.drawContours(imgc, contours, -1, (22, 222, 22), 1, cv2.LINE_AA)
    cv2.imwrite(args.output_folder + '/contour.png', imgc)
    img_draw = draw_bbox(img_rec, boxes_list)
    cv2.imwrite(args.output_folder + '/predict.jpg', img_draw)
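Assuming the argparse flags mirror the attribute names used above, a typical run looks like python demo_x3.py --dbnet_path dbnet.bin --crnn_path crnn.bin --image_path test.jpg --output_folder output, where the model and image file names are placeholders.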
4.3 Results
predict.png
From the results, dbnet locates the text regions accurately (the size of a detected region can also be tuned through dbnet's threshold), and crnn recognizes the English characters inside each region well.
5. Summary
This article implements text detection and character recognition models from the OCR pipeline on x3, revisiting some previously studied algorithms along the way. Hopefully x3 can extend into more fields and create greater value.