Common Image Formats

D-Robotics · May 8, 2023, 10:54am

1 Introduction

With the development of artificial intelligence, deep neural networks have flourished in computer vision. To meet the needs of different scenarios, we come into contact with a variety of image data formats. This article gives a detailed introduction to the image data formats most common in deep learning: RGB, BGR, YUV (YUV444, NV12), and Gray.

2 RGB

RGB is a common color image format in which each pixel stores an intensity value (0 to 255, UINT8) for each of three color channels: Red, Green, and Blue. Writing a pixel as (R, G, B), the values (255, 0, 0), (0, 255, 0), and (0, 0, 255) represent the purest red, green, and blue, respectively, as shown in the figure below. In particular, if all three channels are 0, the result is black; if all three are at their maximum of 255, the result is white.
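To make the (R, G, B) convention concrete, here is a minimal sketch in plain Python that stores a tiny 2 × 2 image as a flat byte buffer. It assumes an interleaved, row-major layout (R0 G0 B0 R1 G1 B1 ...), which is common but is an assumption here; the helper `get_pixel` is hypothetical and exists only for illustration.

```python
# Pure colors written as (R, G, B) tuples, one UINT8 value (0-255) per channel.
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
WHITE = (255, 255, 255)  # all channels at maximum; (0, 0, 0) would be black

WIDTH, HEIGHT = 2, 2
pixels = [RED, GREEN,    # row 0
          BLUE, WHITE]   # row 1

# Assumed interleaved (packed) layout: every pixel contributes 3 consecutive bytes.
rgb_buffer = bytes(channel for pixel in pixels for channel in pixel)


def get_pixel(buf, x, y, width):
    """Hypothetical helper: return the (R, G, B) tuple at column x, row y."""
    offset = (y * width + x) * 3
    return tuple(buf[offset:offset + 3])


print(len(rgb_buffer))                     # 12 = 2 * 2 pixels * 3 bytes
print(get_pixel(rgb_buffer, 1, 1, WIDTH))  # (255, 255, 255), i.e. white
```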

RGB can represent 256 × 256 × 256 = 16,777,216 (about 16.77 million) colors, more than the roughly 10 million colors the human eye can distinguish. RGB is therefore widely used in display technologies and is closely tied to everyday life. However, every RGB pixel must store all three channel values, R, G, and B, at the same time, so each pixel needs 3 bytes of storage. For video, this is unfriendly to storage and transmission, because it takes up a lot of space and bandwidth.
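The numbers above are easy to check. The sketch below computes the size of the RGB color space and the raw size of one uncompressed frame; the 1920 × 1080 resolution and 30 fps rate are illustrative assumptions, not figures from the article.

```python
BITS_PER_CHANNEL = 8
CHANNELS = 3  # R, G, B

num_colors = (2 ** BITS_PER_CHANNEL) ** CHANNELS
print(num_colors)  # 16777216, about 16.77 million


def rgb_frame_bytes(width, height):
    """Raw size of one packed RGB frame: 3 bytes per pixel."""
    return width * height * CHANNELS


frame = rgb_frame_bytes(1920, 1080)
print(frame)       # 6220800 bytes, about 6.2 MB per frame
print(frame * 30)  # 186624000 bytes per second at an assumed 30 fps
```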

3 BGR

The BGR image format is similar to RGB, except that the channels are arranged in a different order: in BGR the channel order of each pixel is blue, green, red, while in RGB it is red, green, blue. BGR is commonly used in computer vision libraries such as OpenCV and is the default format for some software and hardware, so using it gives better compatibility with them. Like RGB, BGR carries a large amount of data and is not well suited to storing and transmitting video. We therefore need other image formats to replace RGB/BGR for video, and this is where YUV comes in.
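Because BGR and RGB differ only in channel order, converting between them amounts to swapping the first and third byte of every pixel. The sketch below does this on a packed byte buffer in plain Python; `swap_red_blue` is a hypothetical helper name. With OpenCV and NumPy the same operation is typically written as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` or `img[..., ::-1]`.

```python
def swap_red_blue(buf):
    """Convert packed RGB to BGR, or BGR to RGB (the operation is its own inverse)."""
    if len(buf) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    out = bytearray(len(buf))
    out[0::3] = buf[2::3]  # first channel of each pixel <- old third channel
    out[1::3] = buf[1::3]  # green stays in the middle
    out[2::3] = buf[0::3]  # third channel of each pixel <- old first channel
    return bytes(out)


rgb = bytes([255, 0, 0,     # red, in RGB order
             0, 128, 255])  # (R, G, B) = (0, 128, 255)
bgr = swap_red_blue(rgb)
print(list(bgr))  # [0, 0, 255, 255, 128, 0]
```

Applying the function twice returns the original buffer, which is why a single helper covers both directions.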

4 YUV

YUV is a color image format in which Y stands for luminance (luma), which specifies the brightness of a pixel (you can think of it as the black-and-white image), and U and V stand for chrominance (chroma), which specify the color of the pixel. Each value is represented by UINT8, as shown below. YUV separates luminance from chrominance: only U and V carry color information, whereas in RGB every channel contributes to both color and brightness.

Notice that even without the U and V components, we can still recognize the basic content of an image from the Y component alone; it simply appears as a black-and-white image. The U and V components add color to this content, turning the black-and-white image into a color one. This means we can sample U and V as sparsely as possible while keeping the full Y information, minimizing the amount of data, which greatly benefits the storage and transmission of video. This is also why YUV is better suited to video processing than RGB.
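The claim that Y alone carries a recognizable black-and-white image can be checked numerically. The sketch below assumes full-range BT.601 coefficients (the JPEG-style YCbCr variant, with U and V centered at 128); other standards and libraries, possibly including OpenCV's `COLOR_BGR2YUV`, scale the chroma differently, so treat the exact constants as an assumption. Setting U and V to their neutral value 128 (no color) and converting back yields a gray pixel whose R, G, and B all equal Y.

```python
def clamp(value):
    """Round to the nearest integer and clip to the UINT8 range 0-255."""
    return max(0, min(255, round(value)))


def rgb_to_yuv(r, g, b):
    """Full-range BT.601 (JPEG-style) RGB -> YUV; U and V are centered at 128."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128 + 0.564 * (b - y)
    v = 128 + 0.713 * (r - y)
    return clamp(y), clamp(u), clamp(v)


def yuv_to_rgb(y, u, v):
    """Inverse of rgb_to_yuv (up to rounding)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp(r), clamp(g), clamp(b)


y, u, v = rgb_to_yuv(255, 128, 0)  # an orange pixel
print(yuv_to_rgb(y, u, v))         # close to the original orange (rounding error only)
print(yuv_to_rgb(y, 128, 128))     # (151, 151, 151): color removed, brightness kept
```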

4.1 Common YUV formats

Research shows that the human eye is more sensitive to brightness than to color. YUV subsampling exploits this: the color information, to which the eye is relatively insensitive, is sampled at a lower rate, yielding smaller files for playback and transmission. According to the ratio of Y to UV samples, the commonly used YUV formats are YUV444, YUV422, and YUV420. Three figures illustrate the ratio of Y to UV in each sampling scheme.

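YUV444: each Y component has its own pair of U and V components, so each pixel occupies 3 bytes (Y + U + V = 8 + 8 + 8 = 24 bits).

YUV422: every two Y components share one pair of U and V components, so each pixel occupies 2 bytes on average (Y + 0.5U + 0.5V = 8 + 4 + 4 = 16 bits).

YUV420: every four Y components share one pair of U and V components, so each pixel occupies 1.5 bytes on average (Y + 0.25U + 0.25V = 8 + 2 + 2 = 12 bits).

The digits follow the J:a:b subsampling notation: J = 4 is the width of a reference region two pixels tall, a is the number of chroma samples in its first row, and b is the number of changes of chroma samples between the first and second rows. In 4:2:0, the first row has 2 chroma samples for its 4 pixels and the second row reuses them, so at most four Y components (a 2 × 2 block) share one pair of U and V.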
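The per-pixel costs above follow directly from how much each chroma plane is subsampled horizontally and vertically. The sketch below encodes each scheme as a pair of subsampling factors (an illustrative representation, not a standard API) and computes the frame size for 8-bit samples, assuming even width and height.

```python
# (horizontal, vertical) chroma subsampling factors for each scheme
SUBSAMPLING = {
    "YUV444": (1, 1),  # full-resolution U and V
    "YUV422": (2, 1),  # U and V at half width
    "YUV420": (2, 2),  # U and V at half width and half height
}


def yuv_frame_bytes(scheme, width, height):
    """Bytes for one 8-bit frame: a full Y plane plus two subsampled chroma planes."""
    sx, sy = SUBSAMPLING[scheme]
    luma = width * height
    chroma = (width // sx) * (height // sy)  # samples in each of U and V
    return luma + 2 * chroma


for name in SUBSAMPLING:
    size = yuv_frame_bytes(name, 1920, 1080)
    bits_per_pixel = size * 8 / (1920 * 1080)
    print(name, size, bits_per_pixel)
# YUV444 6220800 24.0
# YUV422 4147200 16.0
# YUV420 3110400 12.0
```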

4.2 Detailed description of YUV420

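In YUV420, each pixel has its own Y, each 2 × 2 block of pixels shares one U and one V, and each pixel takes up 1.5 bytes on average. Depending on how the U and V components are arranged, YUV420 is further divided into two formats: YUV420P and YUV420SP. YUV420P (planar) stores the entire Y plane first, then the entire U plane, then the entire V plane; for a width × height image, the U and V planes are width × height / 4 bytes each. The arrangement is as follows: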

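YUV420SP (semi-planar) also stores the entire Y plane first, followed by a single chroma plane in which U and V samples alternate (U, V, U, V, ...). The arrangement is as follows: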

With these layouts in mind, it is clear why a YUV420 image occupies width × height × 3/2 bytes in memory: width × height bytes of Y plus 2 × (width × height / 4) bytes of U and V.
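The two YUV420 layouts can be made concrete by slicing a flat buffer into its planes. The sketch below is plain Python and assumes even width and height, 8-bit samples, and U before V in both layouts (the I420 ordering for YUV420P and the NV12 ordering for YUV420SP; variants such as YV12 or NV21 put V first). The function names are illustrative.

```python
def split_yuv420p(buf, width, height):
    """YUV420P: [Y plane][U plane][V plane]."""
    y_size = width * height
    c_size = y_size // 4  # one U and one V per 2x2 block
    if len(buf) != y_size * 3 // 2:
        raise ValueError("expected width * height * 3/2 bytes")
    y = buf[:y_size]
    u = buf[y_size:y_size + c_size]
    v = buf[y_size + c_size:]
    return y, u, v


def split_yuv420sp(buf, width, height):
    """YUV420SP: [Y plane][U V U V ... interleaved chroma plane]."""
    y_size = width * height
    if len(buf) != y_size * 3 // 2:
        raise ValueError("expected width * height * 3/2 bytes")
    y = buf[:y_size]
    uv = buf[y_size:]
    return y, uv[0::2], uv[1::2]  # even positions are U, odd positions are V


# A 4x2 image needs 4 * 2 * 3/2 = 12 bytes; each byte is labeled with its index.
buf = bytes(range(12))
print(split_yuv420p(buf, 4, 2))   # U = bytes 8, 9; V = bytes 10, 11
print(split_yuv420sp(buf, 4, 2))  # U = bytes 8, 10; V = bytes 9, 11
```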

4.3 NV12

The NV12 image format is a YUV420SP format: every four Y components (a 2 × 2 block) share one U component and one V component, the Y components are stored contiguously, and the U and V components are interleaved (U first, then V). NV12 preserves the full brightness information while using half the data of RGB/BGR (1.5 bytes per pixel instead of 3), which reduces the time a model spends loading its input data. For this reason, embedded platforms usually choose NV12 as the image input format for deployed models. In some embedded model-inference scenarios, the DDR memory that the computing hardware allocates for NV12 data is organized in one of two ways, named the HB_DNN_IMG_TYPE_NV12 and HB_DNN_IMG_TYPE_NV12_SEPARATE data formats. In HB_DNN_IMG_TYPE_NV12, the Y and UV components are stored in one contiguous memory region; in HB_DNN_IMG_TYPE_NV12_SEPARATE, the Y and UV components are stored in two separate memory regions.
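Since NV12 differs from planar YUV420 (I420) only in how the chroma is arranged, converting between them means interleaving the U and V planes. The sketch below does this in plain Python and returns the Y and UV planes separately, to mirror conceptually the difference between one contiguous buffer (as in HB_DNN_IMG_TYPE_NV12) and two separate buffers (as in HB_DNN_IMG_TYPE_NV12_SEPARATE). It does not use the Horizon runtime API; the function name and return shape are assumptions for illustration. If OpenCV is available, an I420 input can be obtained with `cv2.cvtColor(bgr_img, cv2.COLOR_BGR2YUV_I420).tobytes()`.

```python
def i420_to_nv12_planes(buf, width, height):
    """Convert an I420 (YUV420P, U before V) buffer to NV12 planes.

    Returns (y_plane, uv_plane). Assumes even width and height.
    """
    y_size = width * height
    c_size = y_size // 4
    y_plane = bytes(buf[:y_size])
    u = buf[y_size:y_size + c_size]
    v = buf[y_size + c_size:y_size + 2 * c_size]

    uv_plane = bytearray(2 * c_size)
    uv_plane[0::2] = u  # U at even offsets
    uv_plane[1::2] = v  # V at odd offsets
    return y_plane, bytes(uv_plane)


i420 = bytes(range(12))  # 4x2 image, bytes labeled by index
y_plane, uv_plane = i420_to_nv12_planes(i420, 4, 2)

separate = (y_plane, uv_plane)   # two buffers: Y plane and UV plane
contiguous = y_plane + uv_plane  # one buffer: Y immediately followed by UV
print(list(uv_plane))            # [8, 10, 9, 11]
```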

5 Gray

The Gray image format, also known as grayscale, is a single-channel image format. In a Gray image, each pixel holds only one brightness value, represented as a UINT8 integer between 0 and 255; a larger value means a brighter pixel and a smaller value a darker one. Gray is also the usual result when color images (RGB, YUV, etc.) are converted to a single channel. Because it contains only brightness information, the image data is comparatively small, and it remains valuable in scenarios that do not depend on color information.
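Two common ways to obtain a Gray image are sketched below in plain Python. From packed RGB, the sketch uses the weighted sum Y = 0.299R + 0.587G + 0.114B, the BT.601 luma weights OpenCV documents for its RGB-to-gray conversion. From NV12 or YUV420P, the Y plane is already a single-channel brightness image and can be taken as is; if the Y plane uses video range (16 to 235) rather than full range, this shortcut is only an approximation. The function names are illustrative.

```python
def rgb_to_gray(rgb_buf):
    """Packed RGB (3 bytes/pixel) -> Gray (1 byte/pixel) via BT.601 luma weights."""
    return bytes(
        round(0.299 * r + 0.587 * g + 0.114 * b)
        for r, g, b in zip(rgb_buf[0::3], rgb_buf[1::3], rgb_buf[2::3])
    )


def gray_from_yuv420(buf, width, height):
    """For NV12 or YUV420P, the first width * height bytes are the Y plane."""
    return bytes(buf[:width * height])


rgb = bytes([255, 255, 255,  # white
             0, 0, 0,        # black
             255, 0, 0])     # red
print(list(rgb_to_gray(rgb)))  # [255, 0, 76]
```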

6 Conversion between image formats

Having seen where each image format is used, you may wonder: image acquisition and display mainly use RGB, while storage, processing, and transmission favor YUV, so a complete application may need several image formats. How do we handle that? Image format conversion solves this problem. Conceptually, each conversion follows a defined standard that specifies the mathematical operations mapping one format to another. The following example uses functions from the computer vision library OpenCV to show how to convert between image formats:

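```python
import cv2

# cv2.imread decodes the file into a BGR uint8 array of shape (H, W, 3),
# or returns None if the file cannot be read.
bgr_img = cv2.imread('example.jpg')
if bgr_img is None:
    raise FileNotFoundError("could not read 'example.jpg'")
cv2.imwrite('bgr_image.jpg', bgr_img)

# BGR -> RGB: reorders the channels. cv2.imwrite always interprets 3-channel
# data as BGR, so the saved file will show red and blue swapped.
rgb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB)
cv2.imwrite('rgb_image.jpg', rgb_img)

# BGR -> YUV: full-resolution YUV444 (3 channels). imwrite also treats this
# as BGR, so the file visualizes the Y/U/V values rather than the real colors.
yuv_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2YUV)
cv2.imwrite('yuv_image.jpg', yuv_img)

# BGR -> Gray: a single channel of shape (H, W)
gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)
cv2.imwrite('gray_image.jpg', gray_img)
```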
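OpenCV's `COLOR_BGR2YUV` produces full-resolution YUV444, while embedded inference typically expects NV12 (section 4.3); the remaining steps are chroma subsampling and interleaving. The sketch below packs full-resolution Y, U, and V planes into NV12 by averaging each 2 × 2 block of chroma. Averaging is one simple choice assumed here; production converters may use different filters or chroma siting. With the OpenCV example above, the planes could be obtained as `y, u, v = (c.tobytes() for c in cv2.split(yuv_img))`.

```python
def yuv444_to_nv12(y, u, v, width, height):
    """Pack row-major 8-bit Y, U, V planes (width * height bytes each) into NV12.

    U and V are subsampled by taking the rounded mean of each 2x2 block.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 requires even width and height")
    uv = bytearray()
    for row in range(0, height, 2):
        for col in range(0, width, 2):
            top = row * width + col
            block = (top, top + 1, top + width, top + width + 1)  # 2x2 neighborhood
            uv.append((sum(u[i] for i in block) + 2) // 4)  # rounded average of U
            uv.append((sum(v[i] for i in block) + 2) // 4)  # rounded average of V
    return bytes(y) + bytes(uv)  # full Y plane, then interleaved UV plane


nv12 = yuv444_to_nv12(bytes([10, 20, 30, 40]),      # Y of a 2x2 image
                      bytes([100, 102, 104, 106]),  # U
                      bytes([200, 200, 200, 200]),  # V
                      2, 2)
print(list(nv12))  # [10, 20, 30, 40, 103, 200]
```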