When you deploy a model on the board, you’ll notice that each input/output tensor has two shape properties: validShape and alignedShape. This is because the BPU has stride requirements: alignedShape is the size of the stride-aligned data, and validShape is the original size.
At run time, the actual input size of the model is alignedShape, so a padding step is unavoidable during data preprocessing (the padding value can only be 0). All you need to do is allocate the memory space according to alignedByteSize and set alignedShape = validShape; the DNN library then completes the padding operation according to the alignedShape information. However, if the model takes a featuremap input, you must complete the padding yourself during preprocessing. For the model's output tensors: if the model ends directly with a BPU node, you need to use alignedShape as the step size of your read loop so that the padding is skipped. If there are CPU nodes at the tail of the model, the DNN library has already removed the padding during data transfer between the BPU and the CPU, and you do not need to pay attention to it.
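The two user-side cases above — zero-padding a featuremap input to the aligned size, and striding over a BPU-tailed output to skip the padding — can be sketched in Python. The shapes here are hypothetical, chosen only to illustrate an NHWC tensor whose C dimension is padded:

```python
import numpy as np

# Hypothetical shapes for illustration (not from a real model):
# an NHWC featuremap input whose C dimension is padded from 425 to 448.
valid_shape = (1, 8, 8, 425)    # validShape: the original tensor size
aligned_shape = (1, 8, 8, 448)  # alignedShape: the stride-aligned size

# Preprocessing for a featuremap input: copy the valid data into a
# zero-filled buffer of the aligned size (the padding value can only be 0).
valid_data = np.random.rand(*valid_shape).astype(np.float32)
aligned_buf = np.zeros(aligned_shape, dtype=np.float32)
aligned_buf[..., :valid_shape[-1]] = valid_data

# Postprocessing for an output that ends directly in a BPU node: step
# through the buffer with the aligned C as the stride and keep only the
# valid channels, skipping the padding.
flat = aligned_buf.reshape(-1, aligned_shape[-1])
recovered = flat[:, :valid_shape[-1]].reshape(valid_shape)
assert np.array_equal(recovered, valid_data)
```

The same slicing pattern applies along W for NCHW tensors, with the aligned W as the stride instead.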
1 Alignment Rules
Depending on how the data is arranged, the alignment rules of the model input and output tensor will be different:
NHWC: when input C > 4, or for outputs, the byte size of the C dimension is aligned to 256 * {0, 1, …} + {0, 16, 32, 64, 128}; when C ≤ 4, H is aligned to 2 and W is aligned to 32
NCHW: the byte size of the W dimension is aligned to 256 * {0, 1, …} + {0, 16, 32, 64, 128}
For HB_DNN_IMG_TYPE_NV12, the H and W of the model input must be even.
For HB_DNN_IMG_TYPE_NV12 and HB_DNN_IMG_TYPE_Y, the alignment rule only requires W to be a multiple of 16; these inputs do not need to be padded according to alignedShape.
(For the mapping between the HB_DNN_xxx data types supported by the DNN library and the input_type_rt values of the conversion configuration, see Section 1.1, On-Board Verification Precautions, in the PTQ&QAT scheme.)
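The byte rule above (aligned byte count equal to 256 * k plus one of 0, 16, 32, 64, or 128) can be expressed as a small helper. This function is a sketch for the NHWC C > 4 case, not part of the DNN API:

```python
def aligned_c(valid_c: int, elem_size: int) -> int:
    """Aligned C for an NHWC tensor with C > 4.

    The byte count of the C dimension must equal 256 * k + r,
    with r in {0, 16, 32, 64, 128}; otherwise pad up to the
    smallest allowed remainder (or the next multiple of 256).
    """
    rem = (valid_c * elem_size) % 256
    # Find the smallest allowed remainder >= rem; 256 means
    # rounding up to the next multiple of 256.
    for allowed in (0, 16, 32, 64, 128, 256):
        if rem <= allowed:
            pad_bytes = allowed - rem
            break
    return valid_c + pad_bytes // elem_size

print(aligned_c(425, 4))  # int32 example below
print(aligned_c(425, 1))  # int8 example below
```

For element sizes of 1, 2, or 4 bytes the padding byte count divides evenly by the element size, since all allowed remainders are multiples of 4.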
Example 1
In this example, tensor_type is int32, which occupies four bytes. The tensor’s C dimension is aligned from 425 to 448. The calculation is as follows:
1. Take the remainder of the C dimension’s byte count divided by 256:
(425 * sizeof(tensor_type)) % 256 = (425 * 4) % 256 = 164
2. Since 164 > 128, align up to the next multiple of 256; the padding in elements is:
(256 - 164)/sizeof(tensor_type) = 92/4 = 23
3. Finally:
aligned_shape.C = 425+23 = 448
Example 2
In this example, tensor_type is int8, which takes only one byte. The tensor’s C dimension is aligned from 425 to 512. The calculation is as follows:
1. Take the remainder of the C dimension’s byte count divided by 256:
(425 * sizeof(tensor_type)) % 256 = 425 % 256 = 169
2. Since 169 > 128, align up to the next multiple of 256; the padding in elements is:
(256 - 169) / sizeof(tensor_type) = 87
3. Finally:
aligned_shape.C = 425 + 87 = 512
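The arithmetic in both worked examples can be checked in a few lines of Python:

```python
# Example 1: int32 (4 bytes per element), C aligned from 425 to 448.
rem = (425 * 4) % 256            # 164
pad_elems = (256 - rem) // 4     # 23; 164 > 128, so round up to 256
assert 425 + pad_elems == 448

# Example 2: int8 (1 byte per element), C aligned from 425 to 512.
rem = (425 * 1) % 256            # 169
pad_elems = (256 - rem) // 1     # 87; 169 > 128, so round up to 256
assert 425 + pad_elems == 512
```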