Cache Mechanism

D-Robotics · September 1, 2023, 3:26am

1. Introduction

When deploying a model on a Horizon development board, you will encounter two functions for requesting memory: hbSysAllocCachedMem() and hbSysAllocMem(). The only difference between them is whether the memory is cached. At this point, you may have the following questions:

What is a cache?
Why use a cache?
How do you use the cache?

With these questions in mind, let’s learn about the cache mechanism.

2. What is a cache

A cache is a kind of memory that supports high-speed data exchange and usually sits between the CPU and main memory. In terms of content, the cache holds a copy of a small part of the data in memory, and the data exchange between the cache and memory is called a cache refresh. In terms of speed, data exchange between the CPU and the cache is much faster than between the CPU and memory.

3. Why use a cache

When a program is running, the CPU reads and writes instructions and data in memory. Reading and writing memory directly is very slow, so a cache that can exchange data with the CPU at high speed is introduced. The cache mechanism works as follows:

When the CPU needs to read data, it first looks for it in the cache. If the data is already in the cache, this is called a cache hit: the CPU reads the data directly from the cache, which greatly speeds up access.
If the data is not in the cache, this is called a cache miss. In this case, the data must first be loaded from the relatively slow memory into the cache and then sent to the CPU for processing. This involves a slower memory access, so it adds some latency.

Cache hits and cache misses together define the cache hit ratio: the share of accesses that are served from the cache. The hit ratio matters because the CPU looks in the cache before going to memory, and the cache flushes only the data that has changed.
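To make the hit ratio concrete, here is a minimal sketch of a toy direct-mapped cache that only counts hits and misses. The ToyCache type, the 64-byte line size, and the sequential_hit_ratio() helper are assumptions made for illustration; they are not part of the Horizon SDK and do not model the real hardware cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy direct-mapped cache: it only tracks which memory line each slot holds.
struct ToyCache {
  static constexpr std::size_t kLineBytes = 64;  // assumed cache line size
  std::vector<std::int64_t> slot_tag;            // -1 means the slot is empty
  std::size_t hits = 0;
  std::size_t misses = 0;

  explicit ToyCache(std::size_t num_slots) : slot_tag(num_slots, -1) {
    assert(num_slots > 0);
  }

  // Simulate one CPU read of the byte at `address`.
  void read(std::uint64_t address) {
    const std::int64_t line = static_cast<std::int64_t>(address / kLineBytes);
    const std::size_t slot = static_cast<std::size_t>(line) % slot_tag.size();
    if (slot_tag[slot] == line) {
      ++hits;    // cache hit: the line is already in the cache
    } else {
      ++misses;  // cache miss: the line is loaded from memory
      slot_tag[slot] = line;
    }
  }

  double hit_ratio() const {
    const std::size_t total = hits + misses;
    return total == 0 ? 0.0
                      : static_cast<double>(hits) / static_cast<double>(total);
  }
};

// Read `bytes` consecutive bytes `passes` times and return the hit ratio.
double sequential_hit_ratio(std::size_t num_slots, std::size_t bytes, int passes) {
  ToyCache cache(num_slots);
  for (int p = 0; p < passes; ++p) {
    for (std::size_t a = 0; a < bytes; ++a) {
      cache.read(a);
    }
  }
  return cache.hit_ratio();
}
```

Under these assumptions, reading 1024 consecutive bytes once with 16 slots gives 16 misses (one per line) and 1008 hits, a hit ratio of about 98.4%. Reading the same bytes a second time adds only hits, which is why repeated CPU access benefits from cached memory.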

When is it recommended to use memory with cache?

**Scenarios where the CPU reads data frequently.**
**Scenarios where the model reads and writes input/output data multiple times during continuous inference.**

**In these scenarios, cache hits can be fully exploited to greatly improve performance. Cached memory is also recommended when you are unsure which case applies.**

At this point you may ask: since the cache is so good, is it better to use it in all scenarios?

The computing devices of the Horizon computing platform include CPUs and BPUs, and they share memory. As shown in the figure above, if the input data is processed only by the BPU, there is no need to use cached memory, which saves the cache refresh time (about 60 us for 1M of data).

4. How do I use the cache

Let's start with some interface functions.

4.1 hbSysAllocCachedMem()

Requests **memory with cache**.

```c++
int32_t hbSysAllocCachedMem(hbSysMem *mem, uint32_t size);
```

Parameters
[in] size Size of the requested memory.
[out] mem Memory pointer.
Return value
Returns 0 if the API call succeeds; any other value indicates failure.

4.2 hbSysFlushMem()

Synchronizes data between the cache and memory.

```c++
int32_t hbSysFlushMem(hbSysMem *mem, int32_t flag);
```

Parameters
[in] mem Memory pointer.
[in] flag Flush flag. It takes the value 1 or 2 and controls the direction of the flush. For details, see hbSysMemFlushFlag.
Return value
Returns 0 if the API call succeeds; any other value indicates failure.

4.3 hbSysMemFlushFlag

Memory and cache synchronization parameters.

```c++
typedef enum {
  HB_SYS_MEM_CACHE_INVALIDATE = 1,
  HB_SYS_MEM_CACHE_CLEAN = 2
} hbSysMemFlushFlag;
```

HB_SYS_MEM_CACHE_INVALIDATE synchronizes data from memory to the cache. Use it before the CPU reads data; otherwise, the CPU will read old data from the cache.
HB_SYS_MEM_CACHE_CLEAN synchronizes data from the cache to memory. Use it after the CPU writes data; otherwise, the BPU will read old data from memory.

**Note**: It is defined in hb_sys.h; remember to include this header when using it.

The cache sits between the CPU and memory. If data is not refreshed correctly, the contents of the cache may be inconsistent with those in memory. To get the latest data every time, refresh before the CPU reads and after the CPU writes: before a CPU read, update the cache from memory; after a CPU write, update memory from the cache.
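The two flush directions can be illustrated with a small toy model, sketched below. ToyCachedBuffer and ToyFlushFlag are hypothetical names invented for this illustration: the model keeps two separate copies of a buffer, lets the "CPU" touch only the cache copy and the "BPU" touch only the memory copy, and copies between them on flush. It is a simplification, not how hbSysFlushMem() is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins that mirror the two values of hbSysMemFlushFlag.
enum ToyFlushFlag { TOY_CACHE_INVALIDATE = 1, TOY_CACHE_CLEAN = 2 };

// Toy model of one cached buffer: the CPU only sees `cache`,
// the BPU only sees `memory`.
struct ToyCachedBuffer {
  std::vector<std::uint8_t> memory;
  std::vector<std::uint8_t> cache;

  explicit ToyCachedBuffer(std::size_t size) : memory(size, 0), cache(size, 0) {}

  void cpu_write(std::size_t i, std::uint8_t v) {
    assert(i < cache.size());
    cache[i] = v;
  }
  std::uint8_t cpu_read(std::size_t i) const {
    assert(i < cache.size());
    return cache[i];
  }
  void bpu_write(std::size_t i, std::uint8_t v) {
    assert(i < memory.size());
    memory[i] = v;
  }
  std::uint8_t bpu_read(std::size_t i) const {
    assert(i < memory.size());
    return memory[i];
  }

  void flush(ToyFlushFlag flag) {
    if (flag == TOY_CACHE_CLEAN) {
      memory = cache;  // cache -> memory: do this after the CPU writes
    } else {
      cache = memory;  // memory -> cache: do this before the CPU reads
    }
  }
};
```

In this model, a value written by the CPU is not visible to bpu_read() until flush(TOY_CACHE_CLEAN) is called, and a value written by the BPU is not visible to cpu_read() until flush(TOY_CACHE_INVALIDATE) is called. Real hardware is less predictable, since a cache line may be written back or reloaded at other times, which is why the explicit flushes are needed.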

4.4 Code example

The following code is excerpted from OE1.1.62:

ddk/samples/ai_toolchain/horizon_runtime_sample/code/00_quick_start/src/run_mobileNetV1_224x224.cc

```c++
// Excerpt: i indexes the tensors; the enclosing loops and setup are omitted.

// define variables
std::vector<hbDNNTensor> input_tensors;
std::vector<hbDNNTensor> output_tensors;
hbDNNTensor *input = input_tensors.data();
int input_memSize = input[i].properties.alignedByteSize;
hbDNNTensor *output = output_tensors.data();
int output_memSize = output[i].properties.alignedByteSize;

// prepare input and output tensors: request cached memory
hbSysAllocCachedMem(&input[i].sysMem[0], input_memSize);
hbSysAllocCachedMem(&output[i].sysMem[0], output_memSize);

// the CPU has written the input: make sure it is flushed to DDR before inference
hbSysFlushMem(&input_tensors[i].sysMem[0], HB_SYS_MEM_CACHE_CLEAN);

// run inference
hbDNNInfer(&task_handle,
           &output,
           input_tensors.data(),
           dnn_handle,
           &infer_ctrl_param);

// wait for the inference task to finish
hbDNNWaitTaskDone(task_handle, 0);

// make sure the CPU reads data from DDR before using output tensor data
hbSysFlushMem(&output_tensors[i].sysMem[0], HB_SYS_MEM_CACHE_INVALIDATE);
```
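The excerpt ignores the return values, although each of these APIs returns 0 on success and a non-zero value on failure. One possible way to check them is a small wrapper like the sketch below. check_hb() is a hypothetical helper written for this article, not something the SDK provides; it works with any callable that returns an int32_t status.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper (not part of the SDK): call an API that returns 0 on
// success and throw std::runtime_error if it reports a failure.
template <typename Fn, typename... Args>
void check_hb(const char *name, Fn &&fn, Args &&...args) {
  assert(name != nullptr);
  const std::int32_t ret = std::forward<Fn>(fn)(std::forward<Args>(args)...);
  if (ret != 0) {
    throw std::runtime_error(std::string(name) + " failed with code " +
                             std::to_string(ret));
  }
}

// On the board, with hb_sys.h included, it could be used like this:
//   check_hb("hbSysAllocCachedMem", hbSysAllocCachedMem,
//            &input[i].sysMem[0], input_memSize);
//   check_hb("hbSysFlushMem", hbSysFlushMem,
//            &input[i].sysMem[0], HB_SYS_MEM_CACHE_CLEAN);
```

Throwing an exception is only one design choice; returning the error code to the caller or logging and exiting would work just as well, depending on how the rest of the application handles errors.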

4.5 hbSysAllocMem()

hbSysAllocMem() requests memory without cache. In this case, you do not need to call hbSysFlushMem().

```c++
int32_t hbSysAllocMem(hbSysMem *mem, uint32_t size);
```

Parameters
[in] size Size of the requested memory.
[out] mem Memory pointer.
Return value
Returns 0 if the API call succeeds; any other value indicates failure.

5. Reference links

https://zhuanlan.zhihu.com/p/482651908