Machine Learning Test on DEBIX Model A

2024.1.10 by debix.io

Machine learning algorithms demand high computing power and are usually accelerated with a GPU or NPU. DEBIX Model A uses the NXP i.MX 8M Plus processor, which can run a variety of algorithms accelerated by its CPU, GPU or NPU. As the first processor in the i.MX applications processor family with an integrated machine learning accelerator, the i.MX 8M Plus brings strong performance to ML applications at the edge.


Key parameters of the i.MX 8M Plus (industrial grade):

- CPU: 4 x Arm® Cortex®-A53, 1.6 GHz

- NPU: 2.3 TOPS


The following tests were performed on Ubuntu 22.04. The first part of this article runs the label_image example under the TensorFlow Lite framework to compare the recognition results and running speed of the DEBIX CPU, the XNNPACK delegate and the NPU. The second part uses a self-made fruit classification model to compare the recognition accuracy and running speed of the same three back ends.


1. NXP TensorFlow Lite Test on DEBIX

1.1 TensorFlow Lite Test on DEBIX CPU

Test result:

Operation log:

debix@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples$ ./label_image

INFO: Loaded model ./mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

INFO: invoked

INFO: average time: 48.08 ms

INFO: 0.764706: 653 megalith

INFO: 0.121569: 907 wig

INFO: 0.0156863: 458 bookshop

INFO: 0.0117647: 466 broom

INFO: 0.00784314: 835 studio couch


1.2 TensorFlow Lite Test on DEBIX CPU with XNNPACK Delegate Acceleration

Note: A delegate is a TensorFlow Lite mechanism that hands part or all of a model's graph to an optimized library or hardware backend for execution. The XNNPACK delegate is one such delegate: it uses the XNNPACK library of optimized neural network operators (convolutions, matrix multiplications and similar computations) to improve the inference performance of TensorFlow Lite models on Arm CPUs.

Test result:

Operation log:

debix@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples$ ./label_image --use_xnnpack=true

INFO: Loaded model ./mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

XNNPACK delegate created.

INFO: Applied XNNPACK delegate.

INFO: invoked

INFO: average time: 45.236 ms

INFO: 0.764706: 653 megalith

INFO: 0.121569: 907 wig

INFO: 0.0156863: 458 bookshop

INFO: 0.0117647: 466 broom

INFO: 0.00784314: 835 studio couch
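Whether a delegate actually took over the model can be read from the log: the XNNPACK run above prints "INFO: Applied XNNPACK delegate.", and the NPU run in 1.3 prints "INFO: Applied EXTERNAL delegate.". The Python sketch below pulls the applied delegates and the average time out of a captured label_image log. It assumes the lines are worded exactly as in these logs (other TensorFlow Lite versions may differ), and the function name summarize_run is our own. A "Created ... delegate" line on its own is deliberately not counted.

```python
import re

# Assumes the wording seen in these logs, e.g. "INFO: Applied XNNPACK delegate."
APPLIED_RE = re.compile(r"Applied (\w+) delegate")
# e.g. "INFO: average time: 45.236 ms"
TIME_RE = re.compile(r"average time: ([\d.]+) ms")


def summarize_run(log_text: str) -> dict:
    """Return the delegates reported as applied and the average time in ms."""
    delegates = APPLIED_RE.findall(log_text)
    match = TIME_RE.search(log_text)
    avg_ms = float(match.group(1)) if match else None
    return {"applied": delegates, "avg_ms": avg_ms}


if __name__ == "__main__":
    xnnpack_log = """INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
XNNPACK delegate created.
INFO: Applied XNNPACK delegate.
INFO: invoked
INFO: average time: 45.236 ms"""
    print(summarize_run(xnnpack_log))
```

Feeding it the log of a plain CPU run yields an empty "applied" list, which is a quick way to confirm that a timing really belongs to the back end it is labeled with.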


1.3 TensorFlow Lite Test on DEBIX NPU

Test result:

Operation log:

debix@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples$ ./label_image --external_delegate_path=/usr/lib/libvx_delegate.so

INFO: Loaded model ./mobilenet_v1_1.0_224_quant.tflite

INFO: resolved reporter

Vx delegate: allowed_cache_mode set to 0.

Vx delegate: device num set to 0.

Vx delegate: allowed_builtin_code set to 0.

Vx delegate: error_during_init set to 0.

Vx delegate: error_during_prepare set to 0.

Vx delegate: error_during_invoke set to 0.

EXTERNAL delegate created.

INFO: Applied EXTERNAL delegate.

W [HandleLayoutInfer:274]Op 162: default layout inference pass.

INFO: invoked

INFO: average time: 2.581 ms

INFO: 0.768627: 653 megalith

INFO: 0.105882: 907 wig

INFO: 0.0196078: 458 bookshop

INFO: 0.0117647: 466 broom

INFO: 0.00784314: 835 studio couch
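The three runs differ only in the delegate flag passed to label_image. As a hedged sketch, the helper below (label_image_cmd and BACKEND_FLAGS are hypothetical names) assembles the same command lines from the flags shown in the operation logs, which makes repeated runs easy to script, for example with subprocess.run from the examples directory. When no model or image is given, label_image falls back to its defaults, as in the runs above.

```python
import shlex
from typing import List, Optional

# Delegate flags exactly as used in the operation logs of this article.
BACKEND_FLAGS = {
    "cpu": [],
    "xnnpack": ["--use_xnnpack=true"],
    "npu": ["--external_delegate_path=/usr/lib/libvx_delegate.so"],
}


def label_image_cmd(backend: str, model: Optional[str] = None,
                    image: Optional[str] = None) -> List[str]:
    """Build the argv list for one label_image run on the given back end."""
    if backend not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend: {backend!r}")
    cmd = ["./label_image"]
    if model is not None:
        cmd += ["-m", model]
    if image is not None:
        cmd += ["-i", image]
    return cmd + BACKEND_FLAGS[backend]


if __name__ == "__main__":
    for name in BACKEND_FLAGS:
        print(shlex.join(label_image_cmd(name)))
```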


1.4 Conclusion


As the table above shows, running the label_image example under TensorFlow Lite with the XNNPACK delegate is only slightly faster than using the CPU alone (45.236 ms vs. 48.08 ms). NPU acceleration, by contrast, cuts the average runtime substantially, from 48.08 ms to 2.581 ms. The top-5 results are also worth noting: the CPU and XNNPACK runs report identical probabilities for megalith, wig, bookshop, broom and studio couch, while the NPU run shifts some of them slightly (for example, megalith rises from 0.764706 to 0.768627 and wig falls from 0.121569 to 0.105882) without changing the ranking.
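To put the section 1 timings on a common scale, the short sketch below divides the CPU average time by each back end's average time. The numbers are copied from the operation logs; the function name speedups is our own.

```python
# Average inference times (ms) copied from the operation logs in section 1.
TIMES_MS = {"CPU": 48.08, "XNNPACK": 45.236, "NPU": 2.581}


def speedups(times_ms: dict, baseline: str = "CPU") -> dict:
    """Speedup of each back end relative to the baseline (baseline / back end)."""
    base = times_ms[baseline]
    return {name: base / t for name, t in times_ms.items()}


if __name__ == "__main__":
    for name, s in speedups(TIMES_MS).items():
        print(f"{name:8s} {TIMES_MS[name]:7.3f} ms  {s:5.2f}x")
```

This works out to roughly 1.06 times for XNNPACK and about 18.6 times for the NPU on this quantized MobileNet model.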


2. Self-made Fruit Classification Model Test on DEBIX

2.1 Fruit Classification Model Preparation

The dataset contains 1198 images in three classes: apple, banana and pitaya. In the eIQ tool we chose a Classification model optimized for Performance and selected the NPU as the target. We changed the configuration to Input Size: 224,224,3, Batch Size: 100, Epochs To Train: Infinity and Model Enhancement: Default Enhancement, leaving all other settings at their eIQ defaults. After training finished, we ran validation, which shows the recognition rate of each label and therefore where the model needs improvement. By adjusting the eIQ training parameters, correcting the dataset and retraining, we raised the recognition rate of the model.
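eIQ computes the per-label recognition rate itself during validation. Assuming it is the share of each label's validation images that the model classifies correctly (per-class recall), the sketch below shows how such a figure is derived from (true label, predicted label) pairs. The validation results in it are hypothetical and are not the actual eIQ output for this dataset; per_label_accuracy is our own name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def per_label_accuracy(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of validation images of each true label that were predicted correctly."""
    total: Counter = Counter()
    correct: Counter = Counter()
    for true_label, predicted in pairs:
        total[true_label] += 1
        if predicted == true_label:
            correct[true_label] += 1
    return {label: correct[label] / total[label] for label in total}


if __name__ == "__main__":
    # Hypothetical validation results, NOT taken from the real eIQ run.
    results = [
        ("apple", "apple"), ("apple", "apple"), ("apple", "pitaya"),
        ("banana", "banana"), ("banana", "banana"),
        ("pitaya", "pitaya"), ("pitaya", "apple"), ("pitaya", "pitaya"),
    ]
    for label, acc in sorted(per_label_accuracy(results).items()):
        print(f"{label:7s} {acc:.2%}")
```

A label with a noticeably lower rate (here apple and pitaya, which are confused with each other) is the one whose images are worth reviewing or extending before retraining.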

Once the model met our requirements, we exported it from eIQ to DEBIX. Some information about the exported model is as follows:



The exported model reaches a training accuracy of 97.6% and a validation accuracy of 93.24%.

Next, we can use some photos to test its recognition accuracy and running speed.


2.2 Apple Recognition

Photo for test:


2.2.1 Using DEBIX CPU for Apple Recognition

Test result:

Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i apple.bmp

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

INFO: invoked

INFO: average time: 136.018 ms

INFO: 0.992188: 0 apple

INFO: 0.0078125: 1 banana

When the classification model runs on the DEBIX CPU, it recognizes the apple with a score of 0.992188 and an average runtime of 136.018 ms.


2.2.2 Using XNNPACK Delegate for Apple Recognition

Test result:

Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i apple.bmp --use_xnnpack=true

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

XNNPACK delegate created.

INFO: Applied XNNPACK delegate.

INFO: invoked

INFO: average time: 67.031 ms

INFO: 0.992188: 0 apple

INFO: 0.0078125: 1 banana

With the XNNPACK delegate, the score stays at 0.992188 while the average runtime is cut roughly in half, to 67.031 ms.


2.2.3 Using DEBIX NPU for Apple Recognition

Test result:

Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i apple.bmp --external_delegate_path=/usr/lib/libvx_delegate.so

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

Vx delegate: allowed_cache_mode set to 0.

Vx delegate: device num set to 0.

Vx delegate: allowed_builtin_code set to 0.

Vx delegate: error_during_init set to 0.

Vx delegate: error_during_prepare set to 0.

Vx delegate: error_during_invoke set to 0.

EXTERNAL delegate created.

INFO: Applied EXTERNAL delegate.

INFO: invoked

INFO: average time: 3.682 ms

INFO: 0.988281: 0 apple

INFO: 0.0078125: 1 banana

INFO: 0.00390625: 2 pitaya

With NPU acceleration, the apple score dips only slightly, from 0.992188 to 0.988281, and recognition takes just 3.682 ms!
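The small score differences between back ends come from output quantization. In the TensorFlow Lite label_image source, uint8 outputs are printed as q / 255 and int8 outputs as (q + 128) / 256. Every score in section 1 is a multiple of 1/255, and every fruit-model score is a multiple of 1/256, which suggests (an assumption, not checked against the exported file) that the MobileNet model has a uint8 output and the eIQ fruit model an int8 output. The raw values 195, 196, 126 and 125 below are back-calculated from the logged scores, not read from the models. Under that reading, the NPU's apple score is exactly one quantization step below the CPU's, and the megalith change in section 1 is likewise one step.

```python
def uint8_score(q: int) -> float:
    """Score label_image prints for a uint8 output value: q / 255."""
    return q / 255.0


def int8_score(q: int) -> float:
    """Score label_image prints for an int8 output value: (q + 128) / 256."""
    return (q + 128) / 256.0


if __name__ == "__main__":
    # Section 1 (uint8 MobileNet), "megalith": CPU vs. NPU.
    print(f"{uint8_score(195):.6f}  {uint8_score(196):.6f}")  # 0.764706  0.768627
    # Section 2 (int8 fruit model), "apple": CPU vs. NPU.
    print(f"{int8_score(126):.6f}  {int8_score(125):.6f}")  # 0.992188  0.988281
```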


2.3 Banana Recognition

Photo for test:

2.3.1 Using DEBIX CPU for Banana Recognition

Test result:

Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i banana.bmp

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

INFO: invoked

INFO: average time: 136.19 ms

INFO: 0.996094: 1 banana
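Unlike the apple runs, the banana log lists a single class. label_image only prints results whose score reaches a threshold (assumed here to be its default of 0.001, with up to five results), so classes whose int8 output sits at the minimum value map to a score of 0.0 and are dropped. The sketch below reproduces that filtering; the raw outputs for the banana image are hypothetical values consistent with the log, and top_n is our own name.

```python
import heapq
from typing import List, Sequence, Tuple


def top_n(scores: Sequence[float], n: int = 5,
          threshold: float = 0.001) -> List[Tuple[float, int]]:
    """Return up to n (score, class_index) pairs at or above threshold, best first."""
    candidates = [(s, i) for i, s in enumerate(scores) if s >= threshold]
    return heapq.nlargest(n, candidates)


if __name__ == "__main__":
    labels = ["apple", "banana", "pitaya"]
    # Hypothetical int8 outputs for banana.bmp, converted with (q + 128) / 256.
    raw = [-128, 127, -128]
    scores = [(q + 128) / 256.0 for q in raw]
    for score, idx in top_n(scores):
        print(f"{score:.6f}: {idx} {labels[idx]}")
```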


2.3.2 Using XNNPACK Delegate for Banana Recognition

Test result:

Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i banana.bmp --use_xnnpack=true

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

XNNPACK delegate created.

INFO: Applied XNNPACK delegate.

INFO: invoked

INFO: average time: 69.795 ms

INFO: 0.996094: 1 banana


2.3.3 Using DEBIX NPU for Banana Recognition

Test result:




Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i banana.bmp --external_delegate_path=/usr/lib/libvx_delegate.so

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

Vx delegate: allowed_cache_mode set to 0.

Vx delegate: device num set to 0.

Vx delegate: allowed_builtin_code set to 0.

Vx delegate: error_during_init set to 0.

Vx delegate: error_during_prepare set to 0.

Vx delegate: error_during_invoke set to 0.

EXTERNAL delegate created.

INFO: Applied EXTERNAL delegate.

INFO: invoked

INFO: average time: 3.805 ms

INFO: 0.996094: 1 banana



2.4 Pitaya Recognition

Photo for test:


2.4.1 Using DEBIX CPU for Pitaya Recognition

Test result:

Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i pitaya.bmp

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

INFO: invoked

INFO: average time: 136.404 ms

INFO: 0.996094: 2 pitaya


2.4.2 Using XNNPACK Delegate for Pitaya Recognition

Test result:

Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i pitaya.bmp --use_xnnpack=true

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

XNNPACK delegate created.

INFO: Applied XNNPACK delegate.

INFO: invoked

INFO: average time: 60.709 ms

INFO: 0.996094: 2 pitaya


2.4.3 Using DEBIX NPU for Pitaya Recognition

Test result:

Operation log:

root@imx8mpevk:/usr/bin/tensorflow-lite-2.9.1/examples# ./label_image -m mobilenet_v1_1.2_npu_224_fruit.tflite -i pitaya.bmp --external_delegate_path=/usr/lib/libvx_delegate.so

INFO: Loaded model mobilenet_v1_1.2_npu_224_fruit.tflite

INFO: resolved reporter

Vx delegate: allowed_cache_mode set to 0.

Vx delegate: device num set to 0.

Vx delegate: allowed_builtin_code set to 0.

Vx delegate: error_during_init set to 0.

Vx delegate: error_during_prepare set to 0.

Vx delegate: error_during_invoke set to 0.

EXTERNAL delegate created.

INFO: Applied EXTERNAL delegate.

INFO: invoked

INFO: average time: 3.764 ms

INFO: 0.996094: 2 pitaya


2.5 Conclusion

After running the apple, banana and pitaya recognition tests on the DEBIX CPU, the XNNPACK delegate and the NPU, we find that the recognition results are very stable across the three back ends and mostly identical, with every top-1 score at 0.98 or above.

The XNNPACK delegate cuts the runtime roughly in half, while the NPU gives by far the largest speedup, completing each recognition in under 4 ms.
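The per-image speedups behind this summary can be checked with the sketch below, which divides each CPU time by the corresponding XNNPACK and NPU times. The timings are copied from the section 2 operation logs; speedup_vs_cpu is our own name.

```python
from statistics import mean

# Average inference times (ms) copied from the operation logs in section 2.
FRUIT_TIMES_MS = {
    "apple": {"CPU": 136.018, "XNNPACK": 67.031, "NPU": 3.682},
    "banana": {"CPU": 136.19, "XNNPACK": 69.795, "NPU": 3.805},
    "pitaya": {"CPU": 136.404, "XNNPACK": 60.709, "NPU": 3.764},
}


def speedup_vs_cpu(times: dict) -> dict:
    """Per-back-end speedup relative to the CPU run for one image."""
    return {backend: times["CPU"] / ms for backend, ms in times.items()}


if __name__ == "__main__":
    per_fruit = {fruit: speedup_vs_cpu(t) for fruit, t in FRUIT_TIMES_MS.items()}
    for fruit, s in per_fruit.items():
        print(f"{fruit:7s} XNNPACK {s['XNNPACK']:.2f}x  NPU {s['NPU']:.1f}x")
    for backend in ("XNNPACK", "NPU"):
        avg = mean(s[backend] for s in per_fruit.values())
        print(f"{backend} mean speedup: {avg:.2f}x")
```

The XNNPACK speedup ranges from about 1.95 to 2.25 times, and the NPU speedup stays close to 36 times for all three images.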