Skip to content

GPU accelerated deep learning inference applications for RaspberryPi / JetsonNano / Linux PC using TensorflowLite GPUDelegate / TensorRT

License

Notifications You must be signed in to change notification settings

terryky/tflite_gles_app

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GPU accelerated TensorFlow Lite / TensorRT applications.

TFLite-2.7

This repository contains several applications which invoke DNN inference with TensorFlow Lite GPU Delegate or TensorRT.

Target platform: Linux PC / NVIDIA Jetson / RaspberryPi.

1. Applications

  • Lightweight Face Detection.
  • Higher accurate Face Detection.
  • TensorRT port is HERE
  • Detect faces and estimage their Age and Gender
  • TensorRT port is HERE
  • Image Classfication using Moilenet.
  • TensorRT port is HERE
  • Object Detection using MobileNet SSD.
  • TensorRT port is HERE
  • 3D Facial Surface Geometry estimation and face replacement.
  • Hair segmentation and recoloring.
  • 3D Handpose Estimation from single RGB images.
  • Eye position estimation by detecting the iris.
  • 3D Object Detection.
  • TensorRT port is HERE
  • Pose Estimation (upper body).
  • Pose Estimation.
  • TensorRT port is HERE
  • Single-Shot 3D Human Pose Estimation.
  • TensorRT port is HERE
  • Depth Estimation from single images.
  • TensorRT port is HERE
  • Assign semantic labels to every pixel in the input image.
  • Face parts segmentation based on BiSeNet V2.
  • Generate anime-style face image.
  • Transform photos into anime style images.
  • Human portrait drawing by U^2-Net.
  • Create new artworks in artistic style.
  • Enhance low-light images upto a great extent.
  • GAN-model for image extrapolation.
  • Text detection from natural scenes.

2. How to Build & Run

2.1.1. setup environment
$ sudo apt install libgles2-mesa-dev 
$ mkdir ~/work
$ mkdir ~/lib
$
$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
2.1.2. build TensorFlow Lite library.
$ cd ~/work 
$ git clone https://github.com/terryky/tflite_gles_app.git
$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4.sh

(Tensorflow configure will start after a while. Please enter according to your environment)

$
$ ln -s tensorflow_r2.4 ./tensorflow
$
$ cp ./tensorflow/bazel-bin/tensorflow/lite/libtensorflowlite.so ~/lib
$ cp ./tensorflow/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so ~/lib
2.1.3. build an application.
$ cd ~/work/tflite_gles_app/gl2handpose
$ make -j4
2.1.4. run an application.
$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
$ cd ~/work/tflite_gles_app/gl2handpose
$ ./gl2handpose
2.2.1. build TensorFlow Lite library on Host PC.
(HostPC)$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$
(HostPC)$ mkdir ~/work
(HostPC)$ cd ~/work 
(HostPC)$ git clone https://github.com/terryky/tflite_gles_app.git
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4_aarch64.sh

# If you want to build XNNPACK-enabled TensorFlow Lite, use the following script.
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4_with_xnnpack_aarch64.sh

(Tensorflow configure will start after a while. Please enter according to your environment)
2.2.2. copy Tensorflow Lite libraries to target Jetson / Raspi.
(HostPC)scp ~/work/tensorflow_r2.4/bazel-bin/tensorflow/lite/libtensorflowlite.so [email protected]:/home/jetson/lib
(HostPC)scp ~/work/tensorflow_r2.4/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so [email protected]:/home/jetson/lib
2.2.3. clone Tensorflow repository on target Jetson / Raspi.
(Jetson/Raspi)$ cd ~/work
(Jetson/Raspi)$ git clone -b r2.4 https://github.com/tensorflow/tensorflow.git
(Jetson/Raspi)$ cd tensorflow
(Jetson/Raspi)$ ./tensorflow/lite/tools/make/download_dependencies.sh
2.2.4. build an application.
(Jetson/Raspi)$ sudo apt install libgles2-mesa-dev libdrm-dev
(Jetson/Raspi)$ cd ~/work 
(Jetson/Raspi)$ git clone https://github.com/terryky/tflite_gles_app.git
(Jetson/Raspi)$ cd ~/work/tflite_gles_app/gl2handpose

# on Jetson
(Jetson)$ make -j4 TARGET_ENV=jetson_nano TFLITE_DELEGATE=GPU_DELEGATEV2

# on Raspberry pi without GPUDelegate (recommended)
(Raspi )$ make -j4 TARGET_ENV=raspi4

# on Raspberry pi with GPUDelegate (low performance)
(Raspi )$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=GPU_DELEGATEV2

# on Raspberry pi with XNNPACK
(Raspi )$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=XNNPACK
2.2.5. run an application.
(Jetson/Raspi)$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
(Jetson/Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Jetson/Raspi)$ ./gl2handpose
about VSYNC

On Jetson Nano, display sync to vblank (VSYNC) is enabled to avoid the tearing by default . To enable/disable VSYNC, run app with the following command.

# enable VSYNC (default).
(Jetson)$ export __GL_SYNC_TO_VBLANK=1; ./gl2handpose

# disable VSYNC. framerate improves, but tearing occurs.
(Jetson)$ export __GL_SYNC_TO_VBLANK=0; ./gl2handpose
2.3.1. build TensorFlow Lite library on Host PC.
(HostPC)$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$
(HostPC)$ mkdir ~/work
(HostPC)$ cd ~/work 
(HostPC)$ git clone https://github.com/terryky/tflite_gles_app.git
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.3/build_libtflite_r2.3_armv7l.sh

(Tensorflow configure will start after a while. Please enter according to your environment)
2.3.2. copy Tensorflow Lite libraries to target Raspberry Pi.
(HostPC)scp ~/work/tensorflow_r2.3/bazel-bin/tensorflow/lite/libtensorflowlite.so [email protected]:/home/pi/lib
(HostPC)scp ~/work/tensorflow_r2.3/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so [email protected]:/home/pi/lib
2.3.3. setup environment on Raspberry Pi.
(Raspi)$ sudo apt install libgles2-mesa-dev libegl1-mesa-dev xorg-dev
(Raspi)$ sudo apt update
(Raspi)$ sudo apt upgrade
2.3.4. clone Tensorflow repository on target Raspi.
(Raspi)$ cd ~/work
(Raspi)$ git clone -b r2.3 https://github.com/tensorflow/tensorflow.git
(Raspi)$ cd tensorflow
(Raspi)$ ./tensorflow/lite/tools/make/download_dependencies.sh
2.3.5. build an application on target Raspi..
(Raspi)$ cd ~/work 
(Raspi)$ git clone https://github.com/terryky/tflite_gles_app.git
(Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Raspi)$ make -j4 TARGET_ENV=raspi4  #disable GPUDelegate. (recommended)

#enable GPUDelegate. but it cause low performance on Raspi4.
(Raspi)$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=GPU_DELEGATEV2
2.3.6. run an application on target Raspi..
(Raspi)$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
(Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Raspi)$ ./gl2handpose

for more detail infomation, please refer this article.

3. About Input video stream

Both Live camera and video file are supported as input methods.

  • UVC(USB Video Class) camera capture is supported.

  • Use v4l2-ctl command to configure the capture resolution.

    • lower the resolution, higher the framerate.
(Target)$ sudo apt-get install v4l-utils

# confirm current resolution settings
(Target)$ v4l2-ctl --all

# query available resolutions
(Target)$ v4l2-ctl --list-formats-ext

# set capture resolution (160x120)
(Target)$ v4l2-ctl --set-fmt-video=width=160,height=120

# set capture resolution (640x480)
(Target)$ v4l2-ctl --set-fmt-video=width=640,height=480
  • currently, only YUYV pixelformat is supported.

    • If you have error messages like below:
-------------------------------
 capture_devie  : /dev/video0
 capture_devtype: V4L2_CAP_VIDEO_CAPTURE
 capture_buftype: V4L2_BUF_TYPE_VIDEO_CAPTURE
 capture_memtype: V4L2_MEMORY_MMAP
 WH(640, 480), 4CC(MJPG), bpl(0), size(341333)
-------------------------------
ERR: camera_capture.c(87): pixformat(MJPG) is not supported.
ERR: camera_capture.c(87): pixformat(MJPG) is not supported.
...

please try to change your camera settings to use YUYV pixelformat like following command :

$ sudo apt-get install v4l-utils
$ v4l2-ctl --set-fmt-video=width=640,height=480,pixelformat=YUYV --set-parm=30
  • to disable camera
    • If your camera doesn't support YUYV, please run the apps in camera_disabled_mode with argument -x
$ ./gl2handpose -x
  • FFmpeg (libav) video decode is supported.
  • If you want to use a recorded video file instead of a live camera, follow these steps:
# setup dependent libralies.
(Target)$ sudo apt install libavcodec-dev libavdevice-dev libavfilter-dev libavformat-dev libavresample-dev libavutil-dev

# build an app with ENABLE_VDEC options
(Target)$ cd ~/work/tflite_gles_app/gl2facemesh
(Target)$ make -j4 ENABLE_VDEC=true

# run an app with a video file name as an argument.
(Target)$ ./gl2facemesh -v assets/sample_video.mp4

4. Tested platforms

You can select the platform by editing Makefile.env.

  • Linux PC (X11)
  • NVIDIA Jetson Nano (X11)
  • NVIDIA Jetson TX2 (X11)
  • RaspberryPi4 (X11)
  • RaspberryPi3 (Dispmanx)
  • Coral EdgeTPU Devboard (Wayland)

5. Performance of inference [ms]

Blazeface

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 10 10
TensorFlow Lite CPU int8 7 7
TensorFlow Lite GPU Delegate GPU fp16 70 10
TensorRT GPU fp16 -- ?

Classification (mobilenet_v1_1.0_224)

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 69 50
TensorFlow Lite CPU int8 28 29
TensorFlow Lite GPU Delegate GPU fp16 360 37
TensorRT GPU fp16 -- 19

Object Detection (ssd_mobilenet_v1_coco)

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 150 113
TensorFlow Lite CPU int8 62 64
TensorFlow Lite GPU Delegate GPU fp16 980 90
TensorRT GPU fp16 -- 32

Facemesh

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 29 30
TensorFlow Lite CPU int8 24 27
TensorFlow Lite GPU Delegate GPU fp16 150 20
TensorRT GPU fp16 -- ?

Hair Segmentation

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 410 400
TensorFlow Lite CPU int8 ? ?
TensorFlow Lite GPU Delegate GPU fp16 270 30
TensorRT GPU fp16 -- ?

3D Handpose

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 116 85
TensorFlow Lite CPU int8 80 87
TensorFlow Lite GPU Delegate GPU fp16 880 90
TensorRT GPU fp16 -- ?

3D Object Detection

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 470 302
TensorFlow Lite CPU int8 248 249
TensorFlow Lite GPU Delegate GPU fp16 1990 235
TensorRT GPU fp16 -- 108

Posenet (posenet_mobilenet_v1_100_257x257)

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 92 70
TensorFlow Lite CPU int8 53 55
TensorFlow Lite GPU Delegate GPU fp16 803 80
TensorRT GPU fp16 -- 18

Semantic Segmentation (deeplabv3_257)

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 108 80
TensorFlow Lite CPU int8 ? ?
TensorFlow Lite GPU Delegate GPU fp16 790 90
TensorRT GPU fp16 -- ?

Selfie to Anime

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 ? 7700
TensorFlow Lite CPU int8 ? ?
TensorFlow Lite GPU Delegate GPU fp16 ? ?
TensorRT GPU fp16 -- ?

Artistic Style Transfer

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 1830 950
TensorFlow Lite CPU int8 ? ?
TensorFlow Lite GPU Delegate GPU fp16 2440 215
TensorRT GPU fp16 -- ?

Text Detection (east_text_detection_320x320)

Framework Precision Raspberry Pi 4
[ms]
Jetson nano
[ms]
TensorFlow Lite CPU fp32 1020 680
TensorFlow Lite CPU int8 378 368
TensorFlow Lite GPU Delegate GPU fp16 4665 388
TensorRT GPU fp16 -- ?

6. Related Articles

7. Acknowledgements

About

GPU accelerated deep learning inference applications for RaspberryPi / JetsonNano / Linux PC using TensorflowLite GPUDelegate / TensorRT

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published