【ATU Book - i.MX8 Series - TFLite Advanced】 Hand Skeleton Detection

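I. Overview

Hand Skeleton Detection, also called Hand Landmarks Detection, is one of the popular research topics in deep learning. Its main purpose is to let a machine locate the joints of the hand and connect the individual nodes. It is widely applicable to gesture recognition, for example operating a screen with gestures, or checking whether the hand posture is correct while operating an instrument. The most common choice is the Hand Landmarks model architecture, which uses the lightweight MobileNet network as its backbone to keep the model small.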

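New readers who want to learn more about artificial intelligence, machine learning, and deep learning can refer to the post index below.
大大通 featured post: 【ATU Book - i.MX8 Series】 Post Index

TensorFlow Lite Advanced series: article structure diagram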

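II. Algorithm Introduction

MobileNet neural network architecture:

This architecture still uses MobileNet as its backbone. The core idea is to split a standard convolution layer into two parts, a depthwise convolution and a pointwise convolution, which together are called a depthwise separable convolution. Computing this way greatly reduces the number of parameters and therefore speeds up inference. (Purpose: feature extraction.)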

MobileNet lightweight design concept diagram
Image source: LaptrinhX website
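To make the parameter saving concrete, the following minimal sketch counts the weights of a standard convolution and of its depthwise separable replacement. The layer sizes (3 x 3 kernel, 32 input channels, 64 output channels) and the function names are illustrative assumptions, not values taken from the hand landmark model, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the hand landmark model).
k, c_in, c_out = 3, 32, 64
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard, separable, round(standard / separable, 2))  # 18432 2336 7.89
```

For this assumed layer the separable form needs roughly one eighth of the weights, which is where the lighter model comes from.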

Hand Skeleton Detection

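This technique can be broadly divided into 2D and 3D landmark prediction. Put simply, the former only considers relationships in the image plane, while the latter must also account for depth. For example, when the hand forms a C shape, a purely planar model may fail to recognize the gesture, whereas a model trained with spatial information is more likely to recognize it. Some models and algorithms therefore use the time dimension to raise the recognition rate, at the cost of larger input data. To address this, the Google team proposed a 2.5D prediction approach that takes the wrist as the reference point and computes the depth of each joint relative to it, thereby approximating a 3D result from a single input image. The concept is shown in the figure below.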

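Hand Skeleton Detection concept diagram
Image source: Paper

The team also trained on about 30,000 real-world images to predict the relative depth of the 21 joints, as shown in the figure below.

Hand Skeleton Detection landmark diagram
Image source: MediaPipe website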
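The "relative depth" idea can be sketched in a few lines: each joint keeps its image-plane x and y, and its depth is expressed relative to the wrist. This is only an illustration of the concept, not the model's internal computation. It assumes the wrist is landmark 0 (the MediaPipe numbering), and the sample coordinates are made up.

```python
WRIST = 0  # assumed index of the wrist, following the MediaPipe landmark numbering


def to_wrist_relative(landmarks):
    """Return (x, y, z - z_wrist) for every landmark; x and y are left unchanged."""
    z_wrist = landmarks[WRIST][2]
    return [(x, y, z - z_wrist) for (x, y, z) in landmarks]


# Made-up landmarks: 21 points, wrist at depth 40.0, depth growing by 0.5 per index.
points = [(10.0 * i, 5.0 * i, 40.0 + 0.5 * i) for i in range(21)]
relative = to_wrist_relative(points)
print(relative[WRIST])  # the wrist depth becomes 0.0
print(relative[8])      # depth of landmark 8 measured from the wrist
```

Because only differences to the wrist are kept, the result does not depend on how far the whole hand is from the camera.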

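III. Implementation

Google provides an excellent example, the Hand landmarks detection guide for Python, together with a Hand Landmark model, and readers can follow the MediaPipe approach directly. This example reuses that model and quantizes it to integer arithmetic.

The steps are as follows:

Step 1: Open Colab and set up the environment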

%tensorflow_version 2.x

Step 2: Install the conversion packages

!pip install tf2onnx
!pip install onnx-tf==1.9.0
!pip install onnx==1.9.0

Step 3: Convert TensorFlow Lite to ONNX

! python -m tf2onnx.convert --opset 11 --tflite /root/hand_landmark.tflite --output /root/hand_landmark.onnx

Step 4: Convert ONNX to SavedModel

! onnx-tf convert -i /root/hand_landmark.onnx -o /root/hand_landmark

Step 5: Convert to TensorFlow Lite

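import tensorflow as tf
import numpy as np

# Calibration data for post-training quantization. Random values are used here
# as a placeholder; real hand images in the same [0, 1] range are preferable.
def representative_dataset_gen():
    for _ in range(250):
        yield [np.random.uniform(0.0, 1.0, size=(1, 256, 256, 3)).astype(np.float32)]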

model = tf.saved_model.load("/root/hand_landmark")
concrete_func = model.signatures["serving_default"]
concrete_func.inputs[0].set_shape([1, 256, 256, 3])
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type  = tf.float32
converter.inference_output_type = tf.float32
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()
with open("/root/handskeleton_qunat_new.tflite",'wb') as f:
    f.write(tflite_model)
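As background for the integer quantization requested in Step 5, the sketch below shows the affine mapping between real values and int8 values, real = scale * (q - zero_point). It is a simplified illustration: the helper names are hypothetical, and the converter's actual calibration (for example per-channel weight scales) is more involved.

```python
def choose_qparams(x_min, x_max, q_min=-128, q_max=127):
    """Pick scale and zero point so [x_min, x_max] maps onto [q_min, q_max].

    Assumes x_max > x_min after the range is widened to include 0.0.
    """
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # 0.0 must be representable
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = int(round(q_min - x_min / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Map a real value to the nearest int8 code, clamped to the valid range."""
    q = int(round(x / scale)) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Map an int8 code back to an approximate real value."""
    return scale * (q - zero_point)


# Example: activations in [0, 1], like the normalized input image.
scale, zero_point = choose_qparams(0.0, 1.0)
for x in (0.0, 0.25, 1.0):
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```

The round trip loses at most about half a quantization step, which is the precision traded for integer arithmetic on the NPU.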

Step 6: Hand Skeleton Detection example (written and run on the i.MX8M Plus)

import sys

import cv2
import time
import argparse
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path='/root/handskeleton_qunat_new.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
width = input_details[0]['shape'][2]
height = input_details[0]['shape'][1]
nChannel = input_details[0]['shape'][3]

frame = cv2.imread("/root/hand.jpg")
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
frame_resized = cv2.resize(frame_rgb, (width, height))
input_data = np.expand_dims(frame_resized, axis=0)
input_data = input_data.astype('float32')
input_data = input_data /255
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter_time_start = time.time()
interpreter.invoke()
interpreter_time_end = time.time()
print("Inference Time = ", (interpreter_time_end - interpreter_time_start)*1000 , " ms" )

feature = interpreter.get_tensor(output_details[0]['index'])[0].reshape(21, 3)
hand_detected = interpreter.get_tensor(output_details[1]['index'])[0]

# Build the output: landmark positions scaled to the original image size
Px = []
Py = []
size_rate = [frame.shape[1]/width, frame.shape[0]/height]
for pt in feature:
    x = int(pt[0]*size_rate[0])
    y = int(pt[1]*size_rate[1])
    Px.append(x)
    Py.append(y)

# Build the output: draw the skeleton
if (hand_detected) :
    # Thumb
    cv2.line(frame, (Px[0], Py[0]) , (Px[1], Py[1]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[1], Py[1]) , (Px[2], Py[2]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[2], Py[2]) , (Px[3], Py[3]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[3], Py[3]) , (Px[4], Py[4]) , (0, 255, 0), 3)

    # Index finger
    cv2.line(frame, (Px[0], Py[0]) , (Px[5], Py[5]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[5], Py[5]) , (Px[6], Py[6]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[6], Py[6]) , (Px[7], Py[7]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[7], Py[7]) , (Px[8], Py[8]) , (0, 255, 0), 3)

    # Middle finger
    cv2.line(frame, (Px[5], Py[5]) , (Px[9], Py[9]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[9], Py[9]) , (Px[10], Py[10]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[10], Py[10]) , (Px[11], Py[11]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[11], Py[11]) , (Px[12], Py[12]) , (0, 255, 0), 3)

    # Ring finger
    cv2.line(frame, (Px[9], Py[9]) , (Px[13], Py[13]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[13], Py[13]) , (Px[14], Py[14]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[14], Py[14]) , (Px[15], Py[15]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[15], Py[15]) , (Px[16], Py[16]) , (0, 255, 0), 3)

    # Little finger
    cv2.line(frame, (Px[13], Py[13]) , (Px[17], Py[17]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[17], Py[17]) , (Px[18], Py[18]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[18], Py[18]) , (Px[19], Py[19]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[19], Py[19]) , (Px[20], Py[20]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[17], Py[17]) , (Px[0], Py[0]) , (0, 255, 0), 3)

    # Joints
    for i in range(len(Px)):
        cv2.circle(frame, ( Px[i] , Py[i] ), 1, (0, 0, 255), 4)

import matplotlib.pyplot as plt
plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
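The drawing logic above can also be written in a table-driven way, which may be easier to maintain. The sketch below is one possible arrangement rather than the article's code: the edge list is copied from the cv2.line calls in Step 6, while the names HAND_EDGES, scale_landmarks, and skeleton_segments are hypothetical and the sample landmarks are made up. It uses no OpenCV so that it runs anywhere.

```python
# Edge list copied from the cv2.line calls in the Step 6 script.
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),          # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),          # index finger
    (5, 9), (9, 10), (10, 11), (11, 12),     # middle finger
    (9, 13), (13, 14), (14, 15), (15, 16),   # ring finger
    (13, 17), (17, 18), (18, 19), (19, 20),  # little finger
    (17, 0),                                 # palm edge back to landmark 0
]


def scale_landmarks(landmarks, model_size, image_size):
    """Map (x, y, ...) landmarks from model-input pixels to image pixels."""
    sx = image_size[0] / model_size[0]
    sy = image_size[1] / model_size[1]
    return [(int(p[0] * sx), int(p[1] * sy)) for p in landmarks]


def skeleton_segments(points):
    """Return one ((x1, y1), (x2, y2)) pair per edge, ready to be drawn."""
    return [(points[a], points[b]) for a, b in HAND_EDGES]


# Made-up landmarks in a 256 x 256 model input, scaled to a 640 x 480 image.
fake = [(float(i * 10), float(i * 5), 0.0) for i in range(21)]
pts = scale_landmarks(fake, (256, 256), (640, 480))
segments = skeleton_segments(pts)
print(len(segments), segments[0])
```

Each returned pair could then be passed to cv2.line(frame, p1, p2, (0, 255, 0), 3) in a single loop.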

Hand Skeleton Detection results

As shown in the figure below, the position of each hand joint is detected successfully in the image.
On the NPU of the i.MX8M Plus, the inference time is about 12.69 ms.
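A single timed call can be noisy, and the first call on an accelerator often includes one-time initialization. A common way to get a steadier figure is to run a few untimed warm-up calls and then average several timed ones. The helper below is a hypothetical sketch of that approach; on the board, interpreter.invoke would be passed in place of the stand-in workload.

```python
import time


def benchmark(fn, warmup=1, runs=10):
    """Call fn() `warmup` times untimed, then return the mean time per call in ms."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Stand-in workload; on the board one would pass interpreter.invoke instead.
mean_ms = benchmark(lambda: sum(range(10000)), warmup=2, runs=20)
print("Mean time = %.3f ms" % mean_ms)
```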

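IV. Conclusion

Hand Skeleton Detection is usually paired with Hand Detection. The hand is located first, and the cropped hand region is then passed to the hand skeleton model for landmark extraction, which gives the best accuracy. The 21 detected joint positions can then drive follow-up logic such as gesture control. Running on the Vivante VIP8000 NPU of the i.MX8MP, inference takes about 12.69 ms per frame, or roughly 78 FPS, with a good detection rate at a suitable distance. Because this is a composite application, the actual cost is the hand detection time plus the hand skeleton time, roughly 10 ms + N * (12 ms), where N is the number of detected hands. The next chapter will introduce "Style Transform", one application of the GAN architecture in machine learning. Stay tuned!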
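The rough cost estimate above is easy to turn into numbers. The sketch below uses the article's figures (about 10 ms for hand detection, about 12 ms per hand for the skeleton model, 12.69 ms measured for the skeleton model alone); the function names are hypothetical.

```python
def pipeline_time_ms(num_hands, detect_ms=10.0, landmark_ms=12.0):
    """Rough per-frame time: one hand-detection pass plus one landmark pass per hand."""
    return detect_ms + num_hands * landmark_ms


def fps(time_ms):
    """Frames per second for a given per-frame time in milliseconds."""
    return 1000.0 / time_ms


print("landmark model alone: %.1f FPS" % fps(12.69))
for n in (1, 2):
    t = pipeline_time_ms(n)
    print("%d hand(s): %.1f ms, %.1f FPS" % (n, t, fps(t)))
```

With one hand in view the combined estimate is about 22 ms per frame, so the end-to-end rate is noticeably lower than the figure for the skeleton model alone.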

V. References

[1] SSD: Single Shot MultiBox Detector
[2] SSD-Tensorflow
[3] Single Shot MultiBox Detector (SSD) paper reading notes
[4] ssd-mobilenet v1 algorithm structure and code walkthrough
[5] Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation
[6] MediaPipe - Hand landmarks detection guide for Python

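If you have any technical questions about TensorFlow Lite Advanced topics, feel free to leave a comment below the post!
More TensorFlow Lite Advanced articles are on the way. Stay tuned for 【ATU Book - i.MX8 Series - TFLite Advanced】!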
