MediaPipe基本介紹

► 前言

前陣子因為專案研究MediaPipe face_mesh功能，使用後覺得很厲害，因此寫一篇博文介紹一下。MediaPipe是一個開源的跨平台框架，實現一些常見的機器學習任務，如人臉偵測、手部追蹤等。

► 介紹

MediaPipe 是由 Google Research 開發的開源框架，於2019年6月提出的開源框架，用於構建多媒體機器學習應用。支持多種平台和程式語言，包括 JavaScript、Python、C++、Java等，可以在Web、Android、iOS、Linux、Windows、MacOS 和邊緣設備上運行。

優點：

提供低代碼或無代碼的方式來定製和部署多媒體機器學習功能，使用 MediaPipe Studio 可視化地創建和調試多媒體處理流程，或使用 MediaPipe API實現所需的功能。
Google ML 專家打造的先進的機器學習解決方案，例如：

多媒體處理流程中進行了端到端的優化，包括硬體加速、模型壓縮、模型量化等，使得 MediaPipe 能夠在不同的平台和設備上達到高效且低延遲的性能。

缺點：

文檔相對於其他框架來說比較缺乏，諸多細節未清楚地說明，需要花更多的時間來研究。
社區相對於其他框架來說比較不活躍，較難找資源解決遇到的問題。
不同的平台和設備上可能會有自身限制和問題，開發者可能需要根據不同的情況來調整流程。

►Demo網站

進入MediaPipe官方網站，點選See demos按鈕，如下圖：

進入後如下圖，再次點選See demo按鈕：

進入後如下圖，網頁就會詢問是否開啟攝像頭權限，同意開啟後網頁就會展示該專案模型及應用畫面，可以根據需求調整參數：

範例連結：

Hand Landmark Detection

Face Detection

►Python Demo

透過Colab平台以Python的方式顯示，範例使用：Gesture recognition

!pip install -q mediapipe==0.10.0
!wget -q https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/1/gesture_recognizer.task

連接雲端硬碟

from google.colab import drive
drive.mount('/content/drive')

可視化實用程序

from matplotlib import pyplot as plt
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2

plt.rcParams.update({
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.spines.left': False,
    'axes.spines.bottom': False,
    'xtick.labelbottom': False,
    'xtick.bottom': False,
    'ytick.labelleft': False,
    'ytick.left': False,
    'xtick.labeltop': False,
    'xtick.top': False,
    'ytick.labelright': False,
    'ytick.right': False
})

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles


def display_one_image(image, title, subplot, titlesize=16):
    """Displays one image along with the predicted category name and score."""
    plt.subplot(*subplot)
    plt.imshow(image)
    if len(title) > 0:
        plt.title(title, fontsize=int(titlesize), color='black', fontdict={'verticalalignment':'center'}, pad=int(titlesize/1.5))

return (subplot[0], subplot[1], subplot[2]+1)

def display_batch_of_images_with_gestures_and_hand_landmarks(images, results):
"""Displays a batch of images with the gesture category and its score along with the hand landmarks."""
# Images and labels.
images = [image.numpy_view() for image in images]
gestures = [top_gesture for (top_gesture, _) in results]
multi_hand_landmarks_list = [multi_hand_landmarks for (_, multi_hand_landmarks) in results]

# Auto-squaring: this will drop data that does not fit into square or square-ish rectangle.
rows = int(math.sqrt(len(images)))
cols = len(images) // rows

# Size and spacing.
FIGSIZE = 13.0
SPACING = 0.1
subplot=(rows,cols, 1)
if rows < cols:
plt.figure(figsize=(FIGSIZE,FIGSIZE/cols*rows))
else:
plt.figure(figsize=(FIGSIZE/rows*cols,FIGSIZE))

# Display gestures and hand landmarks.
for i, (image, gestures) in enumerate(zip(images[:rows*cols], gestures[:rows*cols])):
title = f"{gestures.category_name} ({gestures.score:.2f})"
dynamic_titlesize = FIGSIZE*SPACING/max(rows,cols) * 40 + 3
annotated_image = image.copy()

for hand_landmarks in multi_hand_landmarks_list[i]:
hand_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
hand_landmarks_proto.landmark.extend([
landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in hand_landmarks
])

mp_drawing.draw_landmarks(
annotated_image,
hand_landmarks_proto,
mp_hands.HAND_CONNECTIONS,
mp_drawing_styles.get_default_hand_landmarks_style(),
mp_drawing_styles.get_default_hand_connections_style())

subplot = display_one_image(annotated_image, title, subplot, titlesize=dynamic_titlesize)

# Layout.
plt.tight_layout()
plt.subplots_adjust(wspace=SPACING, hspace=SPACING)
plt.show()

根據寬度及高度縮放圖像用於顯示畫面

import cv2

from google.colab.patches import cv2_imshow
import math

DESIRED_HEIGHT = 480
DESIRED_WIDTH = 480

def resize_and_show(image):
  h, w = image.shape[:2]
  if h < w:
    img = cv2.resize(image, (DESIRED_WIDTH, math.floor(h/(w/DESIRED_WIDTH))))
  else:
    img = cv2.resize(image, (math.floor(w/(h/DESIRED_HEIGHT)), DESIRED_HEIGHT))
  cv2_imshow(img)


IMAGE_FILENAMES = ['/content/drive/MyDrive/AI/3.jpg']

# Preview the images.
images = {name: cv2.imread(name) for name in IMAGE_FILENAMES}
for name, image in images.items():
  print(name)
  resize_and_show(image)

執行推論並顯示圖像

# STEP 1: Import the necessary modules.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# STEP 2: Create an GestureRecognizer object.
base_options = python.BaseOptions(model_asset_path='gesture_recognizer.task')
options = vision.GestureRecognizerOptions(base_options=base_options)
recognizer = vision.GestureRecognizer.create_from_options(options)

images = []
results = []
for image_file_name in IMAGE_FILENAMES:
  # STEP 3: Load the input image.
  image = mp.Image.create_from_file(image_file_name)

  # STEP 4: Recognize gestures in the input image.
  recognition_result = recognizer.recognize(image)

  # STEP 5: Process the result. In this case, visualize it.
  images.append(image)
  top_gesture = recognition_result.gestures[0][0]
  hand_landmarks = recognition_result.hand_landmarks
  results.append((top_gesture, hand_landmarks))

display_batch_of_images_with_gestures_and_hand_landmarks(images, results)

► 小結

MediaPipe是一個強大而方便的機器學習框架，可以幫助開發者快速地開發和部署多模態的人工智能應用，需要根據自己的需求和情況來選擇和使用，期待下一篇博文吧！。

► 參考資料

MediaPipe

MediaPipe GitHub

Anaconda

► Q&A

問：MediaPipe支持哪些程式語言？

答：支持JavaScript、Python、Java和C++等程式語言，Github提供JavaScript、Python、JavaScript及Android範例。

問：MediaPipe可以運行在哪些設備上？

答：嵌入式平臺（例如樹莓派等）、移動設備（iOS或Android）或後端伺服器上。

問：MediaPipe提供了哪些領域的機器學習解決方案？

答：視覺、語音和自然語言等領域的機器學習解決方案，例如手部追蹤、人臉檢測、物體檢測、語音識別、文本分析等。

問：MediaPipe是如何實現快速和靈活的機器學習流程的？

答：基於MediaPipe Framework構建的，用於組合多模態模型、ML和非ML處理、硬件加速的流程協調框架。

問：MediaPipe有哪些成功的案例或應用？

答：例如YouTube、Google Lens、Google Home和N

★博文內容均由個人提供，與平台無關，如有違法或侵權，請與網站管理員聯繫。

★文明上網，請理性發言。內容一周內被舉報5次，發文人進小黑屋喔~

參考來源

評論