Abstract
Markerless systems and inertial sensors, such as inertial measurement units, offer significant advantages over traditional marker-based methods by eliminating the need for laboratory conditions or marker placement. Virtual reality is an innovative technology used in rehabilitation, recently investigated for obtaining kinematic motion data. This research aims to validate a markerless system using a MediaPipe-based algorithm to reconstruct joint kinematics during exercises with the Nirvana virtual reality rehabilitation system. In the first phase, the markerless system was evaluated against the Xsens inertial measurement unit system, analyzing knee, hip, elbow and shoulder movements of 7 healthy participants at the Politecnico di Milano. The MediaPipe algorithm showed high performance in tracking lower limb movements but was less accurate for the upper limbs, as indicated by RMSE and %RMSE values. However, it accurately replicated the time course of all angles according to the ICC and Spearman's coefficient, suggesting the feasibility of virtual reality in rehabilitation and movement analysis. In the second phase, a pilot study was carried out at the Fondazione TOG on 2 participants with left hemiparesis to assess the efficacy of the system in a pathological context. The results provide a basis for evaluating the potential clinical applicability of the system.
1 Introduction
Kinematic motion analysis focuses on the precise measurement of rigid body motion, without considering the masses and forces treated in dynamic motion analysis. Today, these assessments are increasingly used in both rehabilitation and biomechanics, with the main objective of describing how the movement of a specific body region varies over time [1]. Depending on the specific needs or application, different approaches and technologies are available for evaluating kinematics. Currently, optoelectronic systems are the standard for obtaining the most reliable information about the kinematics of different body segments. However, they have limitations related to the need to operate in controlled environments and the requirement to attach markers to the subject's body, which can cause artifacts [2]. An important upcoming development is represented by markerless and inertial systems, techniques with the potential to extend the applicability of human motion capture beyond the laboratory. Inertial sensors, such as the inertial measurement unit (IMU), are widely used to obtain information about human motion by integrating data from the accelerometers, gyroscopes, and magnetometers they contain. The markerless approach to human motion detection uses camera systems and video recordings to observe motor gestures. The videos are then processed by algorithms that identify and link specific points on the body to reconstruct joint kinematic parameters [3]. This method is advantageous compared to inertial sensors because it does not require direct sensor placement on the body or calibration, making it more practical for clinical use and remote monitoring, especially for patients with severe injuries [4]. Current research focuses on validating markerless systems for capturing motion and on using them during rehabilitation treatments [5]. Although optoelectronic systems are preferable for validation, the use of IMUs allows a wider range of applications while maintaining data accuracy [6]. Virtual reality (VR) is an innovative technology that is emerging in the field of rehabilitation [7]. It has the potential to be integrated with motion analysis to treat individuals with cognitive and motor difficulties, particularly in severe neurological conditions [8]. The growing interest of clinicians and therapists in obtaining quantifiable data has led to the need for systems that facilitate movement execution and allow it to be analyzed quantitatively.
2 Materials and Methods
This research aims to validate a markerless system for reconstructing joint kinematics during rehabilitation exercises using the Nirvana VR system developed by BTS Bioengineering, Garbagnate Milanese, Milan, Italy. To this end, the movements of 7 healthy participants performing tasks in the Nirvana virtual environment were analyzed to compare the performance of the Xsens IMU system and the MediaPipe Pose Landmarker algorithm. A GoPro Hero 10 was used to record the videos, and the data were extracted using the algorithm to reconstruct the joint angles. Statistical indices were used to compare the data from the two systems. In the second phase, a pilot study was carried out with two participants with left hemiparesis, focusing on assessing differences in range of motion (ROM) and movement patterns to evaluate the clinical applicability of the system.
2.1 Validation Phase
The exercises conducted in this study use the Nirvana immersive VR rehabilitation system. The system includes hardware components, such as two projectors and two Kinect devices, along with software offering a user-friendly interface and a variety of exercises targeting different body parts. It provides specialized audio-visual feedback to aid motor skill recovery during virtual reality sessions. The two exercises were selected from those commonly used with the Nirvana system for upper and lower limb rehabilitation [9, 10]. Two distinct types of exercises were chosen. Walking across the guitar strings assesses lower limb function during a walking simulation in a virtual environment in which participants avoid guitar strings projected on the floor. Wide hip and knee flexion-extension movements are required instead of typical walking motions, focusing the evaluation on the sagittal plane. The participant takes 6 steps forward and 6 steps backward. The other exercise, titled Cleaning the window, evaluates upper limb function. Participants are tasked with removing virtual water from the window displayed by the wall projector, thereby revealing the underlying image. Each trial consists of 5 flexion/extension movements of the elbow and 5 simultaneous abduction/adduction movements of the shoulder with the right arm, followed by another 5 with the left arm, both evaluated in the frontal plane. Seven healthy participants (age: 25 (±2) years; height: 175 (±10) cm; weight: 65 (±8) kg) were recorded at the Active3 Laboratory at the Lecco Campus of the Politecnico di Milano while performing the two exercises described above. The study was carried out in compliance with the World Medical Association Declaration of Helsinki and approved by the Ethics Committee of Politecnico di Milano (n. 19/2023, 19/05/2023). Each exercise was repeated ten times per participant, for both the lower and upper limbs, giving a total of 140 sessions. This phase compared the Xsens system, which uses 17 IMUs placed on various body parts, as shown in Fig. 1, with a markerless machine learning (ML)-based system that detects body points and subsequently reconstructs joint angles over time using MATLAB 2020a.
During acquisition, the Xsens system requires calibration and setup steps. The software is then initiated to record motion using the provided license. After the recording, a .xlsx file containing comprehensive information, including the temporal trends of the joint angles calculated in a three-dimensional body coordinate system, is downloaded; the specific angles of interest are then extracted from this file and imported into the MATLAB environment. The markerless method is based on Google's open-source MediaPipe framework, which is designed for constructing ML pipelines, where each pipeline consists of a series of operations or stages. MediaPipe also provides Solutions built on top of the framework, offering a set of libraries and tools for quickly applying ML techniques to specific applications [11]. One of them, MediaPipe Pose Landmarker, was adapted in the Google Colab environment to extract the coordinates of body points by simply processing a multimedia file. The MediaPipe algorithm tracked the temporal 3D coordinates of 33 landmarks; for the purpose of this study, only the keypoints of interest, highlighted in green in Fig. 1, were used. For video processing, it was crucial to precisely define and calculate the joint angles to be examined, ensuring they could be compared with those identified by the Xsens system. Four joint angles were considered: knee flexion/extension (KFE) and hip flexion/extension (HFE) in the sagittal plane, and elbow flexion/extension (EFE) and shoulder abduction/adduction (SAA) in the frontal plane. The vectors and calculated angles are depicted in detail in Fig. 2.
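For illustration, the Xsens export can also be read programmatically, e.g. in Python (the study imported it into MATLAB). This is a minimal sketch; the file, sheet, and column labels are hypothetical, as the actual names depend on the Xsens software version and export template.

```python
import pandas as pd

# Load the Xsens .xlsx export; sheet and column names are hypothetical.
xsens = pd.read_excel("xsens_recording.xlsx", sheet_name="Joint Angles ZXY")

# One sagittal-plane angle as a 60 Hz time series (deg), e.g. right knee
# flexion/extension.
angle_xs = xsens["Right Knee Flexion/Extension"].to_numpy()
```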
The GoPro Hero10 was placed at a distance of 2 m and a height of 0.90 m to capture the movement and minimize occlusion issues. Video recordings were conducted at 60 Hz, matching the Xsens acquisition frequency so that no downsampling was required. For the GoPro camera recordings, the MediaPipe framework is employed. This involves downloading a specific configuration file for the pose detection model (MediaPipe Pose Landmarker), setting up additional libraries for frame extraction, and creating a dataframe in .csv format. The video is then loaded and processed frame by frame, resulting in a .csv file containing the body keypoint coordinates over time. Next, both sets of information are imported into a MATLAB script. The data from the GoPro camera were used to generate two vectors in each frame, delineating the specific joint angles. These vectors, denoted as \(\mathbf {v_1}\) and \(\mathbf {v_2}\), served as inputs to calculate the joint angles using Eq. 1:

\(\theta = \arccos \left( \frac{\mathbf {v_1} \cdot \mathbf {v_2}}{\Vert \mathbf {v_1}\Vert \, \Vert \mathbf {v_2}\Vert } \right)\)   (1)
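To make this pipeline concrete, the following is a minimal Python sketch of the markerless step. It uses the legacy MediaPipe Pose solution as an illustrative stand-in for the Pose Landmarker task adapted in the study (both expose the same 33 landmarks); the file name and the vector construction for the KFE angle are illustrative assumptions, not the exact definitions of Fig. 2.

```python
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose
L = mp_pose.PoseLandmark  # landmark index enum (HIP, KNEE, ANKLE, ...)

def joint_angle(a, b, c):
    """Angle at b (deg) between v1 = a - b and v2 = c - b, as in Eq. 1."""
    v1, v2 = a - b, c - b
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

angles = []  # one KFE value per frame
cap = cv2.VideoCapture("gopro_trial.mp4")  # hypothetical file name
with mp_pose.Pose(static_image_mode=False) as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
        res = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if res.pose_landmarks is None:
            angles.append(np.nan)  # keep the 60 Hz time base intact
            continue
        lm = res.pose_landmarks.landmark
        p = lambda i: np.array([lm[i].x, lm[i].y, lm[i].z])
        # Illustrative KFE: angle at the knee between the thigh and shank.
        angles.append(joint_angle(p(L.RIGHT_HIP), p(L.RIGHT_KNEE),
                                  p(L.RIGHT_ANKLE)))
cap.release()
```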
A fourth-order Butterworth filter with a cutoff frequency of 4 Hz is applied to the data [12]. Data synchronization involves selecting corresponding peaks in the angle trends. Finally, a statistical analysis is carried out to evaluate the validity of the new method in terms of root mean square error (RMSE), the Bland–Altman (B&A) plot, with particular attention to the bias value, the intraclass correlation coefficient (ICC), and Spearman's coefficient (\(\rho \)), since non-normality of the data was verified with a Jarque-Bera test at a significance level of 0.05 [13]. For a more accurate and absolute indication of the error across different movements, an additional index, the percentage error (%RMSE), was introduced: the RMSE normalized by the movement-specific ROM and multiplied by 100.
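The filtering and agreement metrics can be sketched as follows. Here `angle_mp` and `angle_xs` are assumed to be synchronized, equal-length angle traces in degrees; zero-phase application via `filtfilt` is an assumption (the paper does not specify), and the ICC computation is omitted for brevity.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.stats import spearmanr, jarque_bera

FS = 60.0  # sampling frequency (Hz), common to both systems

# Fourth-order Butterworth low-pass, 4 Hz cutoff (normalized by Nyquist).
b, a = butter(4, 4.0 / (FS / 2.0), btype="low")
angle_mp_f = filtfilt(b, a, angle_mp)  # zero-phase filtering (assumption)
angle_xs_f = filtfilt(b, a, angle_xs)

diff = angle_mp_f - angle_xs_f
rmse = np.sqrt(np.mean(diff ** 2))            # root mean square error
rom = angle_xs_f.max() - angle_xs_f.min()     # movement-specific ROM
pct_rmse = 100.0 * rmse / rom                 # %RMSE
bias = diff.mean()                            # Bland-Altman bias
rho, _ = spearmanr(angle_mp_f, angle_xs_f)    # Spearman's rho
jb_stat, jb_p = jarque_bera(diff)             # normality check (alpha = 0.05)
```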
2.2 Pilot Study
After validating the algorithm and addressing its limitations, the system was tested in a clinical setting in a pilot study. It is important to emphasize that the aim of this phase was not to further test the performance of the algorithm against a more reliable system, but only to evaluate whether the results could provide the basis for extending the study to a clinical setting and for a possible role in rehabilitative treatment. This phase replicated the validation setup, without the Xsens system, with two post-stroke participants (aged 10 and 16) with partial hemiplegia of the left limbs. Recordings took place at the TOG Foundation, a non-profit organisation of social benefit (Onlus). The goal was to qualitatively evaluate the signals by averaging the angles over time and comparing the ROM between the left and right limbs. Owing to fatigue, each participant completed only two trials, resulting in a total of four trials for both the upper and lower limbs.
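A minimal sketch of this left/right comparison, assuming per-trial angle traces already extracted and filtered as in the validation phase (variable names are hypothetical):

```python
import numpy as np

def rom(trace):
    """Range of motion (deg) of one joint-angle trace."""
    return np.nanmax(trace) - np.nanmin(trace)

# kfe_left / kfe_right: lists of per-trial KFE traces for one participant
# (two trials each in the pilot study).
rom_l = np.mean([rom(t) for t in kfe_left])
rom_r = np.mean([rom(t) for t in kfe_right])
mean_l = np.mean([np.nanmean(t) for t in kfe_left])
mean_r = np.mean([np.nanmean(t) for t in kfe_right])
print(f"KFE ROM  left/right: {rom_l:.1f} / {rom_r:.1f} deg")
print(f"KFE mean left/right: {mean_l:.1f} / {mean_r:.1f} deg")
```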
3 Results
3.1 Validation Phase
To provide a comprehensive overview, the correlation results are summarized in Table 1, and the RMSE and %RMSE values for the different conditions are shown in Fig. 3, categorized by upper and lower limb. The RMSE values for the lower limb are below \(20^{\circ }\), with higher values for the hip joint. For the upper limb, RMSE values above \(20^{\circ }\) are observed for the elbow joint, while for the shoulder movement the RMSE is around \(13^{\circ }\). In terms of percentage RMSE, the results are around 15% for the KFE and around 17% for the HFE, while for the upper limb values around 20% are noted for both gestures. The statistical indices show very similar values between the upper and lower limbs, ranging from 0.69 to 0.76 for the ICC and from 0.59 to 0.72 for Spearman's coefficient. For the B&A analysis, the bias was used as the indicator, showing values that are on average less than \(10^{\circ }\) in absolute value. For illustrative purposes, a graphical comparison between the output of the MediaPipe-based algorithm and that of Xsens is presented in Fig. 4.
3.2 Pilot Study
The mean values of the analyzed angles were computed for each pathological subject and task by averaging each angle over time. Table 2 reports these values further averaged across the two subjects and all trials. In addition, Fig. 5, which compares the pathological limb (Fig. 5a) with the healthy limb (Fig. 5b), is shown as an example for the KFE angle.
4 Conclusion and Future Developments
This study effectively combines VR with markerless motion analysis, overcoming potential interference issues and integrating quantitative measurement into exercise execution. In the validation phase, the MediaPipe-based algorithm showed good fidelity in reproducing the angular trends over time for all four gestures analysed. Errors of less than \(20^{\circ }\) and around 16% for KFE and HFE are consistent with literature data and confirm overall positive algorithm performance despite potential limitations such as interference with the VR system.

On the other hand, the results indicate greater difficulty in acquiring upper limb movements than lower limb movements, as confirmed by the RMSE and %RMSE data, with slightly worse values for EFE and SAA. Looking at the absolute RMSE values alone may not be meaningful here, given the difference in ROM between the two exercises, but in both cases an error of around 20% is observed, indicating a greater challenge in acquiring upper limb data. This is related to the fact that upper limb movements are performed at a higher average speed and are often harder to record due to occlusion by other parts of the body, such as the trunk. The most significant data are the correlation indices, with average values above 0.7 for the ICC and 0.6 for Spearman's coefficient, highlighting faithful reproduction of the angular patterns for all four gestures analysed. Both indices tend to improve in the transition from the distal joints (KFE and EFE) to the corresponding proximal ones (HFE and SAA), which can be linked to the construction method of the proximal angles, which allows greater reliability. The bias values are also interesting: they are low in both cases, indicating similarity between the signals from the two systems. However, on average, the values are positive for the lower limb and negative for the upper limb, suggesting a change in the system's behaviour: the system slightly overestimates lower limb angles, while for the upper limb it shifts to underestimating the actual value, which could be due to the camera being positioned around hip level. Furthermore, considering all the analysed indices, the data show no significant differences when moving from one side to the corresponding contralateral side, indicating strong robustness of the system.

With regard to the pilot study, the results were evaluated on the basis of the average angle over time and the graphical representation. The accuracy of the system was supported by visibly cyclic trends and by differences in mean and maximum values between the pathological and healthy limbs. The healthy limb showed higher average values and a wider ROM, underscoring the system's ability to effectively discriminate between limb performances. This represents a significant advance and offers promising opportunities for the development of additional rehabilitative exercises. This phase aimed to evaluate the suitability of the algorithm under less controlled conditions, laying the groundwork for future clinical applications. Future research could expand sample sizes and study durations to evaluate the long-term reliability of the system and its effectiveness in monitoring rehabilitation in different conditions.
Integrating data from multiple cameras, resolving occlusion, and extending the analysis beyond a single viewing plane could be an important advancement, giving operators more flexibility in selecting rehabilitative movements. The present study did not examine the influence of the fixed camera position on the algorithm's performance as participants moved; employing a mobile camera for human tracking could yield improvements by further mitigating occlusion. In conclusion, this work highlights the potential of the system, particularly in providing patients with greater freedom of movement in virtual environments, while reliably reproducing motor gestures and obtaining kinematic data. The basis has been established for an essential tool in clinical rehabilitation; however, for widespread adoption, updates are needed to reduce, and ideally eliminate, the gap with the current gold standards.
References
Borghese, N.A., Bianchi, L., Lacquaniti, F.: Kinematic determinants of human locomotion. J. Physiol. 494(3), 863–879 (1996)
Mündermann, L., Corazza, S., Andriacchi, T.P.: The evolution of methods for the capture of human movement leading to markerless motion capture for biomechanical applications. J. Neuroeng. Rehabil. 3, 1–11 (2006)
Chung, J.L., Ong, L.Y., Leow, M.C.: Comparative analysis of skeleton-based human pose estimation. Future Internet 14(12), 380 (2022)
Armstrong, K., Zhang, L., Wen, Y., Willmott, A.P., Lee, P., Ye, X.: A marker-less human motion analysis system for motion-based biomarker identification and quantification in knee disorders. Front. Digit. Health 6, 1324511 (2024)
Hii, C.S.T., et al.: Automated gait analysis based on a marker-free pose estimation model. Sensors 23(14), 6489 (2023)
Nijmeijer, E.M., Heuvelmans, P., Bolt, R., Gokeler, A., Otten, E., Benjaminse, A.: Concurrent validation of the Xsens IMU system of lower-body kinematics in jump-landing and change-of-direction tasks. J. Biomech. 154, 111637 (2023)
Rose, T., Nam, C.S., Chen, K.B.: Immersion of virtual reality for rehabilitation-review. Appl. Ergon. 69, 153–161 (2018)
Karatsidis, A., et al.: Validation of wearable visual feedback for retraining foot progression angle using inertial sensors and an augmented reality headset. J. Neuroeng. Rehabil. 15(1), 1–12 (2018)
Settimo, C., et al.: Virtual reality technology to enhance conventional rehabilitation program: results of a single-blind, randomized, controlled pilot study in patients with global developmental delay. J. Clin. Med. 12(15), 4962 (2023)
De Luca, R., et al.: Improvement of brain functional connectivity in autism spectrum disorder: an exploratory study on the potential use of virtual reality. J. Neural Transm. 128, 371–380 (2021)
Kim, J.W., Choi, J.Y., Ha, E.J., Choi, J.H.: Human pose estimation using MediaPipe pose and optimization method based on a humanoid model. Appl. Sci. 13(4), 2700 (2023)
Tong, K., Granat, M.H.: A practical gait analysis system using gyroscopes. Med. Eng. Phys. 21(2), 87–94 (1999)
Jarque, C.M., Bera, A.K.: A test for normality of observations and regression residuals. Int. Stat. Rev. 55(2), 163–172 (1987)