1 Introduction

Exoskeleton robots are wearable mechanical devices equipped with sensors, motors, and control systems that allow them to interact with the human body. Their primary purposes are to reduce the load on the user, provide support, protect the joints, increase strength, and facilitate walking. They are widely used in the medical, rehabilitation, military, and industrial domains to give users a safer and more effective work and life experience. Exoskeleton robots can be roughly divided into three categories: assistive exoskeleton robots [1, 2], weight-bearing exoskeleton robots [3, 4], and rehabilitation exoskeleton robots [5, 6]. Lower limb assistive exoskeletons can improve human mobility by supplying adequate external assistive moments at the lower limb joints, preventing injuries, lowering energy consumption, and satisfying the body’s needs for mobility and support [7]. To move cooperatively with the human body, an exoskeleton robot must sense and determine the body’s motion state and adjust its own mechanical structure and movements to provide appropriate assistance and support. Effective human-machine synergy therefore relies on sensing technologies, motion recognition algorithms, and real-time control systems that keep the exoskeleton in harmony with the user’s natural movements. The key technology for lower limb exoskeletons is the ability to recognize the wearer’s motion intent and to control the exoskeleton’s motion accordingly [8]. Because recognizing human motion intent is a prerequisite for exoskeletons to aid human movement, the intent must be estimated first to ensure that the exoskeleton does not obstruct human motion [9, 10].

The exoskeleton usually assists the human body by adjusting its output torque according to the predicted gait phase. Accurate gait phase recognition is required to drive the lower limb exoskeleton robot in line with the human movement state, to achieve human-machine coordinated movement, and to ensure the safety of the wearer. Gait phase prediction research is therefore critical.

Human walking is a continuous, regular, and periodic movement. A complete gait cycle can be divided into the stance and swing phases. The stance phase primarily comprises heel strike, toe-off, and heel-off and constitutes approximately 60% of the gait cycle, while the swing phase makes up about 40%. This two-phase division is relatively straightforward, but it may reduce recognition precision and does not allow accurate control of exoskeleton systems. The most widely used method is a four-phase recognition technique comprising heel-strike (HS), toe-strike (TS), toe-off (TO), and swing-phase midpoint (SM). This finer division conveys more adequate and effective information, allowing the exoskeleton system to recognize gait phases and achieve precise control. Gait phase prediction has been performed using a range of wearable sensors, including foot switches, pressure insoles, and IMUs [11,12,13], and the use of inertial measurement units (IMUs) to build motion sensing systems for exoskeletons is becoming more common.

However, existing gait phase recognition methods neither explicitly consider the interrelations among joints nor effectively explore spatial and temporal patterns. Moreover, general temporal convolution has difficulty capturing the periodic information of a time series, because it applies only local convolution operations through a sliding window. In addition, different channels of the gait data collected by IMUs contribute differently to gait phase prediction.

To solve these problems, Auto-Correlation and Channel Attention enhanced Deep Graph Convolutional Networks (ACCA-DGCN) are proposed for gait phase recognition. The method builds spatial-temporal graph convolutional networks based on the human skeleton to explore the relationships among the joints in both the spatial and the temporal dimensions. An Auto-Correlation layer is added before the spatial and temporal convolutions to capture time-sequence features more comprehensively. Since the input data are organized into multi-dimensional tensors with different channels, channel attention enables the model to focus more on the channels that contribute most to gait phase recognition. The proposed gait phase prediction model captures periodic temporal features more accurately and handles multi-channel input data more efficiently, and it is therefore able to recognize human motion intentions effectively for exoskeletons.

The main contributions of our work can be summarized as follows: (1) We developed human gait data acquisition equipment based on IMUs and created a gait dataset of human walking for gait phase prediction. (2) We proposed skeleton-based Auto-Correlation and channel attention enhanced Deep Graph Convolutional Networks to predict the gait phase. (3) The proposed model achieves average prediction accuracies of 92.26% and 97.21% in user-independent and user-dependent experiments, respectively, which is better than the five other algorithms considered.

The rest of this paper is organized as follows: Section 2 describes different methods for gait phase recognition. Section 3 explains the dataset and the proposed ACCA-DGCN method. Section 4 presents the experiments and results, while Section 5 discusses the results. Section 6 presents the conclusion and future work.

2 Related Works

Three types of classification algorithms are used for gait phase detection: threshold-based methods (TBM), machine learning methods (MLM), and deep learning methods (DPM).

TBM classifies gait phases by manually setting thresholds on features derived from raw data [14]. Kim et al. [15] proposed a Foot–Ground Contact Detection (FGCD) algorithm based on inertial measurement units (IMUs). Bejarano et al. [11] presented an adaptive algorithm for detecting gait events based on inertial and magnetic sensors. This algorithm performed well in both healthy people and patients with gait pathology. Seel et al. [16] proposed a method for detecting gait events in real time using accelerometer and gyroscope sensors. The algorithm was based on setting precise foot acceleration and angular velocity thresholds for each gait event.

With the development of machine learning and deep learning, related methods have gradually been applied in the medical field. Svendsen et al. [17] created a dataset comprising 24,300 images of 27 Norwegian Sign Language letters for training machine learning models, significantly enhancing communication accessibility for the deaf and hard-of-hearing. Ibrahim et al. [18] comprehensively discussed artificial intelligence (AI) approaches to Parkinson’s disease diagnosis, including many deep learning and machine learning methods that have been deployed. These machine and deep learning methods have greatly facilitated research in the medical field.

Machine and deep learning methods are also widely used in gait recognition. Compared to TBM algorithms, data-driven machine learning algorithms perform significantly better: they use a large amount of data for the classification task, which reduces the need to manually create meaningful features. Several machine learning methods, including K-Nearest Neighbor [19], Decision Trees [20], Bayesian Network Classifiers [21], and Linear Discriminant Analysis [22], have been widely used in gait phase prediction applications and are equally suitable for distinguishing gait phases [23]. By learning from and modelling the training data, these machine learning methods can learn the patterns and features of gait phases from the input data. The trained models can then be used to classify new gait data and determine the current gait phase. Deep learning is a newer research direction in the field of machine learning, which has produced numerous results in search technology, data mining, machine translation, natural language processing, multimedia learning, speech, recommendation and personalization techniques, and other related fields. Su et al. [24] proposed a Deep Convolutional Neural Network (DCNN), based on IMU sensors and plantar pressure sensors, for recognizing five phases of the gait cycle. Trained on a large amount of data, the DCNN automatically extracts and learns the features of the different phases of the gait cycle, resulting in accurate recognition of gait phases. Furthermore, Wu et al. [25] proposed a Graph Convolutional Network Model (GCNM) for gait phase classification and compared it to Long Short-Term Memory (LSTM) and DCNN. The experimental results showed that GCNM had the highest accuracy in gait phase classification, demonstrating the model’s effectiveness.

Su et al. [26] proposed a long short-term memory (LSTM)-based network, a modified version of recurrent neural networks, which can learn order dependence in sequence prediction problems. The algorithm proposed by Su incorporates a weighted discount loss function that places more weight in predicting the next 3–5 time frames while also enhancing the overall predictive performance for up to 10 time frames. Zhang et al. [27] proposed an integrated network model SBLSTM that combined sparse autoencoder (SAE), bidirectional long short-term memory (BiLSTM) and deep neural network (DNN) aiming at gait phase recognition during human movement.

Table 1 shows the performance of the different deep learning algorithms mentioned above.

Table 1 Performance of different deep learning algorithms

However, these gait phase prediction methods do not effectively utilize the interconnection among different joint motion data in spatial and temporal dimensions to extract the characteristics of human gait phases.

Graph Convolutional Networks (GCN) are a subclass of graph neural networks [28] that are inspired by Convolutional Neural Networks (CNN), but in GCN the convolution operation is extended from regular lattices to graph structures. ST-GCN [29], a spatio-temporal convolutional network-based algorithm, has been widely used in the field of video-based action recognition. This method detects the human skeleton using OpenPose or other pose estimation methods, and then builds spatio-temporal convolutional networks for action recognition based on the human skeleton model. In video-based action recognition, ST-GCN can capture the relationships among joints to learn both the spatial and temporal patterns.

Deep Graph Convolutional Networks (DGCN) [30] were developed for wind speed prediction. The DGCN algorithm treats weather stations as graph nodes: graph convolutional networks are constructed by adding self-cycling connections to the learnable adjacency matrix, and temporal convolutional networks are constructed based on historical data. The DGCN model achieved more accurate predictions than previously developed baseline models.

Incorporating channel attention mechanism into convolutional blocks has recently piqued the interest of many researchers and shows great promise in terms of performance improvement. SE-Net [31] proposed and demonstrated an efficient mechanism for learning channel attention for the first time. Efficient Channel Attention, proposed by Wang et al. [32], is an efficient channel attention mechanism used in deep convolutional neural networks. Wu et al. [33] proposed an Auto-Correlation mechanism for long-term sequence prediction. Auto-Correlation discovers cycle-based relationships, gathers related subsequences from the underlying cycles, and enables asymptotic decomposition of complex time series.

The proposed ACCA-DGCN method applies Auto-Correlation and channel attention mechanisms to strengthen the capability of DGCN in capturing temporal and spatial features, and handling complex multi-channel time series.

3 Methods and Data

3.1 Exoskeleton Robot for Lower Limb Assistance

Figure 1 depicts the schematic structure of the lower limb-assisted exoskeleton robot studied in this paper. The exoskeleton robot is designed with a rigid mechanical support structure, and IMU sensors are placed on the thighs, calves, and waist of the exoskeleton robot to collect the human body’s gait motion information, laying the data foundation for human gait phase prediction. In addition, motors of the exoskeleton robot are arranged in the hip joints, acting as joint actuators, with the purpose of providing additional assisting torque to the knee joint.

Fig. 1 Robot with lower limb-assisted exoskeleton

3.2 Human Gait Data Collection Equipment

Human gait data collection equipment was developed to collect human gait data for gait phase prediction; it is shown in Fig. 2. The equipment consists of a flexible strap structure at the legs and waist that is outfitted with IMU sensors to collect motion data, namely three-axis angle, three-axis acceleration, and three-axis angular velocity. A co-processor placed at the waist integrates the data and sends them to a mobile device via Bluetooth. This collection equipment can provide consistent experimental data for gait phase prediction.

Fig. 2 Human gait data collection equipment

3.3 Collection of Human Gait Data

To recognize the gait phase, gait data of human walking were collected using the human gait data collection equipment. Four volunteers were invited to participate in walking gait data acquisition at a sampling frequency of 100 Hz, and a total of five sets of data were collected, referred to as datasets 1, 2, 3, 4, and 5. Datasets 1, 2, 3, and 4 contain gait data from four different volunteers, whereas dataset 5 contains gait data recorded a second time from one of these volunteers after re-wearing the acquisition equipment. The gait datasets 1–5 are described in Table 2; 20 s of data were collected for each dataset, giving 2000 samples. Each dataset contains the three-axis angles, three-axis accelerations, and three-axis angular velocities of the thigh, calf, and waist, which serve as the foundation for the following experiments.

Table 2 Description of gait datasets

The human body coordinate system is defined as shown in Fig. 3, with the reference side of the human body as the X-axis, the direction of motion parallel to the sagittal plane of the human body as the Y-axis, and perpendicular to the sagittal plane of the human body as the Z-axis.

Fig. 3 Schematic diagram of human body coordinate system of IMUs

3.4 Gait Phase Prediction Based on ACCA-DGCN

The experimental process of the ACCA-DGCN algorithm in this paper is depicted in Fig. 4, which includes collecting gait data, constructing the human skeleton model and spatio-temporal graph, building recognition model of gait phase, and finally predicting the gait phase.

Fig. 4 ACCA-DGCN flowchart

The following steps are involved in the work on gait phase prediction: first, gait data of walking are collected by wearing collection equipment of human gait data. The gait data are then used to create a skeleton model of lower limb and spatio-temporal graph convolutional networks for gait phase prediction. Following that, experiments are carried out to determine the optimal window size for improving the recognition accuracy of ACCA-DGCN algorithm. Finally, user-independent and user-dependent experiments are performed and the proposed algorithm is compared with other gait phase prediction algorithms to assess the ACCA-DGCN algorithm’s performance. This procedure ensures the integrity of the gait data collection, model development, and experimental validation, with the goal of improving the robustness and applicability of gait phase prediction model.

Based on the natural connection relationship of human joints corresponding to five IMUs, the skeleton graph for gait phase prediction is constructed. The skeleton sequence is then constructed as a spatio-temporal graph and input into the ACCA-DGCN model. As shown in Fig. 5, five IMUs are worn on the waist (1), left thigh (2), right thigh (3), right calf (4), and left calf (5), and the skeleton model is established for gait phase prediction according to the natural connectivities of the human lower limb structure.

Fig. 5 IMU placement

In terms of data preparation for model input, IMU data are organized as input data of the ACCA-DGCN model to better perform the operation of spatial and temporal convolution. The input of the model has a shape of (C, V, T), where C represents the number of channels (different types of data collected by IMUs), V represents the number of vertices (human joints) and T represents timestep (window size). The output of the model has a shape of (M, N), where M is the number of gait phases, which is set to 4 in experiments, and N is the number of future time frames. In the experiments, gait data of five joints with angles, angular velocities, and accelerations along the x, y, and z axes were used as input data with nine different channels, and an appropriate window size was chosen.
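The windowing described above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function names `to_cvt` and `make_samples`, the frame layout `frames[t][v][c]`, the use of integer class labels in place of the (M, N) output tensor, and the synthetic data are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def to_cvt(window_frames: Sequence[Sequence[Sequence[float]]]) -> List[List[List[float]]]:
    """Rearrange T frames of shape (V, C) into a nested list of shape (C, V, T)."""
    num_t = len(window_frames)
    num_v = len(window_frames[0])
    num_c = len(window_frames[0][0])
    return [
        [[window_frames[t][v][c] for t in range(num_t)] for v in range(num_v)]
        for c in range(num_c)
    ]


def make_samples(frames, phases, window=90, horizon=10, shift=1) -> List[Tuple[list, list]]:
    """Pair each (C, V, T) window with the phase labels of the next `horizon` frames."""
    samples = []
    last_start = len(frames) - window - horizon
    for start in range(0, last_start + 1, shift):
        x = to_cvt(frames[start:start + window])
        y = list(phases[start + window:start + window + horizon])
        samples.append((x, y))
    return samples


# Synthetic recording: 120 frames, 5 joints, 9 channels (each value encodes its indices).
frames = [[[t * 1000 + v * 10 + c for c in range(9)] for v in range(5)] for t in range(120)]
phases = [t % 4 for t in range(120)]  # placeholder labels 0..3, not real gait phases

samples = make_samples(frames, phases)
x0, y0 = samples[0]
print(len(samples), len(x0), len(x0[0]), len(x0[0][0]), len(y0))  # 21 9 5 90 10
```

With T = 90 and N = 10, each sample needs 100 consecutive frames, so a recording of 120 frames yields 21 samples at a shift of 1.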

Graph Convolutional Network (GCN) is a subclass of graph neural networks [28] inspired by Convolutional Neural Networks (CNN), but in GCN the convolution operation is extended from regular grids to graph structures. In particular, GCN captures the dependence among joints via message passing between the nodes of the graph, which makes it possible to understand and exploit the links between nodes.

In spatial convolution, a graph structure represents the skeleton model, and the adjacency matrix \(\widehat{A}\) serves as the expression of the graph [30]. The diagonal node degree matrix \(\widehat{D}\) is computed from the adjacency matrix: \({\widehat{D}}_{ii}=\sum_{j}{\widehat{A}}_{ij}\). Next, symmetric normalization is applied: \({\widehat{D}}^{-\frac{1}{2}}\widehat{A}{\widehat{D}}^{-\frac{1}{2}}\). \({X}_{\text{in}}\) denotes the input data and \({X}_{\text{out}}\) denotes the data after the graph convolution, obtained by multiplying \({X}_{\text{in}}\) by the normalized \(\widehat{A}\). \({X}_{\text{out}}\) is calculated as:

$${X}_{\text{out}}={X}_{\text{in}}\left({\widehat{D}}^{-\frac{1}{2}}\widehat{A}{\widehat{D}}^{-\frac{1}{2}}\right).$$
(1)
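As a worked illustration of Eq. (1), the sketch below builds the normalized adjacency matrix for a five-node lower limb skeleton and applies it to a one-channel input. The edge list follows the natural connections waist–thigh and thigh–calf, but the 0-based node order, the added self-loops, and the function names are assumptions for this example; the learnable update of the adjacency matrix during training is not shown.

```python
import math

# Assumed 0-based node order: waist, left thigh, right thigh, right calf, left calf.
EDGES = [(0, 1), (0, 2), (2, 3), (1, 4)]


def normalized_adjacency(edges, num_nodes):
    """Return D^(-1/2) A D^(-1/2) for an undirected graph with self-loops."""
    a_hat = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        a_hat[i][j] = a_hat[j][i] = 1.0
    for i in range(num_nodes):
        a_hat[i][i] = 1.0  # assumed self-loop so that each node keeps its own features
    degree = [sum(row) for row in a_hat]  # D_ii = sum_j A_ij
    return [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def graph_conv(x_in, a_norm):
    """Compute X_out = X_in @ A_norm; x_in has one row per channel, one column per joint."""
    num_nodes = len(a_norm)
    return [
        [sum(row[k] * a_norm[k][j] for k in range(num_nodes)) for j in range(num_nodes)]
        for row in x_in
    ]


a_norm = normalized_adjacency(EDGES, 5)
# A unit signal at the waist spreads equally to the waist and both thighs.
x_out = graph_conv([[1.0, 0.0, 0.0, 0.0, 0.0]], a_norm)
print([round(value, 3) for value in x_out[0]])  # [0.333, 0.333, 0.333, 0.0, 0.0]
```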

By multiplying the input data by the adjacency matrix, and by making the adjacency matrix learnable, spatial features are extracted from the skeleton sequence. The values 0 and 1 in the adjacency matrix indicate whether two graph vertices are connected. At the beginning of training, the adjacency matrix is set according to the natural connections of the skeleton. Because the adjacency matrix is learnable, its values move between 0 and 1 during training, which means the spatial convolutional network can learn the spatial connections of the joints in the skeleton graph by itself. In other words, making the adjacency matrix learnable allows the graph topology to be altered dynamically during training to better reflect the characteristics of the gait data, thereby improving the performance of the model.

In temporal convolutional aggregation, the temporal neighbor information of each node in the network is used to create a more comprehensive representation of temporal features [30]. For each individual node, information from the previous and/or next time step is aggregated. The temporal convolution is implemented as a standard two-dimensional convolution with a filter size of k × 1. With a convolution filter of size 3 × 1, the evolution of each node at the current time step is captured together with the preceding and next time steps. This temporal convolution architecture enables the model to better utilize information in the temporal dimension.
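The effect of a 3 × 1 temporal filter can be illustrated with a small sketch in which each joint's series is filtered independently along time. The function name `temporal_conv`, the zero padding at the sequence ends, and the all-ones kernel are assumptions for illustration; in the network the kernel weights are learned and the operation runs over all channels.

```python
def temporal_conv(x, kernel):
    """Filter every joint's time series with an odd-length kernel (zero-padded, same length).

    x[v][t] holds one channel of data for joint v at time step t.
    """
    half = len(kernel) // 2
    out = []
    for series in x:
        padded = [0.0] * half + list(series) + [0.0] * half
        out.append([
            sum(w * padded[t + i] for i, w in enumerate(kernel))
            for t in range(len(series))
        ])
    return out


x = [[1.0, 2.0, 3.0, 4.0],   # joint 0
     [0.0, 1.0, 0.0, 1.0]]   # joint 1
# Each output mixes the previous, current, and next time step of the same joint.
print(temporal_conv(x, [1.0, 1.0, 1.0]))  # [[3.0, 6.0, 9.0, 7.0], [1.0, 1.0, 2.0, 1.0]]
```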

ACCA-DGCN is composed of Graph Convolution Network and Temporal Convolution Network (GCN_TCN) blocks and various functional layers. GCN_TCN is a temporal graph convolution block that comprises an Efficient Channel Attention layer, a GCN block (Graph Convolution Network), a TCN block (Temporal Convolution Network), and several functional layers. Figure 6 depicts the general structure of the GCN_TCN network.

Fig. 6 Overall structure of the GCN_TCN network

Figure 7 depicts the overall network architecture used in this paper. The architecture employs ten GCN_TCN blocks and includes an Auto-Correlation layer before the GCN_TCN blocks. The model’s input consists of nine channels, and after the ten stacked graph-temporal convolutional blocks, the number of output channels is reduced from 64 to 2 by the Conv layer before flattening. A Fully Connected (FC) layer at the end of the network predicts the gait phase. During the training phase, the batch size is set to 64, the Adam optimizer is used, and the learning rate is set to 0.001. The number of input time steps (T) is set to 90. The output data has a shape of (4, N), where N represents the number of future time frames to predict, which is set to 10. By combining an Auto-Correlation layer, Efficient Channel Attention, and graph-temporal convolutional blocks with appropriate channel processing, this architecture is intended to capture the temporal and spatial information in gait data efficiently and thus improve the accuracy of gait phase prediction.

Fig. 7 Overall architecture of the network model

3.5 Auto-Correlation and Channel Attention Mechanisms

Auto-Correlation and channel attention mechanisms have been demonstrated to be useful techniques for improving neural network performance. Auto-Correlation computes the auto-correlation of a sequence and captures similar subsequences across temporal delays to uncover cycle-based correlations [33]. Because of the inherent sparsity of Auto-Correlation and its cascading representation of subsequences, both computing efficiency and information utilization are improved. Efficient Channel Attention (ECA) is a lightweight channel attention module that is particularly useful for deep convolutional neural networks. By introducing an attention mechanism on the channel dimension, it helps the network focus on the channels relevant to the task and thereby enhances feature representation. Gait data are periodic time series, and different channels of gait data contribute differently to gait phase prediction. Therefore, to capture the periodicity of the time series efficiently and to handle multi-channel gait data better, Auto-Correlation and Efficient Channel Attention are added to the DGCN to enhance the performance of gait phase prediction.

The Auto-Correlation Function (ACF) measures how a random signal correlates with itself over different time intervals. Essentially, ACF performs a “cross-correlation” on the signal with itself, highlighting the correlation of the sequences at various time points. By utilizing the inherent meaning of the ACF, it can be employed to detect repeating patterns within the signal, such as identifying the period of a periodic signal hidden within noise. Auto-Correlation Function can be written as:

$$R_{x,x}\left(n\right)=\sum_{m=-\infty}^{+\infty}x\left(m\right)x\left(m+n\right),$$
(2)

where \({R}_{x,x}\left(n\right)\) is the autocorrelation coefficient at lag \(n\), and \(x\left(m\right)\) and \(x\left(m+n\right)\) represent the signal values at time \(m\) and time \(m+n\), respectively.
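To make the use of the ACF for period detection concrete, the sketch below evaluates a finite-sample form of the autocorrelation sum, with m running only over the available samples, and picks the lag with the largest value inside a plausible search range. It is a direct summation on a synthetic sine wave, not the Auto-Correlation layer of [33]; the function names, the 100-sample period, and the search range of 50 to 150 samples are assumptions for illustration.

```python
import math


def autocorrelation(x, lag):
    """Finite-sample autocorrelation: sum of x(m) * x(m + lag) over all valid m."""
    return sum(x[m] * x[m + lag] for m in range(len(x) - lag))


def dominant_lag(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))


# Synthetic periodic signal: 4 s sampled at 100 Hz with a 1 s (100-sample) period.
period = 100
signal = [math.sin(2 * math.pi * m / period) for m in range(400)]

print(dominant_lag(signal, 50, 150))  # 100
```

Restricting the search range matters: very small lags always correlate strongly with the signal itself, so an unrestricted search would return a lag near zero rather than the period.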

Since the input data for gait phase prediction are multi-channel time series, extracting features by convolution in the temporal dimension is an important step in gait phase prediction. However, general temporal convolution has difficulty capturing the periodic information of a time series, because it applies only local convolution operations through a sliding window. Auto-Correlation measures how similar a time series is to itself at different time lags. Gait data are usually periodic, and each phase of the gait cycle is repeated within a fixed time interval. Therefore, Auto-Correlation is added to the DGCN model to capture and quantify the periodic features of gait data effectively.

ECA is a channel attention mechanism that combines fast computation with strong performance. ECA first applies global average pooling to the input data and then performs a one-dimensional convolution on the resulting channel descriptor, where the convolution kernel size k is determined adaptively [32]. The one-dimensional convolution captures local relationships between channels and generates an attention weight for each channel (Fig. 8).

Fig. 8 Schematic of Efficient Channel Attention module [32]
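The ECA computation can be sketched in a few lines: global average pooling per channel, a one-dimensional convolution across neighbouring channels, a sigmoid, and channel-wise rescaling. This is a simplified illustration under stated assumptions, not the module from [32]: the kernel is passed in as a fixed list (in the network its weights are learned and its size is chosen adaptively), the channel axis is zero-padded, and the names `eca_weights` and `apply_eca` are hypothetical.

```python
import math


def eca_weights(x, kernel):
    """Compute one attention weight per channel for an input x[c][v][t]."""
    # Global average pooling: one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # 1-D convolution across channels (zero-padded) captures local cross-channel interaction.
    half = len(kernel) // 2
    padded = [0.0] * half + pooled + [0.0] * half
    logits = [
        sum(w * padded[c + i] for i, w in enumerate(kernel))
        for c in range(len(pooled))
    ]
    # Sigmoid maps each logit to a weight in (0, 1).
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def apply_eca(x, weights):
    """Rescale every channel of x by its attention weight."""
    return [[[w * value for value in row] for row in ch] for ch, w in zip(x, weights)]


# Three channels, two joints, two time steps; only the middle channel carries signal.
x = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[4.0, 4.0], [4.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
weights = eca_weights(x, kernel=[0.25, 0.5, 0.25])  # placeholder kernel, not learned
scaled = apply_eca(x, weights)
print([round(w, 3) for w in weights])  # [0.731, 0.881, 0.731]
```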

When gait data are collected, the IMUs capture the triaxial angle, triaxial angular velocity, and triaxial acceleration, which are used to identify the gait phase. Among the triaxial data, however, the sagittal plane data best reflect human lower limb gait. Therefore, we added a channel attention mechanism to the DGCN model to assign different weights to different channels, so that the model pays more attention to the sagittal plane data while still extracting features from the other channels.

4 Experimental Results

4.1 Experimental Setup

The ACCA-DGCN algorithm was tested on walking gait data, first to examine the effect of window size on its performance and then to compare it with DGCN and four deep learning algorithms, namely CNN, recurrent neural networks (RNN) [34], temporal convolutional networks (TCN) [35], and LSTM [36], in order to demonstrate its superiority for gait phase prediction. Two types of experiments were conducted to validate the performance of the gait phase prediction model based on ACCA-DGCN. (1) User-independent experiments: the six algorithms are compared using datasets 1, 2, and 3 (the gait data of three volunteers) as the training set and dataset 4 (the gait data of a fourth volunteer) as the test set. (2) User-dependent experiments: the six algorithms are compared using dataset 3 as the training set and dataset 5 (the same volunteer after re-wearing the equipment) as the test set.

In this paper, the training and testing platform of the gait phase prediction model was built on a Windows 10 operating system, mainly implemented using Python and Torch. Table 3 shows the specific configuration information of the experimental environment.

Table 3 Experimental environment configuration

For the experiments, the performance of gait phase prediction algorithms was evaluated using four evaluation metrics: accuracy, precision, recall, and F1 score, as shown in Eqs. (3)–(6); where TP and TN are the true positives and true negatives, and FP and FN are the false positives and false negatives.

$$\text{Accuracy}=\frac{\text{TP}+\text{TN}}{\text{TP}+\text{FN}+\text{TN}+\text{FP}}\times 100\text{\%},$$
(3)
$$\text{Precision}=\frac{\text{TP}}{\text{TP}+\text{FP}}\times 100\text{\%},$$
(4)
$$\text{Recall}=\frac{\text{TP}}{\text{TP}+\text{FN}}\times 100\text{\%},$$
(5)
$${F}_{1}\text{-Score}=\frac{2\times \text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}}\times 100\text{\%}.$$
(6)
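For a multi-class problem such as four-phase gait recognition, these metrics can be computed per phase from a confusion matrix by treating each phase in turn as the positive class. The sketch below is an illustration only: precision is taken in its standard form TP / (TP + FP), the convention that rows are true labels and columns are predictions is an assumption, the function names are hypothetical, and the small matrix is made-up data rather than a result from the experiments. How the per-phase values are averaged in the tables is not specified here.

```python
def per_class_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k; rows are true labels, columns are predictions."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = sum(sum(row) for row in cm) - tp - fn - fp
    return tp, fp, fn, tn


def class_metrics(cm, k):
    """Accuracy, precision, recall, and F1 score of class k, all in percent."""
    tp, fp, fn, tn = per_class_counts(cm, k)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    precision = 100.0 * tp / (tp + fp)
    recall = 100.0 * tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up 3-class confusion matrix used purely to exercise the formulas.
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = class_metrics(cm, 0)
print({name: round(value, 2) for name, value in metrics.items()})
# {'accuracy': 90.0, 'precision': 88.89, 'recall': 80.0, 'f1': 84.21}
```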

4.2 Window Size Experiment

The window size has a substantial impact on the experimental results when the data are windowed. Within a certain range, increasing the window size allows the window to include more of the periodicity and correlation of the gait phase data, which improves the recognition of phase features. To determine the optimal window size, experiments were conducted on the effect of window size on the performance of the ACCA-DGCN algorithm while keeping the other parameters constant. The IMU sensors employed in this experiment have a sampling frequency of 100 Hz, a normal person walks between 90 and 120 steps per minute, and a gait cycle lasts about 1 s. Window sizes of 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, and 180 were therefore chosen. Dataset 3 was used as the training set and dataset 5 as the test set, with the shift size set to 1. In the experiment, 90% of the data were used for training and 10% for validation, 500 training sessions were executed each time, and the learning rate was set to 0.0001. The performance of ACCA-DGCN with different window sizes is shown in Table 4.
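The relation between cadence, sampling frequency, and window length can be checked with a short calculation. The sketch assumes that one gait cycle corresponds to two steps (one stride), which is not stated explicitly in the text; the helper names are hypothetical.

```python
def cycle_samples(cadence_steps_per_min, sampling_hz, steps_per_cycle=2):
    """Number of samples in one gait cycle, assuming a cycle spans `steps_per_cycle` steps."""
    cycle_seconds = 60.0 * steps_per_cycle / cadence_steps_per_min
    return cycle_seconds * sampling_hz


def num_windows(num_samples, window, shift=1):
    """Number of sliding windows of length `window` that fit into `num_samples`."""
    return (num_samples - window) // shift + 1


print(cycle_samples(120, 100))            # 100.0 samples at 120 steps per minute
print(round(cycle_samples(90, 100), 1))   # 133.3 samples at 90 steps per minute
print(num_windows(2000, 90))              # 1911 windows in a 2000-sample recording
```

Under this assumption, one gait cycle covers roughly 100 to 133 samples at 100 Hz, so the tested window sizes of 80 to 180 range from somewhat less than one cycle to somewhat more.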

Table 4 Performance of ACCA-DGCN with different window sizes

The performance of ACCA-DGCN varies with the window size: it is best at a window size of 90, with an average accuracy of 97.21%, and worst at a window size of 80, with an average accuracy of 94.59%. At window sizes of 80, 100, 110, 120, 130, 140, 150, 160, 170, and 180, the average accuracy is 2.62%, 0.44%, 0.78%, 0.92%, 1.39%, 1.69%, 0.63%, 0.91%, 1.04%, and 1.19% lower, respectively, than at a window size of 90.
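The gaps listed above can be converted back into average accuracies by subtracting each from the best value of 97.21%. The snippet only rearranges numbers already given in the text; the variable names are arbitrary.

```python
BEST_ACCURACY = 97.21  # average accuracy (%) at a window size of 90

# Reported accuracy gaps (percentage points) relative to a window size of 90.
GAPS = {80: 2.62, 100: 0.44, 110: 0.78, 120: 0.92, 130: 1.39,
        140: 1.69, 150: 0.63, 160: 0.91, 170: 1.04, 180: 1.19}

accuracy = {90: BEST_ACCURACY}
accuracy.update({window: round(BEST_ACCURACY - gap, 2) for window, gap in GAPS.items()})

worst_window = min(accuracy, key=accuracy.get)
for window in sorted(accuracy):
    print(window, accuracy[window])
print("lowest average accuracy at window size", worst_window)  # 80
```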

4.3 User Independent Experiment

The optimal window size was established in the experiments above. In this section, the ACCA-DGCN algorithm is compared with other algorithms to demonstrate its superior gait phase prediction performance. In the experiments, the original three-axis angles, three-axis accelerations, and three-axis angular velocities are used as inputs to the comparison algorithms, including CNN [22], RNN, TCN, and LSTM [24], while the joint coordinates and skeleton data are used as inputs to the ACCA-DGCN algorithm.

In this section, datasets 1, 2, and 3 are used as the training set and dataset 4 as the test set, and the window size is set to the previously determined optimal value of 90. The experimental results are displayed in Table 5. The comparison with the other methods shows the superior performance of the ACCA-DGCN model for gait phase prediction.

Table 5 User-independent experimental results

Table 5 demonstrates that, among the six algorithms, the average accuracy of the ACCA-DGCN algorithm is 0.61%, 6.36%, 1.81%, 1.21%, and 1.51% higher than CNN, RNN, TCN, LSTM, and DGCN, respectively; the average precision of the ACCA-DGCN algorithm is 1.77%, 10.26%, 3.21%, 1.87%, and 2.1% higher than CNN, RNN, TCN, LSTM, and DGCN, respectively; the average recall of the ACCA-DGCN algorithm is 0.2%, 4.45%, 1.26%, 0.56%, and 1.66% higher than CNN, RNN, TCN, LSTM, and DGCN, respectively; the average F1 score of the ACCA-DGCN algorithm is 1.55%, 11.19%, 3.05%, 1.81%, and 2.03% higher than CNN, RNN, TCN, LSTM, and DGCN, respectively.

4.4 User Dependent Experiment

In this section, user-dependent experiments are carried out with dataset 3 serving as the training set and dataset 5 serving as the test set, and the ACCA-DGCN is compared to the other five algorithms, with the results displayed in Table 6.

Table 6 User-dependent experimental results

According to Table 6, among the six algorithms, the average accuracy of the ACCA-DGCN algorithm is 2.64%, 20.86%, 2.93%, 0.27%, and 0.45% higher than CNN, RNN, TCN, LSTM, and DGCN, respectively; the average precision of the ACCA-DGCN algorithm is 4.75%, 24.93%, 4.39%, 0.48%, and 0.49% higher than CNN, RNN, TCN, LSTM, and DGCN, respectively; the average recall of the ACCA-DGCN algorithm is 3.94%, 22.33%, 3.91%, 0.59%, and 0.69% higher than CNN, RNN, TCN, LSTM, and DGCN, respectively; the average F1 score of the ACCA-DGCN algorithm is 4.39%, 24.16%, 4.17%, 0.53%, and 0.59% higher than CNN, RNN, TCN, LSTM, and DGCN, respectively.

5 Discussion

5.1 Performance Analysis of Models with Different Window Sizes

The performance of the ACCA-DGCN algorithm is influenced by the window size. As can be seen in Table 4, ACCA-DGCN performs worst at a window size of 80, with an average accuracy of 94.59%, and best at a window size of 90, with an average accuracy of 97.21%. To analyze the specific effect of window size more fully, the recognition accuracy of each gait phase under different window sizes was evaluated and plotted (Fig. 9). ACCA-DGCN attains its highest average recognition accuracy at a window size of 90, which implies that selecting an appropriate window size is critical for the performance of the algorithm. The model with a window size of 90 is able to capture the spatio-temporal patterns of the gait phases in this experiment effectively, which helps to increase the overall accuracy. These results demonstrate the sensitivity of algorithm performance to window size selection and provide a reference for choosing the window size in gait phase prediction.

Fig. 9

Accuracy of each gait phase of the ACCA-DGCN models with different window sizes

As shown in Fig. 9, the recognition accuracy of ACCA-DGCN for the HS (Heel Strike) phase remains high across window sizes, usually above 96%, and is largely unaffected by changes in window size. In contrast, the TS (Toe Strike) and TO (Tiptoe Off) phases are affected to some extent by the window size, with lowest recognition accuracies of 93.30% and 96.32%, respectively, although their average recognition accuracies remain above 93%. Overall, the window size has only a minor influence on the recognition accuracy of the HS, TS, and TO phases, and the majority of these accuracies are above 95%.

However, the window size has a significant impact on the SM (Swing Middle) phase, whose recognition accuracy ranges from a low of 68.10% at a window size of 80 to a high of 93.33% at a window size of 100. This implies that the sensitivity of ACCA-DGCN to the window size differs across gait phases.

The confusion matrices for ACCA-DGCN with window sizes of 90 and 100 are depicted in Fig. 10. The average accuracy with a window size of 100 is 0.44% lower than that with a window size of 90. Because the SM phase recognition accuracy changes with the window size, both the average recognition accuracy and the SM phase recognition accuracy should be considered when determining the optimal window size. Comparing window sizes of 90 and 100, only the SM phase recognition accuracy of the former is lower (by 0.61%), while its recognition accuracies for the remaining three phases and its average recognition accuracy are higher.

Fig. 10

Confusion matrices of ACCA-DGCN with different window sizes (left: 90, right: 100)

Overall, the model with a window size of 90 produced the highest average prediction accuracy across gait phases. Its recognition accuracy for the SM phase was lower than that with a window size of 100, but its recognition accuracies for the other three phases and its average recognition accuracy were higher. As a result, 90 is chosen as the optimal window size. An appropriate window size helps to reduce misrecognition by the model.
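To make the role of the window size concrete, the following is a minimal sketch of sliding-window segmentation of a multi-channel recording. It assumes that the window size counts samples and that each window is labelled with the gait phase of its last sample; both are assumptions, and the function name `sliding_windows`, the `stride` parameter, and the toy recording are hypothetical.

```python
from typing import List, Sequence, Tuple

Window = Tuple[List[Sequence[float]], int]


def sliding_windows(samples: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    window_size: int = 90,
                    stride: int = 1) -> List[Window]:
    """Cut a multi-channel recording into fixed-length, overlapping windows.

    samples[t] holds the channel values at time step t and labels[t] the gait
    phase at that step. Labelling a window by its last sample is an assumption.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    windows: List[Window] = []
    for start in range(0, len(samples) - window_size + 1, stride):
        end = start + window_size
        windows.append((list(samples[start:end]), labels[end - 1]))
    return windows


# Toy recording: 100 time steps, 2 channels, 4 phases of 25 steps each.
recording = [[float(t), float(-t)] for t in range(100)]
phases = [t // 25 for t in range(100)]
windows = sliding_windows(recording, phases, window_size=90, stride=5)
print(len(windows), "windows of length", len(windows[0][0]))
```

A larger window gives the model more temporal context per prediction but yields fewer windows and mixes more gait phases into one input, which is one plausible reason why accuracy varies with the window size.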

5.2 Analysis of User Independent Experimental Results

In the user-independent experiments, ACCA-DGCN performed best, with an average recognition accuracy of 92.26%. To further assess the performance of each algorithm, the recognition accuracy of each gait phase was determined, as shown in Fig. 11. The accuracies of ACCA-DGCN in recognizing the HS, TS, and TO phases are all approximately 93% or higher and relatively stable, while the SM phase has the lowest recognition accuracy, 74.57%. Several other deep learning algorithms also achieve good recognition accuracy in gait phase prediction; however, their recognition accuracy for the SM phase is lower than for the other three gait phases.

Fig. 11

The accuracy of four gait phases in different algorithms

ACCA-DGCN had gait phase prediction accuracies of 98.08%, 95.97%, 92.99%, and 74.57% for HS, TS, TO, and SM, respectively. CNN and TCN outperform ACCA-DGCN in HS, TS, and TO gait phase prediction. LSTM is slightly better than ACCA-DGCN in HS and TO phase recognition, while RNN is only slightly better than ACCA-DGCN in TO phase recognition. Compared with DGCN, ACCA-DGCN is better in the HS, TS, and SM phases but worse in the TO phase. Nevertheless, ACCA-DGCN has the highest average accuracy and a higher SM phase recognition accuracy than the other five algorithms. Its SM phase recognition accuracy is 6.53% higher than that of DGCN, which ranks second. This result illustrates that ACCA-DGCN outperforms the other algorithms overall in gait phase prediction.

To further investigate the experimental results of ACCA-DGCN, the confusion matrix of gait phases is shown in Fig. 12. The misrecognition rates of the HS, TS, and TO gait phases are low, while the SM phase has a high misrecognition rate: 18.44% and 6.99% of SM samples are misrecognized as HS and TO, respectively.

Fig. 12

Confusion matrix of gait phase prediction based on ACCA-DGCN in user-independent experiment
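Misrecognition rates such as those quoted for the SM phase are obtained by normalizing each row of the confusion matrix by the number of true samples of that phase. The sketch below illustrates this under the assumption that rows index the true phase and columns the predicted phase. The helper names and all counts are hypothetical; the counts are invented for illustration and are not the values of Fig. 12.

```python
from typing import Dict, List

PHASES = ["HS", "TS", "TO", "SM"]


def row_normalize(cm: List[List[int]]) -> List[List[float]]:
    """Divide each row by its sum, giving per-true-phase prediction rates."""
    rates = []
    for row in cm:
        total = sum(row)
        rates.append([count / total if total else 0.0 for count in row])
    return rates


def misrecognition_rates(cm: List[List[int]], phase: str) -> Dict[str, float]:
    """Fraction of true `phase` samples predicted as each other phase."""
    idx = PHASES.index(phase)
    rates = row_normalize(cm)[idx]
    return {PHASES[j]: rates[j] for j in range(len(PHASES)) if j != idx}


# Invented counts (rows: true phase, columns: predicted phase).
toy_cm = [
    [95, 3, 1, 1],
    [2, 96, 1, 1],
    [1, 2, 94, 3],
    [20, 5, 10, 65],
]
sm_errors = misrecognition_rates(toy_cm, "SM")
print(sm_errors)
```

The diagonal of the row-normalized matrix gives the per-phase recognition accuracy, so the off-diagonal entries in a row sum to that phase's total misrecognition rate.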

The experimental results of these methods reveal that, at the data level, the main reason for gait phase misrecognition is that the acquired data features do not adequately convey the distinctions between different gait phases. Three-axis angle, three-axis acceleration, and three-axis angular velocity data of the lower limbs are used in this paper. For some gait phases, the motion characteristics of the limbs do not differ much, so the data of these phases are similar and difficult to distinguish clearly. Ankle joint data, which might help to further improve gait phase prediction accuracy, were not collected, and their absence may limit the effectiveness of the ACCA-DGCN model. Despite these data-level constraints, the average recognition accuracy of the methods differed significantly on the same dataset, implying that a model’s feature extraction and learning capabilities are also a crucial factor influencing gait phase prediction performance. As a result, designing and optimizing algorithms that take the characteristics of the data into account remains an essential path for improving gait phase prediction accuracy.

5.3 Analysis of User Dependent Experiment Results

A gait phase prediction model with good results was successfully built in the user-independent experiment. Although the generalization of the model has been confirmed by the user-independent experiments, its best achievable performance for an individual user has not been explored. Therefore, a user-dependent experiment was conducted to analyze gait phase recognition performance at the personalized level.

ACCA-DGCN performs well in the user-dependent experiment, with the highest average gait phase prediction accuracy of 97.21%. The accuracy of all six algorithms is higher in the user-dependent experiment than in the user-independent experiment, indicating that user-dependent training is conducive to improving the average gait phase prediction accuracy. Figure 13 depicts the recognition accuracies of each algorithm on different gait phases. On the HS and SM phases, ACCA-DGCN ranked highest among the six algorithms, with recognition accuracies of 97.14% and 92.72%, respectively; on the TO and TS phases, it achieved the second highest recognition accuracies, 98.85% and 96.87%, respectively. Compared with DGCN, ACCA-DGCN is better in all four phases (HS, TS, TO, and SM). In terms of both the average recognition accuracy and the recognition accuracy of each gait phase, ACCA-DGCN performs better and more consistently: its lowest recognition accuracy is 92.72%, whereas the gait phase prediction accuracy of RNN falls as low as 47.35%. This result suggests that ACCA-DGCN outperforms the other algorithms in gait phase prediction.

Fig. 13

The accuracy of four gait phases in different algorithms

The ACCA-DGCN confusion matrix is shown in Fig. 14. The gait phase prediction accuracies of ACCA-DGCN in the user-dependent experiment are all very high, with a minimum of 92.72% across the four gait phases. ACCA-DGCN performs better in the user-dependent experiment than in the user-independent experiment. Although gait phase misrecognition still occurs, it is kept to a maximum of 5.88%. These results reveal that ACCA-DGCN outperforms the other five algorithms in terms of accuracy and stability in gait phase prediction. Therefore, the ACCA-DGCN model adapts better to individuals’ unique gait phase characteristics, allowing it to function better in practical applications.

Fig. 14

Confusion matrix of gait phase prediction based on ACCA-DGCN in user-dependent experiment

5.4 Analysis of Spatial Graph

The adjacency matrices and skeleton graphs of the user-independent and user-dependent experiments are shown in Figs. 15 and 16. The average value of the connection weights between neighboring nodes in the adjacency matrix is used as the weight of the edge between the two corresponding nodes in the skeleton graph. In addition to the diagonal of the adjacency matrix, there are strong connections at some off-diagonal locations, such as between the left and right knee joints and between the left knee joint and the right hip joint. Among the natural connections of the human skeleton, those between the waist node and the left hip joint, the waist node and the right hip joint, the left hip and left knee joints, and the right hip and right knee joints have relatively large weights and represent important relationships. Beyond these natural connections, connections between nodes that are not physically connected are also added to the adjacency matrix through self-learning of the spatial graph. In the user-independent experiment, a strong connection was established between the left and right knee joints; moderate connections between the left and right hip joints and between the waist node and the left knee joint; a weak connection between the left hip joint and the right knee joint; and weaker connections between the right hip joint and the left knee joint and between the waist node and the right knee joint.
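The step from a learned matrix to skeleton-graph edge weights described above can be sketched as follows. This is one possible reading, assuming that the "average value of the connection weights" between two nodes is the mean of the two directed entries A[i][j] and A[j][i]. The node names, the threshold, and the matrix values are hypothetical and are not the values of Figs. 15 and 16.

```python
from typing import Dict, List, Tuple

NODES = ["waist", "left_hip", "right_hip", "left_knee", "right_knee"]

# Natural (physical) connections of the five-node skeleton.
NATURAL_EDGES = {
    ("waist", "left_hip"),
    ("waist", "right_hip"),
    ("left_hip", "left_knee"),
    ("right_hip", "right_knee"),
}

Edge = Tuple[str, str]


def edge_weights(adj: List[List[float]]) -> Dict[Edge, float]:
    """Undirected edge weight = mean of the two directed matrix entries."""
    n = len(adj)
    weights: Dict[Edge, float] = {}
    for i in range(n):
        for j in range(i + 1, n):
            weights[(NODES[i], NODES[j])] = (adj[i][j] + adj[j][i]) / 2
    return weights


def learned_edges(weights: Dict[Edge, float], threshold: float) -> Dict[Edge, float]:
    """Non-physical connections whose weight reaches a chosen threshold."""
    return {pair: w for pair, w in weights.items()
            if pair not in NATURAL_EDGES and w >= threshold}


# Invented matrix for illustration only.
toy_adj = [
    [1.0, 0.8, 0.7, 0.3, 0.1],
    [0.6, 1.0, 0.4, 0.9, 0.2],
    [0.7, 0.4, 1.0, 0.1, 0.8],
    [0.1, 0.7, 0.3, 1.0, 0.9],
    [0.1, 0.2, 0.6, 0.7, 1.0],
]
toy_weights = edge_weights(toy_adj)
strong_learned = learned_edges(toy_weights, threshold=0.5)
print(strong_learned)
```

Separating natural from learned edges in this way makes it easy to see which relationships, such as the one between the two knee joints, were introduced by self-learning rather than by the skeleton topology.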

Fig. 15

Adjacency matrix and skeleton graph of user-independent experiment

Fig. 16

Adjacency matrix and skeleton graph of user-dependent experiment

Overall, the adjacency matrices and skeleton graphs of the user-independent and user-dependent experiments are relatively close, with small differences in the connection weights between a few joints, such as between the left and right hip joints and between the right hip joint and the left knee joint. Through self-learning of the spatial graph, connections beyond the natural ones can be learned and the connection weights between joint nodes can be dynamically adjusted. This approach can be applied to different models, improving their generalization performance and aiding in better prediction of the human gait phase.

The gait phase prediction model based on ACCA-DGCN performed the best in both the user-independent and user-dependent experiments. Across the four gait phases, the ACCA-DGCN model achieved better overall performance than the other models.

Most deep learning algorithms, such as convolutional neural networks, recurrent neural networks, long short-term memory networks, and temporal convolutional networks, are weak at learning spatial features and mainly focus on temporal characteristics. Because these algorithms take multi-channel signals as input and transform them into grid-structured or sequence-structured data, they may struggle to describe the spatial correlations in multi-channel signals effectively.

DGCN fails to address the influence of each channel on the recognition results and does not effectively account for the periodicity of, and dependencies between, subsequences in the time series. As a result, its overall performance in gait phase prediction is lower than that of ACCA-DGCN.

The performance of ACCA-DGCN is enhanced by adding the AC layer, which computes the auto-correlation of gait subsequences and identifies periodic patterns. Auto-Correlation in the AC layer helps to reduce information loss and to grasp temporal features from a global perspective. With the addition of ECA, the ACCA-DGCN model is better able to capture the critical motion characteristics that affect gait phase prediction by focusing on important channels, thus boosting its feature extraction capability. The ECA mechanism has few parameters, so it improves model performance at low computational cost. Therefore, ACCA-DGCN can automatically extract spatio-temporal patterns and predict gait phases more accurately.
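The two mechanisms discussed above can be illustrated in isolation. The following is a minimal pure-Python sketch, not the AC or ECA layers of ACCA-DGCN themselves: `dominant_period` estimates the period of a signal from its normalized auto-correlation, and `eca_weights` follows the general efficient channel attention recipe (per-channel average pooling, a small 1D convolution across channels, and a sigmoid). All function names are hypothetical, and the convolution kernel, which would be learned in a real network, is fixed here by assumption.

```python
import math
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Normalized auto-correlation r[0..max_lag] of a mean-centered series."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    energy = sum(v * v for v in centered)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    return [sum(centered[t] * centered[t + lag] for t in range(n - lag)) / energy
            for lag in range(max_lag + 1)]


def dominant_period(x: Sequence[float]) -> int:
    """Lag of the highest auto-correlation peak after the lobe around lag 0."""
    max_lag = len(x) // 2
    if max_lag < 1:
        raise ValueError("series is too short")
    r = autocorrelation(x, max_lag)
    # Skip small lags until the correlation first turns negative.
    start = next((lag for lag in range(1, max_lag + 1) if r[lag] < 0.0), 1)
    return max(range(start, max_lag + 1), key=lambda lag: r[lag])


def eca_weights(features: Sequence[Sequence[float]],
                kernel: Sequence[float]) -> List[float]:
    """Channel attention weights: average pool, 1D conv over channels, sigmoid."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pooled = [sum(channel) / len(channel) for channel in features]
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad  # zero padding at both ends
    weights = []
    for c in range(len(pooled)):
        z = sum(kernel[m] * padded[c + m] for m in range(len(kernel)))
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


# Synthetic periodic signal with a period of 20 samples.
signal = [math.sin(2 * math.pi * t / 20) for t in range(100)]
period = dominant_period(signal)

# Two toy channels; the kernel values are placeholders, not learned weights.
channel_weights = eca_weights([[1.0, 1.0], [3.0, 3.0]], kernel=[0.0, 1.0, 0.0])
print(period, channel_weights)
```

Each channel would then be multiplied by its attention weight before further processing, so channels with larger weights contribute more to the extracted features.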

In summary, a novel gait phase prediction model based on ACCA-DGCN using a multi-IMU system is proposed in this paper. Auto-Correlation and Efficient Channel Attention are employed in ACCA-DGCN to efficiently capture the periodicity of time series and to better handle multi-channel data input. Skeleton-based spatio-temporal graph convolutional neural networks are used to accurately distinguish the different gait phases. The ACCA-DGCN model captures the linkages between skeleton joints, promoting a deeper comprehension of these connections. Moreover, the ACCA-DGCN model facilitates accurate and robust learning and representation of the spatio-temporal patterns that exist between distinct gait phases. The appropriate window size was determined, user-independent and user-dependent experiments were performed, and ACCA-DGCN was compared with five other deep learning algorithms. The gait phase prediction algorithm based on ACCA-DGCN performs the best among the six algorithms in the two experiments, with average recognition accuracies of 92.26% and 97.21%, respectively, and can therefore recognize gait phases more accurately and stably.

In addition, in the graph learning process of the ACCA-DGCN algorithm, connections among joints are established through self-learning, and some of them may result from noise, outliers, or task-irrelevant information, which can affect the performance of graph learning. Furthermore, the ACCA-DGCN model uses only five nodes and does not use additional nodes such as the ankles.

6 Conclusion

An Auto-Correlation and Channel Attention enhanced deep graph convolution network (ACCA-DGCN) for gait phase prediction based on a multi-IMU system is proposed in this paper. In view of the periodic characteristics of gait phases and the multi-channel nature of gait data, Auto-Correlation and channel attention mechanisms are introduced in the ACCA-DGCN model to effectively capture periodic features of gait data and adaptively highlight channels that contain important features. Firstly, human gait data collection equipment is developed and the gait data of five volunteers are collected. Secondly, a gait phase prediction model based on ACCA-DGCN is established. Thirdly, the effect of the window size on model performance is explored. Finally, user-independent and user-dependent experiments are conducted, and the experimental results are compared with those of five other algorithms. The experimental results suggest that the ACCA-DGCN model outperforms the other five algorithms in gait phase prediction. The accuracy and stability of gait phase prediction can be increased by integrating skeleton information, spatio-temporal patterns, and Auto-Correlation and channel attention mechanisms. The proposed method provides a new approach to human gait phase recognition based on multi-IMU systems.

In future work, data acquisition for the left and right ankle joints may be added to the gait data collection equipment so that more gait information can be obtained to improve the accuracy of gait phase recognition.