Multimodal Spatiotemporal Feature-Based Human Motion Pattern Recognition With CNN-Transformer-Attention Framework

Research output: Contribution to journal › Article › peer-review

Abstract

Human motion pattern recognition plays a crucial role in applications such as navigation and positioning, health monitoring, disease prevention, smart healthcare, military security, and human–computer interaction. However, existing recognition methods often face challenges such as reliance on a single sensor type, limited data richness, inadequate spatiotemporal feature extraction, and suboptimal algorithm structures, which result in low detection accuracy, limited recognition categories, and poor algorithm robustness. To address these limitations, we propose a novel human motion pattern recognition method based on a convolutional neural network (CNN)-transformer-attention framework that leverages multimodal spatiotemporal features. We first developed a lightweight, cost-effective wearable system capable of real-time data collection from multiple body parts (wrist, chest, and foot) using accelerometers, gyroscopes, and magnetometers. The sensor data from the different body parts were temporally synchronized using interpolation, filtered to reduce noise, and analyzed in the frequency domain to extract precise and useful multimodal spatiotemporal features. We then designed a CNN-transformer-attention framework that integrates multimodal spatiotemporal information enhancement strategies for accurate motion pattern detection. Experimental results demonstrate that the collaborative enhancement of multi-body sensor data achieves a motion pattern recognition accuracy exceeding 98%, outperforming the single-sensor systems (wrist-, chest-, and foot-mounted devices) by 7.98%, 0.43%, and 5.59%, respectively. Furthermore, compared with models such as the BPNN, CNN, LSTM, Transformer, and CNN-BiGRU, the proposed framework exhibits superior accuracy and generalization capabilities.
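The preprocessing pipeline outlined in the abstract (interpolation-based temporal synchronization across body-worn sensors, noise filtering, and frequency-domain feature extraction) might be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the moving-average filter, and the specific spectral descriptors (dominant frequency, spectral energy) are assumptions chosen for clarity.

```python
import numpy as np

def synchronize(t_ref, t_src, x_src):
    """Resample one sensor stream onto a common reference timeline
    via linear interpolation (the temporal-synchronization step)."""
    return np.interp(t_ref, t_src, x_src)

def smooth(x, k=5):
    """Moving-average filter (illustrative noise-reduction choice)."""
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def freq_features(x, fs):
    """Frequency-domain descriptors of one sensor axis:
    dominant frequency (Hz) and total spectral energy, DC removed."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    dominant = float(freqs[np.argmax(spec)])
    energy = float(np.sum(spec ** 2))
    return dominant, energy

# Example: align a 60 Hz wrist stream to a 100 Hz reference clock,
# then extract spectral features from a 3 Hz motion component.
fs = 100.0
t_ref = np.arange(0.0, 2.0, 1.0 / fs)        # common 100 Hz timeline
t_src = np.arange(0.0, 2.0, 1.0 / 60.0)      # 60 Hz sensor timestamps
x_src = np.sin(2 * np.pi * 3.0 * t_src)      # synthetic 3 Hz signal
x_sync = smooth(synchronize(t_ref, t_src, x_src))
dominant, energy = freq_features(x_sync, fs)
```

In a multi-sensor setup, the same resampling would be applied to every accelerometer, gyroscope, and magnetometer axis so that samples from the wrist, chest, and foot line up on one clock before features are stacked for the classifier.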

Original language: English
Pages (from-to): 43883-43895
Number of pages: 13
Journal: IEEE Internet of Things Journal
Volume: 12
Issue number: 20
DOIs
State: Published - 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Internet of Things (IoT)
  • convolutional neural network (CNN)-transformer-attention
  • motion pattern recognition
  • multimodal spatiotemporal features
  • temporal interpolation synchronization
