
Dual-Branch CNN-MLP-Dropout Network for Multi-Class Scene Recognition: Fusing Multi-Sensor Time-Domain Inputs and Time-Frequency Representations

  • Hong Kong Polytechnic University
  • Beihang University
  • Ministry of Industry and Information Technology
  • State Key Laboratory of CNS/ATM

Research output: Contribution to journal › Article › peer-review

Abstract

Scene recognition technology is increasingly applied across the Internet of Things (IoT) ecosystem, including transportation, smart homes, elderly care, and terrain surveying. Existing approaches have two limitations: sensors must often be worn on different body parts, which is inconvenient, and a narrow range of sensor types combined with inadequate data processing restricts recognition accuracy. Consumer-grade smartphones, which are portable, low-cost, lightweight, and equipped with diverse sensors, offer an ideal solution and serve as crucial intelligent agents in IoT scenarios. To leverage multi-sensor information for effective, efficient, and trustworthy IoT scene recognition, we propose a CNN-MLP-Dropout method that uses both time-domain and time–frequency-domain features. The method synchronizes data from eight built-in heterogeneous sensors, extracts features via filtering and the Short-Time Fourier Transform (STFT), and feeds them into the CNN-MLP-Dropout model. The model first extracts local patterns, then captures global representations, and finally performs multi-class scene recognition. Tests on 10 indoor/outdoor scenarios showed that using multi-sensor data improved accuracy by 21.98%, 36.67%, and 6.09%, respectively, compared with the three limited-sensor groups. The proposed model significantly outperforms five state-of-the-art networks, and dropout avoids overfitting, enabling high-precision, robust, and user-friendly multi-class activity scene recognition in IoT.
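
The STFT step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state the window type, frame length, hop size, or sampling rate, so the Hann window, frame_len=64, hop=32, the 100 Hz rate, and the synthetic 5 Hz test signal below are all assumptions chosen for illustration. The function names hann and stft_magnitude are hypothetical. The sketch uses only the Python standard library; in practice a library routine such as scipy.signal.stft would be the usual choice.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (assumed window; not from the paper)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT of a real 1-D signal.

    Returns a list of frames; each frame holds frame_len // 2 + 1
    magnitudes (the non-negative frequency bins of a direct DFT).
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    # Slide a window over the signal; incomplete trailing frames are dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic single-axis sensor trace (assumption): 5 Hz sine sampled at 100 Hz.
fs = 100.0
signal = [math.sin(2.0 * math.pi * 5.0 * t / fs) for t in range(256)]

spec = stft_magnitude(signal, frame_len=64, hop=32)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 64
```

Each row of spec is one time slice of the time–frequency representation. A two-branch model of the kind described in the abstract would take such a representation alongside the filtered time-domain signal, although how the two inputs are combined is not specified here.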

Original language: English
Journal: IEEE Internet of Things Journal
State: Accepted/In press - 2026
