TY - GEN
T1 - LMFSNet
T2 - 2025 International Conference on Virtual Reality and Visualization, ICVRV 2025
AU - Li, Yang
AU - Qi, Jing
AU - Cui, Zhenchao
AU - Xu, Kun
AU - Ding, Xilun
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Traditional gesture-based human-robot interaction relies on one-to-one gesture-command mapping, which requires numerous gestures and imposes a high cognitive load. Existing networks are often computationally heavy, limiting real-time deployment on resource-constrained robots. To significantly reduce the number of required gestures and make the user experience more intuitive, we develop the Spatial Semantic Mapping framework, which changes the gesture-based control paradigm by assigning commands based on the spatial position of the hand, establishing a flexible one-to-many mapping. To achieve a balance between accuracy and computational efficiency, we propose a Lightweight Multi-level Fusion Segmentation Network (LMFSNet). First, to greatly reduce computational cost, we propose a lightweight Residual Axial Group Convolution as the core operation of the model. Second, to maintain high performance in the lightweight network, we design two modules: the Dynamic Adaptive Attention Block (DAAB) and the Long-Short Distance Extraction (LSDE) block. Specifically, the DAAB dynamically reweights features to focus on important information, and the LSDE block captures and fuses multi-scale features. Experimental results show that the proposed LMFSNet achieves state-of-the-art accuracy while maintaining real-time speed and a compact model size.
KW - Hand Gesture Segmentation
KW - Human-Robot Interaction
KW - Lightweight Segmentation Network
UR - https://www.scopus.com/pages/publications/105035374941
U2 - 10.1109/ICVRV67992.2025.00056
DO - 10.1109/ICVRV67992.2025.00056
M3 - Conference contribution
AN - SCOPUS:105035374941
T3 - Proceedings - 2025 International Conference on Virtual Reality and Visualization, ICVRV 2025
SP - 282
EP - 288
BT - Proceedings - 2025 International Conference on Virtual Reality and Visualization, ICVRV 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 December 2025 through 21 December 2025
ER -