TY - GEN
T1 - Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing
AU - Ying, Wangyang
AU - Wang, Dongjie
AU - Liu, Kunpeng
AU - Sun, Leilei
AU - Fu, Yanjie
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Feature generation aims to generate new and meaningful features to create a discriminative representation space. A generated feature is meaningful when the generated feature is from a feature pair with inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions, and generate meaningful dimensions from an exponentially large search space in an optimal crossing form over an optimal generation path. But, machines have limited human-like abilities. We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation imposes several under-addressed challenges on existing systems: meaningful, robust, and efficient generation. To tackle these challenges, we propose a principled and generic representation-crossing framework to solve self-optimizing feature generation. To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach. We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.
AB - Feature generation aims to generate new and meaningful features to create a discriminative representation space. A generated feature is meaningful when the generated feature is from a feature pair with inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions, and generate meaningful dimensions from an exponentially large search space in an optimal crossing form over an optimal generation path. But, machines have limited human-like abilities. We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation imposes several under-addressed challenges on existing systems: meaningful, robust, and efficient generation. To tackle these challenges, we propose a principled and generic representation-crossing framework to solve self-optimizing feature generation. To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach. We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.
KW - Feature Generation
KW - Hierarchical Reinforcement Crossing
KW - Self-optimizing
UR - https://www.scopus.com/pages/publications/85185395008
U2 - 10.1109/ICDM58522.2023.00084
DO - 10.1109/ICDM58522.2023.00084
M3 - 会议稿件
AN - SCOPUS:85185395008
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 748
EP - 757
BT - Proceedings - 23rd IEEE International Conference on Data Mining, ICDM 2023
A2 - Chen, Guihai
A2 - Khan, Latifur
A2 - Gao, Xiaofeng
A2 - Qiu, Meikang
A2 - Pedrycz, Witold
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Data Mining, ICDM 2023
Y2 - 1 December 2023 through 4 December 2023
ER -