TY - GEN
T1 - The Simpler the Better
T2 - 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
AU - Tong, Yongxin
AU - Chen, Yuqiang
AU - Zhou, Zimu
AU - Chen, Lei
AU - Wang, Jie
AU - Yang, Qiang
AU - Ye, Jieping
AU - Lv, Weifeng
N1 - Publisher Copyright:
© 2017 ACM.
PY - 2017/8/13
Y1 - 2017/8/13
N2 - Taxi-calling apps are gaining increasing popularity for their eficiency in dispatching idle taxis to passengers in need. To precisely balance the supply and the demand of taxis, online taxicab platforms need to predict the Unit Original Taxi Demand (UOTD), which refers to the number of taxi-calling requirements submitted per unit time (e.g., every hour) and per unit region (e.g., each POI). Predicting UOTD is non-trivial for large-scale industrial online taxicab platforms because both accuracy and fexibility are essential. Complex non-linear models such as GBRT and deep learning are generally accurate, yet require labor-intensive model redesign after scenario changes (e.g., extra constraints due to new regulations). To accurately predict UOTD while remaining fexible to scenario changes, we propose Lin UOTD, a unifed linear regression model with more than 200 million dimensions of features. The simple model structure eliminates the need of repeated model redesign, while the high-dimensional features contribute to accurate UOTD prediction. We further design a series of optimization techniques for eficient model training and updating. Evaluations on two largescale datasets from an industrial online taxicab platform verify that Lin UOTD outperforms popular non-linear models in accuracy. We envision our experiences to adopt simple linear models with high-dimensional features in UOTD prediction as a pilot study and can shed insights upon other industrial large-scale spatio-temporal prediction problems.
AB - Taxi-calling apps are gaining increasing popularity for their eficiency in dispatching idle taxis to passengers in need. To precisely balance the supply and the demand of taxis, online taxicab platforms need to predict the Unit Original Taxi Demand (UOTD), which refers to the number of taxi-calling requirements submitted per unit time (e.g., every hour) and per unit region (e.g., each POI). Predicting UOTD is non-trivial for large-scale industrial online taxicab platforms because both accuracy and fexibility are essential. Complex non-linear models such as GBRT and deep learning are generally accurate, yet require labor-intensive model redesign after scenario changes (e.g., extra constraints due to new regulations). To accurately predict UOTD while remaining fexible to scenario changes, we propose Lin UOTD, a unifed linear regression model with more than 200 million dimensions of features. The simple model structure eliminates the need of repeated model redesign, while the high-dimensional features contribute to accurate UOTD prediction. We further design a series of optimization techniques for eficient model training and updating. Evaluations on two largescale datasets from an industrial online taxicab platform verify that Lin UOTD outperforms popular non-linear models in accuracy. We envision our experiences to adopt simple linear models with high-dimensional features in UOTD prediction as a pilot study and can shed insights upon other industrial large-scale spatio-temporal prediction problems.
KW - Feature engineering
KW - Prediction
KW - Unit original taxi demands
UR - https://www.scopus.com/pages/publications/85029095088
U2 - 10.1145/3097983.3098018
DO - 10.1145/3097983.3098018
M3 - 会议稿件
AN - SCOPUS:85029095088
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1653
EP - 1662
BT - KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 13 August 2017 through 17 August 2017
ER -