TY - GEN
T1 - Solving social media text classification problems using code fragment-based XCSR
AU - Arif, Muhammad Hassan
AU - Li, Jianxin
AU - Iqbal, Muhammad
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
N2 - Sentiment analysis and spam detection of social media text messages are two challenging data analysis tasks due to sparse and high-dimensional feature vectors. Learning classifier systems (LCS) are rule-based evolutionary computing systems and have limited capabilities to handle real-valued, sparse, high-dimensional big data sets. LCS techniques use interval-based representations to handle real-valued feature vectors. In the work presented here, the interval-based representation is replaced by genetic-programming-based tree-like structures to classify high-dimensional real-valued text feature vectors. Multiple experiments are conducted on different social media text data sets, namely tweets, movie reviews, Amazon and Yelp reviews, and SMS and email spam messages, to evaluate the proposed scheme. Real-valued feature vectors are generated from these data sets using term frequency-inverse document frequency (TF-IDF) and/or sentiment-lexicon-based features. Results show that the new encoding scheme outperforms interval-based representations on both small and large social media text data sets.
AB - Sentiment analysis and spam detection of social media text messages are two challenging data analysis tasks due to sparse and high-dimensional feature vectors. Learning classifier systems (LCS) are rule-based evolutionary computing systems and have limited capabilities to handle real-valued, sparse, high-dimensional big data sets. LCS techniques use interval-based representations to handle real-valued feature vectors. In the work presented here, the interval-based representation is replaced by genetic-programming-based tree-like structures to classify high-dimensional real-valued text feature vectors. Multiple experiments are conducted on different social media text data sets, namely tweets, movie reviews, Amazon and Yelp reviews, and SMS and email spam messages, to evaluate the proposed scheme. Real-valued feature vectors are generated from these data sets using term frequency-inverse document frequency (TF-IDF) and/or sentiment-lexicon-based features. Results show that the new encoding scheme outperforms interval-based representations on both small and large social media text data sets.
KW - Learning Classifier Systems
KW - Sentiment Analysis
KW - Spam Detection
KW - Text Classification
UR - https://www.scopus.com/pages/publications/85048476348
U2 - 10.1109/ICTAI.2017.00080
DO - 10.1109/ICTAI.2017.00080
M3 - Conference contribution
AN - SCOPUS:85048476348
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 485
EP - 492
BT - Proceedings - 2017 International Conference on Tools with Artificial Intelligence, ICTAI 2017
PB - IEEE Computer Society
T2 - 29th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2017
Y2 - 6 November 2017 through 8 November 2017
ER -