TY - GEN
T1 - A study of feature construction for text-based forecasting of time series variables
AU - Wang, Yiren
AU - Seyler, Dominic
AU - Santu, Shubhra Kanti Karmaker
AU - Zhai, Cheng Xiang
N1 - Publisher Copyright:
© 2017 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.
Copyright:
Copyright 2018 Elsevier B.V., All rights reserved.
PY - 2017/11/6
Y1 - 2017/11/6
N2 - Time series are ubiquitous in the world since they are used to measure various phenomena (e.g., temperature, spread of a virus, sales, etc.). Forecasting of time series is highly beneficial (and necessary) for optimizing decisions, yet is a very challenging problem; using only the historical values of the time series is often insufficient. In this paper, we study how to construct effective additional features based on related text data for time series forecasting. Besides the commonly used n-gram features, we propose a general strategy for constructing multiple topical features based on the topics discovered by a topic model. We evaluate feature effectiveness using a data set for predicting stock price changes where we constructed additional features from news text articles for stock market prediction. We found that: 1) Text-based features outperform time series-based features, suggesting the great promise of leveraging text data for improving time series forecasting. 2) Topic-based features are not very effective stand-alone, but they can further improve performance when added on top of n-gram features. 3) The best topic-based feature appears to be a long-term aggregation of topics over time with high weights on recent topics.
AB - Time series are ubiquitous in the world since they are used to measure various phenomena (e.g., temperature, spread of a virus, sales, etc.). Forecasting of time series is highly beneficial (and necessary) for optimizing decisions, yet is a very challenging problem; using only the historical values of the time series is often insufficient. In this paper, we study how to construct effective additional features based on related text data for time series forecasting. Besides the commonly used n-gram features, we propose a general strategy for constructing multiple topical features based on the topics discovered by a topic model. We evaluate feature effectiveness using a data set for predicting stock price changes where we constructed additional features from news text articles for stock market prediction. We found that: 1) Text-based features outperform time series-based features, suggesting the great promise of leveraging text data for improving time series forecasting. 2) Topic-based features are not very effective stand-alone, but they can further improve performance when added on top of n-gram features. 3) The best topic-based feature appears to be a long-term aggregation of topics over time with high weights on recent topics.
UR - http://www.scopus.com/inward/record.url?scp=85037334875&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037334875&partnerID=8YFLogxK
U2 - 10.1145/3132847.3133109
DO - 10.1145/3132847.3133109
M3 - Conference contribution
AN - SCOPUS:85037334875
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2347
EP - 2350
BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Y2 - 6 November 2017 through 10 November 2017
ER -