Interpretable Stock Price Forecasting Model Using Genetic Algorithm-machine Learning Regressions and Best Feature Subset Selection

Kyung Keun Yun, Sacred Heart University
Sang Won Yoon, Binghamton University--SUNY
Daehan Won, Binghamton University--SUNY

Document Type

Peer-Reviewed Article

Publication Date

3-2023

Abstract

Recent stock market studies that adopt machine learning and deep learning techniques have achieved remarkable performance and are conveniently accessible. However, machine learning and deep learning models are notorious for their black-box structure. To make stock price prediction interpretable to humans, many studies examine the relationship between input features and the outcome by measuring feature importance. These feature-importance-based interpretability methods have drawbacks: importance is only relative, the importance of correlated features is vague, and the resulting interpretation is impractical. Furthermore, they overlook two principal characteristics of time series stock price data: time dependency and the collective behavior of features. To capture the collective behavior of features over the whole data period, we propose best feature subset selection. To reflect the time-dependent character of stock price data over a short data period, we propose piecewise best feature subset selection. The proposed algorithm uses two separate input feature sets: internal technical indicators and external market prices. This bilateral forecasting scheme goes through a two-stage feature selection process composed of feature set expansion, hybridized genetic algorithm-machine learning regressions to select important features, and importance score filtering to select optimal features. Finally, the best feature subset is selected for forecasting and interpretation. The proposed method yields a parsimonious best feature subset with fewer features for interpretability and improves average forecasting Root Mean Squared Error by 10.42% for the optimal feature set and 13.47% for the best feature subset of the internal technical indicators. For enhanced local interpretability in this study, we use Savitzky-Golay smoothing as part of piecewise optimal curve fitting to examine each potential grouping of external features.
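The idea of pairing a genetic algorithm with a regression model to select features can be illustrated with a minimal sketch. This is not the authors' implementation: ordinary least squares stands in for the machine learning regressions, the data are synthetic, and every name (`fitness`, `ga_select`, the population size, mutation rate, and so on) is a hypothetical choice made only for illustration.

```python
import numpy as np

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation RMSE of a least-squares fit on the selected columns (lower is better)."""
    if not mask.any():
        return float("inf")  # an empty feature set is never acceptable
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, mask]])
    A_va = np.column_stack([np.ones(len(X_va)), X_va[:, mask]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((y_va - A_va @ coef) ** 2)))

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Evolve boolean feature masks; return the best mask and its validation RMSE."""
    rng = np.random.default_rng(seed)
    n_features = X_tr.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5  # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
        elite = pop[np.argsort(scores)[: pop_size // 2]]  # keep the better half
        n_children = pop_size - len(elite)
        parents_a = elite[rng.integers(len(elite), size=n_children)]
        parents_b = elite[rng.integers(len(elite), size=n_children)]
        take_a = rng.random(parents_a.shape) < 0.5  # uniform crossover
        children = np.where(take_a, parents_a, parents_b)
        children ^= rng.random(children.shape) < p_mut  # bit-flip mutation
        pop = np.vstack([elite, children])
    scores = np.array([fitness(m, X_tr, y_tr, X_va, y_va) for m in pop])
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])

# Synthetic data: only columns 0 and 3 drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=300)

mask, score = ga_select(X[:200], y[:200], X[200:], y[200:])
print("selected feature indices:", np.flatnonzero(mask).tolist())
print("validation RMSE:", round(score, 4))
```

Scoring each mask on held-out data rather than on the training rows discourages the search from simply keeping every feature, which is one plausible way to encourage the parsimony the abstract emphasizes.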
The proposed local interpretability technique, which combines piecewise optimal curve fitting with the piecewise best feature subset, provides a more time-flexible interpretation of stock price behavior using a few best features for each piecewise data segment. Compared with other feature-importance interpretability techniques, which rely on either a single data point or the whole data period, the proposed technique overcomes their limitations and reflects the main characteristics of time series data.
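A piecewise best feature subset can be sketched as an exhaustive search run separately on each data segment. The sketch below is an assumption-laden illustration, not the paper's procedure: the segment breakpoints are supplied by hand rather than derived from piecewise optimal curve fitting, the subset size `k` is fixed, least squares is the only model, and the names `fit_rmse` and `piecewise_best_subsets` are hypothetical.

```python
from itertools import combinations

import numpy as np

def fit_rmse(X, y, cols):
    """In-sample RMSE of a least-squares fit using only the given columns."""
    A = np.column_stack([np.ones(len(X)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sqrt(np.mean((y - A @ coef) ** 2)))

def piecewise_best_subsets(X, y, breakpoints, k):
    """For each segment, try every size-k feature subset and keep the lowest-RMSE one."""
    edges = [0, *breakpoints, len(y)]
    results = []
    for start, stop in zip(edges[:-1], edges[1:]):
        Xs, ys = X[start:stop], y[start:stop]
        best = min(
            combinations(range(X.shape[1]), k),
            key=lambda cols: fit_rmse(Xs, ys, cols),
        )
        results.append((start, stop, best))
    return results

# Synthetic series whose driving feature changes halfway through:
# feature 0 explains the first 100 rows, feature 2 the last 100.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
signal = np.where(np.arange(200) < 100, 3.0 * X[:, 0], -2.0 * X[:, 2])
y = signal + rng.normal(scale=0.1, size=200)

segments = piecewise_best_subsets(X, y, breakpoints=[100], k=1)
for start, stop, cols in segments:
    print(f"rows {start}-{stop - 1}: best features {cols}")
```

Because the subset size is fixed, comparing in-sample RMSE across candidates is fair here; comparing subsets of different sizes would call for a held-out score or a complexity penalty.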

Comments

Available online 24 September 2022, 118803.

This article was researched and written when Max K. K. Yun was affiliated with Department of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, New York.

DOI

10.1016/j.eswa.2022.118803

Recommended Citation

Yun, K. K., Yoon, S. W., & Won, D. (2023). Interpretable stock price forecasting model using genetic algorithm-machine learning regressions and best feature subset selection. Expert Systems with Applications, 213(Pt. A), 118803. doi: 10.1016/j.eswa.2022.118803
