Interpretable Stock Price Forecasting Model Using Genetic Algorithm-Machine Learning Regressions and Best Feature Subset Selection

Document Type

Peer-Reviewed Article

Publication Date



Recent stock market studies that adopt machine learning and deep learning techniques have achieved remarkable performance and are conveniently accessible. However, machine learning and deep learning models are notorious for their black-box structure. To build human-friendly interpretability into stock price prediction, many studies focus on the relationship between input features and the outcome by measuring feature importance. However, feature-importance-based interpretability methods have several drawbacks: feature importance is only relative, the importance of correlated features is vague, and the resulting interpretability is impractical. Furthermore, they overlook two principal characteristics of time series stock price data: time-dependency and the collective behavior of features. To capture the collective behavior of features over a whole data period, we propose best feature subset selection. Additionally, to reflect the time-dependent characteristic of stock price data over a short data period, we propose piecewise best feature subset selection. The proposed algorithm uses two separate input feature sets: internal technical indicators and external market prices. This bilateral forecasting scheme goes through a two-stage feature selection process composed of feature set expansion, hybridized genetic algorithm-machine learning regressions to select important features, and importance score filtering to select optimal features. Finally, the best feature subset is selected for forecasting and interpretation. The proposed method yields a parsimonious best feature subset with fewer features, which aids interpretability, and improves the average forecasting Root Mean Squared Error by 10.42% for the optimal feature set and 13.47% for the best feature subset of the internal technical indicators. To enhance local interpretability, we use Savitzky-Golay smoothing as part of piecewise optimal curve fitting to examine each potential grouping of external features.
The proposed local interpretability technique, which combines piecewise optimal curve fitting with the piecewise best feature subset, provides a more time-flexible interpretation of stock price behavior using a few best features for each piecewise data segment. Other feature-importance interpretability techniques rely only on either a single data point or a whole data period; the proposed technique overcomes this limitation and reflects the main characteristics of time series data.
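
The idea of a piecewise best feature subset can be illustrated with a small sketch. This is not the authors' implementation: it swaps the genetic algorithm-machine learning regressions for an exhaustive search over small subsets scored by an ordinary least-squares fit, it omits the Savitzky-Golay smoothing step, and it runs on synthetic data. The function names (`fit_ols`, `best_subset`, `piecewise_best_subsets`), the fixed segment length, the 75/25 split inside each segment, and the `max_size` limit are all assumptions made for illustration.

```python
import itertools
import random


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes the selected columns are not collinear).
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def rmse(X, y, beta):
    """Root mean squared error of the linear model beta on (X, y)."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    return (sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)) ** 0.5


def take(X, cols):
    """Keep only the given feature columns."""
    return [[r[c] for c in cols] for r in X]


def best_subset(X_tr, y_tr, X_va, y_va, max_size=2):
    """Try every subset up to max_size; return (validation RMSE, columns)."""
    best = (float("inf"), ())
    for size in range(1, max_size + 1):
        for cols in itertools.combinations(range(len(X_tr[0])), size):
            beta = fit_ols(take(X_tr, cols), y_tr)
            err = rmse(take(X_va, cols), y_va, beta)
            if err < best[0]:
                best = (err, cols)
    return best


def piecewise_best_subsets(X, y, segment_len, max_size=2):
    """Select a best subset separately for each consecutive data segment."""
    out = []
    for start in range(0, len(y) - segment_len + 1, segment_len):
        seg_X = X[start:start + segment_len]
        seg_y = y[start:start + segment_len]
        cut = int(0.75 * segment_len)  # assumed train/validation split
        err, cols = best_subset(seg_X[:cut], seg_y[:cut],
                                seg_X[cut:], seg_y[cut:], max_size)
        out.append((start, cols, err))
    return out


# Synthetic data: feature 0 drives the target in the first 40 rows,
# feature 2 drives it in the next 40 rows.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(80)]
y = [3.0 * r[0] + rng.gauss(0, 0.1) for r in X[:40]] + \
    [-2.0 * r[2] + rng.gauss(0, 0.1) for r in X[40:]]

result = piecewise_best_subsets(X, y, segment_len=40)
for start, cols, err in result:
    print(f"segment starting at row {start}: features {cols}, RMSE {err:.3f}")
```

Because the selected subset is allowed to differ from one segment to the next, the sketch shows how a per-segment selection can surface a change in the driving feature that a single whole-period selection would average away. Exhaustive search is only practical for a handful of candidate features, which is one reason a heuristic search such as a genetic algorithm becomes attractive as the feature set grows.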


Available online 24 September 2022, 118803.

This article was researched and written when Max K. K. Yun was affiliated with Department of Systems Science and Industrial Engineering, State University of New York at Binghamton, Binghamton, New York.