How can we predict the macro-financial state of an economy using big data? We suggest a novel and practical approach. We exploit a wide range of global news sources to extract multi-class global sentiments and build an early warning system based on them. We extract and process sentiments from this big data source using natural language processing (NLP). We then match the variation in the sentiments with the future variation in macro-financial variables.
We use the Harvard dictionary to assign words to sentiment categories. We then define multi-class sentiments such as net positive tone (the positive tone minus the negative tone, divided by the total word count), the number of news articles, positivity, negativity, strength, weakness, activeness, and passiveness. We also analyze other sentiments in the Harvard dictionary, but omit them here for brevity. A word in an article is assigned to more than one sentiment if it implies multiple sentiments. We also apply the Loughran and McDonald Sentiment Word Lists (Loughran & McDonald 2011) to construct semantic networks and compare them with those built from the Harvard dictionary.
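As an illustration of the net positive tone defined above, the following is a minimal sketch rather than the procedure used in the paper. The five-word lexicon, the stop-word list, and the function names are hypothetical stand-ins for the Harvard dictionary, and the sketch assumes that the total word count is taken after stop words are removed, which the text does not specify.

```python
import re

# Hypothetical miniature lexicon; the actual Harvard dictionary is far larger.
# A word may belong to several sentiment categories at once.
LEXICON = {
    "gain": {"positive", "strong"},
    "growth": {"positive", "active"},
    "crisis": {"negative", "weak"},
    "default": {"negative"},
    "loss": {"negative", "weak", "passive"},
}

# Hypothetical stop-word list (assumption, for illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "after"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def sentiment_counts(tokens):
    """Count tokens per sentiment category; one token can count in several."""
    counts = {}
    for tok in tokens:
        for category in LEXICON.get(tok, ()):
            counts[category] = counts.get(category, 0) + 1
    return counts


def net_positive_tone(text):
    """(positive count - negative count) / total word count."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = sentiment_counts(tokens)
    return (counts.get("positive", 0) - counts.get("negative", 0)) / len(tokens)


tone = net_positive_tone("Growth and gain after the crisis")
```

In this toy sentence three words survive stop-word removal, two positive and one negative, so the net positive tone is 1/3.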
We collect and analyze big data from 3,562 global news sources, such as the Financial Times and the Wall Street Journal. For robustness, we use only the simplest NLP techniques (e.g., excluding stop words and nothing more). Our NLP analysis also covers the context of words. A word can have different meanings depending on its context, and the network of words can capture that context. We therefore highlight the semantic network. To measure the dynamically changing network of words, we construct a centrality index for each sentiment and semantic network. This improves the power of the sentiments to explain and predict Korean macro-financial variables. The macro-financial variables we analyze are the Financial Stability Index of the Bank of Korea (FSI), VKOSPI (the volatility index of the KOSPI 200), the credit spread, the term spread, EWY (iShares Korean ETF) trading volume, the consumer sentiment index, foreign investor turnover in the Korean stock market, foreign investors’ net buying in the Korean stock market, and foreign exchange rates (USD, JPY, RMB, EUR).
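To make the idea of a centrality index concrete, here is a minimal sketch under stated assumptions. The text does not say which centrality measure or network construction is used, so the sketch assumes a document-level co-occurrence network and degree centrality. The function names and the toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(documents):
    """Undirected weighted graph: the weight of an edge is the number of
    documents in which the two words appear together."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in documents:
        for u, v in combinations(sorted(set(tokens)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def degree_centrality(graph):
    """Share of the other nodes to which each node is directly linked."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def sentiment_centrality(graph, sentiment_words):
    """Average degree centrality of the sentiment words found in the graph."""
    centrality = degree_centrality(graph)
    present = [w for w in sentiment_words if w in centrality]
    if not present:
        return 0.0
    return sum(centrality[w] for w in present) / len(present)


# Toy tokenized articles (hypothetical data, stop words already removed).
docs = [["crisis", "debt", "bank"], ["crisis", "won"], ["growth", "bank"]]
network = cooccurrence_network(docs)
negative_index = sentiment_centrality(network, {"crisis", "debt"})
positive_index = sentiment_centrality(network, {"growth"})
```

In this toy network the negative words sit closer to the center (index 0.625) than the positive word (index 0.25). Recomputing such an index period by period would give one time series per sentiment, which is the kind of quantity the paragraph above describes.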
Our results are remarkable. First, even a simple count of sentiment-related words can forecast how macro-financial variables in Korea change in the future. The predictability is statistically significant at both short and long horizons. Suppose we sort the days in our sample from the most negative-emotion days to the most positive-emotion days. When we move from the top 33% positive-emotion days to the top 33% negative-emotion days, the following changes occur on the next day on average: VKOSPI increases by 5.5, credit spread increases by 0.2%, term spread increases by 0.03%, log EWY trading volume decreases by 1.6, USD increases by 70, JPY decreases by 28, RMB decreases by 8, and EUR decreases by 48. The following changes occur one month later on average: FSI increases by 6, VKOSPI increases by 8, credit spread increases by 0.35%, term spread increases by 0.1%, consumer confidence index decreases by 5.5, log EWY trading volume decreases by 2.4, foreign investor net buying increases by 0.5%, USD increases by 106, and EUR decreases by 64. These results become less significant once we include lagged variables and other controls, but many remain significant.
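The tercile comparison described above can be sketched as follows. This is an illustrative simplification, not the estimation used in the paper: it omits significance tests, lagged variables, and controls. The function name and the toy numbers are hypothetical, and the sketch assumes that `next_change[i]` holds the change in a macro-financial variable over the period following day `i`.

```python
from statistics import mean


def tercile_spread(sentiment, next_change):
    """Mean subsequent change on the most negative third of days minus the
    mean subsequent change on the most positive third of days."""
    if len(sentiment) != len(next_change):
        raise ValueError("both series must have the same length")
    k = len(sentiment) // 3
    if k == 0:
        raise ValueError("need at least three observations")
    # Indices of the days ordered from most negative to most positive sentiment.
    order = sorted(range(len(sentiment)), key=lambda i: sentiment[i])
    most_negative = order[:k]
    most_positive = order[-k:]
    return (mean(next_change[i] for i in most_negative)
            - mean(next_change[i] for i in most_positive))


# Toy data (hypothetical): daily net positive tone and the next-day change
# in some volatility index.
tone = [-0.3, 0.2, -0.1, 0.4, 0.0, -0.5]
change = [2.0, -1.0, 0.5, -2.0, 0.0, 3.0]
spread = tercile_spread(tone, change)
```

With these toy numbers the index rises by 2.5 on average after the two most negative days and falls by 1.5 after the two most positive days, so the spread is 4.0.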
Second, predictability increases when we use more sophisticated measures that take into account the overall structure of the semantic network, i.e., the contexts of words. We demonstrate one such measure, the centrality of sentiment-related words. Suppose we sort the months in our sample from the most negative centrality to the most positive centrality. When we move from the top 33% positive-centrality months to the top 33% negative-centrality months, the following changes occur after six months on average: FSI increases by 3.14, VKOSPI increases by 4.83, credit spread increases by 0.27%, term spread increases by 0.22%, log EWY trading volume decreases by 1.90, foreign turnover decreases by 0.02, foreign investor net buying increases by 0.005, USD increases by 58, and RMB decreases by 18. The following changes occur one month later: FSI increases by 7.35, VKOSPI increases by 9.19, credit spread increases by 0.39%, term spread increases by 0.19%, consumer confidence index decreases by 5.73, log EWY trading volume decreases by 1.96, foreign turnover decreases by 0.01, USD increases by 89, and RMB decreases by 14.
Third, it is very useful to visualize the semantic network of sentiments. Our semantic network changes dynamically ahead of future financial crises and events. For example, negative, uncertain, and litigious words are at the center of the semantic network on March 28, 1997. This foreshadows a dangerous and uncertain future for the Korean economy. In fact, the 1997 Asian financial crisis began three months later and brought about a major downturn not only in the Korean economy but also in the global economy. By contrast, in the semantic network on June 25, 2011, positive words are at the center, possibly signaling a stable Korean economy in the future. Indeed, the Korean economy has shown steady growth since 2011. Therefore, depending on how we capture the dynamic variation in the network, we can design highly intuitive and quantifiable indicators of future macro-financial uncertainties.
Our results suggest a way to develop an early warning system for macro-financial conditions in an economy. Multi-class sentiments and their contexts, extracted with NLP, are informative and useful for designing an early warning system for future uncertainties, whether or not those uncertainties are quantifiable. This will be of great help not only to policy makers but also to retail investors and practitioners who lack the risk management capabilities and tools to translate qualitative uncertainties into risk measures for hedging and investing.