(with S. Malamud and L. Pedersen)
We propose a new asset-pricing framework in which all securities’ signals are used to predict each individual return. While the literature focuses on each security’s own-signal predictability, assuming equal strength across securities, our framework is flexible and includes cross-predictability, leading to three main results. First, we derive the optimal strategy in closed form. It consists of eigenvectors of a “prediction matrix,” which we call “principal portfolios.” Second, we decompose the problem into alpha and beta, yielding optimal strategies with, respectively, zero and positive factor exposure. Third, we provide a new test of asset pricing models. Empirically, principal portfolios deliver significant out-of-sample alphas to standard factors in several data sets.
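The eigenvector construction above can be illustrated with a minimal numpy sketch. The simulated data, the symmetrization step, and trading only the top eigenvector are hypothetical simplifications for illustration, not the paper's full estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 500, 10
S = rng.standard_normal((T, N))            # signals observed at t
R = 0.1 * S + rng.standard_normal((T, N))  # returns at t+1, weakly predicted by S

# Sample "prediction matrix": cross-moments of lagged signals and next-period returns
Pi = R.T @ S / T                           # Pi[i, j] ~ E[R_{i,t+1} * S_{j,t}]

# Eigenvectors of the symmetric part define (toy) principal portfolios
Pi_sym = 0.5 * (Pi + Pi.T)
eigvals, eigvecs = np.linalg.eigh(Pi_sym)

# Trade the top principal portfolio: scale position in portfolio v by signal S_t'v
top = eigvecs[:, -1]                       # eigenvector with largest eigenvalue
pnl = (S @ top) * (R @ top)                # period-by-period strategy payoff
print(pnl.mean())
```

In this toy setup the strategy payoff is positive on average because the planted signal loads on every eigen-direction; the paper's alpha/beta decomposition corresponds to splitting the prediction matrix into symmetric and antisymmetric parts.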
(with L. Bybee, A. Manela, and D. Xiu)
We propose an approach to measuring the state of the economy via textual analysis of business news. From the full text content of 800,000 Wall Street Journal articles for 1984–2017, we estimate a topic model that summarizes business news as easily interpretable topical themes and quantifies the proportion of news attention allocated to each theme at each point in time. We then use our news attention estimates as inputs into statistical models of numerical economic time series. We demonstrate that these text-based inputs accurately track a wide range of economic activity measures and that they have incremental forecasting power for macroeconomic outcomes, above and beyond standard numerical predictors. Finally, we use our model to retrieve the news-based narratives that underlie “shocks” in numerical economic data.
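The topic-model-to-attention pipeline can be sketched on a toy corpus; the four hand-written documents and the two-topic LDA below are stand-ins for the WSJ corpus and the paper's estimator:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for WSJ articles in one period (hypothetical data)
docs = [
    "fed raises interest rates inflation",
    "oil prices energy supply opec",
    "inflation outlook rates policy fed",
    "crude oil energy drilling supply",
]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)   # per-article topic proportions (rows sum to 1)

# "News attention" for the period: average topic share across its articles;
# these attention series are what would feed downstream forecasting models
attention = theta.mean(axis=0)
print(attention)
```

Repeating the averaging step month by month yields the attention time series that the abstract describes using as forecasting inputs.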
(with Z. Ke and D. Xiu)
We introduce a new text-mining methodology that extracts sentiment information from news articles to predict asset returns. Unlike more common sentiment scores used for stock return prediction (e.g., those sold by commercial vendors or built with dictionary-based methods), our supervised learning framework constructs a sentiment score that is specifically adapted to the problem of return prediction. Our method proceeds in three steps: 1) isolating a list of sentiment terms via predictive screening, 2) assigning sentiment weights to these words via topic modeling, and 3) aggregating terms into an article-level sentiment score via penalized likelihood. We derive theoretical guarantees on the accuracy of estimates from our model with minimal assumptions. In our empirical analysis, we text-mine one of the most actively monitored streams of news articles in the financial system—the Dow Jones Newswires—and show that our supervised sentiment model excels at extracting return-predictive signals in this context.
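The three-step structure (screen, weight, aggregate) can be sketched in numpy. The simulated counts, the correlation-based screen, and the correlation-derived weights are hypothetical simplifications; the paper's weighting step uses a topic-model estimator and its aggregation uses penalized likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
n_articles, vocab = 200, 50
X = rng.poisson(1.0, (n_articles, vocab))   # word counts per article (toy data)
beta = np.zeros(vocab); beta[:5] = 0.5      # 5 truly sentiment-charged words
signal = X @ beta
y = np.sign(signal - signal.mean() + rng.standard_normal(n_articles))  # return sign

# Step 1: predictive screening — keep words whose counts correlate with returns
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(vocab)])
kept = np.argsort(-np.abs(corr))[:10]

# Step 2: assign sentiment weights (here rescaled correlations, a stand-in
# for the paper's topic-model weighting)
w = corr[kept] / np.abs(corr[kept]).sum()

# Step 3: aggregate screened words into an article-level sentiment score
score = X[:, kept] @ w
print(np.corrcoef(score, y)[0, 1])
```

Because the screen retains mostly the planted sentiment words, the aggregated score correlates positively with the return sign in this simulation.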
(with R. Israelov)
Uncertainty about the future option return has two sources: changes in the position and shape of the implied volatility surface that shift option values (holding moneyness and maturity fixed), and changes in the underlying price that alter an option's location on the surface and thus its value (holding the surface fixed). We estimate a joint time series model of the spot price and volatility surface and use this to construct an ex ante characterization of the option return distribution via bootstrap. Our "ORB" (option return bootstrap) model accurately forecasts means, variances, and extreme quantiles of S&P 500 index conditional option return distributions across a wide range of strikes and maturities.
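The bootstrap idea can be sketched with a single at-the-money call repriced under resampled joint (spot return, implied-vol change) draws. The Gaussian "historical" draws, the flat Black–Scholes surface, and all parameter values are hypothetical simplifications of the paper's joint time series model:

```python
import numpy as np
from math import log, sqrt, exp
from statistics import NormalDist

def bs_call(S, K, T, sigma, r=0.0):
    """Black-Scholes call price (used here as a one-point 'surface')."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return S * N(d1) - K * exp(-r * T) * N(d2)

rng = np.random.default_rng(2)
# Toy "historical" joint draws of (daily spot return, implied-vol change)
hist = np.column_stack([0.01 * rng.standard_normal(1000),
                        0.005 * rng.standard_normal(1000)])

S0, K, T0, sig0 = 100.0, 100.0, 30 / 252, 0.2
p0 = bs_call(S0, K, T0, sig0)

# Bootstrap: resample joint pairs, move spot and vol together, reprice
idx = rng.integers(0, len(hist), 5000)
ret, dvol = hist[idx, 0], hist[idx, 1]
p1 = np.array([bs_call(S0 * (1 + r_), K, T0 - 1 / 252, max(sig0 + dv, 0.01))
               for r_, dv in zip(ret, dvol)])
opt_ret = p1 / p0 - 1

print(np.mean(opt_ret), np.quantile(opt_ret, [0.05, 0.95]))
```

Resampling the spot return and vol change jointly, rather than independently, is what lets the bootstrap capture both sources of option-return uncertainty named in the abstract.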
(with I. Dew-Becker and S. Giglio)
This paper studies the pricing of shocks to implied and realized volatility using options in 19 different markets, covering financials, metals, energies, and agricultural products. The markets are directly related to the state of the macroeconomy and financial markets, and investors can use the options to separately hedge shocks to real uncertainty and to the realization of volatility. Historically, realized volatility has earned a robustly negative risk premium, indicating that high macroeconomic volatility is associated with high marginal utility.
(with A. Manela and A. Moreira)
Text data is inherently ultra-high dimensional, which makes machine learning techniques indispensable for textual analysis. Text also tends to be a highly selected outcome—journalists, speechwriters, and others carefully craft messages to target the limited attention of their audi- ences. We develop an economically motivated high dimensional selection model that improves machine learning from text (and from sparse counts data more generally). Our model is especially useful in cases where the cover/no-cover choice is separate or more interesting than the coverage quantity choice.