ABOUT ME
Bryan Kelly is the Frederick Frank ’54 and Mary C. Tanner Professor of Finance at the Yale School of Management and Head of Machine Learning at AQR Capital Management. He is also a Research Fellow at the National Bureau of Economic Research and Associate Director of SOM’s International Center for Finance. Professor Kelly’s primary research fields are asset pricing, machine learning, and financial econometrics. He has served as co-editor of the Journal of Financial Econometrics and associate editor of the Journal of Finance and the Journal of Financial Economics. Before joining Yale, Kelly was a tenured professor of finance at the University of Chicago Booth School of Business. He earned an AB in economics from the University of Chicago, an MA in economics from the University of California, San Diego, and a PhD and MPhil in finance from New York University’s Stern School of Business. Kelly worked in investment banking at Morgan Stanley prior to his PhD.
DATA
Go to jkpfactors.com to download factor portfolio return data for the 153 factors in 93 countries studied in "Is There a Replication Crisis in Finance?" by Jensen, Kelly, and Pedersen (2021) in The Journal of Finance.
You can also download our stock-level characteristics data through WRDS.
Our GitHub code repository provides code to produce all underlying stock-level signals. For researchers with a WRDS account, this SAS code runs on the WRDS server and produces 406 characteristics (including the 153 in our paper) and the associated factor portfolios in 93 countries.
Our Documentation (.pdf) describes the data contents in detail and provides a step-by-step explanation of each variable's construction.
To Request Additional Data, ask any questions about our data, or report any issues, please email me.
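For readers new to factor construction, here is a minimal, hypothetical sketch of how one characteristic-sorted long-short factor could be formed from stock-level data in a single country. It assumes a pandas DataFrame with hypothetical columns `date`, `ret` (the following period's return), and `me` (market equity), plus one characteristic column. It uses tercile breakpoints and plain value weights. This is not the repository's SAS code, and the Documentation describes the actual breakpoints and weighting scheme.

```python
import numpy as np
import pandas as pd

def long_short_factor(df: pd.DataFrame, char: str) -> pd.Series:
    """Hypothetical sketch: top-tercile minus bottom-tercile value-weighted return.

    Assumed columns: 'date', 'ret' (next-period return), 'me' (market equity), and `char`.
    """
    def one_period(g: pd.DataFrame) -> float:
        g = g.dropna(subset=[char, "ret", "me"])
        lo, hi = g[char].quantile([1 / 3, 2 / 3])   # tercile breakpoints (assumption)
        top, bot = g[g[char] >= hi], g[g[char] <= lo]

        def vw(leg: pd.DataFrame) -> float:
            # value-weighted average return within one leg
            return float(np.average(leg["ret"], weights=leg["me"]))

        return vw(top) - vw(bot)

    # one factor return per date, in sorted date order
    return pd.Series({d: one_period(g) for d, g in df.groupby("date")})

df = pd.DataFrame({
    "date": [1] * 6 + [2] * 6,
    "char": list(range(1, 7)) * 2,
    "ret": [0.01, 0.03, 0.0, 0.0, 0.05, 0.07] + [0.02] * 6,
    "me": [1.0] * 12,
})
print(long_short_factor(df, "char"))
```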
This website analyzes results and provides data based on "Business News and Business Cycles" by Bybee, Kelly, Manela, and Xiu (2023) in The Journal of Finance.
INTERMEDIARY ASSET PRICING
Intermediary capital risk factor, 1970Q1–2018Q3, based on "Intermediary Asset Pricing: New Evidence From Many Asset Classes" by He, Kelly, and Manela (2017) in the Journal of Financial Economics. Available at quarterly and monthly frequencies, and at daily frequency starting 2000-01-01. Also includes the portfolio returns used in our cross-sectional tests. See readme.txt inside for details and replication code. Courtesy of Asaf Manela. Zhiguo He updates some of these series more frequently; they are available here.
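As we read the paper, the capital ratio aggregates primary dealers' market equity relative to market equity plus book debt. The capital risk factor is then the AR(1) innovation in that ratio, scaled by the lagged ratio so that it reads as a growth rate. The sketch below implements that reading with hypothetical inputs and is not a substitute for the replication code in readme.txt.

```python
import numpy as np

def capital_ratio(market_equity, book_debt):
    """Aggregate capital ratio per date (rows = dates, columns = dealers; hypothetical inputs)."""
    me = np.asarray(market_equity, dtype=float).sum(axis=1)
    return me / (me + np.asarray(book_debt, dtype=float).sum(axis=1))

def capital_risk_factor(eta):
    """AR(1) innovations in the capital ratio, divided by the lagged ratio.

    Returns an array of length len(eta) - 1.
    """
    eta = np.asarray(eta, dtype=float)
    y, x = eta[1:], eta[:-1]
    X = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS fit of eta_t on eta_{t-1}
    innovations = y - (a + b * x)
    return innovations / x
```

With a series that follows an exact AR(1), the innovations and hence the factor are zero up to rounding, which is a quick sanity check.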
Data and documentation for corporate bond risk factors estimated via IPCA as in "Modeling Corporate Bond Returns" by Kelly, Pruitt, and Palhares (2022) in The Journal of Finance.
PUBLISHED ARTICLES
Review of Financial Studies, Forthcoming (with T. Jensen, S. Malamud, and L. Pedersen)
Journal of Investment Management, 2024 (with L. Gomez and R. Israelov)
Financial Analysts Journal, 2024 (with G. De Nard and R. Engle)
Foundations and Trends in Finance, 2023 (with D. Xiu)
Journal of Finance, Forthcoming (with L. Bybee, A. Manela, and D. Xiu)
Journal of Finance, Forthcoming (with S. Giglio and S. Kozak)
Review of Financial Studies, Forthcoming (with L. Bybee and Y. Su)
Journal of Finance, Forthcoming (with S. Malamud and K. Zhou)
Journal of Finance, Forthcoming (with J. Jiang and D. Xiu)
Annual Review of Financial Economics, In Process (with S. Giglio and D. Xiu)
Journal of Finance, Forthcoming (with D. Palhares and S. Pruitt)
Journal of Financial Economics, Forthcoming (with M. Buechner)
Journal of Finance, Forthcoming (with T. Jensen and L. Pedersen)
Journal of Finance, Forthcoming (with S. Malamud and L. Pedersen)
Journal of Business and Economic Statistics, Invited Paper (with A. Manela and A. Moreira)
Annual Review of Financial Economics, Forthcoming (with S. Giglio and J. Stroebel)
Journal of Financial Economics, Forthcoming (with I. Dew-Becker and S. Giglio)
Journal of Financial Economics, 2021 (with S. Pruitt and T. Moskowitz)
American Economic Review: Insights, Forthcoming (with D. Papanikolaou, A. Seru, and M. Taddy)
Journal of Political Economy, 2021 (with B. Herskovic, H. Lustig, and S. Van Nieuwerburgh)
Journal of Investment Management, 2020 (with R. Israel and T. Moskowitz)
Journal of Financial Economics, 2020 (with Y. Chen and W. Wu)
Journal of Portfolio Management, 2019 (with T. Gupta)
Journal of Econometrics, 2021 (with S. Gu and D. Xiu)
Review of Financial Studies, 2020 (with S. Gu and D. Xiu)
Review of Financial Studies, 2020 (with R. Engle, S. Giglio, H. Lee, and J. Stroebel)
Journal of Financial Economics, 2019 (with S. Pruitt and Y. Su)
Corrigendum for Table 7
Journal of Economic Literature, 2019 (with M. Gentzkow and M. Taddy)
Quarterly Journal of Economics, 2018 (with S. Giglio)
Journal of Financial Economics, 2017 (with Z. He and A. Manela)
American Economic Review, 2016 (with H. Lustig and S. Van Nieuwerburgh)
Journal of Finance, 2016 (with L. Pastor and P. Veronesi)
Journal of Financial Economics, 2016 (with B. Herskovic, H. Lustig, and S. Van Nieuwerburgh)
Journal of Financial Economics, 2016 (with S. Giglio and S. Pruitt)
Journal of Econometrics, 2015 (with S. Pruitt)
Review of Financial Studies, 2014 (with H. Jiang)
Extremes, 2014
Journal of Finance, 2014 (with K. Balakrishnan, M. Billings, and A. Ljungqvist)
Journal of Finance, 2013 (with S. Pruitt)
Review of Financial Studies, 2012 (with A. Ljungqvist)
Journal of Business and Economic Statistics, 2012 (with R. Engle)
Journal of Risk, 2011 (with C. Brownlees and R. Engle)
WORKING PAPERS
(with S. Malamud, M. Pourmohammadi, and F. Trojani)
We introduce a novel shrinkage methodology for building optimal portfolios in environments of high complexity where the number of assets is comparable to or larger than the number of observations. Our universal portfolio shrinkage approximator (UPSA) is derived in closed form, is easy to implement, and dominates existing shrinkage methods. It exhibits an explicit two-fund separation, optimally combining the Markowitz portfolio with a complexity correction. Instead of annihilating the low-variance principal components, UPSA weights them efficiently. Contrary to conventional wisdom, low in-sample variance principal components (PCs) are key to out-of-sample model performance. By optimally incorporating them into portfolio construction, UPSA produces a stochastic discount factor that significantly dominates its PC-sparse counterparts. Thus, PC-sparsity is just an artifact of inefficient shrinkage.
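UPSA's closed form is not reproduced here. The sketch below only contrasts the two treatments of low-variance principal components that the abstract describes: truncation, which drops them, and a simple ridge-style spectral shrinkage, which keeps every PC but damps each one by 1/(λ_j + z). Ridge is a stand-in chosen for illustration and is not UPSA.

```python
import numpy as np

def pc_truncated_weights(mu, sigma, k):
    """Markowitz weights built from only the top-k principal components."""
    vals, vecs = np.linalg.eigh(sigma)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]              # indices of the k largest
    V, lam = vecs[:, idx], vals[idx]
    return V @ ((V.T @ mu) / lam)

def ridge_shrunk_weights(mu, sigma, z):
    """Keep every PC, shrinking the one with variance lambda_j by 1 / (lambda_j + z)."""
    vals, vecs = np.linalg.eigh(sigma)
    return vecs @ ((vecs.T @ mu) / (vals + z))
```

Truncating at k = N recovers the unshrunk Markowitz solution, while the ridge version equals (Σ + zI)⁻¹μ; both identities make convenient checks.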
(with A. Didisheim, S. Ke, and S. Malamud)
We introduce artificial intelligence pricing theory (AIPT). In contrast with the APT’s foundational assumption of a low-dimensional factor structure in returns, the AIPT conjectures that returns are driven by a large number of factors. We first verify this conjecture empirically and show that nonlinear models with an exorbitant number of factors (many more than the number of training observations or base assets) are far more successful in describing the out-of-sample behavior of asset returns than simpler standard models. We then theoretically characterize the behavior of large factor pricing models, from which we show that the AIPT’s “many factors” conjecture faithfully explains our empirical findings, while the APT’s “few factors” conjecture is contradicted by the data.
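To make "many more factors than observations" concrete, here is a minimal sketch under assumed choices: random sine transforms of characteristics generate P nonlinear managed-portfolio factors, and a ridge-regularized SDF weights them, λ = (F'F/T + zI)⁻¹ mean(F). When P exceeds T, the same λ can be computed from a T×T system using the identity (F'F/T + zI)⁻¹F' = F'(FF'/T + zI)⁻¹. The feature map and estimator are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def random_feature_factors(Z, R, P, rng, gamma=1.0):
    """Hypothetical factor generator.

    Z: (T, N, K) characteristics; R: (T, N) next-period returns.
    Returns F: (T, P) returns on portfolios weighted by random nonlinear signals.
    """
    W = rng.normal(scale=gamma, size=(Z.shape[2], P))
    S = np.sin(Z @ W)                                   # (T, N, P) nonlinear signals
    return np.einsum("tnp,tn->tp", S, R) / Z.shape[1]

def sdf_weights_primal(F, z):
    """Ridge SDF weights via a P x P solve."""
    T, P = F.shape
    return np.linalg.solve(F.T @ F / T + z * np.eye(P), F.mean(axis=0))

def sdf_weights_dual(F, z):
    """Same weights via a T x T solve, which is cheaper when P >> T."""
    T = F.shape[0]
    return F.T @ np.linalg.solve(F @ F.T / T + z * np.eye(T), np.ones(T)) / T
```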
(with D. Xiu)
We extract contextualized representations of news text to predict returns using state-of-the-art large language models. Unlike traditional word-based methods, such as bag-of-words or word vectors, contextualized representations capture both the syntax and the semantics of text, providing a more comprehensive understanding of its meaning. Notably, word-based approaches are more susceptible to errors when negation words are present in news articles. Our study includes data from 16 international equity markets and news articles in 13 different languages, providing polyglot evidence of news-induced return predictability. We observe that information in newswires is incorporated into prices with an inefficient delay that aligns with limits-to-arbitrage, yet can still be exploited in real-time trading strategies. Additionally, we find that a trading strategy that capitalizes on fresh news alerts achieves even higher Sharpe ratios.
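The negation point can be seen in a toy example. Two headlines with opposite meanings produce identical bag-of-words counts, because word order is discarded. Adjacent-word pairs (bigrams) are shown here only as a crude, order-aware contrast. They are not the contextualized LLM embeddings used in the paper, which would require a model not shown here.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Order-free word counts, the starting point of word-based methods."""
    return Counter(text.lower().split())

def bigrams(text: str) -> Counter:
    """Adjacent word pairs: a crude, order-aware stand-in for context."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

a = "guidance is good not bad"
b = "guidance is bad not good"
print(bag_of_words(a) == bag_of_words(b))  # True: identical bags despite opposite meaning
print(bigrams(a) == bigrams(b))            # False: word order is preserved
```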
(with T. Bali, M. Moerke, and J. Rahman)
We propose a statistical model of differences in beliefs in which heterogeneous investors are represented as different machine learning model specifications. Each investor forms return forecasts from their own specific model using data inputs that are available to all investors. We measure disagreement as dispersion in forecasts across investor-models. Our measure aligns with extant measures of disagreement (e.g., analyst forecast dispersion), but is a significantly stronger predictor of future returns. We document a large, significant, and highly robust negative cross-sectional relation between belief disagreement and future returns. A decile spread portfolio that is short stocks with high forecast disagreement and long stocks with low disagreement earns a value-weighted alpha of 15% per year. A range of analyses suggest the alpha is mispricing induced by short-sale costs and limits-to-arbitrage.
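The disagreement measure and the decile spread described above can be sketched as follows. The assumptions are labeled here and in the code: forecasts arrive as an (M, N) array from M hypothetical investor-models, disagreement is their cross-model standard deviation, and the portfolio is long the lowest-disagreement decile and short the highest, with value weights. Model training, the paper's alpha regressions, and rebalancing details are omitted.

```python
import numpy as np

def disagreement(forecasts):
    """forecasts: (M, N) return forecasts from M models for N stocks.
    Returns (N,) dispersion of forecasts across models (sample std)."""
    return np.asarray(forecasts, dtype=float).std(axis=0, ddof=1)

def low_minus_high(disagree, next_ret, weights, n_groups=10):
    """Value-weighted return of the lowest-disagreement group minus the highest."""
    edges = np.quantile(disagree, np.linspace(0, 1, n_groups + 1))
    low = disagree <= edges[1]
    high = disagree >= edges[-2]

    def vw(mask):
        return np.average(next_ret[mask], weights=weights[mask])

    return vw(low) - vw(high)
```

In the test below, disagreement rises with the stock index and returns fall with it, so the long-low, short-high spread is positive.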
(with S. Malamud and T.A. Xu)
The recent discovery of the equivalence between infinitely wide neural networks in the lazy training regime and Neural Tangent Kernels has revived interest in kernel methods. However, conventional wisdom suggests kernel methods are unsuitable for large samples due to their computational complexity and memory requirements. We introduce a novel random feature regression algorithm that allows us (when necessary) to scale to virtually infinite numbers of random features. We illustrate the performance of our method on the CIFAR-10 dataset.
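As background only, here is the textbook random feature regression that this line of work builds on. It is not the paper's scalable algorithm. Random Fourier features (Rahimi and Recht) have inner products that approximate a Gaussian kernel, and ridge regression on them can be solved in whichever of the n×n or P×P systems is smaller. The kernel bandwidth `sigma` and penalty `lam` are illustrative parameters.

```python
import numpy as np

def random_fourier_features(X, P, sigma, rng):
    """Map X (n, d) to P features with Z @ Z.T ~= exp(-||x - y||^2 / (2 sigma^2))."""
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], P))
    b = rng.uniform(0.0, 2 * np.pi, size=P)
    return np.sqrt(2.0 / P) * np.cos(X @ W + b)

def ridge_fit(Z, y, lam):
    """Ridge coefficients; uses the n x n dual system when P exceeds n."""
    n, P = Z.shape
    if P > n:
        return Z.T @ np.linalg.solve(Z @ Z.T + lam * np.eye(n), y)
    return np.linalg.solve(Z.T @ Z + lam * np.eye(P), Z.T @ y)
```

The dual branch keeps the cost tied to the sample size rather than the number of features. That is the basic reason very wide random feature models remain tractable.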
(with S. Malamud and K. Zhou)
We document the "virtue of complexity" in all asset classes that we study (US equities, international equities, bonds, commodities, currencies, and interest rates). Return prediction R-squared and optimal portfolio Sharpe ratio generally increase with model parameterization for every asset class. The virtue of complexity is present even in extremely data-scarce environments, e.g., for predictive models with fewer than twenty observations and tens of thousands of predictors. The empirical association between model complexity and out-of-sample model performance exhibits a striking consistency with theoretical predictions.
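The two performance metrics can be traced against complexity c = P/T with a short sketch. The assumptions are: random sine features of the predictors, dual-form ridge, an out-of-sample R² benchmarked against a zero forecast, and a timing strategy that holds the forecast as its position. The function only computes the curve and makes no claim about its shape on any particular data.

```python
import numpy as np

def oos_r2(y, yhat):
    """Out-of-sample R^2 against a zero forecast (assumed benchmark)."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum(y ** 2)

def timing_sharpe(y, yhat, periods_per_year=12):
    """Annualized Sharpe ratio of holding yhat_t units of the asset each period."""
    strat = yhat * y
    return np.sqrt(periods_per_year) * strat.mean() / strat.std()

def complexity_curve(X_tr, y_tr, X_te, y_te, P_grid, lam, seed=0):
    """For each P, fit ridge on P random sine features; return (P/T, R^2, Sharpe)."""
    rng = np.random.default_rng(seed)
    T = len(y_tr)
    out = []
    for P in P_grid:
        W = rng.normal(size=(X_tr.shape[1], P))
        S_tr, S_te = np.sin(X_tr @ W), np.sin(X_te @ W)
        # dual ridge solve: a T x T system, so P can far exceed T
        alpha = np.linalg.solve(S_tr @ S_tr.T + lam * np.eye(T), y_tr)
        yhat = S_te @ (S_tr.T @ alpha)
        out.append((P / T, oos_r2(y_te, yhat), timing_sharpe(y_te, yhat)))
    return out
```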
(with B. Kuznetsov, S. Malamud, and T.A. Xu)
We develop a novel methodology for extracting information from option implied volatility (IV) surfaces for the cross-section of stock returns, using image recognition techniques from machine learning (ML). The predictive information we identify is essentially uncorrelated with most of the existing option-implied characteristics, delivers a higher Sharpe ratio, and has a significant alpha relative to a battery of standard and option-implied factors. We introduce principal linear features, an analog of principal components for ML, and use them to show IV feature complexity: A low-rank rotation of the IV surface cannot explain the model performance.
(with A. Didisheim and S. Malamud)
We introduce a methodology for designing and training deep neural networks (DNN) that we call "Deep Regression Ensembles" (DRE). It bridges the gap between DNN and two-layer neural networks trained with random feature regression. Each layer of DRE has two components: randomly drawn input weights, and output weights trained myopically (as if the final output layer) using linear ridge regression. Within a layer, each neuron uses a different subset of inputs and a different ridge penalty, constituting an ensemble of random feature ridge regressions. Our experiments show that a single DRE architecture is on par with or exceeds state-of-the-art DNN in many data sets. Yet, because DRE neural weights are either known in closed form or randomly drawn, its computational cost is orders of magnitude smaller than that of DNN.
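The layer structure described above translates into a short sketch. Each neuron takes a random subset of the layer's inputs, applies fixed random weights, and fits its output weights by ridge regression on the target with its own penalty. Its fitted prediction then becomes an input to the next layer. The ReLU activation, feature counts, penalty grid, and the averaging readout at the end are hypothetical choices, not the paper's settings.

```python
import numpy as np

def dre_layer(X, y, n_neurons, n_feats, subset_size, penalties, rng):
    """One DRE-style layer: an ensemble of random feature ridge regressions.

    Returns the layer output (n, n_neurons) and parameters for reuse on new data.
    """
    n, d = X.shape
    outputs, params = [], []
    for j in range(n_neurons):
        cols = rng.choice(d, size=min(subset_size, d), replace=False)  # input subset
        W = rng.normal(size=(len(cols), n_feats)) / np.sqrt(len(cols))   # random, never trained
        S = np.maximum(X[:, cols] @ W, 0.0)                              # ReLU random features
        lam = penalties[j % len(penalties)]                              # neuron-specific penalty
        beta = np.linalg.solve(S.T @ S + lam * np.eye(n_feats), S.T @ y)  # myopic ridge fit
        outputs.append(S @ beta)
        params.append((cols, W, beta))
    return np.column_stack(outputs), params

def dre_apply(X, params):
    """Run a fitted layer forward on new inputs."""
    return np.column_stack([np.maximum(X[:, c] @ W, 0.0) @ b for c, W, b in params])

def fit_dre(X, y, n_layers=3, n_neurons=16, n_feats=64, subset_size=4,
            penalties=(1e-2, 1e-1, 1.0, 10.0), seed=0):
    rng = np.random.default_rng(seed)
    layers, H = [], X
    for _ in range(n_layers):
        H, params = dre_layer(H, y, n_neurons, n_feats, subset_size, penalties, rng)
        layers.append(params)
    return layers

def predict_dre(X, layers):
    H = X
    for params in layers:
        H = dre_apply(H, params)
    return H.mean(axis=1)  # hypothetical readout: average of final-layer neurons
```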
(with Z. Ke and D. Xiu)
We introduce a new text-mining methodology that extracts sentiment information from news articles to predict asset returns. Unlike more common sentiment scores used for stock return prediction (e.g., those sold by commercial vendors or built with dictionary-based methods), our supervised learning framework constructs a sentiment score that is specifically adapted to the problem of return prediction. Our method proceeds in three steps: 1) isolating a list of sentiment terms via predictive screening, 2) assigning sentiment weights to these words via topic modeling, and 3) aggregating terms into an article-level sentiment score via penalized likelihood. We derive theoretical guarantees on the accuracy of estimates from our model with minimal assumptions. In our empirical analysis, we text-mine one of the most actively monitored streams of news articles in the financial system—the Dow Jones Newswires—and show that our supervised sentiment model excels at extracting return-predictive signals in this context.
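The three steps can be sketched in simplified form as follows. Step 1 screens words by how often articles containing them carry positive returns. Step 2 regresses normalized sentiment-word frequencies on return-rank sentiment to estimate positive and negative topic vectors. Step 3 scores a new article by maximizing a multinomial log-likelihood plus a penalty λ·log(p(1−p)) over p in [0, 1]. The thresholds `alpha` and `kappa`, the penalty weight, and the grid search are illustrative assumptions, and the paper's estimator and guarantees involve details not shown here.

```python
import numpy as np

def screen_words(D, y, alpha=0.05, kappa=5):
    """Step 1: keep words whose appearances skew toward positive or negative returns.
    D: (n_articles, n_words) counts; y: (n_articles,) returns."""
    present = D > 0
    k = present.sum(axis=0)                              # articles containing each word
    pos = (present & (y > 0)[:, None]).sum(axis=0)
    f = np.divide(pos, k, out=np.full(k.shape, 0.5), where=k > 0)
    return np.where((k >= kappa) & (np.abs(f - 0.5) >= alpha))[0]

def estimate_topics(D_s, y):
    """Step 2: regress word frequencies on [p, 1 - p], where p is the return rank.
    Returns O (n_sentiment_words, 2): columns are the positive and negative topics."""
    n = len(y)
    p = (np.argsort(np.argsort(y)) + 1) / n             # return ranks scaled to (0, 1]
    s = D_s.sum(axis=1)
    keep = s > 0                                         # articles with sentiment words
    H = D_s[keep] / s[keep, None]
    Wm = np.column_stack([p[keep], 1 - p[keep]])
    O = np.linalg.lstsq(Wm, H, rcond=None)[0].T
    O = np.clip(O, 0, None)                              # frequencies cannot be negative
    return O / O.sum(axis=0, keepdims=True)

def score_article(d, O, lam=1.0, grid=np.linspace(0.001, 0.999, 999)):
    """Step 3: penalized-likelihood sentiment score in (0, 1) via grid search."""
    s = d.sum()
    if s == 0:
        return 0.5
    mix = np.outer(grid, O[:, 0]) + np.outer(1 - grid, O[:, 1])   # (G, m)
    loglik = (np.log(mix + 1e-12) @ d) / s
    return grid[np.argmax(loglik + lam * np.log(grid * (1 - grid)))]
```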
(with S. Pruitt and Y. Su)
Econometric development of the IPCA method used in "Characteristics Are Covariances: A Unified Model of Risk and Return."
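For orientation, here is a sketch of alternating least squares for the restricted (zero-alpha) IPCA model r_{i,t+1} = z_{i,t}'Γf_{t+1} + ε, as we understand the method. Given Γ, each period's factors come from a cross-sectional regression on the betas Z_tΓ. Given the factors, vec(Γ) solves a pooled regression with regressors f_t ⊗ z_{i,t}. The SVD initialization and the QR normalization, which fixes Γ to have orthonormal columns, are simplifications of the paper's identification conditions.

```python
import numpy as np

def ipca_als(Z, R, K, n_iter=100):
    """Z: (T, N, L) lagged characteristics; R: (T, N) returns.
    Returns Gamma (L, K) and factors F (T, K)."""
    T, N, L = Z.shape
    X = np.einsum("tnl,tn->tl", Z, R) / N               # characteristic-managed portfolios
    U, _, _ = np.linalg.svd(X.T, full_matrices=False)
    Gamma = U[:, :K]                                     # initialize from leading PCs

    def factors(G):
        F = np.empty((T, K))
        for t in range(T):
            F[t] = np.linalg.lstsq(Z[t] @ G, R[t], rcond=None)[0]
        return F

    for _ in range(n_iter):
        F = factors(Gamma)
        A = np.zeros((L * K, L * K))
        b = np.zeros(L * K)
        for t in range(T):
            A += np.kron(np.outer(F[t], F[t]), Z[t].T @ Z[t])
            b += np.kron(F[t], Z[t].T @ R[t])
        Gamma = np.linalg.solve(A, b).reshape((L, K), order="F")  # un-vec (column-major)
        Gamma, _ = np.linalg.qr(Gamma)                            # rotation normalization
    return Gamma, factors(Gamma)
```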
(with G. Manzo and D. Palhares)
We define and construct a credit-implied volatility (CIV) surface from the firm-by-maturity panel of CDS spreads. We use this framework to organize the behavior of corporate credit markets into three stylized facts. First, CIV exhibits a steep moneyness smirk. Second, the joint dynamics of credit spreads on all firms are captured by three interpretable factors in the CIV surface. Third, the cross section of CDS risk premia is fully explained by exposures to CIV surface shocks. We propose a structural model for joint asset behavior of all firms that is characterized by stochastic volatility and time-varying downside tail risk in aggregate asset growth.
CONTACT
Bryan Kelly
Yale School of Management
165 Whitney Ave.
New Haven, CT 06511
203-432-2221