This study uses a machine learning technique to assess whether the thematic content of financial statement disclosures (labeled as topic) is incrementally informative in predicting intentional misreporting. Using a Bayesian topic modeling algorithm, we determine and empirically quantify the topic content of a large collection of 10-K narratives spanning the 1994 to 2012 period. We find that the algorithm produces a valid set of semantically meaningful topics that are predictive of financial misreporting, based on samples of SEC enforcement actions (AAERs) and irregularity restatements arising from intentional GAAP violations. Our out-of-sample tests indicate that topic significantly improves the detection of financial misreporting when added to models based on commonly used financial and textual style variables. Furthermore, we find that models including topic outperform traditional models when predicting long-duration misstatements. These results are robust to alternative topic definitions, alternative regression specifications, and various controls for firms with repeated instances of financial misreporting.
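The Bayesian topic modeling algorithm referenced in the abstract is Latent Dirichlet Allocation (listed in the keywords below). A minimal sketch of the pipeline the abstract describes — extracting per-document topic shares from disclosure text, which could then serve as predictors alongside financial and textual style variables — is shown here. The toy corpus, topic count, and variable names are illustrative assumptions, not the paper's actual data or specification:

```python
# Hedged sketch: LDA topic extraction from disclosure-style text.
# The tiny corpus and n_components=2 are illustrative only; the paper
# fits topics on a large collection of 10-K narratives (1994-2012).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "revenue recognition increased sales contracts customers",
    "goodwill impairment charge acquisition intangible assets",
    "litigation settlement legal proceedings contingencies",
    "revenue sales growth customer contracts recognition",
]

# Bag-of-words representation of the narratives
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Fit LDA and recover per-document topic distributions;
# each row of doc_topics sums to ~1 (topic shares for that filing)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)

# In the study's design, topic shares like these would be added as
# features to misreporting-detection models built on financial and
# textual style variables.
print(doc_topics.shape)  # (4, 2)
```

The doc-topic matrix, not the word-topic matrix, is what matters for the prediction task: each filing is reduced to a vector of topic proportions that can enter a regression or classifier like any other covariate.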
Original language: English (US)
Number of pages: 76
State: Published - Jul 5 2016
Name: 27th Annual Conference on Financial Economics and Accounting Paper
- Latent Dirichlet Allocation
- Financial Misreporting
Brown, N. C., Crowley, R. M., & Elliott, W. B. (2016). What are You Saying? Using Topic to Detect Financial Misreporting. (27th Annual Conference on Financial Economics and Accounting Paper). https://doi.org/10.2139/ssrn.2803733