TY - JOUR
T1 - Detection and Analysis of Spikes in a Random Sequence
AU - DasGupta, Anirban
AU - Li, Bo
N1 - Acknowledgements: We are greatly indebted to Joe Glaz and Christian Robert for carefully reading earlier drafts of this manuscript and for contributing to the development of the results. Comments from two anonymous reviewers very greatly improved this paper and we are much indebted to the reviewers. We acknowledge that Li’s research is partially supported by NSF grants DPP-1418339 and AGS-1602845 and NASA-NNX14A080G, and DasGupta’s research is partially supported by grant 206057 from Elsevier Global Analytics.
PY - 2018/12/1
Y1 - 2018/12/1
N2 - Motivated by the more frequent natural and anthropogenic hazards, we revisit the problem of assessing whether an apparent temporal clustering in a sequence of randomly occurring events is a genuine surprise and should call for an examination. We study the problem in both discrete and continuous time formulation. In the discrete formulation, the problem reduces to deriving the probability that p independent people all have birthdays within d days of each other. We provide an analytical expression for a warning limit such that if a subset of p people among n are observed to have birthdays within d days of each other and d is smaller than our warning limit, then it should be treated as a surprising cluster. In the continuous time framework, three different sets of results are given. First, we provide an asymptotic analysis of the problem by embedding it into an extreme value problem for high order spacings of iid samples from the U[0, 1] density. Second, a novel analytical nonasymptotic bound is derived by using certain tools of empirical process theory. Finally, the required probability is approximated by using various bounds and asymptotic results on the supremum of the scanning process of a one dimensional stationary Poisson process. We apply the theories to climate change related datasets, datasets on temperatures, and mass shooting records in the United States. These real data applications of our theoretical methods lead to supporting evidence for climate change and recent spikes in gun violence.
KW - Poisson process
KW - Probability
KW - Random sequence
KW - Scan statistic
UR - http://www.scopus.com/inward/record.url?scp=85045926425&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045926425&partnerID=8YFLogxK
U2 - 10.1007/s11009-018-9637-0
DO - 10.1007/s11009-018-9637-0
M3 - Article
AN - SCOPUS:85045926425
SN - 1387-5841
VL - 20
SP - 1429
EP - 1451
JO - Methodology and Computing in Applied Probability
JF - Methodology and Computing in Applied Probability
IS - 4
ER -