TY - JOUR
T1 - Worst-case complexity of cyclic coordinate descent
T2 - O(n2) gap with randomized version
AU - Sun, Ruoyu
AU - Ye, Yinyu
N1 - Publisher Copyright:
© 2019, Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/1
Y1 - 2021/1
N2 - This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function, which is equivalent to Gauss–Seidel method, Kaczmarz method and projection onto convex sets (POCS) in this simple setting. We observe that the known provable complexity of C-CD can be O(n2) times slower than randomized coordinate descent (R-CD), but no example was proven to exhibit such a large gap. In this paper we show that the gap indeed exists. We prove that there exists an example for which C-CD takes at least O(n4κCDlog1ϵ) operations, where κCD is related to Demmel’s condition number and it determines the convergence rate of R-CD. It implies that in the worst case C-CD can indeed be O(n2) times slower than R-CD, which has complexity O(n2κCDlog1ϵ). Note that for this example, the gap exists for any fixed update order, not just a particular order. An immediate consequence is that for Gauss–Seidel method, Kaczmarz method and POCS, there is also an O(n2) gap between the cyclic versions and randomized versions (for solving linear systems). One difficulty with the analysis is that the spectral radius of a non-symmetric iteration matrix does not necessarily constitute a lower bound for the convergence rate. Finally, we design some numerical experiments to show that the size of the off-diagonal entries is an important indicator of the practical performance of C-CD.
AB - This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function, which is equivalent to Gauss–Seidel method, Kaczmarz method and projection onto convex sets (POCS) in this simple setting. We observe that the known provable complexity of C-CD can be O(n2) times slower than randomized coordinate descent (R-CD), but no example was proven to exhibit such a large gap. In this paper we show that the gap indeed exists. We prove that there exists an example for which C-CD takes at least O(n4κCDlog1ϵ) operations, where κCD is related to Demmel’s condition number and it determines the convergence rate of R-CD. It implies that in the worst case C-CD can indeed be O(n2) times slower than R-CD, which has complexity O(n2κCDlog1ϵ). Note that for this example, the gap exists for any fixed update order, not just a particular order. An immediate consequence is that for Gauss–Seidel method, Kaczmarz method and POCS, there is also an O(n2) gap between the cyclic versions and randomized versions (for solving linear systems). One difficulty with the analysis is that the spectral radius of a non-symmetric iteration matrix does not necessarily constitute a lower bound for the convergence rate. Finally, we design some numerical experiments to show that the size of the off-diagonal entries is an important indicator of the practical performance of C-CD.
UR - http://www.scopus.com/inward/record.url?scp=85074473223&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074473223&partnerID=8YFLogxK
U2 - 10.1007/s10107-019-01437-5
DO - 10.1007/s10107-019-01437-5
M3 - Article
AN - SCOPUS:85074473223
SN - 0025-5610
VL - 185
SP - 487
EP - 520
JO - Mathematical Programming
JF - Mathematical Programming
IS - 1-2
ER -