TY - GEN
T1 - Sparse LU factorization for parallel circuit simulation on GPU
AU - Ren, Ling
AU - Chen, Xiaoming
AU - Wang, Yu
AU - Zhang, Chenxi
AU - Yang, Huazhong
PY - 2012/7/11
Y1 - 2012/7/11
N2 - The sparse solver has become the bottleneck of SPICE simulators. There has been little work on GPU-based sparse solvers because of the high data dependency. The strong data dependency means that parallel sparse LU factorization runs efficiently on shared-memory computing devices, but the number of CPU cores sharing the same memory is often limited. State-of-the-art graphics processing units (GPUs) naturally have numerous cores sharing the device memory and provide a possible solution to the problem. In this paper, we propose a GPU-based sparse LU solver for circuit simulation. We optimize the work partitioning, the number of active thread groups, and the memory access pattern based on the GPU architecture. On matrices whose factorization involves many floating-point operations, our GPU-based sparse LU factorization achieves a 7.90x speedup over a 1-core CPU and a 1.49x speedup over an 8-core CPU. We also analyze the scalability of parallel sparse LU factorization and investigate the specifications of CPUs and GPUs that most influence performance.
AB - The sparse solver has become the bottleneck of SPICE simulators. There has been little work on GPU-based sparse solvers because of the high data dependency. The strong data dependency means that parallel sparse LU factorization runs efficiently on shared-memory computing devices, but the number of CPU cores sharing the same memory is often limited. State-of-the-art graphics processing units (GPUs) naturally have numerous cores sharing the device memory and provide a possible solution to the problem. In this paper, we propose a GPU-based sparse LU solver for circuit simulation. We optimize the work partitioning, the number of active thread groups, and the memory access pattern based on the GPU architecture. On matrices whose factorization involves many floating-point operations, our GPU-based sparse LU factorization achieves a 7.90x speedup over a 1-core CPU and a 1.49x speedup over an 8-core CPU. We also analyze the scalability of parallel sparse LU factorization and investigate the specifications of CPUs and GPUs that most influence performance.
KW - circuit simulation
KW - GPU
KW - parallel sparse LU factorization
UR - http://www.scopus.com/inward/record.url?scp=84863544666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863544666&partnerID=8YFLogxK
U2 - 10.1145/2228360.2228565
DO - 10.1145/2228360.2228565
M3 - Conference contribution
AN - SCOPUS:84863544666
SN - 9781450311991
T3 - Proceedings - Design Automation Conference
SP - 1125
EP - 1130
BT - Proceedings of the 49th Annual Design Automation Conference, DAC '12
T2 - 49th Annual Design Automation Conference, DAC '12
Y2 - 3 June 2012 through 7 June 2012
ER -