TY - GEN
T1 - High level synthesis of complex applications
T2 - 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2016
AU - Liu, Xinheng
AU - Chen, Yao
AU - Nguyen, Tan
AU - Gurumani, Swathi
AU - Rupnow, Kyle
AU - Chen, Deming
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/2/21
Y1 - 2016/2/21
AB - High level synthesis (HLS) is gaining wider acceptance for hardware design due to its higher productivity and better design space exploration features. In recent years, HLS techniques and design flows have also advanced significantly, and as a result, many new FPGA designs are developed with HLS. However, despite many studies using HLS, the size and complexity of such applications remain generally small, and it is not well understood how to design and optimize for HLS with large, complex reference code. Typical HLS benchmark applications contain somewhere between 100 and 1400 lines of code and about 20 sub-functions, but typical input applications may contain many times more code and functions. To study such complex applications, we present a case study using HLS for a full H.264 decoder: an application with over 6000 lines of code and over 100 functions. We share our experience on code conversion for synthesizability, various HLS optimizations, HLS limitations while dealing with complex input code, and general design insights. Through our optimization process, we achieve 34 frames/s at 640x480 resolution (480p). To enable future study and benefit the research community, we open-source our synthesizable H.264 implementation.
UR - http://www.scopus.com/inward/record.url?scp=84966552393&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84966552393&partnerID=8YFLogxK
U2 - 10.1145/2847263.2847274
DO - 10.1145/2847263.2847274
M3 - Conference contribution
AN - SCOPUS:84966552393
T3 - FPGA 2016 - Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
SP - 224
EP - 233
BT - FPGA 2016 - Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
PB - Association for Computing Machinery
Y2 - 21 February 2016 through 23 February 2016
ER -