TY - GEN
T1 - A novel SoC architecture on FPGA for ultra fast face detection
AU - He, Chun
AU - Papakonstantinou, Alexandros
AU - Chen, Deming
PY - 2009
Y1 - 2009
N2 - Face detection is the cornerstone of a wide range of applications such as video surveillance, robotic vision and biometric authentication. One of the biggest challenges in face detection based applications is the speed at which faces can be accurately detected. In this paper, we present a novel SoC (System on Chip) architecture for ultra fast face detection in video or other image rich content. Our implementation is based on an efficient and robust algorithm that uses a cascade of Artificial Neural Network (ANN) classifiers on AdaBoost trained Haar features. The face detector architecture extracts the coarse-grained parallelism by efficiently overlapping different computation phases while taking advantage of the fine-grained parallelism at the module level. We provide details on the parallelism extraction achieved by our architecture and show experimental results that portray the efficiency of our face detection implementation. For the implementation and evaluation of our architecture we used the Xilinx FX130T Virtex-5 FPGA device on the ML510 development board. Our performance evaluations indicate that a speedup of around 100X can be achieved over an SSE-optimized software implementation running on a 2.4 GHz Core 2 Quad CPU. The detection speed reaches 625 frames per second (fps).
AB - Face detection is the cornerstone of a wide range of applications such as video surveillance, robotic vision and biometric authentication. One of the biggest challenges in face detection based applications is the speed at which faces can be accurately detected. In this paper, we present a novel SoC (System on Chip) architecture for ultra fast face detection in video or other image rich content. Our implementation is based on an efficient and robust algorithm that uses a cascade of Artificial Neural Network (ANN) classifiers on AdaBoost trained Haar features. The face detector architecture extracts the coarse-grained parallelism by efficiently overlapping different computation phases while taking advantage of the fine-grained parallelism at the module level. We provide details on the parallelism extraction achieved by our architecture and show experimental results that portray the efficiency of our face detection implementation. For the implementation and evaluation of our architecture we used the Xilinx FX130T Virtex-5 FPGA device on the ML510 development board. Our performance evaluations indicate that a speedup of around 100X can be achieved over an SSE-optimized software implementation running on a 2.4 GHz Core 2 Quad CPU. The detection speed reaches 625 frames per second (fps).
UR - http://www.scopus.com/inward/record.url?scp=77950963554&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77950963554&partnerID=8YFLogxK
U2 - 10.1109/ICCD.2009.5413122
DO - 10.1109/ICCD.2009.5413122
M3 - Conference contribution
AN - SCOPUS:77950963554
SN - 9781424450282
T3 - Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors
SP - 412
EP - 418
BT - 2009 IEEE International Conference on Computer Design, ICCD 2009
T2 - 2009 IEEE International Conference on Computer Design, ICCD 2009
Y2 - 4 October 2009 through 7 October 2009
ER -