UnitBox: An advanced object detection network

Jiahui Yu, Yuning Jiang, Zhangyang Wang, Zhimin Cao, Thomas Huang

Research output: Chapter in Book/Report/Conference proceedingConference contribution


In present object detection systems, the deep convolutional neural networks (CNNs) are utilized to predict bounding boxes of object candidates, and have gained performance advantages over the traditional region proposal methods. However, existing deep CNN methods assume the object bounds to be four independent variables, which could be regressed by the ℓ2 loss separately. Such an oversimplified assumption is contrary to the well-received observation, that those variables are correlated, resulting to less accurate localization. To address the issue, we firstly introduce a novel Intersection over Union (IoU) loss function for bounding box prediction, which regresses the four bounds of a predicted box as a whole unit. By taking the advantages of IoU loss and deep fully convolutional networks, the UnitBox is introduced, which performs accurate and efficient localization, shows robust to objects of varied shapes and scales, and converges fast. We apply UnitBox on face detection task and achieve the best performance among all published methods on the FDDB benchmark.

Original languageEnglish (US)
Title of host publicationMM 2016 - Proceedings of the 2016 ACM Multimedia Conference
PublisherAssociation for Computing Machinery, Inc
Number of pages5
ISBN (Electronic)9781450336031
StatePublished - Oct 1 2016
Event24th ACM Multimedia Conference, MM 2016 - Amsterdam, United Kingdom
Duration: Oct 15 2016Oct 19 2016

Publication series

NameMM 2016 - Proceedings of the 2016 ACM Multimedia Conference


Other24th ACM Multimedia Conference, MM 2016
Country/TerritoryUnited Kingdom


  • Bounding Box Prediction
  • IoU Loss
  • Object Detection

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Software


Dive into the research topics of 'UnitBox: An advanced object detection network'. Together they form a unique fingerprint.

Cite this