No-frills human-object interaction detection: Factorization, layout encodings, and training techniques

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (1) eliminating a train-inference mismatch; (2) rejecting easy negatives during mini-batch training; and (3) using a ratio of negatives to positives that is two orders of magnitude larger than existing approaches. We conduct a thorough ablation study to understand the importance of different factors and training techniques using the challenging HICO-Det dataset.
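As a rough illustration of the factorization described above, the sketch below multiplies the human and object detection scores by a sigmoid over the summed logits of the remaining factors. The function name `hoi_score`, the factor keys, the example numbers, and the combination rule itself are assumptions made for illustration; the abstract lists the factors but does not state how they are combined.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hoi_score(human_det: float, object_det: float, factor_logits: dict) -> float:
    """Score one human-object box pair for one interaction class.

    human_det, object_det: detector confidences in [0, 1].
    factor_logits: hypothetical per-factor logits, e.g. human appearance,
    object appearance, box-pair configuration, and (optionally) human pose.
    Assumed rule: detection scores times sigmoid of the summed logits.
    """
    interaction_prob = sigmoid(sum(factor_logits.values()))
    return human_det * object_det * interaction_prob


# Illustrative values only; they are not taken from the paper.
example = hoi_score(
    human_det=0.9,
    object_det=0.8,
    factor_logits={
        "human_appearance": 1.2,
        "object_appearance": 0.4,
        "box_config": -0.3,
        "human_pose": 0.1,  # optional fine-grained layout factor
    },
)
print(f"{example:.3f}")
```

Summing logits before a single sigmoid lets any factor be dropped by omitting its key, which mirrors how the pose factor is described as optional.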

Original language: English (US)
Title of host publication: Proceedings - 2019 International Conference on Computer Vision, ICCV 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 9676-9684
Number of pages: 9
ISBN (Electronic): 9781728148038
State: Published - Oct 2019
Event: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, Korea, Republic of
Duration: Oct 27 2019 - Nov 2 2019

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
Volume: 2019-October
ISSN (Print): 1550-5499

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Country: Korea, Republic of
City: Seoul
Period: 10/27/19 - 11/2/19

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
