Consistent multilabel classification

Oluwasanmi Koyejo, Nagarajan Natarajan, Pradeep Ravikumar, Inderjit S. Dhillon

Research output: Contribution to journal (Conference article)

Abstract

Multilabel classification is rapidly developing as an important aspect of modern predictive modeling, motivating study of its theoretical aspects. To this end, we propose a framework for constructing and analyzing multilabel classification metrics which reveals novel results on a parametric form for population optimal classifiers, and additional insight into the role of label correlations. In particular, we show that for multilabel metrics constructed as instance-, micro- and macroaverages, the population optimal classifier can be decomposed into binary classifiers based on the marginal instance-conditional distribution of each label, with a weak association between labels via the threshold. Thus, our analysis extends the state of the art from a few known multilabel classification metrics such as Hamming loss, to a general framework applicable to many of the classification metrics in common use. Based on the population-optimal classifier, we propose a computationally efficient and general-purpose plug-in classification algorithm, and prove its consistency with respect to the metric of interest. Empirical results on synthetic and benchmark datasets are supportive of our theoretical findings.
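The plug-in idea in the abstract (binary decisions on each label's marginal conditional probability, with the labels coupled only through the threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of micro-averaged F1 as the target metric, and the grid of candidate thresholds are all assumptions made for illustration. The sketch takes already-estimated marginal probabilities and searches for one threshold shared by all labels.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
LabelMatrix = Sequence[Sequence[int]]


def micro_f1(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Micro-averaged F1: pool the counts over all instances and labels."""
    tp = fp = fn = 0
    for row_true, row_pred in zip(y_true, y_pred):
        for t, p in zip(row_true, row_pred):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def apply_threshold(probs: Matrix, threshold: float) -> List[List[int]]:
    """Predict label j for instance i when its estimated marginal exceeds the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in probs]


def fit_shared_threshold(probs: Matrix, y_true: LabelMatrix) -> Tuple[float, float]:
    """Pick the single threshold, shared by all labels, that maximizes micro-F1.

    Assumption: candidates are 0.0 plus every observed probability value,
    scanned in increasing order; ties keep the smallest threshold.
    """
    candidates = [0.0] + sorted({p for row in probs for p in row})
    best_threshold, best_score = 0.0, -1.0
    for threshold in candidates:
        score = micro_f1(y_true, apply_threshold(probs, threshold))
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

In practice the probabilities would come from any per-label probabilistic model fitted on one split of the data, with the threshold tuned on a held-out split. A macro-averaged metric would instead call for a separate threshold search per label column.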

Original language: English (US)
Pages (from-to): 3321-3329
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 2015-January
State: Published - Jan 1 2015
Externally published: Yes
Event: 29th Annual Conference on Neural Information Processing Systems, NIPS 2015 - Montreal, Canada
Duration: Dec 7 2015 to Dec 12 2015

Fingerprint

Classifiers
Labels

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Koyejo, O., Natarajan, N., Ravikumar, P., & Dhillon, I. S. (2015). Consistent multilabel classification. Advances in Neural Information Processing Systems, 2015-January, 3321-3329.

@article{46304c4730c644c09ee9377c5de18a7f,
title = "Consistent multilabel classification",
author = "Oluwasanmi Koyejo and Nagarajan Natarajan and Pradeep Ravikumar and Dhillon, {Inderjit S.}",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
volume = "2015-January",
pages = "3321--3329",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",

}

TY - JOUR

T1 - Consistent multilabel classification

AU - Koyejo, Oluwasanmi

AU - Natarajan, Nagarajan

AU - Ravikumar, Pradeep

AU - Dhillon, Inderjit S.

PY - 2015/1/1

Y1 - 2015/1/1

UR - http://www.scopus.com/inward/record.url?scp=84965160543&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84965160543&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84965160543

VL - 2015-January

SP - 3321

EP - 3329

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

ER -