AIM: AN ABSTRACTION for IMPROVING MACHINE LEARNING PREDICTION

Victoria Stodden, Xiaomian Wu, Vanessa Sochat

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We introduce a structured and portable Abstraction for Improving Machine learning (AIM) to improve prediction outcomes and enable meaningful comparisons of ML pipelines. We implement AIM for a well-known acute leukemia classification problem using the Scientific Filesystem, enabling direct performance comparisons across a variety of classifiers. AIM provides three direct efficiency benefits: 1) the sources of performance differences between ML pipelines can be identified at the algorithm implementation level as defined by the AIM, 2) improvements can be made to specific aspects of the pipeline and thus better understood, and 3) the reuse of these defined abstraction components across different pipelines is facilitated. When the AIM is defined at the outset of the prediction challenge, these benefits can come at minimal cost. We show these benefits by implementing AIM and the Scientific Filesystem on the well-known Golub AML/ALL cancer dataset.
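The idea of fixing the pipeline stages and swapping only the classifier can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the two-stage `Pipeline` split, the helper names, and the toy data are hypothetical and are not the AIM components, the Scientific Filesystem layout, or the Golub dataset used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]          # (features, label)
Predictor = Callable[[Sequence[float]], int]  # fitted classifier


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical staged pipeline; the stage names are assumptions."""
    preprocess: Callable[[Sequence[float]], Sequence[float]]
    fit: Callable[[List[Sample]], Predictor]


def accuracy(pipeline: Pipeline, train: List[Sample], test: List[Sample]) -> float:
    """Fit on preprocessed training data and score on preprocessed test data."""
    prepared = [(pipeline.preprocess(x), y) for x, y in train]
    predict = pipeline.fit(prepared)
    hits = sum(predict(pipeline.preprocess(x)) == y for x, y in test)
    return hits / len(test)


def identity(x: Sequence[float]) -> Sequence[float]:
    """Preprocessing stage that leaves the features unchanged."""
    return list(x)


def fit_majority(train: List[Sample]) -> Predictor:
    """Baseline classifier: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_nearest_centroid(train: List[Sample]) -> Predictor:
    """Predict the label whose per-class feature mean is closest to x."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()
    }

    def predict(x: Sequence[float]) -> int:
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def compare(classifiers, preprocess, train, test) -> Dict[str, float]:
    """Hold preprocessing and data fixed; vary only the classifier stage."""
    return {
        name: accuracy(Pipeline(preprocess, fit), train, test)
        for name, fit in classifiers.items()
    }


# Made-up toy data, for illustration only.
TRAIN = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.1, 0.2], 0),
         ([1.0, 0.9], 1), ([0.9, 1.1], 1)]
TEST = [([0.1, 0.1], 0), ([1.0, 1.0], 1), ([0.8, 0.9], 1), ([0.0, 0.3], 0)]

results = compare(
    {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid},
    identity, TRAIN, TEST,
)
print(results)
```

Because every stage other than the classifier is held fixed, any difference between the two scores in this sketch is attributable to the classifier stage alone.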

Original language: English (US)
Title of host publication: 2018 IEEE Data Science Workshop, DSW 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 150-154
Number of pages: 5
ISBN (Print): 9781538644102
DOI: https://doi.org/10.1109/DSW.2018.8439914
State: Published - Aug 17 2018
Event: 2018 IEEE Data Science Workshop, DSW 2018 - Lausanne, Switzerland
Duration: Jun 4 2018 to Jun 6 2018

Publication series

Name: 2018 IEEE Data Science Workshop, DSW 2018 - Proceedings

Other

Other: 2018 IEEE Data Science Workshop, DSW 2018
Country: Switzerland
City: Lausanne
Period: 6/4/18 to 6/6/18

Keywords

  • Scientific Filesystem
  • containers
  • cyberinfrastructure
  • machine learning
  • programming abstraction
  • reproducible research

ASJC Scopus subject areas

  • Artificial Intelligence
  • Safety, Risk, Reliability and Quality
  • Water Science and Technology
  • Control and Optimization

Cite this

Stodden, V., Wu, X., & Sochat, V. (2018). AIM: AN ABSTRACTION for IMPROVING MACHINE LEARNING PREDICTION. In 2018 IEEE Data Science Workshop, DSW 2018 - Proceedings (pp. 150-154). [8439914] (2018 IEEE Data Science Workshop, DSW 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DSW.2018.8439914

@inproceedings{bfac439772da48b29ae839cbda909fc4,
title = "AIM: AN ABSTRACTION for IMPROVING MACHINE LEARNING PREDICTION",
abstract = "We introduce a structured and portable Abstraction for Improving Machine learning (AIM) to improve prediction outcomes and enable meaningful comparisons of ML pipelines. We implement AIM for a well-known acute leukemia classification problem using the Scientific Filesystem, enabling direct performance comparisons across a variety of classifiers. AIM provides three direct efficiency benefits: 1) the sources of performance differences between ML pipelines can be identified at the algorithm implementation level as defined by the AIM, 2) improvements can be made to specific aspects of the pipeline and thus better understood, and 3) the reuse of these defined abstraction components across different pipelines is facilitated. When the AIM is defined at the outset of the prediction challenge, these benefits can come at minimal cost. We show these benefits by implementing AIM and the Scientific Filesystem on the well-known Golub AML/ALL cancer dataset.",
keywords = "Scientific Filesystem, containers, cyberinfrastructure, machine learning, programming abstraction, reproducible research",
author = "Victoria Stodden and Xiaomian Wu and Vanessa Sochat",
year = "2018",
month = "8",
day = "17",
doi = "10.1109/DSW.2018.8439914",
language = "English (US)",
isbn = "9781538644102",
series = "2018 IEEE Data Science Workshop, DSW 2018 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "150--154",
booktitle = "2018 IEEE Data Science Workshop, DSW 2018 - Proceedings",
address = "United States",

}

TY - GEN

T1 - AIM

T2 - AN ABSTRACTION for IMPROVING MACHINE LEARNING PREDICTION

AU - Stodden, Victoria

AU - Wu, Xiaomian

AU - Sochat, Vanessa

PY - 2018/8/17

Y1 - 2018/8/17

N2 - We introduce a structured and portable Abstraction for Improving Machine learning (AIM) to improve prediction outcomes and enable meaningful comparisons of ML pipelines. We implement AIM for a well-known acute leukemia classification problem using the Scientific Filesystem, enabling direct performance comparisons across a variety of classifiers. AIM provides three direct efficiency benefits: 1) the sources of performance differences between ML pipelines can be identified at the algorithm implementation level as defined by the AIM, 2) improvements can be made to specific aspects of the pipeline and thus better understood, and 3) the reuse of these defined abstraction components across different pipelines is facilitated. When the AIM is defined at the outset of the prediction challenge, these benefits can come at minimal cost. We show these benefits by implementing AIM and the Scientific Filesystem on the well-known Golub AML/ALL cancer dataset.

KW - Scientific Filesystem

KW - containers

KW - cyberinfrastructure

KW - machine learning

KW - programming abstraction

KW - reproducible research

UR - http://www.scopus.com/inward/record.url?scp=85053120198&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053120198&partnerID=8YFLogxK

U2 - 10.1109/DSW.2018.8439914

DO - 10.1109/DSW.2018.8439914

M3 - Conference contribution

SN - 9781538644102

T3 - 2018 IEEE Data Science Workshop, DSW 2018 - Proceedings

SP - 150

EP - 154

BT - 2018 IEEE Data Science Workshop, DSW 2018 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -