Selective block minimization for faster convergence of limited memory large-scale linear models

Kai Wei Chang, Dan Roth

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

As the size of data sets used to build classifiers steadily increases, training a linear model efficiently with limited memory becomes essential. Several techniques deal with this problem by loading blocks of data from disk one at a time, but usually take a considerable number of iterations to converge to a reasonable model. Even the best block minimization techniques [1] require many block loads since they treat all training examples uniformly. As disk I/O is expensive, reducing the amount of disk access can dramatically decrease the training time. This paper introduces a selective block minimization (SBM) algorithm, a block minimization method that makes use of selective sampling. At each step, SBM updates the model using data consisting of two parts: (1) new data loaded from disk and (2) a set of informative samples already in memory from previous steps. We prove that, by updating the linear model in the dual form, the proposed method fully utilizes the data in memory and converges to a globally optimal solution on the entire data. Experiments show that the SBM algorithm dramatically reduces the number of blocks loaded from disk and consequently obtains an accurate and stable model quickly on both binary and multi-class classification.

Original languageEnglish (US)
Title of host publicationProceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'11
PublisherAssociation for Computing Machinery
Pages699-707
Number of pages9
ISBN (Print)9781450308137
DOIs
StatePublished - 2011
Externally publishedYes
Event17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011 - San Diego, United States
Duration: Aug 21 2011Aug 24 2011

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011
Country/TerritoryUnited States
CitySan Diego
Period8/21/118/24/11

Keywords

  • Algorithms
  • Experimentation
  • Performance

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Selective block minimization for faster convergence of limited memory large-scale linear models'. Together they form a unique fingerprint.

Cite this