Provable tensor factorization with missing data

Prateek Jain, Sewoong Oh

Research output: Contribution to journal › Conference article

Abstract

We study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode n × n × n dimensional rank-r tensor exactly from O(n^{3/2} r^5 log^4 n) randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in analyzing the initialization step, we prove a generalization of a celebrated result by Szemerédi et al. on the spectrum of random graphs. We show that this initialization step alone is sufficient to achieve a root mean squared error on the parameters bounded by C r^2 n^{3/2} (log n)^4 / |Ω| from |Ω| observed entries, for some constant C independent of n and r. Next, we prove global convergence of alternating minimization with this good initialization. Simulations suggest that the dependence of the sample size on the dimensionality n is indeed tight.
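The alternating-minimization idea described in the abstract can be sketched as follows. This is a hedged illustration, not the authors' algorithm: it alternates plain least-squares updates of the three factor matrices over the observed entries, starting from random initialization, whereas the paper initializes with a spectral method and proves exact recovery. The function name, argument names, and all parameter choices below are my own.

```python
import numpy as np

def als_tensor_completion(shape, omega, vals, r, iters=100, seed=0):
    """Rank-r completion of a 3-mode tensor by alternating least squares.

    shape: (n1, n2, n3); omega: (m, 3) int array of observed indices;
    vals: (m,) observed values. Returns factors U, V, W such that
    T[i, j, k] ~= sum_l U[i, l] * V[j, l] * W[k, l].
    """
    rng = np.random.default_rng(seed)
    n1, n2, n3 = shape
    # Random initialization (the paper instead uses a spectral initializer).
    V = rng.standard_normal((n2, r))
    W = rng.standard_normal((n3, r))
    I, J, K = omega[:, 0], omega[:, 1], omega[:, 2]

    def solve_factor(rows, n, design):
        # For each slice index i of the current mode, fit that row of the
        # factor by least squares over the entries observed in the slice.
        F = np.zeros((n, r))
        for i in range(n):
            mask = rows == i
            if mask.any():
                F[i], *_ = np.linalg.lstsq(design[mask], vals[mask], rcond=None)
        return F

    for _ in range(iters):
        # Each design matrix holds the elementwise (Khatri-Rao) products of
        # the two factors being held fixed, one row per observed entry.
        U = solve_factor(I, n1, V[J] * W[K])
        V = solve_factor(J, n2, U[I] * W[K])
        W = solve_factor(K, n3, U[I] * V[J])
    return U, V, W
```

On a small noiseless low-rank tensor with most entries observed, this sketch typically recovers the tensor to high accuracy, consistent with the convergence behavior the abstract describes for the properly initialized method.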

Original language: English (US)
Pages (from-to): 1431-1439
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 2
Issue number: January
State: Published - Jan 1 2014
Event: 28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014 - Montreal, Canada
Duration: Dec 8 2014 - Dec 13 2014

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Provable tensor factorization with missing data. / Jain, Prateek; Oh, Sewoong.

In: Advances in Neural Information Processing Systems, Vol. 2, No. January, 01.01.2014, p. 1431-1439.

Research output: Contribution to journal › Conference article

Jain, Prateek ; Oh, Sewoong. / Provable tensor factorization with missing data. In: Advances in Neural Information Processing Systems. 2014 ; Vol. 2, No. January. pp. 1431-1439.
@article{4e26545057204db28b9fa30c4c422292,
title = "Provable tensor factorization with missing data",
abstract = "We study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode n × n × n dimensional rank-r tensor exactly from O(n^{3/2} r^5 log^4 n) randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in analyzing the initialization step, we prove a generalization of a celebrated result by Szemer{\'e}di et al. on the spectrum of random graphs. We show that this initialization step alone is sufficient to achieve a root mean squared error on the parameters bounded by C r^2 n^{3/2} (log n)^4 / |Ω| from |Ω| observed entries, for some constant C independent of n and r. Next, we prove global convergence of alternating minimization with this good initialization. Simulations suggest that the dependence of the sample size on the dimensionality n is indeed tight.",
author = "Prateek Jain and Sewoong Oh",
year = "2014",
month = "1",
day = "1",
language = "English (US)",
volume = "2",
pages = "1431--1439",
journal = "Advances in Neural Information Processing Systems",
issn = "1049-5258",
number = "January",

}

TY - JOUR

T1 - Provable tensor factorization with missing data

AU - Jain, Prateek

AU - Oh, Sewoong

PY - 2014/1/1

Y1 - 2014/1/1

N2 - We study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode n × n × n dimensional rank-r tensor exactly from O(n^{3/2} r^5 log^4 n) randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in analyzing the initialization step, we prove a generalization of a celebrated result by Szemerédi et al. on the spectrum of random graphs. We show that this initialization step alone is sufficient to achieve a root mean squared error on the parameters bounded by C r^2 n^{3/2} (log n)^4 / |Ω| from |Ω| observed entries, for some constant C independent of n and r. Next, we prove global convergence of alternating minimization with this good initialization. Simulations suggest that the dependence of the sample size on the dimensionality n is indeed tight.

AB - We study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode n × n × n dimensional rank-r tensor exactly from O(n^{3/2} r^5 log^4 n) randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in analyzing the initialization step, we prove a generalization of a celebrated result by Szemerédi et al. on the spectrum of random graphs. We show that this initialization step alone is sufficient to achieve a root mean squared error on the parameters bounded by C r^2 n^{3/2} (log n)^4 / |Ω| from |Ω| observed entries, for some constant C independent of n and r. Next, we prove global convergence of alternating minimization with this good initialization. Simulations suggest that the dependence of the sample size on the dimensionality n is indeed tight.

UR - http://www.scopus.com/inward/record.url?scp=84937903661&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84937903661&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84937903661

VL - 2

SP - 1431

EP - 1439

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

SN - 1049-5258

IS - January

ER -