TY - JOUR

T1 - Joint fixed-rate universal lossy coding and identification of continuous-alphabet memoryless sources

AU - Raginsky, Maxim

N1 - Funding Information:
Manuscript received December 3, 2005; revised May 17, 2007. This work was supported by the Beckman Institute Fellowship. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Seattle, WA, July 2006.

PY - 2008/7

Y1 - 2008/7

N2 - The problem of joint universal source coding and density estimation is considered in the setting of fixed-rate lossy coding of continuous-alphabet memoryless sources. For a wide class of bounded distortion measures, it is shown that any compactly parametrized family of $\mathbb{R}^d$-valued independent and identically distributed (i.i.d.) sources with absolutely continuous distributions satisfying appropriate smoothness and Vapnik-Chervonenkis (VC) learnability conditions, admits a joint scheme for universal lossy block coding and parameter estimation, such that when the block length $n$ tends to infinity, the overhead per-letter rate and the distortion redundancies converge to zero as $O(n^{-1}\log n)$ and $O(\sqrt{n^{-1}\log n})$, respectively. Moreover, the active source can be determined at the decoder up to a ball of radius $O(\sqrt{n^{-1}\log n})$ in variational distance, asymptotically almost surely. The system has finite memory length equal to the block length, and can be thought of as blockwise application of a time-invariant nonlinear filter with initial conditions determined from the previous block. Comparisons are presented with several existing schemes for universal vector quantization, which do not include parameter estimation explicitly, and an extension to unbounded distortion measures is outlined. Finally, finite mixture classes and exponential families are given as explicit examples of parametric sources admitting joint universal compression and modeling schemes of the kind studied here.

AB - The problem of joint universal source coding and density estimation is considered in the setting of fixed-rate lossy coding of continuous-alphabet memoryless sources. For a wide class of bounded distortion measures, it is shown that any compactly parametrized family of $\mathbb{R}^d$-valued independent and identically distributed (i.i.d.) sources with absolutely continuous distributions satisfying appropriate smoothness and Vapnik-Chervonenkis (VC) learnability conditions, admits a joint scheme for universal lossy block coding and parameter estimation, such that when the block length $n$ tends to infinity, the overhead per-letter rate and the distortion redundancies converge to zero as $O(n^{-1}\log n)$ and $O(\sqrt{n^{-1}\log n})$, respectively. Moreover, the active source can be determined at the decoder up to a ball of radius $O(\sqrt{n^{-1}\log n})$ in variational distance, asymptotically almost surely. The system has finite memory length equal to the block length, and can be thought of as blockwise application of a time-invariant nonlinear filter with initial conditions determined from the previous block. Comparisons are presented with several existing schemes for universal vector quantization, which do not include parameter estimation explicitly, and an extension to unbounded distortion measures is outlined. Finally, finite mixture classes and exponential families are given as explicit examples of parametric sources admitting joint universal compression and modeling schemes of the kind studied here.

KW - Learning

KW - Minimum-distance density estimation

KW - Two-stage codes

KW - Universal vector quantization

KW - Vapnik-Chervonenkis (VC) dimension

UR - http://www.scopus.com/inward/record.url?scp=46849122718&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=46849122718&partnerID=8YFLogxK

U2 - 10.1109/TIT.2008.924669

DO - 10.1109/TIT.2008.924669

M3 - Article

AN - SCOPUS:46849122718

SN - 0018-9448

VL - 54

SP - 3059

EP - 3077

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

IS - 7

ER -