### Abstract

Given training sequences generated by two distinct, but unknown distributions sharing a common alphabet, we seek a classifier that can correctly decide whether a third test sequence is generated by the first or second distribution using only the training data. To model 'limited learning' we allow the alphabet size to grow and therefore probability distributions to change with the blocklength. We prove that a natural choice, namely a generalized likelihood ratio test, is universally consistent (has a probability of error tending to zero with the blocklength for all underlying distributions) when the alphabet size is sub-linear in the blocklength, but inconsistent for linear alphabet growth. For up-to quadratic alphabet growth, in a regime where all probabilities are of the same order, we prove the universally consistency of a new test and show there are no such tests when the alphabet grows quadratically or faster.

Original language | English (US) |
---|---|

Title of host publication | 2010 IEEE International Symposium on Information Theory, ISIT 2010 - Proceedings |

Pages | 1478-1482 |

Number of pages | 5 |

DOIs | |

State | Published - Aug 23 2010 |

Event | 2010 IEEE International Symposium on Information Theory, ISIT 2010 - Austin, TX, United States Duration: Jun 13 2010 → Jun 18 2010 |

### Publication series

Name | IEEE International Symposium on Information Theory - Proceedings |
---|---|

ISSN (Print) | 2157-8103 |

### Other

Other | 2010 IEEE International Symposium on Information Theory, ISIT 2010 |
---|---|

Country | United States |

City | Austin, TX |

Period | 6/13/10 → 6/18/10 |

### Fingerprint

### ASJC Scopus subject areas

- Theoretical Computer Science
- Information Systems
- Modeling and Simulation
- Applied Mathematics

### Cite this

*2010 IEEE International Symposium on Information Theory, ISIT 2010 - Proceedings*(pp. 1478-1482). [5513583] (IEEE International Symposium on Information Theory - Proceedings). https://doi.org/10.1109/ISIT.2010.5513583

**Universal hypothesis testing in the learning-limited regime.** / Kelly, Benjamin G.; Tularak, Thitidej; Wagner, Aaron B.; Viswanath, Pramod.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*2010 IEEE International Symposium on Information Theory, ISIT 2010 - Proceedings.*, 5513583, IEEE International Symposium on Information Theory - Proceedings, pp. 1478-1482, 2010 IEEE International Symposium on Information Theory, ISIT 2010, Austin, TX, United States, 6/13/10. https://doi.org/10.1109/ISIT.2010.5513583

}

TY - GEN

T1 - Universal hypothesis testing in the learning-limited regime

AU - Kelly, Benjamin G.

AU - Tularak, Thitidej

AU - Wagner, Aaron B.

AU - Viswanath, Pramod

PY - 2010/8/23

Y1 - 2010/8/23

N2 - Given training sequences generated by two distinct, but unknown distributions sharing a common alphabet, we seek a classifier that can correctly decide whether a third test sequence is generated by the first or second distribution using only the training data. To model 'limited learning' we allow the alphabet size to grow and therefore probability distributions to change with the blocklength. We prove that a natural choice, namely a generalized likelihood ratio test, is universally consistent (has a probability of error tending to zero with the blocklength for all underlying distributions) when the alphabet size is sub-linear in the blocklength, but inconsistent for linear alphabet growth. For up-to quadratic alphabet growth, in a regime where all probabilities are of the same order, we prove the universally consistency of a new test and show there are no such tests when the alphabet grows quadratically or faster.

AB - Given training sequences generated by two distinct, but unknown distributions sharing a common alphabet, we seek a classifier that can correctly decide whether a third test sequence is generated by the first or second distribution using only the training data. To model 'limited learning' we allow the alphabet size to grow and therefore probability distributions to change with the blocklength. We prove that a natural choice, namely a generalized likelihood ratio test, is universally consistent (has a probability of error tending to zero with the blocklength for all underlying distributions) when the alphabet size is sub-linear in the blocklength, but inconsistent for linear alphabet growth. For up-to quadratic alphabet growth, in a regime where all probabilities are of the same order, we prove the universally consistency of a new test and show there are no such tests when the alphabet grows quadratically or faster.

UR - http://www.scopus.com/inward/record.url?scp=77955700822&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955700822&partnerID=8YFLogxK

U2 - 10.1109/ISIT.2010.5513583

DO - 10.1109/ISIT.2010.5513583

M3 - Conference contribution

AN - SCOPUS:77955700822

SN - 9781424469604

T3 - IEEE International Symposium on Information Theory - Proceedings

SP - 1478

EP - 1482

BT - 2010 IEEE International Symposium on Information Theory, ISIT 2010 - Proceedings

ER -