Testing and spot-checking of data streams

J. Feigenbaum, S. Kannan, M. Strauss, Mahesh Viswanathan

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We consider the tasks of testing and spot-checking for data streams. These testers and spot-checkers are potentially useful in real-time or near real-time applications that process huge datasets. Crucial aspects of the computational model include the space complexity of the testers and spot-checkers (ideally much lower than the size of the input stream) and the number of passes that the tester or spot-checker must make over the input stream (ideally one, because the original stream may be too large to store for a second pass). A sampling tester [GGR98] for a property P samples some (but usually not all) of its input and, with high probability; outputs PASS if the input has property P and FAIL if the input is far from having P, for an appropriate sense of `far.' A streaming-tester for a property P of one or more input streams takes as input one or more data streams and, with high probability, outputs PASS if the streams have property P and FAIL if the streams are far from having P. A sampling-tester can make its samples in any order; a streaming-tester sees the input from left to right. We consider the groupedness property (a natural relaxation of the sortedness property). We also revisit the sortedness property, first considered in [EKK+98] in the context of sampling spot-checkers, and the property of detecting whether one stream is a permutation of another (either directly or via the SORTED-SUPERSET property, a technical property that is equivalent to PERMUTATION under some conditions). We show that there are properties efficiently testable by a streaming-tester but not by a sampling-tester and other (promise) problems for which the reverse is true.

Original languageEnglish (US)
Title of host publicationProceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
PublisherSIAM
Pages165-174
Number of pages10
StatePublished - 2000
Externally publishedYes
Event11th Annual ACM-SIAM Symposium on Discrete Algorithms - San Francisco, CA, USA
Duration: Jan 9 2000Jan 11 2000

Other

Other11th Annual ACM-SIAM Symposium on Discrete Algorithms
CitySan Francisco, CA, USA
Period1/9/001/11/00

ASJC Scopus subject areas

  • Chemical Health and Safety
  • Software
  • Safety, Risk, Reliability and Quality
  • Discrete Mathematics and Combinatorics

Fingerprint Dive into the research topics of 'Testing and spot-checking of data streams'. Together they form a unique fingerprint.

Cite this