Testing and spot-checking of data streams

J. Feigenbaum, S. Kannan, M. Strauss, M. Viswanathan

Research output: Contribution to journalArticlepeer-review

Abstract

We consider the tasks of testing and spot-checking for data streams. These testers and spot-checkers are potentially useful in real-time or near real-time applications that process huge data sets. Crucial aspects of the computational model include the space complexity of the testers and spot-checkers (ideally much lower than the size of the input stream) and the number of passes that the tester or spot-checker must make over the input stream (ideally one, because the original stream may be too large to store for a second pass). A sampling-tester [GGR] for a property P samples some (but usually not all) of its input and, with high probability, outputs PASS if the input has property P and FAIL if the input is far from having P, for an appropriate sense of "far." A streaming-tester for a property P of one or more input streams takes as input one or more data streams and, with high probability, outputs PASS if the streams have property P and FAIL if the streams are far from having P: A sampling-tester can make its samples in any order; a streaming-tester sees the input from left to right. We consider the groupedness properly (a natural relaxation of the sortedness property). We also revisit the sortedness property, first considered in [EKK+] in the context of sampling spot-checkers, and the property of detecting whether one stream is a permutation of another (either directly or via the SORTED-SUPERSET property, a technical property that is equivalent to PERMUTATION under some conditions). We show that there are properties efficiently testable by a streaming-tester but not by a sampling-tester and other (promise) problems for which the reverse is true.

Original languageEnglish (US)
Pages (from-to)67-80
Number of pages14
JournalAlgorithmica (New York)
Volume34
Issue number1
DOIs
StatePublished - Jan 1 2002

Keywords

  • Data streams
  • Massive data sets
  • Property testing
  • Spot-checking

ASJC Scopus subject areas

  • Computer Science(all)
  • Computer Science Applications
  • Applied Mathematics

Fingerprint Dive into the research topics of 'Testing and spot-checking of data streams'. Together they form a unique fingerprint.

Cite this