Abstract
We consider the tasks of testing and spot-checking for data streams. These testers and spot-checkers are potentially useful in real-time or near real-time applications that process huge data sets. Crucial aspects of the computational model include the space complexity of the testers and spot-checkers (ideally much lower than the size of the input stream) and the number of passes that the tester or spot-checker must make over the input stream (ideally one, because the original stream may be too large to store for a second pass). A sampling-tester [GGR] for a property P samples some (but usually not all) of its input and, with high probability, outputs PASS if the input has property P and FAIL if the input is far from having P, for an appropriate sense of "far." A streaming-tester for a property P of one or more input streams takes as input one or more data streams and, with high probability, outputs PASS if the streams have property P and FAIL if the streams are far from having P: A sampling-tester can make its samples in any order; a streaming-tester sees the input from left to right. We consider the groupedness properly (a natural relaxation of the sortedness property). We also revisit the sortedness property, first considered in [EKK+] in the context of sampling spot-checkers, and the property of detecting whether one stream is a permutation of another (either directly or via the SORTED-SUPERSET property, a technical property that is equivalent to PERMUTATION under some conditions). We show that there are properties efficiently testable by a streaming-tester but not by a sampling-tester and other (promise) problems for which the reverse is true.
Original language | English (US) |
---|---|
Pages (from-to) | 67-80 |
Number of pages | 14 |
Journal | Algorithmica (New York) |
Volume | 34 |
Issue number | 1 |
DOIs | |
State | Published - 2002 |
Keywords
- Data streams
- Massive data sets
- Property testing
- Spot-checking
ASJC Scopus subject areas
- General Computer Science
- Computer Science Applications
- Applied Mathematics