TY - GEN
T1 - Optimized disjunctive association rules via sampling
AU - Elble, J.
AU - Heeren, C.
AU - Pitt, L.
PY - 2003
Y1 - 2003
N2 - The problem of finding optimized support association rules for a single numerical attribute, where the optimized region is a union of k disjoint intervals from the range of the attribute, is investigated. The first polynomial time algorithm for the problem of finding such a region maximizing support and meeting a minimum cumulative confidence threshold is given. Because the algorithm is not practical, an ostensibly easier, more constrained version of the problem is considered. Experiments demonstrate that the best extant algorithm for the constrained version has significant performance degradation on both a synthetic model of patterned data and on real world data sets. Running the algorithm on a small random sample is proposed as a means of obtaining near optimal results with high probability. Theoretical bounds on sufficient sample size to achieve a given performance level are proved, and rapid convergence on synthetic and real-world data is validated experimentally.
AB - The problem of finding optimized support association rules for a single numerical attribute, where the optimized region is a union of k disjoint intervals from the range of the attribute, is investigated. The first polynomial time algorithm for the problem of finding such a region maximizing support and meeting a minimum cumulative confidence threshold is given. Because the algorithm is not practical, an ostensibly easier, more constrained version of the problem is considered. Experiments demonstrate that the best extant algorithm for the constrained version has significant performance degradation on both a synthetic model of patterned data and on real world data sets. Running the algorithm on a small random sample is proposed as a means of obtaining near optimal results with high probability. Theoretical bounds on sufficient sample size to achieve a given performance level are proved, and rapid convergence on synthetic and real-world data is validated experimentally.
UR - http://www.scopus.com/inward/record.url?scp=34547770503&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547770503&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:34547770503
SN - 0769519784
SN - 9780769519784
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 43
EP - 50
BT - Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
T2 - 3rd IEEE International Conference on Data Mining, ICDM '03
Y2 - 19 November 2003 through 22 November 2003
ER -