TY - GEN
T1 - Better rules, fewer features
T2 - 1st IEEE International Conference on Data Mining, ICDM'01
AU - Blake, Catherine
AU - Pratt, Wanda
PY - 2001
Y1 - 2001
N2 - The choice of features used to represent a domain has a profound effect on the quality of the model produced; yet, few researchers have investigated the relationship between the features used to represent text and the quality of the final model. We explored this relationship for medical texts by comparing association rules based on features with three different semantic levels: (1) words, (2) manually assigned keywords, and (3) automatically selected medical concepts. Our preliminary findings indicate that bi-directional association rules based on concepts or keywords are more plausible and more useful than those based on word features. The concept and keyword representations also required 90% fewer features than the word representation. This drastic dimensionality reduction suggests that this approach is well suited to large textual corpora of medical text, such as parts of the Web.
AB - The choice of features used to represent a domain has a profound effect on the quality of the model produced; yet, few researchers have investigated the relationship between the features used to represent text and the quality of the final model. We explored this relationship for medical texts by comparing association rules based on features with three different semantic levels: (1) words, (2) manually assigned keywords, and (3) automatically selected medical concepts. Our preliminary findings indicate that bi-directional association rules based on concepts or keywords are more plausible and more useful than those based on word features. The concept and keyword representations also required 90% fewer features than the word representation. This drastic dimensionality reduction suggests that this approach is well suited to large textual corpora of medical text, such as parts of the Web.
UR - http://www.scopus.com/inward/record.url?scp=56549087736&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56549087736&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:56549087736
SN - 0769511198
SN - 9780769511191
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 59
EP - 66
BT - Proceedings - 2001 IEEE International Conference on Data Mining, ICDM'01
Y2 - 29 November 2001 through 2 December 2001
ER -