TY - GEN
T1 - Semantic indexing for a complete subject discipline
AU - Chung, Yi Ming
AU - He, Qin
AU - Powell, Kevin
AU - Schatz, Bruce
PY - 1999
Y1 - 1999
N2 - As part of the Illinois Digital Library Initiative (DLI) project, we developed 'scalable semantics' technologies. These statistical techniques enabled us to index large collections for deeper search than word matching. Under the auspices of the DARPA Information Management program, we are developing an integrated analysis environment, the Interspace Prototype, that uses 'semantic indexing' as the foundation for supporting concept navigation. These semantic indexes record the contextual correlation of noun phrases, and are computed generically, independent of subject domain. Using this technology, we were able to compute semantic indexes for a subject discipline. In particular, in the summer of 1998, we computed concept spaces for 9.3 million MEDLINE bibliographic records from the National Library of Medicine (NLM), which extensively covered the biomedical literature for the period from 1966 to 1997. In this experiment, we first partitioned the collection into smaller collections (repositories) by subject, extracted noun phrases from titles and abstracts, then performed semantic indexing on these sub-collections by creating a concept space for each repository. The computation required 2 days on a 128-node SGI/CRAY Origin 2000 at the National Center for Supercomputing Applications (NCSA). This experiment demonstrated the feasibility of scalable semantics techniques for large collections. With the rapid increase in computing power, we believe this indexing technology will shortly be feasible on personal computers.
AB - As part of the Illinois Digital Library Initiative (DLI) project, we developed 'scalable semantics' technologies. These statistical techniques enabled us to index large collections for deeper search than word matching. Under the auspices of the DARPA Information Management program, we are developing an integrated analysis environment, the Interspace Prototype, that uses 'semantic indexing' as the foundation for supporting concept navigation. These semantic indexes record the contextual correlation of noun phrases, and are computed generically, independent of subject domain. Using this technology, we were able to compute semantic indexes for a subject discipline. In particular, in the summer of 1998, we computed concept spaces for 9.3 million MEDLINE bibliographic records from the National Library of Medicine (NLM), which extensively covered the biomedical literature for the period from 1966 to 1997. In this experiment, we first partitioned the collection into smaller collections (repositories) by subject, extracted noun phrases from titles and abstracts, then performed semantic indexing on these sub-collections by creating a concept space for each repository. The computation required 2 days on a 128-node SGI/CRAY Origin 2000 at the National Center for Supercomputing Applications (NCSA). This experiment demonstrated the feasibility of scalable semantics techniques for large collections. With the rapid increase in computing power, we believe this indexing technology will shortly be feasible on personal computers.
UR - http://www.scopus.com/inward/record.url?scp=0033279584&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0033279584&partnerID=8YFLogxK
U2 - 10.1145/313238.313253
DO - 10.1145/313238.313253
M3 - Conference contribution
AN - SCOPUS:0033279584
SN - 1581131453
SN - 9781581131451
T3 - Proceedings of the ACM International Conference on Digital Libraries
SP - 39
EP - 48
BT - Proceedings of the ACM International Conference on Digital Libraries
PB - ACM
T2 - Proceedings of the 1999 4th ACM International Conference on Digital Libraries (DL'99)
Y2 - 11 August 1999 through 14 August 1999
ER -