TY - GEN
T1 - Performance and implications of semantic indexing in a distributed environment
AU - Chang, Conrad T.K.
AU - Schatz, Bruce R.
PY - 1999
Y1 - 1999
AB - A research prototype for semantic indexing and retrieval is presented. The prototype is motivated by a desire to provide a more efficient and effective information retrieval system than the current state of the art. An overview of the layers of the Interspace architecture is given. An object model supporting semantic operations is developed; it defines a rich set of data classes and relationships for the semantic indexing module. Our semantic indexing is based on the creation of a concept space. A concept space is an index of a collection that uses document statistics to capture the relationships between concepts. It is useful for boosting text search by suggesting alternative terms that are semantically related to the query terms. Over the years, we have developed generic technology for computing concept spaces on large collections across many subjects. Recent computations on discipline-scale collections have been performed on high-end supercomputers. This paper describes our implementation of the computation in a distributed computing environment and its implications. Experimental results for different collection sizes and numbers of processes are presented to show the feasibility of this approach. We also show that laboratory and community collections are already easily computable on a group of PCs in a lab using a message-passing model. We conclude that PC clusters will soon be able to compute semantic indexes for any real collection.
UR - http://www.scopus.com/inward/record.url?scp=0033279308&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0033279308&partnerID=8YFLogxK
U2 - 10.1145/319950.320032
DO - 10.1145/319950.320032
M3 - Conference contribution
AN - SCOPUS:0033279308
SN - 1581131461
SN - 9781581131468
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 391
EP - 398
BT - International Conference on Information and Knowledge Management, Proceedings
PB - ACM
T2 - Proceedings of the 1999 8th International Conference on Information and Knowledge Management (CIKM'99)
Y2 - 2 November 1999 through 6 November 1999
ER -