TY - GEN
T1 - Entity-centric document filtering
T2 - 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
AU - Zhou, Mianwei
AU - Chang, Kevin Chen Chuan
PY - 2013
Y1 - 2013
N2 - This paper studies the entity-centric document filtering task - given an entity represented by its identification page (e.g., a Wikipedia page), how to correctly identify its relevant documents. In particular, we are interested in learning an entity-centric document filter based on a small number of training entities, and the filter can predict document relevance for a large set of unseen entities at query time. Towards characterizing the relevance of a document, the problem boils down to learning keyword importance for query entities. Since the same keyword will have very different importance for different entities, we abstract the entity-centric document filtering problem as a transfer learning problem, and the challenge becomes how to appropriately transfer the keyword importance learned from training entities to query entities. Based on the insight that keywords sharing some similar "properties" should have similar importance for their respective entities, we propose a novel concept of meta-feature to map keywords from different entities. To realize the idea of meta-feature-based feature mapping, we develop and contrast two different models, LinearMapping and BoostMapping. Experiments on three different datasets confirm the effectiveness of our proposed models, which show significant improvement compared with four state-of-the-art baseline methods.
AB - This paper studies the entity-centric document filtering task - given an entity represented by its identification page (e.g., a Wikipedia page), how to correctly identify its relevant documents. In particular, we are interested in learning an entity-centric document filter based on a small number of training entities, and the filter can predict document relevance for a large set of unseen entities at query time. Towards characterizing the relevance of a document, the problem boils down to learning keyword importance for query entities. Since the same keyword will have very different importance for different entities, we abstract the entity-centric document filtering problem as a transfer learning problem, and the challenge becomes how to appropriately transfer the keyword importance learned from training entities to query entities. Based on the insight that keywords sharing some similar "properties" should have similar importance for their respective entities, we propose a novel concept of meta-feature to map keywords from different entities. To realize the idea of meta-feature-based feature mapping, we develop and contrast two different models, LinearMapping and BoostMapping. Experiments on three different datasets confirm the effectiveness of our proposed models, which show significant improvement compared with four state-of-the-art baseline methods.
KW - Entity centric
KW - Feature mapping
KW - Meta feature
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=84889590873&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84889590873&partnerID=8YFLogxK
U2 - 10.1145/2505515.2505683
DO - 10.1145/2505515.2505683
M3 - Conference contribution
AN - SCOPUS:84889590873
SN - 9781450322638
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 119
EP - 128
BT - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
Y2 - 27 October 2013 through 1 November 2013
ER -