TY - GEN
T1 - Incorporating social context and domain knowledge for entity recognition
AU - Tang, Jie
AU - Fang, Zhanpeng
AU - Sun, Jimeng
PY - 2015/5/18
Y1 - 2015/5/18
N2 - Recognizing entity instances in documents according to a knowledge base is a fundamental problem in many data mining applications. The problem is extremely challenging for short documents in complex domains such as social media and biomedical domains. Large concept spaces and instance ambiguity are key issues that need to be addressed. Most of the documents are created in a social context by common authors via social interactions, such as reply and citations. Such social contexts are largely ignored in the instance-recognition literature. How can users' interactions help entity instance recognition? How can the social context be modeled so as to resolve the ambiguity of different instances? In this paper, we propose the SOCINST model to formalize the problem into a probabilistic model. Given a set of short documents (e.g., tweets or paper abstracts) posted by users who may connect with each other, SOCINST can automatically construct a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance. The model is also able to incorporate social relationships between users to help build social context. We further incorporate domain knowledge into the model using a Dirichlet tree distribution. We evaluate the proposed model on three different genres of datasets: ICDM'12 Contest, Weibo, and I2B2. In ICDM'12 Contest, the proposed model clearly outperforms (+21.4%; p ≥ 1e-5 with t-Test) all the top contestants. In Weibo and I2B2, our results also show that the recognition accuracy of SOCINST is up to 5.3-26.6% better than those of several alternative methods.
AB - Recognizing entity instances in documents according to a knowledge base is a fundamental problem in many data mining applications. The problem is extremely challenging for short documents in complex domains such as social media and biomedical domains. Large concept spaces and instance ambiguity are key issues that need to be addressed. Most of the documents are created in a social context by common authors via social interactions, such as reply and citations. Such social contexts are largely ignored in the instance-recognition literature. How can users' interactions help entity instance recognition? How can the social context be modeled so as to resolve the ambiguity of different instances? In this paper, we propose the SOCINST model to formalize the problem into a probabilistic model. Given a set of short documents (e.g., tweets or paper abstracts) posted by users who may connect with each other, SOCINST can automatically construct a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance. The model is also able to incorporate social relationships between users to help build social context. We further incorporate domain knowledge into the model using a Dirichlet tree distribution. We evaluate the proposed model on three different genres of datasets: ICDM'12 Contest, Weibo, and I2B2. In ICDM'12 Contest, the proposed model clearly outperforms (+21.4%; p ≥ 1e-5 with t-Test) all the top contestants. In Weibo and I2B2, our results also show that the recognition accuracy of SOCINST is up to 5.3-26.6% better than those of several alternative methods.
KW - Instance recognition
KW - Probabilistic model
KW - Social network
UR - https://www.scopus.com/pages/publications/84968914455
UR - https://www.scopus.com/pages/publications/84968914455#tab=citedBy
U2 - 10.1145/2736277.2741135
DO - 10.1145/2736277.2741135
M3 - Conference contribution
AN - SCOPUS:84968914455
T3 - WWW 2015 - Proceedings of the 24th International Conference on World Wide Web
SP - 517
EP - 526
BT - WWW 2015 - Proceedings of the 24th International Conference on World Wide Web
PB - Association for Computing Machinery
T2 - 24th International Conference on World Wide Web, WWW 2015
Y2 - 18 May 2015 through 22 May 2015
ER -