Abstract
Twitter, a microblogging platform, has become an increasingly valuable information source, where millions of users post a large volume of tweets on a wide range of topics every day. Heterogeneous information networks, which consist of multi-type objects and relations, are becoming increasingly prevalent as a way of organizing knowledge and information. Linking an entity mention in a tweet with its corresponding entity in a heterogeneous information network is an important task, because it enriches the network with the abundant and fresh knowledge embedded in tweets. However, entity mentions are often ambiguous. In addition, tweets are short and informal, which makes it difficult to mine enough information from a single tweet for entity linking. In this paper, we propose TELHIN, an unsupervised iterative clustering framework that jointly links multiple similar tweets with a heterogeneous information network. Our framework considers three dimensions of tweet similarity: (1) content similarity, (2) temporal similarity, and (3) user similarity. For each entity mention, appropriate weights for the similarity dimensions are learned iteratively with a metric learning algorithm that leverages automatically generated pairwise constraints. Experiments on real data demonstrate the effectiveness of our framework in comparison with the baselines.
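To make the idea of combining the three similarity dimensions concrete, the following is a minimal sketch and not the TELHIN implementation. The abstract does not define the individual similarity functions, so everything here is an assumption chosen for illustration: the `Tweet` fields, Jaccard token overlap for content, exponential decay with a hypothetical `half_life` for time, same-author matching for users, and fixed weights in place of the weights the paper learns through metric learning.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    """Hypothetical tweet record; field names are assumptions."""
    text: str
    timestamp: float  # seconds since some reference point
    user: str


def content_similarity(a: Tweet, b: Tweet) -> float:
    # Assumption: Jaccard overlap of lowercase whitespace tokens.
    tokens_a = set(a.text.lower().split())
    tokens_b = set(b.text.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def temporal_similarity(a: Tweet, b: Tweet, half_life: float = 3600.0) -> float:
    # Assumption: similarity halves for every `half_life` seconds apart.
    return 0.5 ** (abs(a.timestamp - b.timestamp) / half_life)


def user_similarity(a: Tweet, b: Tweet) -> float:
    # Assumption: tweets are similar on this dimension only if the author matches.
    return 1.0 if a.user == b.user else 0.0


def combined_similarity(a: Tweet, b: Tweet, weights=(1.0, 1.0, 1.0)) -> float:
    # Weighted average of the three dimensions. The weights are fixed here;
    # the paper instead learns them per entity mention.
    w_content, w_time, w_user = weights
    total = w_content + w_time + w_user
    if total <= 0 or math.isnan(total):
        raise ValueError("weights must sum to a positive number")
    score = (
        w_content * content_similarity(a, b)
        + w_time * temporal_similarity(a, b)
        + w_user * user_similarity(a, b)
    )
    return score / total


if __name__ == "__main__":
    t1 = Tweet("jordan scores 40 points", 0.0, "alice")
    t2 = Tweet("jordan scores again", 3600.0, "alice")
    print(combined_similarity(t1, t2))
```

In a clustering setting, a score of this kind could serve as the pairwise affinity between tweets that contain the same mention, with the weights re-estimated between iterations.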
Original language | English (US) |
---|---|
Pages (from-to) | 6003-6017 |
Number of pages | 15 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 34 |
Issue number | 12 |
State | Published - Dec 1 2022 |
Keywords
- Tweet entity linking
- heterogeneous information networks
- iterative clustering
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics