TY - GEN
T1 - CatchTartan
T2 - 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
AU - Jiang, Meng
AU - Faloutsos, Christos
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/8/13
Y1 - 2016/8/13
N2 - Representing and summarizing human behaviors with rich contexts facilitates behavioral sciences and user-oriented services. Traditional behavioral modeling represents a behavior as a tuple in which each element is one contextual factor of one type, and the tensorbased summaries look for high-order dense blocks by clustering the values (including timestamps) in each dimension. However, the human behaviors are multicontextual and dynamic: (1) each behavior takes place within multiple contexts in a few dimensions, which requires the representation to enable non-value and set-values for each dimension; (2) many behavior collections, such as tweets or papers, evolve over time. In this paper, we represent the behavioral data as a two-level matrix (temporal-behaviors by dimensional values) and propose a novel representation for behavioral summary called Tartan that includes a set of dimensions, the values in each dimension, a list of consecutive time slices and the behaviors in each slice. We further develop a propagation method CATCHTARTAN to catch the dynamic multicontextual patterns from the temporal multidimensional data in a principled and scalable way: it determines the meaningfulness of updating every element in the Tartan by minimizing the encoding cost in a compression manner. CATCHTARTAN outperforms the baselines on both the accuracy and speed. We apply CATCHTARTAN to four Twitter datasets up to 10 million tweets and the DBLP data, providing comprehensive summaries for the events, human life and scientific development.
AB - Representing and summarizing human behaviors with rich contexts facilitates behavioral sciences and user-oriented services. Traditional behavioral modeling represents a behavior as a tuple in which each element is one contextual factor of one type, and the tensorbased summaries look for high-order dense blocks by clustering the values (including timestamps) in each dimension. However, the human behaviors are multicontextual and dynamic: (1) each behavior takes place within multiple contexts in a few dimensions, which requires the representation to enable non-value and set-values for each dimension; (2) many behavior collections, such as tweets or papers, evolve over time. In this paper, we represent the behavioral data as a two-level matrix (temporal-behaviors by dimensional values) and propose a novel representation for behavioral summary called Tartan that includes a set of dimensions, the values in each dimension, a list of consecutive time slices and the behaviors in each slice. We further develop a propagation method CATCHTARTAN to catch the dynamic multicontextual patterns from the temporal multidimensional data in a principled and scalable way: it determines the meaningfulness of updating every element in the Tartan by minimizing the encoding cost in a compression manner. CATCHTARTAN outperforms the baselines on both the accuracy and speed. We apply CATCHTARTAN to four Twitter datasets up to 10 million tweets and the DBLP data, providing comprehensive summaries for the events, human life and scientific development.
KW - Behavior representation
KW - Behavior summarization
KW - Minimum description length
UR - http://www.scopus.com/inward/record.url?scp=84985028711&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84985028711&partnerID=8YFLogxK
U2 - 10.1145/2939672.2939749
DO - 10.1145/2939672.2939749
M3 - Conference contribution
AN - SCOPUS:84985028711
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 945
EP - 954
BT - KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 13 August 2016 through 17 August 2016
ER -