TY - GEN
T1 - Partitioning social networks for time-dependent queries
AU - Carrasco, Berenice
AU - Lu, Yi
AU - Da Trindade, Joana M.F.
PY - 2011
Y1 - 2011
N2 - The most common type of queries in online social networks are news feeds of friends' recent activities. These queries involve the retrieval of multiple small records generated by different users in the network, and the results are time dependent. Hash-based horizontal partitioning of data results in accesses at multiple servers, which significantly affects throughput and response time. Partitioning of social network data is difficult because of the power-law degree distribution of the friendship graph, and the time dependency of queries and user activities. The power-law degree distribution results in a tremendous amount of extra storage for replication-based partitions, and the time dependency makes query-driven partitioning ineffective. We propose to partition not only the spatial network of social relations, but also in the time dimension so that users who have communicated in a given period are grouped together. We build an activity prediction graph to capture relationships with strong activity and serve as the basis for partitioning. New nodes occurring in the current period are added greedily. We test the partitioning results with emulation of Facebook page downloads, and show that our algorithm achieves 10 times better data locality than hash-based horizontal partitioning algorithms. We show the quality of activity prediction by observing that the algorithm with prediction achieves 80% data locality of that with perfect knowledge of the current period.
AB - The most common type of queries in online social networks are news feeds of friends' recent activities. These queries involve the retrieval of multiple small records generated by different users in the network, and the results are time dependent. Hash-based horizontal partitioning of data results in accesses at multiple servers, which significantly affects throughput and response time. Partitioning of social network data is difficult because of the power-law degree distribution of the friendship graph, and the time dependency of queries and user activities. The power-law degree distribution results in a tremendous amount of extra storage for replication-based partitions, and the time dependency makes query-driven partitioning ineffective. We propose to partition not only the spatial network of social relations, but also in the time dimension so that users who have communicated in a given period are grouped together. We build an activity prediction graph to capture relationships with strong activity and serve as the basis for partitioning. New nodes occurring in the current period are added greedily. We test the partitioning results with emulation of Facebook page downloads, and show that our algorithm achieves 10 times better data locality than hash-based horizontal partitioning algorithms. We show the quality of activity prediction by observing that the algorithm with prediction achieves 80% data locality of that with perfect knowledge of the current period.
UR - http://www.scopus.com/inward/record.url?scp=79959862439&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959862439&partnerID=8YFLogxK
U2 - 10.1145/1989656.1989658
DO - 10.1145/1989656.1989658
M3 - Conference contribution
AN - SCOPUS:79959862439
SN - 9781450307284
T3 - Proceedings of the 4th Workshop on Social Network Systems, SNS'11
BT - Proceedings of the 4th Workshop on Social Network Systems, SNS'11
T2 - 4th Workshop on Social Network Systems, SNS'11
Y2 - 10 April 2011 through 13 April 2011
ER -