YACC: A Framework Generalizing TuránShadow for Counting Large Cliques

Shweta Jain, Hanghang Tong

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Clique-counting is a fundamental problem that has application in many areas eg. dense subgraph discovery, community detection, spam detection, etc. The problem of k-clique-counting is difficult because as k increases, the number of k-cliques goes up exponentially. Enumeration algorithms (even parallel ones) fail to count k-cliques beyond a small k. Approximation algorithms, like TuránShadow have been shown to perform well upto k = 10, but are inefficient for larger cliques. The recently proposed Pivoter algorithm significantly improved the state-of-the-art and was able to give exact counts of all k-cliques in a large number of graphs. However, the clique counts of some graphs (for example, com-lj) are still out of reach of these algorithms. We revisit the TuránShadow algorithm and propose a generalized framework called YACC that leverages several insights about real-world graphs to achieve faster clique-counting. The bottleneck in TuránShadow is a recursive subroutine whose stopping condition is based on a classic result from extremal combinatorics called Turán’s theorem. This theorem gives a lower bound for the k-clique density in a subgraph in terms of its edge density. However, this stopping condition is based on a worst-case graph that does not reflect the nature of real-world graphs. Using techniques for quickly discovering dense subgraphs, we relax the stopping condition in a systematic way such that we get a smaller recursion tree while still maintaining the guarantees provided by TuránShadow. We deploy our algorithm on several real-world data sets and show that YACC reduces the size of the recursion tree and the running time by over an order of magnitude. Using YACC, we are able to obtain clique counts for several graphs for which clique-counting was infeasible before, including com-lj.

Original languageEnglish (US)
Title of host publicationProceedings of the 2022 SIAM International Conference on Data Mining, SDM 2022
PublisherSociety for Industrial and Applied Mathematics Publications
Pages684-692
Number of pages9
ISBN (Electronic)9781611977172
StatePublished - 2022
Event2022 SIAM International Conference on Data Mining, SDM 2022 - Virtual, Online
Duration: Apr 28 2022Apr 30 2022

Publication series

NameProceedings of the 2022 SIAM International Conference on Data Mining, SDM 2022

Conference

Conference2022 SIAM International Conference on Data Mining, SDM 2022
CityVirtual, Online
Period4/28/224/30/22

Keywords

  • Cliques
  • Degeneracy
  • Sampling
  • TuránShadow

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Fingerprint

Dive into the research topics of 'YACC: A Framework Generalizing TuránShadow for Counting Large Cliques'. Together they form a unique fingerprint.

Cite this