## Abstract

The huge volume of emerging graph datasets has become a double-edged sword for graph machine learning. On the one hand, it empowers the success of a myriad of graph neural networks (GNNs) with strong empirical performance. On the other hand, training modern GNNs on huge graph data is computationally expensive. How to distill a given graph dataset into a much smaller one while retaining most of the trained models' performance is a challenging problem. Existing efforts approach this problem by solving meta-learning-based bilevel optimization objectives. A major hurdle is that the exact solutions of these methods are computationally intensive; thus most, if not all, of them resort to approximate strategies, which in turn hurt distillation performance. In this paper, inspired by recent advances in neural network kernel methods, we adopt a kernel ridge regression-based meta-learning objective that admits a feasible exact solution. However, computing the graph neural tangent kernel is very expensive, especially in the context of dataset distillation. In response, we design a graph kernel, named LiteGNTK, tailored for the dataset distillation problem and closely related to the classic random walk graph kernel. Based on LiteGNTK, we propose an effective model named Kernel rIdge regression-based graph Dataset Distillation (KIDD) and its variants. KIDD is efficient in both the forward and backward propagation processes, and it shows strong empirical performance on 7 real-world datasets compared with state-of-the-art distillation methods. Thanks to its ability to find the exact solution of the distillation objective, the training graphs learned by KIDD can sometimes even outperform the original full training set with as few as 1.65% of the training graphs.
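The appeal of a kernel ridge regression (KRR) objective, as the abstract notes, is that the inner problem of the bilevel formulation has a closed-form exact solution. The following is only an illustrative sketch of that closed form (the function name, the toy linear kernel, and the regularization value are assumptions for illustration, not the paper's LiteGNTK or its actual setup):

```python
import numpy as np

def krr_fit_predict(K_train, y_train, K_test, lam=1e-3):
    """Kernel ridge regression in closed form.

    Solves alpha = (K_train + lam * I)^{-1} y_train exactly,
    then predicts via K_test @ alpha. No iterative inner-loop
    optimization is needed, which is what makes the bilevel
    distillation objective tractable.
    """
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + lam * np.eye(n), y_train)
    return K_test @ alpha

# Toy example with a linear kernel (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = X @ np.array([1.0, -2.0, 0.5])   # targets linear in the features
K = X @ X.T                          # Gram matrix
preds = krr_fit_predict(K, y, K, lam=1e-6)
```

With a small regularizer and targets lying in the span of the kernel, the in-sample predictions recover the targets almost exactly; in the distillation setting, the same closed form is differentiated with respect to the synthetic graphs.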

| Original language | English (US) |
| --- | --- |
| Title of host publication | KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining |
| Publisher | Association for Computing Machinery |
| Pages | 2850-2861 |
| Number of pages | 12 |
| ISBN (Electronic) | 9798400701030 |
| DOIs | |
| State | Published - Aug 6 2023 |
| Event | 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023 - Long Beach, United States. Duration: Aug 6 2023 → Aug 10 2023 |

### Publication series

| Name | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
| --- | --- |

### Conference

| Conference | 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023 |
| --- | --- |
| Country/Territory | United States |
| City | Long Beach |
| Period | 8/6/23 → 8/10/23 |

## Keywords

- graph dataset distillation
- graph machine learning

## ASJC Scopus subject areas

- Software
- Information Systems