Abstract
Much of the world's most valued data is stored in relational databases and data warehouses, where the data is organized into tables connected by primary-foreign key relations. However, building machine learning models using this data is both challenging and time consuming because no ML algorithm can directly learn from multiple connected tables. Current approaches can only learn from a single table, so data must first be manually joined and aggregated into this format, the laborious process known as feature engineering. This position paper introduces Relational Deep Learning (RDL), a blueprint for end-to-end learning on relational databases. The key is to represent relational databases as temporal, heterogeneous graphs, with a node for each row in each table, and edges specified by primary-foreign key links. Graph Neural Networks then learn representations that leverage all input data, without any manual feature engineering. We also introduce RELBENCH, and benchmark and testing suite, demonstrating strong initial results. Overall, we define a new research area that generalizes graph machine learning and broadens its applicability.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 13592-13607 |
| Number of pages | 16 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 235 |
| State | Published - 2024 |
| Event | 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria Duration: Jul 21 2024 → Jul 27 2024 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability