Skip to content

SANGEA: Scalable and Attributed Network Generation

The topic of synthetic graph generators (SGGs) has recently received much attention due to the wave of the latest breakthroughs in generative modelling.

However, many state-of-the-art SGGs do not scale well with the graph size. Indeed, in the generation process, all the possible edges for a fixed number of nodes must often be considered, which scales in O(N^2), with N being the number of nodes in the graph. For this reason, many state-of-the-art SGGs do not apply to large graphs.

In this paper, we present SANGEA, a sizeable synthetic graph generation framework that extends the applicability of any SGG to large graphs. By first splitting the large graph into communities, SANGEA trains one SGG per community, then links the community graphs back together to create a synthetic large graph.

Our experiments show that the graphs generated by SANGEA have high similarity to the original graph, in terms of both topology and node feature distribution. Additionally, these generated graphs achieve high utility on downstream tasks such as link prediction. Finally, we provide a privacy assessment of the generated graphs to show that, even though they have excellent utility, they also achieve reasonable privacy scores.

 

Valentin Lemaire, Youssef Achenchabe, Lucas Ody, Houssem Eddine Souid, Gianmarco Aversano, Nicolas Posocco, Sabri Skhiri, SANGEA: Scalable and Attributed Network GenerationIn Proc. of The 15th Asian Conference on Machine Learning (ACML 2023), November 2023.

 

Click here to access the paper.

Releated Posts

Evaluation of GraphRAG Strategies for Efficient Information Retrieval

Traditional RAG systems struggle to capture relationships and cross-references between different sources unless explicitly mentioned. This challenge is common in real-world scenarios, where information is often distributed and interlinked, making graphs a more effective representation. Our work provides a technical contribution through a comparative evaluation of retrieval strategies within GraphRAG, focusing on context relevance rather than abstract metrics. We aim to offer practitioners actionable insights into the retrieval component of the GraphRAG pipeline.
Read More

Flight Load Factor Predictions based on Analysis of Ticket Prices and other Factors

The ability to forecast traffic and to size the operation accordingly is a determining factor, for airports. However, to realise its full potential, it needs to be considered as part of a holistic approach, closely linked to airport planning and operations. To ensure airport resources are used efficiently, accurate information about passenger numbers and their effects on the operation is essential. Therefore, this study explores machine learning capabilities enabling predictions of aircraft load factors.
Read More