A Blue Start: A large-scale pairwise and higher-order social network dataset
Published in Nature Scientific Data, 2026
Coauthors: Ilya Amburg, Sagar Kumar, Nicholas Landry, Brooke Foucault Welles
Large-scale networks have been instrumental in shaping how we think about social systems, and have undergirded many foundational results in mathematical epidemiology, computational social science, and biology. However, many of the social systems through which diseases spread, information disseminates, and individuals interact are inherently mediated through group interactions, known as higher-order interactions. A gap exists between higher-order models of group formation and spreading processes and the data necessary to validate these mechanisms. Similarly, few datasets bridge the gap between pairwise and higher-order network data. The Bluesky social media platform is an ideal laboratory for observing social ties at scale through its open API. Not only does Bluesky contain pairwise following relationships, but it also contains higher-order social ties known as “starter packs”, which are user-curated lists designed to promote social network growth. We introduce “A Blue Start”, a large-scale network dataset comprising 39.7M user accounts, 2.4B pairwise following relationships, and 365.8K groups representing starter packs. This dataset will be an essential resource for the study of higher-order networks.
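To make the distinction between the two kinds of ties concrete, here is a minimal Python sketch that stores directed follow edges alongside starter packs treated as groups (hyperedges), then measures how densely the members of each pack follow one another. The account names, the `follows` and `starter_packs` variables, and the `pack_follow_density` helper are all hypothetical and chosen for illustration; they do not reflect the dataset's actual file format or schema.

```python
from itertools import permutations

# Toy data only: hypothetical accounts, not drawn from the dataset.
# Each pair (u, v) means "u follows v" (a directed, pairwise tie).
follows = {
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("dave", "alice"),
}

# Each starter pack is modeled as a set of member accounts (a higher-order tie).
starter_packs = {
    "pack_1": {"alice", "bob", "carol"},
    "pack_2": {"carol", "dave"},
}


def pack_follow_density(members, follows):
    """Return the fraction of ordered member pairs (u, v) where u follows v."""
    pairs = list(permutations(sorted(members), 2))
    if not pairs:
        # A pack with fewer than two members has no possible follow edges.
        return 0.0
    return sum(pair in follows for pair in pairs) / len(pairs)


densities = {
    name: pack_follow_density(members, follows)
    for name, members in starter_packs.items()
}

for name, density in densities.items():
    print(f"{name}: {density:.2f}")
```

At the scale reported above, in-memory Python sets would likely be impractical; the sketch is only meant to show how one group can be compared against the pairwise edges among its members.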
The dataset can be found here.
The paper was accepted at Nature Scientific Data in February 2026.