Network comparison and the within-ensemble graph distance|Harrison Hartle, Brennan Klein, Stefan McCabe et al.|Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences|2020 Quantifying the differences between networks is a challenging and ever-present problem in network science. In recent years, a multitude of diverse, ad hoc solutions to this problem have been introduced. Here, we propose that simple and well-understood ensembles of random networks—such as Erdős–Rényi graphs, random geometric graphs, Watts–Strogatz graphs, the configuration model and preferential attachment networks—are natural benchmarks for network comparison methods. Moreover, we show that the expected distance between two networks independently sampled from a generative model is a useful property that encapsulates many key features of that model. To illustrate our results, we calculate this within-ensemble graph distance and related quantities for classic network models (and several parameterizations thereof) using 20 distance measures commonly used to compare graphs. The within-ensemble graph distance provides a new framework for developers of graph distances to better understand their creations and for practitioners to better choose an appropriate tool for their particular task.
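The quantity described in the abstract above, the expected distance between two networks sampled independently from the same generative model, can be approximated by simple Monte Carlo averaging. The following is a minimal sketch, not the paper's implementation: it assumes the Erdős–Rényi G(n, p) ensemble and a normalized Hamming distance (the fraction of node pairs on which two graphs disagree), both chosen here only for illustration. The function names are hypothetical.

```python
import itertools
import random
from statistics import mean


def sample_er_graph(n, p, rng):
    """Return the edge set of one G(n, p) Erdős–Rényi graph on nodes 0..n-1."""
    return {
        pair
        for pair in itertools.combinations(range(n), 2)
        if rng.random() < p
    }


def hamming_distance(edges_a, edges_b, n):
    """Fraction of the n*(n-1)/2 node pairs on which the two graphs disagree."""
    n_node_pairs = n * (n - 1) // 2
    return len(edges_a ^ edges_b) / n_node_pairs


def within_ensemble_distance(n, p, n_graph_pairs=200, seed=0):
    """Monte Carlo estimate of the mean distance between independent samples."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n_graph_pairs):
        g1 = sample_er_graph(n, p, rng)
        g2 = sample_er_graph(n, p, rng)
        distances.append(hamming_distance(g1, g2, n))
    return mean(distances)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, round(within_ensemble_distance(30, p), 3))
```

For this particular pairing of ensemble and distance, each node pair disagrees independently with probability 2p(1 - p), so the estimate should approach 2p(1 - p) as the number of sampled graph pairs grows. Other ensembles and distance measures generally have no such closed form, which is where a sampling approach like this one becomes useful.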
Online engagement with 2020 election misinformation and turnout in the 2021 Georgia runoff election|Jon Green, William Hobbs, Stefan McCabe et al.|Proceedings of the National Academy of Sciences|2022 Following the 2020 general election, Republican elected officials, including then-President Donald Trump, promoted conspiracy theories claiming that Joe Biden's close victory in Georgia was fraudulent. Such conspiratorial claims could implicate participation in the Georgia Senate runoff election in different ways: signaling that voting doesn't matter, distracting from ongoing campaigns, stoking political anger at out-partisans, or providing rationalizations for (lack of) enthusiasm for voting during a transfer of power. Here, we evaluate the possibility of any on-average relationship with turnout by combining behavioral measures of engagement with election conspiracies online and administrative data on voter turnout for 40,000 Twitter users registered to vote in Georgia. We find small, limited associations. Liking or sharing messages opposed to conspiracy theories was associated with higher turnout than expected in the runoff election, and those who liked or shared tweets promoting fraud-related conspiracy theories were slightly less likely to vote.
Using Administrative Records and Survey Data to Construct Samples of Tweeters and Tweets|Adam G. Hughes, Stefan McCabe, William Hobbs et al.|Public Opinion Quarterly|2021 Social media data can provide new insights into political phenomena, but users do not always represent people, posts and accounts are not typically linked to demographic variables for use as statistical controls or in subgroup comparisons, and activities on social media can be difficult to interpret. For data scientists, adding demographic variables and comparisons to closed-ended survey responses has the potential to improve interpretations of inferences drawn from social media—for example, through comparisons of online expressions and survey responses, and by assessing associations with offline outcomes like voting. For survey methodologists, adding social media data to surveys allows for rich behavioral measurements, including comparisons of public expressions with attitudes elicited in a structured survey. Here, we evaluate two popular forms of linkages—administrative and survey—focusing on two questions: How does the method of creating a sample of Twitter users affect its behavioral and demographic profile? What are the relative advantages of each of these methods? Our analyses illustrate where and to what extent the sample based on administrative data diverges in demographic and partisan composition from surveyed Twitter users who report being registered to vote. Despite demographic differences, each linkage method results in behaviorally similar samples, especially in activity levels; however, conventionally sized surveys are likely to lack the statistical power to study subgroups and heterogeneity (e.g., comparing conversations of Democrats and Republicans) within even highly salient political topics. We conclude by developing general recommendations for researchers looking to study social media by linking accounts with external benchmark data sources.
netrd: A library for network reconstruction and graph distances|Stefan McCabe, Leo Torres, Timothy LaRock et al.|The Journal of Open Source Software|2021 This field is built around the idea that an increased understanding of the complex structural properties of a variety of systems will allow us to better observe, predict, and even control the behavior of these systems.
Pandemics, Protests, and Publics|Sarah Shugars, Adina Gitomer, Stefan McCabe et al.|Journal of Quantitative Description: Digital Media|2021 As an integral component of public discourse, Twitter is among the main data sources for scholarship in this area. However, there is much that scholars do not know about the basic mechanisms of public discourse on Twitter, including the prevalence of various modes of communication, the types of posts users make, the engagement those posts receive, or how these things vary with user demographics and across different topical events. This paper broadens our understanding of these aspects of public discourse. We focus on the first nine months of 2020, studying that period as a whole and giving particular attention to two monumentally important topics of that time: the Black Lives Matter movement and the COVID-19 pandemic. Leveraging a panel of 1.6 million Twitter accounts matched to U.S. voting records, we examine the demographics, activity, and engagement of 800,000 American adults who collectively posted nearly 300 million tweets during this time span. We find notable variation in user activity and engagement, in terms of modality (e.g., retweets vs. replies), demographic subgroup, and topical context. We further find that while Twitter can best be understood as a collection of interconnected publics, neither topical nor demographic variation perfectly encapsulates the "Twitter public." Rather, Twitter publics are fluid, contextual communities which form around salient topics and are informed by demographic identities. Together, this paper presents a disaggregated, multifaceted description of the demographics, activity, and engagement of American Twitter users in 2020.