Cosma Rohilla Shalizi

Santa Fe Institute

ORCID: 0000-0002-9195-1308

Publishes on Complex Network Analysis Techniques, Complex Systems and Time Series Analysis, Statistical Methods and Inference. 131 papers and 11.1k citations.

Top publications by citations

Power-law distributions in empirical data
Cited by 6.8k|Open Access

Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as least-squares fitting are known to produce systematically biased estimates of parameters for power-law distributions and should not be used in most circumstances. Here we describe statistical techniques for making accurate parameter estimates for power-law data, based on maximum likelihood methods and the Kolmogorov-Smirnov statistic. We also show how to tell whether the data follow a power-law distribution at all, defining quantitative measures that indicate when the power law is a reasonable fit to the data and when it is not. We demonstrate these methods by applying them to twenty-four real-world data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
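As a rough illustration of the two ingredients named in this abstract, the sketch below fits a continuous power law by maximum likelihood and measures the Kolmogorov-Smirnov distance between the fitted model and the data. It is a minimal sketch, not the authors' code: it assumes the lower cutoff `xmin` is already known, uses the standard continuous-case estimator alpha = 1 + n / sum(ln(x_i / xmin)), and the function names (`fit_alpha`, `ks_distance`, `sample_power_law`) are hypothetical. The goodness-of-fit test and the comparison against alternative distributions are not shown.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law on x >= xmin.

    Assumes xmin is given; returns (alpha, number of tail points used).
    """
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one observation strictly above xmin")
    return 1.0 + len(tail) / log_sum, len(tail)


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, rng):
    """Draw n synthetic values by inverse-transform sampling (for testing)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(20000, xmin=1.0, alpha=2.5, rng=rng)
    alpha_hat, n_tail = fit_alpha(data, xmin=1.0)
    print("estimated alpha:", alpha_hat, "from", n_tail, "points")
    print("KS distance:", ks_distance(data, 1.0, alpha_hat))
```

In practice `xmin` is usually unknown; one common approach is to repeat the fit over candidate cutoffs and keep the one with the smallest KS distance.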

Homophily and Contagion Are Generically Confounded in Observational Social Network Studies
Cosma Rohilla Shalizi, Andrew C. Thomas|Sociological Methods & Research|2011
Cited by 1k|Open Access

The authors consider processes on social networks that can potentially involve three factors: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual's covariates on his or her behavior or other measurable responses. The authors show that generically, all of these are confounded with each other. Distinguishing them from one another requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular the authors demonstrate, with simple examples, that asymmetries in regression coefficients cannot identify causal effects and that very simple models of imitation (a form of social contagion) can produce substantial correlations between an individual's enduring traits and his or her choices, even when there is no intrinsic affinity between them. The authors also suggest some possible constructive responses to these results.

Causal architecture, complexity and self-organization in time series and cellular automata
Cited by 231

All self-respecting nonlinear scientists know self-organization when they see it: except when we disagree. For this reason, if no other, it is important to put some mathematical spine into our floppy intuitive notion of self-organization. Only a few measures of self-organization have been proposed; none can be adopted in good intellectual conscience. To find a decent formalization of self-organization, we need to pin down what we mean by organization. The best answer is that the organization of a process is its causal architecture: its internal, possibly hidden, causal states and their interconnections. Computational mechanics is a method for inferring causal architecture (represented by a mathematical object called the ε-machine) from observed behavior. The ε-machine captures all patterns in the process which have any predictive power, so computational mechanics is also a method for pattern discovery. In this work, I develop computational mechanics for four increasingly sophisticated types of process: memoryless transducers, time series, transducers with memory, and cellular automata. In each case I prove the optimality and uniqueness of the ε-machine's representation of the causal architecture, and give reliable algorithms for pattern discovery. The ε-machine is the organization of the process, or at least of the part of it which is relevant to our measurements. It leads to a natural measure of the statistical complexity of processes, namely the amount of information needed to specify the state of the ε-machine. Self-organization is a self-generated increase in statistical complexity. This fulfills various hunches which have been advanced in the literature, seems to accord with people's intuitions, and is both mathematically precise and operational.
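The "amount of information needed to specify the state" can be read as the Shannon entropy of the distribution over causal states. The sketch below computes that quantity in bits, assuming the state probabilities are already known; inferring the causal states from data, which is the hard part, is not shown. The function name `statistical_complexity` and the example distributions are hypothetical.

```python
import math


def statistical_complexity(state_probs):
    """Shannon entropy (in bits) of a distribution over causal states.

    Assumes state_probs are the already-known probabilities of each
    causal state and that they sum to 1.
    """
    if any(p < 0 for p in state_probs):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(state_probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # States with zero probability contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in state_probs if p > 0)


if __name__ == "__main__":
    # One causal state (e.g. an IID process): no information needed.
    print(statistical_complexity([1.0]))
    # Two equally likely causal states (e.g. a period-two process).
    print(statistical_complexity([0.5, 0.5]))
```

On this reading, a process self-organizes between two times if the entropy computed from its causal-state distribution is higher at the later time.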