
Carl Kingsford

Carnegie Mellon University

Publishes on Genomics and Phylogenetic Studies, Genomics and Chromatin Dynamics, Algorithms and Data Compression. 238 papers and 25.5k citations.

#10 in RNA-seq


Top publications by citations

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
Guillaume Marçais, Carl Kingsford | Bioinformatics | 2011
Cited by 5k | Open Access

MOTIVATION: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.

RESULTS: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.

AVAILABILITY: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish.
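The abstract's central idea is counting k-mers in a hash table whose slots are claimed without locks. A few lines of code can illustrate it. What follows is a minimal sketch under stated assumptions, not Jellyfish's implementation. The names `pack`, `KmerTable` and `count_kmers` are hypothetical. Bases are packed at 2 bits each, so any k ≤ 31 fits in 62 bits of a 64-bit word, and collisions are resolved by linear probing. Python has no user-level atomic compare-and-swap, so `_cas` only mimics that step and the code is single-threaded. In C++ the same step would typically use `std::atomic<T>::compare_exchange_strong` on the slot, plus an atomic increment of the count.

```python
# Minimal single-threaded sketch with hypothetical names; NOT Jellyfish's code.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
EMPTY = -1  # marks an unclaimed slot; packed keys are always >= 0


def pack(kmer):
    """Pack a DNA string into an int, 2 bits per base (k <= 31 fits in 62 bits)."""
    key = 0
    for base in kmer:
        key = (key << 2) | ENCODE[base]
    return key


class KmerTable:
    """Open-addressing table: keys[i] holds a packed k-mer, counts[i] its count."""

    def __init__(self, capacity):
        self.keys = [EMPTY] * capacity
        self.counts = [0] * capacity

    def _cas(self, i, expected, new):
        # Stand-in for an atomic compare-and-swap on keys[i].
        # NOT atomic in Python: only correct when a single thread uses the table.
        if self.keys[i] == expected:
            self.keys[i] = new
            return True
        return False

    def add(self, key):
        n = len(self.keys)
        i = key % n  # simple modular hash; real tables use a stronger hash
        for _ in range(n):  # linear probing, at most one full pass
            # Claim an empty slot, or find the slot that already holds this key
            # (in a concurrent table it may have been claimed by another thread).
            if self._cas(i, EMPTY, key) or self.keys[i] == key:
                self.counts[i] += 1  # would be an atomic fetch-add in a lock-free table
                return
            i = (i + 1) % n
        raise RuntimeError("table full")

    def count(self, key):
        n = len(self.keys)
        i = key % n
        for _ in range(n):
            if self.keys[i] == key:
                return self.counts[i]
            if self.keys[i] == EMPTY:  # probe chain ends: key was never inserted
                return 0
            i = (i + 1) % n
        return 0


def count_kmers(seq, k, capacity=1024):
    """Count every length-k substring of seq (A/C/G/T only)."""
    if not 1 <= k <= 31:
        raise ValueError("k must be between 1 and 31")
    table = KmerTable(capacity)
    for start in range(len(seq) - k + 1):
        table.add(pack(seq[start:start + k]))
    return table


table = count_kmers("ACGTACGTAC", 3)
print(table.count(pack("ACG")))  # 2
```

The important step is in `add`. The code first tries to claim an empty slot. If the claim fails, it checks whether the slot now holds the same k-mer before probing further. Once a slot is claimed its key never changes, so in a genuinely concurrent version this check is enough, and no lock is needed around the probe loop.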

The power of protein interaction networks for associating genes with diseases
Saket Navlakha, Carl Kingsford | Bioinformatics | 2010
Cited by 365 | Open Access

MOTIVATION: Understanding the association between genetic diseases and their causal genes is an important problem concerning human health. With the recent influx of high-throughput data describing interactions between gene products, scientists have been provided a new avenue through which these associations can be inferred. Despite the recent interest in this problem, however, there is little understanding of the relative benefits and drawbacks underlying the proposed techniques.

RESULTS: We assessed the utility of physical protein interactions for determining gene-disease associations by examining the performance of seven recently developed computational methods (plus several of their variants). We found that random-walk approaches individually outperform clustering and neighborhood approaches, although most methods make predictions not made by any other method. We show how combining these methods into a consensus method yields Pareto optimal performance. We also quantified how a diffuse topological distribution of disease-related proteins negatively affects prediction quality and are thus able to identify diseases especially amenable to network-based predictions and others for which additional information sources are absolutely required.

AVAILABILITY: The predictions made by each algorithm considered are available online at http://www.cbcb.umd.edu/DiseaseNet.
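The abstract does not say exactly which random-walk variants were compared. One widely used formulation for network-based disease-gene prioritization is random walk with restart. A walker starts at known disease genes and moves to a random neighbour at each step, and with probability r it jumps back to the seed set. Genes with high steady-state visit probability are ranked as candidates. The sketch below assumes an unweighted, undirected network stored as an adjacency dict with no isolated nodes. The function name, the restart value 0.3 and the toy gene names are all hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visit probabilities of a walk that restarts at `seeds`.

    Iterates p <- (1 - r) * W p + r * p0, where W moves each node's probability
    mass uniformly to its neighbours and p0 is uniform over the seed genes.
    Assumes an undirected graph with no isolated nodes (every adj[u] non-empty).
    """
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}  # restart mass returns to seeds
        for u in nodes:
            share = (1 - restart) * p[u] / len(adj[u])  # split over neighbours
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:  # converged
            break
    return p


# Toy network (hypothetical genes); "A" is the known disease gene.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
seeds = {"A"}
scores = random_walk_with_restart(network, seeds)
candidates = sorted((g for g in scores if g not in seeds), key=scores.get, reverse=True)
print(candidates)  # ['B', 'C', 'D']
```

On this path graph the scores fall off with distance from the seed. In a real protein interaction network, the ranking instead reflects how many short paths connect a candidate to the known disease genes. This relates to the abstract's observation that diseases whose proteins are topologically diffuse are harder to predict.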
