Sören Sonnenburg

Sören Sonnenburg, Gunnar Rätsch, Christin Schäfer et al.|MPG.PuRe (Max Planck Society)|2006

Cited by 1.2kOpen Access

All in-text\treferences\tunderlined\tin\tblue\tare\tlinked\tto\tpublications\ton\tResearchGate, letting you\taccess\tand\tread\tthem\timmediately.

Support Vector Machines and Kernels for Computational Biology

Asa Ben‐Hur, Cheng Soon Ong, Sören Sonnenburg et al.|PLoS Computational Biology|2008

Cited by 717Open Access

The increasing wealth of biological data coming from a large variety of platforms and the continued development of new high-throughput methods for probing biological systems require increasingly more sophisticated computational approaches. Putting all these data in simple-to-use databases is a first step; but realizing the full potential of the data requires algorithms that automatically extract regularities from the data, which can then lead to biological insight. Many of the problems in computational biology are in the form of prediction: starting from prediction of a gene's structure, prediction of its function, interactions, and role in disease. Support vector machines (SVMs) and related kernel methods are extremely good at solving such problems [1]–[3]. SVMs are widely used in computational biology due to their high accuracy, their ability to deal with high-dimensional and large datasets, and their flexibility in modeling diverse sources of data [2], [4]–[6]. The simplest form of a prediction problem is binary classification: trying to discriminate between objects that belong to one of two categories—positive (+1) or negative (−1). SVMs use two key concepts to solve this problem: large margin separation and kernel functions. The idea of large margin separation can be motivated by classification of points in two dimensions (see Figure 1). A simple way to classify the points is to draw a straight line and call points lying on one side positive and on the other side negative. If the two sets are well separated, one would intuitively draw the separating line such that it is as far as possible away from the points in both sets (see Figures 2 and 3). This intuitive choice captures the idea of large margin separation, which is mathematically formulated in the section Classification with Large Margin.

Efficient and Accurate Lp-Norm Multiple Kernel Learning

Marius Kloft, Ulf Brefeld, Pavel Laskov et al.|Unknown|2009

Cited by 220

Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, l1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary lp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply lp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.

Accurate splice site prediction using support vector machines

Sören Sonnenburg, Gabriele Schweikert, Petra Philips et al.|BMC Bioinformatics|2007

Cited by 188Open Access

BACKGROUND: For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks. RESULTS: In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel which turns out well suited for this task, as we will illustrate in several experiments where we compare its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be used for incorporation in a gene finder. AVAILABILITY: Data, splits, additional information on the model selection, the whole genome predictions, as well as the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.

The Feature Importance Ranking Measure

Alexander Zien, Nicole Krämer, Sören Sonnenburg et al.|Lecture notes in computer science|2009

Cited by 170

Sören Sonnenburg

Is this you? Claim your profile.

Top publicationsby citations