Jason S. Papadopoulos

Christiam Camacho, George Coulouris, Vahram Avagyan et al.|BMC Bioinformatics|2009

Cited by 23k | Open Access

BACKGROUND: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user interface of the current command-line applications.

RESULTS: We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.

CONCLUSION: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
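The idea of breaking a long query into chunks can be illustrated with a short sketch. It assumes the query is cut into fixed-size windows that share a small overlap. Each window is searched separately, and the hits are then shifted back into full-query coordinates, with duplicates from the overlaps removed. The function names, `chunk_size` and `overlap` are hypothetical. This is not the BLAST+ code, which must also handle alignments that extend across chunk boundaries.

```python
from typing import List, Tuple

# A hit is (query_start, query_end, score); coordinates are 0-based, end-exclusive.
Hit = Tuple[int, int, int]


def split_query(length: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Cover [0, length) with windows of at most chunk_size residues.

    Consecutive windows share `overlap` residues so that a short match
    straddling a boundary is still seen whole by at least one window.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    windows = []
    start = 0
    while True:
        end = min(start + chunk_size, length)
        windows.append((start, end))
        if end == length:
            return windows
        start = end - overlap  # advance by chunk_size - overlap > 0


def merge_chunk_hits(chunk_hits: List[Tuple[int, List[Hit]]]) -> List[Hit]:
    """Map per-chunk hits back to query coordinates and drop overlap duplicates.

    `chunk_hits` pairs each window's start offset with the hits found in it,
    expressed in window-local coordinates.
    """
    best = {}
    for offset, hits in chunk_hits:
        for start, end, score in hits:
            key = (start + offset, end + offset)
            # Two overlapping windows can report the same match; keep one copy.
            best[key] = max(score, best.get(key, score))
    return sorted((s, e, sc) for (s, e), sc in best.items())
```

For example, `split_query(25, 10, 3)` yields the windows `(0, 10)`, `(7, 17)`, `(14, 24)` and `(21, 25)`. A match at query positions 8 to 9 is found by both of the first two windows, but it is reported only once after merging.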

COBALT: constraint-based alignment tool for multiple protein sequences

Jason S. Papadopoulos, Richa Agarwala|Bioinformatics|2007

Cited by 1.2k | Open Access

MOTIVATION: A tool that simultaneously aligns multiple protein sequences, automatically uses information about protein domains, and strikes a good compromise between speed and accuracy will have practical advantages over current tools.

RESULTS: We describe COBALT, a constraint-based alignment tool that implements a general framework for multiple alignment of protein sequences. COBALT finds a collection of pairwise constraints derived from database searches, sequence similarity and user input, combines these pairwise constraints, and then incorporates them into a progressive multiple alignment. We show that using constraints derived from the Conserved Domain Database (CDD) and the PROSITE protein-motif database improves COBALT's alignment quality. We also show that COBALT has reasonable runtime performance and alignment accuracy comparable to or exceeding that of other tools for a broad range of problems.

AVAILABILITY: COBALT is included in the NCBI C++ toolkit. A Linux executable for COBALT, together with the CDD and PROSITE data used, is available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/cobalt
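The abstract mentions combining pairwise constraints. Constraints that are used together must be mutually consistent. The sketch below is a simplified illustration, not COBALT's algorithm. For a single pair of sequences, it greedily keeps the highest-scoring constraints and rejects any candidate that overlaps or crosses a constraint already accepted. The `Constraint` type and the greedy rule are assumptions made for this example.

```python
from typing import List, NamedTuple


class Constraint(NamedTuple):
    """Hypothetical pairwise constraint: A[a_start:a_end] aligns to B[b_start:b_end]."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    score: float


def compatible(x: Constraint, y: Constraint) -> bool:
    """Two constraints fit in one alignment if they are disjoint and in the same order in A and B."""
    x_first = x.a_end <= y.a_start and x.b_end <= y.b_start
    y_first = y.a_end <= x.a_start and y.b_end <= x.b_start
    return x_first or y_first


def select_constraints(candidates: List[Constraint]) -> List[Constraint]:
    """Greedily keep the best-scoring constraints that do not conflict with those already kept."""
    chosen: List[Constraint] = []
    for cand in sorted(candidates, key=lambda item: item.score, reverse=True):
        if all(compatible(cand, kept) for kept in chosen):
            chosen.append(cand)
    # Return the surviving constraints in sequence-A order, ready for alignment.
    return sorted(chosen, key=lambda item: item.a_start)
```

A constraint that lies after an accepted one in sequence A but before it in sequence B would force crossing alignment lines, so the greedy pass discards it.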

Improved BLAST searches using longer words for protein seeding

Sergey Shiryev, Jason S. Papadopoulos, Alejandro A. Schäffer et al.|Bioinformatics|2007

Cited by 126 | Open Access

MOTIVATION: The blastp and tblastn modules of BLAST are widely used methods for searching protein queries against protein and nucleotide databases, respectively. One heuristic used in BLAST is to consider only database sequences that contain a high-scoring match of length at most 5 to the query. We implemented the capability to use words of length 6 or 7. We demonstrate an improved trade-off between running time and retrieval accuracy, controlled by the score threshold used for short word matches. For example, the running time can be reduced by 20-30% while achieving ROC (receiver operating characteristic) scores similar to those obtained with current default parameters.

AVAILABILITY: The option to use long words is in the NCBI C and C++ toolkit code for BLAST, starting with version 2.2.16 of blastall. A Linux executable used to produce the results herein is available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/protein_longwords
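The interplay of word length and score threshold can be illustrated with a brute-force sketch. It scores every pair of length-`w` words from a query and a subject without gaps, and keeps the pairs whose score reaches `threshold`. The `toy_score` function (+5 for identical residues, -1 otherwise) is a hypothetical stand-in for a substitution matrix such as BLOSUM62. The quadratic scan also replaces the lookup tables a real BLAST search uses. Only the filtering idea carries over.

```python
from typing import List, Tuple


def toy_score(x: str, y: str) -> int:
    """Hypothetical substitution score; a real search would use e.g. BLOSUM62."""
    return 5 if x == y else -1


def word_hits(query: str, subject: str, w: int, threshold: int) -> List[Tuple[int, int, int]]:
    """Return (query_pos, subject_pos, score) for word pairs scoring >= threshold.

    Every length-w word of the query is compared, without gaps, against every
    length-w word of the subject. Longer words and higher thresholds yield
    fewer seeds, which is the speed/sensitivity trade-off being tuned.
    """
    hits = []
    for i in range(len(query) - w + 1):
        qword = query[i:i + w]
        for j in range(len(subject) - w + 1):
            score = sum(toy_score(a, b) for a, b in zip(qword, subject[j:j + w]))
            if score >= threshold:
                hits.append((i, j, score))
    return hits
```

Take the query `"MKTAYIAKQR"` and the subject `"GGMKTAYIAKGG"`. With `w=3, threshold=15`, the sketch reports six seed hits. With `w=6, threshold=30`, only three survive, so fewer candidate regions would go on to extension.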

The twenty-fourth Fermat number is composite

Richard E. Crandall, Ernst W. Mayer, Jason S. Papadopoulos|Mathematics of Computation|2002

Cited by 28 | Open Access

We have shown by machine proof that $F_{24} = 2^{2^{24}} + 1$ is composite. The rigorous Pépin primality test was performed using independently developed programs running simultaneously on two different, physically separated processors. Each program employed a floating-point, FFT-based discrete weighted transform (DWT) to effect multiplication modulo $F_{24}$. The final, respective Pépin residues obtained by these two machines were in complete agreement. Using intermediate residues stored periodically during one of the floating-point runs, a separate algorithm for pure-integer negacyclic convolution verified the result in a “wavefront” paradigm, by running simultaneously on numerous additional machines, to effect piecewise verification of a saturating set of deterministic links for the Pépin chain.
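Pépin's test states that, for $n \ge 1$, $F_n$ is prime if and only if $3^{(F_n-1)/2} \equiv -1 \pmod{F_n}$. Since $(F_n-1)/2 = 2^{2^n-1}$, the residue is obtained by squaring 3 exactly $2^n - 1$ times. This sequence of squarings is the chain described above. The sketch below runs that chain for small $n$, using Python integers in place of the FFT-based DWT, and saves a residue every `interval` squarings. It then re-checks each saved link with a separate schoolbook negacyclic multiplication modulo $2^{2^n}+1$. The checkpoint interval, the limb count `digits` and all function names are hypothetical. Because each segment needs only its own starting residue, the segments could in principle be checked on different machines at once.

```python
def fermat(n: int) -> int:
    """Return the Fermat number F_n = 2**(2**n) + 1."""
    return (1 << (1 << n)) + 1


def pepin_chain(n: int, interval: int):
    """Square 3 modulo F_n, 2**n - 1 times, saving periodic checkpoints.

    After k squarings the value is 3**(2**k) mod F_n, so the final value is
    the Pépin residue 3**((F_n - 1) / 2) mod F_n.
    """
    F = fermat(n)
    steps = (1 << n) - 1
    r = 3
    checkpoints = {0: r}
    for k in range(1, steps + 1):
        r = r * r % F
        if k % interval == 0 or k == steps:
            checkpoints[k] = r
    return r, checkpoints


def is_fermat_prime(n: int) -> bool:
    """Pépin's test (valid for n >= 1): F_n is prime iff the residue is -1 mod F_n."""
    residue, _ = pepin_chain(n, interval=1 << n)
    return residue == fermat(n) - 1


def negacyclic_mulmod(a: int, b: int, n_bits: int, digits: int) -> int:
    """Multiply a*b modulo 2**n_bits + 1 via a schoolbook negacyclic convolution.

    Inputs are residues in [0, 2**n_bits]. They are split into `digits` limbs
    of n_bits // digits bits. Since 2**n_bits is congruent to -1, limb
    products whose position wraps past the top are subtracted, not added.
    """
    if n_bits % digits:
        raise ValueError("digits must divide n_bits")
    modulus = (1 << n_bits) + 1
    top = 1 << n_bits
    if a == top:  # the residue -1 does not fit in n_bits bits
        return (-b) % modulus
    if b == top:
        return (-a) % modulus
    w = n_bits // digits
    mask = (1 << w) - 1
    da = [(a >> (w * i)) & mask for i in range(digits)]
    db = [(b >> (w * i)) & mask for i in range(digits)]
    c = [0] * digits
    for i in range(digits):
        for j in range(digits):
            if i + j < digits:
                c[i + j] += da[i] * db[j]
            else:
                c[i + j - digits] -= da[i] * db[j]  # wrapped term changes sign
    return sum(ck << (w * k) for k, ck in enumerate(c)) % modulus


def verify_segments(n: int, checkpoints: dict, digits: int = 4) -> bool:
    """Independently recompute each link between consecutive checkpoints."""
    n_bits = 1 << n
    d = min(digits, n_bits)  # n_bits is a power of two; d must divide it
    keys = sorted(checkpoints)
    for start, end in zip(keys, keys[1:]):
        r = checkpoints[start]
        for _ in range(end - start):
            r = negacyclic_mulmod(r, r, n_bits, d)
        if r != checkpoints[end]:
            return False
    return True
```

Running `is_fermat_prime` for $n = 1, \dots, 8$ flags exactly $n = 1, 2, 3, 4$, matching the known prime Fermat numbers 5, 17, 257 and 65537. Corrupting any saved checkpoint makes `verify_segments` fail on the segment that ends at that checkpoint.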
We deposited a final Pépin residue for possible use by future investigators in the event that a proper factor of $F_{24}$ should be discovered; herein we report the more compact, traditional Selfridge-Hurwitz residues. For the sake of completeness, we also generated a Pépin residue for $F_{23}$, and via the Suyama test determined that the known cofactor of this number is composite.
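The two follow-up computations can be sketched for a small Fermat number. The sketch assumes the customary Selfridge-Hurwitz moduli $2^{36}$, $2^{36}-1$ and $2^{35}-1$. For Suyama's test, write $F_n = P \cdot C$, where $P$ is the product of the known prime factors and $C$ is the cofactor. Because $F_n - 1 = P(C-1) + (P-1)$, Fermat's little theorem implies that if $C$ is prime, then $3^{F_n-1} \equiv 3^{P-1} \pmod{C}$. A mismatch therefore proves that $C$ is composite, while a match only makes $C$ a probable prime. The quantity $3^{F_n-1}$ is the square of the Pépin residue, so the expensive part of the work is already done. The function names are hypothetical, and the example uses $F_5 = 641 \cdot 6700417$ rather than $F_{23}$.

```python
def fermat(n: int) -> int:
    """Return the Fermat number F_n = 2**(2**n) + 1."""
    return (1 << (1 << n)) + 1


def pepin_residue(n: int) -> int:
    """3**((F_n - 1) / 2) mod F_n, computed directly with three-argument pow()."""
    F = fermat(n)
    return pow(3, (F - 1) // 2, F)


def selfridge_hurwitz(residue: int):
    """Reduce a residue modulo 2**36, 2**36 - 1 and 2**35 - 1 (assumed moduli)."""
    return (residue % (1 << 36), residue % ((1 << 36) - 1), residue % ((1 << 35) - 1))


def suyama_cofactor_is_composite(n: int, known_product: int, residue: int) -> bool:
    """Suyama's test for the cofactor C = F_n / known_product.

    `residue` is the Pépin residue of F_n. Returns True when C is proven
    composite; False means C is only a base-3 probable prime.
    """
    F = fermat(n)
    if known_product < 1 or F % known_product:
        raise ValueError("known_product must divide F_n")
    C = F // known_product
    a = residue * residue % F         # 3**(F_n - 1) mod F_n, reused from the Pépin run
    b = pow(3, known_product - 1, F)  # 3**(P - 1) mod F_n
    # C divides F_n, so comparing modulo C is valid; a prime C forces a == b (mod C).
    return (a - b) % C != 0
```

With $P = 641$, the cofactor 6700417 passes, as a prime must. With $P = 1$, the test reduces to a base-3 Fermat test on $F_5$ itself, which correctly reports it composite.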
