The human body at cellular resolution: the NIH Human Biomolecular Atlas Program
Transformative technologies are enabling the construction of three-dimensional maps of tissues with unprecedented spatial and molecular resolution. Over the next seven years, the NIH Common Fund Human Biomolecular Atlas Program (HuBMAP) intends to develop a widely accessible framework for comprehensively mapping the human body at single-cell resolution by supporting technology development, data acquisition, and detailed spatial mapping. HuBMAP will integrate its efforts with other funding agencies, programs, consortia, and the biomedical research community at large towards the shared vision of a comprehensive, accessible three-dimensional molecular and cellular atlas of the human body, in health and under various disease conditions.
SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision
Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them interactively. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running analyses of sequencing datasets. Tests of SparkSeq also show that cache usage and HDFS block size can be tuned for optimal performance on multiple worker nodes. Availability and implementation: Available under the open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/.
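To make the MapReduce idea behind nucleotide-precision analysis concrete, here is a minimal sketch in plain Python. It does not use Spark and is not SparkSeq's API. The read tuples, the function names (`map_read`, `reduce_counts`, `base_coverage`) and the `partitions` parameter are all assumptions made for illustration. Each read is mapped to one key per covered base, counts are reduced within each partition, and the partial results are merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical reads: (chromosome, 1-based start position, read length).
reads = [
    ("chr1", 100, 4),
    ("chr1", 102, 4),
    ("chr2", 5, 3),
]

def map_read(read):
    """Map step: emit one ((chrom, pos), 1) pair per base covered by the read."""
    chrom, start, length = read
    return [((chrom, pos), 1) for pos in range(start, start + length)]

def reduce_counts(acc, pairs):
    """Reduce step: add the emitted counts into the per-base accumulator."""
    for key, count in pairs:
        acc[key] += count
    return acc

def base_coverage(reads, partitions=2):
    """Count coverage per partition, then merge the partial counters."""
    # Round-robin split stands in for the data partitions a cluster would hold.
    chunks = [reads[i::partitions] for i in range(partitions)]
    partials = [
        reduce(reduce_counts, map(map_read, chunk), Counter())
        for chunk in chunks
    ]
    return sum(partials, Counter())

coverage = base_coverage(reads)
```

Because the per-partition counters are merged by addition, the result does not depend on how the reads are split, which is the property that lets this kind of computation scale across worker nodes.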
Automatic resource and service management for ubiquitous computing environments
The high degree of dynamism and heterogeneity of the resources involved in a pervasive computing environment makes service adaptation and interoperability a difficult task. We present UBIDEV, a service framework that addresses the heterogeneity problem by hiding the dynamism of the underlying environment at the application level. We describe the UBIDEV architecture, focusing on the description and management of services and resources. We also describe how this approach decreases the complexity of designing and developing service-oriented applications. A prototype implementation of a unified messaging system is presented as a validation of the architectural design.
GridCertLib: A Single Sign-on Solution for Grid Web Applications and Portals
This paper describes the design and implementation of GridCertLib, a Java library that leverages a Shibboleth-based authentication infrastructure and the SLCS online certificate signing service to provide short-lived X.509 certificates and Grid proxies. The main use case envisioned for GridCertLib is to provide seamless and secure access to Grid X.509 certificates and proxies in web applications and portals: when a user logs in to the portal using SAML-based Shibboleth authentication, GridCertLib uses the SAML assertion to obtain a Grid X.509 certificate from the SLCS service and generate a VOMS proxy from it. We give an overview of the architecture of GridCertLib and briefly describe its programming model. We outline its application to some deployment scenarios and report on practical experience integrating GridCertLib into portals for Bioinformatics and Computational Chemistry applications, based on the popular P-GRADE and Django software.
SwissPIT: A workflow-based platform for analyzing tandem-MS spectra using the Grid
The identification and characterization of peptides from MS/MS data represents a critical aspect of proteomics. It has been the subject of extensive research in bioinformatics, resulting in a fair number of identification software tools. Most often, only one program with a specific and unvarying set of parameters is selected for identifying proteins. Hence, a significant proportion of the experimental spectra do not match the peptide sequences in the screened database due to inappropriate parameters or scoring schemes. The Swiss protein identification toolbox (swissPIT) project provides the scientific community with an expandable multitool platform for automated in-depth analysis of MS data that can also handle data from high-throughput experiments. swissPIT addresses several problems: missing standards for input and output formats (A), creation of analysis workflows (B), unified result visualization (C), and simplicity of the user interface (D). Currently, swissPIT supports four different programs implementing two different search strategies to identify MS/MS spectra. Conceived to handle the calculation-intensive needs of each of the programs, swissPIT uses the distributed resources of a Swiss-wide computer Grid (http://www.swing-grid.ch).
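The argument for a multitool platform can be illustrated with a small Python sketch. It is not swissPIT code: the tool names, spectrum identifiers, peptides, scores, and the `merge_identifications` function are all hypothetical, and the sketch assumes that scores from different tools have already been normalized to a comparable scale between 0 and 1, which real search engines do not guarantee.

```python
# Hypothetical per-tool results: spectrum id -> (peptide, normalized score).
results = {
    "tool_a": {"s1": ("PEPTIDEK", 0.95), "s2": ("LAGER", 0.40)},
    "tool_b": {"s2": ("LAGERK", 0.90), "s3": ("SAMPLER", 0.85)},
}

def merge_identifications(results, min_score=0.8):
    """Keep, per spectrum, the best-scoring hit across all tools above a threshold."""
    best = {}
    for tool, hits in results.items():
        for spectrum, (peptide, score) in hits.items():
            if score < min_score:
                continue  # discard low-confidence matches
            if spectrum not in best or score > best[spectrum][2]:
                best[spectrum] = (tool, peptide, score)
    return best

merged = merge_identifications(results)
single = merge_identifications({"tool_a": results["tool_a"]})
```

In this toy data a single tool confidently identifies one spectrum, while the merged result covers three, which mirrors the abstract's point that spectra left unmatched by one program and parameter set may be identified by another.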