Koustav Pal

Comparison of computational methods for Hi-C data analysis

Mattia Forcato, Chiara Nicoletti, Koustav Pal et al.|Nature Methods|2017

Cited by 356|Open Access

lncRNome: a comprehensive knowledgebase of human long noncoding RNAs

Deeksha Bhartiya, Koustav Pal, Sourav Ghosh et al.|Database|2013

Cited by 148|Open Access

The advent of high-throughput genome-scale technologies has enabled us to unravel a large number of previously unknown transcriptionally active regions of the genome. Recent genome-wide studies have provided annotations of a large repertoire of various classes of noncoding transcripts. Long noncoding RNAs (lncRNAs) form a major proportion of these newly annotated noncoding transcripts, and are presently known to be involved in a number of functionally distinct biological processes. Over 18,000 transcripts are presently annotated as lncRNA, and encompass previously annotated classes of noncoding transcripts including large intergenic noncoding RNA, antisense RNA and processed pseudogenes. There is a significant gap in the resources providing a stable annotation, cross-referencing and biologically relevant information. lncRNome has been envisioned with the aim of filling this gap by integrating annotations on a wide variety of biologically significant information into a comprehensive knowledgebase. To the best of our knowledge, lncRNome is one of the largest and most comprehensive resources for lncRNAs. Database URL: http://genome.igib.res.in/lncRNome.

Hi-C analysis: from data generation to integration

Koustav Pal, Mattia Forcato, Francesco Ferrari|Biophysical Reviews|2018

Cited by 111|Open Access

In the epigenetics field, large-scale functional genomics datasets of ever-increasing size and complexity have been produced using experimental techniques based on high-throughput sequencing. In particular, the study of the 3D organization of chromatin has raised increasing interest, thanks to the development of advanced experimental techniques. In this context, Hi-C has been widely adopted as a high-throughput method to measure pairwise contacts between virtually any pair of genomic loci, thus yielding unprecedented challenges for analyzing and handling the resulting complex datasets. In this review, we focus on the increasing complexity of available Hi-C datasets, which parallels the adoption of novel protocol variants. We also review the complexity of the multiple data analysis steps required to preprocess Hi-C sequencing reads and extract biologically meaningful information. Finally, we discuss solutions for handling and visualizing such large genomics datasets.

Global chromatin conformation differences in the Drosophila dosage compensated chromosome X

Koustav Pal, Mattia Forcato, Daniel Jost et al.|Nature Communications|2019

Cited by 39|Open Access

In Drosophila melanogaster the single male chromosome X undergoes an average twofold transcriptional upregulation for balancing the transcriptional output between sexes. Previous literature hypothesised that a global change in chromosome structure may accompany this process. However, recent studies based on Hi-C failed to detect these differences. Here we show that global conformational differences are specifically present in the male chromosome X and detectable using Hi-C data on sex-sorted embryos, as well as male and female cell lines, by leveraging custom data analysis solutions. We find the male chromosome X has more mid-/long-range interactions. We also identify differences at structural domain boundaries containing BEAF-32 in conjunction with CP190 or Chromator. Weakening of these domain boundaries in male chromosome X co-localizes with the binding of the dosage compensation complex and its co-factor CLAMP, reported to enhance chromatin accessibility. Together, our data strongly indicate that chromosome X dosage compensation affects global chromosome structure.

Leveraging three-dimensional chromatin architecture for effective reconstruction of enhancer–target gene regulatory interactions

Elisa Salviato, Vera Djordjilović, Judith Mary Hariprakash et al.|Nucleic Acids Research|2021

Cited by 13|Open Access

A growing body of evidence in the literature suggests that germline sequence variants and somatic mutations in non-coding distal regulatory elements may be crucial for defining disease risk and prognostic stratification of patients, in genetic disorders as well as in cancer. Their functional interpretation is challenging because genome-wide enhancer–target gene (ETG) pairing is an open problem in genomics. The solutions proposed so far do not account for the hierarchy of structural domains which define chromatin three-dimensional (3D) architecture. Here we introduce a change of perspective based on the definition of multi-scale structural chromatin domains, integrated in a statistical framework to define ETG pairs. In this work (i) we develop a computational and statistical framework to reconstruct a comprehensive map of ETG pairs leveraging functional genomics data; (ii) we demonstrate that the incorporation of chromatin 3D architecture information improves ETG pairing accuracy and (iii) we use multiple experimental datasets to extensively benchmark our method against previous solutions for the genome-wide reconstruction of ETG pairs. This solution will facilitate the annotation and interpretation of sequence variants in distal non-coding regulatory elements. We expect this to be especially helpful in clinically oriented applications of whole genome sequencing in cancer and undiagnosed genetic diseases research.

Top publications by citations