Assessing the Impact of Data Preprocessing on Analyzing Next Generation Sequencing Data

Binsheng He; Rongrong Zhu; Huandong Yang; Qingqing Lu; Weiwei Wang; Lei Song; Xue Sun; Guandong Zhang; Shijun Li; Jialiang Yang; Geng Tian; Pingping Bing; Jidong Lang

doi:10.3389/fbioe.2020.00817

Binsheng He (Changsha Medical University), Rongrong Zhu (Tsinghua University), Huandong Yang (Yidu Central Hospital of Weifang), Qingqing Lu (Cipher Gene (China)), Weiwei Wang (Cipher Gene (China)), Lei Song (Cipher Gene (China)), Xue Sun (Cipher Gene (China)), Guandong Zhang (Cipher Gene (China)), Shijun Li (Chifeng Municipal Hospital), Jialiang Yang (Changsha Medical University), Geng Tian (Cipher Gene (China)), Pingping Bing (Changsha Medical University), Jidong Lang (Cipher Gene (China))

Frontiers in Bioengineering and Biotechnology

July 30, 2020

Abstract

Data quality control and preprocessing are often the first steps in processing next-generation sequencing (NGS) data from tumors. They help evaluate the quality of the sequencing data and yield high-quality data for downstream analysis. However, when we compared analysis results obtained from data preprocessed with Cutadapt, FastP, and Trimmomatic against results from the raw sequencing data, we found that the detected mutation frequencies fluctuated and differed, and that preprocessing directly led to erroneous human leukocyte antigen (HLA) typing results. We believe our study demonstrates the impact of data preprocessing steps on downstream analysis results. We hope it will promote the development or optimization of better data preprocessing methods, so that downstream analysis can be more accurate.
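
The comparison described in the abstract can be pictured as tabulating, for each mutation, how its detected frequency varies across preprocessing pipelines. The following is a minimal sketch under stated assumptions, not the authors' method: the function name `frequency_spread`, the variant labels, and all frequency values are hypothetical placeholders, and a variant missing from a pipeline's calls is assumed to count as frequency 0.0.

```python
def frequency_spread(calls):
    """Summarize per-variant allele frequency across pipelines.

    calls: {pipeline_name: {variant_label: allele_frequency}}
    Returns {variant_label: (min_freq, max_freq, max_freq - min_freq)}.
    """
    # Collect every variant reported by at least one pipeline.
    variants = set()
    for per_pipeline in calls.values():
        variants.update(per_pipeline)

    report = {}
    for variant in sorted(variants):
        # Assumption: a variant a pipeline did not report has frequency 0.0.
        freqs = [calls[name].get(variant, 0.0) for name in calls]
        report[variant] = (min(freqs), max(freqs), max(freqs) - min(freqs))
    return report


# Placeholder values for illustration only; these are NOT data from the paper.
calls = {
    "raw":         {"VARIANT_1": 0.25, "VARIANT_2": 0.5},
    "cutadapt":    {"VARIANT_1": 0.25, "VARIANT_2": 0.5},
    "fastp":       {"VARIANT_1": 0.5},  # VARIANT_2 not called here
    "trimmomatic": {"VARIANT_1": 0.25, "VARIANT_2": 0.5},
}

report = frequency_spread(calls)
for variant, (low, high, spread) in report.items():
    print(f"{variant}: min={low} max={high} spread={spread}")
```

A large spread for a variant flags a call whose frequency depends on the preprocessing choice, which is the kind of fluctuation the abstract reports.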
