ProSynAR: a reference aware read merger

NCJ Number: 304567
Journal: Bioinformatics, Volume 38, Issue 7, April 2022, Pages 2052-2053
Date Published: April 2022
Length: 2 pages
Annotation

Since read-merging algorithms that look solely at the reads can misalign and mis-merge the reads (especially near repetitive sequences), the C++ program ProSynAR has been written to take the reads’ position in the reference into account when performing (and deciding whether to perform) a merge.

Abstract

Paired-end sequencing produces two views of a DNA molecule. If those two reads actually overlap (such as with insert sizes shorter than twice the read length), there are many situations (e.g. low-template variant calling) where combining them into a single estimate is helpful (either through increased length or reduced error). However, care is required to ensure this merging is done correctly. When merging reads, there are two main questions: first, which bases of the reads correspond and, second, how should those bases be merged? There are presently several methods for merging paired-end reads; however, all of the methods and software in common use [FLASH (Magoč and Salzberg, 2011), PEAR (Zhang et al., 2014), PANDASeq (Masella et al., 2012), NGmerge (Gaspar, 2018), among others] answer these questions solely by looking at the two reads under consideration. While that approach is faster than what is proposed here, there exist situations where the correspondence apparent from the reads themselves is misleading and can result in improper merges, potentially resulting in incorrect allele calls (in particular, size changes) and downstream bias.
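The failure mode described above can be illustrated with a small sketch. This is not ProSynAR's algorithm: the toy reference, the reads, and the helper names (`candidate_overlaps`, `merge_with_overlap`, `overlap_from_positions`) are all hypothetical, the second read is assumed to be already reverse-complemented into the forward orientation, and the alignments are assumed to be gapless and error-free. Real mergers score overlaps while tolerating mismatches, rather than requiring exact matches as done here. The point is only that, inside a tandem repeat, several overlaps look equally valid from the reads alone, while the mapped positions single out one.

```python
# Toy reference: unique flank + six copies of "CAG" + unique flank (hypothetical data).
REF = "ACGTTG" + "CAG" * 6 + "TTACGGA"

# Two error-free reads; REV is already in forward orientation (an assumption).
FWD_START, REV_START = 0, 12
FWD = REF[FWD_START:18]   # ends inside the repeat
REV = REF[REV_START:]     # starts inside the repeat


def candidate_overlaps(fwd, rev, min_overlap=3):
    """Every overlap length at which fwd's suffix exactly equals rev's prefix."""
    longest = min(len(fwd), len(rev))
    return [n for n in range(min_overlap, longest + 1) if fwd[-n:] == rev[:n]]


def merge_with_overlap(fwd, rev, overlap):
    """Join the reads, keeping the overlapping bases only once."""
    return fwd + rev[overlap:]


def overlap_from_positions(fwd_start, fwd_len, rev_start):
    """Overlap implied by 0-based reference start positions (gapless alignments)."""
    return fwd_start + fwd_len - rev_start


# Reads-only view: the repeat makes several overlaps indistinguishable.
options = candidate_overlaps(FWD, REV)            # [3, 6, 9, 12]

# A naive "longest exact overlap" rule drops two repeat units.
naive = merge_with_overlap(FWD, REV, max(options))

# Reference-aware view: the mapped positions imply a single overlap.
aware = merge_with_overlap(
    FWD, REV, overlap_from_positions(FWD_START, len(FWD), REV_START)
)

print(options)
print(len(naive), len(aware), len(REF))
print(aware == REF, naive == REF)
```

In this toy case the naive merge is six bases shorter than the reference, which is the kind of size change in a repeat region that the abstract warns can lead to an incorrect allele call.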

*Nix users can retrieve the source from GitHub (https://github.com/Benjamin-Crysup/prosynar). A Windows binary is available at https://github.com/Benjamin-Crysup/prosynar/releases/download/1.0/prosynar.zip. (Publisher Abstract)

Date Published: April 1, 2022