

NCJRS Virtual Library


A Divide-and-Conquer Algorithm for Large-scale de Novo Transcriptome Assembly Through Combining Small Assemblies From Existing Algorithms

NCJ Number
253425
Journal
BMC Genomics Volume: 18 Dated: 2018 Pages: 43-50
Author(s)
Sing-Hoi Sze; Jonathan J. Parrott; Aarib M. Tarone
Date Published
2018
Length
8 pages
Annotation
This project developed a divide-and-conquer strategy that lets memory-intensive de novo assembly algorithms, which on their own can only construct small assemblies, be used on large RNA-Seq data sets by subdividing a large data set into small libraries.
Abstract

While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although memory-efficient algorithms that can assemble a large amount of RNA-Seq data are available, other algorithms are very memory-intensive and can only be used to construct small assemblies. The proposed strategy subdivides a large RNA-Seq data set into small libraries; each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high-quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or higher accuracy than memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or lower accuracy than memory-intensive algorithms that can only be used to construct small assemblies. This divide-and-conquer strategy enables memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies. (publisher abstract modified)
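The divide, assemble, and merge steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract names neither the assembler nor the transcript-quality criteria, so `toy_assembler` is a hypothetical stand-in for "an existing algorithm". The rule in `merge` is an assumed placeholder for the paper's merging algorithm: a score threshold followed by removal of transcripts contained in already-kept ones. The names `subdivide`, `library_size`, and `min_score` are likewise illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class Transcript:
    seq: str
    score: float  # hypothetical quality score; real criteria are not given in the abstract


# Any function mapping one small library of reads to a list of transcripts.
Assembler = Callable[[Sequence[str]], List[Transcript]]


def subdivide(reads: Sequence[str], library_size: int) -> List[List[str]]:
    """Split a large read set into consecutive libraries of at most library_size reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [list(reads[i:i + library_size]) for i in range(0, len(reads), library_size)]


def toy_assembler(library: Sequence[str]) -> List[Transcript]:
    """Hypothetical stand-in for an existing assembler: one transcript per
    distinct read, scored by how often that read occurs in the library."""
    return [Transcript(seq, float(n)) for seq, n in Counter(library).items()]


def merge(assemblies: Iterable[List[Transcript]], min_score: float) -> List[Transcript]:
    """Assumed merging rule: pool all transcripts, drop those scoring below
    min_score, then keep the best-scoring first (ties: longer, then alphabetical),
    skipping any transcript whose sequence is contained in one already kept."""
    pool = [t for asm in assemblies for t in asm if t.score >= min_score]
    pool.sort(key=lambda t: (-t.score, -len(t.seq), t.seq))
    kept: List[Transcript] = []
    for t in pool:
        if not any(t.seq in k.seq for k in kept):
            kept.append(t)
    return kept


def divide_and_conquer(reads: Sequence[str], assembler: Assembler,
                       library_size: int, min_score: float) -> List[Transcript]:
    libraries = subdivide(reads, library_size)
    # Each library is assembled independently, so the assembler only ever
    # sees at most library_size reads at a time.
    assemblies = [assembler(lib) for lib in libraries]
    return merge(assemblies, min_score)


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "GTAC", "TTGCA", "TTGCA", "TTGCA", "CCCC"]
    merged = divide_and_conquer(reads, toy_assembler, library_size=3, min_score=1.0)
    print([t.seq for t in merged])  # ['TTGCA', 'ACGTAC', 'CCCC']
```

In this sketch, each call to the assembler receives only one small library, so the memory it needs is bounded by the largest library rather than the whole data set. This is the property that lets a memory-intensive assembler be plugged in. Here "GTAC" is dropped during merging because it is contained in "ACGTAC".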