Optimizing chloroplast genome assembly and annotation with skim sequencing data

Created: 04-03-2021 Forks: 2 Watchers: 1 Stars: 1

Description

Chloroplast genes and genomes are among the most important genomic data for plant phylogeny and species identification. Skim sequencing produces low-coverage whole-genome sequencing data that include nuclear, chloroplast, and mitochondrial genome sequences. With the rapid development of high-throughput sequencing technologies, it has become cheap to obtain low-coverage whole-genome data (usually about 20-30 GB), which is enough to assemble a complete chloroplast genome. Many assembly pipelines for complete chloroplast genomes have been described to date. However, it is not clear how much data such an analysis needs or actually uses. Knowing this would help biologists design their experiments properly and cost-effectively. Biologists also expect a simple, fast, and user-friendly procedure for assembling and annotating a circular chloroplast genome from Illumina NGS data. In this project, we will survey the existing procedures for chloroplast genome assembly and annotation, and develop methods to identify and select the optimal set(s) of data and procedure(s) for assembling a given chloroplast genome as accurately and efficiently as possible, using computational, statistical, and heuristic methods.
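
The question of how much data is needed can be framed, as a first approximation, as a sequencing-depth calculation. The Python sketch below is illustrative only and is not the project's method. The function names, the 150 kb chloroplast genome size, the 5% share of reads that come from the chloroplast, the 150 bp read length, and the 100x target depth are all assumptions chosen for the example. In practice the chloroplast read share varies widely with species, tissue, and DNA extraction.

```python
import math


def chloroplast_depth(total_bases, cp_fraction, cp_genome_size):
    """Expected mean depth over the chloroplast genome.

    total_bases    -- total sequenced bases in the skim data set
    cp_fraction    -- assumed share of bases that come from the chloroplast (0-1)
    cp_genome_size -- assumed chloroplast genome length in bases
    """
    if not 0 < cp_fraction <= 1:
        raise ValueError("cp_fraction must be in (0, 1]")
    return total_bases * cp_fraction / cp_genome_size


def reads_needed(target_depth, cp_genome_size, read_length, cp_fraction):
    """Total reads (of any origin) needed to reach target_depth on the chloroplast."""
    if not 0 < cp_fraction <= 1:
        raise ValueError("cp_fraction must be in (0, 1]")
    cp_bases = target_depth * cp_genome_size   # bases that must land on the chloroplast
    total_bases = cp_bases / cp_fraction       # scale up by the non-chloroplast share
    return math.ceil(total_bases / read_length)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    n_reads = reads_needed(target_depth=100, cp_genome_size=150_000,
                           read_length=150, cp_fraction=0.05)
    depth = chloroplast_depth(total_bases=n_reads * 150, cp_fraction=0.05,
                              cp_genome_size=150_000)
    print(f"reads to subsample: {n_reads}")
    print(f"expected chloroplast depth: {depth:.1f}x")
```

An estimate like this only gives a starting point for subsampling. Whether a given depth actually yields a complete, circular assembly still has to be checked empirically, which is the question the project sets out to study.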