Reproduction of genome correction software
Abstract
The primary errors produced by genome sequencing machines today are insertions,
substitutions, and deletions. These have led to the rise of genome correction software, which uses
different algorithms to correct errors in the sequence. The purpose of this study is to test
around 12 of the most popular genome correction tools and see how the results that we
obtain compare to the results reported in their papers. We use Nextflow as the pipeline software and
Docker containers so that the environment remains constant and anyone can replicate our
results. Each test case starts with a Docker container in which we preinstall the correction software along with the indexing software. We then move on to the Nextflow
template that contains the datasets that we will be testing. The next stage is the initial
indexing, followed by running the correction software itself on the dataset. Lastly, we
perform another round of indexing and then measure the final results by running a script that tells us
how well the software performed. The evaluation programs are usually custom Python scripts that
produce output in the format used in the correction software’s paper. We have published a website
that features all the results that we have found. On the website, results are grouped by
software. For each tool, one can see the results we found next to the published
results and the discrepancy between the two.
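
The comparison of reproduced and published results can be illustrated with a minimal Python sketch. It is not the evaluation code used in this study: the names Comparison and compare, the metric name "gain", and the numbers in the usage section are hypothetical placeholders chosen only for illustration. The sketch assumes each tool's results are available as a mapping from metric name to numeric value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Comparison:
    """One metric for one tool: published value next to reproduced value."""
    metric: str
    reported: float    # value taken from the tool's paper
    reproduced: float  # value obtained from a reproduction run

    @property
    def absolute_diff(self) -> float:
        # Positive when the reproduction scores higher than the paper.
        return self.reproduced - self.reported

    @property
    def relative_diff(self) -> Optional[float]:
        # Undefined when the published value is zero.
        if self.reported == 0:
            return None
        return self.absolute_diff / abs(self.reported)


def compare(reported: Dict[str, float],
            reproduced: Dict[str, float]) -> List[Comparison]:
    """Pair up the metrics present in both result sets, sorted by name."""
    shared = sorted(reported.keys() & reproduced.keys())
    return [Comparison(m, reported[m], reproduced[m]) for m in shared]


if __name__ == "__main__":
    # Placeholder numbers, not results from this study.
    published = {"gain": 0.8}
    ours = {"gain": 0.6}
    for row in compare(published, ours):
        print(row.metric, row.reported, row.reproduced, row.absolute_diff)
```

Metrics that appear in only one of the two result sets are skipped here; a real report might instead flag them as missing.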