Seqs-Extractor: Automated sequences extraction to reduce tedious manual corrections of large datasets
- Published
- Accepted
- Subject Areas
- Bioinformatics, Computational Biology
- Keywords
- Sequences analysis, Next-generation sequencing, Databases, Bash
- Copyright
- © 2017 Pereira et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Seqs-Extractor: Automated sequences extraction to reduce tedious manual corrections of large datasets. PeerJ Preprints 5:e3364v1 https://doi.org/10.7287/peerj.preprints.3364v1
Abstract
Analyzing large numbers of sequences requires reducing ambiguities early in the analytical work, so that effort focuses only on confirmed sequences. Performing this step automatically may help minimize the errors associated with tedious manual correction and yield more reliable results. The Basic Local Alignment Search Tool (BLAST) appears to be the most widely used sequence analysis program. BLAST itself is free, but commercial parties have enhanced BLAST applications and charge a fee for their use. Some public-domain tools can search for microsatellites in next-generation sequencing (NGS) data, such as the microsatellite identification tool (MISA), which has features for discovering microsatellites in large datasets. Here, we developed a basic shell script (BASH script), to be run in a Linux environment, that extracts from a sequence dataset only the confirmed (BLASTed) sequences identified against both nucleotide (BLASTN) and protein (BLASTX) databases, and that extracts sequences containing microsatellites using the MISA tool, through a friendly interface and free of charge. Seqs-Extractor is a helpful tool that may enhance the analysis of large datasets in BLAST+ and MISA by shortening data-management time and reducing the potential errors caused by manual data manipulation. Seqs-Extractor is available at https://github.com/patrick-douglas/Seqs-Extractor/wiki .
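The core idea of keeping only BLASTed sequences can be illustrated with a minimal sketch. This is not the Seqs-Extractor code itself: the function name `extract_blasted` and the file names are hypothetical, and the sketch assumes the BLAST+ report was written in tabular format (`-outfmt 6`), where the first column holds the query ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from Seqs-Extractor.
# Assumption: the hits file is a BLAST+ tabular report (-outfmt 6), produced
# for example with (file and database names are placeholders):
#   blastn -query reads.fasta -db nt -evalue 1e-5 -outfmt 6 -out hits.tsv

# Print only the FASTA records whose ID appears in column 1 of the hits file.
extract_blasted() {
    local hits="$1" fasta="$2"
    awk '
        # First file (hits): remember every query ID that produced a hit.
        NR == FNR { keep[$1] = 1; next }
        # Second file (FASTA): on each header, decide whether to keep the record.
        /^>/ {
            id = substr($1, 2)        # first word of the header, without ">"
            printing = (id in keep)
        }
        # Print header and sequence lines while the current record is kept.
        printing
    ' "$hits" "$fasta"
}
```

A call such as `extract_blasted hits.tsv reads.fasta > confirmed.fasta` would then write the confirmed subset to a new file. Matching on the first word of the header mirrors how BLAST+ reports query IDs, but headers with unusual formatting may need extra handling.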
Author Comment
This is a preprint submission to PeerJ Preprints.
Supplemental Information
High-resolution multi-layered Figure 1 - Flowchart
Working steps of the six possible methods of Seqs-Extractor. The steps of each method can be followed according to the color of the flow lines.