[maker-devel] scaffolds missing from master_datastore_index.log and all.gff files
Valerie Soza
vsoza at uw.edu
Thu Mar 15 12:18:46 MDT 2018
Thanks, Carson, that worked. I am now getting all scaffolds in the assembly indicated as finished in the master_datastore_index.log and when I redo gff3_merge with this updated log, my all.gff file is complete too. Glad there is a quick fix for this in MAKER. Thank you, MAKER developers.
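
For anyone who finds this thread later, the steps above might be sketched roughly as follows. This is only an illustration, not an official MAKER recipe: the function name rebuild_index is made up for this example, it moves the old log aside rather than deleting it, and it assumes that no other MAKER jobs are still running, that it is called from the directory holding the MAKER control files, and that any extra arguments (for example a -base option, if one was used for the original run) are the same ones used before.

```shell
# Sketch only: rebuild the datastore index with a single MAKER process,
# then re-merge the GFF3 from the rebuilt index.
# rebuild_index is a hypothetical helper name, not part of MAKER.
rebuild_index() {
    log=$1      # path to the *_master_datastore_index.log file
    shift       # any remaining arguments are passed through to maker

    [ -f "$log" ] || { echo "no such log: $log" >&2; return 1; }

    # Keep a backup instead of deleting the old log outright.
    mv "$log" "$log.bak" || return 1

    # A single maker process regenerates the index.
    maker -dsindex "$@" || return 1

    # Stop if the index was not recreated where we expected it.
    [ -f "$log" ] || { echo "maker did not recreate $log" >&2; return 1; }

    # Re-merge the per-scaffold GFF3 files listed in the rebuilt index.
    gff3_merge -d "$log"
}
```

It would be called as something like "rebuild_index path/to/Rwill7_master_datastore_index.log", with the path adjusted to wherever the log actually lives. Keeping the .bak copy makes it possible to compare the old and new logs afterwards.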
-Valerie
> On Mar 15, 2018, at 8:26 AM, Carson Holt <carsonhh at gmail.com> wrote:
>
> If you run multiple MAKER jobs at the same time, you can hit a race condition where one MAKER run keeps another from making the correct entry in the datastore log.
>
> You can delete the log and then run ‘maker -dsindex’ to rebuild it with a single maker process (takes less than 5 minutes).
>
> —Carson
>
>
>
>> On Mar 14, 2018, at 6:21 PM, Valerie Soza <vsoza at uw.edu> wrote:
>>
>> Hi MAKER community
>>
>> I have done several rounds of training and annotations on an assembly that consists of 12027 scaffolds using the MAKER2 pipeline. I am running multiple instances of MAKER to speed up the process. I have noticed that the number of contigs (aka scaffolds) differs among the different rounds of annotations I have done in MAKER, ranging from 12024 to 12027 scaffolds. These counts were obtained with the SOBAcl tool to count the number of contigs from each all.gff file generated by the gff3_merge script included in MAKER.
>>
>> In my latest round of annotations within MAKER, I have only obtained 12026 scaffolds using the SOBAcl tool on the all.gff file, indicating that I am missing 1 scaffold, even though there was no indication of any scaffolds as FAILED, RETRY, or SKIPPED.
>>
>> To figure out what might be going on, I searched for STARTED and FINISHED scaffolds in the master_datastore_index.log and found different numbers of started and finished scaffolds, neither of which equaled the assembly total of 12027.
>>
>> $ grep STARTED Rwill7_master_datastore_index.log | sort | uniq | cut -f 1 | wc
>> 12024 12024 313247
>>
>> 3 started scaffolds missing from this file are LG01_ordered_scaffold_2, LG01_ordered_scaffold_3, and LG07_unordered_scaffold_86.
>>
>> $ grep FINISHED Rwill7_master_datastore_index.log | sort | uniq | cut -f 1 | wc
>> 12026 12026 313295
>>
>> 1 finished scaffold missing from this file is LG08_unordered_scaffold_90.
>>
>> I then searched for these scaffolds in the all.gff file and found that the 3 missing started scaffolds were present, but the one missing finished scaffold (LG08_unordered_scaffold_90) was not. This scaffold (LG08_unordered_scaffold_90) should be in the gff3 as it had some repeat masking done on it as indicated by the query.masked.fasta file for this scaffold in its theVoid directory.
>>
>> After looking at the gff3_merge and fasta_merge scripts, it seems that only finished scaffolds are used to generate gff3 and fasta files so this explains why I am missing one scaffold (LG08_unordered_scaffold_90) for a total of only 12026 scaffolds in the all.gff file.
>>
>> I am concerned that, because the numbers of started and finished scaffolds differ in the master_datastore_index.log, not all scaffolds are being output to the gff3 and fasta files generated by the MAKER scripts.
>>
>> Any insights as to why I am getting different numbers of scaffolds indicated as started versus finished, and as to why all but 1 scaffold finished?
>>
>> Thanks.
>>
>> -Valerie
>>
>> Valerie Soza, Ph.D.
>> c/o Hall Lab
>> Department of Biology
>> University of Washington
>> Johnson Hall 202A
>> Box 351800
>> Seattle, WA 98195-1800
>> 206-543-6740
>> http://staff.washington.edu/vsoza/
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
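
The check described in the quoted message (comparing the scaffolds in the assembly against those marked FINISHED in the log) could be done in one step with something like the sketch below. It is only an illustration under a few assumptions: the function name missing_finished is made up, the log is assumed to be tab-separated with the scaffold name in the first column and the status in the third, and the FASTA ID is taken to be the text after ">" up to the first whitespace.

```shell
# Sketch only: list assembly scaffolds that have no FINISHED entry
# in the datastore index log.
# missing_finished is a hypothetical helper name, not part of MAKER.
missing_finished() {
    fasta=$1    # assembly FASTA file
    log=$2      # *_master_datastore_index.log

    ids=$(mktemp) || return 1
    done_ids=$(mktemp) || { rm -f "$ids"; return 1; }

    # Scaffold IDs from the FASTA headers (text after '>' up to whitespace).
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$fasta" | LC_ALL=C sort -u > "$ids"

    # Scaffold IDs whose status column (assumed to be column 3) is FINISHED.
    awk -F '\t' '$3 == "FINISHED" { print $1 }' "$log" | LC_ALL=C sort -u > "$done_ids"

    # Print IDs present in the assembly but absent from the FINISHED list.
    LC_ALL=C comm -23 "$ids" "$done_ids"

    rm -f "$ids" "$done_ids"
}
```

Usage would look like "missing_finished assembly.fasta Rwill7_master_datastore_index.log", where assembly.fasta is a placeholder for the actual assembly file. Empty output would suggest that every scaffold has a FINISHED entry, so gff3_merge should pick all of them up.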