[maker-devel] which files are expected after fasta_merge?

Brandon Pickett pickettbd at gmail.com
Thu Aug 15 14:48:46 MDT 2019


Good afternoon!

I just finished my third round of maker. I trained snap, augustus, etc.
between the rounds. I used fasta_merge and gff3_merge to extract files
after each round of maker. gff3_merge performed as expected each time, but
fasta_merge surprised me. I will show you which files fasta_merge generated
after each round. Please note that, as many people do, I renamed my output
files from the default. Accordingly, I will list all the files with a
generalized prefix of "maker" and show the rest of the file name as it was
generated for me. Also note that I've changed .fasta to .fa for brevity.

After round #1:
transcripts.fa
proteins.fa

After round #2:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
evm.proteins.fa
evm.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa
proteins.fa

After round #3:
non_overlapping_ab_initio.proteins.fa
non_overlapping_ab_initio.transcripts.fa
augustus_masked.proteins.fa
augustus_masked.transcripts.fa
genemark.proteins.fa
genemark.transcripts.fa
snap_masked.proteins.fa
snap_masked.transcripts.fa
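
To make the comparison between the round #2 and round #3 listings explicit, here is a minimal Python sketch. The file names are copied from the lists above; the variable names are illustrative only and are not part of MAKER.

```python
# File names copied from the round #2 and round #3 listings above
# (generalized "maker" prefix omitted, .fasta shortened to .fa).
round2 = {
    "non_overlapping_ab_initio.proteins.fa",
    "non_overlapping_ab_initio.transcripts.fa",
    "transcripts.fa",
    "augustus_masked.proteins.fa",
    "augustus_masked.transcripts.fa",
    "evm.proteins.fa",
    "evm.transcripts.fa",
    "genemark.proteins.fa",
    "genemark.transcripts.fa",
    "snap_masked.proteins.fa",
    "snap_masked.transcripts.fa",
    "proteins.fa",
}
round3 = {
    "non_overlapping_ab_initio.proteins.fa",
    "non_overlapping_ab_initio.transcripts.fa",
    "augustus_masked.proteins.fa",
    "augustus_masked.transcripts.fa",
    "genemark.proteins.fa",
    "genemark.transcripts.fa",
    "snap_masked.proteins.fa",
    "snap_masked.transcripts.fa",
}

# Files produced after round #2 but absent after round #3.
missing_in_round3 = sorted(round2 - round3)
# Files produced after round #3 that round #2 did not have.
new_in_round3 = sorted(round3 - round2)

print("missing after round #3:", missing_in_round3)
print("new after round #3:", new_in_round3)
```

The set difference isolates the four files in question: the two maker-level files (transcripts.fa, proteins.fa) and the two evm files.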

I am unsurprised that I didn't get all these files after round #1 because I
used round #1 to generate gene models from transcript evidence. I didn't
expect so many files after round #2 (having only seen the output from round
#1 up to that point), but it makes sense that I would get output from
augustus, evidence modeler (evm), genemark, and snap since I provided them
as input to this round (#2) of maker. Between rounds #2 and #3, I
re-trained snap and augustus. Genemark was trained between rounds #1 and #2
without gene models from maker and thus did not require re-training. The
only difference in my maker control files between rounds #2 and #3 was the
paths to the snap and augustus files. In both #2 and #3, the control files
had run_evm=1. I can provide my control files for each round, if needed. *My
question is why transcripts.fa, proteins.fa, evm.proteins.fa, and
evm.transcripts.fa were not generated after round #3?* I recognize that
this is probably not an error but rather a gap in my understanding of when
each file is and is not generated.
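
One way to confirm that two control files differ only in the intended settings is to compare them key by key. The following is a minimal sketch, assuming the usual `key=value #comment` layout of MAKER control files. The excerpts, paths, and the function names `parse_ctl` and `diff_ctl` are hypothetical; the key names `snaphmm` and `augustus_species` are assumptions about which options hold the snap and augustus settings, and only `run_evm=1` is taken from the message above.

```python
def parse_ctl(text):
    """Parse 'key=value #comment' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop any trailing comment
        if not line or "=" not in line:
            continue  # skip blank lines and comment-only lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_ctl(old_text, new_text):
    """Return {key: (old_value, new_value)} for every setting that differs."""
    old, new = parse_ctl(old_text), parse_ctl(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


# Hypothetical excerpts; the real control files are not shown in this message.
round2_ctl = """\
snaphmm=round1.hmm #assumed key for the snap HMM path
augustus_species=species_r1 #assumed key for the augustus model
run_evm=1 #from the message: set in both rounds
"""
round3_ctl = """\
snaphmm=round2.hmm #assumed key for the snap HMM path
augustus_species=species_r2 #assumed key for the augustus model
run_evm=1 #from the message: set in both rounds
"""

changes = diff_ctl(round2_ctl, round3_ctl)
for key, (old_value, new_value) in changes.items():
    print(f"{key}: {old_value} -> {new_value}")
```

If any setting other than the snap and augustus entries shows up in the output, that setting would be a natural first place to look when the set of generated files changes between rounds.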

Thank you,
Brandon Pickett