[maker-devel] Unwarranted error: Skipping the contig because it is too short

lahcen campbell lahcencampbell at gmail.com
Wed Nov 15 09:56:20 MST 2017


Just an add-on to this topic: I have found a suite of GFF utilities here,
which I hope can help me quickly parse the MAKER GFF.

https://github.com/mamarjan/gff3-pltools

I'll report back on how it goes!

Best
L

On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <
michael.s.campbell1 at gmail.com> wrote:

> Hi Lahcen,
>
> Thanks, the name has served me well for a number of years now :)
>
> So I started a run with your 11 scaffolds. I gave it the protein file that
> you sent and used all of repbase for masking. All of the scaffolds finished
> without error. I was hoping it would be something simple that just needed
> another set of eyes to see, but it looks like that's not the case for this
> one.
>
> To further rule out a data issue, I would try running it with the dpp test
> data that is bundled with MAKER to see if you can get the same error. This
> data set will run in about a minute. If you are on a cluster, I would try
> running it with and without submitting it to the nodes, and with and
> without MPI.
>
> One thing that I have done in the past is to make a new directory and run
> maker there (this doesn't make a lot of sense but when the error doesn't
> make sense either it seems reasonable).
>
> As far as rerunning MAKER, there are a couple of approaches. If you want it
> to stop complaining about trying too many times on failed contigs, you can
> increase the number of tries in the opts file. The line looks like this:
>
> tries=2 #number of times to try a contig if there is a failure for some reason
>
> If you want to run it elsewhere, but you don't want to have to redo all of
> the repeat masking and blasting you can use the gff3 output from an earlier
> run. If you used gff3_merge after the first run finished you got a big gff3
> file with all of the gene models and evidence. If you break up that file by
> the source column you can selectively pass the evidence back to MAKER. If
> you put all of the repeatmasker and repeatrunner entries into one file and
> pass it in on this line:
>
> rm_gff= #pre-identified repeat elements from an external GFF3 file
>
> you can turn off model_org= and repeat_protein=. This will speed up the
> next run a lot. Then you can pass in the protein2genome gff3 data on this
> line:
>
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
>
> Don't pass the blast gff3 data in. If you pass gff3 data in to MAKER, it
> assumes that it is polished and will not make any effort to fix alignments.
> The protein2genome data is polished. est2genome is the equivalent for EST
> input.
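
A minimal Python sketch of the split by source column described above. It is
only an illustration: the function name, the output file names and the GROUPS
mapping are hypothetical, and the source values are assumed to appear in
column 2 exactly as named in the message (repeatmasker, repeatrunner,
protein2genome, est2genome). Check the actual values in your merged file
before relying on it.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical grouping: output file name -> source values (GFF3 column 2).
# The blast sources are deliberately left out, as advised above.
GROUPS = {
    "repeats.gff": {"repeatmasker", "repeatrunner"},
    "protein2genome.gff": {"protein2genome"},
    "est2genome.gff": {"est2genome"},
}


def split_gff3_by_source(gff3_path, out_dir, groups):
    """Route feature lines into one file per group, keyed on column 2.

    Comment lines, malformed lines and sources not listed in any group are
    skipped. Returns {output file name: number of feature lines written}.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    source_to_file = {src: name for name, srcs in groups.items() for src in srcs}
    counts = defaultdict(int)
    handles = {}
    try:
        with open(gff3_path) as gff:
            for line in gff:
                if line.startswith("##FASTA"):
                    break  # embedded sequence section: no features follow
                if line.startswith("#") or not line.strip():
                    continue
                cols = line.rstrip("\n").split("\t")
                if len(cols) != 9:
                    continue
                name = source_to_file.get(cols[1])
                if name is None:
                    continue
                if name not in handles:
                    handles[name] = open(out_dir / name, "w")
                    handles[name].write("##gff-version 3\n")
                handles[name].write("\t".join(cols) + "\n")
                counts[name] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)


# Example call (paths are placeholders):
# split_gff3_by_source("merged.all.gff", "split_gff", GROUPS)
```

The resulting repeats file would then go on the rm_gff= line and the
protein2genome file on the protein_gff= line.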
>
> clean_up is useful if you are running on a file system that limits the
> number of files that you can write. It removes all of the intermediate
> files used in the annotation, which takes away the advantage of rerunning
> in the same directory. clean_try is the one that deletes everything first
> and starts again, as if the first run never happened.
>
> I cc'd the list on this response just in case anyone else has any ideas or
> is facing the same error.
>
> Let me know if any of this helps,
> Mike
>
> On Nov 14, 2017, at 10:48 AM, lahcen campbell <lahcencampbell at gmail.com>
> wrote:
>
> Hi Michael
>
> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell
> to be exact haha...anyway... thanks for the reply and offer to help.
>
> I have attached the file in question below. It's so strange; I had to just
> leave it alone because it was making me quite frustrated. Those bugs for
> which there are no common-sense solutions are the worst, because you very
> easily hit a wall.
>
> Might it have anything at all to do with the protein homology file I
> passed in? Note, though, that the same protein files have been used in
> another MAKER run without issue, so I had more or less ruled that out
> already. I'm just spitballing at this stage.
>
>
> Might I be so cheeky as to ask you one more MAKER-related question,
> Michael? Feel free to ignore it. I hate to push, but I'm desperate to
> figure it out with little time to do so...
>
> I have an issue with a different MAKER analysis, affecting any new run I
> attempt on this datastore, which has one successful round with 25,000-odd
> genes and double the transcripts. I attempted to run the second round with
> a SNAP-trained HMM (the first time passing in a SNAP HMM, following the
> first round of EST/protein evidence). In this attempt, because we obtained
> so many genes, I thought I would be more stringent by changing the AED to
> 0.7 from 1.0. I see now that I didn't approach this in the right way... too
> late now, sadly.
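
For reference, the effect of a stricter AED cutoff can be estimated from an
existing GFF3 file without rerunning anything. The sketch below is an
illustration under stated assumptions: the function name is hypothetical,
mRNA features are assumed to carry an _AED=<value> attribute in column 9 (as
in MAKER's GFF3 output), and a model is assumed to pass when its AED is less
than or equal to the cutoff.

```python
import re

# Matches the _AED attribute (but not _eAED) in a GFF3 attribute string.
_AED_RE = re.compile(r"(?:^|;)_AED=(\d+(?:\.\d+)?)")


def count_mrna_by_aed(gff3_path, cutoff):
    """Return (kept, dropped) counts of mRNA features for an AED cutoff.

    mRNA lines without an _AED attribute are ignored.
    """
    kept = dropped = 0
    with open(gff3_path) as gff:
        for line in gff:
            if line.startswith("##FASTA"):
                break  # embedded sequence section: no features follow
            if line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9 or cols[2] != "mRNA":
                continue
            match = _AED_RE.search(cols[8])
            if match is None:
                continue
            if float(match.group(1)) <= cutoff:
                kept += 1
            else:
                dropped += 1
    return kept, dropped


# Example call (path is a placeholder):
# kept, dropped = count_mrna_by_aed("round1.all.gff", 0.7)
```

Comparing the counts at 1.0 and 0.7 shows how many round 1 transcripts the
stricter setting would exclude.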
>
> MAKER finishes fine, but now it views all previous scaffolds as FAILED.
> Nothing seems to change this, and now the datastore is for all intents and
> purposes locked in a failed state. It keeps mentioning changes to the opts
> file (which there were) and that the previous runs didn't finish, so it
> must delete them. The results obtained from round 1 are still there,
> though; I'm pretty sure of that, as all blast files etc. are still there
> and populated.
>
> Can you tell me the main differences between clean_up and clean_try, and
> which of them will completely and irreversibly wipe the first run? That is
> something I don't want to repeat; I just want to progress to the next
> round. I'm hesitant to run them, but I've backed up the datastore just in
> case. My next attempt will be to pass the exact same maker_opts file from
> the round 1 run, with the only change made to clean_try/clean_up. Is this
> approach misguided?
>
> Your help is very much appreciated Michael so thank you,
> Best
> L
>
> Combined_Protein_homology.fa.zip
> <https://drive.google.com/file/d/19ooxfIUGygyW9GBY8uBwCYwjAywRWiL_/view?usp=drive_web>
> SubsampledGenomeFile_n10_11MB.fasta
> <https://drive.google.com/file/d/1Mwj6Jpf1U9xzQVgxVFqeYyQokFIrDFo5/view?usp=drive_web>
>
>
> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell <
> michael.s.campbell1 at gmail.com> wrote:
>
>> Hi Lahcen,
>>
>> Nothing comes right to mind for what could be causing this error. If you
>> want to compress your FASTA and send it to me I can try and recreate the
>> error and try and debug it.
>>
>> Thanks,
>> Mike
>>
>> On Nov 14, 2017, at 7:15 AM, lahcen campbell <lahcencampbell at gmail.com>
>> wrote:
>>
>> Hi MAKER community,
>>
>> I was hoping someone could help me. I have a very unusual error with two
>> different versions of MAKER I have tested so far. This error shouldn't be
>> happening, but it occurs time and again no matter what I try. I have tried
>> using 2.31.6_mpich3_icc and 2.31_mpich3.
>>
>> Note that version 2.31.6_mpich3_icc is one I have used countless times
>> to produce final MAKER annotations without issue, so it's not that this
>> version has had issues to date.
>>
>> Basically, this is a brand new MAKER analysis; I am only trying to train
>> SNAP in this first round. I am following the MakerTutorial as documented
>> this time around and I can't get past the initial SNAP training stage.
>>
>> I have a single genome file with 10 long scaffolds making up just under
>> 11MB of sequence data (subsampled from my original full-length assembly)
>> on which to train SNAP. The FASTA file is not corrupted, and has been
>> generated in various ways in order to test for formatting issues etc.
>>
>> I have only edited the maker_opts file and changed:
>>
>> *genome=*
>> *protein=*
>> *protein2genome=1*
>>
>> But see attached my maker CTL files.
>>
>> The error consistently returned to me:
>>
>> *Skipping the contig because it is too short!!*
>> *SeqID: contig_WHATEVER*
>> *Length: 0*
>>
>> *The sequences are nowhere near too short. This was verified
>> independently outside MAKER to be sure.*
>>
>> *The headers are as follows:*
>>
>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no
>> class=contig suggestRepeat=no suggestCircular=no
>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig
>> suggestRepeat=no suggestCircular=no
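
One way to repeat the independent length check mentioned above is a short
Python sketch like the following. It is an illustration, not MAKER's own
logic: the function names are hypothetical, the sequence ID is assumed to be
the header text up to the first whitespace, and the len=N token is assumed to
be the assembler's declared length, as in the headers listed above.

```python
def declared_length(header):
    """Return the integer from a 'len=N' token in a FASTA header, or None."""
    for token in header.split():
        if token.startswith("len=") and token[4:].isdigit():
            return int(token[4:])
    return None


def fasta_lengths(path):
    """Return [(seq_id, measured_length, declared_length_or_None), ...]."""
    records = []
    current = None  # [seq_id, measured, declared] for the record being read
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if line.startswith(">"):
                if current is not None:
                    records.append(tuple(current))
                header = line[1:]
                fields = header.split()
                seq_id = fields[0] if fields else ""
                current = [seq_id, 0, declared_length(header)]
            elif current is not None:
                # Count sequence characters only, ignoring internal whitespace.
                current[1] += len("".join(line.split()))
    if current is not None:
        records.append(tuple(current))
    return records


def suspicious(records):
    """Records that are empty or disagree with their declared length."""
    return [r for r in records
            if r[1] == 0 or (r[2] is not None and r[1] != r[2])]


# Example call (path is a placeholder):
# for rec in suspicious(fasta_lengths("SubsampledGenomeFile_n10_11MB.fasta")):
#     print(rec)
```

An empty result means every record has sequence and matches its declared
length, which points away from the FASTA file itself.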
>>
>> I have just about given up. I have no idea why it's happening; it makes
>> zero sense.
>>
>> Any help or information as to why this might be happening would be
>> amazing.
>>
>> Thank you in advance.
>> Lahcen
>>
>> --
>> ==========================================
>> > Dr. Lahcen Campbell                                                  <
>> > Contact: lahcencampbell at gmail.com                        <
>> > https://www.ebi.ac.uk/about/people/lahcen-campbell <
>> ==========================================
>> <maker_bopts.ctl><maker_exe.ctl><maker_opts.ctl>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>>
>
>
>
>
>

