[maker-devel] Unwarranted error: Skipping the contig because it is too short

Michael Campbell michael.s.campbell1 at gmail.com
Wed Nov 15 14:50:45 MST 2017


Hi Lahcen,

I put some answers below.
> On Nov 15, 2017, at 11:32 AM, lahcen campbell <lahcencampbell at gmail.com> wrote:
> 
> Hi Michael and Carson
> 
> Thank you both for your helpful input, I really appreciate it. 
> 
> See below for my comments...
> 
> Best
> Lahcen
> 
> 
> On Tue, Nov 14, 2017 at 5:04 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:
> Hi Lahcen,
> 
> Thanks, the name has served me well for a number of years now :)
> 
> It's a good name, I wouldn't change it haha :) 
>  
> 
> So I started a run with your 11 scaffolds. I gave it the protein file that you sent and used all of repbase for masking. All of the scaffolds finished without error. I was hoping it would be something simple that just needed another set of eyes to see, looks like it's not the case for this one.
> 
> To further rule out a data issue I would try running it with the dpp test data that is bundled with MAKER to see if you can get the same error. This data set will run in about a minute. If you are on a cluster I would try running it with and without submitting it to the nodes, and with and without MPI.
> 
> One thing that I have done in the past is to make a new directory and run maker there (this doesn't make a lot of sense but when the error doesn't make sense either it seems reasonable). 
> 
> First off, I can report good news regarding the 0-length contigs I was getting back. Carson, your thoughts on BioPerl conflict issues seemed to be the main issue. Our cluster software environment had gone through some changes of late, so working off the basis of that I was able to load the right bash config, which resulted in no more 0-length contig errors. Huzzah !! 
> Great
> 
> As far as rerunning MAKER there are a couple of approaches. If you want it to stop complaining about trying too many times on failed contigs you can increase the number of tries in the opts file. The line looks like this:
> 
> tries=2 #number of times to try a contig if there is a failure for some reason
> 
> If you want to run it elsewhere, but you don't want to have to redo all of the repeat masking and blasting you can use the gff3 output from an earlier run. If you used gff3_merge after the first run finished you got a big gff3 file with all of the gene models and evidence. If you break up that file by the source column you can selectively pass the evidence back to MAKER. If you put all of the repeatmasker and repeatrunner entries into one file and pass it in on this line:
> 
> Can I ask, because I can't seem to find any concrete info on best practices for parsing MAKER gffs to partition the various source column fields as you described Michael. 
> 
> Is there a commonly used way to partition MAKER gffs based on source column? Or will I need to code it up, I ask because I feel this must have been needed before many times by other users.  
>  I've got a script that will do it if you want it. Since you don't need all of the entries, grep is probably as easy as anything, substituting the source name you want (e.g. repeatmasker) for "source": grep -P '\tsource\t'
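
For anyone who would rather not chain greps, here is a minimal Python sketch of the same idea: keep only the feature lines whose source column (column 2) is in a chosen set. It is not the script mentioned above. The function name `partition_gff3_by_source` and the example records are made up for illustration, and stopping at a `##FASTA` directive assumes the merged file may carry an embedded sequence section.

```python
import io
from collections import defaultdict


def partition_gff3_by_source(lines, wanted_sources):
    """Group GFF3 feature lines by their source column (column 2)."""
    wanted = set(wanted_sources)
    groups = defaultdict(list)
    for line in lines:
        # Stop at the embedded FASTA section, if the file has one.
        if line.startswith("##FASTA"):
            break
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(fields) == 9 and fields[1] in wanted:
            groups[fields[1]].append(line.rstrip("\n"))
    return dict(groups)


# Tiny made-up example; a real run would pass an open file handle instead.
example = io.StringIO(
    "##gff-version 3\n"
    "tig1\trepeatmasker\tmatch\t10\t50\t.\t+\t.\tID=r1\n"
    "tig1\tblastx\tprotein_match\t60\t90\t.\t+\t.\tID=b1\n"
    "tig1\trepeatrunner\tprotein_match\t100\t150\t.\t-\t.\tID=r2\n"
    "tig1\tprotein2genome\tprotein_match\t200\t300\t.\t+\t.\tID=p1\n"
    "##FASTA\n"
    ">tig1\n"
    "ACGT\n"
)
repeats = partition_gff3_by_source(example, ["repeatmasker", "repeatrunner"])
```

The lines collected for repeatmasker and repeatrunner could then be written to one file for rm_gff=, and the protein2genome lines to another for protein_gff=.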
> 
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> 
> I will remove links to fasta files for both 'rmlib=' and 'repeat_protein='
> Yep
> 
> you can turn off model_org= and repeat_protein=. This will speed up the next run a lot. Then you can pass in the protein2genome gff3 data on this line:
> 
> protein_gff=  #aligned protein homology evidence from an external GFF3 file
> 
> Don't pass the blast gff3 data in. If you pass in gff3 data to MAKER it assumes that it is polished and will not make any effort to fix alignments. The protein2genome data is polished. est2genome is the equivalent for EST input.
> 
> You say don't pass the blast as gff. As I pass in all other info via GFF3 and remove any evidence as fasta inputs... BLAST won't be called again, right? That would ensure the shortest possible rerun of MAKER to roll back to an uncorrupted state.  
> Right. blast will not be called as long as you remove or comment out the paths to the fastas in the est= and protein= lines.

> I noticed that the only unique source field types in my MAKER GFF are as follows: 
> augustus_masked 
> blastx
> maker
> protein2genome
> repeatmasker
> repeatrunner
> That looks right for the run you described
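
A list of source values like the one above can be produced by tallying column 2 of the GFF3. The following is a small sketch under the same assumptions as any plain GFF3 reader (tab-separated columns, `#` lines ignored, optional `##FASTA` section at the end); `count_sources` and the sample lines are hypothetical.

```python
import io
from collections import Counter


def count_sources(lines):
    """Count how many feature lines each GFF3 source (column 2) contributes."""
    counts = Counter()
    for line in lines:
        # Anything after ##FASTA is sequence, not annotation.
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


# Made-up sample; replace with open("your_merged.gff") for a real file.
sample = io.StringIO(
    "##gff-version 3\n"
    "tig1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1\n"
    "tig1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1\n"
    "tig1\tblastx\tprotein_match\t5\t400\t.\t+\t.\tID=b1\n"
)
source_counts = count_sources(sample)
for source, n in sorted(source_counts.items()):
    print(source, n)
```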
> I read on the dev group that passing est evidence as GFF won't actually call Exonerate, est2genome option just tells MAKER to try and turn polished EST alignments directly into genes.... so If I pass this info again as GFF it will simply use the same info as it did originally and not have to recompute anything ? 
> 
> Based on the above fields contained in my MAKER gff, which of the following options should I select to re-annotate based on this older run ? I suspect all the options below in green should be set to 1, and the others in red set to 0. 
> 
> #-----Re-annotation Using MAKER Derived GFF3
> .....
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=1 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=1 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=1 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no 
> 
You don't need model_pass or pred_pass if you plan on running gene finders
> I don't think I will pass back anything under augustus_masked as I didn't set that up correctly initially, instead passing in a precomputed Augustus gff, which I'm told isn't the best way to run MAKER. So if I can get back to a state of not failing all contigs, I will run Augustus inside MAKER itself on the 2nd pass. Note though, I am aware of the order of things normally, but for this instance I will continue with what I have done with success previously. 
Yeah, when I have issues with failing contigs I'll pull stuff out until it starts running without error, then I add things back until something breaks.

> Lastly, as this next run will be updating based on previous generated MAKER gff data.... what states should est2genome and protein2genome be ? 1 or 0 ? 
0. Those options are just for generating gene models directly from evidence when you don't have any gene finders trained. When you say updating, do you mean reusing evidence from previous runs and generating new gene annotations, or are you taking existing gene models and adding new evidence to see if they can be improved?
> 
> Apologies for the lengthy email reply Michael. Much appreciated again, thank you !! 
No worries, hope it helps.
> 
> L
> 
>  
> clean_up is useful if you are running on a file system that limits the number of files that you can write. It removes all of the intermediate files used in the annotation, which takes away the advantage of rerunning in the same directory. clean_try deletes everything first and starts again; it is the one that pretends that the first run never happened. 
> 
> I cc'd the list on this response just in case anyone else has any ideas or is facing the same error.
> 
> Let me know if any of this helps,
> Mike
> 
>> On Nov 14, 2017, at 10:48 AM, lahcen campbell <lahcencampbell at gmail.com> wrote:
>> 
>> Hi Michael 
>> 
>> Nice name btw I have a Michael in my name too :) Lahcen Michael Campbell to be exact haha...anyway... thanks for the reply and offer to help. 
>> 
>> I have attached the file in question below. It's so strange, I had to just leave it alone because it was making me quite frustrated. Those bugs for which there are no common-sense solutions are the worst, because you very easily reach a wall. 
>> 
>> Might it have anything at all to do with the Protein homology file I passed in ? Though, note.... the same protein files here have been used in another maker run without issue so I kind of ruled that out already.....but just spitballing at this stage.  
>> 
>> 
>> Might I be so cheeky as to ask you one more MAKER-related question, Michael... ? Feel free to ignore it, I hate to push but I'm desperate to figure it out with little time to do so... 
>> 
>> I have an issue with a different MAKER analysis, affecting any new run I attempt on this datastore, which has one successful round with 25000-odd genes and double that number of transcripts. I attempted to run the second round with a SNAP-trained HMM (the first time passing in a SNAP HMM, following the first round of EST/protein evidence). In this attempt, because we obtained so many genes, I thought I would be more stringent by changing the AED from 1.0 to 0.7. I see now that I didn't approach this in the right way... too late now, sadly.
>> 
>> MAKER finishes fine, but now it views all previous scaffolds as FAILED. Nothing seems to change this, and now the datastore is for all intents and purposes locked in a failed state. It keeps mentioning changes to the opts file, which there were, and that the previous runs didn't finish so it must delete them. The results obtained from round 1 are still there though, I'm pretty sure of that; all blast files etc. are still there and populated. 
>> 
>> Can you tell me the main differences between clean_up and clean_try, and which will completely and irreversibly wipe the first run? That is something I don't want to repeat; I just want to progress to the next round. I'm hesitant to run them, but I've backed up the datastore in case. My next attempt will be to pass the exact same maker_opts file from the round 1 run, with the only change made to clean_try/clean_up.... Is this approach misguided ? 
>> 
>> Your help is very much appreciated Michael so thank you, 
>> Best
>> L
>> 
>>  Combined_Protein_homology.fa.zip <https://drive.google.com/file/d/19ooxfIUGygyW9GBY8uBwCYwjAywRWiL_/view?usp=drive_web>
>>  SubsampledGenomeFile_n10_11MB.fasta <https://drive.google.com/file/d/1Mwj6Jpf1U9xzQVgxVFqeYyQokFIrDFo5/view?usp=drive_web>
>> 
>> 
>> 
>> On Tue, Nov 14, 2017 at 3:08 PM, Michael Campbell <michael.s.campbell1 at gmail.com> wrote:
>> Hi Lahcen,
>> 
>> Nothing comes right to mind for what could be causing this error. If you want to compress your FASTA and send it to me I can try and recreate the error and try and debug it.
>> 
>> Thanks,
>> Mike
>>> On Nov 14, 2017, at 7:15 AM, lahcen campbell <lahcencampbell at gmail.com> wrote:
>>> 
>>> Hi MAKER community,
>>> 
>>> I was hoping someone could help me. I have a very unusual error with two different versions of maker I have tested so far. This error shouldn't be happening but it occurs time and again no matter what I try. I have tried using 2.31.6_mpich3_icc and 2.31_mpich3
>>> 
>>> Note that version 2.31.6_mpich3_icc is one I have used countless times and produced final MAKER annotations without issue. So it's not that this version has had issues to date. 
>>> 
>>> Basically, this is a brand new MAKER analysis, I am only trying to train SNAP in this first round. I am following the MakerTutorial as documented this time around and I can't get past the initial SNAP train stage. 
>>> 
>>> I have a single genome file with 10 long scaffolds making up just under 11MB of sequence data (subsampled from my original full-length assembly) on which to train SNAP. The fasta file is not corrupted, and has been generated in various ways in order to test formatting issues etc. 
>>> 
>>> I have only edited the maker_opts file and changed:
>>> 
>>> genome=
>>> protein=
>>> protein2genome=1
>>> 
>>> But see attached my maker CTL files. 
>>> 
>>> The error consistently returned to me:
>>> 
>>> Skipping the contig because it is too short!!
>>> SeqID: contig_WHATEVER
>>> Length: 0
>>> 
>>> The sequences are nowhere near too short. This was verified independently outside MAKER to be sure. 
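
One way to run that kind of independent check is to measure each record's length straight from the FASTA, with no BioPerl or MAKER code involved. This is only a sketch of one approach, not the check that was actually used; `fasta_lengths` is a hypothetical helper and the two records are shortened stand-ins for the real contigs.

```python
import io


def fasta_lengths(handle):
    """Return {sequence_id: residue_count} for every record in a FASTA stream."""
    lengths = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            seq_id = line[1:].split()[0]
            lengths[seq_id] = 0
        elif seq_id is not None:
            lengths[seq_id] += len(line)
    return lengths


# Shortened stand-in records; use open("genome.fasta") for a real file.
example = io.StringIO(
    ">tig00000458 len=12 reads=4143\n"
    "ACGTAC\n"
    "GTACGT\n"
    ">tig00000159 len=4 reads=5100\n"
    "ACGT\n"
)
lengths = fasta_lengths(example)
zero_length = [name for name, n in lengths.items() if n == 0]
print(lengths, zero_length)
```

If every record reports a non-zero length here while MAKER still prints "Length: 0", the file itself is unlikely to be the culprit and the software environment is the next thing to check.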
>>> 
>>> The headers are as follows:
>>> 
>>> >tig00000458 len=2889428 reads=4143 covStat=1793.77 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00000159 len=3515005 reads=5100 covStat=2143.94 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00006117 len=1009519 reads=1168 covStat=804.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00000419 len=2633986 reads=3938 covStat=1519.93 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00027677 len=108573 reads=86 covStat=86.05 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00021790 len=202251 reads=158 covStat=184.12 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00316948 len=280333 reads=237 covStat=253.23 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00019606 len=149709 reads=82 covStat=150.02 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00023852 len=189461 reads=115 covStat=192.28 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> >tig00316994 len=19742 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>>> 
>>> I have just about given up. I have no idea why it's happening; it makes zero sense. 
>>> 
>>> Any help or information as to why this might be happening would be amazing. 
>>> 
>>> Thank you in advance. 
>>> Lahcen
>>> 
>>> -- 
>>> ==========================================
>>> > Dr. Lahcen Campbell                                                  <
>>> > Contact: lahcencampbell at gmail.com                        <
>>> > https://www.ebi.ac.uk/about/people/lahcen-campbell <
>>> ==========================================
>>> <maker_bopts.ctl><maker_exe.ctl><maker_opts.ctl>
