[maker-devel] Use pass-through system to add missing genes

Fri Apr 27 02:43:14 MDT 2012

Hi Carson,
Thanks for your help!

> The way you proceed depends on why the genes are not there to begin  
> with.  Are they not there because of a lack of evidence?

It is a mixture of cases, and I can only say that from the few examples
I have looked at. There are cases where all 3 ab initio predictors
provide models and there are blastx hits, or both blastx and
protein2genome hits, but no EST evidence, so no model is retained. I
guess my default parameters could be responsible for these cases at least.

> If that's the case just adding the new fasta file should do the trick.

Which FASTA file do you refer to? The proteins file I use as evidence
contains all the proteins I can actually use.

> Or are they not there because an assembly error makes it impossible
> to get a logical model for the region (i.e. reading frame breaks)?

This is not the case in general.

> Are there ab initio models already called in those regions that  
> could just be promoted to the annotation tier?  You can test that  
> one by blasting against the nonoverlaping_abinits.fasta files.

I have not done this, will do!
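
The BLAST check suggested above yields a list of ab initio model names, which a short script can collect. Below is a minimal Python sketch, assuming the search was run with tabular output (`-outfmt 6`), with the missing proteins as queries and the non-overlapping ab initio sequences as the database, so that column 2 holds the prediction name. The function name and the e-value cutoff are illustrative choices, not part of MAKER.

```python
import sys


def best_hit_names(blast_lines, max_evalue=1e-10):
    """Return the best-scoring subject name per query from BLAST tabular output.

    Assumes the standard 12-column layout: column 1 = query, column 2 =
    subject, column 11 = e-value, column 12 = bit score.
    """
    best = {}  # query -> (bit score, subject name)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a standard tabular hit line
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # hit too weak under the chosen (arbitrary) cutoff
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    # One name per prediction, even if several queries hit the same model.
    return sorted({subject for _, subject in best.values()})


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python best_hits.py hits.tsv > keep_list.txt
    with open(sys.argv[1]) as handle:
        print("\n".join(best_hit_names(handle)))
```

The output is one prediction name per line. The cutoff of 1e-10 is only a placeholder and should be chosen to suit the data.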

>
> For any of the cases described, you can provide the existing  
> annotation set as the input in GFF3 format, and previous models will  
> be maintained preferentially.

You mean in a new MAKER run? Is this possible with the old MAKER as
well, i.e. not only with MAKER2?

> If you know which ab initio predictions you want to add (i.e. the ab
> initio promoting scenario I described), you can provide those
> predictions using the pred_gff option and then set keep_preds=1,
> and they will be maintained even without evidence.  Attached is a
> script that would make selecting those easier.  It takes the MAKER
> generated GFF3 and a list of predictions to keep (one name per
> line).  These might be the results of a BLAST analysis, for example.
> It will then return the GFF3 entries for just those models selected.
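
The attached script is not reproduced in the archive. As a rough illustration of the selection step it describes, here is a minimal Python sketch, which is not Carson's script. It assumes that the names in the list match the `ID` or `Name` attribute of a feature in the GFF3, and that child features refer to their parent through the `Parent` attribute. The function names are hypothetical.

```python
import sys


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (values are not URL-decoded)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def select_gff3(gff_lines, keep_names):
    """Return the feature lines for the named models and all their descendants."""
    keep_names = set(keep_names)
    features = []   # (line, feature ID or None, list of parent IDs)
    selected = set()  # IDs of features to keep
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows, no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get("ID")
        parents = attrs["Parent"].split(",") if "Parent" in attrs else []
        features.append((line, feature_id, parents))
        if feature_id and (feature_id in keep_names
                           or attrs.get("Name") in keep_names):
            selected.add(feature_id)
    # Propagate the selection down the Parent links until nothing changes,
    # so the result does not depend on the order of lines in the file.
    changed = True
    while changed:
        changed = False
        for _, feature_id, parents in features:
            if (feature_id and feature_id not in selected
                    and any(p in selected for p in parents)):
                selected.add(feature_id)
                changed = True
    return [line for line, feature_id, parents in features
            if feature_id in selected or any(p in selected for p in parents)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python select_models.py maker.gff keep_list.txt
    with open(sys.argv[2]) as handle:
        names = [name.strip() for name in handle if name.strip()]
    with open(sys.argv[1]) as handle:
        print("##gff-version 3")
        print("\n".join(select_gff3(handle, names)))
```

The resulting file would then be the one passed through pred_gff with keep_preds=1, as described above.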

The thing is, for the few cases I have looked at, I cannot really  
decide which model is the best, and the 3 models from the ab initio  
predictors do not agree on the exact intron-exon junctions or the  
start and stop codons.
>
> If the situation is more complex, just provide more detail, and I am  
> sure we can help you come up with a plan.
>
What I was thinking of doing was to provide a GFF file of alignments
(e.g. from exonerate) of the proteins of the closely related species
that I am missing, and somehow keep the previous annotations and get
the extra ones from this GFF file. But I am not sure exactly how MAKER
should be run to do this. If I want to keep the previous annotations, I
need the GFF file of the last MAKER run as input, but then how do I
distinguish it from the exonerate GFF file? And which prediction mode
should be on, and with which parameters? You mention keep_preds=1 for
the existing annotations, but how do I also promote evidence from
alignments in the same way in the same run?
Looks feasible though. Thanks again,
Anastasia

> Thanks,
> Carson
>
> From: Anastasia Gioti <anastasia.gioti at scilifelab.se>
> Date: Wed, 25 Apr 2012 11:09:36 +0200
> To: <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Use pass-through system to add missing genes
>
> Hi,
> I have a set of predicted proteins from the genome of a fungus
> annotated by MAKER using EST data from a closely related species
> and 3 ab initio predictors (SNAP iteratively trained 3 times,
> GeneMark trained directly on the assembly, and Augustus with a model
> from a less closely related species), along with a set of fungal
> proteins. I am missing ~1000 proteins when I compare to the species
> I used EST data from, and there is good evidence from alignments
> that these genes exist. The question is how to proceed from BLAST
> hits to actual gene models here. The idea would be to add these
> genes to the existing dataset, rather than reannotate the genome. I
> believe that reannotating it without any further evidence, such as
> RNA-seq from the species itself, would not change much, and I'd
> rather stick with the actual predictions that I trust and have used
> in subsequent analyses. For the 1000 genes, I can accept annotating
> them in a less stringent and reliable way than MAKER; I just want to
> add them so that the difference in gene count gets corrected.
> I was reading the MAKER2 paper and I was wondering if I can use the
> legacy annotations scheme to do it, by providing a GFF3 file of the
> alignments between the two species in the regions where genes were
> missed. But as I said, I would not like to reannotate the whole
> genome, and running MAKER2 might cause slight changes that I'd like
> to avoid. Is this possible? First, is it possible to provide a GFF3
> file of specific locations and not the entire genome alignment? (I
> guess so.) Second, how can I tag the existing annotations as 'not
> to be changed' or, alternatively, tag the new models only? How should
> I run MAKER2, with which predictors on and which off?
> Thanks,
> Anastasia
>
> Anastasia Gioti
> Post-doctoral Researcher
>
> anastasia.gioti at scilifelab.se
> anastasia.gioti at ebc.uu.se
>
> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>
>
>
> <gff3_select>
