[maker-devel] Use pass-through system to add missing genes

Anastasia Gioti anastasia.gioti at scilifelab.se
Thu May 17 05:27:04 MDT 2012


Hi Barry,
Thanks for your detailed instructions. You understood correctly: I had
already included the proteins of the closely related species in my
protein evidence dataset, but still did not get the genes. I have now
run BLASTP with the 949 missing proteins from this species against my
nonoverlaping_abinits.fasta proteins and found 618 good hits, which I
guess I can promote to models using routine no. 2 of your last email
and Carson's script gff3_select.
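
A minimal sketch of the hit-filtering step, assuming the BLASTP results
were saved in the standard 12-column tabular format (-outfmt 6) with the
missing proteins as queries and the ab initio predictions as subjects.
The function names and the identity and e-value thresholds are
hypothetical placeholders, not values recommended in this thread:

```python
def best_abinit_hits(lines, min_identity=40.0, max_evalue=1e-10):
    """Return {query_id: subject_id} for the best hit per query that
    passes the thresholds (thresholds are illustrative assumptions)."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        #                   qstart qend sstart send evalue bitscore
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if identity < min_identity or evalue > max_evalue:
            continue
        # Keep only the highest-scoring ab initio prediction per protein.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def unmatched_queries(all_query_ids, hits):
    """Return the sorted query IDs that have no passing hit."""
    return sorted(q for q in all_query_ids if q not in hits)
```

The subject names returned by best_abinit_hits() form the list of ab
initio predictions to keep, and unmatched_queries() gives the proteins
that still have no usable prediction.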
I have also looked at the remaining 331 proteins, for which there was
no model in nonoverlaping_abinits.fasta. I will try to describe two
examples I looked at in Apollo:

1) The ab initio predictors called a single ~7.5 kb gene covering what
are three genes in the closely related species. Blastx+protein2genome
similarities were reported for two of these genes, but not for the
third (the one in the middle). MAKER finally called two genes,
respecting the blastx+protein2genome evidence, but the third was lost.
I have previously reported here that MAKER tends to fuse adjacent genes
into single multi-exonic genes, and others have reported it too; I
remember you proposed changing a parameter to alter this. I will keep
that in mind for the final strategy I am trying to decide on (for the
moment I have not rerun MAKER).
In this case no usable ab initio model exists for the gene (the
existing models span several genes), and the similarity to the protein
of the closely related species was not judged sufficient, although the
TBLASTN alignment for this region looks fine to me.

2) Only the 3' end of the gene was called by MAKER, despite
blastx+protein2genome evidence from the closely related species for the
entire region. The ab initio models existed as two separate genes: one
for the 3' end region (finally retained by MAKER, in a consensus
decision I guess) and one for the 5' region. For the latter, not all
predictors called an ORF, and in the end nothing was called there.
This case is a misannotation rather than a missing gene, but it misses
a very important part of the gene.
I hope my descriptions are clear; otherwise I can send you the GFF
file for these two examples so you can look yourself.

I am not sure what to do about these 331 cases (nor do I know how to
look at them all, other than viewing random examples in Apollo). I
feel that a second MAKER run would probably be the solution, this time
providing as pred_gff the result of a BLAST against the 331. But the
existing annotations would then have to be updated somehow, as the new
predictions conflict with them (see example 2). I am a bit confused.
To recap: what would you suggest for the 331 still-missing proteins, in
terms of assessing their profiles in a fairly automatic way and
including them in my annotations without going deep into manual gene
curation?
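
One possible way to sort such cases automatically, offered only as a
sketch: compare each protein's alignment span with the gene models on
the same sequence and strand. It assumes the MAKER GFF3 stores
protein2genome alignments as protein_match features (with the protein
ID in the Name attribute) and gene models as gene features; the
function names, category labels and the 0.9 cutoff are hypothetical:

```python
def parse_attributes(field):
    """Parse a GFF3 attribute column into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = value
    return attrs


def triage(gff_lines, protein_ids, covered_cutoff=0.9):
    """Label each protein by how much of its longest protein2genome
    alignment span is overlapped by gene models on the same strand."""
    genes = {}    # (seqid, strand) -> list of (start, end)
    matches = {}  # protein name -> (seqid, strand, start, end)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        seqid, source, ftype, strand = cols[0], cols[1], cols[2], cols[6]
        start, end = int(cols[3]), int(cols[4])
        if ftype == "gene":
            genes.setdefault((seqid, strand), []).append((start, end))
        elif ftype == "protein_match" and source == "protein2genome":
            name = parse_attributes(cols[8]).get("Name")
            old = matches.get(name)
            if name in protein_ids and (old is None or end - start > old[3] - old[2]):
                matches[name] = (seqid, strand, start, end)
    result = {}
    for name in protein_ids:
        if name not in matches:
            result[name] = "no_alignment"
            continue
        seqid, strand, start, end = matches[name]
        covered, last_end = 0, start - 1
        # Walk genes in coordinate order, counting each base only once.
        for g_start, g_end in sorted(genes.get((seqid, strand), [])):
            lo, hi = max(g_start, start, last_end + 1), min(g_end, end)
            if lo <= hi:
                covered += hi - lo + 1
                last_end = hi
        fraction = covered / (end - start + 1)
        if fraction == 0:
            result[name] = "no_model"
        elif fraction < covered_cutoff:
            result[name] = "partial_model"
        else:
            result[name] = "covered"
    return result
```

Under these assumptions, example 1 above would come out as "no_model"
for the middle gene and example 2 as "partial_model", which would at
least separate the cases that need a new model from those that need an
existing model extended.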
Many thanks,
Anastasia
>
> Let me just restate what you've said so that I can be sure that I am  
> correct about what you've already done.  You have run Maker with  
> SNAP, Genemark and Augustus using EST from a closely related species  
> (passed to altest) and protein evidence from other fungi.  You are  
> missing about 1,000 genes compared to the species that provided the  
> EST alignments.  You say there is good evidence that these genes
> exist from the alignments and I assume by this that you mean the
> EST/protein alignments that Maker produced.
>
> 1) Is the closely related fungus annotated, and if so have you
> included its proteins in the evidence set that you provided to
> Maker?  If you haven't provided these proteins as evidence to Maker
> then you should do this.  You can re-run maker passing your original  
> models back through like this:
>
> #-----Re-annotation Using MAKER Derived GFF3
> genome_gff=original_maker_annotations.gff3
> est_pass=1
> altest_pass=1
> protein_pass=1
> rm_pass=1
> model_pass=1
> pred_pass=1
> other_pass=1
>
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=proteins_from_closely_related.fasta
> ## OR it sounds like you've already aligned these with exonerate?
> protein_gff=proteins_from_closely_related_already_aligned.gff
>
> 2) If you've already included those closely related species proteins  
> but still didn't get the 1,000 genes, then take your  
> nonoverlaping_abinits.fasta and blast them directly against your  
> closely related proteins.  Presumably they don't hit too well  
> because if they did they should have been promoted to predictions by  
> Maker the first time, but here you can decide yourself what  
> thresholds to allow to keep the abinit predictions that hit the  
> closely related species proteins.  If you filter your blast hits the
> way you want and keep the names of the abinit predictions that pass
> your filter, then use the script Carson attached and it will generate
> an abinit prediction GFF file with only the predictions you
> selected.  You can then pass those predictions back to Maker and  
> force it to keep them and Maker will turn them from predictions  
> (match/match_part) into gene models.
>
> #-----Re-annotation Using MAKER Derived GFF3
> genome_gff=original_maker_annotations.gff3
> est_pass=1
> altest_pass=1
> protein_pass=1
> rm_pass=1
> model_pass=1
> pred_pass=0
> other_pass=1
>
> #-----Gene Prediction
> snaphmm=
> gmhmm=
> augustus_species=
> fgenesh_par_file=
> pred_gff=ab_init_predictions_rescued_by_blast.gff
>
> keep_preds=1
>
> Barry
>
>>> Thanks,
>>> Carson
>>>
>>> From: Anastasia Gioti <anastasia.gioti at scilifelab.se>
>>> Date: Wed, 25 Apr 2012 11:09:36 +0200
>>> To: <maker-devel at yandell-lab.org>
>>> Subject: [maker-devel] Use pass-through system to add missing genes
>>>
>>> Hi,
>>> I have a set of predicted proteins from the genome of a fungus
>>> annotated by MAKER using EST data from a closely related species
>>> and 3 ab initio predictors (SNAP iteratively trained 3 times,
>>> GeneMark trained directly on the assembly and Augustus with a
>>> model from a less closely related species), along with a set of
>>> fungal proteins. I am missing ~1000 proteins when I compare to
>>> the species I used EST data from, and there is good evidence from
>>> alignments that these genes exist. The question is how to proceed
>>> from BLAST hits to actual gene models here. The idea would be to
>>> add these genes to the existing dataset, rather than reannotate
>>> the genome. I believe that reannotating it without any further
>>> evidence, such as RNA-seq from the species itself, would not change
>>> much, and I'd rather stick with the existing predictions, which I
>>> trust and have used in subsequent analyses. I can accept annotating
>>> the 1000 genes in a less stringent and reliable way than MAKER; I
>>> just want to add them so that the difference in gene count gets
>>> corrected.
>>> I was reading the MAKER2 paper and I was wondering if I can use
>>> the legacy annotations scheme to do it, by providing a GFF3 of the
>>> alignments between the two species in the regions where genes were
>>> missed. But as I said, I would not like to reannotate the whole
>>> genome, and running MAKER2 might cause slight changes that I'd
>>> like to avoid. Is this possible? First, is it possible to provide
>>> a GFF3 file of specific locations and not the entire genome
>>> alignment? (I guess so.) Second, how can I tag the existing
>>> annotations as 'not to be changed' or, alternatively, tag only the
>>> new models? How should I run MAKER2, with which predictors on and
>>> which off?
>>> Thanks,
>>> Anastasia
>>>
>>> Anastasia Gioti
>>> Post-doctoral Researcher
>>>
>>> anastasia.gioti at scilifelab.se
>>> anastasia.gioti at ebc.uu.se
>>>
>>> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>>>
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>> <gff3_select>
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>

Anastasia (Natassa) Gioti
Post-Doc Researcher
Evolutionary Biology Department Uppsala University -Science for Life  
lab, Karolinska Institute Stockholm
anastasia.gioti at ebc.uu.se
anastasia.gioti at scilifelab.se

http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/