[maker-devel] Use pass-through system to add missing genes
Anastasia Gioti
anastasia.gioti at scilifelab.se
Fri Apr 27 02:43:14 MDT 2012
Hi Carson,
Thanks for your help!
> The way you proceed depends on why the genes are not there to begin
> with. Are they not there because of a lack of evidence?
It is a mixture of cases, and I can only judge from the examples I have
looked at. There are cases where all 3 ab initio predictors provide
models, and there are blastx hits, or both blastx and protein2genome
hits, but no EST evidence, so no model is retained. I guess my default
parameters could be responsible for these cases at least.
> If that's the case just adding the new fasta file should do the trick.
Which FASTA file do you refer to? The proteins file I use as evidence
already contains all the proteins I can actually use.
> Or are they not there because an assembly error makes it impossible
> to get a logical model for the region (i.e. reading frame breaks)?
This is not the case in general.
> Are there ab initio models already called in those regions that
> could just be promoted to the annotation tier? You can test that
> one by blasting against the nonoverlaping_abinits.fasta files.
I have not done this, will do!
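
The BLAST check suggested above could be summarised with a small script. This is only a sketch under stated assumptions: it assumes the missing proteins were searched against the non-overlapping ab initio protein FASTA with tabular output (the standard 12-column format), so that column 2 holds the prediction name. The thresholds, the function name hit_names and the example prediction names are hypothetical and would need tuning for real data.

```python
def hit_names(blast_tab, max_evalue=1e-10, min_identity=50.0):
    """Return subject IDs (column 2) of hits passing both thresholds.

    Assumes the standard 12-column tabular BLAST layout:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    names = []
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a tabular hit line
        sseqid, pident, evalue = cols[1], float(cols[2]), float(cols[10])
        if evalue <= max_evalue and pident >= min_identity and sseqid not in names:
            names.append(sseqid)  # keep first-seen order, no duplicates
    return names


# Hypothetical hits: query proteins against ab initio predictions.
EXAMPLE_BLAST = "\n".join(
    "\t".join(fields)
    for fields in [
        ["protA", "snap_pred_1", "82.5", "200", "35", "0", "1", "200", "1", "200", "1e-50", "300"],
        ["protA", "augustus_pred_7", "40.0", "90", "54", "0", "5", "94", "10", "99", "1e-3", "50"],
        ["protB", "snap_pred_1", "75.0", "150", "37", "0", "1", "150", "1", "150", "1e-30", "210"],
        ["protC", "genemark_pred_3", "90.0", "120", "12", "0", "1", "120", "1", "120", "1e-40", "250"],
    ]
)

if __name__ == "__main__":
    # One name per line, the list format described for the selection script.
    print("\n".join(hit_names(EXAMPLE_BLAST)))
```

The output is one prediction name per line, which is the kind of list a selection step on the GFF3 file would take as input.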
>
> For any of the cases described, you can provide the existing
> annotation set as the input in GFF3 format, and previous models will
> be maintained preferentially.
You mean in a new MAKER run? This is possible with the old MAKER as
well, not only MAKER2, right?
> If you know which ab initio predictions you want to add (i.e. the ab
> initio promoting scenario I described), you can provide those
> predictions using the pred_gff option and then set keep_preds=1
> and they will be maintained even without evidence. Attached is a
> script that would make selecting those easier. It takes the MAKER
> generated GFF3 and a list of predictions to keep (one name per
> line). These might be the results of a BLAST analysis for example.
> It will then return the GFF3 entries for just those models selected.
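
For readers without the attachment, the selection step described above can be approximated as follows. This is not the attached script, only a minimal sketch of the same idea. It assumes each prediction is a parent feature carrying a Name (or ID) attribute, with child features that point to it through Parent and appear after it in the file. The function names and the example feature layout are hypothetical.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def select_gff3(gff3_text, keep_names):
    """Return GFF3 lines for the named features and all of their descendants.

    Single pass: assumes a parent line always precedes its children.
    """
    keep_names = set(keep_names)
    kept_ids = set()
    selected = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        parents = attrs.get("Parent", "").split(",")
        wanted = (
            attrs.get("Name") in keep_names
            or attrs.get("ID") in keep_names
            or any(parent in kept_ids for parent in parents)
        )
        if wanted:
            selected.append(line)
            if "ID" in attrs:
                kept_ids.add(attrs["ID"])  # lets child features match later
    return selected


# Hypothetical two-prediction example in a match/match_part layout.
EXAMPLE_GFF3 = "\n".join(
    "\t".join(fields)
    for fields in [
        ["contig1", "snap", "match", "100", "900", ".", "+", ".", "ID=m1;Name=snap_pred_1"],
        ["contig1", "snap", "match_part", "100", "400", ".", "+", ".", "ID=m1:e1;Parent=m1"],
        ["contig1", "snap", "match_part", "600", "900", ".", "+", ".", "ID=m1:e2;Parent=m1"],
        ["contig1", "augustus", "match", "2000", "2500", ".", "-", ".", "ID=m2;Name=augustus_pred_7"],
        ["contig1", "augustus", "match_part", "2000", "2500", ".", "-", ".", "ID=m2:e1;Parent=m2"],
    ]
)

if __name__ == "__main__":
    print("\n".join(select_gff3(EXAMPLE_GFF3, ["snap_pred_1"])))
```

The selected lines could then be written to a file and passed through the pred_gff option, with keep_preds=1 set as described above.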
The thing is, for the few cases I have looked at, I cannot really
decide which model is the best, and the 3 models from the ab initio
predictors do not agree on the exact intron-exon junctions or the
start and stop codons.
>
> If the situation is more complex, just provide more detail, and I am
> sure we can help you come up with a plan.
>
What I was thinking of doing was to provide a GFF file of alignments (e.g.
from exonerate) to the proteins of the closely related species that I
am missing, and somehow keep the previous annotations and get the
extra ones from this GFF file. But I am not sure exactly how MAKER should
be run to do this. If I want to keep the previous annotations, I need
the GFF file of the last MAKER run as input, but then how do I
distinguish it from the exonerate GFF file? And which prediction mode
should be on, and with which parameters? You mention keep_preds=1 for
the existing annotations, but how do I also promote evidence from
alignments in the same way in the same run?
Looks feasible though. Thanks again,
Anastasia
> Thanks,
> Carson
>
> From: Anastasia Gioti <anastasia.gioti at scilifelab.se>
> Date: Wed, 25 Apr 2012 11:09:36 +0200
> To: <maker-devel at yandell-lab.org>
> Subject: [maker-devel] Use pass-through system to add missing genes
>
> Hi,
> I have a set of predicted proteins from the genome of a fungus
> annotated by MAKER using EST data from a closely related species
> and 3 ab initio predictors (SNAP iteratively trained 3 times,
> GeneMark trained directly on the assembly and Augustus with a model
> from a less closely related species), along with a set of fungal
> proteins. I am missing ~1000 proteins when I compare to the species
> I used EST data from, and there is good evidence from alignments
> that these genes exist. The question is how to proceed from BLAST
> hits to actual gene models here. The idea would be to add these
> genes to the existing dataset, rather than reannotate the genome. I
> believe that reannotating it without any further evidence such as
> RNA-seq from the species itself would not change much, and I'd rather
> stick with actual predictions that I trust and have used in
> subsequent analyses. For the 1000 genes, I can accept annotating them
> in a less stringent and reliable way than MAKER; I just want to add them
> so that the difference in gene count gets corrected.
> I was reading the MAKER2 paper and I was wondering if I can use the
> legacy annotations scheme to do it, by providing GFF3 of the
> alignments between the two species in the regions where genes were
> missed, but as I said, I would not like to reannotate the whole
> genome, and running MAKER2 might cause slight changes that I'd like
> to avoid. Is this possible? First, is it possible to provide a GFF3
> file of specific locations and not the entire genome alignment? (I
> guess so.) Second, how can I tag the existing annotations as 'not
> to be changed' or alternatively, tag the new models only? How should
> I run MAKER2, with which predictors on and which off?
> Thanks,
> Anastasia
>
> Anastasia Gioti
> Post-doctoral Researcher
>
> anastasia.gioti at scilifelab.se
> anastasia.gioti at ebc.uu.se
>
> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>
> <gff3_select>
Anastasia Gioti
Post-doctoral Researcher
anastasia.gioti at scilifelab.se
anastasia.gioti at ebc.uu.se
http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/