[maker-devel] Use pass-through system to add missing genes

Collett, James R james.collett at pnnl.gov
Fri Apr 27 10:51:05 MDT 2012


Hi Carson,

Could you please send me (or make available for download) the Perl script that you mentioned in the previous post in this thread?

>> Attached is a
>> script that would make selecting those easier.  It takes the MAKER
>> generated GFF3 and a list of predictions to keep (one name per line).
>> These might be the results of a BLAST analysis for example.  It will
>> then return the GFF3 entries for just those models selected.
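The attached script is not reproduced in this archive. As a rough illustration of the selection it describes, here is a minimal Python sketch. It is not the original Perl script. It assumes that each kept name matches the `ID` or `Name` attribute of a top-level feature (for example an ab initio `match`), and that child features (`match_part`, mRNA, exon, CDS) point to their parent through `Parent=`. The script and file names are hypothetical.

```python
"""Hypothetical sketch: keep only the named GFF3 features and their descendants."""
import sys


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(f.split("=", 1) for f in column9.strip().split(";") if "=" in f)


def select_gff3(gff_lines, keep_names):
    """Return GFF3 lines for features whose ID or Name is in keep_names, plus descendants."""
    features = []  # (original line, ID or None, list of Parent IDs)
    selected = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # skip any appended sequence section
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        fid = attrs.get("ID")
        parents = attrs["Parent"].split(",") if "Parent" in attrs else []
        features.append((line.rstrip("\n") + "\n", fid, parents))
        if fid is not None and (fid in keep_names or attrs.get("Name") in keep_names):
            selected.add(fid)

    # Propagate the selection down the hierarchy until no new IDs are added
    changed = True
    while changed:
        changed = False
        for _, fid, parents in features:
            if fid is not None and fid not in selected and any(p in selected for p in parents):
                selected.add(fid)
                changed = True

    out = ["##gff-version 3\n"]
    for line, fid, parents in features:
        if fid in selected or any(p in selected for p in parents):
            out.append(line)
    return out


def main(gff_path, names_path):
    with open(names_path) as fh:
        keep = {name.strip() for name in fh if name.strip()}
    with open(gff_path) as fh:
        sys.stdout.writelines(select_gff3(fh, keep))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be along the lines of `python gff3_select_sketch.py maker_output.gff3 keep_names.txt > selected.gff3`. Features without an `ID` are kept only through their `Parent`, which is how `match_part` rows are usually linked.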

Thanks,

Jim
__________________________________________________
James R. Collett, Ph.D.
Senior Scientist
Chemical and Biological Process Development Group
Energy and Environment Directorate
Pacific Northwest National Laboratory

> -----Original Message-----
> From: maker-devel-bounces at yandell-lab.org [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of maker-devel-request at yandell-lab.org
> Sent: Friday, April 27, 2012 6:48 AM
> To: maker-devel at yandell-lab.org
> Subject: maker-devel Digest, Vol 47, Issue 14
> 
> Date: Fri, 27 Apr 2012 09:27:24 -0400
> From: Carson Holt <carsonhh at gmail.com>
> To: Barry Moore <barry.moore at genetics.utah.edu>,	Anastasia Gioti
> 	<anastasia.gioti at scilifelab.se>
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Use pass-through system to add missing
> 	genes
> Message-ID: <CBC01559.BF45%carsonhh at gmail.com>
> Content-Type: text/plain; charset="us-ascii"
> 
> > It is a mixture of cases, and I can only look at some examples to say
> > that. There are cases where all 3 ab initio predictors I used provide
> > models, there are blastx hits, or both blastx and protein2genome hits,
> > but no EST evidence, thus no model is retained. I guess my default
> > parameters could be responsible for these cases at least.
> 
> The only way you should be able to get BLASTX overlap and still not get
> a model for the region is if:
> 1. the protein alignment is in a different reading frame than your
>    models for every single base pair of the alignment (in which case
>    it's not true overlap), or
> 2. the BLASTX HSPs are stacked on each other again and again in weird
>    rearranged overlaps to produce a very deep alignment, which would
>    mean this is a repetitive region and not really a significant
>    alignment.
> Otherwise this should not happen, unless you have AED_threshold set to
> a value where MAKER will ignore genes that lack a minimum amount of
> support (by default this option is always off). The two possibilities
> above can be tested by just looking at the alignments manually in
> Apollo. Also take a look at the AED and eAED values for your missing
> genes. Anything below 1 should always be kept by MAKER by default,
> because it has at least some evidence support.
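To follow up on the AED/eAED check, here is a minimal Python sketch that lists those values for the mRNA features in a MAKER GFF3 file. It assumes the values are stored as `_AED` and `_eAED` attributes on mRNA rows, as in MAKER 2 GFF3 output; older MAKER versions may differ. The script name is hypothetical, and the "has_evidence" label simply applies the rule stated above (AED below 1 means some evidence support).

```python
"""Hypothetical sketch: report _AED/_eAED for each mRNA in a MAKER GFF3 file."""
import sys


def aed_values(gff_lines):
    """Yield (mRNA ID, AED, eAED or None) for mRNA rows that carry an _AED attribute."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # stop before any appended sequence section
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "_AED" not in attrs:
            continue
        eaed = float(attrs["_eAED"]) if "_eAED" in attrs else None
        yield attrs.get("ID", "."), float(attrs["_AED"]), eaed


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        for mrna_id, aed, eaed in aed_values(fh):
            # AED < 1 is read here as "at least some evidence overlaps the model"
            status = "has_evidence" if aed < 1 else "no_evidence"
            print(mrna_id, aed, "NA" if eaed is None else eaed, status, sep="\t")
```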
> 
> > Which FASTA are you referring to? The protein file I use as evidence
> > contains all the proteins I can actually use.
> 
> If they are already in your current run, ignore this.
> 
> Barry provided detailed instructions on how to configure MAKER for
> your particular case, so just follow his excellent instructions.
> 
> Thanks,
> Carson
> 
> 
> 
> From:  Barry Moore <barry.moore at genetics.utah.edu>
> Date:  Friday, 27 April, 2012 7:57 AM
> To:  Anastasia Gioti <anastasia.gioti at scilifelab.se>
> Cc:  Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
> Subject:  Re: [maker-devel] Use pass-through system to add missing genes
> 
> Hi Anastasia,
> 
> On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:
> 
> > Hi Carson,
> > Thanks for your help!
> >
> >> The way you proceed depends on why the genes are not there to begin
> >> with. Are they not there because of a lack of evidence?
> >
> > It is a mixture of cases, and I can only look at some examples to say
> > that. There are cases where all 3 ab initio predictors I used provide
> > models, there are blastx hits, or both blastx and protein2genome hits,
> > but no EST evidence, thus no model is retained. I guess my default
> > parameters could be responsible for these cases at least.
> >
> >
> 
> This doesn't sound right. If there are predicted models and blastx
> protein evidence overlapping them, you should get a model retained. I
> know that EST evidence has to support a splice site before a model
> will be promoted, and I can't remember whether protein evidence works
> the same way, but certainly if you pass back those protein2genome
> predictions and the original proteins as evidence, they will be
> retained as models.
> 
> >> If that's the case, just adding the new FASTA file should do the
> >> trick.
> >
> > Which FASTA are you referring to? The protein file I use as evidence
> > contains all the proteins I can actually use.
> >
> 
> Yes, using the protein FASTA from the closely related species as
> evidence. I think you said you've already done that, right?
> 
> 
> >> Or are they not there because an assembly error makes it impossible
> >> to get a logical model for the region (i.e. reading frame breaks)?
> >
> > This is not the case in general.
> >
> >> Are there ab initio models already called in those regions that
> >> could just be promoted to the annotation tier? You can test that one
> >> by blasting against the nonoverlaping_abinits.fasta files.
> >
> > I have not done this, will do!
> >
> >>
> >> For any of the cases described, you can provide the existing
> >> annotation set as the input in GFF3 format, and previous models will
> >> be maintained preferentially.
> >
> > You mean in a new MAKER run? Is this possible with the old MAKER as
> > well, not MAKER2, right?
> >
> 
> Yes, the original MAKER will do this.
> 
> 
> >> If you know which ab initio predictions you want to add (i.e. the ab
> >> initio promoting scenario I described), you can provide those
> >> predictions through the pred_gff option and then set keep_preds=1,
> >> and they will be maintained even without evidence. Attached is a
> >> script that would make selecting those easier. It takes the
> >> MAKER-generated GFF3 and a list of predictions to keep (one name per
> >> line). These might be the results of a BLAST analysis, for example.
> >> It will then return the GFF3 entries for just those models selected.
> >
> > The thing is, for the few cases I have looked at, I cannot really
> > decide which model is the best, and the 3 models from the ab initio
> > predictors do not agree on the exact intron-exon junctions or the
> > start and stop codons.
> >>
> >> If the situation is more complex, just provide more detail, and I am
> >> sure we can help you come up with a plan.
> >>
> > What I was thinking of doing was to provide a GFF file of alignments
> > (e.g. by exonerate) to the proteins of the closely related species
> > that I am missing, and somehow keep the previous annotations and get
> > the extra ones from this GFF file. But I am not sure exactly how MAKER
> > should be run to do this. If I want to keep the previous annotations,
> > I need the GFF file of the last MAKER run as input, but then how do I
> > distinguish it from the exonerate GFF file? And which prediction mode
> > should be on, and with which parameters? You mention keep_preds=1 for
> > the existing annotations, but how do I also promote evidence from
> > alignments in the same way in the same run?
> > Looks feasible though. Thanks again,
> > Anastasia
> >
> 
> Let me just restate what you've said so that I can be sure that I am
> correct about what you've already done. You have run Maker with SNAP,
> Genemark and Augustus using ESTs from a closely related species (passed
> to altest) and protein evidence from other fungi. You are missing
> about 1,000 genes compared to the species that provided the EST
> alignments. You say there is good evidence from the alignments that
> these genes exist, and I assume by this that you mean the EST/protein
> alignments that Maker produced.
> 
> 1) Is the closely related fungus annotated, and if so, have you
> included its proteins in the evidence set that you provided to Maker?
> If you haven't provided these proteins as evidence to Maker, then you
> should do this. You can re-run Maker passing your original models back
> through like this:
> 
> #-----Re-annotation Using MAKER Derived GFF3
> genome_gff=original_maker_annotations.gff3
> est_pass=1
> altest_pass=1
> protein_pass=1
> rm_pass=1
> model_pass=1
> pred_pass=1
> other_pass=1
> 
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=proteins_from_closely_related.fasta
> ## OR it sounds like you've already aligned these with exonerate?
> protein_gff=proteins_from_closely_related_already_aligned.gff
> 
> 2) If you've already included those closely related species' proteins
> but still didn't get the 1,000 genes, then take your
> nonoverlaping_abinits.fasta and BLAST them directly against your
> closely related proteins. Presumably they don't hit too well, because
> if they did they should have been promoted to gene models by Maker the
> first time, but here you can decide yourself what thresholds to allow
> to keep the abinit predictions that hit the closely related species'
> proteins. If you filter your BLAST hits the way you want and keep the
> names of the abinit predictions that pass your filter, then the script
> Carson attached will generate an abinit prediction GFF file with only
> the predictions you selected. You can then pass those predictions back
> to Maker and force it to keep them, and Maker will turn them from
> predictions (match/match_part) into gene models.
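The filtering step above is left to your own thresholds. A minimal Python sketch of it follows, assuming the ab initio proteins were searched with BLAST+ using tabular output (`-outfmt 6`, default 12 columns) and that the FASTA header IDs match the prediction names in the MAKER GFF3. The threshold values are placeholders, not recommendations, and the script name is hypothetical. It prints one query name per line, the list format the selection script takes.

```python
"""Hypothetical sketch: pick ab initio prediction names from BLAST+ -outfmt 6 hits."""
import sys

# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore


def passing_queries(blast_lines, max_evalue=1e-10, min_identity=40.0, min_length=100):
    """Return query IDs (first-seen order) with at least one hit passing every threshold."""
    kept = {}  # dict keys keep insertion order and drop duplicate query IDs
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid, pident, length, evalue = cols[0], float(cols[2]), int(cols[3]), float(cols[10])
        if evalue <= max_evalue and pident >= min_identity and length >= min_length:
            kept[qseqid] = None
    return list(kept)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        print("\n".join(passing_queries(fh)))
```

Keeping a query as soon as any single hit passes is one simple policy; requiring a best-hit or coverage criterion instead would need extra columns (for example `qlen`) in the BLAST output.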
> 
> #-----Re-annotation Using MAKER Derived GFF3
> genome_gff=original_maker_annotations.gff3
> est_pass=1
> altest_pass=1
> protein_pass=1
> rm_pass=1
> model_pass=1
> pred_pass=0
> other_pass=1
> 
> #-----Gene Prediction
> snaphmm=
> gmhmm=
> augustus_species=
> fgenesh_par_file=
> pred_gff=ab_init_predictions_rescued_by_blast.gff
> 
> keep_preds=1
> 
> Barry
> 
> >> Thanks,
> >> Carson
> >>
> >> From:  Anastasia Gioti <anastasia.gioti at scilifelab.se>
> >> Date:  Wed, 25 Apr 2012 11:09:36 +0200
> >> To:  <maker-devel at yandell-lab.org>
> >> Subject:  [maker-devel] Use pass-through system to add missing genes
> >>
> >> Hi,
> >> I have a set of predicted proteins from the genome of a fungus
> >> annotated by MAKER using EST data from a closely related species and
> >> 3 ab initio predictors (snap iteratively trained 3 times, genemark
> >> trained directly on the assembly, and augustus with a model from a
> >> less closely related species), along with a set of fungal proteins. I
> >> am missing ~1000 proteins when I compare to the species I used EST
> >> data from, and there is good evidence from alignments that these
> >> genes exist. The question is how to proceed from BLAST hits to actual
> >> gene models here. The idea would be to add these genes to the
> >> existing dataset, rather than reannotate the genome. I believe that
> >> reannotating it without any further evidence, such as RNA-seq from the
> >> species itself, would not change much, and I'd rather stick with the
> >> actual predictions that I trust and have used in subsequent analyses.
> >> I can accept annotating these 1000 genes in a less stringent and
> >> reliable way than MAKER; I just want to add them so that the
> >> difference in gene count gets corrected.
> >> I was reading the MAKER 2 paper and I was wondering if I can use the
> >> legacy annotations scheme to do it, by providing GFF3 of the
> >> alignments between the two species in the regions where genes were
> >> missed, but as I said, I would not like to reannotate the whole
> >> genome, and running MAKER2 might cause slight changes that I'd like
> >> to avoid. Is this possible? First, is it possible to provide a GFF3
> >> file of specific locations and not the entire genome alignment? (I
> >> guess so.) Second, how can I tag the existing annotations as 'not to
> >> be changed', or alternatively, tag the new models only?
> >> How should I run MAKER2, with which predictors on and which off?
> >> Thanks,
> >> Anastasia
> >>
> >> Anastasia Gioti
> >> Post-doctoral Researcher
> >>
> >> anastasia.gioti at scilifelab.se
> >> anastasia.gioti at ebc.uu.se
> >>
> >>
> >> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
> >>
> >>
> >>
> >> _______________________________________________
> >> maker-devel mailing list
> >> maker-devel at box290.bluehost.com
> >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> >> <gff3_select>
> >
> > Anastasia Gioti
> > Post-doctoral Researcher
> >
> > anastasia.gioti at scilifelab.se
> > anastasia.gioti at ebc.uu.se
> >
> >
> > http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
> >
> 
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
> 