[maker-devel] Use pass-through system to add missing genes
Carson Holt
carsonhh at gmail.com
Fri Apr 27 11:18:23 MDT 2012
Here you go. This will also be part of the next MAKER release in some
form.
Thanks,
Carson
On 12-04-27 12:51 PM, "Collett, James R" <james.collett at pnnl.gov> wrote:
>Hi Carson,
>
>Could you please send me (or make available for download) the perl script
>that you mentioned in this previous post in this thread?
>
>>> Attached is a
>>> script that would make selecting those easier. It takes the MAKER
>>> generated GFF3 and a list of predictions to keep (one name per line).
>>> These might be the results of a BLAST analysis for example. It will
>>> then return the GFF3 entries for just those models selected.
>
>Thanks,
>
>Jim
>__________________________________________________
>James R. Collett, Ph.D.
>Senior Scientist
>Chemical and Biological Process Development Group
>Energy and Environment Directorate
>Pacific Northwest National Laboratory
>
>> -----Original Message-----
>> From: maker-devel-bounces at yandell-lab.org [mailto:maker-devel-
>> bounces at yandell-lab.org] On Behalf Of maker-devel-request at yandell-
>> lab.org
>> Sent: Friday, April 27, 2012 6:48 AM
>> To: maker-devel at yandell-lab.org
>> Subject: maker-devel Digest, Vol 47, Issue 14
>>
>> Today's Topics:
>>
>> 1. Re: Use pass-through system to add missing genes (Carson Holt)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 27 Apr 2012 09:27:24 -0400
>> From: Carson Holt <carsonhh at gmail.com>
>> To: Barry Moore <barry.moore at genetics.utah.edu>, Anastasia Gioti
>> <anastasia.gioti at scilifelab.se>
>> Cc: maker-devel at yandell-lab.org
>> Subject: Re: [maker-devel] Use pass-through system to add missing
>> genes
>> Message-ID: <CBC01559.BF45%carsonhh at gmail.com>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> > It is a mixture of cases, and I can only look at some examples to say
>> > that. There are cases where all 3 used ab initio predictors provide
>> > models, there are blastx hits, or both blastx and protein2genome, but
>> > no EST evidence, thus no model is retained. I guess my default
>> > parameters could be responsible for these cases at least.
>>
>> The only ways you should be able to get BLASTX overlap and still not get
>> a model for the region are: 1. The protein alignment is in a
>> different reading frame than your models for every single base pair of
>> the alignment (in which case it's not true overlap). 2. The BLASTX
>> HSPs are stacked on each other again and again in weird rearranged
>> overlaps to produce a very deep alignment, which would mean this is a
>> repetitive region and is not really a significant alignment. Otherwise
>> this should not happen unless you have the AED_threshold set to some
>> value where MAKER will ignore genes unless they have a minimum amount
>> of support (by default this option is always off). The first two
>> possibilities can be tested by just looking at the alignments manually
>> in Apollo. Also take a look at the AED and eAED values for your
>> missing genes. Anything below 1 should always be kept by MAKER by
>> default because it has at least some supporting evidence.
>>
>> > which fasta do you refer to? The proteins file I use as evidence
>> > contains all proteins i can actually use.
>>
>> If they are already in your current run ignore this.
>>
>> Barry provided detailed instructions on how to configure MAKER, for
>> your particular case. So just follow his excellent instructions.
>>
>> Thanks,
>> Carson
>>
>>
>>
>> From: Barry Moore <barry.moore at genetics.utah.edu>
>> Date: Friday, 27 April, 2012 7:57 AM
>> To: Anastasia Gioti <anastasia.gioti at scilifelab.se>
>> Cc: Carson Holt <carsonhh at gmail.com>, <maker-devel at yandell-lab.org>
>> Subject: Re: [maker-devel] Use pass-through system to add missing
>> genes
>>
>> Hi Anastasia,
>>
>> On Apr 27, 2012, at 2:43 AM, Anastasia Gioti wrote:
>>
>> > Hi Carson,
>> > Thanks for your help!
>> >
>> >> The way you proceed depends on why the genes are not there to begin
>> with.
>> >> Are they not there because of a lack of evidence?
>> >
>> > It is a mixture of cases, and I can only look at some examples to say
>> > that. There are cases where all 3 used ab initio predictors provide
>> > models, there are blastx hits, or both blastx and protein2genome, but
>> > no EST evidence, thus no model is retained. I guess my default
>> > parameters could be responsible for these cases at least.
>> >
>>
>> This doesn't sound right. If there are predicted models and blastx
>> protein evidence overlapping them you should get a model retained. I
>> know for the EST evidence that it has to support a splice site before
>> it will be promoted and I can't remember if protein evidence is the
>> same but certainly if you pass back those protein2genome predictions
>> and the original proteins as evidence then they will be retained as
>> models.
>>
>> >> If that's the case, just adding the new fasta file should do the
>> >> trick.
>> >
>> > which fasta do you refer to? The proteins file I use as evidence
>> > contains all proteins i can actually use.
>> >
>>
>> Yes, using the protein fasta from the closely related species as
>> evidence. I think you said you've already done that, right?
>>
>>
>> >> Or are they not there because an assembly error makes it impossible
>> >> to get a logical model for the region (I.e reading frame breaks).
>> >
>> > This is not the case in general.
>> >
>> >> Are there ab initio models already called in those regions that
>> >> could just be promoted to the annotation tier? You can test that one
>> >> by blasting against the nonoverlaping_abinits.fasta files.
>> >
>> > I have not done this, will do!
>> >
>> >>
>> >> For any of the cases described, you can provide the existing
>> >> annotation set as the input in GFF3 format, and previous models will
>> >> be maintained preferentially.
>> >
>> > You mean in a new maker run? is this possible with the old maker as
>> > well, not maker2, right?
>> >
>>
>> Yes, the original MAKER will do this.
>>
>>
>> >> If you know which ab initio predictions you want to add (i.e. the ab
>> >> initio promoting scenario I described), you can provide those
>> >> predictions using the pred_gff option and then set keep_preds=1,
>> >> and they will be maintained even without evidence. Attached is a
>> >> script that would make selecting those easier. It takes the MAKER
>> >> generated GFF3 and a list of predictions to keep (one name per
>> >> line). These might be the results of a BLAST analysis, for example.
>> >> It will then return the GFF3 entries for just those models selected.
>> >
>> > The thing is, for the few cases I have looked at, I cannot really
>> > decide which model is the best, and the 3 models from the ab initio
>> > predictors do not agree on the exact intron-exon junctions or the
>> > start and stop codons.
>> >>
>> >> If the situation is more complex, just provide more detail, and I am
>> >> sure we can help you come up with a plan.
>> >>
>> > What I was thinking to do was to provide a gff file of alignments (e.g.
>> > by exonerate) to the proteins of the closely related species that I am
>> > missing, and somehow keep the previous annotations and get the extra
>> > ones from this gff file. But how exactly maker should be run to do this
>> > I am not sure. If I want to keep the previous annotations I need the
>> > gff file of the last maker run as input, but then how do I
>> > discriminate with the exonerate gff file? And which mode of prediction
>> > should be on, and with which parameters? You mention
>> > keep_preds=1 for the existing annotations, but how do I also promote
>> > evidence from alignments in the same way in the same run?
>> > Looks feasible though. Thanks again,
>> > Anastasia
>> >
>>
>> Let me just restate what you've said so that I can be sure that I am
>> correct about what you've already done. You have run Maker with SNAP,
>> Genemark and Augustus using EST from a closely related species (passed
>> to altest) and protein evidence from other fungi. You are missing
>> about 1,000 genes compared to the species that provided the EST
>> alignments. You say there is good evidence that these genes exist from
>> the alignments and I assume by this that you mean the EST/protein
>> alignments that Maker produced.
>>
>> 1) Is the closely related fungus annotated, and if so, have you included
>> its proteins in the evidence set that you provided to Maker? If you
>> haven't provided these proteins as evidence to Maker then you should do
>> this. You can re-run Maker passing your original models back through
>> like this:
>>
>> #-----Re-annotation Using MAKER Derived GFF3
>> genome_gff=original_maker_annotations.gff3
>> est_pass=1
>> altest_pass=1
>> protein_pass=1
>> rm_pass=1
>> model_pass=1
>> pred_pass=1
>> other_pass=1
>>
>> #-----Protein Homology Evidence (for best results provide a file for at least one)
>> protein=proteins_from_closely_related.fasta
>> ## OR it sounds like you've already aligned these with exonerate?
>> protein_gff=proteins_from_closely_related_already_aligned.gff
>>
>> 2) If you've already included those closely related species proteins
>> but still didn't get the 1,000 genes, then take your
>> nonoverlaping_abinits.fasta and blast them directly against your
>> closely related proteins. Presumably they don't hit too well, because
>> if they did they should have been promoted to predictions by Maker the
>> first time, but here you can decide yourself what thresholds to allow
>> to keep the abinit predictions that hit the closely related species
>> proteins. If you filter your blast hits the way you want and keep the
>> names of the abinit predictions that pass your filter, then use the
>> script Carson attached, it will generate an abinit prediction GFF file
>> with only the predictions you selected. You can then pass those
>> predictions back to Maker and force it to keep them, and Maker will turn
>> them from predictions (match/match_part) into gene models.
>>
>> #-----Re-annotation Using MAKER Derived GFF3
>> genome_gff=original_maker_annotations.gff3
>> est_pass=1
>> altest_pass=1
>> protein_pass=1
>> rm_pass=1
>> model_pass=1
>> pred_pass=0
>> other_pass=1
>>
>> #-----Gene Prediction
>> snaphmm=
>> gmhmm=
>> augustus_species=
>> fgenesh_par_file=
>> pred_gff=ab_init_predictions_rescued_by_blast.gff
>>
>> keep_preds=1
>>
>> Barry
>>
>> >> Thanks,
>> >> Carson
>> >>
>> >> From: Anastasia Gioti <anastasia.gioti at scilifelab.se>
>> >> Date: Wed, 25 Apr 2012 11:09:36 +0200
>> >> To: <maker-devel at yandell-lab.org>
>> >> Subject: [maker-devel] Use pass-through system to add missing genes
>> >>
>> >> Hi,
>> >> I have a set of predicted proteins from the genome of a fungus
>> >> annotated by MAKER using EST data from a closely related species
>> >> and 3 ab initio predictors (snap iteratively trained 3 times,
>> >> genemark trained directly on the assembly and augustus with a model
>> >> from a less closely related species), along with a set of fungal
>> >> proteins. I am missing ~1000 proteins when I compare to the species
>> >> I used EST data from, and there is good evidence from alignments
>> >> that these genes exist. The question is how to proceed from BLAST
>> >> hits to actual gene models here. The idea would be to add these
>> >> genes to the existing dataset, rather than reannotate the genome. I
>> >> believe that reannotating it without any further evidence such as
>> >> RNA-seq from the species itself would not change much, and I'd
>> >> rather stick with actual predictions that I trust and have used in
>> >> subsequent analyses. For the 1000 genes, I can accept annotating
>> >> them in a less stringent and reliable way than MAKER; I just want to
>> >> add them so that the difference in gene count gets corrected.
>> >> I was reading the MAKER2 paper and I was wondering if I can use the
>> >> legacy annotations scheme to do it, by providing GFF3 of the
>> >> alignments between the two species in the regions where genes were
>> >> missed, but as I said, I would not like to reannotate the whole
>> >> genome, and running MAKER2 might cause slight changes that I'd like
>> >> to avoid. Is this possible? First, is it possible to provide a GFF3
>> >> file of specific locations and not the entire genome alignment? (I
>> >> guess so.) Second, how can I tag the existing annotations as 'not
>> >> to be changed' or, alternatively, tag the new models only?
>> >> How should I run MAKER2, with which predictors on and which off?
>> >> Thanks,
>> >> Anastasia
>> >>
>> >> Anastasia Gioti
>> >> Post-doctoral Researcher
>> >>
>> >> anastasia.gioti at scilifelab.se
>> >> anastasia.gioti at ebc.uu.se
>> >>
>> >>
>> >> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> maker-devel mailing list
>> >> maker-devel at box290.bluehost.com
>> >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> >> <gff3_select>
>> >
>> > Anastasia Gioti
>> > Post-doctoral Researcher
>> >
>> > anastasia.gioti at scilifelab.se
>> > anastasia.gioti at ebc.uu.se
>> >
>> >
>> http://www.ebc.uu.se/Research/IEG/evbiol/people/pages/Gioti_Anastasia/
>> >
>> >
>> >
>>
>> Barry Moore
>> Research Scientist
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT 84112
>> --------------------------------------------
>> (801) 585-3543
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>> End of maker-devel Digest, Vol 47, Issue 14
>> *******************************************
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_select
Type: application/octet-stream
Size: 3067 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20120427/becbc461/attachment-0003.obj>
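
The gff3_select attachment itself is not reproduced in this archive. What follows is only a rough Python sketch of the workflow described in the thread: filter BLAST hits, keep the names of the predictions that pass, then pull the matching entries out of the MAKER GFF3. It is not Carson's script. The function names, the 1e-10 e-value cutoff, the assumption that the BLAST report is 12-column tabular output, and the assumption that predictions can be matched on the GFF3 Name or ID attribute are all illustrative choices and may need adjusting for real data.

```python
from urllib.parse import unquote


def names_from_blast_tab(lines, max_evalue=1e-10):
    """Collect query names from 12-column tabular BLAST hits.

    Assumes column 1 is the query name and column 11 is the e-value.
    The default cutoff is an arbitrary placeholder, not a recommendation.
    """
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and float(cols[10]) <= max_evalue:
            names.add(cols[0])
    return names


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> decoded value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def select_gff3(lines, keep_names):
    """Return feature lines whose Name or ID is in keep_names, plus descendants."""
    keep_names = set(keep_names)
    features = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # stop at an embedded sequence section, if present
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            features.append((line.rstrip("\n"), parse_attributes(cols[8])))

    def named(attrs):
        return attrs.get("Name") in keep_names or attrs.get("ID") in keep_names

    def has_kept_parent(attrs, kept_ids):
        return any(p in kept_ids for p in attrs.get("Parent", "").split(","))

    # Seed with directly named features, then follow Parent links until
    # nothing new is added, so child order in the file does not matter.
    kept_ids = {a["ID"] for _, a in features if named(a) and "ID" in a}
    changed = True
    while changed:
        changed = False
        for _, attrs in features:
            fid = attrs.get("ID")
            if fid and fid not in kept_ids and has_kept_parent(attrs, kept_ids):
                kept_ids.add(fid)
                changed = True

    return [
        line
        for line, attrs in features
        if named(attrs) or attrs.get("ID") in kept_ids or has_kept_parent(attrs, kept_ids)
    ]
```

The lines returned by select_gff3 could be written to a file and supplied through pred_gff with keep_preds=1, as in Barry's second configuration above.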