[maker-devel] MAKER

Carson Hinton Holt carson.holt at genetics.utah.edu
Fri Sep 22 14:19:22 MDT 2017


All est2genome and protein2genome do is take exonerate alignments of the fasta inputs and translate the longest ORF to get a rough base model that can be used to train a gene predictor. That is why we have it in the documentation that once the predictor is trained they should be turned off.
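
As a rough illustration of the "translate the longest ORF" step, here is a minimal Python sketch. It is not MAKER's actual code. It assumes the exons of one alignment have already been joined into a single transcript sequence, that an ORF runs from an ATG to the next in-frame stop codon, and that only the three forward frames matter. The function names and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def longest_orf(transcript):
    """Return (start, end, protein) for the longest ATG..stop ORF.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Only the three forward frames are scanned; ORFs without a stop are ignored.
    """
    seq = transcript.upper()
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first start codon opens a candidate ORF
            elif CODON_TABLE.get(codon) == "*":
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3, translate(seq[start:i]))
                start = None  # look for the next ORF in this frame
    return best


# Hypothetical spliced alignment sequence (exons already joined).
print(longest_orf("CCATGAAATTTTAGGG"))  # (2, 14, 'MKF')
```

A model built this way is only as complete as the alignment it came from, so it can be partial, which is fine for training purposes.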

Once you get the gene predictor trained, MAKER will feed hints to the gene predictor derived from alignments and input GFF3. These hints greatly improve the performance of the gene predictors. MAKER will also use the alignments to filter out predictions that do not match the evidence alignments.
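
To make the switch from the training run to the prediction run concrete, here is a small Python sketch that rewrites key=value lines in the text of a maker_opts.ctl file. The option names est2genome, protein2genome and snaphmm are the ones from the control file. The helper name set_ctl_options, the example control-file text and the file name my_species.hmm are assumptions made up for illustration, so adapt them to your own run.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given 'key=value' lines rewritten.

    Trailing '#' comments on a rewritten line are kept. Keys that are not
    found in the text are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            line = f"{key}={remaining.pop(key)} {comment}".rstrip()
        out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Illustrative excerpt of a control file as used for the training run.
training_ctl = """est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
snaphmm= #SNAP HMM file
"""

# Second run: turn the direct inference off and point at the trained HMM.
# 'my_species.hmm' is a hypothetical file name.
prediction_ctl = set_ctl_options(
    training_ctl,
    {"est2genome": 0, "protein2genome": 0, "snaphmm": "my_species.hmm"},
)
print(prediction_ctl)
```

Editing the file by hand works just as well; the point is only that the second run keeps the same evidence inputs and changes these options.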

—Carson


> On Sep 22, 2017, at 2:15 PM, Keilwagen, Jens <jens.keilwagen at julius-kuehn.de> wrote:
> 
> Hi Carson,
> 
> Thanks a lot for the information.
> 
> Just to be sure that I understand you right: It is impossible to obtain MAKER results based on RNA-seq and homology that differ from purely homology-based MAKER results?
> 
> Could you confirm that?
> 
> Thanks a lot and best regards, Jens
> 
>> -----Original Message-----
>> From: Carson Hinton Holt [mailto:carson.holt at genetics.utah.edu]
>> Sent: Friday, 22 September 2017 22:04
>> To: Keilwagen, Jens
>> Cc: Maker Mailing List
>> Subject: Re: MAKER
>> 
>> MAKER won’t produce est2genome results for est_gff. This is partially
>> because est2genome results are only used for training gene predictors.
>> So you are essentially just getting protein2genome results from your
>> runs. Once you get a gene predictor trained you will see a difference,
>> as it will use the intron/exon structure of alignments as hints to
>> improve gene predictor performance.
>> 
>> —Carson
>> 
>> 
>>> On Sep 21, 2017, at 1:57 AM, Keilwagen, Jens <jens.keilwagen at julius-kuehn.de> wrote:
>>> 
>>> Hi Carson,
>>> 
>>> I have tried the proposed options for a small example (yeast).
>>> 
>>> I had
>>> - proteins (fasta) from another yeast and
>>> - transcript annotation (gff) from cufflinks and StringTie
>>> 
>>> I'd like to compare the maker results for
>>> - proteins and StringTie
>>> Vs.
>>> - proteins and cufflinks
>>> 
>>> I used the default options, except:
>>> genome=<genome fasta>
>>> 
>>> protein=<protein fasta>
>>> est_gff=<transcript gff>
>>> 
>>> est2genome=1
>>> protein2genome=1
>>> 
>>> (An example is attached.)
>>> 
>>> Then I ran maker:
>>> 
>>> maker -RM_off -c 24
>>> find . -type f -name '*.gff' -exec cat {} + | grep maker > filtered-maker-prediction.gff
>>> 
>>> (The run seems to be okay. There were no FAILED, ... in the log. Cf.
>>> attachment)
>>> 
>>> Each maker run was started in a separate subdirectory.
>>> However, I realized that both maker runs yielded almost the same
>>> result (just one minor edit). This made me curious.
>>> As far as I understood the files, I received the (filtered?)
>>> exonerate predictions for the proteins (from the other yeast). Is this
>>> correct? Why did I not receive any predictions (purely) based on the
>>> RNA-seq data? Did I do something wrong?
>>> 
>>> I'm looking forward to your reply.
>>> 
>>> Best regards, Jens
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Carson Hinton Holt [mailto:carson.holt at genetics.utah.edu]
>>>> Sent: Tuesday, 19 September 2017 23:37
>>>> To: Keilwagen, Jens
>>>> Subject: Re: MAKER
>>>> 
>>>> MAKER cannot use the BAM directly, but you can use something like
>>>> stringtie or trinity to assemble a transcript fasta that can be given
>>>> to the est= option.
>>>>
>>>> Ab initio gene prediction is only enabled if you specify an hmm or
>>>> species file to use.  If all you want is homology based annotation,
>>>> you can try the est2genome and protein2genome options. Note the final
>>>> models may be partial if the alignments do not cover the gene end to
>>>> end.
>>>> 
>>>> —Carson
>>>> 
>>>> 
>>>> 
>>>>> On Sep 18, 2017, at 4:02 AM, Keilwagen, Jens <jens.keilwagen at julius-kuehn.de> wrote:
>>>>> 
>>>>> Hi Carson,
>>>>> 
>>>>> thanks a lot for your last email.
>>>>> 
>>>>> I was asked to do homology-based gene prediction using RNA-seq and
>>>>> MAKER was proposed as one option.
>>>>> Hence I'd like to ask how to do that in the best possible way.
>>>>> I have mapped RNA-seq data (SAM/BAM) and a fasta of proteins from a
>>>>> related species. How can I integrate the RNA-seq data?
>>>>>
>>>>> Is it possible to deactivate ab-initio gene prediction by Augustus
>>>>> or SNAP?
>>>>> 
>>>>> Thanks a lot in advance.
>>>>> 
>>>>> Best regards, Jens
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Carson Holt [mailto:carson.holt at genetics.utah.edu]
>>>>>> Sent: Thursday, 18 February 2016 19:03
>>>>>> To: Keilwagen, Jens
>>>>>> Cc: Mark Yandell
>>>>>> Subject: Re: MAKER
>>>>>> 
>>>>>> GeMoMa sounds like an interesting tool.  If it produces GFF3, you
>>>>>> could give the GFF3 results to the pred_gff= option in MAKER (comma
>>>>>> separated lists accepted). The GFF3 file of predictions must be in
>>>>>> the same coordinate space as the assembly being annotated (genome=
>>>>>> option).
>>>>>> Whatever you give to pred_gff will be treated as raw predictions by
>>>>>> MAKER and will only be accepted as a final model if there are
>>>>>> evidence alignments (protein/EST) that support the model. If
>>>>>> there are multiple alternate models at the same locus, only the model
>>>>>> that is best supported by the protein/transcript evidence is kept.
>>>>>> 
>>>>>> You can also set the keep_preds=1 option when using pred_gff. This
>>>>>> will cause even raw predictions with no evidence support to be
>>>>>> maintained.
>>>>>> In the event of multiple models with no evidence support, the model
>>>>>> best matching the consensus of alternate models will be maintained.
>>>>>> 
>>>>>> Alternatively you can use the model_gff= option (comma separated
>>>>>> list ok) to input the GFF3 file.  model_gff features are given higher
>>>>>> confidence than pred_gff. At least one model will always be kept
>>>>>> regardless of evidence support (same rules as pred_gff selection
>>>>>> for which model to keep when there are multiple). But model_gff
>>>>>> will also affect how evidence clusters are determined compared to
>>>>>> pred_gff (model_gff features are allowed to merge bridging evidence
>>>>>> clusters).
>>>>>> MAKER will also go to extra lengths to pull forward existing names
>>>>>> and other data in the GFF3 for model_gff features.
>>>>>> 
>>>>>> If you do not have GFF3 files in the right coordinate space, but do
>>>>>> have protein fasta or transcript fasta for the GeMoMa predictions,
>>>>>> you can supply these to the protein= and est= options in MAKER
>>>>>> together with est2genome=1 or protein2genome=1. This will cause MAKER
>>>>>> to place the models using exonerate. You would probably also need
>>>>>> to add est_forward=1 to the control files to have MAKER try to
>>>>>> derive model names from the names of the evidence alignments they
>>>>>> were derived from if you go this route.
>>>>>> 
>>>>>> You can also try treating the GFF3 predictions as hints to
>>>>>> traditional ab initio gene finders like SNAP or Augustus by giving
>>>>>> them to the est_gff= or protein_gff= options (i.e. make GeMoMa
>>>>>> predictions inform the behavior of predictors like SNAP and
>>>>>> Augustus). Might be interesting. You would have to alter the results
>>>>>> to be match/match_part GFF3 features to give them to the est_gff or
>>>>>> protein_gff options.
>>>>>> 
>>>>>> Let me know if you have any more questions, and I’ll do my best to
>>>>>> help.
>>>>>> 
>>>>>> Thanks,
>>>>>> Carson
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Feb 18, 2016, at 10:22 AM, Mark Yandell <myandell at genetics.utah.edu> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Mark Yandell
>>>>>>> Professor of Human Genetics
>>>>>>> H.A. & Edna Benning Presidential Endowed Chair
>>>>>>> Co-director USTAR Center for Genetic Discovery
>>>>>>> Eccles Institute of Human Genetics
>>>>>>> University of Utah
>>>>>>> 15 North 2030 East, Room 2100
>>>>>>> Salt Lake City, UT 84112-5330
>>>>>>> ph:801-587-7707
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 2/18/16, 8:34 AM, "Keilwagen, Jens" <jens.keilwagen at jki.bund.de> wrote:
>>>>>>> 
>>>>>>>> Dear Prof. Yandell,
>>>>>>>> 
>>>>>>>> we have published a homology-based gene prediction program today:
>>>>>>>> https://nar.oxfordjournals.org/content/early/2016/02/17/nar.gkw092
>>>>>>>> and I'd like to ask how we can use MAKER to combine predictions
>>>>>>>> of GeMoMa using different reference organisms, i.e. we try to
>>>>>>>> predict the genes of a target organism (e.g. wheat) using the
>>>>>>>> annotated genes of other reference organisms (e.g. grasses).
>>>>>>>> GeMoMa returns for each reference organism a GFF with the
>>>>>>>> predicted gene models in the target organism.
>>>>>>>> 
>>>>>>>> It would be great if you or someone from your team could give us
>>>>>>>> some hints or point us to the correct paragraph in the documentation.
>>>>>>>> 
>>>>>>>> Thanks a lot and best regards, Jens
>>>>>>>> 
>>>>>>>> ---
>>>>>>>> 
>>>>>>>> Dr. Jens Keilwagen
>>>>>>>> 
>>>>>>>> Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants
>>>>>>>> Institute for Biosafety in Plant Biotechnology
>>>>>>>> 
>>>>>>>> Erwin-Baur-Straße 27
>>>>>>>> 06484 Quedlinburg
>>>>>>>> Germany
>>>>>>>> 
>>>>>>>> Phone: ++49 (0)3946 47 510
>>>>>>>> EMail: jens.keilwagen at jki.bund.de
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>>> <maker_opts.ctl><slurm-278767.out>
> 


