[maker-devel] MAKER

Fri Sep 22 14:04:10 MDT 2017

MAKER won’t produce est2genome results for est_gff. This is partially because est2genome results are only used for training gene predictors. So you are essentially just getting protein2genome results from your runs. Once you get a gene predictor trained you will see a difference, as it will use the intron/exon structure of alignments as hints to improve gene predictor performance.

—Carson
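
A minimal sketch of what the reply above implies for the control file, assuming the assembled transcripts have already been exported as FASTA (transcripts.fa is a hypothetical file name): move the transcript evidence from est_gff= to est= so that est2genome=1 has alignments to build models from. The set_ctl_opt helper is not part of MAKER; it is only an illustration, and the demo runs on a throwaway file rather than a real maker_opts.ctl.

```shell
#!/bin/sh
# Sketch: set one key=value option in a MAKER-style control file,
# keeping any trailing "#comment" that was on that line.
# set_ctl_opt is a hypothetical helper, not a MAKER tool.
set_ctl_opt() {
    key=$1 value=$2 file=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 {              # line starts with "key="
            c = index($0, "#")               # position of a trailing comment, if any
            tail = (c > 0) ? " " substr($0, c) : ""
            print k "=" v tail
            next
        }
        { print }                            # all other lines pass through unchanged
    ' "$file" > "$tmp" && cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demo on a throwaway file; the file names and comments are made up.
demo=$(mktemp)
printf '%s\n' 'est= #fasta' 'est_gff=stringtie.gff #gff3' 'est2genome=0 #infer from ESTs' > "$demo"
set_ctl_opt est transcripts.fa "$demo"   # give the transcripts as FASTA
set_ctl_opt est_gff "" "$demo"           # and no longer as pre-aligned GFF3
set_ctl_opt est2genome 1 "$demo"
cat "$demo"
rm -f "$demo"
```

Matching on the literal prefix "est=" keeps the helper from touching est_gff= or est2genome= when only est= is meant.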

> On Sep 21, 2017, at 1:57 AM, Keilwagen, Jens <jens.keilwagen at julius-kuehn.de> wrote:
> 
> Hi Carson,
> 
> I have tried the proposed options for a small example (yeast).
> 
> I had 
> - proteins (fasta) from another yeast and 
> - transcript annotation (gff) from cufflinks and StringTie
> 
> I'd like to compare the maker results for 
> - proteins and StringTie
> Vs.
> - proteins and cufflinks
> 
> I used the default options, except:
> genome=<genome fasta>
> 
> protein=<protein fasta>
> est_gff=<transcript gff>
> 
> est2genome=1
> protein2genome=1
> 
> (An example is attached.)
> 
> Then I ran maker:
> 
> maker -RM_off -c 24
> find . -type f -name '*.gff' -exec cat {} + | grep maker > filtered-maker-prediction.gff
> 
> (The run seems to be okay. There were no FAILED, ... in the log. Cf. attachment)
> 
> Each maker run was started in a separate subdirectory.
> However, I realized that both maker runs yielded almost the same result (just one minor edit). This made me curious. 
> As far as I understood the files, I received the (filtered?) exonerate predictions for the proteins (from the other yeast). Is this correct? Why did I not receive any predictions (purely) based on the RNA-seq data? Did I do something wrong?
> 
> I'm looking forward to your reply.
> 
> Best regards, Jens
> 
> 
>> -----Original Message-----
>> From: Carson Hinton Holt [mailto:carson.holt at genetics.utah.edu]
>> Sent: Tuesday, 19 September 2017 23:37
>> To: Keilwagen, Jens
>> Subject: Re: MAKER
>> 
>> MAKER cannot use the BAM directly, but you can use something like
>> stringtie or trinity to assemble a transcript fasta that can be given
>> to the est= option.
>> 
>> Ab initio gene prediction is only enabled if you specify an hmm or
>> species file to use.  If all you want is homology based annotation, you
>> can try the est2genome and protein2genome options. Note the final
>> models may be partial if the alignments do not cover the gene end to
>> end.
>> 
>> —Carson
>> 
>> 
>> 
>>> On Sep 18, 2017, at 4:02 AM, Keilwagen, Jens <jens.keilwagen at julius-kuehn.de> wrote:
>>> 
>>> Hi Carson,
>>> 
>>> thanks a lot for your last email.
>>> 
>>> I was asked to do homology-based gene prediction using RNA-seq, and
>>> MAKER was proposed as one option.
>>> Hence I'd like to ask how to do that in the best possible way.
>>> I have mapped RNA-seq data (SAM/BAM) and a fasta of proteins from a
>>> related species. How can I integrate the RNA-seq data?
>>> 
>>> Is it possible to deactivate ab-initio gene prediction by Augustus or
>>> SNAP?
>>> 
>>> Thanks a lot in advance.
>>> 
>>> Best regards, Jens
>>> 
>>>> -----Original Message-----
>>>> From: Carson Holt [mailto:carson.holt at genetics.utah.edu]
>>>> Sent: Thursday, 18 February 2016 19:03
>>>> To: Keilwagen, Jens
>>>> Cc: Mark Yandell
>>>> Subject: Re: MAKER
>>>> 
>>>> GeMoMa sounds like an interesting tool.  If it produces GFF3, you
>>>> could give the GFF3 results to the pred_gff= option in MAKER (comma
>>>> separated lists accepted). The GFF3 file of predictions must be in
>>>> the same coordinate space as the assembly being annotated (genome=
>>>> option). Whatever you give to pred_gff will be treated as raw
>>>> predictions by MAKER and will only be accepted as a final model if
>>>> there are evidence alignments (protein/EST) that support the model.
>>>> If there are multiple alternate models at the same locus, only the
>>>> model that is best supported by the protein/transcript evidence is
>>>> kept.
>>>> 
>>>> You can also set the keep_preds=1 option when using pred_gff. This
>>>> will cause even raw predictions with no evidence support to be
>>>> maintained. In the event of multiple models with no evidence
>>>> support, the model best matching the consensus of alternate models
>>>> will be maintained.
>>>> 
>>>> Alternatively you can use the model_gff= option (comma separated
>>>> list ok) to input the GFF3 file.  model_gff features are given
>>>> higher confidence than pred_gff. At least one model will always be
>>>> kept regardless of evidence support (same rules as pred_gff
>>>> selection for which model to keep when there are multiple). But
>>>> model_gff will also affect how evidence clusters are determined
>>>> compared to pred_gff (model_gff features are allowed to merge
>>>> bridging evidence clusters). MAKER will also go to extra lengths to
>>>> pull forward existing names and other data in the GFF3 for
>>>> model_gff features.
>>>> 
>>>> If you do not have GFF3 files in the right coordinate space, but do
>>>> have protein fasta or transcript fasta for the GeMoMa predictions,
>>>> you can supply these to the protein= and est= options in MAKER
>>>> together with est2genome=1 or protein2genome=1. This will cause
>>>> MAKER to place the models using exonerate. You would probably also
>>>> need to add est_forward=1 to the control files to have MAKER try to
>>>> derive model names from the names of the evidence alignments they
>>>> were derived from if you go this route.
>>>> 
>>>> You can also try treating the GFF3 predictions as hints to
>>>> traditional ab initio gene finders like SNAP or Augustus by giving
>>>> them to the est_gff= or protein_gff= options (i.e. make GeMoMa
>>>> predictions inform the behavior of predictors like SNAP and
>>>> Augustus). Might be interesting. You would have to alter results to
>>>> be match/match_part GFF3 features to give them to the est_gff or
>>>> protein_gff options.
>>>> 
>>>> Let me know if you have any more questions, and I’ll do my best to
>>>> help.
>>>> 
>>>> Thanks,
>>>> Carson
>>>> 
>>>> 
>>>> 
>>>>> On Feb 18, 2016, at 10:22 AM, Mark Yandell <myandell at genetics.utah.edu> wrote:
>>>>> 
>>>>> 
>>>>> Mark Yandell
>>>>> Professor of Human Genetics
>>>>> H.A. & Edna Benning Presidential Endowed Chair
>>>>> Co-director, USTAR Center for Genetic Discovery
>>>>> Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ph:801-587-7707
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 2/18/16, 8:34 AM, "Keilwagen, Jens" <jens.keilwagen at jki.bund.de> wrote:
>>>>> 
>>>>>> Dear Prof. Yandell,
>>>>>> 
>>>>>> we have published a homology-based gene prediction program today:
>>>>>> https://nar.oxfordjournals.org/content/early/2016/02/17/nar.gkw092
>>>>>> and I'd like to ask how we can use MAKER to combine predictions of
>>>>>> GeMoMa using different reference organisms, i.e. we try to predict
>>>>>> the genes of a target organism (e.g. wheat) using the annotated
>>>>>> genes of other reference organisms (e.g. grasses). For each
>>>>>> reference organism, GeMoMa returns a GFF with the predicted gene
>>>>>> models in the target organism.
>>>>>> 
>>>>>> It would be great if you or someone from your team could give us
>>>>>> some hints or point us to the correct paragraph in the
>>>>>> documentation.
>>>>>> 
>>>>>> Thanks a lot and best regards, Jens
>>>>>> 
>>>>>> ---
>>>>>> 
>>>>>> Dr. Jens Keilwagen
>>>>>> 
>>>>>> Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants
>>>>>> 	Institute for Biosafety in Plant Biotechnology
>>>>>> 
>>>>>> Erwin-Baur-Straße 27
>>>>>> 06484 Quedlinburg
>>>>>> Germany
>>>>>> 
>>>>>> Phone: ++49 (0)3946 47 510
>>>>>> EMail: jens.keilwagen at jki.bund.de
>>>>>> 
>>>>>> 
>>>>> 
>>> 
> 
> <maker_opts.ctl><slurm-278767.out>
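
For checking which evidence sources actually ended up in the output, a tally of column 2 (source) and column 3 (type) may be more informative than the grep maker filter in the quoted message. This is a sketch that assumes standard nine-column, tab-separated GFF3; gff_source_counts is a hypothetical helper name, and the source labels it reports depend entirely on the MAKER run.

```shell
#!/bin/sh
# Sketch: count features per (source, type) pair across all *.gff files
# below a directory (default: current directory).
# Output columns: count, source, type (tab-separated).
gff_source_counts() {
    find "${1:-.}" -type f -name '*.gff' -exec cat {} + |
        awk -F '\t' '
            /^#/ || NF != 9 { next }        # skip directives, comments, FASTA lines
            { n[$2 "\t" $3]++ }             # key on source + feature type
            END { for (k in n) print n[k] "\t" k }
        ' |
        sort -t "$(printf '\t')" -k2,2 -k3,3
}
```

Run from the directory that holds the MAKER output, for example as gff_source_counts . and compare the tallies of the two runs side by side.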