[maker-devel] MAKER
Carson Hinton Holt
carson.holt at genetics.utah.edu
Fri Sep 22 14:04:10 MDT 2017
MAKER won’t produce est2genome results for est_gff. This is partially because est2genome results are only used for training gene predictors. So you are essentially just getting protein2genome results from your runs. Once you get a gene predictor trained you will see a difference, as it will use the intron/exon structure of alignments as hints to improve gene predictor performance.
—Carson
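Since est_gff input does not yield est2genome models, one way to follow the advice quoted below (give MAKER a transcript FASTA through est=) is to splice the transcript sequences out of the genome using the transcript annotation. The following is a minimal sketch, not part of MAKER. It assumes the annotation is GFF3 with `exon` features that carry a `Parent` attribute (StringTie and Cufflinks write GTF by default, so a conversion may be needed first), and the function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def spliced_transcripts(gff_lines, genome):
    """Return {transcript_id: spliced sequence} built from GFF3 exon features."""
    exons = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # One exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    result = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        result[tid] = seq
    return result


def to_fasta(transcripts):
    """Format {id: sequence} as FASTA text."""
    return "".join(f">{tid}\n{seq}\n" for tid, seq in transcripts.items())
```

The text returned by `to_fasta` could be written to a file and passed to est= together with est2genome=1. Established tools such as gffread perform the same extraction and are likely the better choice for real data.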
> On Sep 21, 2017, at 1:57 AM, Keilwagen, Jens <jens.keilwagen at julius-kuehn.de> wrote:
>
> Hi Carson,
>
> I have tried the proposed options for a small example (yeast).
>
> I had
> - proteins (fasta) from another yeast and
> - transcript annotation (gff) from cufflinks and StringTie
>
> I'd like to compare the maker results for
> - proteins and StringTie
> Vs.
> - proteins and cufflinks
>
> I used the default options, except:
> genome=<genome fasta>
>
> protein=<protein fasta>
> est_gff=<transcript gff>
>
> est2genome=1
> protein2genome=1
>
> (An example is attached.)
>
> Then I ran maker:
>
> maker -RM_off -c 24
> find . -type f -name '*.gff' -exec cat {} + | grep maker > filtered-maker-prediction.gff
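The grep step above keeps every line that contains the string "maker" anywhere, including in attributes of evidence features. A stricter filter checks only the source column (column 2). The sketch below assumes that the final MAKER models carry `maker` in that column; the function name is hypothetical.

```python
from pathlib import Path


def maker_lines(root):
    """Yield GFF3 feature lines under root whose source column is 'maker'."""
    for path in sorted(Path(root).rglob("*.gff")):
        with path.open() as handle:
            for line in handle:
                cols = line.rstrip("\n").split("\t")
                # Directives, comments and FASTA lines do not have 9 columns.
                if len(cols) == 9 and cols[1] == "maker":
                    yield line.rstrip("\n")
```

The output file is best written outside `root` (or without a `.gff` suffix) so that it is not picked up as input on a later run.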
>
> (The run seems to be okay. There were no FAILED, ... in the log. Cf. attachment)
>
> Each maker run was started in a separate subdirectory.
> However, I realized that both maker runs yielded almost the same result (just one minor edit). This made me curious.
> As far as I understood the files, I received the (filtered?) exonerate predictions for the proteins (from the other yeast). Is this correct? Why did I not receive any predictions (purely) based on the RNA-seq data? Did I do something wrong?
>
> I'm looking forward to your reply.
>
> Best regards, Jens
>
>
>> -----Original Message-----
>> From: Carson Hinton Holt [mailto:carson.holt at genetics.utah.edu]
>> Sent: Tuesday, 19 September 2017 23:37
>> To: Keilwagen, Jens
>> Subject: Re: MAKER
>>
>> MAKER cannot use the BAM directly, but you can use something like
>> stringtie or trinity to assemble a transcript fasta that can be given
>> to the est= option.
>>
>> Ab initio gene prediction is only enabled if you specify an hmm or
>> species file to use. If all you want is homology based annotation, you
>> can try the est2genome and protein2genome options. Note the final
>> models may be partial if the alignments do not cover the gene end to
>> end.
>>
>> —Carson
>>
>>
>>
>>> On Sep 18, 2017, at 4:02 AM, Keilwagen, Jens <jens.keilwagen at julius-kuehn.de> wrote:
>>>
>>> Hi Carson,
>>>
>>> thanks a lot for your last email.
>>>
>>> I was asked to do homology-based gene prediction using RNA-seq, and
>>> MAKER was proposed as one option.
>>> Hence I'd like to ask how to do that in the best possible way.
>>> I have mapped RNA-seq data (SAM/BAM) and a FASTA of proteins from a
>>> related species. How can I integrate the RNA-seq data?
>>>
>>> Is it possible to deactivate ab initio gene prediction by Augustus or
>>> SNAP?
>>>
>>> Thanks a lot in advance.
>>>
>>> Best regards, Jens
>>>
>>>> -----Original Message-----
>>>> From: Carson Holt [mailto:carson.holt at genetics.utah.edu]
>>>> Sent: Thursday, 18 February 2016 19:03
>>>> To: Keilwagen, Jens
>>>> Cc: Mark Yandell
>>>> Subject: Re: MAKER
>>>>
>>>> GeMoMa sounds like an interesting tool. If it produces GFF3, you
>>>> could give the GFF3 results to the pred_gff= option in MAKER (comma
>>>> separated lists accepted). The GFF3 file of predictions must be in
>>>> the same coordinate space as the assembly being annotated (genome=
>>>> option).
>>>> Whatever you give to pred_gff will be treated as raw predictions by
>>>> MAKER and will only be accepted as a final model if there are
>>>> evidence alignments (protein/EST) that support the model. If
>>>> there are multiple alternate models at the same locus, only the
>>>> model that is best supported by the protein/transcript evidence is
>>>> kept.
>>>>
>>>> You can also set the keep_preds=1 option when using pred_gff. This
>>>> will cause even raw predictions with no evidence support to be
>>>> maintained.
>>>> In the event of multiple models with no evidence support, the model
>>>> best matching the consensus of alternate models will be maintained.
>>>>
>>>> Alternatively you can use the model_gff= option (comma separated
>>>> list ok) to input the GFF3 file. model_gff features are given higher
>>>> confidence than pred_gff. At least one model will always be kept
>>>> regardless of evidence support (same rules as pred_gff selection for
>>>> which model to keep when there are multiple). But model_gff will
>>>> also affect how evidence clusters are determined compared to
>>>> pred_gff (model_gff features are allowed to merge bridging evidence
>>>> clusters).
>>>> MAKER will also go to extra lengths to pull forward existing names
>>>> and other data in the GFF3 for model_gff features.
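The options discussed above (pred_gff, keep_preds, model_gff) are set in the MAKER control file as `key=value` lines, optionally followed by a `#` comment. As a minimal sketch, the hypothetical helper below fills in such values in control-file text while preserving the comments. The file names and comment text in the usage example are placeholders, not MAKER defaults.

```python
def set_ctl_options(ctl_text, options):
    """Return ctl_text with the given key=value options filled in.

    Trailing '#' comments are preserved. A KeyError is raised for any
    option that does not occur in the control file text.
    """
    remaining = dict(options)
    out = []
    for line in ctl_text.splitlines():
        setting, sep, comment = line.partition("#")
        key = setting.split("=", 1)[0].strip()
        if "=" in setting and key in remaining:
            line = f"{key}={remaining.pop(key)} {sep}{comment}".rstrip()
        out.append(line)
    if remaining:
        raise KeyError(f"options not found in control file: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Usage with placeholder file names: two GeMoMa runs given as a
# comma-separated list, keeping predictions without evidence support.
example = "pred_gff= #placeholder comment\nkeep_preds=0 #placeholder comment\n"
print(set_ctl_options(example, {"pred_gff": "ref1.gff,ref2.gff", "keep_preds": 1}))
```

Using model_gff instead of pred_gff would be the same call with a different key; which of the two is appropriate depends on how much confidence the input models deserve, as described above.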
>>>>
>>>> If you do not have GFF3 files in the right coordinate space, but do
>>>> have protein fasta or transcript fasta for the GeMoMa predictions,
>>>> you can supply these to the protein= and transcript= options in
>>>> MAKER together with est2genome=1 or protein2genome=1. This will
>>>> cause MAKER to place the models using exonerate. You would probably
>>>> also need to add est_forward=1 to the control files to have MAKER
>>>> try to derive model names from the names of the evidence alignments
>>>> they were derived from if you go this route.
>>>>
>>>> You can also try treating the GFF3 predictions as hints to
>>>> traditional ab initio gene finders like SNAP or Augustus by giving
>>>> them to the est_gff= or protein_gff= options (i.e. make GeMoMa
>>>> predictions inform the behavior of predictors like SNAP and
>>>> Augustus). Might be interesting. You would have to alter results to
>>>> be match/match_part GFF3 features to give them to the est_gff or
>>>> protein_gff options.
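The conversion to match/match_part features mentioned above could look roughly like the following sketch. It assumes the predictions use `mRNA` features for transcripts and `exon` features for their parts; both type names are parameters because the actual feature types in a given GeMoMa output may differ. The function name is hypothetical, and all other feature types (gene, CDS, ...) are dropped.

```python
def to_match_features(gff_lines, transcript_type="mRNA", part_type="exon"):
    """Convert transcript/part GFF3 lines into match/match_part lines."""
    out = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        if cols[2] == transcript_type:
            cols[2] = "match"
            # Drop the Parent link: the gene line it pointed to is not kept.
            attrs = [a for a in cols[8].split(";") if a and not a.startswith("Parent=")]
            cols[8] = ";".join(attrs)
        elif cols[2] == part_type:
            cols[2] = "match_part"
            cols[7] = "."  # phase is not meaningful for an alignment part
        else:
            continue
        out.append("\t".join(cols))
    return out
```

Each match_part keeps its Parent attribute, so it still points at the ID of the converted match line.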
>>>>
>>>> Let me know if you have any more questions, and I’ll do my best to
>>>> help.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>
>>>>
>>>>> On Feb 18, 2016, at 10:22 AM, Mark Yandell <myandell at genetics.utah.edu> wrote:
>>>>>
>>>>>
>>>>> Mark Yandell
>>>>> Professor of Human Genetics
>>>>> H.A. & Edna Benning Presidential Endowed Chair Co-director USTAR
>>>>> Center for Genetic Discovery Eccles Institute of Human Genetics
>>>>> University of Utah
>>>>> 15 North 2030 East, Room 2100
>>>>> Salt Lake City, UT 84112-5330
>>>>> ph:801-587-7707
>>>>>
>>>>>
>>>>> On 2/18/16, 8:34 AM, "Keilwagen, Jens" <jens.keilwagen at jki.bund.de> wrote:
>>>>>
>>>>>> Dear Prof. Yandell,
>>>>>>
>>>>>> we have published a homology-based gene prediction program today:
>>>>>> https://nar.oxfordjournals.org/content/early/2016/02/17/nar.gkw092
>>>>>> and I'd like to ask how we can use MAKER to combine predictions of
>>>>>> GeMoMa using different reference organisms, i.e. we try to predict
>>>>>> the genes of a target organism (e.g. wheat) using the annotated
>>>>>> genes of other reference organisms (e.g. grasses). GeMoMa returns
>>>>>> for each reference organism a GFF with the predicted gene models
>>>>>> in the target organism.
>>>>>>
>>>>>> It would be great if you or someone from your team could give us
>>>>>> some hints or point us to the correct paragraph in the
>>>>>> documentation.
>>>>>>
>>>>>> Thanks a lot and best regards, Jens
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> Dr. Jens Keilwagen
>>>>>>
>>>>>> Julius Kühn-Institut (JKI) - Federal Research Centre for
>>>>>> Cultivated Plants
>>>>>> Institute for Biosafety in Plant Biotechnology
>>>>>>
>>>>>> Erwin-Baur-Straße 27
>>>>>> 06484 Quedlinburg
>>>>>> Germany
>>>>>>
>>>>>> Phone: ++49 (0)3946 47 510
>>>>>> EMail: jens.keilwagen at jki.bund.de
>>>>>>
>>>>>>
>>>>>
>>>
>
> <maker_opts.ctl><slurm-278767.out>