[maker-devel] Mapping gene names
Shaun Jackman
sjackman at gmail.com
Wed May 14 15:07:52 MDT 2014
Hi, Carson. Perhaps MAKER could integrate
Barrnap <http://www.vicbioinformatics.com/software.barrnap.shtml> to
predict rRNA.
Cheers,
Shaun
On 4 March 2014 18:33, Carson Holt <carsonhh at gmail.com> wrote:
> Trying to call non-coding RNA from ESTs or even sequence homology is
> extremely messy (non-trivial problem in most organisms with high false
> positive rate), so MAKER for the most part doesn’t even try to do that. It
> focuses only on the coding genes. You can now use tRNAscan and snoscan in
> the newest version for some non-coding RNA support (those features were
> only added a couple of months ago). So just like other prediction tools
> (snap, augustus etc.), the primary focus has always been the coding genes.
> We’ve only started adding non-coding RNA support recently for iPlant, so
> it’s still relatively immature.
>
> Thanks,
> Carson
>
>
> From: Shaun Jackman <sjackman at gmail.com>
> Reply-To: Shaun Jackman <sjackman at gmail.com>
> Date: Tuesday, March 4, 2014 at 7:10 PM
>
> To: Carson Holt <carsonhh at gmail.com>
> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks
> for the tip.
>
> The rRNA genes that are found with est2genome have the feature type set to
> *mRNA* and have corresponding *five_prime_UTR*, *CDS* and
> *three_prime_UTR* features. Ideally the feature type would be set to
> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features.
> Is that a feature that you would be interested in adding to MAKER? The rRNA
> gene names all start with “rrn” and the tRNA gene names with “trn”, as is
> standard, so determining the appropriate type should be straightforward.
>
> Thanks again for your help with this. Cheers,
> Shaun
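
The retyping requested above could be approximated as a post-processing step on the GFF3 output. The following is a minimal sketch, not a MAKER feature. It assumes the gene name is carried in the `Name` attribute of each mRNA row and that child features point to it through `Parent`; the function names are hypothetical.

```python
# Feature types to drop beneath a retyped RNA (listed in the message above).
DROPPED_CHILD_TYPES = {"five_prime_UTR", "CDS", "three_prime_UTR"}


def parse_attrs(column9):
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if "=" in pair)


def rna_type(name):
    """Map a gene name to a feature type using the rrn/trn naming convention."""
    if name.startswith("rrn"):
        return "rRNA"
    if name.startswith("trn"):
        return "tRNA"
    return None


def retype_rna_features(lines):
    """Retype mRNA rows named rrn*/trn* and drop their UTR and CDS children."""
    rows = [line.rstrip("\n").split("\t") for line in lines]

    # First pass: find the mRNA IDs that should become rRNA or tRNA.
    new_type = {}
    for row in rows:
        if len(row) == 9 and row[2] == "mRNA":
            attrs = parse_attrs(row[8])
            kind = rna_type(attrs.get("Name", ""))  # assumption: name is in Name=
            if kind and "ID" in attrs:
                new_type[attrs["ID"]] = kind

    # Second pass: rewrite the type column and skip unwanted children.
    out = []
    for row in rows:
        if len(row) != 9:  # comments and directives pass through unchanged
            out.append("\t".join(row))
            continue
        attrs = parse_attrs(row[8])
        parents = attrs.get("Parent", "").split(",")
        if row[2] == "mRNA" and attrs.get("ID") in new_type:
            row = row[:2] + [new_type[attrs["ID"]]] + row[3:]
        elif row[2] in DROPPED_CHILD_TYPES and any(p in new_type for p in parents):
            continue
        out.append("\t".join(row))
    return out
```

Exon rows are kept as they are, and protein-coding mRNAs are left untouched because their names match neither prefix.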
>
>
> On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
>
>> Set single_exon=1, and the minimum size to a smaller value. I think it's
>> set to 250 right now. Also est2genome is looking for an ORF, so if there is
>> none (as with tRNAs) they probably won't get picked up.
>>
>> --Carson
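
To see which of the thresholds mentioned in this exchange would reject a given single-exon hit, the checks can be written out by hand. This is only a sketch under stated assumptions: it mirrors the option names from the thread (`single_exon`, `single_length`, `bit_blastn`, `eval_blastn`) but is not MAKER's own filtering code, the defaults shown are the values quoted in the thread or assumed, and the ORF requirement is not modelled. The function name and the hit length in the usage note are hypothetical.

```python
def failed_filters(hit_length, bits, evalue, *, single_exon=0,
                   single_length=250, bit_blastn=40, eval_blastn=1e-10):
    """Return the reasons a single-exon EST hit would be rejected.

    An empty list means the hit passes every check modelled here.
    """
    reasons = []
    if bits < bit_blastn:
        reasons.append(f"bit score {bits} < bit_blastn={bit_blastn}")
    if evalue > eval_blastn:
        reasons.append(f"E-value {evalue} > eval_blastn={eval_blastn}")
    if not single_exon:
        # With single_exon=0, single-exon EST alignments are not considered.
        reasons.append("single_exon=0 excludes single-exon alignments")
    elif hit_length < single_length:
        reasons.append(f"length {hit_length} < single_length={single_length}")
    return reasons
```

For example, `failed_filters(120, 242, 2e-66, single_exon=1)` reports only the `single_length` check for an illustrative 120 bp hit with the rrn5 scores quoted below, and passing `single_length=50` as well returns an empty list.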
>>
>> Sent from my iPhone
>>
>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>>
>> Sorry, ignore my previous question. est_forward also carries forward the
>> names of protein evidence and works like a charm. Thank you!
>>
>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>> rrn4.5 and rrn5 and tRNA genes didn’t make it into the all.gff file. They
>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>> these hits?
>>
>> organism_type=prokaryotic
>> est2genome=1
>> protein2genome=1
>> est_forward=1
>>
>> Cheers,
>> Shaun
>>
>>
>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>
>>> Is there a corresponding protein_forward=1 option to map forward protein
>>> names from protein2genome?
>>>
>>> Cheers,
>>> Shaun
>>>
>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>>> wrote:
>>>
>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>> passing the gff3 to model_gff.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>
>>> What you can do is run it once with just est_forward=1 and
>>> est2genome/protein2genome set to 1. Then take those results, pass them in
>>> as model_gff and use the map_forward option to then filter the results
>>> based on mRNA score and that would copy names onto the new genes under the
>>> standard MAKER pipeline. Eventually it’s really supposed to go into a
>>> separate tool that will map genes onto new assemblies (but under the hood
>>> the tool will just be calling MAKER with certain parameters restricted). I
>>> do this because if people commonly use it mixed with things like SNAP I can
>>> start to get some very weird behaviors.
>>>
>>> Thanks,
>>> Carson
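
The prefiltering step between the two runs could look roughly like the sketch below. It assumes the score in question is column 6 of each mRNA row in the first run's GFF3 and that a higher score is better; neither assumption is confirmed in the thread. The function names and the threshold are hypothetical.

```python
def parse_attrs(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if "=" in pair)


def prefilter_gff3(lines, min_score):
    """Keep mRNAs scoring at least min_score, their parent genes and children."""
    rows = [line.rstrip("\n").split("\t") for line in lines
            if line.strip() and not line.startswith("#")]
    rows = [row for row in rows if len(row) == 9]

    # First pass: decide which mRNAs (and therefore which genes) survive.
    kept_mrnas, kept_genes = set(), set()
    for row in rows:
        if row[2] == "mRNA" and row[5] != "." and float(row[5]) >= min_score:
            attrs = parse_attrs(row[8])
            if "ID" in attrs:
                kept_mrnas.add(attrs["ID"])
                kept_genes.update(p for p in attrs.get("Parent", "").split(",") if p)

    # Second pass: emit genes, mRNAs and child features that belong to them.
    out = []
    for row in rows:
        attrs = parse_attrs(row[8])
        if row[2] == "gene":
            keep = attrs.get("ID") in kept_genes
        elif row[2] == "mRNA":
            keep = attrs.get("ID") in kept_mrnas
        else:
            keep = any(p in kept_mrnas for p in attrs.get("Parent", "").split(","))
        if keep:
            out.append("\t".join(row))
    return out
```

The filtered lines would then be written to a file and supplied as `model_gff` for the second run with `map_forward` enabled.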
>>>
>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>> To: Carson Holt <carsonhh at gmail.com>
>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> It seems that this could be a very useful option in those cases where
>>> you have firm a priori knowledge of the placement of ESTs. However, while
>>> trying it I note that est_forward implies that the est2genome predictor is
>>> turned on, implicitly. Is this necessary for this to work? I’m after the
>>> behavior you describe below where exonerate is made to try really hard
>>> within a limited region to align an EST, but I would not like MAKER to
>>> produce est2genome predictions.
>>>
>>> In general, I think maker_coor and est_forward form a feature set
>>> that is worth promoting to a documented feature.
>>>
>>> Thanks,
>>> Mikael
>>>
>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>
>>> It will still work without est_forward. It just works a little
>>> differently. Keep in mind this was a hidden feature I used to find
>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>
>>> If est_forward is provided, MAKER will parse the database to look for
>>> the maker_coor tags early in the pipeline. Then it will create a list of
>>> locations to search, and it will search them even if there are no BLAST
>>> results to seed the search (normally MAKER gets a BLAST result first and
>>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to
>>> look for a match using all of chr1 as the input to exonerate even when
>>> BLAST finds nothing (this is a very very slow search, but can help pick up
>>> one or two stubborn genes that don’t remap well). To allow this, MAKER
>>> gives exonerate looser matching parameters (i.e. allows for single base
>>> pair introns perhaps caused by assembly errors). The logic here is that
>>> given the fact that I already told MAKER that with some degree of
>>> confidence I expect sequence A to map to location X, it will try its
>>> hardest to make it match.
>>>
>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>> at line 1563, but only after a BLAST alignment has already seeded it to the
>>> region (that BLAST result has the information in its description
>>> parameter). MAKER will then ignore seeds completely outside of maker_coor.
>>> In addition any BLAST seeds that overlap maker_coor will get the search
>>> space for alignment polishing adjusted to match maker_coor exactly. Also
>>> match parameters for exonerate will not be relaxed as they were with
>>> est_forward.
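
The two behaviors described above can be summarized as a small decision function. This is a sketch of the logic as explained in the message, not a transcription of GI.pm. Regions are hypothetical `(contig, start, end)` tuples with 1-based inclusive coordinates, and `maker_coor` is assumed to carry an explicit range.

```python
def overlaps(seed, coor):
    """True if two (contig, start, end) regions share at least one base."""
    return seed[0] == coor[0] and seed[1] <= coor[2] and seed[2] >= coor[1]


def region_to_polish(seeds, coor, est_forward=False):
    """Return the region to hand to exonerate, or None if nothing is searched.

    seeds: BLAST hit regions for one sequence
    coor:  the region from that sequence's maker_coor tag
    """
    if est_forward:
        # The tagged region is searched even when BLAST found nothing.
        return coor
    if any(overlaps(seed, coor) for seed in seeds):
        # An overlapping seed has its search space set to maker_coor exactly.
        return coor
    # Seeds completely outside maker_coor are ignored.
    return None
```

The relaxed exonerate parameters used with est_forward are not represented here; the sketch covers only which region ends up being searched.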
>>>
>>> As you can see, the behavior is slightly different (because it’s an
>>> accidental feature).
>>>
>>> Thanks,
>>> Carson
>>>
>>>
>>>
>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>> To: Carson Holt <carsonhh at gmail.com>
>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> That might be a useful and time saving accidental feature. But, reading
>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>> well as the configuration option est_forward for this to work. Any
>>> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
>>> right?
>>>
>>> Mikael
>>>
>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>
>>> Yes. That should work as well as an accidental feature.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>>> mikael.durling at slu.se> wrote:
>>>
>>> Can this use of maker_coor be used only to hint about the placement of
>>> the ESTs, without affecting the naming of the final genes? I.e., if I have a
>>> database of ESTs where I have a priori knowledge of their rough placement,
>>> can this placement be given to maker without providing est_forward=1?
>>>
>>> Thanks,
>>> Mikael
>>>
>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>
>>> There is a way. It’s not a standard option and it’s undocumented, but
>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>> that. The option won’t already be there so you’ll have to type it in.
>>>
>>> There is also a feature designed to work with this option. If you add
>>> tags to your fasta headers, those can be used to guide the mapping and
>>> naming. For example, gene_id=<some_gene> will ensure different isoforms
>>> that share a common gene_id get clustered into the same gene,
>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>> sequence to only be mapped against chr1 within the range of 1-10000 bp and
>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
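
For checking that headers carry the tags as intended, a small parser can help. This is a minimal sketch: the tag names `gene_id` and `maker_coor` come from the message above, while the assumption that tags are whitespace-separated `key=value` pairs in the header description, and the function names, are mine.

```python
import re

# Tag names are from the thread; the parsing rules are an assumption.
TAG_RE = re.compile(r"\b(gene_id|maker_coor)=(\S+)")
COOR_RE = re.compile(r"^([^:]+)(?::(\d+)-(\d+))?$")


def parse_header(header):
    """Split a FASTA header into (sequence id, {tag: value})."""
    fields = header.lstrip(">").split(None, 1)
    seq_id = fields[0]
    description = fields[1] if len(fields) > 1 else ""
    return seq_id, dict(TAG_RE.findall(description))


def parse_maker_coor(value):
    """'chr1:1-10000' -> ('chr1', 1, 10000); 'chr1' -> ('chr1', None, None)."""
    match = COOR_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized maker_coor value: {value!r}")
    contig, start, end = match.groups()
    if start is None:
        return contig, None, None
    return contig, int(start), int(end)
```

A header such as `>rrn16 gene_id=rrn16 maker_coor=chr1:1-10000` would yield the id `rrn16` and both tags, and the coordinate string can then be split into contig, start and end.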
>>>
>>> This is an undocumented way to remap genes onto new assemblies using
>>> blast alignments of earlier transcript or protein annotations as a guide.
>>>
>>> —Carson
>>>
>>>
>>>
>>>
>>> From: Shaun Jackman <sjackman at gmail.com>
>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To: <maker-devel at yandell-lab.org>
>>> Subject: [maker-devel] Mapping gene names
>>>
>>> Hi,
>>>
>>> I’m annotating a genome using a closely related genome from Genbank,
>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>>> annotate my genome. I’ve run Maker, and the annotation seems to have worked
>>> well. Is it possible to map the names of the genes from the related species
>>> to my annotation? I see the *map_forward* option, which applies to the
>>> *model_gff* parameter. Is there a similar option for *est* and *protein*
>>> ?
>>>
>>> *maker_opts.ctl*
>>>
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>>
>>> Thanks,
>>> Shaun
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>