[maker-devel] Mapping gene names

Carson Holt carsonhh at gmail.com
Wed May 14 18:19:43 MDT 2014


I believe that is already fixed in the current download.  It came up on the
mailing list a couple of weeks ago.  I'll check.

--Carson


From:  Shaun Jackman <sjackman at gmail.com>
Reply-To:  Shaun Jackman <sjackman at gmail.com>
Date:  Wednesday, May 14, 2014 at 6:06 PM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

Hi, Carson. I used other_gff to pass the following four-line GFF file of
Barrnap rRNA annotations through. The output of gff3_merge is quite bizarre.
See below.

Input:
##gff-version 3
200408_86    barrnap:0.4    rRNA    2171785    2173036    .    +    .    Name=12S_rRNA;product=12S ribosomal RNA
200408_86    barrnap:0.4    rRNA    3665772    3666686    .    -    .    Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA
200408_86    barrnap:0.4    rRNA    3826637    3827887    .    -    .    Name=12S_rRNA;product=12S ribosomal RNA
200408_86    barrnap:0.4    rRNA    4355857    4357119    .    +    .    Name=12S_rRNA;product=12S ribosomal RNA
Output:
###
ARRAY(0x7feceb928780)
###
ARRAY(0x7feceaa548a0)
###
ARRAY(0x7feceeb01c60)
###
ARRAY(0x7fecedf6fef8)
###
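
For reference, ARRAY(0x...) is the string Perl produces when an array reference is printed directly, so the lines above look like unformatted internal data rather than GFF3 records. Separately, the GFF3 format requires nine tab-separated columns, and the spaces in the listing above may only be an artifact of the email. A minimal Python sketch for sanity-checking a passthrough file before giving it to other_gff is below. The function name is hypothetical, and the sketch does not reproduce or diagnose gff3_merge itself.

```python
def parse_gff3_line(line):
    """Split one GFF3 record into its nine columns plus an attribute dict.

    Returns None for blank lines, comments, and directives (lines starting with '#').
    Raises ValueError when a record does not have exactly nine tab-separated columns.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "phase": phase,
        "attributes": attributes,
    }


# First record from the input above, written with real tab separators
record = parse_gff3_line(
    "200408_86\tbarrnap:0.4\trRNA\t2171785\t2173036\t.\t+\t.\t"
    "Name=12S_rRNA;product=12S ribosomal RNA"
)
print(record["type"], record["start"], record["end"], record["attributes"]["Name"])
```

A record that was pasted with spaces instead of tabs raises ValueError here, which is one quick way to rule out a formatting problem in the input.
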
Cheers,
Shaun


http://sjackman.ca


On 14 May 2014 14:18, Carson Holt <carsonhh at gmail.com> wrote:
> Thanks.  Looks interesting. Also since output is already GFF3, you could
> probably just use it with gff passthrough.  It doesn't appear to support
> eukaryotes though.
> 
> --Carson
> 
> 
> Sent from my iPhone
> 
> On May 14, 2014, at 3:07 PM, Shaun Jackman <sjackman at gmail.com> wrote:
> 
>> Hi, Carson. Perhaps MAKER could integrate Barrnap
>> <http://www.vicbioinformatics.com/software.barrnap.shtml>  to predict rRNA.
>> 
>> Cheers,
>> Shaun
>> 
>> 
>> On 4 March 2014 18:33, Carson Holt <carsonhh at gmail.com> wrote:
>>> Trying to call non-coding RNA from ESTs or even sequence homology is
>>> extremely messy (non-trivial problem in most organisms with high false
>>> positive rate), so MAKER for the most part doesn’t even try to do that.  It
>>> focuses only on the coding genes.  You can now use tRNAscan and snoscan in
>>> the newest version for some non-coding RNA support (those features were only
>>> added a couple of months ago).  So just like other prediction tools (snap,
>>> augustus etc.), the primary focus has always been the coding genes.  We’ve
>>> only started adding non-coding RNA support recently for iPlant, so it’s
>>> still relatively immature.
>>> 
>>> Thanks,
>>> Carson
>>> 
>>> 
>>> From:  Shaun Jackman <sjackman at gmail.com>
>>> Reply-To:  Shaun Jackman <sjackman at gmail.com>
>>> Date:  Tuesday, March 4, 2014 at 7:10 PM
>>> 
>>> To:  Carson Holt <carsonhh at gmail.com>
>>> Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>> Subject:  Re: [maker-devel] Mapping gene names
>>> 
>>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
>>> the tip.
>>> 
>>> The rRNA genes that are found with est2genome have the feature type set to
>>> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR
>>> features. Ideally the feature type would be set to rRNA or tRNA as
>>> appropriate, and would omit the UTR and CDS features. Is that a feature that
>>> you would be interested in adding to MAKER? The rRNA gene names all start
>>> with “rrn” and the tRNA gene names with “trn”, as is standard, so
>>> determining the appropriate type should be straightforward.
>>> 
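
The prefix rule proposed above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not something MAKER does; the function name and the example gene names trnA and rpoB are assumptions chosen for the demonstration.

```python
def rna_feature_type(gene_name):
    """Guess a GFF3 feature type from a gene name prefix.

    'rrn' -> rRNA and 'trn' -> tRNA, following the naming convention
    described above; anything else is left as mRNA.
    """
    name = gene_name.lower()
    if name.startswith("rrn"):
        return "rRNA"
    if name.startswith("trn"):
        return "tRNA"
    return "mRNA"


# rrn16 and rrn4.5 appear in this thread; trnA and rpoB are made-up examples
for name in ["rrn16", "rrn4.5", "trnA", "rpoB"]:
    print(name, rna_feature_type(name))
```

A post-processing step could use such a function to rewrite the type column and drop the UTR and CDS children of any model that is not mRNA.
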
>>> Thanks again for your help with this. Cheers,
>>> Shaun
>>> 
>>> 
>>> 
>>> On 27 February 2014 17:13, Carson Holt <carsonhh at gmail.com> wrote:
>>>> Set single_exon=1, and lower the minimum size (single_length) to a smaller
>>>> value.  I think it's set to 250 right now.  Also, est2genome looks for an
>>>> ORF, so sequences without one (such as tRNAs) probably won't get picked up.
>>>> 
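
To see which threshold a given hit might be failing, the checks can be written out explicitly. The Python sketch below is an assumption about how the options named in this thread (bit_blastn, eval_blastn, single_exon, single_length) combine; it is not MAKER's actual filtering code, and the exact comparison operators are guesses.

```python
def failed_filters(hit, opts):
    """Return the names of the thresholds a hit does not meet.

    hit:  dict with 'bits', 'evalue', 'length' (bp) and 'exons'
    opts: dict with bit_blastn, eval_blastn, single_exon, single_length
    """
    failed = []
    if hit["bits"] < opts["bit_blastn"]:
        failed.append("bit_blastn")
    if hit["evalue"] > opts["eval_blastn"]:
        failed.append("eval_blastn")
    if hit["exons"] == 1:
        if not opts["single_exon"]:
            failed.append("single_exon")
        elif hit["length"] < opts["single_length"]:
            failed.append("single_length")
    return failed


# Bits and E-value are the rrn5 figures quoted below; the 120 bp length is an
# assumed value for illustration, and 250 is the size recalled above.
rrn5 = {"bits": 242, "evalue": 2e-66, "length": 120, "exons": 1}
opts = {"bit_blastn": 40, "eval_blastn": 1e-10, "single_exon": 1, "single_length": 250}

print(failed_filters(rrn5, opts))                           # ['single_length']
print(failed_filters(rrn5, {**opts, "single_length": 50}))  # []
```

Under these assumptions a short single-exon hit passes the BLAST thresholds and is rejected only on length, which is consistent with the advice to lower the minimum size.
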
>>>> --Carson 
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman <sjackman at gmail.com> wrote:
>>>> 
>>>>> Sorry, ignore my previous question. est_forward also carries forward the
>>>>> names of protein evidence and works like a charm. Thank you!
>>>>> 
>>>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>>>>> rrn4.5 and rrn5 and tRNA genes didn’t make it into the all.gff file. They
>>>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>>>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>>>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>>>>> these hits?  
>>>>> organism_type=prokaryotic
>>>>> est2genome=1
>>>>> protein2genome=1
>>>>> est_forward=1
>>>>> Cheers,
>>>>> Shaun
>>>>> 
>>>>> 
>>>>> 
>>>>> On 27 February 2014 15:17, Shaun Jackman <sjackman at gmail.com> wrote:
>>>>>> Is there a corresponding protein_forward=1 option to map forward protein
>>>>>> names from protein2genome?
>>>>>>  
>>>>>> 
>>>>>> Cheers,
>>>>>> Shaun
>>>>>> 
>>>>>> 
>>>>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
>>>>>>  
>>>>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>>>>> passing the gff3 to model_gff.
>>>>>>> 
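
One way to do such a prefilter is sketched below in Python. It assumes that "the score in the mRNA column" means the score column (column 6) of mRNA records, and that child features point to their transcript through a Parent attribute. The function name, the threshold, and the example records are all made up for illustration.

```python
def filter_mrna_by_score(lines, min_score):
    """Drop mRNA records scoring below min_score, plus features whose Parent is a dropped mRNA."""
    def attrs(col9):
        # Turn "ID=m1;Parent=g1" into {"ID": "m1", "Parent": "g1"}
        return dict(p.partition("=")[::2] for p in col9.split(";") if p)

    records = [line.rstrip("\n") for line in lines]

    # First pass: collect the IDs of low-scoring mRNA records
    dropped = set()
    for rec in records:
        cols = rec.split("\t")
        if len(cols) == 9 and cols[2] == "mRNA" and cols[5] != "." and float(cols[5]) < min_score:
            mrna_id = attrs(cols[8]).get("ID")
            if mrna_id:
                dropped.add(mrna_id)

    # Second pass: keep everything except those mRNAs and their children
    kept = []
    for rec in records:
        cols = rec.split("\t")
        if len(cols) == 9:
            a = attrs(cols[8])
            parents = a.get("Parent", "").split(",")
            if a.get("ID") in dropped or any(p in dropped for p in parents):
                continue
        kept.append(rec)
    return kept


gff = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1\t900\t0.2\t+\t.\tID=m1;Parent=g1",
    "chr1\tmaker\texon\t1\t900\t.\t+\t.\tID=e1;Parent=m1",
    "chr1\tmaker\tmRNA\t1\t900\t0.9\t+\t.\tID=m2;Parent=g1",
]
for line in filter_mrna_by_score(gff, min_score=0.5):
    print(line)
```

Genes left without any transcript are not removed by this sketch, so a real prefilter would need one more pass for them.
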
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt <carsonhh at gmail.com> wrote:
>>>>>>> 
>>>>>>> What you can do is run it once with just est_forward=1 and
>>>>>>> est2genome/protein2genome set to 1.  Then take those results, pass them
>>>>>>> in as model_gff and use the map_forward option to then filter the
>>>>>>> results based on mRNA score and that would copy names onto new genes
>>>>>>> under the standard MAKER pipeline.  Eventually it’s really supposed to
>>>>>>> go into a separate tool that will map genes onto new assemblies (but
>>>>>>> under the hood the tool will just be calling MAKER with certain
>>>>>>> parameters restricted).  I do this because if people commonly use it
>>>>>>> mixed with things like SNAP I can start to get some very weird
>>>>>>> behaviors. 
>>>>>>> 
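
The two runs described above differ only in a handful of control-file options. The Python sketch below simply prints those options in key=value form. The option names come from this thread; the file name pass1_filtered.gff is a placeholder, and every other setting in maker_opts.ctl is left out.

```python
def render_ctl(options):
    """Render a dict of options as key=value lines in maker_opts.ctl style."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


# Pass 1: carry evidence names forward onto est2genome/protein2genome models
pass1 = {"est_forward": 1, "est2genome": 1, "protein2genome": 1}

# Pass 2: feed the pass 1 models back in and map their names forward.
# "pass1_filtered.gff" is a placeholder file name.
pass2 = {"model_gff": "pass1_filtered.gff", "map_forward": 1}

print(render_ctl(pass1))
print(render_ctl(pass2))
```
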
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> It seems that this could be a very useful option in those cases where
>>>>>>> you have firm a priori knowledge of the placement of ESTs. However,
>>>>>>> while trying it I note that est_forward implies that the est2genome
>>>>>>> predictor is turned on, implicitly. Is this necessary for this to work?
>>>>>>> I’m after the behavior you describe below where exonerate is made to try
>>>>>>> really hard within a limited region to align an est, but I would not
>>>>>>> like maker to produce est2genome predictions.
>>>>>>> 
>>>>>>> In general, I think maker_coor and est_forward make up a feature set
>>>>>>> that is worth promoting to a documented feature.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> It will still work without est_forward.  It just works a little
>>>>>>> differently.  Keep in mind this was a hidden feature I used to find
>>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>>> 
>>>>>>> If est_forward is provided, MAKER will parse the database to look for
>>>>>>> the maker_coor tags early in the pipeline.  Then it will create a list
>>>>>>> of locations to search, and it will search them even if there are no
>>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result
>>>>>>> first and then polishes it with exonerate).  So maker_coor=chr1 will
>>>>>>> cause MAKER to look for a match using all of chr1 as the input to
>>>>>>> exonerate even when BLAST finds nothing (this is a very very slow
>>>>>>> search, but can help pick up one or two stubborn genes that don’t remap
>>>>>>> well).  To allow this, MAKER gives exonerate looser matching parameters
>>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly
>>>>>>> errors).  The logic here is that given the fact that I already told
>>>>>>> MAKER that with some degree of confidence I expect sequence A to map to
>>>>>>> location X, it will try its hardest to make it match.
>>>>>>> 
>>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to
>>>>>>> the region (that BLAST result has the information in its description
>>>>>>> parameter).  MAKER will then ignore seeds completely outside of
>>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>>> exactly.  Also match parameters for exonerate will not be relaxed as
>>>>>>> they were with est_forward.
>>>>>>> 
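
The seed handling described above for the case without est_forward amounts to a simple interval test. The Python sketch below illustrates that description only; it is not taken from GI.pm, the function name is hypothetical, and it covers only the maker_coor form that includes a range.

```python
def adjust_seed(seed, coor):
    """Apply a maker_coor window to a BLAST seed, as described above.

    seed and coor are (seqid, start, end) tuples with 1-based inclusive
    coordinates. Returns None when the seed lies completely outside the
    window; otherwise returns the window itself as the region to polish.
    """
    seed_id, seed_start, seed_end = seed
    coor_id, coor_start, coor_end = coor
    if seed_id != coor_id or seed_end < coor_start or seed_start > coor_end:
        return None  # seed is ignored
    return coor      # search space is set to maker_coor exactly


window = ("chr1", 1, 10000)
print(adjust_seed(("chr1", 9500, 10800), window))   # overlaps: ('chr1', 1, 10000)
print(adjust_seed(("chr1", 20000, 21000), window))  # outside: None
print(adjust_seed(("chr2", 100, 900), window))      # other sequence: None
```
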
>>>>>>> As you can see, the behavior is slightly different (because it’s an
>>>>>>> accidental feature).
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Mikael Brandström Durling <mikael.durling at slu.se>
>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>> To: Carson Holt <carsonhh at gmail.com>
>>>>>>> Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>>>> est_forward=1, right?
>>>>>>> 
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> Yes.  That should work as well, as an accidental feature.
>>>>>>> 
>>>>>>> --Carson 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>>>>>> <mikael.durling at slu.se> wrote:
>>>>>>> 
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ESTs, without affecting the naming of the final genes? I.e., if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>> 
>>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>>>>>>> 
>>>>>>> There is a way.  It’s not a standard option and it’s undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that.  The option won’t already be there so you’ll have to type it
>>>>>>> in.
>>>>>>> 
>>>>>>> There is also a feature designed to work with this option.  If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.  For example, gene_id=<some_gene>  will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>> 
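
Adding these tags to an existing FASTA file is easy to script. The Python sketch below assumes the tags are plain key=value words appended to the header after the sequence name, which the description above suggests but does not spell out. The function names and the example header are hypothetical.

```python
import re


def tag_header(header, gene_id=None, maker_coor=None):
    """Append gene_id= and/or maker_coor= tags to a FASTA header line."""
    tags = []
    if gene_id:
        tags.append(f"gene_id={gene_id}")
    if maker_coor:
        tags.append(f"maker_coor={maker_coor}")
    return " ".join([header.rstrip()] + tags)


def parse_maker_coor(header):
    """Return (seqid, start, end) from a maker_coor= tag; start/end are None without a range."""
    match = re.search(r"maker_coor=([^\s:]+)(?::(\d+)-(\d+))?", header)
    if match is None:
        return None
    seqid, start, end = match.groups()
    return (seqid, int(start), int(end)) if start else (seqid, None, None)


header = tag_header(">rrn16_isoform1", gene_id="rrn16", maker_coor="chr1:1-10000")
print(header)
print(parse_maker_coor(header))
print(parse_maker_coor(">rrn5 maker_coor=chr1"))
```

The parser assumes sequence names contain no colon; names that do would need a different split rule.
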
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>> 
>>>>>>> —Carson
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To: <maker-devel at yandell-lab.org>
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I’m annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I’ve run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>> 
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>> Thanks,
>>>>>>> Shaun
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>> 
>>> 
>> 


