[maker-devel] Mapping gene names

Carson Holt carsonhh at gmail.com
Wed Feb 26 09:09:14 MST 2014


It will still work without est_forward.  It just works a little differently.
Keep in mind this was a hidden feature I used to find stubborn or
hard-to-find missing genes after reassembly of a genome.

If est_forward is provided, MAKER will parse the database to look for the
maker_coor tags early in the pipeline.  Then it will create a list of
locations to search, and it will search them even if there are no BLAST
results to seed the search (normally MAKER gets a BLAST result first and
then polishes it with exonerate).  So maker_coor=chr1 will cause MAKER to
look for a match using all of chr1 as the input to exonerate even when BLAST
finds nothing (this is a very slow search, but it can help pick up one or
two stubborn genes that don’t remap well).  To allow this, MAKER gives
exonerate looser matching parameters (i.e. it allows for single base pair
introns, perhaps caused by assembly errors).  The logic here is that, since
I have already told MAKER that I expect sequence A to map to location X
with some degree of confidence, it will try its hardest to make
it match.
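
As a rough illustration of that first step (collecting maker_coor tags into
a list of locations to search), here is a minimal Python sketch. It is not
MAKER's actual code, which is Perl. The names parse_maker_coor and
search_locations are hypothetical, and the sketch assumes the tags are
separated from the rest of the fasta header by whitespace.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional


class MakerCoor(NamedTuple):
    seqid: str
    start: Optional[int]  # 1-based, inclusive; None means the whole sequence
    end: Optional[int]


# Assumed tag syntax: maker_coor=<seqid> or maker_coor=<seqid>:<start>-<end>
_COOR_RE = re.compile(r"maker_coor=([^\s:;]+)(?::(\d+)-(\d+))?")


def parse_maker_coor(header: str) -> Optional[MakerCoor]:
    """Return the maker_coor tag found in a fasta header, or None if absent."""
    m = _COOR_RE.search(header)
    if m is None:
        return None
    seqid, start, end = m.groups()
    if start is None:
        return MakerCoor(seqid, None, None)
    return MakerCoor(seqid, int(start), int(end))


def search_locations(headers: Iterable[str]) -> Dict[str, MakerCoor]:
    """Map each tagged sequence name to the location it should be searched in."""
    locations = {}
    for header in headers:
        coor = parse_maker_coor(header)
        if coor is not None:
            # The sequence name is the first whitespace-delimited token.
            name = header.lstrip(">").split()[0]
            locations[name] = coor
    return locations
```

With headers such as ">t1 gene_id=g1 maker_coor=chr1:1-10000" and
">t2 maker_coor=chr1", this yields one entry per tagged sequence, and
untagged sequences are simply left out of the list.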

Without est_forward set, the maker_coor= flag still gets read in GI.pm at
line 1563, but only after a BLAST alignment has already seeded the search
in that region (the BLAST result carries the tag in its description
parameter).  MAKER will then ignore seeds completely outside of maker_coor.
In addition, any BLAST seeds that overlap maker_coor will get the search
space for alignment polishing adjusted to match maker_coor exactly.  Also,
match parameters for exonerate will not be relaxed as they are with
est_forward.
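
The seed handling described above can be sketched roughly as follows. This
is only an illustration in Python under simplifying assumptions (a single
contig, 1-based inclusive coordinates), not the logic as written in GI.pm.
The function name filter_seeds is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def filter_seeds(
    seeds: List[Interval], coor: Interval
) -> List[Tuple[Interval, Interval]]:
    """Pair each surviving BLAST seed with its polishing search space.

    Seeds completely outside coor are dropped.  Seeds that overlap coor
    are kept, and their search space is set to coor exactly.
    """
    c_start, c_end = coor
    kept = []
    for seed in seeds:
        s_start, s_end = seed
        if s_end < c_start or s_start > c_end:
            continue  # completely outside maker_coor: ignore this seed
        kept.append((seed, (c_start, c_end)))
    return kept
```

For example, with maker_coor covering 1-10000, a seed at 9000-12000 is kept
and polished over 1-10000, while a seed at 20000-20500 is ignored.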

As you can see, the behavior is slightly different (because it’s an
accidental feature).

Thanks,
Carson



From:  Mikael Brandström Durling <mikael.durling at slu.se>
Date:  Wednesday, February 26, 2014 at 6:37 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] Mapping gene names

That might be a useful and time saving accidental feature. But, reading the
code, it seems that I need to supply maker_coor but not gene_id, as well as
the configuration option est_forward for this to work. Any occurrences of
maker_coor in GI.pm seem to be conditioned on est_forward=1, right?

Mikael

26 feb 2014 kl. 14:22 skrev Carson Holt <carsonhh at gmail.com>:

> Yes.  That should work as well as an accidental feature.
> 
> --Carson 
> 
> Sent from my iPhone
> 
> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se>
> wrote:
> 
>> Can this use of maker_coor be used only to hint at the placement of the
>> ESTs, without affecting the naming of the final genes? That is, if I have a
>> database of ESTs where I have a priori knowledge of their rough placement,
>> can this placement be given to MAKER without providing est_forward=1?
>> 
>> Thanks,
>> Mikael
>> 
>> 26 feb 2014 kl. 01:58 skrev Carson Holt <carsonhh at gmail.com>:
>> 
>>> There is a way.  It’s not a standard option and it’s undocumented, but if
>>> you add est_forward=1 to the maker_opts.ctl file, then it will do just that.
>>> The option won’t already be there so you’ll have to type it in.
>>> 
>>> There is also a feature designed to work with this option.  If you add tags
>>> to your fasta headers, those can be used to guide the mapping and naming.
>>> For example, gene_id=<some_gene>  will ensure different isoforms that share
>>> a common gene_id get clustered into the same gene, and
>>> maker_coor=chr1:1-10000 in the fasta header will force a particular sequence
>>> to only be mapped against chr1 within the range of 1-10000 bp, and just
>>> using maker_coor=chr1 will force it to only be mapped against chr1.
>>> 
>>> This is an undocumented way to remap genes onto new assemblies using blast
>>> alignments of earlier transcript or protein annotations as a guide.
>>> 
>>> —Carson
>>> 
>>> 
>>> 
>>> 
>>> From: Shaun Jackman <sjackman at gmail.com>
>>> Reply-To: Shaun Jackman <sjackman at gmail.com>
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To: <maker-devel at yandell-lab.org>
>>> Subject: [maker-devel] Mapping gene names
>>> 
>>> Hi,
>>> 
>>> I’m annotating a genome using a closely related genome from Genbank, using
>>> the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate
>>> my genome. I’ve run Maker, and the annotation seems to have worked well. Is
>>> it possible to map the names of the genes from the related species to my
>>> annotation? I see the map_forward option, which applies to the model_gff
>>> parameter. Is there a similar option for est and protein?
>>> 
>>> maker_opts.ctl:
>>> 
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>> 
>>> Thanks,
>>> Shaun
>>> _______________________________________________ maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>> 


