[maker-devel] How to preserve human-friendly IDs when reannotating
Carson Holt
carsonhh at gmail.com
Mon Sep 10 05:01:46 MDT 2012
The map_forward option requires that the pass option for the gene models
(model_pass) be turned on. Otherwise you will have to do some spatial
overlap test outside of MAKER.
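
As a rough illustration of what a spatial overlap test outside of MAKER
could look like, here is a minimal Python sketch. It is not part of MAKER.
The function names (parse_genes, map_ids_by_overlap) and the 50% reciprocal
overlap threshold are assumptions made for this example, and it assumes both
annotation sets are GFF3 files on the same assembly with an ID attribute on
each gene feature.

```python
from collections import defaultdict


def parse_genes(gff_lines):
    """Return (seqid, start, end, strand, gene_id) for each gene feature."""
    genes = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # GFF3 files may end with an embedded FASTA section
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes.append((cols[0], int(cols[3]), int(cols[4]), cols[6],
                      attrs.get("ID", "")))
    return genes


def map_ids_by_overlap(old_genes, new_genes, min_frac=0.5):
    """Map new gene ID -> old gene ID by best same-strand overlap.

    min_frac is a hypothetical threshold: the overlap must cover at least
    this fraction of BOTH the old and the new gene.
    """
    by_locus = defaultdict(list)
    for seqid, start, end, strand, gene_id in old_genes:
        by_locus[(seqid, strand)].append((start, end, gene_id))

    mapping = {}
    for seqid, start, end, strand, gene_id in new_genes:
        best_id, best_overlap = None, 0
        for o_start, o_end, o_id in by_locus.get((seqid, strand), []):
            overlap = min(end, o_end) - max(start, o_start) + 1
            if overlap <= best_overlap:
                continue
            if (overlap >= min_frac * (end - start + 1)
                    and overlap >= min_frac * (o_end - o_start + 1)):
                best_id, best_overlap = o_id, overlap
        if best_id is not None:
            mapping[gene_id] = best_id
    return mapping
```

The sketch does not enforce a one-to-one result: two new genes can map to
the same old gene (for example after a split), so the mapping should be
checked for repeated values before any names are copied over.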
If you have a new assembly, you can try mapping the old models onto the new
assembly by using the old transcripts as input to the est= option and
setting est2genome=1 (with nothing else set, i.e. no repeat masking, etc.).
There is also an undocumented option that is still a little buggy (which is
why it is still undocumented): add the line est_forward=1 to your control
files. This tells MAKER to copy names from the ESTs, build the models
directly from their alignments, and do other things to try to make a 1-to-1
match across the genome. You will have to check manually that the result
really is 1 to 1 (as I said, the option is still a little buggy). Use the
resulting file as input to the model_gff option on a separate run with
map_forward=1 for additional reannotation with more evidence, etc., where
you still want to be able to map names forward.
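
The manual 1-to-1 check mentioned above could be partly automated. The
following Python sketch is only an illustration: the function names are made
up for this example, and it assumes that each mRNA feature in the resulting
GFF3 carries the forwarded transcript name in its Name attribute, which
should be verified against the actual output before relying on it.

```python
from collections import Counter


def names_of(gff_lines, feature_type="mRNA", attr="Name"):
    """Collect one attribute value per feature of the given type."""
    names = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # stop at an embedded FASTA section, if present
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if attr in attrs:
            names.append(attrs[attr])
    return names


def check_one_to_one(expected_names, gff_lines):
    """Return (missing, duplicated, unexpected) name lists.

    expected_names: the transcript names that were given to MAKER as ESTs.
    """
    counts = Counter(names_of(gff_lines))
    expected = set(expected_names)
    missing = sorted(expected - set(counts))             # never placed
    duplicated = sorted(n for n, c in counts.items() if c > 1)  # placed twice+
    unexpected = sorted(set(counts) - expected)          # not an input name
    return missing, duplicated, unexpected
```

If all three lists come back empty, every input transcript name appears on
exactly one model; anything else needs a look by hand.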
From: Jeremy Semeiks <jeremy.semeiks at utsw.edu>
Date: Sunday, 9 September, 2012 3:49 PM
To: <maker-devel at yandell-lab.org>
Subject: [maker-devel] How to preserve human-friendly IDs when reannotating
Hi all,
I have sequenced some novel fungal genomes, and I am annotating them with
maker-2.26-beta. The entire project is pretty iterative, in the sense that I
first get some seemingly-sane annotation sets, then analyze and compare the
proteomes biologically, then reannotate when new data comes in or as I learn
more about how maker works. Because I have already attached biological
meaning to some of my proteins, I would like to retain the same
human-friendly IDs across annotations. Eg, if maker suddenly finds 1,000 new
proteins on a reannotation run because I turned on keep_preds, then I don't
want the transcript formerly known as mymold_09652T0 to become
mymold_10698T0 when I run maker_map_ids; I want to keep it named
mymold_09652T0.
So, is there any built-in way to preserve human-friendly IDs, or do I need
to write my own script for this? I have tried setting map_forward=1 and
maker_gff=<the GFF file output by the previous run of maker_map_ids>, but
setting these seems to preserve neither the human-friendly IDs nor even the
original IDs. (Eg, protein "genemark-scaffold353-processed-gene-0.9-mRNA-1"
changed its name to "genemark-scaffold353-processed-gene-0.6-mRNA-1" when
reannotated.) I haven't turned on any of the *_pass options, eg
protein_pass; would this be relevant?
Extra credit question: I am making some mate-pair libraries for these fungi;
when I re-assemble, that will completely change my scaffold names. Is there
any easy way to preserve human-friendly transcript names in this case? As
with the above simpler case, I think it would be pretty easy to transfer 90%
of the names just by doing an all-vs-all blastp between two annotation sets
and fishing out the best hits, but the remaining 10% might be a headache.
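
The best-hit name transfer described above might be sketched like this in
Python. This is a hedged illustration, not a tested pipeline: it uses
reciprocal best hits, which is stricter than plain best hits, it assumes
both blastp runs were saved in the default 12-column tabular format
(-outfmt 6, bit score in the last column), and the function names are
invented for this example.

```python
def best_hits(blast_lines):
    """Return {query: subject} keeping the highest-bit-score hit per query."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(old_vs_new, new_vs_old):
    """Return {old_id: new_id} for pairs that are each other's best hit."""
    forward = best_hits(old_vs_new)
    backward = best_hits(new_vs_old)
    return {old: new for old, new in forward.items()
            if backward.get(new) == old}
```

Proteins that do not end up in the returned mapping are the ones left over
for manual inspection.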
Thanks,
Jeremy
Grad student, Grishin lab
UT Southwestern, Dallas TX
510.385.8959