[maker-devel] How to preserve human-friendly IDs when reannotating

Carson Holt carsonhh at gmail.com
Tue Sep 11 14:33:11 MDT 2012


Which version of MAKER are you running (maker --version)?

--Carson



From:  Jeremy Semeiks <jeremy.semeiks at utsw.edu>
Date:  Monday, 10 September, 2012 11:02 AM
To:  Carson Holt <carsonhh at gmail.com>
Cc:  <maker-devel at yandell-lab.org>
Subject:  Re: [maker-devel] How to preserve human-friendly IDs when
reannotating

OK, thanks. So if I understand correctly, preserving human-friendly IDs
requires setting just three options: map_forward=1,
maker_gff=<my_human_friendly.gff>, and model_pass=1. (Or, instead of the last
two, I could equivalently just set model_gff to a GFF containing only
models.)
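
The combination of options described above can be sanity-checked before a run. The following is a minimal sketch, assuming the control file uses `key=value #comment` lines; the function names and the example file name are hypothetical and not part of MAKER, and the rule it encodes is only the understanding stated in this message, not confirmed behavior.

```python
def parse_ctl(text):
    """Parse MAKER-style control file text into a dict of option -> value."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip()] = value.strip()
    return opts


def missing_for_map_forward(opts):
    """List problems that, per the understanding above, would block name mapping."""
    problems = []
    if opts.get("map_forward") != "1":
        problems.append("map_forward must be 1")
    # Either pass models through from a previous MAKER GFF3 ...
    has_maker_models = bool(opts.get("maker_gff")) and opts.get("model_pass") == "1"
    # ... or supply a models-only GFF3 directly.
    has_external_models = bool(opts.get("model_gff"))
    if not (has_maker_models or has_external_models):
        problems.append("set maker_gff with model_pass=1, or set model_gff")
    return problems


# Hypothetical excerpt of a control file, for illustration only.
example = """
maker_gff=my_human_friendly.gff #hypothetical file name
model_pass=1
map_forward=1
"""
print(missing_for_map_forward(parse_ctl(example)))
```

An empty list means the three settings are present; it says nothing about whether the GFF3 file itself contains the expected models.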

A couple of new issues came up when I tried to run with these options. I
started maker like this:

/usr/bin/time mpiexec -n 10 maker -q < /dev/null > maker.oe 2>&1

1. I get a bunch of messages like the following, with varying line numbers:

DBD::SQLite::db do failed: database is locked at
/home/jrs/maker-2.26-beta/bin/../lib/GFFDB.pm line 186.

I saw that this came up in another thread
<https://groups.google.com/forum/?fromgroups=#!topic/maker-devel/TscBgbQfBX4>,
but I'm not sure it was ever resolved, nor whether it will affect my
reannotation results (as I'm not sure what "your GFF3 results will not be
integrated" means). This error did not come up the last time I ran maker for
reannotation with similar options in a different directory. And both my
current directory and my tmp directory are locally mounted, i.e., not NFS.

2. Both in this run and in previous runs, I get a lot of lines like this,
seemingly at random:

Warning: unable to close filehandle DF properly.


On Mon, Sep 10, 2012 at 6:01 AM, Carson Holt <carsonhh at gmail.com> wrote:
> The map_forward option requires that the pass option for the gene models be
> turned on.  Otherwise you will have to do some spatial overlap test outside of
> MAKER.
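
An overlap test done outside MAKER could look roughly like the following. This is a minimal sketch, not MAKER's own method: it assumes both annotation sets are GFF3 on the same assembly, compares only `gene` features on the same sequence and strand, and uses a hypothetical 50% overlap cutoff. The function names and the example records are made up for illustration.

```python
def parse_genes(gff_text):
    """Collect (seqid, start, end, strand, ID) for gene features in GFF3 text."""
    genes = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes.append((cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("ID")))
    return genes


def overlap(a_start, a_end, b_start, b_end):
    """Overlapping bases between two 1-based, inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def map_names(old_genes, new_genes, min_frac=0.5):
    """Map new ID -> old ID using the best-overlapping old gene (assumed cutoff)."""
    mapping = {}
    for seqid, start, end, strand, new_id in new_genes:
        best_id, best_ov = None, 0
        for o_seqid, o_start, o_end, o_strand, old_id in old_genes:
            if (o_seqid, o_strand) != (seqid, strand):
                continue
            ov = overlap(start, end, o_start, o_end)
            if ov > best_ov:
                best_id, best_ov = old_id, ov
        if best_id and best_ov >= min_frac * (end - start + 1):
            mapping[new_id] = best_id
    return mapping


# Made-up records for illustration.
old_gff = "scaffold353\tmaker\tgene\t100\t900\t.\t+\t.\tID=mymold_09652\n"
new_gff = "scaffold353\tmaker\tgene\t120\t950\t.\t+\t.\tID=new_gene_1\n"
print(map_names(parse_genes(old_gff), parse_genes(new_gff)))
```

The sketch does not stop two new genes from claiming the same old name, so split or merged models would still need manual review.
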
> 
> If you have a new assembly, you can try mapping the old models onto the new
> assembly using the old transcripts as input to the est= and setting
> est2genome=1 (nothing else set, i.e., no repeat masking, etc.).  Then there is an
> undocumented option that is still a little buggy (hence why it is still
> undocumented).  Add the line est_forward=1 to your control files.  This tells
> MAKER to copy names from the ESTs, build the models directly from their
> alignment, and to do other things to try and make a 1 to 1 match across the
> genome.  You will have to manually check that it is 1 to 1 in the end (as I
> said still a little buggy and hence undocumented).  Use the resulting file as
> input to the model_gff option on a separate run with map_forward=1 for
> additional reannotation with more evidence, etc., where you want to still be
> able to map names forward.
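
The manual 1-to-1 check mentioned above could be partly automated. The following is a minimal sketch under a loud assumption: that the name copied from each transcript ends up in the `Name` attribute of `mRNA` features in the output GFF3. This thread does not confirm where MAKER stores it, so adjust `feature` and `attr` to match the real output. The function name and example records are hypothetical.

```python
from collections import Counter


def check_one_to_one(expected_names, gff_text, feature="mRNA", attr="Name"):
    """Return (missing, duplicated, unexpected) names among the chosen features."""
    counts = Counter()
    for line in gff_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != feature:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if attr in attrs:
            counts[attrs[attr]] += 1
    expected = set(expected_names)
    missing = sorted(expected - set(counts))          # old names with no model
    duplicated = sorted(n for n, c in counts.items() if c > 1)  # placed twice
    unexpected = sorted(set(counts) - expected)       # names not in the old set
    return missing, duplicated, unexpected


# Made-up records for illustration.
rows = [
    "scaffold1\tmaker\tmRNA\t10\t500\t.\t+\t.\tID=m1;Name=mymold_09652T0",
    "scaffold2\tmaker\tmRNA\t40\t530\t.\t+\t.\tID=m2;Name=mymold_09652T0",
]
print(check_one_to_one(["mymold_09652T0", "mymold_09653T0"], "\n".join(rows)))
```
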
> 
> From:  Jeremy Semeiks <jeremy.semeiks at utsw.edu>
> Date:  Sunday, 9 September, 2012 3:49 PM
> To:  <maker-devel at yandell-lab.org>
> Subject:  [maker-devel] How to preserve human-friendly IDs when reannotating
> 
> Hi all,
> 
> I have sequenced some novel fungal genomes, and I am annotating them with
> maker-2.26-beta. The entire project is pretty iterative, in the sense that I
> first get some seemingly-sane annotation sets, then analyze and compare the
> proteomes biologically, then reannotate when new data comes in or as I learn
> more about how maker works. Because I have already attached biological meaning
> to some of my proteins, I would like to retain the same human-friendly IDs
> across annotations. Eg, if maker suddenly finds 1,000 new proteins on a
> reannotation run because I turned on keep_preds, then I don't want the
> transcript formerly known as mymold_09652T0 to become mymold_10698T0 when I
> run maker_map_ids; I want to keep it named mymold_09652T0.
> 
> So, is there any built-in way to preserve human-friendly IDs, or do I need to
> write my own script for this? I have tried setting map_forward=1 and
> maker_gff=<the GFF file output by the previous run of maker_map_ids>, but
> setting these seems to preserve neither the human-friendly IDs nor even the
> original IDs. (Eg, protein "genemark-scaffold353-processed-gene-0.9-mRNA-1"
> changed its name to "genemark-scaffold353-processed-gene-0.6-mRNA-1" when
> reannotated.) I haven't turned on any of the *_pass options, eg protein_pass;
> would this be relevant?
> 
> Extra credit question: I am making some mate-pair libraries for these fungi;
> when I re-assemble, that will completely change my scaffold names. Is there
> any easy way to preserve human-friendly transcript names in this case? As with
> the above simpler case, I think it would be pretty easy to transfer 90% of the
> names just by doing an all-vs-all blastp between two annotation sets and
> fishing out the best hits, but the remaining 10% might be a headache.
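
The best-hit idea above can be sketched as follows. This is a minimal illustration that uses reciprocal best hits, which is stricter than the plain best hits described in the message. It assumes two blastp runs (old vs. new and new vs. old) saved in the standard 12-column tabular format, with the bit score in the last column. The function names and the `new_...` identifiers are hypothetical.

```python
def best_hits(tabular_text):
    """Return query -> (subject, bitscore) for the top-scoring hit per query."""
    best = {}
    for line in tabular_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(old_vs_new, new_vs_old):
    """Map old ID -> new ID only where each is the other's best hit."""
    fwd = best_hits(old_vs_new)
    rev = best_hits(new_vs_old)
    return {old: new for old, (new, _) in fwd.items()
            if rev.get(new, (None,))[0] == old}


# Made-up hits for illustration.
old_vs_new = "\n".join([
    "mymold_09652T0\tnew_00010T0\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-180\t610",
    "mymold_09652T0\tnew_00020T0\t45.0\t280\t154\t2\t5\t284\t3\t280\t1e-50\t180",
])
new_vs_old = "\n".join([
    "new_00010T0\tmymold_09652T0\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-180\t610",
    "new_00020T0\tmymold_09652T0\t45.0\t280\t154\t2\t3\t280\t5\t284\t1e-50\t180",
])
print(reciprocal_best_hits(old_vs_new, new_vs_old))
```

Transcripts with no reciprocal partner are simply left out of the mapping; those are the cases that would need manual attention.
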
> 
> Thanks,
> Jeremy
> Grad student, Grishin lab
> UT Southwestern, Dallas TX
> 510.385.8959