[maker-devel] Database disk image is malformed error
Tim Fallon
tfallon at mit.edu
Thu Jun 22 22:33:59 MDT 2017
Hi Carson,
Thanks for the tip! It turned out that I needed to use the “-l” parameter for gff3_merge, which automatically renames the IDs when merging, and to pass the appropriate evidence from the merged GFF using the “Re-annotation Using MAKER Derived GFF3” parameters. I had been using the more general parameters further down (protein_gff, est_gff, etc.). It seems to be working now, though I am still getting the hang of how to fix up misbehaving gene models.
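For anyone hitting the same problem, a quick way to see whether separately produced GFF3 files reuse the same IDs before merging them is sketched below. This is only an illustration, not part of MAKER: the function names ids_in_gff3 and ids_shared_between_files are made up for this example, and it assumes standard tab-separated GFF3 with an ID=... attribute in column 9. It only reports IDs that occur in more than one file, because a single feature can legitimately span several lines with the same ID within one file.

```python
from collections import defaultdict
from urllib.parse import unquote


def ids_in_gff3(lines):
    """Yield the ID attribute of each feature line in a GFF3 stream."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        for attribute in columns[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() == "ID" and value:
                yield unquote(value)  # GFF3 attribute values are URL-escaped


def ids_shared_between_files(paths):
    """Map each ID seen in more than one file to the sorted list of those files."""
    seen = defaultdict(set)
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for feature_id in ids_in_gff3(handle):
                seen[feature_id].add(path)
    return {fid: sorted(files) for fid, files in seen.items() if len(files) > 1}
```

Calling ids_shared_between_files on the list of Stage 1 GFF3 paths returns an empty dict when no ID is reused across files. A non-empty result would suggest that the IDs need renaming at merge time, which in my case gff3_merge with “-l” took care of.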
All the best,
-Tim
> On Jun 23, 2017, at 12:15 AM, Carson Holt <carsonhh at gmail.com> wrote:
>
> Don’t use the GFF3 as input to the second stage. Use the original work directory, and just modify the parameters in the control file. MAKER will reuse old results and only delete things that require a rerun. Using the GFF3 as input is just a way to reuse MAKER data when the work directory is no longer available, and in most cases you will only pass in the genes (and not the evidence in the GFF3).
>
> Thanks,
> Carson
>
>
>
>> On Jun 16, 2017, at 9:07 AM, Tim Fallon <tfallon at mit.edu> wrote:
>>
>> Hi there,
>>
>> I’ve been running MAKER in two stages using MPI to annotate a de novo insect genome. By two stages, I mean that for stage 1 I have a lot of independent folders / MAKER runs (e.g. individual reference insect proteomes passed as FASTA with protein2genome=1), and then for stage 2, in a separate folder, I am concatenating all the evidence from stage 1 (using gff3_merge -o) and passing it via the GFF parameters.
>>
>> Stage 2 has been crashing. It takes a very long time to set up the SQLite DB from the GFFs (~24 hours, with 39 MPI CPUs), and then once it is all loaded it works for a couple of seconds and then crashes with errors like this:
>>
>> "DBD::SQLite::db selectcol_arrayref failed: database disk image is malformed at /lab/solexa_weng/testtube/maker_3.00_beta/bin/../lib/GFFDB.pm line 525."
>>
>> I am passing a lot of evidence to Stage 2, probably more than people typically pass (the GFFs together are 44 GB, and the resulting *.db file is 95 GB).
>>
>> Have you seen this error before? I can think of a few possibilities:
>> 1) Running up against SQLite size / concurrency constraints where the .db ends up being malformed due to MPI / passing too much evidence. Solution -> Load GFFs without MPI, or load less evidence.
>> 2) GFFs are malformed (they pass validation with GT). Solution -> Remove the malformed GFF evidence, although I haven’t been able to track any malformed GFFs down.
>> 3) Identifiers that are unique within a single GFF file become non-unique once the files are merged. Solution -> Manually rename IDs in passed GFF files to be unique.
>>
>> Thoughts?
>>
>> All the best,
>> -Tim
>>
>> Timothy R. Fallon
>> PhD candidate
>> Laboratory of Jing-Ke Weng
>> Department of Biology
>> MIT
>>
>> tfallon at mit.edu
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>