[maker-devel] short scaffolds finish, long scaffolds (almost always) fail

Devon O'Rourke devon.orourke at gmail.com
Wed Feb 26 12:36:25 MST 2020


Thanks very much for the reply Carson,
I've attached a few files from the most recently failed run: the shell
script submitted to Slurm, the _opts.ctl file, and the pair of log files
generated from the job. The reason there are a 1a and 1b pair of files is
that I had initially set the number of cpus in the _opts.ctl file to "60",
but then tried re-running it after setting it to "28". Both seem to have
the same result.
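
Since the first failure in an MPI run tends to sit far back in the log, here is a minimal sketch of how the earliest error-like lines could be pulled out of a plain or gzipped log. The search patterns and the function names `first_error_lines` and `scan_log` are illustrative assumptions, not anything defined by Maker, and the patterns would need adjusting to whatever the real STDERR contains.

```python
import gzip
import re
import sys

# Assumed patterns; they are guesses at what a failure line might contain.
ERROR_PATTERN = re.compile(
    r"error|died|killed|out of memory|segmentation fault", re.IGNORECASE
)


def first_error_lines(lines, limit=5):
    """Return up to `limit` (line_number, text) pairs for the earliest matches."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
            if len(hits) == limit:
                break
    return hits


def scan_log(path, limit=5):
    """Open a plain or gzip-compressed log and report its earliest matches."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return first_error_lines(handle, limit)


if __name__ == "__main__":
    # Usage: python scan_log.py fail-1a.log.gz fail-1b.log.gz
    for name in sys.argv[1:]:
        for number, text in scan_log(name):
            print(f"{name}:{number}: {text}")
```

Reporting only the first few matches is deliberate: later lines are often consequences of the first failure rather than its cause.
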
I certainly have access to more memory if needed. I'm using a pretty
typical (I think?) cluster that controls jobs with Slurm using a Lustre
file system - it's the main high performance computing center at our
university. I have access to plenty of nodes with about 120-150g of
RAM and 24-28 cpus each, as well as a handful of higher-memory
nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a
similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over
32 cpus; if that fails, I could certainly run again with even more memory.
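
As a rough way to compare these requests, here is a small sketch of the average memory each process would get if the allocation were shared evenly. That even split is an assumption for illustration only: Slurm enforces limits per job or per node, and individual Maker processes can use very different amounts. The helper name `gb_per_process` is hypothetical, and the last configuration is a made-up comparison point rather than a run from this thread.

```python
def gb_per_process(total_gb, processes):
    """Average memory per process if the allocation is shared evenly (assumption)."""
    return total_gb / processes


# The first two entries use figures from this thread; the third is hypothetical.
configs = {
    "single node, 120g over 24 cpus": (120, 24),
    "new job, 200g over 32 cpus": (200, 32),
    "hypothetical, 200g over 8 cpus": (200, 8),
}

for label, (total_gb, processes) in configs.items():
    print(f"{label}: {gb_per_process(total_gb, processes):.2f} GB per process")
```

The point of the comparison is that adding cpus without adding memory lowers the average share per process, so a memory-bound step may do better with fewer processes.
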
Appreciate your insights; hope the weather in UT is filled with sun or snow
or both.
Devon

On Wed, Feb 26, 2020 at 2:10 PM Carson Holt <carsonhh at gmail.com> wrote:

> If running under MPI, the reason for a failure may be further back in the
> STDERR (failures tend to snowball into other failures, so the initial cause is
> often way back). If you can capture the STDERR and send it, that would be
> the most informative. If it's memory, you can also set all the blast depth
> parameters (depth_blastn, depth_blastx, depth_tblastx) in maker_bopts.ctl
> to a value like 20.
>
> —Carson
>
>
>
> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke <devon.orourke at gmail.com>
> wrote:
>
> Hello,
>
> I apologize for not posting directly to the archived forum but it appears
> that the option to enter new posts is disabled. Perhaps this is by design
> so emails go directly to this address. I hope this is what you are looking
> for.
>
> Thank you for your continued support of Maker and your responses to the
> forum posts. I have been running Maker (V3.01.02-beta) to annotate a
> mammalian genome that consists of 22 chromosome-length scaffolds (between
> ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length.
> In my various tests in running Maker, the vast majority of the smaller
> fragments are annotated successfully, but nearly all the large scaffolds
> fail with the same error code when I look at the 'run.log.child.0' file:
> ```
> DIED RANK 0:6:0:0
> DIED COUNT 2
> ```
> (the master 'run.log' file just shows "DIED COUNT 2")
>
> I struggled to find this exact error code anywhere on the forum and was
> hoping you might be able to help me determine where I should start
> troubleshooting. I thought perhaps it was an error concerning memory
> requirements, so I altered the chunk size from the default to a few larger
> sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the
> same outcome). I've tried running the program with parallel support using
> either openMPI or mpich. I've tried running on a single node using 24 cpus
> and 120g of RAM. It always stalls at the same step.
>
> Interestingly, one of the 22 large scaffolds always finishes and produces
> the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but
> the other 21 of 22 large scaffolds fail. This makes me think perhaps it's
> not a memory issue?
>
> In the case of both the completed and failed scaffolds, the
> "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out,
> .specific.ori.out, .specific.cat.gz, .specific.out,
> te_proteins*fasta.repeatrunner, the est *fasta.blastn, the altest
> *fasta.tblastx, and protein *fasta.blastx files are all present (and appear
> finished from what I can tell).
> However, the contents of the parent directory of the
> "theVoid.scaffold" folder differ. For the failed scaffolds, the contents
> generally look something like this (that is, they stall with the
> same kind of files produced):
> ```
> 0
> evidence_0.gff
> query.fasta
> query.masked.fasta
> query.masked.fasta.index
> query.masked.gff
> run.log.child.0
> scaffold22.0.final.section
> scaffold22.0.pred.raw.section
> scaffold22.0.raw.section
> scaffold22.gff.ann
> scaffold22.gff.def
> scaffold22.gff.seq
> ```
>
> For the completed scaffold, there are many more files created:
> ```
> 0
> 10
> 100
> 20
> 30
> 40
> 50
> 60
> 70
> 80
> 90
> evidence_0.gff
> evidence_10.gff
> evidence_1.gff
> evidence_2.gff
> evidence_3.gff
> evidence_4.gff
> evidence_5.gff
> evidence_6.gff
> evidence_7.gff
> evidence_8.gff
> evidence_9.gff
> query.fasta
> query.masked.fasta
> query.masked.fasta.index
> query.masked.gff
> run.log.child.0
> run.log.child.1
> run.log.child.10
> run.log.child.2
> run.log.child.3
> run.log.child.4
> run.log.child.5
> run.log.child.6
> run.log.child.7
> run.log.child.8
> run.log.child.9
> scaffold4.0-1.raw.section
> scaffold4.0.final.section
> scaffold4.0.pred.raw.section
> scaffold4.0.raw.section
> scaffold4.10.final.section
> scaffold4.10.pred.raw.section
> scaffold4.10.raw.section
> scaffold4.1-2.raw.section
> scaffold4.1.final.section
> scaffold4.1.pred.raw.section
> scaffold4.1.raw.section
> scaffold4.2-3.raw.section
> scaffold4.2.final.section
> scaffold4.2.pred.raw.section
> scaffold4.2.raw.section
> scaffold4.3-4.raw.section
> scaffold4.3.final.section
> scaffold4.3.pred.raw.section
> scaffold4.3.raw.section
> scaffold4.4-5.raw.section
> scaffold4.4.final.section
> scaffold4.4.pred.raw.section
> scaffold4.4.raw.section
> scaffold4.5-6.raw.section
> scaffold4.5.final.section
> scaffold4.5.pred.raw.section
> scaffold4.5.raw.section
> scaffold4.6-7.raw.section
> scaffold4.6.final.section
> scaffold4.6.pred.raw.section
> scaffold4.6.raw.section
> scaffold4.7-8.raw.section
> scaffold4.7.final.section
> scaffold4.7.pred.raw.section
> scaffold4.7.raw.section
> scaffold4.8-9.raw.section
> scaffold4.8.final.section
> scaffold4.8.pred.raw.section
> scaffold4.8.raw.section
> scaffold4.9-10.raw.section
> scaffold4.9.final.section
> scaffold4.9.pred.raw.section
> scaffold4.9.raw.section
> ```
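
The two listings above can be compared mechanically. Below is a minimal sketch that groups the file names by the number embedded in them, which is assumed here to be a chunk index. The regular expressions are inferred only from the names shown in this message, so other Maker versions may name files differently, and `summarize_chunks` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Patterns inferred from the listings in this thread (an assumption).
SECTION = re.compile(
    r"^(?P<seq>.+)\.(?P<chunk>\d+)\.(?P<kind>final|pred\.raw|raw)\.section$"
)
EVIDENCE = re.compile(r"^evidence_(?P<chunk>\d+)\.gff$")


def summarize_chunks(filenames):
    """Map each chunk index to the set of output kinds present for it."""
    chunks = defaultdict(set)
    for name in filenames:
        match = SECTION.match(name)
        if match:
            chunks[int(match["chunk"])].add(match["kind"])
            continue
        match = EVIDENCE.match(name)
        if match:
            chunks[int(match["chunk"])].add("evidence")
    return dict(chunks)


if __name__ == "__main__":
    failed = [
        "0", "evidence_0.gff", "query.fasta", "query.masked.fasta",
        "run.log.child.0", "scaffold22.0.final.section",
        "scaffold22.0.pred.raw.section", "scaffold22.0.raw.section",
        "scaffold22.gff.ann",
    ]
    for chunk, kinds in sorted(summarize_chunks(failed).items()):
        print(chunk, sorted(kinds))
```

Names such as `scaffold4.0-1.raw.section`, which span two numbers, are skipped by these patterns. Run on the failed listing, the summary contains only index 0, while the completed scaffold has indices 0 through 10.
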
>
> Thanks for any troubleshooting tips you can offer.
>
> Cheers,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork

-- 
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fail-1a.log.gz
Type: application/x-gzip
Size: 21751 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20200226/ef56e94b/attachment.tgz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fail-1b.log.gz
Type: application/x-gzip
Size: 2175 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20200226/ef56e94b/attachment-0001.tgz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run1_maker_opts.ctl
Type: application/octet-stream
Size: 3720 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20200226/ef56e94b/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run1_slurm.sh
Type: application/x-sh
Size: 788 bytes
Desc: not available
URL: <http://yandell-lab.org/pipermail/maker-devel_yandell-lab.org/attachments/20200226/ef56e94b/attachment-0003.sh>