[maker-devel] gff3_merge error - only half the contigs are reported
Ross, Eric
ejr at stowers.org
Mon Nov 18 13:24:03 MST 2013
I had the same problem Sujai reported on the 25th.
When I run gff3_merge I only get back results from a subset of my
scaffolds. Every scaffold has a gff file in the data store. fasta_merge
works fine. No errors are printed to STDOUT.
It took me some poking around, but it seems that if there is
insufficient space in /tmp to generate the temporary files, gff3_merge
completes without error and just creates a gff file from whatever fit in
the temporary files. I'm not sure why tempfile() doesn't return a useful
error.
I was able to get around the error by setting the environment variable
TMPDIR to someplace with sufficient space, but if there is a way to
generate a useful error it might be worth adding.
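
The workaround above can be made explicit in a job script. This is only a minimal sketch, assuming a POSIX shell and `df`; the helper names `free_kb` and `require_tmp_space` and the 1 GiB threshold are illustrative and not part of MAKER:

```shell
# free_kb DIR: free space, in KiB, on the filesystem that holds DIR.
# With POSIX `df -P -k`, field 4 of the second output line is "Available".
free_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# require_tmp_space MIN_KB: return non-zero, with a message on STDERR,
# unless ${TMPDIR:-/tmp} has at least MIN_KB KiB free.
require_tmp_space() {
    dir=${TMPDIR:-/tmp}
    avail=$(free_kb "$dir")
    case $avail in
        '' | *[!0-9]*)
            echo "error: cannot determine free space in $dir" >&2
            return 1 ;;
    esac
    if [ "$avail" -lt "$1" ]; then
        echo "error: $avail KiB free in $dir, need $1 KiB" >&2
        return 1
    fi
}

# Usage in a job script (the TMPDIR value is a placeholder):
#   export TMPDIR=/path/with/enough/space
#   require_tmp_space 1048576 || exit 1
#   gff3_merge -d ...master_datastore_index.log
```

This only guards against starting the merge with too little space; how much gff3_merge actually needs depends on the dataset.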
Eric
--original below--
On Friday, October 25, 2013 11:22:08 AM UTC-5, Barry wrote:
Hi Sujai,
If you still have that original directory available, here are a couple of
things for you to try:
# Does this give you the number of contigs you're expecting?
find datastore_directory -name '*.gff3' | wc -l
# If the above gives what you expect, what do the file sizes look like for
# the smallest files?
find ./ -type f | xargs ls -Sl | tail
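
One caveat with the size check: when there are many files, xargs may split them across several ls invocations, so the sort by size is per batch rather than global. A sketch that sorts once over all files, assuming file names without whitespace (the function name `smallest_files` is made up for illustration):

```shell
# smallest_files DIR PATTERN [N]: print "size path" for the N smallest
# regular files under DIR whose names match PATTERN (default N=10).
# Field 5 of `ls -l` output is the size in bytes.
smallest_files() {
    find "$1" -type f -name "$2" -exec ls -l {} + |
        awk '{ print $5, $NF }' |
        sort -n |
        head -n "${3:-10}"
}

# Usage:
#   smallest_files datastore_directory '*.gff3'
```

Zero-byte or unusually small files at the top of this list would be the ones to inspect first.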
Barry
On Oct 24, 2013, at 4:04 PM, Sujai wrote:
Hi Barry
The last stderr message in the job is from the maker command:
Maker is now finished!!!
my last 3 commands were (in my pbs/torque job script):
----job script----
...other commands to set variables etc...
mpiexec -n 12 maker -g $SPLITFILE -base $SPLITFILE maker_opts.ctl \
    maker_bopts.ctl maker_exe.ctl
gff3_merge -o $PBS_O_WORKDIR/$SPLITFILE.gff3 -d \
    $SPLITFILE.maker.output/${SPLITFILE}_master_datastore_index.log
fasta_merge -d \
    $SPLITFILE.maker.output/${SPLITFILE}_master_datastore_index.log
-----
so, no gff3_merge stderr messages
the fasta_merge command (which came after gff3_merge) has completed
correctly, with all proteins and transcripts accounted for.
I just ran the same contigs again in smaller batches, and it seems to be
proceeding correctly (the final gff3 files are between 49 and 55MB for each
chunk of fasta files). In the previous case, the final GFF3 file was 99MB
or 100MB, which is why I thought there was some sort of filesize/memory
limit.
Thanks for looking into this,
- Sujai
On 24 October 2013 21:42, Barry Moore <barry... at genetics.utah.edu>
wrote:
Hi Sujai,
gff3_merge can throw a couple of fatal errors if it runs into problems.
Did you capture STDERR, and do you see any failures there?
B
On Oct 24, 2013, at 10:43 AM, Sujai wrote:
Hi all
First of all, thank you to Carson Holt and Mark Yandell and team for an
excellent program with excellent installation instructions. Setting maker
2.28 up and running it (both with MPI and without) was a breeze on a PBS
cluster. Everything worked as expected (doesn't happen that often in
bioinformatics software! :-))
I did a test run on 150 longish contigs (>50kb each) and everything worked
fine, including gff3_merge, fasta_merge etc. I got predictions for all the
contigs etc.
However, when I ran it on the full 20,000 sequence file (minimum 300 bp, I
know - I could have set the min length in maker_opts.ctl but I wanted the
repeatmasker analysis done even for the smaller contigs), I had a problem:
1. the intermediate files were all fine (i.e. grep -c FINISHED
...master_datastore_index.log showed the right number of sequences)
2. But gff3_merge -d ...master_datastore_index.log only gave a file with
about 9000 contigs. I ran it a few times and got the same result each
time. fasta_merge gave back sequences from all ~20,000 contigs, so I know
the contig_datastore was fine.
3. Could it be that gff3_merge hit some sort of output limit? The GFF3
file created seems to be almost exactly 100MB. Or could it be a limit of
my OS environment? I had a lot of memory (12 GB) and a large tmp directory
(10 GB) so that should not be a problem...
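
The mismatch described in points 1 and 2 can be checked mechanically. This is a rough sketch, assuming that each finished sequence has exactly one FINISHED line in the index log and contributes at least one feature line to the merged GFF3; the function names are hypothetical:

```shell
# count_gff3_seqids FILE: number of distinct sequence IDs (column 1) among
# the tab-separated feature lines of a GFF3 file. Lines starting with '#'
# are skipped, and reading stops at a '##FASTA' directive if present.
count_gff3_seqids() {
    awk -F '\t' '
        /^##FASTA/ { exit }
        /^#/ || NF < 9 { next }
        !seen[$1]++ { n++ }
        END { print n + 0 }
    ' "$1"
}

# check_merge INDEX_LOG MERGED_GFF3: return non-zero, with a message on
# STDERR, if the merged file covers a different number of sequences than
# the number of FINISHED lines in the index log.
check_merge() {
    expected=$(grep -c FINISHED "$1")
    found=$(count_gff3_seqids "$2")
    if [ "$found" -ne "$expected" ]; then
        echo "error: $found sequences in $2, expected $expected" >&2
        return 1
    fi
}

# Usage:
#   check_merge ...master_datastore_index.log merged.gff3 || exit 1
```

Running a check like this right after gff3_merge in the job script would turn a silently truncated merge into a visible failure.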
If anyone has seen this problem or can shed any light on this, that would
be great.
I'm going to try splitting the file into smaller sets and hoping for the
best.
Thanks
- Sujai
--
You received this message because you are subscribed to the Google Groups
"maker-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to maker-devel... at googlegroups.com <>.
To post to this group, send email to maker... at googlegroups.com <>.
Visit this group at http://groups.google.com/group/maker-devel.
For more options, visit https://groups.google.com/groups/opt_out.
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543