[maker-devel] gff3_merge error - only half the contigs are reported
Sujai
sujaikumar at gmail.com
Tue Nov 19 01:33:06 MST 2013
Hi Eric
Thanks for uncovering the /tmp limitation. I had changed TMPDIR in my
maker_opts.ctl file, but I suppose gff3_merge would not use that setting,
and would use /tmp or whatever TMPDIR was set to in the current shell.
Cheers,
- Sujai
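
A minimal sketch of the workaround discussed in this thread, assuming (as Eric's report below suggests) that gff3_merge takes its temporary directory from the TMPDIR of the shell it runs in. The helper name has_free_kb, the SCRATCH and NEED_KB variables, and the 1024 KB threshold are made up for illustration and are not part of MAKER; pick a directory and size that suit your cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): succeed only if directory $1
# has at least $2 kilobytes of free space.
has_free_kb() {
    dir=$1
    need_kb=$2
    # df -P gives one line per filesystem; field 4 is the available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Assumed scratch location and required size; both are placeholders.
SCRATCH=${SCRATCH:-$PWD}
NEED_KB=${NEED_KB:-1024}

if has_free_kb "$SCRATCH" "$NEED_KB"; then
    # Export so that programs started from this shell see the new value.
    TMPDIR=$SCRATCH
    export TMPDIR
    echo "TMPDIR set to $TMPDIR"
else
    echo "not enough free space in $SCRATCH" >&2
fi

# With TMPDIR exported, run the merge from the same shell or job script, e.g.:
# gff3_merge -o "$SPLITFILE.gff3" \
#     -d "$SPLITFILE.maker.output/${SPLITFILE}_master_datastore_index.log"
```

In a PBS job script the export would need to come before the gff3_merge line, since a value set only in maker_opts.ctl does not seem to reach gff3_merge.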
On 18 November 2013 20:24, Ross, Eric <ejr at stowers.org> wrote:
> I had the same problem Sujai reported on the 25th.
>
> When I run gff3_merge I only get back results from a subset of my
> scaffolds. Every scaffold has a gff file in the data store. fasta_merge
> works fine. No errors are printed to STDOUT.
>
> It took me some poking around, but it seems that if there is
> insufficient space in /tmp for the temporary files, gff3_merge
> completes without error and just creates a gff file from whatever fit in
> the temporary files. I'm not sure why tempfile() doesn't return a useful
> error.
>
> I was able to get around the error by setting the environment variable
> TMPDIR to someplace with sufficient space, but if there is a way to
> generate a useful error it might be worth adding.
>
>
> Eric
>
>
>
>
> --original below--
> On Friday, October 25, 2013 11:22:08 AM UTC-5, Barry wrote:
> Hi Sujai,
> If you still have that original directory available, here are a couple of
> things to try:
>
> # Does this give you the number of contigs you're expecting?
> find datastore_directory -name '*.gff3' | wc -l
>
> # If the above gives what you expect, what do the file sizes look like for
> # the smallest files?
> find ./ -type f | xargs ls -Sl | tail
>
> Barry
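
A hedged variant of the size check above: with many thousands of files, xargs may split the list across several ls runs, so the tail of the output would only reflect the last batch. The sketch below sorts all ls -l lines together instead. The function name smallest_files and the demo files are invented for illustration, and it assumes the usual ls -l layout where field 5 is the size in bytes.

```shell
#!/bin/sh
# Hypothetical helper: print the N smallest regular files under a directory.
smallest_files() {
    dir=$1
    n=${2:-10}
    # find may still call ls in batches, but sort orders every line globally.
    # Field 5 of ls -l output is the file size in bytes.
    find "$dir" -type f -exec ls -l {} + | sort -k5,5n | head -n "$n"
}

# Demo on a throwaway directory with three files of different sizes.
demo=$(mktemp -d)
printf 'x' > "$demo/a.gff"
: > "$demo/empty.gff"
printf 'xxxxxxxx' > "$demo/b.gff"

first=$(smallest_files "$demo" 1)
echo "$first"

# Zero-length files are worth counting separately.
empties=$(find "$demo" -type f -size 0 | wc -l | tr -d ' ')
echo "empty files: $empties"

rm -rf "$demo"
```

On a real run, point smallest_files at the datastore directory rather than the demo directory.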
>
> On Oct 24, 2013, at 4:04 PM, Sujai wrote:
>
>
> Hi Barry
>
> The last stderr message in the job is from the maker command:
>
> Maker is now finished!!!
>
> my last 3 commands were (in my pbs/torque job script):
>
> ----job script----
> ...other commands to set variables etc...
> mpiexec -n 12 maker -g $SPLITFILE -base $SPLITFILE maker_opts.ctl \
>     maker_bopts.ctl maker_exe.ctl
> gff3_merge -o $PBS_O_WORKDIR/$SPLITFILE.gff3 -d \
>     $SPLITFILE.maker.output/${SPLITFILE}_master_datastore_index.log
> fasta_merge -d \
>     $SPLITFILE.maker.output/${SPLITFILE}_master_datastore_index.log
> -----
>
> So, no gff3_merge stderr messages.
> The fasta_merge command (which came after gff3_merge) completed
> correctly, with all proteins and transcripts accounted for.
> I just ran the same contigs again in smaller batches, and it seems to be
> proceeding correctly: the final gff3 files are between 49 and 55MB for each
> chunk of fasta files. In the previous case, the final GFF3 file was 99MB
> or 100MB, which is why I thought there was some sort of filesize/memory
> limit.
>
> Thanks for looking into this,
>
> - Sujai
>
>
>
>
> On 24 October 2013 21:42, Barry Moore <barry... at genetics.utah.edu>
> wrote:
>
> Hi Sujai,
> There are a couple of fatal errors that gff3_merge can throw if it
> runs into problems. Did you capture STDERR, and do you see any failures
> there?
>
> B
>
> On Oct 24, 2013, at 10:43 AM, Sujai wrote:
>
>
>
>
> Hi all
> First of all, thank you to Carson Holt and Mark Yandell and team for an
> excellent program with excellent installation instructions. Setting maker
> 2.28 up and running it (both with MPI and without) was a breeze on a PBS
> cluster. Everything worked as expected (doesn't happen that often in
> bioinformatics software! :-))
>
> I did a test run on 150 longish contigs (>50kb each) and everything worked
> fine, including gff3_merge, fasta_merge etc. I got predictions for all the
> contigs etc.
>
>
> However, when I ran it on the full 20,000 sequence file (minimum 300 bp, I
> know - I could have set the min length in maker_opts.ctl but I wanted the
> repeatmasker analysis done even for the smaller contigs), I had a problem:
>
> 1. the intermediate files were all fine (i.e. grep -c FINISHED
> ...master_datastore_index.log showed the right number of sequences)
>
> 2. But gff3_merge -d ...master_datastore_index.log only gave a file with
> about 9000 contigs. I ran it a few times and got the same result each
> time. fasta_merge gave back sequences from all ~20,000 contigs, so I know
> the contig_datastore was fine.
>
> 3. Could it be that gff3_merge hit some sort of output limit? The GFF3
> file created seems to be almost exactly 100MB. Or could it be a limit of
> my OS environment? I had a lot of memory (12 GB) and a large tmp directory
> (10 GB) so that should not be a problem...
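
The comparison in points 1 and 2 above can be sketched as a small check that flags a truncated merge. This is only an illustration under stated assumptions: the index log is taken to be tab-separated with the sequence name in column 1 and the status (STARTED/FINISHED) in column 3, and every contig is taken to appear in column 1 of at least one feature line in the merged GFF3. The function names and the demo data are invented here.

```shell
#!/bin/sh
# Count distinct sequence IDs in column 1 of the GFF3 feature lines,
# stopping at the optional ##FASTA section.
count_gff_seqids() {
    awk -F '\t' '
        /^##FASTA/ { exit }
        /^#/       { next }
        NF >= 9    { seen[$1] = 1 }
        END        { n = 0; for (s in seen) n++; print n }
    ' "$1"
}

# Count distinct sequences marked FINISHED in the index log
# (assumed layout: name <TAB> path <TAB> status).
count_finished() {
    awk -F '\t' '
        $3 == "FINISHED" { seen[$1] = 1 }
        END              { n = 0; for (s in seen) n++; print n }
    ' "$1"
}

# Demo data: two finished contigs, but only one made it into the merged file.
demo=$(mktemp -d)
printf 'ctg1\tpath1/\tSTARTED\nctg1\tpath1/\tFINISHED\nctg2\tpath2/\tSTARTED\nctg2\tpath2/\tFINISHED\n' > "$demo/index.log"
printf '##gff-version 3\nctg1\t.\tcontig\t1\t500\t.\t.\t.\tID=ctg1\n##FASTA\n>ctg1\nACGT\n' > "$demo/merged.gff3"

finished=$(count_finished "$demo/index.log")
merged=$(count_gff_seqids "$demo/merged.gff3")

if [ "$merged" -ne "$finished" ]; then
    echo "WARNING: merged GFF3 has $merged sequences, index log has $finished FINISHED" >&2
fi

rm -rf "$demo"
```

Running such a check right after gff3_merge in the job script would surface a short merge even when gff3_merge itself prints nothing to STDERR.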
>
> If anyone has seen this problem or can shed any light on this, that would
> be great.
>
> I'm going to try splitting the file into smaller sets and hoping for the
> best.
>
> Thanks
>
> - Sujai
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "maker-devel" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to maker-devel... at googlegroups.com <>.
> To post to this group, send email to maker... at googlegroups.com <>.
> Visit this group at http://groups.google.com/group/maker-devel.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>