From carsonhh at gmail.com Tue Mar 3 12:46:51 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 3 Mar 2020 12:46:51 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

I'm glad you were able to make it work.

Thanks,
Carson

> On Feb 29, 2020, at 10:27 AM, Devon O'Rourke wrote:
>
> Hi once again Carson,
> Our administrators tried installing Maker with a different version of OpenMPI, and the change allowed the job to complete normally. The change was from a newer version (3.1.3) to an older version (1.6.5) of OpenMPI. After that downgrade I needed to make one tweak to the MPI arguments you provided, because v1.6.5 didn't have Vader yet. Other than that, the options allowed the job to run to completion.
> Thanks for your assistance,
> Devon
>
> On Fri, Feb 28, 2020 at 7:50 AM Devon O'Rourke wrote:
> Hi Carson,
> I tried sending this email yesterday but received a notification that the message body was too large. I thought perhaps it was related to the log file attached to the earlier message. You can see the same file here: https://osf.io/cuxg8/download .
> Thanks!
>
> (previous message below)
>
> ....
>
> Two steps forward, one step back, I suppose?
> After incorporating the additional MPI-related parameters, the job got further than previous iterations, but it still failed before completing. It appears that all but the six longest scaffolds were annotated (except for a few short scaffolds that simply weren't finished by the time the error stopped the entire run).
> I've attached the .log file in hopes that you might find some additional nuggets to help diagnose the problem. Very much appreciate your help.
> Devon
>
> On Wed, Feb 26, 2020 at 3:18 PM Carson Holt wrote:
> For Intel MPI, export an environment variable right before running MAKER:
> "export I_MPI_FABRICS=shm:tcp"
>
> Intel MPI has a similar InfiniBand segfault issue to OpenMPI when running Perl scripts, but a different workaround.
>
> -Carson
>
>> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>>
>> Much appreciated Carson,
>> I've submitted a job using the parameters you suggested and will post the outcome. We definitely have two of the three MPI flavors you described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever one you prefer (should there be one).
>> Thanks,
>> Devon
>>
>> On Wed, Feb 26, 2020 at 2:54 PM Carson Holt wrote:
>> Try adding a few options right after 'mpiexec' in your batch script (this will fix InfiniBand-related segfaults as well as some fork-related segfaults):
>> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0
>>
>> Also remove the -q from the maker command to get full command lines for subprocesses in the STDERR (this lets you run some commands outside of MAKER to test the source of failures, for example if BLAST or Exonerate is causing the segfault).
>>
>> Example:
>>
>> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu -fix_nucleotides
>>
>> One alternate possibility is that OpenMPI itself is the problem. I've seen a few systems where it has an issue with Perl itself, and the only way around it is to install your own version of Perl without Perl threads enabled and install MAKER with that version of Perl (then OpenMPI seems to be OK again).
>> If that's the case, it is often easier to switch to MPICH2 or Intel MPI as the MPI launcher (if they are available) and then reinstall MAKER with that MPI flavor.
>>
>> -Carson
>>
>>> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote:
>>>
>>> Thanks very much for the reply Carson,
>>> I've attached a few files from the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated by the job. There is a 1a and 1b pair of files because I initially set the number of CPUs in the _opts.ctl file to "60", then re-ran it after setting it to "28". Both give the same result.
>>> I certainly have access to more memory if needed. I'm using a pretty typical (I think?) cluster that schedules jobs with Slurm on a Lustre file system; it's the main high-performance computing center at our university. I have access to plenty of nodes with about 120-150 GB of RAM and 24-28 CPUs each, as well as a handful of high-memory nodes with about 1.5 TB of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same FASTA/GFF inputs) requesting 200 GB of RAM over 32 CPUs; if that fails, I could certainly run again with even more memory.
>>> Appreciate your insights; hope the weather in UT is filled with sun or snow or both.
>>> Devon
>>>
>>> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote:
>>> If running under MPI, the reason for a failure may be further back in the STDERR (failures tend to snowball into other failures, so the initial cause is often far back). If you can capture the STDERR and send it, that would be the most informative. If it's memory, you can also set all the blast_depth parameters in maker_bopts.ctl to a value like 20.
>>>
>>> -Carson
>>>
>>>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote:
>>>>
>>>> Hello,
>>>>
>>>> I apologize for not posting directly to the archived forum, but it appears that the option to enter new posts is disabled. Perhaps this is by design so that emails go directly to this address. I hope this is what you are looking for.
>>>>
>>>> Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (v3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~20 and ~200 Mb) and about 10,000 smaller fragments from 10 kb to 1 Mb in length. In my various test runs, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error when I look at the 'run.log.child.0' file:
>>>> ```
>>>> DIED RANK 0:6:0:0
>>>> DIED COUNT 2
>>>> ```
>>>> (the master 'run.log' file just shows "DIED COUNT 2")
>>>>
>>>> I couldn't find this exact error anywhere on the forum and was hoping you might be able to help me determine where to start troubleshooting. I thought perhaps it was a memory problem, so I changed the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999; all produce the same outcome). I've tried running the program with parallel support using either OpenMPI or MPICH. I've tried running on a single node using 24 CPUs and 120 GB of RAM. It always stalls at the same step.
>>>>
>>>> Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 large scaffolds fail. This makes me think perhaps it's not a memory issue?
>>>>
>>>> For both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectories containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeatrunner, the est *fasta.blastn, the altest *fasta.tblastx, and the protein *fasta.blastx files are all present (and appear finished from what I can tell).
>>>> However, the contents of the parent directory of the "theVoid.scaffold" folder differ. The failed scaffolds generally always stall with the same kind of files produced, something like this:
>>>> ```
>>>> 0
>>>> evidence_0.gff
>>>> query.fasta
>>>> query.masked.fasta
>>>> query.masked.fasta.index
>>>> query.masked.gff
>>>> run.log.child.0
>>>> scaffold22.0.final.section
>>>> scaffold22.0.pred.raw.section
>>>> scaffold22.0.raw.section
>>>> scaffold22.gff.ann
>>>> scaffold22.gff.def
>>>> scaffold22.gff.seq
>>>> ```
>>>>
>>>> For the completed scaffold, many more files are created:
>>>> ```
>>>> 0
>>>> 10
>>>> 100
>>>> 20
>>>> 30
>>>> 40
>>>> 50
>>>> 60
>>>> 70
>>>> 80
>>>> 90
>>>> evidence_0.gff
>>>> evidence_10.gff
>>>> evidence_1.gff
>>>> evidence_2.gff
>>>> evidence_3.gff
>>>> evidence_4.gff
>>>> evidence_5.gff
>>>> evidence_6.gff
>>>> evidence_7.gff
>>>> evidence_8.gff
>>>> evidence_9.gff
>>>> query.fasta
>>>> query.masked.fasta
>>>> query.masked.fasta.index
>>>> query.masked.gff
>>>> run.log.child.0
>>>> run.log.child.1
>>>> run.log.child.10
>>>> run.log.child.2
>>>> run.log.child.3
>>>> run.log.child.4
>>>> run.log.child.5
>>>> run.log.child.6
>>>> run.log.child.7
>>>> run.log.child.8
>>>> run.log.child.9
>>>> scaffold4.0-1.raw.section
>>>> scaffold4.0.final.section
>>>> scaffold4.0.pred.raw.section
>>>> scaffold4.0.raw.section
>>>> scaffold4.10.final.section
>>>> scaffold4.10.pred.raw.section
>>>> scaffold4.10.raw.section
>>>> scaffold4.1-2.raw.section
>>>> scaffold4.1.final.section
>>>> scaffold4.1.pred.raw.section
>>>> scaffold4.1.raw.section
>>>> scaffold4.2-3.raw.section
>>>> scaffold4.2.final.section
>>>> scaffold4.2.pred.raw.section
>>>> scaffold4.2.raw.section
>>>> scaffold4.3-4.raw.section
>>>> scaffold4.3.final.section
>>>> scaffold4.3.pred.raw.section
>>>> scaffold4.3.raw.section
>>>> scaffold4.4-5.raw.section
>>>> scaffold4.4.final.section
>>>> scaffold4.4.pred.raw.section
>>>> scaffold4.4.raw.section
>>>> scaffold4.5-6.raw.section
>>>> scaffold4.5.final.section
>>>> scaffold4.5.pred.raw.section
>>>> scaffold4.5.raw.section
>>>> scaffold4.6-7.raw.section
>>>> scaffold4.6.final.section
>>>> scaffold4.6.pred.raw.section
>>>> scaffold4.6.raw.section
>>>> scaffold4.7-8.raw.section
>>>> scaffold4.7.final.section
>>>> scaffold4.7.pred.raw.section
>>>> scaffold4.7.raw.section
>>>> scaffold4.8-9.raw.section
>>>> scaffold4.8.final.section
>>>> scaffold4.8.pred.raw.section
>>>> scaffold4.8.raw.section
>>>> scaffold4.9-10.raw.section
>>>> scaffold4.9.final.section
>>>> scaffold4.9.pred.raw.section
>>>> scaffold4.9.raw.section
>>>> ```
>>>>
>>>> Thanks for any troubleshooting tips you can offer.
>>>>
>>>> Cheers,
>>>> Devon
>>>>
>>>> --
>>>> Devon O'Rourke
>>>> Postdoctoral researcher, Northern Arizona University
>>>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
>>>> twitter: @thesciencedork
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at yandell-lab.org
>>>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

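The launch recipe discussed in this thread (Carson's Open MPI `--mca` options, the Intel MPI `I_MPI_FABRICS=shm:tcp` workaround, and Devon's note that Open MPI 1.6.5 lacks the Vader BTL) can be collected into one small helper so the batch script does not have to be edited by hand for each MPI flavor. This is a minimal Python sketch, not part of MAKER. The names `build_maker_launch` and `legacy_openmpi` are hypothetical, and so is the choice of the older `sm` shared-memory BTL as the Vader replacement: the thread does not say which tweak Devon actually made.

```python
import os
import subprocess
from typing import Dict, List, Tuple

# Open MPI MCA options quoted in Carson's Feb 26 reply (workarounds for
# InfiniBand- and fork-related segfaults when running Perl under MPI).
OPENMPI_MCA: List[Tuple[str, str]] = [
    ("btl", "vader,tcp,self"),
    ("btl_tcp_if_include", "ib0"),
    ("orte_base_help_aggregate", "0"),
    ("btl_openib_want_fork_support", "1"),
    ("mpi_warn_on_fork", "0"),
]


def build_maker_launch(flavor: str, nprocs: int, maker_args: List[str],
                       maker_bin: str = "maker",
                       legacy_openmpi: bool = False) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for a hypothetical MAKER launch under MPI.

    flavor: "openmpi", "intel", or "mpich" (the thread suggests no extra
        options for MPICH2, so none are added).
    legacy_openmpi: ASSUMPTION - for Open MPI releases without the 'vader'
        BTL (e.g. 1.6.x), substitute the older 'sm' shared-memory BTL.
    """
    argv: List[str] = ["mpiexec"]
    extra_env: Dict[str, str] = {}
    if flavor == "openmpi":
        for key, value in OPENMPI_MCA:
            if key == "btl" and legacy_openmpi:
                value = value.replace("vader", "sm")
            argv += ["--mca", key, value]
    elif flavor == "intel":
        # Intel MPI workaround: restrict fabrics to shared memory + TCP.
        extra_env["I_MPI_FABRICS"] = "shm:tcp"
    elif flavor != "mpich":
        raise ValueError(f"unsupported MPI flavor: {flavor!r}")
    argv += ["-n", str(nprocs), maker_bin, *maker_args]
    return argv, extra_env


def launch(argv: List[str], extra_env: Dict[str, str]) -> subprocess.CompletedProcess:
    """Run the command with the current environment plus extra_env."""
    return subprocess.run(argv, env={**os.environ, **extra_env}, check=True)


if __name__ == "__main__":
    # Mirrors Carson's example command (no -q, so subprocess command lines
    # appear in STDERR); the maker path is site-specific.
    argv, extra_env = build_maker_launch(
        "openmpi", 28, ["-base", "lu", "-fix_nucleotides"],
        maker_bin="/packages/maker/3.01.02-beta/bin/maker")
    print(" ".join(argv))
```

Keeping the options in one list makes the Open MPI 1.6.x case a single flag rather than a second copy of the command line.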
From shore at yorku.ca Fri Mar 6 11:32:48 2020
From: shore at yorku.ca (shore at yorku.ca)
Date: Fri, 06 Mar 2020 13:32:48 -0500
Subject: [maker-devel] maker2jbrowse error: Don't know how to format iprscan tracks, skipping
Message-ID: <1583519568.5e629750e7931@oldmymail.yorku.ca>

Hello,

I've been attempting to include the iprscan results with my maker gff file.

I used the script below to add the iprscan results to my maker gff:

ipr_update_gff maker.gff iprscan.tsv > makeriprscan.gff

I also used the script below to generate a gff of iprscan domains:

iprscan2gff3 iprscan.tsv maker.gff > iprscandomain.gff

At this point, I wasn't sure how to proceed. I concatenated iprscandomain.gff to the end of makeriprscan.gff.

Then I ran maker2jbrowse. Everything seems to work, except that I get the error message

"Don't know how to format iprscan tracks, skipping"

If I view the file in JBrowse, I can certainly see the iprscan results when I click on a transcript, but there are no iprscan tracks in JBrowse.

Sorry, this is a bit long-winded.

I suspect that concatenating the two GFFs was perhaps not the right way to proceed?

Thanks
Joel
--
Dr. Joel S. Shore
Prof. Biology
York University

From carsonhh at gmail.com Fri Mar 6 12:39:38 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Mar 2020 12:39:38 -0700
Subject: [maker-devel] maker2jbrowse error: Don't know how to format iprscan tracks, skipping
In-Reply-To: <1583519568.5e629750e7931@oldmymail.yorku.ca>
References: <1583519568.5e629750e7931@oldmymail.yorku.ca>
Message-ID:

The maker2jbrowse inside MAKER is just an alias that launches the maker2jbrowse script inside JBrowse itself (i.e. …/jbrowse-1.16.8-release/bin/maker2jbrowse). It is no longer maintained by us, but by the JBrowse team.

You can edit the maker2jbrowse script yourself to add an 'iprscan' line, or any other feature type you want, by copying an existing feature in this section (image attached) and renaming values such as 'blastn' to 'iprscan' (these are the command-line options that get sent to flatfile-to-json.pl, just as if you were running it manually):

For '--type', I believe 'iprscan' uses 'match' in the GFF3 type column, so instead of 'protein_match' or 'expressed_sequence_match', just trim it to 'match' in the maker2jbrowse section as well.

You also must edit the …/jbrowse/css/maker.scss file to choose what colors you want the feature display to have. As in the example above, just copy an existing feature and make a new one where you replace names like 'blastn' with 'iprscan' (image attached):

-Carson
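Before adding an 'iprscan' entry to maker2jbrowse, it may help to confirm which source and type values the concatenated GFF3 actually carries, since Carson believes the iprscan features use 'match' rather than 'protein_match'. This is a minimal Python sketch assuming a standard nine-column, tab-separated GFF3 file; the function `tally_source_type` is a hypothetical helper, not part of MAKER or JBrowse.

```python
import sys
from collections import Counter
from typing import Iterable


def tally_source_type(lines: Iterable[str]) -> Counter:
    """Count (source, type) pairs, i.e. GFF3 columns 2 and 3."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line; skip rather than guess
        counts[(cols[1], cols[2])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_gff.py makeriprscan.gff
    with open(sys.argv[1]) as handle:
        for (source, ftype), n in sorted(tally_source_type(handle).items()):
            print(f"{source}\t{ftype}\t{n}")
```

If the iprscan rows report a type other than `match`, that value is what would go into the `--type` option of the copied maker2jbrowse entry.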
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 365893 bytes
Desc: not available

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-4.png
Type: image/png
Size: 77391 bytes
Desc: not available

From devon.orourke at gmail.com Mon Mar 9 07:24:11 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Mon, 9 Mar 2020 09:24:11 -0400
Subject: [maker-devel] Maker actions when using importied rm_gff file
Message-ID:

Hi Carson,

I recently completed one round of Maker annotation successfully, thanks to your expert advice on resetting MPI parameters. Because earlier tests (prior to this successful run) indicated that other dependency programs might *also* be contributing to failed Maker jobs, this first successful run used only GFF data as input for the EST, altEST, and protein evidence, along with a custom rm_gff file for complex repeats (I was following the strategy posted in an earlier thread in this forum: https://groups.google.com/forum/#!topic/maker-devel/patU-l_TQUM).
The good news is that using only GFF files gets the job to finish. The bad news is that if I input the original FASTA files instead of the resulting GFFs for the evidence data, Maker gets *close* but fails to finish at the stage where (I think) the per-scaffold chunks of "evidence_*.gff", "scaffold*.*.pred.raw.section", and "scaffold*.*.final.section" are collapsed into a set of "scaffold*.gff", "scaffold*.maker.transcripts.fasta" and "scaffold*.maker.proteins.fasta" files. The behavior is not entirely consistent across scaffolds: most scaffolds do produce finished files (the "scaffold*.gff", "transcripts.fasta", etc.). However, the majority of the failed scaffolds are the longest ones (though at least a handful of longer scaffolds *do finish*!).

The initial errors in the run.log.child.* files of these failed scaffolds aren't always the same. Here are a few:

```
DIED RANK 4:6:0:4
DIED RANK 5:6:0:4
DIED RANK 6:6:0:53
```

The second error is always:

```
DIED COUNT 1
```

You can view the .log file here: https://osf.io/4wn6h/download. I've attached the .opts file to this message.

Maybe, again, something about our MPI parameters is not optimized for these jobs. At this point I could certainly re-run the same data on a machine without MPI, because all the jobs are basically completed (no more BLASTing or repeat masking is needed). So I think the question is: should I just restart the run without MPI and see if it finishes? Or are there alternative Maker scripts I could test directly (even on a single scaffold subdirectory) to see whether these scaffolds *would* finish otherwise?

Thank you once more for your help with troubleshooting,
Devon
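Before deciding whether to restart with or without MPI, one way to list exactly which scaffolds did not finish is to read the master datastore index that MAKER points to at startup (the `*_master_datastore_index.log` file shown in the STDERR later in this digest). This is a minimal Python sketch. It assumes that each index line is tab-separated as sequence ID, datastore directory, and status, and that the last status recorded for a sequence is the current one; check a few lines of your own index before relying on it. The function names are hypothetical.

```python
import sys
from typing import Dict, Iterable, List


def last_status(lines: Iterable[str]) -> Dict[str, str]:
    """Map each sequence ID to the last status recorded for it.

    Assumed line layout: <seqid><TAB><datastore dir><TAB><status>
    """
    status: Dict[str, str] = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # ignore blank or unexpected lines
        status[cols[0]] = cols[2]  # later entries overwrite earlier ones
    return status


def not_finished(status: Dict[str, str]) -> List[str]:
    """Sequence IDs whose most recent status is anything but FINISHED."""
    return sorted(seq for seq, s in status.items() if s != "FINISHED")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for seq in not_finished(last_status(handle)):
            print(seq)
```

Note that any sequence MAKER deliberately skipped (for example, because it is too short) would also be reported, so the output is a superset of genuine failures.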
-------------- next part --------------
A non-text attachment was scrubbed...
Name: makerRun2_opts.ctl
Type: application/octet-stream
Size: 3854 bytes
Desc: not available

From christopher.keeling.1 at ulaval.ca Sat Mar 14 11:24:41 2020
From: christopher.keeling.1 at ulaval.ca (Christopher Keeling)
Date: Sat, 14 Mar 2020 17:24:41 +0000
Subject: [maker-devel] Maker 2.31.10: maker_functional_gff and maker_functional_fasta not parsing correctly, Can't use string ("") as a HASH ref while "strict refs" in use
Message-ID: <4A449297-A6D1-4A75-9547-FB5F70CE1A0A@ulaval.ca>

An HTML attachment was scrubbed...

From carsonhh at gmail.com Mon Mar 16 10:48:13 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Mar 2020 10:48:13 -0600
Subject: [maker-devel] Maker actions when using importied rm_gff file
In-Reply-To:
References:
Message-ID: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com>

In the log I see these:

ERROR: Could not open file: /scratch/dro49/myluwork/annotation/test9/lu.maker.output/lu_datastore/DC/34/scaffold1731//theVoid.scaffold1731/scaffold1731.gff.seq.tmp
Cannot send after transport endpoint shutdown

You are having IO timeout issues. It can be an issue with a single node on your cluster, or an issue with the object storage servers on the Lustre network storage. Or your job may just be too big for the network storage to handle. You will likely need to run on fewer nodes, or you can have your system admin increase the Lustre timeout options to see if that helps.

-Carson
From devon.orourke at gmail.com Tue Mar 17 04:21:58 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Tue, 17 Mar 2020 06:21:58 -0400
Subject: [maker-devel] Maker actions when using importied rm_gff file
In-Reply-To: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com>
References: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com>
Message-ID:

Ah, thanks so much Carson. The issue turned out to be that our sysadmin had installed Perl modules whose versions were incompatible with the version of Perl that Maker was running under. Once I set up a virtual environment with a Perl and Perl modules that work together, these errors went away.

Thanks again!
Devon
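When a run ends with several DIED RANK lines, the first real error can be far back in the output (as Carson noted earlier in this thread), and I/O errors like the ones he quoted are easy to miss. This is a minimal Python sketch that lists, in file order, where the signatures quoted in this thread appear in a saved STDERR or log file. The pattern list is an assumption limited to the messages quoted here and should be extended for other failure modes; `scan_log` is a hypothetical helper, not a MAKER tool.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Signatures quoted in this thread; extend the table for other failures.
PATTERNS = {
    "could_not_open": re.compile(r"ERROR: Could not open file"),
    "transport_shutdown": re.compile(r"Cannot send after transport endpoint shutdown"),
    "died_rank": re.compile(r"DIED RANK \S+"),
    "died_count": re.compile(r"DIED COUNT \d+"),
}


def scan_log(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, pattern name, line text) for each match, in file order."""
    hits: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        hits = scan_log(handle)
    if hits:
        lineno, name, text = hits[0]
        print(f"first signature at line {lineno} ({name}): {text}")
    for lineno, name, text in hits:
        print(f"{lineno}\t{name}\t{text}")
```

As Devon's follow-up shows, a match only points to where things went wrong. In his case, the underlying cause was a Perl module mismatch, not Lustre itself.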
From devon.orourke at gmail.com Fri Mar 20 05:30:56 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Fri, 20 Mar 2020 07:30:56 -0400
Subject: [maker-devel] guidance for first and subsequent annotation parameters
Message-ID:

With so many posts on the forum, it's been challenging to determine the best practices for performing multiple rounds of annotation with Maker.

My first round used EST, altEST, and protein FASTA files with a custom repeat-masking GFF file. The resulting annotation of this vertebrate genome produced 21,970 gene models with a mean length of about 9,016 bp; the BUSCO score was C:66.0%[S:64.2%,D:1.8%],F:4.2%,M:29.8%,n:9226 (mammalia_odb10 set).

Things seemed to be on the right track, so I set up the next Maker round using both SNAP and Augustus training information in the round 2 maker_opts.ctl file. At the end of that second round, I noticed a marked *decrease* in BUSCO score (C:53.3%[S:51.0%,D:2.3%],F:11.6%,M:35.1%,n:9226), yet an increase in the number of gene models (28,646) and in mean length (16,266 bp).
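To compare the two rounds category by category, the BUSCO one-line summaries quoted above can be parsed and differenced. This is a minimal Python sketch; the regular expression assumes exactly the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout shown in this message, and `parse_busco` and `busco_delta` are hypothetical helper names.

```python
import re
from typing import Dict

BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> Dict[str, float]:
    """Parse a BUSCO one-line summary into its percentages plus n."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def busco_delta(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Percentage-point change per category (after minus before)."""
    if before["n"] != after["n"]:
        raise ValueError("summaries use different BUSCO sets (n differs)")
    return {k: round(after[k] - before[k], 1) for k in ("C", "S", "D", "F", "M")}


round1 = parse_busco("C:66.0%[S:64.2%,D:1.8%],F:4.2%,M:29.8%,n:9226")
round2 = parse_busco("C:53.3%[S:51.0%,D:2.3%],F:11.6%,M:35.1%,n:9226")
print(busco_delta(round1, round2))
# {'C': -12.7, 'S': -13.2, 'D': 0.5, 'F': 7.4, 'M': 5.3}
```

Laid out this way, most of the 12.7-point loss in complete BUSCOs reappears as fragmented (+7.4) and missing (+5.3) genes, which is a more specific symptom than the overall drop in C.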
This made me wonder whether I was setting up the _opts.ctl file incorrectly. I'm concerned about a few things (and may be missing even more I should be concerned about!?):

- I specified the evidence in the EST/Protein sections instead of using the section under "#-----Re-annotation Using MAKER Derived GFF3". Maybe that was a fundamental mistake? How would the behavior change if I moved my round 1 Maker output into that section instead of using the EST/Protein Homology evidence sections, as I did below?
- I wasn't sure what to do with the RepeatMasker GFF files in round 2. The RepeatMasker GFF I included in round 1 contained only complex repeats (with model_org=simple and softmask=1, so that effectively only those complex regions were hard-masked for the initial alignments). But what should be used in round 2: the output GFF of round 1, or the input GFF of round 1?

Here's what I used for the round 2 maker_opts.ctl file:

#-----Genome (these are always required)
genome=/scratch/dro49/myluwork/annotation/input_files/mylu_hic_rails_noMasks.fa
organism_type=eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.est2genome.gff
altest_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.cdna2genome.gff

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.protein2genome.gff

#-----Repeat Masking (leave values blank to skip repeat masking)
rm_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.repeats.gff
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=/scratch/dro49/myluwork/annotation/maker_rd2/snap_rd1/lu_rnd1.zff.length50_aed0.25.hmm #SNAP HMM file
augustus_species=mylu #Augustus gene prediction species model
run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
allow_overlap= #allowed gene overlap fraction (value from 0 to 1, blank for default)

Thank you for your insights and support,
Devon

From eennadi at gmail.com Tue Mar 17 21:22:45 2020
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Wed, 18 Mar 2020 04:22:45 +0100
Subject: [maker-devel] CRL_Step2 will not produce required outputs
Message-ID:

I am trying to follow this tutorial: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced .

When I run this step:

perl DIR_CRL/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile seqfile.out99 --resultfile seqfile.result99 \
  --sequencefile seqfile --removed_repeats CRL_Step2_Passed_Elements.fasta

the expected outputs are:

CRL_Step2_Passed_Elements.fasta
Repeat_*.fasta files

But I only get CRL_Step2_Passed_Elements.fasta, with no Repeat_*.fasta files.

What could be the problem?

Nnadi Nnaemeka Emmanuel, Ph.D
Department of Microbiology,
Faculty of Natural and Applied Science,
Plateau State University, Bokkos,
Plateau State, Nigeria.
+2348068124819
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From luca.peruzza at unipd.it Mon Mar 23 12:30:23 2020
From: luca.peruzza at unipd.it (Luca Peruzza)
Date: Mon, 23 Mar 2020 19:30:23 +0100
Subject: [maker-devel] maker_functional_gff error "Can't use string ("") as a HASH ref"
Message-ID:

Hi all,

I am using the 'maker_functional_gff' script to update my GFF3 by adding functional annotation from a BLAST search against UniProt. However, I get the following error when running it:

Can't use string ("") as a HASH ref while "strict refs" in use at /opt/maker/bin/maker_functional_gff line 55, <$IN> line 277933.

I checked that line, and it looks like the other lines in the GFF3 file, so I was wondering if you know what is causing the error. The 'offending' line and the one following it are:

tig00632243	maker	gene	2926	22617	.	-	.	ID=Gacu_00045928;Name=Gacu_00045928;Alias=maker-tig00632243-snap-gene-0.9;Dbxref=MobiDBLite:mobidb-lite,PANTHER:PTHR15288,PANTHER:PTHR15288:SF5;
tig00632243	maker	mRNA	2926	22617	.	-	.	ID=Gacu_00045928-RA;Parent=Gacu_00045928;Name=Gacu_00045928-RA;Alias=maker-tig00632243-snap-gene-0.9-mRNA-1;_AED=0.08;_QI=0|0|0|0.75|0.81|0.83|12|0|714;_eAED=0.34;Dbxref=MobiDBLite:mobidb-lite,PANTHER:PTHR15288,PANTHER:PTHR15288:SF5;

Thanks
Luca

From pickettbd at gmail.com Tue Mar 24 17:12:01 2020
From: pickettbd at gmail.com (Brandon Pickett)
Date: Tue, 24 Mar 2020 16:12:01 -0700
Subject: [maker-devel] substr outside of string at .../Carp.pm 346
Message-ID:

I completed the first round of Maker. I then trained SNAP, GeneMark-ES, and Augustus, and fed those results back into Maker for a second round. Some sequences were successful; others were not. On some, I encountered an error about calling translate without a seq argument. I read some other threads about similar issues and followed the advice to isolate a single sequence using -g and -base.

My config files can be found at this link: https://byu.box.com/s/1tbp48djblo31ruuy8zm62k1vxoyozhq.

The following are the contents of stderr:

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore

To access files for individual sequences use the datastore index:
/path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: scaffold_66
Length: 2264627
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
doing repeat masking
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
...finished clustering.
doing repeat masking
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
...finished clustering.
preparing masked sequence
preparing ab-inits
running snap.
#--------- command -------------#
Widget::snap:
/path/to/snap /path/to/data/snap/rnd1/genome.hmm /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0 > /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0.genome%2Ehmm.snap
#-------------------------------#
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
running augustus.
#--------- command -------------#
Widget::augustus:
/path/to/apps/augustus/3.3.2/final/bin/augustus --AUGUSTUS_CONFIG_PATH=/path/to/data/augustus_config --species=pacbf --UTR=off /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0 > /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0.pacbf.augustus
#-------------------------------#
running genemark.
#--------- command -------------#
Widget::genemark:
/path/to/apps/perl/5.28/perl/bin/perl /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/genemark/gmhmm_wrap -m /path/to/data/gmes/output/gmhmm.mod -g /path/to/apps/genemark-es/4.38/gmhmme3 -p /path/to/apps/genemark-es/4.38/probuild -o /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_nomask.0
#-------------------------------#
gathering ab-init output files
deleted:0 genes
deleted:0 genes
substr outside of string at /path/to/apps/perl/5.28/perl/lib/5.28.0/Carp.pm line 346.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw STACK: Bio::Root::Root::throw /path/to/apps/bioperl/1.7.2/perl5/lib/perl5/Bio/Root/Root.pm:447 STACK: Bio::Tools::CodonTable::translate /path/to/apps/bioperl/1.7.2/perl5/lib/perl5/Bio/Tools/CodonTable.pm:419 STACK: CGL::TranslationMachine::longest_translation_plus_stop /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/CGL/TranslationMachine.pm:280 STACK: maker::auto_annotator::get_translation_seq /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/maker/auto_annotator.pm:3575 STACK: Widget::snap::load_phat_hits /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/snap.pm:973 STACK: Widget::snap::parse /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/snap.pm:689 STACK: GI::parse_abinit_file /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/GI.pm:1228 STACK: Process::MpiChunk::_go /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:1473 STACK: Process::MpiChunk::run /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:340 STACK: Process::MpiChunk::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:356 STACK: Process::MpiTiers::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiTiers.pm:287 STACK: Process::MpiTiers::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiTiers.pm:287 STACK: /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/maker:679 ----------------------------------------------------------- --> rank=NA, hostname=somenode.rc.byu.edu ERROR: Failed while gathering ab-init output files ERROR: Chunk failed at level:1, tier_type:2 FAILED CONTIG:scaffold_66 ERROR: Chunk failed at level:4, tier_type:0 FAILED CONTIG:scaffold_66 examining
contents of the fasta file and run log --Next Contig-- Processing run.log file... Maker is now finished!!! The command I ran was as follows: maker -g /path/to/data/originalAssembly/split/scaffold_66.fasta -base scaffolds -TMP /tmp/35082116 -cpus 1 I usually run maker with MPI (e.g., mpirun maker -TMP /tmp/abcdefg -cpus 1), but didn't see any need for it when running a single sequence as a test. Note that the output from this isolated run matches what I've been seeing in the mixed output from MPI, where it is jumbled together with output from other contigs. The following is the bottom of the run.log file for this sequence in the datastore: LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0 LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0 LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0 STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.genome%2Ehmm.snap FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.genome%2Ehmm.snap STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.pacbf.augustus FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.pacbf.augustus STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark DIED RANK 0:4:0:0 DIED COUNT 1 DIED RANK 0 DIED COUNT 1 The contents of theVoid directory can be viewed at this link:
https://byu.box.com/s/hqwngdvehs8dfoymtrkjyismq3p8ayv1. Do you have any suggestions on how I can resolve this error? Thank you, Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpapoli at gmail.com Wed Mar 25 14:38:02 2020 From: hpapoli at gmail.com (Homa Papoli) Date: Wed, 25 Mar 2020 21:38:02 +0100 Subject: [maker-devel] repeatmasker output gff Message-ID: Hello, I have two questions regarding using MAKER: I have run RepeatMasker on my genome separately and I have a GFF file. However, the third column of my GFF file contains the word "similarity". In a genome annotation workshop I attended, we were told that the GFF for MAKER should have "match" and "match_part" in the third column. Can I use the original RepeatMasker GFF output, or should I make changes to it? My other question is about running MAKER. Since MAKER takes several days to run, if the job gets interrupted because of a limit on job run time, is it possible to restart MAKER from where it was interrupted? Thank you, Homa -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhao.wei at umu.se Mon Mar 30 03:37:04 2020 From: zhao.wei at umu.se (Wei Zhao) Date: Mon, 30 Mar 2020 09:37:04 +0000 Subject: [maker-devel] Maker annotation AED scores are around 0.5 Message-ID: <1b0e5c3cae1b410397e61262a2384039@umu.se> Dear maker team, I am writing to ask for your help. I am using MAKER to annotate a big genome (~9 Gbp) with three evidence sources: 1) the transcriptome of this species; 2) protein sequences from related species; 3) an Augustus model trained from PASA. When I use all three evidence sources to annotate the genome (basic pipeline), the distribution of AED scores looks odd (a single peak around 0.5). I have also tried updating the gene models I got from PASA using MAKER, and the distribution of AED scores is the same.
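AED (Annotation Edit Distance) is defined as 1 - C, where C = (SN + SP)/2. SN is the fraction of evidence bases covered by the gene model, and SP is the fraction of model bases supported by evidence. The sketch below computes that quantity at the base level for one model and a flattened set of evidence intervals. This is a simplification: MAKER's own calculation over multiple evidence alignments may differ in detail, and the function names are hypothetical.

```python
# Simplified, base-level AED for one gene model against evidence intervals.
# Hypothetical helper names; MAKER's own implementation may differ in detail.


def covered_bases(intervals):
    """Expand 1-based, inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model_exons, evidence_exons):
    """Annotation Edit Distance: 1 - (SN + SP) / 2 over covered bases."""
    model = covered_bases(model_exons)
    evidence = covered_bases(evidence_exons)
    if not model or not evidence:
        return 1.0  # nothing to compare: treat as maximal distance
    overlap = len(model & evidence)
    sn = overlap / len(evidence)  # share of evidence bases the model covers
    sp = overlap / len(model)     # share of model bases supported by evidence
    return 1.0 - (sn + sp) / 2.0


# A model identical to its evidence, one shifted by half its length,
# and one twice as long as its evidence.
print(aed([(1, 100)], [(1, 100)]))   # 0.0
print(aed([(1, 100)], [(51, 150)]))  # 0.5
print(aed([(1, 100)], [(1, 50)]))    # 0.25
```

As the second case shows, a model that shares only half of its bases with equally long evidence scores 0.5. A peak near 0.5 is therefore consistent with partial, rather than total, disagreement between models and evidence.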
But when I use only EST or protein evidence (est2genome or protein2genome), the AED scores are normal (close to 0). To my understanding, it seems that the three evidence sources conflict with each other, resulting in higher AED scores (~0.5) than expected. Could you give me some suggestions on how to fix this problem? Best regards, Wei -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: E6F3EF742C40408F8390EE9A1FF29894.png Type: image/png Size: 34543 bytes Desc: E6F3EF742C40408F8390EE9A1FF29894.png URL: From patrick.gagne at canada.ca Tue Mar 31 11:53:13 2020 From: patrick.gagne at canada.ca (Gagné, Patrick (NRCAN/RNCAN)) Date: Tue, 31 Mar 2020 17:53:13 +0000 Subject: [maker-devel] Problem with Maker using GeneMark Message-ID: Hi, I've come across a bug while using Maker. I'm trying to annotate a 560 Mb genome and I'm using SNAP, GeneMark and Augustus in Maker. When Maker executes the GeneMark command, it just fails ("GeneMark Failed") without any error message, so I decided to debug it myself. I launched every command manually and found that gmhmm_wrap is causing the issue. The problem is actually in the probuild command; it doesn't do anything (from what I understand, this command is supposed to split the FASTA wherever there is NNN, to prevent GeneMark from crashing).
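Patrick describes the probuild step as cutting the sequence wherever there is a long run of N, while remembering each piece's offset so predictions can be mapped back. A small sketch can illustrate that idea. It is not probuild or gmhmm_wrap. The function names and the 10-base minimum run length are illustrative assumptions.

```python
import re

# Illustrative only: not probuild or gmhmm_wrap. MIN_N_RUN is an assumed threshold.
MIN_N_RUN = 10


def split_at_n_runs(seq, min_run=MIN_N_RUN):
    """Split seq at runs of >= min_run N/n characters.

    Returns (offset, piece) tuples, where offset is the 0-based index of the
    piece's first base in the original sequence.
    """
    pieces = []
    pos = 0
    for match in re.finditer(r"[Nn]{%d,}" % min_run, seq):
        if match.start() > pos:
            pieces.append((pos, seq[pos:match.start()]))
        pos = match.end()  # resume after the N run
    if pos < len(seq):
        pieces.append((pos, seq[pos:]))
    return pieces


def to_contig_coord(offset, local_pos):
    """Map a 1-based position within a piece back to a 1-based contig position."""
    return offset + local_pos
```

With split_at_n_runs("ACGT" + "N" * 12 + "GGCC"), the result is [(0, "ACGT"), (16, "GGCC")]. Position 1 of the second piece maps back to contig position 17. A 14 kb run of N would simply be dropped, and the next piece's offset would record where the sequence resumes.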
My genome has very long stretches of N (up to 14 kb). After checking the probuild help, I found that the command used in gmhmm_wrap is not valid (half the options are no longer in probuild, probably because of GeneMark updates). I have tried different probuild versions (only those I could download from the GeneMark site; they don't provide older versions except those bundled with their programs): 2.16, 2.34, and 2.44 (the latest, bundled with GeneMark-ES). I've also tried editing the gmhmm_wrap script to modify the probuild command, but even when the FASTA files are split, I get another error: ERROR: Logic error in getting offset. I then replaced the command for the offset extraction, which also worked, but now I get an error when Maker tries to get the ab-initio output: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Calling translate without a seq argument! Could you please tell me how to fix this, or which probuild version I should use (I will ask GeneMark support for it)? Thanks in advance. P.S. Sorry for my English, it's not my first language and I'm still learning. Patrick Gagné Spécialiste en bio-informatique / Bioinformatics specialist Service canadien des forêts / Canadian Forest Service Ressources naturelles Canada / Natural Resources Canada Gouvernement du Canada / Government of Canada Centre de foresterie des Laurentides/Laurentian Forestry Centre 1055, rue du P.E.P.S. C.P. 10380, succ. Sainte-Foy/P.O. Box 10380, Stn. Sainte-Foy Québec (Qc) G1V 4C7 Laboratoire de pathologie forestière (Local 2.21) patrick.gagne at canada.ca / tel : (418) 648-4443 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From shore at yorku.ca Fri Mar 6 11:32:48 2020 From: shore at yorku.ca (shore at yorku.ca) Date: Fri, 06 Mar 2020 13:32:48 -0500 Subject: [maker-devel] maker2jbrowse error: Don't know how to format iprscan tracks, skipping Message-ID: <1583519568.5e629750e7931@oldmymail.yorku.ca> Hello, I've been attempting to include the iprscan results with my maker gff file. I used the script below to add the iprscan results to my maker gff: ipr_update_gff maker.gff iprscan.tsv > makeriprscan.gff I also used the script below to generate a gff of iprscan domains iprscan2gff3 iprscan.tsv maker.gff > iprscandomain.gff At this point, I wasn't sure how to proceed. I concatenated the iprscandomain.gff to the end of the makeriprscan.gff. And then ran maker2jbrowse, everything seems to work except I get error message "Don't know how to format iprscan tracks, skipping" If I view the file under jbrowse I can certainly see the iprscan results when I click on a transcript, but there are no tracks of iprscan results in the jbrowse. Sorry, this is a bit long winded. I suspect perhaps that concatenating the two GFFs was perhaps not the right way to proceed? Thanks Joel -- Dr. Joel S. Shore Prof.
Biology York University From carsonhh at gmail.com Fri Mar 6 12:39:38 2020 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 6 Mar 2020 12:39:38 -0700 Subject: [maker-devel] maker2jbrowse error: Don't know how to format iprscan tracks, skipping In-Reply-To: <1583519568.5e629750e7931@oldmymail.yorku.ca> References: <1583519568.5e629750e7931@oldmymail.yorku.ca> Message-ID: The maker2jbrowse inside maker is just an alias that launches the maker2jbrowse script inside of JBrowse itself (i.e. .../jbrowse-1.16.8-release/bin/maker2jbrowse). It is no longer maintained by us, but rather by the JBrowse team. You can edit the maker2jbrowse script yourself to add an 'iprscan' line, or any other feature type you want, by copying an existing feature in this section (image attached) and renaming values such as 'blastn' to 'iprscan' (these are the command-line options that get sent to flatfile-to-json.pl, just as if you were running it manually) --> For '--type', I believe 'iprscan' uses 'match' in the GFF3 type column, so instead of 'protein_match' or 'expressed_sequence_match', just trim it to 'match' in the maker2jbrowse section as well. You must also edit the .../jbrowse/css/maker.scss file to choose what colors you want the feature display to have. Similar to the example above, just copy an existing feature and make a new one where you replace names like 'blastn' with 'iprscan' (image attached) --> --Carson > On Mar 6, 2020, at 11:32 AM, shore at yorku.ca wrote: > > Hello, > > I've been attempting to include the iprscan results with my maker gff file. > > I used the script below to add the iprscan results to my maker gff: > > ipr_update_gff maker.gff iprscan.tsv > makeriprscan.gff > > I also used the script below to generate a gff of iprscan domains > > iprscan2gff3 iprscan.tsv maker.gff > iprscandomain.gff > > At this point, I wasn't sure how to proceed. > I concatenated the iprscandomain.gff to the end of the makeriprscan.gff.
> > And then ran maker2jbrowse, everything seems to work except I get error message > > "Don't know how to format iprscan tracks, skipping" > > If I view the file under jbrowse I can certainly see the iprscan results when I > click on a transcript, but there are no tracks of iprscan results in the > jbrowse. > > Sorry, this is a bit long winded. > > I suspect perhaps that concatenating the two GFFs was perhaps not the right way > to proceed? > > Thanks > Joel > -- > Dr. Joel S. Shore > Prof. Biology > York University > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.png Type: image/png Size: 365893 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-4.png Type: image/png Size: 77391 bytes Desc: not available URL: From devon.orourke at gmail.com Mon Mar 9 07:24:11 2020 From: devon.orourke at gmail.com (Devon O'Rourke) Date: Mon, 9 Mar 2020 09:24:11 -0400 Subject: [maker-devel] Maker actions when using importied rm_gff file Message-ID: Hi Carson, I recently completed one round of Maker annotation successfully thanks to your expert advice on resetting MPI parameters. Because earlier tests (prior to this successful run) indicated that other dependency programs might *also* be contributing to failed Maker jobs, this first successful run used only GFF data as input for the est, altest, and protein evidence, plus a custom rm_gff file for complex repeats (I was following the strategy posted in an earlier thread in this forum: https://groups.google.com/forum/#!topic/maker-devel/patU-l_TQUM).
The good news is that using only GFF files gets the job to finish; the bad news is that if I input the original FASTA files instead of the resulting GFFs for the evidence data, Maker gets *close* but fails to finish at the stage where (I think) the per-scaffold chunks of "evidence_*.gff", "scaffold*.*.pred.raw.section", and "scaffold*.*.final.section" are collapsed into a set of "scaffold*.gff", "scaffold*.maker.transcripts.fasta" and "scaffold*.maker.proteins.fasta" files. The behavior is not entirely consistent across all scaffolds: most scaffolds do produce finished files (the "scaffold*.gff", "transcripts.fasta", etc.); however, most of the failed scaffolds are the longest ones (though at least a handful of longer scaffolds *do finish*!). The initial errors in the run.log.child.* files for these failed scaffolds aren't always the same. Here are a few: ``` DIED RANK 4:6:0:4 DIED RANK 5:6:0:4 DIED RANK 6:6:0:53 ``` The second error is always: ``` DIED COUNT 1 ``` You can view the .log file here: https://osf.io/4wn6h/download. I've attached the .opts file to this message. Maybe, once again, something about our MPI parameters is not optimized for these jobs. I could certainly re-run the same data on a machine without MPI at this point, because all the jobs are basically complete (no more BLAST searches or repeat masking are needed). So I think the question is: should I just restart the run without MPI and see if it finishes? Or are there alternative Maker scripts I could test directly (even on a single scaffold subdirectory) to see whether these scaffolds where Maker doesn't quite finish *would* finish otherwise? Thank you once more for your help with troubleshooting, Devon -- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: makerRun2_opts.ctl Type: application/octet-stream Size: 3855 bytes Desc: not available URL: From christopher.keeling.1 at ulaval.ca Sat Mar 14 11:24:41 2020 From: christopher.keeling.1 at ulaval.ca (Christopher Keeling) Date: Sat, 14 Mar 2020 17:24:41 +0000 Subject: [maker-devel] Maker 2.31.10: maker_functional_gff and maker_functional_fasta not parsing correctly, Can't use string ("") as a HASH ref while "strict refs" in use Message-ID: <4A449297-A6D1-4A75-9547-FB5F70CE1A0A@ulaval.ca> An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Mar 16 10:48:13 2020 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 16 Mar 2020 10:48:13 -0600 Subject: [maker-devel] Maker actions when using importied rm_gff file In-Reply-To: References: Message-ID: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com> In the log I see these --> ERROR: Could not open file: /scratch/dro49/myluwork/annotation/test9/lu.maker.output/lu_datastore/DC/34/scaffold1731//theVoid.scaffold1731/scaffold1731.gff.seq.tmp Cannot send after transport endpoint shutdown You are having I/O timeout issues. It can be an issue with a single node on your cluster, or an issue with the object storage servers on the Lustre network storage, or your job may simply be too big for the network storage to handle. You will likely need to run on fewer nodes, or you can have your system admin increase the Lustre timeout options to see if that helps. --Carson > On Mar 9, 2020, at 7:24 AM, Devon O'Rourke wrote: > > Hi Carson, > > I recently completed one round of Maker annotation successfully thanks to your expert advise on resetting MPI parameters.
> Because earlier tests (prior to this successful run) indicated that other dependency programs might also be contributing to failed Maker jobs, this first successful run consisted entirely of GFF data as input for the est, altest, and protein evidence, as well as using a custom rm_gff file for complex repeats (I was following the strategy posted in an earlier thread in this forum: https://groups.google.com/forum/#!topic/maker-devel/patU-l_TQUM).
>
> The good news is that using GFF files only will get the job to finish; the bad news is that if I try to input the original fasta files instead of the resulting GFFs for the evidence data, Maker gets close but fails to finish the job at the stage where (I think) the per-scaffold chunks of "evidence_*.gff", "scaffold*.*.pred.raw.section", and "scaffold*.*.final.section" are collapsed into a set of "scaffold*.gff", "scaffold*.maker.transcripts.fasta" and "scaffold*.maker.proteins.fasta" files. The behavior is not entirely consistent across all scaffolds: most scaffolds in fact produce finished files (the "scaffold*.gff", "transcripts.fasta", etc.), but the majority of the failed scaffolds are the longest ones (though at least a handful of longer scaffolds do finish!).
>
> The initial error in the run.log.child.* files in these failed scaffolds isn't always the same. Here are a few:
>
> ```
> DIED RANK 4:6:0:4
> DIED RANK 5:6:0:4
> DIED RANK 6:6:0:53
> ```
>
> The second error is always:
> ```
> DIED COUNT 1
> ```
>
> You can view the .log file here: https://osf.io/4wn6h/download. I've attached the .opts file to this message.
>
> Maybe again there is something about our MPI parameters that are not optimized for these jobs. I could certainly re-run the same data through a machine without MPI at this point because all the jobs are basically completed (no more blasting or repeat masking is needed). Thus I think the question is: should I just restart the run without MPI and see if it finishes?
> Or perhaps there are alternative Maker scripts to try testing directly (even on a single scaffold subdirectory) to see if these instances where Maker doesn't quite finish would finish otherwise?
>
> Thank you once more for your help with troubleshooting,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork

From devon.orourke at gmail.com Tue Mar 17 04:21:58 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Tue, 17 Mar 2020 06:21:58 -0400
Subject: [maker-devel] Maker actions when using importied rm_gff file
In-Reply-To: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com>
References: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com>

Ah, thanks so much, Carson. The issue ended up being that our sysadmin had installed Perl modules whose versions were incompatible with the version of Perl used to run Maker. Once I installed a virtual environment with a compatible Perl and set of Perl modules, these errors went away.

Thanks again!
Devon

On Mon, Mar 16, 2020 at 12:48 PM Carson Holt wrote:

> In the log I see these -->
>
> ERROR: Could not open file: /scratch/dro49/myluwork/annotation/test9/lu.maker.output/lu_datastore/DC/34/scaffold1731//theVoid.scaffold1731/scaffold1731.gff.seq.tmp
> Cannot send after transport endpoint shutdown
>
> You are having IO timeout issues. It can be an issue with a single node on your cluster, or an issue with the object storage servers on the Lustre network storage. Or your job may just be too big for the network storage to handle. You will likely need to run on fewer nodes, or you can have your system admin increase the timeout options for Lustre to see if that helps.
>
> Carson
>
> On Mar 9, 2020, at 7:24 AM, Devon O'Rourke wrote:
>
> Hi Carson,
>
> I recently completed one round of Maker annotation successfully thanks to your expert advice on resetting MPI parameters. Because earlier tests (prior to this successful run) indicated that other dependency programs might *also* be contributing to failed Maker jobs, this first successful run consisted entirely of GFF data as input for the est, altest, and protein evidence, as well as using a custom rm_gff file for complex repeats (I was following the strategy posted in an earlier thread in this forum: https://groups.google.com/forum/#!topic/maker-devel/patU-l_TQUM).
>
> The good news is that using GFF files only will get the job to finish; the bad news is that if I try to input the original fasta files instead of the resulting GFFs for the evidence data, Maker gets *close* but fails to finish the job at the stage where (I think) the per-scaffold chunks of "evidence_*.gff", "scaffold*.*.pred.raw.section", and "scaffold*.*.final.section" are collapsed into a set of "scaffold*.gff", "scaffold*.maker.transcripts.fasta" and "scaffold*.maker.proteins.fasta" files. The behavior is not entirely consistent across all scaffolds: most scaffolds in fact produce finished files (the "scaffold*.gff", "transcripts.fasta", etc.), but the majority of the failed scaffolds are the longest ones (though at least a handful of longer scaffolds *do finish*!).
>
> The initial error in the run.log.child.* files in these failed scaffolds isn't always the same. Here are a few:
>
> ```
> DIED RANK 4:6:0:4
> DIED RANK 5:6:0:4
> DIED RANK 6:6:0:53
> ```
>
> The second error is always:
> ```
> DIED COUNT 1
> ```
>
> You can view the .log file here: https://osf.io/4wn6h/download. I've attached the .opts file to this message.
>
> Maybe again there is something about our MPI parameters that are not optimized for these jobs.
> I could certainly re-run the same data through a machine without MPI at this point because all the jobs are basically completed (no more blasting or repeat masking is needed). Thus I think the question is: should I just restart the run without MPI and see if it finishes? Or perhaps there are alternative Maker scripts to try testing directly (even on a single scaffold subdirectory) to see if these instances where Maker doesn't quite finish *would* finish otherwise?
>
> Thank you once more for your help with troubleshooting,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork

From devon.orourke at gmail.com Fri Mar 20 05:30:56 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Fri, 20 Mar 2020 07:30:56 -0400
Subject: [maker-devel] guidance for first and subsequent annotation parameters

With so many posts on the forum, it's been challenging to determine what the best practices are for performing multiple rounds of annotation with Maker.

My first round used est, altest, and protein fasta files with a custom repeat-masking GFF file. For this vertebrate genome, that round produced 21,970 gene models with a mean length of about 9016 bp; the BUSCO score was C:66.0%[S:64.2%,D:1.8%],F:4.2%,M:29.8%,n:9226 (mammalia_odb10 set). Things seemed to be on the right track, so I set up the next Maker round using both SNAP- and Augustus-trained information in the round2 maker_opts.ctl file. At the end of that second round, I noticed a marked *decrease* in BUSCO score (C:53.3%[S:51.0%,D:2.3%],F:11.6%,M:35.1%,n:9226), yet an increase in the number of gene models (28,646) and mean length (16266 bp).
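The two BUSCO summaries above use BUSCO's short notation: C = complete (split into S, single-copy, and D, duplicated), F = fragmented, M = missing, and n = the number of BUSCO groups searched. As a quick worked comparison, the sketch below parses both strings and reports the change per category in percentage points. The regular expression is written against the exact format quoted in this message; treat it as an assumption for output from other BUSCO versions.

```python
import re

# BUSCO short-summary notation, e.g. "C:66.0%[S:64.2%,D:1.8%],F:4.2%,M:29.8%,n:9226"
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Parse a BUSCO short summary into percentages plus the group count n."""
    m = BUSCO_RE.search(summary)
    if m is None:
        raise ValueError(f"not a BUSCO short summary: {summary!r}")
    scores = {k: float(v) for k, v in m.groupdict().items() if k != "n"}
    scores["n"] = int(m.group("n"))
    return scores


def busco_delta(before, after):
    """Percentage-point change for each category (after minus before)."""
    a, b = parse_busco(before), parse_busco(after)
    return {k: round(b[k] - a[k], 1) for k in ("C", "S", "D", "F", "M")}


def approx_count(summary, category="C"):
    """Approximate number of BUSCO groups behind one (rounded) percentage."""
    s = parse_busco(summary)
    return round(s[category] / 100 * s["n"])


round1 = "C:66.0%[S:64.2%,D:1.8%],F:4.2%,M:29.8%,n:9226"
round2 = "C:53.3%[S:51.0%,D:2.3%],F:11.6%,M:35.1%,n:9226"
print(busco_delta(round1, round2))
# -> {'C': -12.7, 'S': -13.2, 'D': 0.5, 'F': 7.4, 'M': 5.3}
```

With these two rounds, complete BUSCOs fall by 12.7 points (roughly 6089 to 4917 of the 9226 groups), and that loss splits into +7.4 points fragmented and +5.3 points missing, so more of the drop shows up as fragmented models than as missing ones.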
This got me wondering whether I was setting up the _opts.ctl file incorrectly. I'm concerned about a few things (and maybe missing even more I should be concerned about!?):

- I specified the evidence to come from EST/Protein instead of using the section available under "#-----Re-annotation Using MAKER Derived GFF3". Maybe that was a fundamental mistake? What is the expected change in behavior if I moved my round1 Maker output into that category instead of using the EST/Protein Homology evidence sections as I did below?
- I wasn't sure what to do with the RepeatMasking GFF files in Round2. The RepeatMasker GFF I included in Round1 consisted of just complex repeats (setting model_org=simple and softmask=1 to effectively only hard mask those complex areas for the initial alignments). But what should be used in Round2: the output GFF of Round1, or the input GFF from Round1?

Here's what I did for the Round2 maker_opts.ctl file:

#-----Genome (these are always required)
genome=/scratch/dro49/myluwork/annotation/input_files/mylu_hic_rails_noMasks.fa
organism_type=eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.est2genome.gff
altest_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.cdna2genome.gff

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.protein2genome.gff

#-----Repeat Masking (leave values blank to skip repeat masking)
rm_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.repeats.gff
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=/scratch/dro49/myluwork/annotation/maker_rd2/snap_rd1/lu_rnd1.zff.length50_aed0.25.hmm #SNAP HMM file
augustus_species=mylu #Augustus gene prediction species model
run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
allow_overlap= #allowed gene overlap fraction (value from 0 to 1, blank for default)

Thank you for your insights and support,
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork

From eennadi at gmail.com Tue Mar 17 21:22:45 2020
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Wed, 18 Mar 2020 04:22:45 +0100
Subject: [maker-devel] CRL_Step2 will not produce required outputs

I am trying to follow this tutorial: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

Running this step:

perl DIR_CRL/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile seqfile.out99 --resultfile seqfile.result99 \
    --sequencefile seqfile --removed_repeats CRL_Step2_Passed_Elements.fasta

the expected outputs should be:

CRL_Step2_Passed_Elements.fasta
Repeat_*.fasta files

But I am only getting CRL_Step2_Passed_Elements.fasta, with no Repeat_*.fasta files. What could be the problem?

Nnadi Nnaemeka Emmanuel, Ph.D.
Department of Microbiology, Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
+2348068124819
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From luca.peruzza at unipd.it Mon Mar 23 12:30:23 2020
From: luca.peruzza at unipd.it (Luca Peruzza)
Date: Mon, 23 Mar 2020 19:30:23 +0100
Subject: [maker-devel] maker_functional_gff error "Can't use string ("") as a HASH ref"

Hi all,

I am using the 'maker_functional_gff' script to update my GFF3 by adding functional annotation from a BLAST search against UniProt; however, I get the following error when running the code:

Can't use string ("") as a HASH ref while "strict refs" in use at /opt/maker/bin/maker_functional_gff line 55, <$IN> line 277933.

I checked that line and it looks like the other lines in the GFF3 file, so I was wondering if you know what is causing the error. The 'offending' line and the following one are:

tig00632243	maker	gene	2926	22617	.	-	.	ID=Gacu_00045928;Name=Gacu_00045928;Alias=maker-tig00632243-snap-gene-0.9;Dbxref=MobiDBLite:mobidb-lite,PANTHER:PTHR15288,PANTHER:PTHR15288:SF5;
tig00632243	maker	mRNA	2926	22617	.	-	.	ID=Gacu_00045928-RA;Parent=Gacu_00045928;Name=Gacu_00045928-RA;Alias=maker-tig00632243-snap-gene-0.9-mRNA-1;_AED=0.08;_QI=0|0|0|0.75|0.81|0.83|12|0|714;_eAED=0.34;Dbxref=MobiDBLite:mobidb-lite,PANTHER:PTHR15288,PANTHER:PTHR15288:SF5;

Thanks,
Luca

From pickettbd at gmail.com Tue Mar 24 17:12:01 2020
From: pickettbd at gmail.com (Brandon Pickett)
Date: Tue, 24 Mar 2020 16:12:01 -0700
Subject: [maker-devel] substr outside of string at .../Carp.pm 346

I completed the first round of Maker. I subsequently trained SNAP, GeneMark-ES, and Augustus. I've since fed those results back into Maker for a second round. Some sequences were successful, others were not.
On some, I encountered an error about calling translate without a seq argument. I read some other threads about similar issues, and I followed the advice to isolate a single sequence using -g and -base. My config files can be found at this link: https://byu.box.com/s/1tbp48djblo31ruuy8zm62k1vxoyozhq.

The following are the contents of stderr:

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore

To access files for individual sequences use the datastore index:
/path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_master_datastore_index.log

STATUS: Now running MAKER...
examining contents of the fasta file and run log

--Next Contig--

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: scaffold_66
Length: 2264627
#---------------------------------------------------------------------

setting up GFF3 output and fasta chunks
doing repeat masking
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
...finished clustering.
doing repeat masking
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
...finished clustering.
preparing masked sequence
preparing ab-inits
running snap.
#--------- command -------------#
Widget::snap:
/path/to/snap /path/to/data/snap/rnd1/genome.hmm /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0 > /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0.genome%2Ehmm.snap
#-------------------------------#
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
running augustus.
#--------- command -------------#
Widget::augustus:
/path/to/apps/augustus/3.3.2/final/bin/augustus --AUGUSTUS_CONFIG_PATH=/path/to/data/augustus_config --species=pacbf --UTR=off /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0 > /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0.pacbf.augustus
#-------------------------------#
running genemark.
#--------- command -------------#
Widget::genemark:
/path/to/apps/perl/5.28/perl/bin/perl /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/genemark/gmhmm_wrap -m /path/to/data/gmes/output/gmhmm.mod -g /path/to/apps/genemark-es/4.38/gmhmme3 -p /path/to/apps/genemark-es/4.38/probuild -o /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_nomask.0
#-------------------------------#
gathering ab-init output files
deleted:0 genes
deleted:0 genes
substr outside of string at /path/to/apps/perl/5.28/perl/lib/5.28.0/Carp.pm line 346.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw /path/to/apps/bioperl/1.7.2/perl5/lib/perl5/Bio/Root/Root.pm:447
STACK: Bio::Tools::CodonTable::translate /path/to/apps/bioperl/1.7.2/perl5/lib/perl5/Bio/Tools/CodonTable.pm:419
STACK: CGL::TranslationMachine::longest_translation_plus_stop /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/CGL/TranslationMachine.pm:280
STACK: maker::auto_annotator::get_translation_seq /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/maker/auto_annotator.pm:3575
STACK: Widget::snap::load_phat_hits /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/snap.pm:973
STACK: Widget::snap::parse /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/snap.pm:689
STACK: GI::parse_abinit_file /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/GI.pm:1228
STACK: Process::MpiChunk::_go /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:1473
STACK: Process::MpiChunk::run /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:340
STACK: Process::MpiChunk::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:356
STACK: Process::MpiTiers::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiTiers.pm:287
STACK: Process::MpiTiers::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiTiers.pm:287
STACK: /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/maker:679
-----------------------------------------------------------
--> rank=NA, hostname=somenode.rc.byu.edu
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scaffold_66

ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold_66

examining contents of the fasta file and run log

--Next Contig--

Processing run.log file...
Maker is now finished!!!

The command I ran was as follows:

maker -g /path/to/data/originalAssembly/split/scaffold_66.fasta \
    -base scaffolds -TMP /tmp/35082116 -cpus 1

I usually run maker with MPI (e.g., mpirun maker -TMP /tmp/abcdefg -cpus 1), but didn't see any need when running a single sequence as a test. Note that the output from this isolated run matches what I've been seeing in the mixed output from MPI, where it is just slightly jumbled together with other things.

The following is the bottom of the run.log file for this sequence in the datastore:

LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0
LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0
LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0
STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.genome%2Ehmm.snap
FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.genome%2Ehmm.snap
STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.pacbf.augustus
FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.pacbf.augustus
STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark
FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark
DIED RANK 0:4:0:0
DIED COUNT 1
DIED RANK 0
DIED COUNT 1

The contents of theVoid directory can be viewed at this link:
https://byu.box.com/s/hqwngdvehs8dfoymtrkjyismq3p8ayv1

Do you have any suggestions on how I can resolve this error?

Thank you,
Brandon

From hpapoli at gmail.com Wed Mar 25 14:38:02 2020
From: hpapoli at gmail.com (Homa Papoli)
Date: Wed, 25 Mar 2020 21:38:02 +0100
Subject: [maker-devel] repeatmasker output gff

Hello,

I have two questions regarding using MAKER:

I have run RepeatMasker on my genome separately and I have a GFF file. However, the third column of my GFF file contains the word "similarity". In a genome annotation workshop I took, we were told that the GFF for MAKER should have "match" and "match_part" in the third column. Can I use the original GFF output of RepeatMasker, or should I make changes to it?

My other question is about running MAKER. Since MAKER takes several days to run, if the job gets interrupted because of a limit on how many days a job can run, is it possible to restart MAKER from where it was interrupted?

Thank you,
Homa

From zhao.wei at umu.se Mon Mar 30 03:37:04 2020
From: zhao.wei at umu.se (Wei Zhao)
Date: Mon, 30 Mar 2020 09:37:04 +0000
Subject: [maker-devel] Maker annotation AED scores are around 0.5
Message-ID: <1b0e5c3cae1b410397e61262a2384039@umu.se>

Dear maker team,

I am writing to ask for your help. I am using MAKER to annotate a big genome (~9 Gbp), and I have three kinds of evidence: 1) the transcriptome of this species; 2) protein sequences from related species; 3) an Augustus model trained from PASA. When I use all three to annotate the genome (basic pipeline), the distribution of AED scores is weird (a single peak around 0.5). I have also tried to update the gene models I got from PASA using MAKER, and the distribution of AED scores is the same.
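For reference when reading these numbers: annotation edit distance (AED), as introduced by Eilbeck et al. (2009) and reported by MAKER, is 1 minus the mean of sensitivity (SN) and specificity (SP) between a gene model and its supporting evidence at the nucleotide level, so 0 means perfect agreement and 1 means no overlap. The sketch below computes only that formula for one model against one set of evidence intervals; it is an illustration of the definition, not MAKER's own code, and the function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or adjacent 1-based, inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def length(intervals):
    """Number of bases covered by a set of intervals."""
    return sum(e - s + 1 for s, e in merge(intervals))


def overlap(a, b):
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            total += hi - lo + 1
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def aed(model, evidence):
    """AED = 1 - (SN + SP) / 2 at the nucleotide level."""
    shared = overlap(model, evidence)
    sn = shared / length(evidence)  # fraction of the evidence covered by the model
    sp = shared / length(model)     # fraction of the model supported by the evidence
    return 1 - (sn + sp) / 2
```

Under this definition a model that exactly reproduces its evidence scores 0, while a 100 bp model sharing half its bases with a 100 bp alignment, e.g. `aed([(1, 100)], [(51, 150)])`, scores 0.5.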
But when I use only EST or protein evidence (est2genome or protein2genome), the AED scores are normal (close to 0). As I understand it, the three kinds of evidence seem to conflict with each other, resulting in AED scores that are higher (~0.5) than expected. Could you give me some suggestions on how to fix this problem?

Best regards,
Wei

-------------- next part --------------
A non-text attachment was scrubbed...
Name: E6F3EF742C40408F8390EE9A1FF29894.png
Type: image/png
Size: 34543 bytes
Desc: E6F3EF742C40408F8390EE9A1FF29894.png

From patrick.gagne at canada.ca Tue Mar 31 11:53:13 2020
From: patrick.gagne at canada.ca (Gagné, Patrick (NRCAN/RNCAN))
Date: Tue, 31 Mar 2020 17:53:13 +0000
Subject: [maker-devel] Problem with Maker using GeneMark

Hi,

I've come across a bug while using Maker. I'm trying to annotate a 560 Mb genome and I'm using SNAP, GeneMark and Augustus in Maker. When Maker executes the GeneMark command, it just fails (GeneMark Failed) without any error message, so I decided to debug it myself. I launched every command manually and found that gmhmm_wrap is causing the issue. The problem is in fact in the probuild command; it doesn't do anything (from what I understand, this command is supposed to split the fasta where there are runs of N, to prevent GeneMark from crashing).
My genome has very long stretches of N (up to 14 kb).

After checking the probuild help, I found that the command used in gmhmm_wrap is not valid (half the options are no longer in probuild, probably because of GeneMark updates). I have tried different probuild versions (those I could download from the GeneMark site; they don't provide older versions except those bundled with their program releases):

2.16
2.34
2.44 (latest, bundled with GeneMark-ES)

I've also tried to edit the gmhmm_wrap script and modify the probuild command, but even when the fasta files are split, I get another error:

ERROR: Logic error in getting offset.

I tried replacing the command for the offset extraction, which also worked, but now I get an error when Maker tries to get the ab-initio output:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!

Could you please tell me how to fix this, or which probuild version I should use? (I will ask GeneMark support for it.)

Thanks in advance.

P.S. Sorry for my English; it's not my first language and I'm still learning.

Patrick Gagné
Spécialiste en bio-informatique / Bioinformatics specialist
Service canadien des forêts / Canadian Forest Service
Ressources naturelles Canada / Natural Resources Canada
Gouvernement du Canada / Government of Canada
Centre de foresterie des Laurentides / Laurentian Forestry Centre
1055, rue du P.E.P.S.
C.P. 10380, succ. Sainte-Foy / P.O. Box 10380, Stn. Sainte-Foy
Québec (Qc) G1V 4C7
Laboratoire de pathologie forestière (Local 2.21)
patrick.gagne at canada.ca / tel : (418) 648-4443
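The split-and-offset bookkeeping described in this message (cut each sequence at long runs of N, run the predictor on the pieces, then shift the coordinates back) can be illustrated independently of GeneMark. The following is a hypothetical, standalone sketch: it is not the logic of gmhmm_wrap or probuild, the function names are made up, and the default minimum run length of 100 N is an arbitrary assumption.

```python
import re


def read_fasta(lines):
    """Minimal FASTA reader yielding (sequence_id, sequence)."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_on_n_runs(seq, min_run=100):
    """Yield (offset, fragment) pieces of seq separated by runs of >= min_run N/n.

    offset is the 0-based position of the fragment's first base in seq;
    shorter N runs are kept inside the fragments.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_run)
    start = 0
    for m in gap.finditer(seq):
        if m.start() > start:
            yield start, seq[start:m.start()]
        start = m.end()
    if start < len(seq):
        yield start, seq[start:]


def lift(offset, start, end):
    """Convert 1-based fragment coordinates back to 1-based original coordinates."""
    return offset + start, offset + end
```

For example, with `min_run=3` the sequence `"ACGT" + "N" * 5 + "GG" + "NN" + "TT"` splits into `(0, "ACGT")` and `(9, "GGNNTT")`, and a feature at positions 1..2 of the second fragment maps back to 10..11 on the original sequence via `lift(9, 1, 2)`.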
URL: From carsonhh at gmail.com Tue Mar 3 12:46:51 2020 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 3 Mar 2020 12:46:51 -0700 Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail In-Reply-To: References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com> Message-ID: I?m glad you were able to make it work. Thanks, Carson > On Feb 29, 2020, at 10:27 AM, Devon O'Rourke wrote: > > Hi once again Carson, > Our administrators tried installing Maker with a different version of OpenMPI, and the change allowed the job to complete normally. The change was from a newer version (3.1.3) to an older version (1.6.5) of OpenMPI. I needed to make one tweak to the various MPI arguments you provided after that downgrade in version number, as v-1.6.5 didn't use Vader yet. Other than that, the terms appeared to allow the job to run to completion. > Thanks for your assistance, > Devon > > On Fri, Feb 28, 2020 at 7:50 AM Devon O'Rourke > wrote: > Hi Carson, > I had previously tried sending this email yesterday but received a notification about the text body size being too large. I thought perhaps it was related to the attached log file I sent in the earlier message. You can see the same file here: https://osf.io/cuxg8/download . > Thanks! > > (previous message below) > > .... > > Two steps forward, one step back, I suppose? > After incorporating the additional MPI-related parameters the job moved further ahead than previous iterations, however it still failed prior to completing the job. It appears that all but the six longest scaffolds were annotated (except for a small few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop). > I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem. Very much appreciate your help. 
> Devon > > On Wed, Feb 26, 2020 at 3:18 PM Carson Holt > wrote: > For Intel MPI, export an environmental variable right before running MAKER ?> "export I_MPI_FABRICS=shm:tcp" > > Intel MPI has a similar infiniband segfault issue as OpenMPI when running Perl scripts, but a different workaround. > > ?Carson > > >> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke > wrote: >> >> Much appreciated Carson, >> I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one). >> Thanks, >> Devon >> >> On Wed, Feb 26, 2020 at 2:54 PM Carson Holt > wrote: >> Try adding these a few options right after ?mpiexec? in your batch script (this will fix infiniband related segfaults as well as some fork related segfaults) ?> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 >> >> Also remove the -q in the maker command to get full command lines for subprocesses in the STDERR (allows you to run some commands outside of MAKER to test the source of failures if for example BLASt or Exonerate is causing the segfault). >> >> Example ?> >> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu -fix_nucleotides >> >> >> One alternate possibility is that OpenMPI is the problem, I?ve seen a few systems where it has an issue with perl itself, and the only way to get around it is to install your own version of perl without perl threads enabled and install MAKER with that version of Perl (then OpenMPI seems to be ok again). 
If that?s the case it is often easier to switch to MPICH2 or Intel MPI as the MPI launcher if they are available and then reinstall MAKER with that MPI flavor. >> >> ?Carson >> >> >> >>> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke > wrote: >>> >>> Thanks very much for the reply Carson, >>> I've attached few files file of the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there are a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result. >>> I certainly have access to more memory if needed. I'm using a pretty typical (I think?) cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university. I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory. >>> Appreciate your insights; hope the weather in UT is filled with sun or snow or both. >>> Devon >>> >>> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt > wrote: >>> If running under MPI, the reason for a failure may be further back in the STDERR (failures tend snowball other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If its memory, you can also set all the blast_depth parameters in maker_botpts.ctl to a value like 20. 
>>> >>> ?Carson >>> >>> >>> >>>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke > wrote: >>>> >>>> Hello, >>>> >>>> I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for. >>>> >>>> Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file: >>>> ``` >>>> DIED RANK 0:6:0:0 >>>> DIED COUNT 2 >>>> ``` >>>> (the master 'run.log' file just shows "DIED COUNT 2") >>>> >>>> I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step. >>>> >>>> Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue? 
>>>>
>>>> For both the completed and the failed scaffolds, the "theVoid.scaffoldX" subdirectories containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeatrunner, the EST *fasta.blastn, the altEST *fasta.tblastx, and the protein *fasta.blastx files are all present (and appear finished, from what I can tell).
>>>> However, the contents of the parent directory of the "theVoid.scaffold" folder differ. For the failed scaffolds, the contents almost always look something like this (that is, they stall with the same kind of files produced):
>>>> ```
>>>> 0
>>>> evidence_0.gff
>>>> query.fasta
>>>> query.masked.fasta
>>>> query.masked.fasta.index
>>>> query.masked.gff
>>>> run.log.child.0
>>>> scaffold22.0.final.section
>>>> scaffold22.0.pred.raw.section
>>>> scaffold22.0.raw.section
>>>> scaffold22.gff.ann
>>>> scaffold22.gff.def
>>>> scaffold22.gff.seq
>>>> ```
>>>>
>>>> For the completed scaffold, many more files are created:
>>>> ```
>>>> 0
>>>> 10
>>>> 100
>>>> 20
>>>> 30
>>>> 40
>>>> 50
>>>> 60
>>>> 70
>>>> 80
>>>> 90
>>>> evidence_0.gff
>>>> evidence_10.gff
>>>> evidence_1.gff
>>>> evidence_2.gff
>>>> evidence_3.gff
>>>> evidence_4.gff
>>>> evidence_5.gff
>>>> evidence_6.gff
>>>> evidence_7.gff
>>>> evidence_8.gff
>>>> evidence_9.gff
>>>> query.fasta
>>>> query.masked.fasta
>>>> query.masked.fasta.index
>>>> query.masked.gff
>>>> run.log.child.0
>>>> run.log.child.1
>>>> run.log.child.10
>>>> run.log.child.2
>>>> run.log.child.3
>>>> run.log.child.4
>>>> run.log.child.5
>>>> run.log.child.6
>>>> run.log.child.7
>>>> run.log.child.8
>>>> run.log.child.9
>>>> scaffold4.0-1.raw.section
>>>> scaffold4.0.final.section
>>>> scaffold4.0.pred.raw.section
>>>> scaffold4.0.raw.section
>>>> scaffold4.10.final.section
>>>> scaffold4.10.pred.raw.section
>>>> scaffold4.10.raw.section
>>>> scaffold4.1-2.raw.section
>>>> scaffold4.1.final.section
>>>> scaffold4.1.pred.raw.section
>>>> scaffold4.1.raw.section
>>>> scaffold4.2-3.raw.section
>>>> scaffold4.2.final.section
>>>> scaffold4.2.pred.raw.section
>>>> scaffold4.2.raw.section
>>>> scaffold4.3-4.raw.section
>>>> scaffold4.3.final.section
>>>> scaffold4.3.pred.raw.section
>>>> scaffold4.3.raw.section
>>>> scaffold4.4-5.raw.section
>>>> scaffold4.4.final.section
>>>> scaffold4.4.pred.raw.section
>>>> scaffold4.4.raw.section
>>>> scaffold4.5-6.raw.section
>>>> scaffold4.5.final.section
>>>> scaffold4.5.pred.raw.section
>>>> scaffold4.5.raw.section
>>>> scaffold4.6-7.raw.section
>>>> scaffold4.6.final.section
>>>> scaffold4.6.pred.raw.section
>>>> scaffold4.6.raw.section
>>>> scaffold4.7-8.raw.section
>>>> scaffold4.7.final.section
>>>> scaffold4.7.pred.raw.section
>>>> scaffold4.7.raw.section
>>>> scaffold4.8-9.raw.section
>>>> scaffold4.8.final.section
>>>> scaffold4.8.pred.raw.section
>>>> scaffold4.8.raw.section
>>>> scaffold4.9-10.raw.section
>>>> scaffold4.9.final.section
>>>> scaffold4.9.pred.raw.section
>>>> scaffold4.9.raw.section
>>>> ```
>>>>
>>>> Thanks for any troubleshooting tips you can offer.
>>>>
>>>> Cheers,
>>>> Devon
>>>>
>>>> --
>>>> Devon O'Rourke
>>>> Postdoctoral researcher, Northern Arizona University
>>>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
>>>> twitter: @thesciencedork
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at yandell-lab.org
>>>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From shore at yorku.ca Fri Mar 6 11:32:48 2020
From: shore at yorku.ca (shore at yorku.ca)
Date: Fri, 06 Mar 2020 13:32:48 -0500
Subject: [maker-devel] maker2jbrowse error: Don't know how to format iprscan tracks, skipping
Message-ID: <1583519568.5e629750e7931@oldmymail.yorku.ca>

Hello,

I've been attempting to include the iprscan results with my maker gff file.

I used the script below to add the iprscan results to my maker gff:

    ipr_update_gff maker.gff iprscan.tsv > makeriprscan.gff

I also used the script below to generate a gff of iprscan domains:

    iprscan2gff3 iprscan.tsv maker.gff > iprscandomain.gff

At this point, I wasn't sure how to proceed. I concatenated iprscandomain.gff to the end of makeriprscan.gff.

I then ran maker2jbrowse. Everything seems to work, except that I get the error message

    "Don't know how to format iprscan tracks, skipping"

If I view the file in JBrowse, I can certainly see the iprscan results when I click on a transcript, but there are no iprscan tracks in JBrowse.

Sorry, this is a bit long-winded.

I suspect that concatenating the two GFFs was perhaps not the right way to proceed?

Thanks
Joel
--
Dr. Joel S. Shore
Prof. Biology
York University

From carsonhh at gmail.com Fri Mar 6 12:39:38 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 6 Mar 2020 12:39:38 -0700
Subject: [maker-devel] maker2jbrowse error: Don't know how to format iprscan tracks, skipping
In-Reply-To: <1583519568.5e629750e7931@oldmymail.yorku.ca>
References: <1583519568.5e629750e7931@oldmymail.yorku.ca>

The maker2jbrowse inside MAKER is just an alias that launches the maker2jbrowse script inside JBrowse itself (i.e. …/jbrowse-1.16.8-release/bin/maker2jbrowse). It is no longer maintained by us, but by the JBrowse team.

You can edit the maker2jbrowse script yourself to add an 'iprscan' line, or any other feature type you want, by copying an existing feature in this section (image attached) and renaming values such as 'blastn' to 'iprscan'. These are the command-line options that get sent to flatfile-to-json.pl, just as if you were running it manually.

For '--type', I believe 'iprscan' uses 'match' in the GFF3 type column, so instead of 'protein_match' or 'expressed_sequence_match', just trim it to 'match' in the maker2jbrowse section as well.

You also must edit the …/jbrowse/css/maker.scss file to choose the colors you want the feature display to have. As in the example above, just copy an existing feature and make a new one where you replace names like 'blastn' with 'iprscan' (image attached).

-Carson
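Before adding an 'iprscan' entry to maker2jbrowse, it can help to confirm which feature type and source the iprscan features actually carry, since Carson is not certain the type is 'match'. The sketch below (bash plus awk; gff_type_source_pairs is a hypothetical helper name) prints each distinct type:source pair in a GFF3 file. It assumes, as the existing maker2jbrowse entries appear to, that flatfile-to-json.pl's --type option accepts a type:source form; check that against the JBrowse version you have installed.

```shell
#!/usr/bin/env bash
# Sketch: list distinct "type:source" pairs (GFF3 column 3 and column 2).
# Skips comment/directive lines and stops at an embedded ##FASTA section.
gff_type_source_pairs() {
    awk -F'\t' '
        /^##FASTA/ { exit }            # sequence data follows; stop reading
        /^#/       { next }            # skip comments and directives
        NF >= 9    { print $3 ":" $2 } # type:source for each feature line
    ' "$1" | LC_ALL=C sort -u
}

# Example (commented out):
# gff_type_source_pairs iprscandomain.gff
```

Whatever pair it reports for the iprscan features is the value to try in the copied maker2jbrowse entry, in place of the 'blastn' entry's type.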
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 365893 bytes
A non-text attachment was scrubbed...
Name: PastedGraphic-4.png
Type: image/png
Size: 77391 bytes

From devon.orourke at gmail.com Mon Mar 9 07:24:11 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Mon, 9 Mar 2020 09:24:11 -0400
Subject: [maker-devel] Maker actions when using importied rm_gff file

Hi Carson,

I recently completed one round of Maker annotation successfully, thanks to your expert advice on resetting the MPI parameters. Because earlier tests (prior to this successful run) indicated that other dependency programs might *also* be contributing to failed Maker jobs, this first successful run used only GFF data as input for the EST, altEST, and protein evidence, plus a custom rm_gff file for complex repeats (I was following the strategy posted in an earlier thread on this forum: https://groups.google.com/forum/#!topic/maker-devel/patU-l_TQUM).
The good news is that using only GFF files gets the job to finish. The bad news is that if I input the original FASTA files instead of the resulting GFFs for the evidence data, Maker gets *close* but fails to finish at the stage where (I think) the per-scaffold chunks of "evidence_*.gff", "scaffold*.*.pred.raw.section", and "scaffold*.*.final.section" are collapsed into a set of "scaffold*.gff", "scaffold*.maker.transcripts.fasta" and "scaffold*.maker.proteins.fasta" files. The behavior is not entirely consistent across scaffolds: most scaffolds do produce finished files (the "scaffold*.gff", "transcripts.fasta", etc.), but the majority of the failed scaffolds are the longest ones (though at least a handful of longer scaffolds *do finish*!).

The initial errors in the run.log.child.* files of these failed scaffolds aren't always the same. Here are a few:
```
DIED RANK 4:6:0:4
DIED RANK 5:6:0:4
DIED RANK 6:6:0:53
```
The second error is always:
```
DIED COUNT 1
```
You can view the .log file here: https://osf.io/4wn6h/download. I've attached the .opts file to this message.

Maybe, again, something about our MPI parameters is not optimized for these jobs. I could certainly re-run the same data on a machine without MPI at this point, because all the jobs are basically complete (no more BLASTing or repeat masking is needed). So I think the question is: should I just restart the run without MPI and see if it finishes? Or are there alternative Maker scripts I could test directly (even on a single scaffold subdirectory) to see whether these scaffolds *would* finish otherwise?

Thank you once more for your help with troubleshooting,
Devon
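One way to re-run only the scaffolds that did not finish is to list them from the master datastore index and isolate each one for a single-sequence run with -g and -base, as is done for scaffold_66 later in this archive. This sketch assumes the index is tab-separated as contig id, datastore path, status (check your own <base>_master_datastore_index.log before relying on it); unfinished_contigs and extract_contig are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed index layout (tab-separated):
#   <contig id>  <datastore path>  <status, e.g. STARTED / FINISHED / FAILED>

# Print contigs whose most recent status line is anything other than FINISHED.
unfinished_contigs() {
    awk -F'\t' '
        NF >= 3 { last[$1] = $3 }   # later lines overwrite earlier statuses
        END { for (id in last) if (last[id] != "FINISHED") print id }
    ' "$1" | LC_ALL=C sort
}

# Print one record from a multi-FASTA file, matching the first word of the
# header line exactly (so "scaffold_66" does not also match "scaffold_660").
extract_contig() {
    local fasta="$1" id="$2"
    awk -v id="$id" '
        /^>/ { h = substr($0, 2); sub(/[ \t].*$/, "", h); keep = (h == id) }
        keep
    ' "$fasta"
}

# Example (commented out; file names are placeholders):
# unfinished_contigs lu.maker.output/lu_master_datastore_index.log
# extract_contig genome.fasta scaffold1731 > scaffold1731.fasta
# maker -g scaffold1731.fasta -base lu
```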
A non-text attachment was scrubbed...
Name: makerRun2_opts.ctl
Type: application/octet-stream
Size: 3855 bytes

From christopher.keeling.1 at ulaval.ca Sat Mar 14 11:24:41 2020
From: christopher.keeling.1 at ulaval.ca (Christopher Keeling)
Date: Sat, 14 Mar 2020 17:24:41 +0000
Subject: [maker-devel] Maker 2.31.10: maker_functional_gff and maker_functional_fasta not parsing correctly, Can't use string ("") as a HASH ref while "strict refs" in use
Message-ID: <4A449297-A6D1-4A75-9547-FB5F70CE1A0A@ulaval.ca>

An HTML attachment was scrubbed...

From carsonhh at gmail.com Mon Mar 16 10:48:13 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 16 Mar 2020 10:48:13 -0600
Subject: [maker-devel] Maker actions when using importied rm_gff file
Message-ID: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com>

In the log I see these:

ERROR: Could not open file: /scratch/dro49/myluwork/annotation/test9/lu.maker.output/lu_datastore/DC/34/scaffold1731//theVoid.scaffold1731/scaffold1731.gff.seq.tmp
Cannot send after transport endpoint shutdown

You are having IO timeout issues. It can be an issue with a single node on your cluster or with the object storage servers on the Lustre network storage, or your job may simply be too big for the network storage to handle. You will likely need to run on fewer nodes, or you can have your system admin increase the Lustre timeout options to see if that helps.

-Carson
From devon.orourke at gmail.com Tue Mar 17 04:21:58 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Tue, 17 Mar 2020 06:21:58 -0400
Subject: [maker-devel] Maker actions when using importied rm_gff file
In-Reply-To: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com>
References: <32D05882-857D-41FD-BAAD-3A4864F9D4EB@gmail.com>

Ah, thanks so much, Carson. The issue turned out to be that our sysadmin had installed Perl modules whose versions were incompatible with the version of Perl running Maker. Once I set up a virtual environment with a Perl and Perl modules that were happy to work together, these errors went away.

Thanks again!
Devon
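A version mismatch like this can be spotted by checking which perl is first on PATH (the one MAKER will be launched with, unless its scripts point elsewhere) and which version of a given module that perl actually loads. The helpers below are hypothetical sketches; the test uses File::Spec only because it ships with Perl, and BioPerl's Bio::Root::Version is shown as one module worth checking.

```shell
#!/usr/bin/env bash
# Sketch: report the perl on PATH and its version string (e.g. v5.28.0).
perl_env_report() {
    echo "perl: $(command -v perl) ($(perl -e 'print $^V'))"
}

# Sketch: print the version of a module as loaded by the perl on PATH.
# Exits non-zero if that perl cannot load the module at all.
module_version() {
    perl -e 'my $m = shift;
             eval "require $m; 1" or die "cannot load $m\n";
             print $m->VERSION // "unknown", "\n";' "$1"
}

# Example (commented out):
# perl_env_report
# module_version Bio::Root::Version
```

Running the same two commands inside and outside the job environment (for example in the Slurm batch script) shows whether the batch job sees the same perl and modules as the interactive shell.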
From devon.orourke at gmail.com Fri Mar 20 05:30:56 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Fri, 20 Mar 2020 07:30:56 -0400
Subject: [maker-devel] guidance for first and subsequent annotation parameters

With so many posts on the forum, it's been challenging to determine the best practices for performing multiple rounds of annotation with Maker.

My first round used EST, altEST, and protein FASTA files with a custom repeat-masking GFF file. The resulting annotation of this vertebrate genome contained 21,970 gene models with a mean length of about 9,016 bp; the BUSCO score was C:66.0%[S:64.2%,D:1.8%],F:4.2%,M:29.8%,n:9226 (mammalia_odb10 set).

Things seemed to be on the right track, so I set up the next Maker round using both SNAP- and Augustus-trained information in the round-2 maker_opts.ctl file. At the end of that second round, I noticed a marked *decrease* in BUSCO score (C:53.3%[S:51.0%,D:2.3%],F:11.6%,M:35.1%,n:9226), yet an increase in both the number of gene models (28,646) and their mean length (16,266 bp).
This got me wondering whether I set up the _opts.ctl file incorrectly. I'm concerned about a few things (and may be missing even more I should be concerned about!?):

- I specified the evidence to come from the EST/Protein sections instead of using the section under "#-----Re-annotation Using MAKER Derived GFF3". Maybe that was a fundamental mistake? What change in behavior should I expect if I moved my round-1 Maker output into that section instead of using the EST/Protein Homology evidence sections as I did below?
- I wasn't sure what to do with the repeat-masking GFF files in round 2. The RepeatMasker GFF I included in round 1 consisted of just complex repeats (setting model_org=simple and softmask=1, so that effectively only those complex regions were hard-masked for the initial alignments). But what should be used in round 2: the output GFF of round 1, or the input GFF from round 1?

Here's what I did for the round-2 maker_opts.ctl file:

#-----Genome (these are always required)
genome=/scratch/dro49/myluwork/annotation/input_files/mylu_hic_rails_noMasks.fa
organism_type=eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.est2genome.gff
altest_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.cdna2genome.gff

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.protein2genome.gff

#-----Repeat Masking (leave values blank to skip repeat masking)
rm_gff=/scratch/dro49/myluwork/annotation/maker_rd2/mylu_rnd1.all.maker.repeats.gff
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm=/scratch/dro49/myluwork/annotation/maker_rd2/snap_rd1/lu_rnd1.zff.length50_aed0.25.hmm #SNAP HMM file
augustus_species=mylu #Augustus gene prediction species model
run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
allow_overlap= #allowed gene overlap fraction (value from 0 to 1, blank for default)

Thank you for your insights and support,
Devon

From eennadi at gmail.com Tue Mar 17 21:22:45 2020
From: eennadi at gmail.com (Emmanuel Nnadi)
Date: Wed, 18 Mar 2020 04:22:45 +0100
Subject: [maker-devel] CRL_Step2 will not produce required outputs

I am trying to follow this tutorial: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced

When I run this step:

perl DIR_CRL/CRL_Step2.pl --step1 CRL_Step1_Passed_Elements.txt --repeatfile seqfile.out99 --resultfile seqfile.result99 \
    --sequencefile seqfile --removed_repeats CRL_Step2_Passed_Elements.fasta

the expected outputs are CRL_Step2_Passed_Elements.fasta and the Repeat_*.fasta files, but I only get CRL_Step2_Passed_Elements.fasta, with no Repeat_*.fasta files.

What could be the problem?

Nnadi Nnaemeka Emmanuel, Ph.D
Department of Microbiology, Faculty of Natural and Applied Science,
Plateau State University, Bokkos, Plateau State, Nigeria.
+2348068124819
Publications: https://www.researchgate.net/profile/Emmanuel_Nnadi/publications

From luca.peruzza at unipd.it Mon Mar 23 12:30:23 2020
From: luca.peruzza at unipd.it (Luca Peruzza)
Date: Mon, 23 Mar 2020 19:30:23 +0100
Subject: [maker-devel] maker_functional_gff error "Can't use string ("") as a HASH ref"

Hi all,

I am using the 'maker_functional_gff' script to update my GFF3 by adding functional annotation from a BLAST search against UniProt; however, I get the following error when running it:

Can't use string ("") as a HASH ref while "strict refs" in use at /opt/maker/bin/maker_functional_gff line 55, <$IN> line 277933.

I checked that line and it looks like the other lines in the GFF3 file, so I was wondering whether you know what is causing the error. The "offending" line and the one following it are:

tig00632243	maker	gene	2926	22617	.	-	.	ID=Gacu_00045928;Name=Gacu_00045928;Alias=maker-tig00632243-snap-gene-0.9;Dbxref=MobiDBLite:mobidb-lite,PANTHER:PTHR15288,PANTHER:PTHR15288:SF5;
tig00632243	maker	mRNA	2926	22617	.	-	.	ID=Gacu_00045928-RA;Parent=Gacu_00045928;Name=Gacu_00045928-RA;Alias=maker-tig00632243-snap-gene-0.9-mRNA-1;_AED=0.08;_QI=0|0|0|0.75|0.81|0.83|12|0|714;_eAED=0.34;Dbxref=MobiDBLite:mobidb-lite,PANTHER:PTHR15288,PANTHER:PTHR15288:SF5;

Thanks
Luca

From pickettbd at gmail.com Tue Mar 24 17:12:01 2020
From: pickettbd at gmail.com (Brandon Pickett)
Date: Tue, 24 Mar 2020 16:12:01 -0700
Subject: [maker-devel] substr outside of string at .../Carp.pm 346

I completed the first round of Maker. I subsequently trained SNAP, GeneMark-ES, and Augustus, and I've since fed those results back into Maker for a second round. Some sequences succeeded; others did not.
On some, I encountered an error about calling translate without a seq argument. I read other threads about similar issues and followed the advice to isolate a single sequence using -g and -base. My config files can be found at this link: https://byu.box.com/s/1tbp48djblo31ruuy8zm62k1vxoyozhq. The following are the contents of STDERR:

STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore
To access files for individual sequences use the datastore index:
/path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_master_datastore_index.log
STATUS: Now running MAKER...
examining contents of the fasta file and run log
--Next Contig--
#---------------------------------------------------------------------
Now starting the contig!!
SeqID: scaffold_66
Length: 2264627
#---------------------------------------------------------------------
setting up GFF3 output and fasta chunks
doing repeat masking
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
...finished clustering.
doing repeat masking
collecting blastx repeatmasking
processing all repeats
in cluster::shadow_cluster...
...finished clustering.
preparing masked sequence
preparing ab-inits
running snap.
#--------- command -------------#
Widget::snap:
/path/to/snap /path/to/data/snap/rnd1/genome.hmm /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0 > /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0.genome%2Ehmm.snap
#-------------------------------#
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
scoring....decoding.10.20.30.40.50.60.70.80.90.100 done
running augustus.
#--------- command -------------#
Widget::augustus:
/path/to/apps/augustus/3.3.2/final/bin/augustus --AUGUSTUS_CONFIG_PATH=/path/to/data/augustus_config --species=pacbf --UTR=off /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0 > /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_masked.0.pacbf.augustus
#-------------------------------#
running genemark.
#--------- command -------------#
Widget::genemark:
/path/to/apps/perl/5.28/perl/bin/perl /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/genemark/gmhmm_wrap -m /path/to/data/gmes/output/gmhmm.mod -g /path/to/apps/genemark-es/4.38/gmhmme3 -p /path/to/apps/genemark-es/4.38/probuild -o /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark /tmp/35082116/maker_0v5GGB/scaffold_66.abinit_nomask.0
#-------------------------------#
gathering ab-init output files
deleted:0 genes
deleted:0 genes
substr outside of string at /path/to/apps/perl/5.28/perl/lib/5.28.0/Carp.pm line 346.
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!
STACK: Error::throw
STACK: Bio::Root::Root::throw /path/to/apps/bioperl/1.7.2/perl5/lib/perl5/Bio/Root/Root.pm:447
STACK: Bio::Tools::CodonTable::translate /path/to/apps/bioperl/1.7.2/perl5/lib/perl5/Bio/Tools/CodonTable.pm:419
STACK: CGL::TranslationMachine::longest_translation_plus_stop /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/CGL/TranslationMachine.pm:280
STACK: maker::auto_annotator::get_translation_seq /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/maker/auto_annotator.pm:3575
STACK: Widget::snap::load_phat_hits /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/snap.pm:973
STACK: Widget::snap::parse /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Widget/snap.pm:689
STACK: GI::parse_abinit_file /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/GI.pm:1228
STACK: Process::MpiChunk::_go /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:1473
STACK: Process::MpiChunk::run /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:340
STACK: Process::MpiChunk::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiChunk.pm:356
STACK: Process::MpiTiers::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiTiers.pm:287
STACK: Process::MpiTiers::run_all /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/../lib/Process/MpiTiers.pm:287
STACK: /path/to/apps/maker/3.01.02-beta/gcc-8.3.0_mpich-3.3.1_perl-5.28.0/bin/maker:679
-----------------------------------------------------------
--> rank=NA, hostname=somenode.rc.byu.edu
ERROR: Failed while gathering ab-init output files
ERROR: Chunk failed at level:1, tier_type:2
FAILED CONTIG:scaffold_66
ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:scaffold_66
examining contents of the fasta file and run log
--Next Contig--
Processing run.log file...
Maker is now finished!!!

The command I ran was as follows:

maker -g /path/to/data/originalAssembly/split/scaffold_66.fasta \
    -base scaffolds -TMP /tmp/35082116 -cpus 1

I usually run maker with MPI (e.g., mpirun maker -TMP /tmp/abcdefg -cpus 1), but didn't see any need when running a single sequence as a test. Note that the output of this isolated run matches what I've been seeing in the mixed output from MPI, just slightly jumbled together with other things there.

The following is the bottom of the run.log file for this sequence in the datastore:

LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0
LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0
LOGCHILD /path/to/data/maker/rnd2/scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/run.log.child.0
STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.genome%2Ehmm.snap
FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.genome%2Ehmm.snap
STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.pacbf.augustus
FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_masked.0.pacbf.augustus
STARTED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark
FINISHED scaffolds.maker.output/scaffolds_datastore/92/0F/scaffold_66//theVoid.scaffold_66/scaffold_66.abinit_nomask.0.gmhmm%2Emod.genemark
DIED RANK 0:4:0:0
DIED COUNT 1
DIED RANK 0
DIED COUNT 1

The contents of theVoid directory can be viewed at this link:
https://byu.box.com/s/hqwngdvehs8dfoymtrkjyismq3p8ayv1

Do you have any suggestions on how I can resolve this error?

Thank you,
Brandon

From hpapoli at gmail.com  Wed Mar 25 14:38:02 2020
From: hpapoli at gmail.com (Homa Papoli)
Date: Wed, 25 Mar 2020 21:38:02 +0100
Subject: [maker-devel] repeatmasker output gff
Message-ID:

Hello,

I have two questions about using MAKER.

First, I ran RepeatMasker on my genome separately and have a GFF file. However, the third column of my GFF file contains the word "similarity". In a genome annotation workshop I took, we were told that a GFF file for MAKER should have "match" and "match_part" in the third column. Can I use the original RepeatMasker GFF output, or do I need to modify it?

Second, since MAKER takes several days to run, if the job is interrupted because it hits the run-time limit on our cluster, is it possible to restart MAKER from where it stopped?

Thank you,
Homa

From zhao.wei at umu.se  Mon Mar 30 03:37:04 2020
From: zhao.wei at umu.se (Wei Zhao)
Date: Mon, 30 Mar 2020 09:37:04 +0000
Subject: [maker-devel] Maker annotation AED scores are around 0.5
Message-ID: <1b0e5c3cae1b410397e61262a2384039@umu.se>

Dear MAKER team,

I am writing to ask for your help. I am using MAKER to annotate a large genome (~9 Gbp) with three sources of evidence: 1) a transcriptome of this species; 2) protein sequences from related species; 3) an Augustus model trained from PASA. When I use all three sources to annotate the genome (basic pipeline), the distribution of AED scores is unusual (a single peak around 0.5). I have also tried updating the gene models I got from PASA using MAKER, and the distribution of AED scores is the same.
But when I use only EST or only protein evidence (est2genome or protein2genome), the AED scores are normal (close to 0). As I understand it, the three sources of evidence seem to conflict with each other, which results in AED scores higher than expected (~0.5). Could you give me some suggestions on how to fix this problem?

Best regards,
Wei

[PNG image attachment not preserved in the archive]

From patrick.gagne at canada.ca  Tue Mar 31 11:53:13 2020
From: patrick.gagne at canada.ca (Gagné, Patrick (NRCAN/RNCAN))
Date: Tue, 31 Mar 2020 17:53:13 +0000
Subject: [maker-devel] Problem with Maker using GeneMark
Message-ID:

Hi,

I've come across a bug while using MAKER. I'm trying to annotate a 560 Mb genome with SNAP, GeneMark and Augustus in MAKER. When MAKER executes the GeneMark command, it fails ("GeneMark Failed") without any error message, so I decided to debug it myself. I ran every command manually and found that gmhmm_wrap is causing the issue. The problem is actually in the probuild command: it doesn't do anything. As I understand it, this command is supposed to split the FASTA wherever there is a run of Ns, to prevent GeneMark from crashing.
My genome has very long stretches of N (up to 14 kb). After checking the probuild help, I found that the command used in gmhmm_wrap is no longer valid (half of its options no longer exist in probuild, probably because of GeneMark updates). I have tried different probuild versions, namely the ones I could download from the GeneMark site (older versions are only available bundled with their programs): 2.16, 2.34, and 2.44 (the latest, bundled with GeneMark-ES).

I also tried editing the gmhmm_wrap script to modify the probuild command, but even when the FASTA files are split, I get another error:

ERROR: Logic error in getting offset.

I then replaced the command used for extracting the offset, which also worked, but now I get an error when MAKER tries to get the ab initio output:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Calling translate without a seq argument!

Could you please tell me how to fix this, or which probuild version I should use? (I will ask GeneMark support for it.)

Thanks in advance,

P.S. Sorry for my English; it's not my first language and I'm still learning.

Patrick Gagné
Spécialiste en bio-informatique / Bioinformatics specialist
Service canadien des forêts / Canadian Forest Service
Ressources naturelles Canada / Natural Resources Canada
Gouvernement du Canada / Government of Canada
Centre de foresterie des Laurentides / Laurentian Forestry Centre
1055, rue du P.E.P.S.
C.P. 10380, succ. Sainte-Foy / P.O. Box 10380, Stn. Sainte-Foy
Québec (Qc) G1V 4C7
Laboratoire de pathologie forestière (Local 2.21)
patrick.gagne at canada.ca / tel : (418) 648-4443
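Patrick describes what the probuild step in gmhmm_wrap is meant to do: split the sequence at long runs of N so GeneMark does not crash, and then (presumably, given the "getting offset" error) use each piece's offset to place predictions back on the original scaffold. The minimal Python sketch below illustrates only that splitting and offset bookkeeping under stated assumptions. It is not probuild or gmhmm_wrap. The function names `split_at_n_runs` and `to_original_coords` and the 1,000 bp default minimum gap are hypothetical, and the sketch is not a fix for the offset or translate errors reported above.

```python
import re
from typing import List, Tuple


def split_at_n_runs(seq: str, min_gap: int = 1000) -> List[Tuple[int, str]]:
    """Split seq at runs of N/n that are at least min_gap bases long.

    Returns (offset, segment) pairs, where offset is the 0-based position of
    the segment's first base in the original sequence. N runs shorter than
    min_gap stay inside their segment. min_gap is a hypothetical threshold,
    not a GeneMark or MAKER default.
    """
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces: List[Tuple[int, str]] = []
    start = 0
    for m in gap.finditer(seq):
        # Keep the non-empty stretch between the previous gap and this one.
        if m.start() > start:
            pieces.append((start, seq[start:m.start()]))
        start = m.end()
    # Keep whatever follows the last gap, if anything.
    if start < len(seq):
        pieces.append((start, seq[start:]))
    return pieces


def to_original_coords(offset: int, seg_start: int, seg_end: int) -> Tuple[int, int]:
    """Map 1-based, inclusive coordinates on a segment back to the original sequence."""
    return seg_start + offset, seg_end + offset


if __name__ == "__main__":
    demo = "ACGT" + "N" * 5 + "GGCC" + "NN" + "TT"
    for offset, segment in split_at_n_runs(demo, min_gap=5):
        print(offset, segment)
```

For example, `split_at_n_runs("ACGT" + "N" * 5 + "GGCC" + "NN" + "TT", min_gap=5)` returns `[(0, "ACGT"), (9, "GGCCNNTT")]`. The 2 bp N run stays inside the second segment because it is shorter than `min_gap`, and positions 1 to 4 of that segment map back to 10 to 13 on the original sequence. Keeping short N runs inside segments is a design choice of this sketch; a real wrapper may use a different threshold.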