From carsonhh at gmail.com Tue Feb 4 17:27:47 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 4 Feb 2020 17:27:47 -0700
Subject: [maker-devel] Error: FASTA header doesn't match '>(\S+)'
In-Reply-To:
References:
Message-ID: <92C88A06-5CD3-4312-BCFC-727FB769BE7E@gmail.com>

Make sure your FASTA file is not compressed (e.g. .gz or .bz extension). Otherwise one of the entries in the middle of the file likely has nonsense characters. Also, you can delete the mpi_blastdb directory under the *.maker.output directory to force it to rebuild any indexes.

--Carson

> On Jan 31, 2020, at 2:50 PM, Emily Abernathy wrote:
>
> Hello,
> I am running MAKER for the first time and I have been unable to resolve an error. The error is as follows:
>
> I am using a genome that I assembled in Supernova v2 with headers that resemble this:
> >1 edges=1057764..867844 left=488686 right=145511 ver=1.10 style=3
>
> and I downloaded two FASTA files from Ensembl whose headers resemble this:
> >ENSTGUT00000018018.1 cdna chromosome:taeGut3.2.4:8_random:2849599:2959678:-1 gene:ENSTGUG00000017338.1 gene_biotype:protein_coding transcript_biotype:protein_coding
>
> and
>
> >ENSTGUP00000017615.1 pep chromosome:taeGut3.2.4:23_random:205321:209117:1 gene:ENSTGUG00000017337.1 transcript:ENSTGUT00000018017.1 gene_biotype:protein_coding transcript_biotype:protein_coding
>
> These are my only input FASTA files and I have been struggling to fix this error for almost a month now. Any and all advice on how to fix this error is much appreciated!
>
> Thanks in advance,
> E. Abernathy
>
>
>
> --
> Emily Abernathy
> Graduate Group in Ecology
> University of California, Davis
> http://hulllabucd.wix.com/hulllab
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
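
The two checks suggested in the reply above (is the file compressed, and does some entry contain nonsense characters) can be sketched in Python. This is only an illustration under stated assumptions: `check_fasta` is a hypothetical helper and not part of MAKER, the only pattern tested is the `>(\S+)` quoted in the error message, and "nonsense characters" is interpreted here as any byte outside printable ASCII (tabs allowed).

```python
import re
import sys

# The pattern quoted in the error message, applied to raw bytes.
HEADER_RE = re.compile(rb"^>(\S+)")
# Leading "magic" bytes of gzip and bzip2 streams.
COMPRESSED_MAGIC = {b"\x1f\x8b": "gzip", b"BZh": "bzip2"}


def check_fasta(path):
    """Return a list of (line_number, problem) tuples for a FASTA file.

    Hypothetical helper for illustration; not part of MAKER.
    """
    problems = []
    with open(path, "rb") as handle:
        start = handle.read(3)
        for magic, name in COMPRESSED_MAGIC.items():
            if start.startswith(magic):
                return [(1, f"file looks {name}-compressed")]
        handle.seek(0)
        for number, line in enumerate(handle, start=1):
            line = line.rstrip(b"\r\n")
            # A header needs at least one non-space character right after '>'.
            if line.startswith(b">") and not HEADER_RE.match(line):
                problems.append((number, "header does not match '>(\\S+)'"))
            # Assumption: anything outside printable ASCII (tab allowed) is suspect.
            if any(byte > 126 or (byte < 32 and byte != 9) for byte in line):
                problems.append((number, "non-printable or non-ASCII byte"))
    return problems


if __name__ == "__main__":
    for fasta_path in sys.argv[1:]:
        for line_number, problem in check_fasta(fasta_path):
            print(f"{fasta_path}:{line_number}: {problem}")
```

Run against each input file (genome, cDNA and protein FASTA), it prints the line numbers worth inspecting. It does not replace deleting mpi_blastdb so that the indexes are rebuilt.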

From carsonhh at gmail.com Tue Feb 4 17:34:10 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 4 Feb 2020 17:34:10 -0700
Subject: [maker-devel] Error: FASTA header doesn't match '>(\S+)'
In-Reply-To: <92C88A06-5CD3-4312-BCFC-727FB769BE7E@gmail.com>
References: <92C88A06-5CD3-4312-BCFC-727FB769BE7E@gmail.com>
Message-ID: <910B07A7-780E-4A3B-B8E3-5874FDF14087@gmail.com>

Also update BioPerl to 1.7.4.

--Carson

> On Feb 4, 2020, at 5:27 PM, Carson Holt wrote:
>
> Make sure your FASTA file is not compressed (e.g. .gz or .bz extension). Otherwise one of the entries in the middle of the file likely has nonsense characters. Also, you can delete the mpi_blastdb directory under the *.maker.output directory to force it to rebuild any indexes.
>
> --Carson
>
>
>
>> On Jan 31, 2020, at 2:50 PM, Emily Abernathy wrote:
>>
>> Hello,
>> I am running MAKER for the first time and I have been unable to resolve an error. The error is as follows:
>>
>> I am using a genome that I assembled in Supernova v2 with headers that resemble this:
>> >1 edges=1057764..867844 left=488686 right=145511 ver=1.10 style=3
>>
>> and I downloaded two FASTA files from Ensembl whose headers resemble this:
>> >ENSTGUT00000018018.1 cdna chromosome:taeGut3.2.4:8_random:2849599:2959678:-1 gene:ENSTGUG00000017338.1 gene_biotype:protein_coding transcript_biotype:protein_coding
>>
>> and
>>
>> >ENSTGUP00000017615.1 pep chromosome:taeGut3.2.4:23_random:205321:209117:1 gene:ENSTGUG00000017337.1 transcript:ENSTGUT00000018017.1 gene_biotype:protein_coding transcript_biotype:protein_coding
>>
>> These are my only input FASTA files and I have been struggling to fix this error for almost a month now. Any and all advice on how to fix this error is much appreciated!
>>
>> Thanks in advance,
>> E. Abernathy
>>
>>
>>
>> --
>> Emily Abernathy
>> Graduate Group in Ecology
>> University of California, Davis
>> http://hulllabucd.wix.com/hulllab
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at yandell-lab.org
>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...

From carsonhh at gmail.com Tue Feb 4 17:38:05 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 4 Feb 2020 17:38:05 -0700
Subject: [maker-devel] Avoiding re-indexing the same file
In-Reply-To:
References:
Message-ID: <032EA515-1EAC-4374-9B8B-51D6ECC39B27@gmail.com>

MAKER only indexes the input files during the first run. It will reuse the indexes after that. The indexes are in the *.maker.output/mpi_blastdb directory. If this is a RepeatMasker issue, it keeps its indexes under the .../RepeatMasker/Libraries/ directory and reuses them after indexing the first time.

--Carson

> On Jan 29, 2020, at 7:42 AM, H.DENISE wrote:
>
> Hi,
> I'm new to Maker and need to compare the annotations with different features (+/- RepeatMasker, using different protein files, etc.). However, the first step seems to be the indexing of my files, and the RNASeq file I'm using is large, so Maker seems to take ages at this step. As it is a constant file for my applications, is there a way to provide the indexing file in order to avoid repeating this step?
> Thanks in advance, Hubert
>
>
>
> Hubert DENISE, PhD
>
> Genome Data Analyst
> R.Durbin's group
> Department of Genetics
> University of Cambridge
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...

From liorglic at mail.tau.ac.il Sun Feb 9 04:02:27 2020
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 9 Feb 2020 13:02:27 +0200
Subject: [maker-devel] Alternative splicing in MAKER
Message-ID:

Hello,
I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got on this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes the liftover results and adds ab-initio predictions, supported by protein and transcript evidence.
In both runs, I get results which I find confusing regarding alternative splice variant prediction, but the behavior is different in each run.

In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction.
The resulting GFF indicates many overlapping genes, coming from ESTs (transcripts, actually) of different splice products of the same gene. Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they would be grouped together as different mRNA features under the same gene.
In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA.
Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example).

Could anyone please explain how this works? Why is the behavior different between the runs? Is there any way to get MAKER to behave the way I expected?

Thanks a lot!
Lior
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: files.rar
Type: application/octet-stream
Size: 5380 bytes
Desc: not available

From liorglic at mail.tau.ac.il Sun Feb 9 03:24:09 2020
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 9 Feb 2020 12:24:09 +0200
Subject: [maker-devel] Alternative splicing in MAKER
Message-ID:

Hello,
I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got on this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes the liftover results and adds ab-initio predictions, supported by protein and transcript evidence.
In both runs, I get results which I find confusing regarding alternative splice variant prediction, but the behavior is different in each run.

In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction.
The resulting GFF indicates many overlapping genes, coming from ESTs (transcripts, actually) of different splice products of the same gene. Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they would be grouped together as different mRNA features under the same gene.
In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA.
Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example).

Could anyone please explain how this works? Why is the behavior different between the runs? Is there any way to get MAKER to behave the way I expected?

Thanks a lot!
Lior
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: annotation.gff Type: application/octet-stream Size: 2514 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: annotation_maker_opts.ctl Type: application/octet-stream Size: 5441 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: liftover.gff Type: application/octet-stream Size: 16168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: liftover_maker_opts.ctl Type: application/octet-stream Size: 4643 bytes Desc: not available URL: From mbreitbach at hudsonalpha.org Tue Feb 11 09:12:23 2020 From: mbreitbach at hudsonalpha.org (Megan Breitbach) Date: Tue, 11 Feb 2020 10:12:23 -0600 Subject: [maker-devel] Maker Issue re-annotating Message-ID: Good morning, I'm trying to de novo annotate a genome with ~100,000 scaffolds and a scaffold N50 of 189,900 using Maker. I've been able to use MPICH to parallelize the first round of From devon.orourke at gmail.com Wed Feb 19 13:54:28 2020 From: devon.orourke at gmail.com (Devon O'Rourke) Date: Wed, 19 Feb 2020 15:54:28 -0500 Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail Message-ID: Hello, I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for. Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. 
In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file: ``` DIED RANK 0:6:0:0 DIED COUNT 2 ``` (the master 'run.log' file just shows "DIED COUNT 2") I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step. Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue? In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell). However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. 
For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced): ``` 0 evidence_0.gff query.fasta query.masked.fasta query.masked.fasta.index query.masked.gff run.log.child.0 scaffold22.0.final.section scaffold22.0.pred.raw.section scaffold22.0.raw.section scaffold22.gff.ann scaffold22.gff.def scaffold22.gff.seq ``` For the completed scaffold, there are many more files created: ``` 0 10 100 20 30 40 50 60 70 80 90 evidence_0.gff evidence_10.gff evidence_1.gff evidence_2.gff evidence_3.gff evidence_4.gff evidence_5.gff evidence_6.gff evidence_7.gff evidence_8.gff evidence_9.gff query.fasta query.masked.fasta query.masked.fasta.index query.masked.gff run.log.child.0 run.log.child.1 run.log.child.10 run.log.child.2 run.log.child.3 run.log.child.4 run.log.child.5 run.log.child.6 run.log.child.7 run.log.child.8 run.log.child.9 scaffold4.0-1.raw.section scaffold4.0.final.section scaffold4.0.pred.raw.section scaffold4.0.raw.section scaffold4.10.final.section scaffold4.10.pred.raw.section scaffold4.10.raw.section scaffold4.1-2.raw.section scaffold4.1.final.section scaffold4.1.pred.raw.section scaffold4.1.raw.section scaffold4.2-3.raw.section scaffold4.2.final.section scaffold4.2.pred.raw.section scaffold4.2.raw.section scaffold4.3-4.raw.section scaffold4.3.final.section scaffold4.3.pred.raw.section scaffold4.3.raw.section scaffold4.4-5.raw.section scaffold4.4.final.section scaffold4.4.pred.raw.section scaffold4.4.raw.section scaffold4.5-6.raw.section scaffold4.5.final.section scaffold4.5.pred.raw.section scaffold4.5.raw.section scaffold4.6-7.raw.section scaffold4.6.final.section scaffold4.6.pred.raw.section scaffold4.6.raw.section scaffold4.7-8.raw.section scaffold4.7.final.section scaffold4.7.pred.raw.section scaffold4.7.raw.section scaffold4.8-9.raw.section scaffold4.8.final.section scaffold4.8.pred.raw.section scaffold4.8.raw.section scaffold4.9-10.raw.section scaffold4.9.final.section 
scaffold4.9.pred.raw.section scaffold4.9.raw.section ``` Thanks for any troubleshooting tips you can offer. Cheers, Devon -- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork -------------- next part -------------- An HTML attachment was scrubbed... URL: From tayab.soomro at canada.ca Thu Feb 20 14:42:24 2020 From: tayab.soomro at canada.ca (Soomro, Tayab (AAFC/AAC)) Date: Thu, 20 Feb 2020 21:42:24 +0000 Subject: [maker-devel] Unassembled RNA-Seq data to Maker Message-ID: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data. From jason.stajich at gmail.com Thu Feb 20 14:53:14 2020 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 20 Feb 2020 13:53:14 -0800 Subject: [maker-devel] Unassembled RNA-Seq data to Maker In-Reply-To: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> References: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> Message-ID: <0169feea-4c2c-4376-a27f-fab33fa5aa0f@Spark> It uses a transcript alignment approach (blast and exonerate) which are optimized for long est to Genome alignments. You can build transcripts first by running trinity to assemble the RNAseq reads. On Feb 20, 2020, 1:42 PM -0800, Soomro, Tayab (AAFC/AAC) , wrote: > I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data. > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scott at scottcain.net Thu Feb 20 19:16:10 2020 From: scott at scottcain.net (Scott Cain) Date: Thu, 20 Feb 2020 18:16:10 -0800 Subject: [maker-devel] GMOD in Google Summer of Code Message-ID: Hello, I am very pleased to announce that GMOD in conjunction with Reactome, Galaxy and OICR/WormBase, together forming Open Genome Informatics, has been accepted for the Google Summer of Code. If you or someone you know might be a student interested in participating in GSoC, please take a look at http://gmod.org/wiki/GSOC_Project_Ideas_2020 where there are proposed projects that cover a fair number of technologies. Official proposals from students will be due in mid March (more on that later). But WAIT! There's more: if you might be interested in being a mentor and working with a student this summer, it's not too late! You can add new project ideas to the page above (contact me if you need an account), or you can even volunteer to add yourself to one of the existing ideas as a potential mentor. Please feel free to forward this to other mailing lists or people who might be interested. We are already an eclectic, dispersed group, so everyone is welcome. Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 26 12:05:31 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:05:31 -0700 Subject: [maker-devel] Unassembled RNA-Seq data to Maker In-Reply-To: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> References: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> Message-ID: MAKER does not assemble the reads. It uses BLAST to align a sequence and then exonerate to polish around splice sites. 
This allows identification of introns (exons aren't as useful for gene prediction hints). Unassembled reads will more likely align spuriously, will not cross splice sites (unless for intron identification), and will not be assigned to the proper strand (intron-aware alignments allow proper strand assignment). MAKER was developed when older EST technology was the only option; mRNA-seq can be treated the same way if it is assembled first.

--Carson

> On Feb 20, 2020, at 2:42 PM, Soomro, Tayab (AAFC/AAC) wrote:
>
> I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data.
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Feb 26 12:09:58 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:09:58 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References:
Message-ID: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>

If running under MPI, the reason for a failure may be further back in the STDERR (failures tend to snowball into other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If it's memory, you can also set all the blast_depth parameters in maker_bopts.ctl to a value like 20.

--Carson

> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote:
>
> Hello,
>
> I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for.
>
> Thank you for your continued support of Maker and your responses to the forum posts.
I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file: > ``` > DIED RANK 0:6:0:0 > DIED COUNT 2 > ``` > (the master 'run.log' file just shows "DIED COUNT 2") > > I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step. > > Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue? > > In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell). > However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. 
For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced): > ``` > 0 > evidence_0.gff > query.fasta > query.masked.fasta > query.masked.fasta.index > query.masked.gff > run.log.child.0 > scaffold22.0.final.section > scaffold22.0.pred.raw.section > scaffold22.0.raw.section > scaffold22.gff.ann > scaffold22.gff.def > scaffold22.gff.seq > ``` > > For the completed scaffold, there are many more files created: > ``` > 0 > 10 > 100 > 20 > 30 > 40 > 50 > 60 > 70 > 80 > 90 > evidence_0.gff > evidence_10.gff > evidence_1.gff > evidence_2.gff > evidence_3.gff > evidence_4.gff > evidence_5.gff > evidence_6.gff > evidence_7.gff > evidence_8.gff > evidence_9.gff > query.fasta > query.masked.fasta > query.masked.fasta.index > query.masked.gff > run.log.child.0 > run.log.child.1 > run.log.child.10 > run.log.child.2 > run.log.child.3 > run.log.child.4 > run.log.child.5 > run.log.child.6 > run.log.child.7 > run.log.child.8 > run.log.child.9 > scaffold4.0-1.raw.section > scaffold4.0.final.section > scaffold4.0.pred.raw.section > scaffold4.0.raw.section > scaffold4.10.final.section > scaffold4.10.pred.raw.section > scaffold4.10.raw.section > scaffold4.1-2.raw.section > scaffold4.1.final.section > scaffold4.1.pred.raw.section > scaffold4.1.raw.section > scaffold4.2-3.raw.section > scaffold4.2.final.section > scaffold4.2.pred.raw.section > scaffold4.2.raw.section > scaffold4.3-4.raw.section > scaffold4.3.final.section > scaffold4.3.pred.raw.section > scaffold4.3.raw.section > scaffold4.4-5.raw.section > scaffold4.4.final.section > scaffold4.4.pred.raw.section > scaffold4.4.raw.section > scaffold4.5-6.raw.section > scaffold4.5.final.section > scaffold4.5.pred.raw.section > scaffold4.5.raw.section > scaffold4.6-7.raw.section > scaffold4.6.final.section > scaffold4.6.pred.raw.section > scaffold4.6.raw.section > scaffold4.7-8.raw.section > scaffold4.7.final.section > scaffold4.7.pred.raw.section > 
scaffold4.7.raw.section > scaffold4.8-9.raw.section > scaffold4.8.final.section > scaffold4.8.pred.raw.section > scaffold4.8.raw.section > scaffold4.9-10.raw.section > scaffold4.9.final.section > scaffold4.9.pred.raw.section > scaffold4.9.raw.section > ``` > > Thanks for any troubleshooting tips you can offer. > > Cheers, > Devon > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ > twitter: @thesciencedork > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 26 12:10:59 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:10:59 -0700 Subject: [maker-devel] Maker Issue re-annotating In-Reply-To: References: Message-ID: <0546CBA9-9EB4-45B0-BB02-888E2F1B8AA9@gmail.com> Sorry for the slow reply. Please capture and send the STDERR from one of the failures. ?Carson > On Feb 11, 2020, at 9:12 AM, Megan Breitbach wrote: > > Good morning, > > I'm trying to de novo annotate a genome with ~100,000 scaffolds and a scaffold N50 of 189,900 using Maker. I've been able to use MPICH to parallelize the first round of > Here are the parameters used in the maker_opts.ctl file- > > #-----Genome (these are always required) > genome=blackbear_DNAzoo.FINAL.fasta #genome sequence (fasta file or fasta embeded in GFF3 file) > organism_type=eukaryotic #eukaryotic or prokaryotic. 
Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff=blackbear_DNAzoo.FINAL.all.gff #MAKER derived GFF3 file > est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est=Ursus_maritimus.UrsMar_1.0.cdna.all.fa #set of ESTs or assembled mRNA-seq in fasta format > altest= #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at least one) > protein=Ursus_maritimus.UrsMar_1.0.pep.all.fa #protein sequence file in fasta format (i.e. from mutiple organisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org=all #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg and dust filtering) > > #-----Gene Prediction > snaphmm=blackbear.hmm #SNAP HMM file > gmhmm= #GeneMark HMM file > augustus_species= #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no > est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > snoscan_meth= #-O-methylation site fileto have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > allow_overlap=0 #allowed gene overlap fraction (value from 0 to 1, blank for default) > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene predictors > pred_stats=1 #report AED and QI statistics for all predictions as well as models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no > always_complete=0 #extra steps to force start and 
stop codons, 1 = yes, 0 = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no > keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) > min_intron=20 #minimum intron length (used for alignment polishing) > single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes > > tries=2 #number of times to try a contig if there is a failure for some reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no > TMP= #specify a directory other than the system default temporary directory for temporary files > > Thanks, > -- > Megan Ramaker, PhD > Postdoctoral Trainee > HudsonAlpha Institute for Biotechnology > 601 Genome Way > Huntsville, AL 35806 > 478-284-6723 > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
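
When a control file as long as the one quoted above is under discussion, it can help to load it programmatically and list only the options that matter. The following is a minimal sketch, not a MAKER utility: `parse_ctl` and `enabled_passthrough` are hypothetical names, and it assumes every option sits on one `key=value #comment` line and that values never contain a `#`.

```python
def parse_ctl(text):
    """Parse MAKER-style control-file text into a dict of option -> value.

    Hypothetical helper; assumes one `key=value #comment` per line and that
    values never contain '#'.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line or "=" not in line:
            continue  # blank lines and section headers such as "#-----Genome"
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def enabled_passthrough(options):
    """Return the sorted names of *_pass options that are set to 1."""
    return sorted(
        key for key, value in options.items()
        if key.endswith("_pass") and value == "1"
    )


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py maker_opts.ctl
    for ctl_path in sys.argv[1:]:
        with open(ctl_path) as handle:
            opts = parse_ctl(handle.read())
        print(ctl_path, "pass-through enabled:", enabled_passthrough(opts))
```

For the re-annotation settings quoted above, this would report est_pass and rm_pass as the only pass-through options switched on.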
URL: From carsonhh at gmail.com Wed Feb 26 12:19:59 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:19:59 -0700 Subject: [maker-devel] Alternative splicing in MAKER In-Reply-To: References: Message-ID: est2genome=1 together with alt_splice=1 can cause weird behavior, because est2genome is just a cut and paste of an alignemnt to being a gene model, it will always be 100% supported by the evidence (itself as an alignment), and anything that overlaps will be clustered to being the same gene which can be messy if models you are moving forward align to multiple locations. You can add est_forward=1 (manually add it, it?s undocumented) to maker_opts.ctl to get MAKER to do a few extra behaviors. It will keep the names from the est2genome alignments (not rename them to maker names), and if you add hints like gene_id= to the fasta header it will only cluster things with the same gene ID and not just cluster by overlap. Also you can add maker_coor= to the header to restrict alignments to specific contigs or even contig regions. ?Carson > On Feb 9, 2020, at 3:24 AM, Lior Glick wrote: > > Hello, > I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence. > In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run. > > In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio preduction. > The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene. 
Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene.
> In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA.
> Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example).
>
> Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected?
>
> Thanks a lot!
> Lior

From carsonhh at gmail.com Wed Feb 26 12:27:43 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:27:43 -0700
Subject: [maker-devel] Multiple UTR ?
In-Reply-To:
References:
Message-ID:

Sorry for the very slow reply. I found this way, way down in my inbox.

The UTR features are the parts of the exons that are not CDS. So multiple UTR features mean the UTR spans multiple exons, and the pieces must be assembled to generate the full UTR in a browser. Any exon that is fully non-coding will produce a UTR feature that mirrors the exon's coordinates. If the exon is partially coding, the UTR will share the exon's start or end but will terminate somewhere in the middle, with a CDS feature filling the remaining coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model.

--Carson

> On Dec 18, 2019, at 7:19 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> I have seen something strange in my annotation: multiple UTR. How can we explain this? Thanks!
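The tiling described in the reply above can be checked by hand. What follows is only a minimal sketch, assuming 1-based inclusive GFF3 coordinates and a single coding region with no non-coding exons inside it. `split_utrs` is a hypothetical helper written for illustration and is not part of MAKER. The coordinates are those of GENE_02395-RA in the GFF3 records quoted in this message.

```python
def split_utrs(exons, cds, strand):
    """Split exons into (five_prime, three_prime) UTR segments.

    exons, cds: lists of (start, end), 1-based inclusive as in GFF3.
    Assumes every exon between the first and last CDS base is coding.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []  # UTR pieces left and right of the CDS on the genome
    for start, end in sorted(exons):
        if start < cds_start:
            left.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the 5' end sits at the higher coordinates.
    return (left, right) if strand == "+" else (right, left)


# Exons and CDS segments of GENE_02395-RA (minus strand).
exons = [
    (12128112, 12128433), (12117462, 12118046), (12118141, 12118301),
    (12118386, 12118539), (12118818, 12122493), (12123591, 12123893),
    (12123995, 12124303), (12125119, 12125418), (12126005, 12126313),
    (12127460, 12127687),
]
cds = [
    (12118818, 12118881), (12118386, 12118539),
    (12118141, 12118301), (12117709, 12118046),
]

five, three = split_utrs(exons, cds, "-")
print(len(five), five[0])  # 7 (12118882, 12122493)
print(three)               # [(12117462, 12117708)]
```

Six of the seven 5' UTR pieces mirror fully non-coding exons, and the seventh shares its end with the partially coding exon 12118818-12122493, which matches the five_prime_UTR records in the annotation.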
>
> Scaffold maker mRNA 12117462 12128433 . - . ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function;
> Scaffold maker exon 12128112 12128433 . - . ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA;
> Scaffold maker exon 12117462 12118046 . - . ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12118141 12118301 . - . ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12118386 12118539 . - . ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA;
> Scaffold maker exon 12118818 12122493 . - . ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA;
> Scaffold maker exon 12123591 12123893 . - . ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12123995 12124303 . - . ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12125119 12125418 . - . ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12126005 12126313 . - . ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12127460 12127687 . - . ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker five_prime_UTR 12128112 12128433 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12127460 12127687 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12126005 12126313 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12125119 12125418 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12123995 12124303 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12123591 12123893 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12118882 12122493 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118818 12118881 . - 0 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118386 12118539 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118141 12118301 . - 1 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12117709 12118046 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker three_prime_UTR 12117462 12117708 . - . ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA;
>
> Patrick Tran Van
>
> Bioinformatician: Lab Chapuisat & Schwander
> Department of Ecology and Evolution
> University of Lausanne
> Lausanne - Switzerland
> Office 3206

From carsonhh at gmail.com Wed Feb 26 12:54:32 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:54:32 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
Message-ID: <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>

Try adding a few options right after "mpiexec"
in your batch script (this will fix InfiniBand-related segfaults as well as some fork-related segfaults):

--mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0

Also remove the -q in the maker command to get full command lines for subprocesses in the STDERR (this allows you to run some commands outside of MAKER to test the source of failures if, for example, BLAST or Exonerate is causing the segfault).

Example:

mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu -fix_nucleotides

One alternate possibility is that OpenMPI is the problem. I've seen a few systems where it has an issue with Perl itself, and the only way to get around it is to install your own version of Perl without Perl threads enabled and install MAKER with that version of Perl (then OpenMPI seems to be OK again). If that's the case, it is often easier to switch to MPICH2 or Intel MPI as the MPI launcher if they are available and then reinstall MAKER with that MPI flavor.

--Carson

> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote:
>
> Thanks very much for the reply Carson,
> I've attached a few files from the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there are a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result.
> I certainly have access to more memory if needed. I'm using a pretty typical (I think?) cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university.
I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well as a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory.
> Appreciate your insights; hope the weather in UT is filled with sun or snow or both.
> Devon
>
> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote:
> If running under MPI, the reason for a failure may be further back in the STDERR (failures tend to snowball into other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If it's memory, you can also set all the blast_depth parameters in maker_bopts.ctl to a value like 20.
>
> --Carson
>
>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote:
>>
>> Hello,
>>
>> I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for.
>>
>> Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file:
>> ```
>> DIED RANK 0:6:0:0
>> DIED COUNT 2
>> ```
>> (the master 'run.log' file just shows "DIED COUNT 2")
>>
>> I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting.
I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step.
>>
>> Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue?
>>
>> In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) contain the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeatrunner, est *fasta.blastn, altest *fasta.tblastx, and protein *fasta.blastx files, and all are present (and appear finished from what I can tell).
>> However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced):
>> ```
>> 0
>> evidence_0.gff
>> query.fasta
>> query.masked.fasta
>> query.masked.fasta.index
>> query.masked.gff
>> run.log.child.0
>> scaffold22.0.final.section
>> scaffold22.0.pred.raw.section
>> scaffold22.0.raw.section
>> scaffold22.gff.ann
>> scaffold22.gff.def
>> scaffold22.gff.seq
>> ```
>>
>> For the completed scaffold, there are many more files created:
>> ```
>> 0
>> 10
>> 100
>> 20
>> 30
>> 40
>> 50
>> 60
>> 70
>> 80
>> 90
>> evidence_0.gff
>> evidence_10.gff
>> evidence_1.gff
>> evidence_2.gff
>> evidence_3.gff
>> evidence_4.gff
>> evidence_5.gff
>> evidence_6.gff
>> evidence_7.gff
>> evidence_8.gff
>> evidence_9.gff
>> query.fasta
>> query.masked.fasta
>> query.masked.fasta.index
>> query.masked.gff
>> run.log.child.0
>> run.log.child.1
>> run.log.child.10
>> run.log.child.2
>> run.log.child.3
>> run.log.child.4
>> run.log.child.5
>> run.log.child.6
>> run.log.child.7
>> run.log.child.8
>> run.log.child.9
>> scaffold4.0-1.raw.section
>> scaffold4.0.final.section
>> scaffold4.0.pred.raw.section
>> scaffold4.0.raw.section
>> scaffold4.10.final.section
>> scaffold4.10.pred.raw.section
>> scaffold4.10.raw.section
>> scaffold4.1-2.raw.section
>> scaffold4.1.final.section
>> scaffold4.1.pred.raw.section
>> scaffold4.1.raw.section
>> scaffold4.2-3.raw.section
>> scaffold4.2.final.section
>> scaffold4.2.pred.raw.section
>> scaffold4.2.raw.section
>> scaffold4.3-4.raw.section
>> scaffold4.3.final.section
>> scaffold4.3.pred.raw.section
>> scaffold4.3.raw.section
>> scaffold4.4-5.raw.section
>> scaffold4.4.final.section
>> scaffold4.4.pred.raw.section
>> scaffold4.4.raw.section
>> scaffold4.5-6.raw.section
>> scaffold4.5.final.section
>> scaffold4.5.pred.raw.section
>> scaffold4.5.raw.section
>> scaffold4.6-7.raw.section
>> scaffold4.6.final.section
>> scaffold4.6.pred.raw.section
>> scaffold4.6.raw.section
>> scaffold4.7-8.raw.section
>> scaffold4.7.final.section
>> scaffold4.7.pred.raw.section
>> scaffold4.7.raw.section
>> scaffold4.8-9.raw.section
>> scaffold4.8.final.section
>> scaffold4.8.pred.raw.section
>> scaffold4.8.raw.section
>> scaffold4.9-10.raw.section
>> scaffold4.9.final.section
>> scaffold4.9.pred.raw.section
>> scaffold4.9.raw.section
>> ```
>>
>> Thanks for any troubleshooting tips you can offer.
>>
>> Cheers,
>> Devon
>>
>> --
>> Devon O'Rourke
>> Postdoctoral researcher, Northern Arizona University
>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
>> twitter: @thesciencedork
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork

From devon.orourke at gmail.com Wed Feb 26 12:36:25 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Wed, 26 Feb 2020 14:36:25 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
Message-ID:

Thanks very much for the reply Carson,
I've attached a few files from the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there are a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result.
I certainly have access to more memory if needed. I'm using a pretty typical (I think?)
cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university. I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well as a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory.
Appreciate your insights; hope the weather in UT is filled with sun or snow or both.
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fail-1a.log.gz
Type: application/x-gzip
Size: 21751 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fail-1b.log.gz
Type: application/x-gzip
Size: 2175 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run1_maker_opts.ctl
Type: application/octet-stream
Size: 3719 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run1_slurm.sh
Type: application/x-sh
Size: 787 bytes
Desc: not available
URL:

From devon.orourke at gmail.com Wed Feb 26 13:15:08 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Wed, 26 Feb 2020 15:15:08 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
Message-ID:

Much appreciated Carson,
I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one).
Thanks,
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
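
The OpenMPI options suggested earlier in this thread can be assembled programmatically, which may make it easier to switch MPI flavors between test runs. This is only a minimal sketch: `build_maker_command` and `extra_environment` are hypothetical helpers, the `flavor` labels are made up for illustration, and the option values are copied from the thread rather than validated against any particular cluster. The `I_MPI_FABRICS` setting is the one given in the reply that follows.

```python
import shlex

# OpenMPI MCA settings copied from the thread (not validated here).
OPENMPI_MCA = [
    ("btl", "vader,tcp,self"),
    ("btl_tcp_if_include", "ib0"),
    ("orte_base_help_aggregate", "0"),
    ("btl_openib_want_fork_support", "1"),
    ("mpi_warn_on_fork", "0"),
]


def build_maker_command(maker_path, n_procs, maker_args, flavor="openmpi"):
    """Return the mpiexec argument list for a MAKER run."""
    cmd = ["mpiexec"]
    if flavor == "openmpi":
        # The --mca options only apply to OpenMPI's launcher.
        for key, value in OPENMPI_MCA:
            cmd += ["--mca", key, value]
    cmd += ["-n", str(n_procs), maker_path, *maker_args]
    return cmd


def extra_environment(flavor):
    """Return environment variables to export before launching MAKER."""
    return {"I_MPI_FABRICS": "shm:tcp"} if flavor == "intelmpi" else {}


cmd = build_maker_command(
    "/packages/maker/3.01.02-beta/bin/maker",
    28,
    ["-base", "lu", "-fix_nucleotides"],
)
print(shlex.join(cmd))  # reproduces the example command line from the thread
```

To launch the job, `cmd` could be passed to `subprocess.run` with `env={**os.environ, **extra_environment(flavor)}`. Note that `-q` is deliberately left out of the MAKER arguments, as recommended in the thread.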

From carsonhh at gmail.com Wed Feb 26 13:18:34 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 13:18:34 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
Message-ID: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>

For Intel MPI, export an environment variable right before running MAKER: "export I_MPI_FABRICS=shm:tcp"

Intel MPI has a similar InfiniBand segfault issue to OpenMPI when running Perl scripts, but a different workaround.

--Carson

> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one).
> Thanks,
> Devon
>
> On Wed, Feb 26, 2020 at 2:54 PM Carson Holt wrote:
> Try adding a few options right after "mpiexec" in your batch script (this will fix InfiniBand-related segfaults as well as some fork-related segfaults):
>
> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0
>
> Also remove the -q in the maker command to get full command lines for subprocesses in the STDERR (this allows you to run some commands outside of MAKER to test the source of failures if, for example, BLAST or Exonerate is causing the segfault).
>
> Example ->
> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu -fix_nucleotides
>
> One alternate possibility is that OpenMPI is the problem. I've seen a few systems where it has an issue with Perl itself, and the only way to get around it is to install your own version of Perl without Perl threads enabled and install MAKER with that version of Perl (then OpenMPI seems to be OK again). If that's the case, it is often easier to switch to MPICH2 or Intel MPI as the MPI launcher, if they are available, and then reinstall MAKER with that MPI flavor.
>
> --Carson
>
>> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote:
>>
>> Thanks very much for the reply Carson,
>> I've attached a few files from the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there is a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result.
>> I certainly have access to more memory if needed. I'm using a pretty typical (I think?) cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university. I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well as a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory.
>> Appreciate your insights; hope the weather in UT is filled with sun or snow or both.
>> Devon
>>
>> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote:
>> If running under MPI, the reason for a failure may be further back in the STDERR (failures tend to snowball into other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If it's memory, you can also set all the blast_depth parameters in maker_bopts.ctl to a value like 20.
>>
>> --Carson
>>
>>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote:
>>>
>>> Hello,
>>>
>>> I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for.
>>>
>>> Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file:
>>> ```
>>> DIED RANK 0:6:0:0
>>> DIED COUNT 2
>>> ```
>>> (the master 'run.log' file just shows "DIED COUNT 2")
>>>
>>> I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step.
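One way to see which scaffolds ended up in this state is to search the per-scaffold logs for lines beginning with `DIED`. The sketch below assumes only what the message describes: log files named `run.log.child.N` that contain `DIED` entries, located somewhere beneath a MAKER output directory given as the first argument. The `list_died_logs` function name is invented for this example, and `lu.maker.output` in the comment is a guess at the directory name based on the `-base lu` option used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: list every run.log.child.* file under a directory that contains a DIED line.
# list_died_logs is a hypothetical helper, not part of MAKER.

list_died_logs() {
    dir="$1"
    # -type f limits the search to regular files;
    # grep -l prints only the names of files with at least one matching line.
    find "$dir" -type f -name 'run.log.child.*' -exec grep -l '^DIED' {} + | sort
}

# Example invocation (directory name is an assumption): sh this_script.sh lu.maker.output
if [ "$#" -ge 1 ]; then
    list_died_logs "$1"
fi
```

The resulting list of paths shows which scaffold directories to inspect first, which is useful when the initial failure is buried far back in the STDERR.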
>>> >>> Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue? >>> >>> In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell). >>> However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced): >>> ``` >>> 0 >>> evidence_0.gff >>> query.fasta >>> query.masked.fasta >>> query.masked.fasta.index >>> query.masked.gff >>> run.log.child.0 >>> scaffold22.0.final.section >>> scaffold22.0.pred.raw.section >>> scaffold22.0.raw.section >>> scaffold22.gff.ann >>> scaffold22.gff.def >>> scaffold22.gff.seq >>> ``` >>> >>> For the completed scaffold, there are many more files created: >>> ``` >>> 0 >>> 10 >>> 100 >>> 20 >>> 30 >>> 40 >>> 50 >>> 60 >>> 70 >>> 80 >>> 90 >>> evidence_0.gff >>> evidence_10.gff >>> evidence_1.gff >>> evidence_2.gff >>> evidence_3.gff >>> evidence_4.gff >>> evidence_5.gff >>> evidence_6.gff >>> evidence_7.gff >>> evidence_8.gff >>> evidence_9.gff >>> query.fasta >>> query.masked.fasta >>> query.masked.fasta.index >>> query.masked.gff >>> run.log.child.0 >>> run.log.child.1 >>> run.log.child.10 >>> run.log.child.2 >>> run.log.child.3 >>> run.log.child.4 >>> run.log.child.5 >>> run.log.child.6 >>> run.log.child.7 >>> run.log.child.8 >>> run.log.child.9 >>> scaffold4.0-1.raw.section >>> scaffold4.0.final.section >>> scaffold4.0.pred.raw.section >>> scaffold4.0.raw.section >>> 
scaffold4.10.final.section >>> scaffold4.10.pred.raw.section >>> scaffold4.10.raw.section >>> scaffold4.1-2.raw.section >>> scaffold4.1.final.section >>> scaffold4.1.pred.raw.section >>> scaffold4.1.raw.section >>> scaffold4.2-3.raw.section >>> scaffold4.2.final.section >>> scaffold4.2.pred.raw.section >>> scaffold4.2.raw.section >>> scaffold4.3-4.raw.section >>> scaffold4.3.final.section >>> scaffold4.3.pred.raw.section >>> scaffold4.3.raw.section >>> scaffold4.4-5.raw.section >>> scaffold4.4.final.section >>> scaffold4.4.pred.raw.section >>> scaffold4.4.raw.section >>> scaffold4.5-6.raw.section >>> scaffold4.5.final.section >>> scaffold4.5.pred.raw.section >>> scaffold4.5.raw.section >>> scaffold4.6-7.raw.section >>> scaffold4.6.final.section >>> scaffold4.6.pred.raw.section >>> scaffold4.6.raw.section >>> scaffold4.7-8.raw.section >>> scaffold4.7.final.section >>> scaffold4.7.pred.raw.section >>> scaffold4.7.raw.section >>> scaffold4.8-9.raw.section >>> scaffold4.8.final.section >>> scaffold4.8.pred.raw.section >>> scaffold4.8.raw.section >>> scaffold4.9-10.raw.section >>> scaffold4.9.final.section >>> scaffold4.9.pred.raw.section >>> scaffold4.9.raw.section >>> ``` >>> >>> Thanks for any troubleshooting tips you can offer. >>> >>> Cheers, >>> Devon >>> >>> -- >>> Devon O'Rourke >>> Postdoctoral researcher, Northern Arizona University >>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >>> twitter: @thesciencedork >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at yandell-lab.org >>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> -- >> Devon O'Rourke >> Postdoctoral researcher, Northern Arizona University >> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >> twitter: @thesciencedork >> > > > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. 
Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From devon.orourke at gmail.com Fri Feb 28 05:50:27 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Fri, 28 Feb 2020 07:50:27 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID: 

Hi Carson,
I had previously tried sending this email yesterday but received a notification about the text body size being too large. I thought perhaps it was related to the attached log file I sent in the earlier message. You can see the same file here: https://osf.io/cuxg8/download.
Thanks!

(previous message below)

....

Two steps forward, one step back, I suppose?
After incorporating the additional MPI-related parameters the job moved further ahead than previous iterations, however it still failed prior to completing the job. It appears that all but the six longest scaffolds were annotated (except for a small few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop).
I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem. Very much appreciate your help.
Devon

-- 
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From devon.orourke at gmail.com Sat Feb 29 10:27:16 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Sat, 29 Feb 2020 12:27:16 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: 
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID: 

Hi once again Carson,
Our administrators tried installing Maker with a different version of OpenMPI, and the change allowed the job to complete normally. The change was from a newer version (3.1.3) to an older version (1.6.5) of OpenMPI. I needed to make one tweak to the various MPI arguments you provided after that downgrade in version number, as v-1.6.5 didn't use Vader yet. Other than that, the terms appeared to allow the job to run to completion.
Thanks for your assistance,
Devon

-- 
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From devon.orourke at gmail.com Thu Feb 27 06:26:20 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Thu, 27 Feb 2020 08:26:20 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID: 

Hi Carson,
Two steps forward, one step back, I suppose?
After incorporating the additional MPI-related parameters the job moved further ahead than previous iterations, however it still failed prior to completing the job. It appears that all but the six longest scaffolds were annotated (except for a small few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop).
I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem. Very much appreciate your help.
Devon

On Wed, Feb 26, 2020 at 3:18 PM Carson Holt wrote:

> For Intel MPI, export an environment variable right before running MAKER -> "export I_MPI_FABRICS=shm:tcp"
>
> Intel MPI has a similar InfiniBand segfault issue to OpenMPI when running Perl scripts, but a different workaround.
>
> --Carson
>
> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post the outcome.
We definitely have two of the three MPI options you've described > on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to > advise my cluster admins to use whichever software you prefer (should there > be one). > Thanks, > Devon > > On Wed, Feb 26, 2020 at 2:54 PM Carson Holt wrote: > >> Try adding these few options right after "mpiexec" in your batch script >> (this will fix InfiniBand-related segfaults as well as some fork-related >> segfaults) --> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca >> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca >> mpi_warn_on_fork 0 >> >> Also remove the -q in the maker command to get full command lines for >> subprocesses in the STDERR (this allows you to run some commands outside of >> MAKER to test the source of failures if, for example, BLAST or Exonerate is >> causing the segfault). >> >> Example --> >> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca >> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca >> mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu >> -fix_nucleotides >> >> >> One alternate possibility is that OpenMPI is the problem. I've seen a few >> systems where it has an issue with Perl itself, and the only way to get >> around it is to install your own version of Perl without Perl threads >> enabled and install MAKER with that version of Perl (then OpenMPI seems to >> be OK again). If that's the case, it is often easier to switch to MPICH2 or >> Intel MPI as the MPI launcher if they are available and then reinstall >> MAKER with that MPI flavor. >> >> --Carson >> >> >> >> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke >> wrote: >> >> Thanks very much for the reply Carson, >> I've attached a few files from the most recently failed run: the shell >> script submitted to Slurm, the _opts.ctl file, and the pair of log files >> generated from the job.
The reason there is a 1a and 1b pair of files is >> that I had initially set the number of cpus in the _opts.ctl file to "60", >> but then tried re-running it after setting it to "28". Both seem to have >> the same result. >> I certainly have access to more memory if needed. I'm using a pretty >> typical (I think?) cluster that controls jobs with Slurm using a Lustre >> file system - it's the main high performance computing center at our >> university. I have access to plenty of nodes that contain about 120-150g of >> RAM each with between 24-28 cpus each, as well as a handful of higher memory >> nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a >> similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over >> 32 cpus; if that fails, I could certainly run again with even more memory. >> Appreciate your insights; hope the weather in UT is filled with sun or >> snow or both. >> Devon >> >> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote: >> >>> If running under MPI, the reason for a failure may be further back in >>> the STDERR (failures tend to snowball into other failures, so the initial cause is >>> often way back). If you can capture the STDERR and send it, that would be >>> the most informative. If it's memory, you can also set all the blast_depth >>> parameters in maker_bopts.ctl to a value like 20. >>> >>> --Carson >>> >>> >>> >>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke >>> wrote: >>> >>> Hello, >>> >>> I apologize for not posting directly to the archived forum but it >>> appears that the option to enter new posts is disabled. Perhaps this is by >>> design so emails go directly to this address. I hope this is what you are >>> looking for. >>> >>> Thank you for your continued support of Maker and your responses to the >>> forum posts.
I have been running Maker (V3.01.02-beta) to annotate a >>> mammalian genome that consists of 22 chromosome-length scaffolds (between >>> ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. >>> In my various tests in running Maker, the vast majority of the smaller >>> fragments are annotated successfully, but nearly all the large scaffolds >>> fail with the same error code when I look at the 'run.log.child.0' file: >>> ``` >>> DIED RANK 0:6:0:0 >>> DIED COUNT 2 >>> ``` >>> (the master 'run.log' file just shows "DIED COUNT 2") >>> >>> I struggled to find this exact error code anywhere on the forum and was >>> hoping you might be able to help me determine where I should start >>> troubleshooting. I thought perhaps it was an error concerning memory >>> requirements, so I altered the chunk size from the default to a few larger >>> sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the >>> same outcome). I've tried running the program with parallel support using >>> either openMPI or mpich. I've tried running on a single node using 24 cpus >>> and 120g of RAM. It always stalls at the same step. >>> >>> Interestingly, one of the 22 large scaffolds always finishes and >>> produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff >>> files, but the other 21 of 22 large scaffolds fail. This makes me think >>> perhaps it's not a memory issue? >>> >>> In the case of both the completed and failed scaffolds, the >>> "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, >>> .specific.ori.out, .specific.cat.gz, .specific.out, >>> te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest >>> *fasta.tblastx, and protein *fasta.blastx files are all present (and appear >>> finished from what I can tell). >>> However, the particular contents in the parent directory to the >>> "theVoid.scaffold" folder differ. 
For the failed scaffolds, the contents >>> generally always look something like this (that is, they stall with the >>> same kind of files produced): >>> ``` >>> 0 >>> evidence_0.gff >>> query.fasta >>> query.masked.fasta >>> query.masked.fasta.index >>> query.masked.gff >>> run.log.child.0 >>> scaffold22.0.final.section >>> scaffold22.0.pred.raw.section >>> scaffold22.0.raw.section >>> scaffold22.gff.ann >>> scaffold22.gff.def >>> scaffold22.gff.seq >>> ``` >>> >>> For the completed scaffold, there are many more files created: >>> ``` >>> 0 >>> 10 >>> 100 >>> 20 >>> 30 >>> 40 >>> 50 >>> 60 >>> 70 >>> 80 >>> 90 >>> evidence_0.gff >>> evidence_10.gff >>> evidence_1.gff >>> evidence_2.gff >>> evidence_3.gff >>> evidence_4.gff >>> evidence_5.gff >>> evidence_6.gff >>> evidence_7.gff >>> evidence_8.gff >>> evidence_9.gff >>> query.fasta >>> query.masked.fasta >>> query.masked.fasta.index >>> query.masked.gff >>> run.log.child.0 >>> run.log.child.1 >>> run.log.child.10 >>> run.log.child.2 >>> run.log.child.3 >>> run.log.child.4 >>> run.log.child.5 >>> run.log.child.6 >>> run.log.child.7 >>> run.log.child.8 >>> run.log.child.9 >>> scaffold4.0-1.raw.section >>> scaffold4.0.final.section >>> scaffold4.0.pred.raw.section >>> scaffold4.0.raw.section >>> scaffold4.10.final.section >>> scaffold4.10.pred.raw.section >>> scaffold4.10.raw.section >>> scaffold4.1-2.raw.section >>> scaffold4.1.final.section >>> scaffold4.1.pred.raw.section >>> scaffold4.1.raw.section >>> scaffold4.2-3.raw.section >>> scaffold4.2.final.section >>> scaffold4.2.pred.raw.section >>> scaffold4.2.raw.section >>> scaffold4.3-4.raw.section >>> scaffold4.3.final.section >>> scaffold4.3.pred.raw.section >>> scaffold4.3.raw.section >>> scaffold4.4-5.raw.section >>> scaffold4.4.final.section >>> scaffold4.4.pred.raw.section >>> scaffold4.4.raw.section >>> scaffold4.5-6.raw.section >>> scaffold4.5.final.section >>> scaffold4.5.pred.raw.section >>> scaffold4.5.raw.section >>> scaffold4.6-7.raw.section >>> 
scaffold4.6.final.section >>> scaffold4.6.pred.raw.section >>> scaffold4.6.raw.section >>> scaffold4.7-8.raw.section >>> scaffold4.7.final.section >>> scaffold4.7.pred.raw.section >>> scaffold4.7.raw.section >>> scaffold4.8-9.raw.section >>> scaffold4.8.final.section >>> scaffold4.8.pred.raw.section >>> scaffold4.8.raw.section >>> scaffold4.9-10.raw.section >>> scaffold4.9.final.section >>> scaffold4.9.pred.raw.section >>> scaffold4.9.raw.section >>> ``` >>> >>> Thanks for any troubleshooting tips you can offer. >>> >>> Cheers, >>> Devon >>> >>> -- >>> Devon O'Rourke >>> Postdoctoral researcher, Northern Arizona University >>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >>> twitter: @thesciencedork >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at yandell-lab.org >>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> -- >> Devon O'Rourke >> Postdoctoral researcher, Northern Arizona University >> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >> twitter: @thesciencedork >> >> >> >> > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ > twitter: @thesciencedork > > > -- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: LUmaker.log.gz Type: application/x-gzip Size: 4808331 bytes Desc: not available URL: From gongyuan.cao at duke.edu Sat Feb 29 10:44:24 2020 From: gongyuan.cao at duke.edu (Gongyuan Cao) Date: Sat, 29 Feb 2020 17:44:24 +0000 Subject: [maker-devel] maker_functional_gff error Message-ID: Hi, I'm running maker_functional_gff and got this error: Can't use string ("") as a HASH ref while "strict refs" in use at /root/maker/bin/maker_functional_gff line 55, <$IN> line 3. I've checked the gff file and there are no missing "ID=" tags, what could be the problem?

head of blastp output:
lacu_11543-RA A4GSN8 49.643 2099 951 36 1 2026 1 2066 0.0 1724
lacu_11544-RA F4IF36 75.473 1268 273 6 33 1263 29 1295 0.0 1949
lacu_11548-RA O81123 51.316 380 144 10 24 401 15 355 2.29e-119 353
lacu_11549-RA Q9SA32 60.767 339 130 3 328 664 58 395 1.54e-141 421
lacu_11547-RA Q9SLK2 72.493 349 96 0 1 349 1 349 0.0 518
lacu_11558-RA Q9LTV6 76.689 296 69 0 5 300 3 298 2.21e-158 446
lacu_11557-RA Q9C9U5 40.441 272 145 6 866 1134 746 1003 7.55e-50 196
lacu_11552-RA Q96GG9 44.715 246 128 3 58 296 2 246 2.30e-73 229
lacu_11560-RA Q42961 89.375 480 47 2 2 480 4 480 0.0 855
lacu_11561-RA Q42962 91.022 401 36 0 1 401 1 401 0.0 731

head of gff:
##gff-version 3
Linkage_group_5 . contig 1 30484050 . . . ID=Linkage_group_5;Name=Linkage_group_5
Linkage_group_5 maker gene 10601 29761 . + . ID=lacu_11543;Name=lacu_11543;Alias=maker-Linkage_group_5-pred_gff_est2genome-gene-0.188;score=1168;
Linkage_group_5 maker mRNA 10601 29761 6483 + . ID=lacu_11543-RA;Parent=lacu_11543;Name=lacu_11543-RA;Alias=maker-Linkage_group_5-pred_gff_est2genome-gene-0.188-mRNA-1;_AED=0.00;_QI=105|1|1|1|1|1|48|246|2043;_eAED=0.00;score=1168;
Linkage_group_5 maker exon 10601 11011 . + . ID=lacu_11543-RA:exon:0;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11129 11275 . + . ID=lacu_11543-RA:exon:1;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11403 11501 . + .
ID=lacu_11543-RA:exon:2;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11835 11963 . + . ID=lacu_11543-RA:exon:3;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 12054 12146 . + . ID=lacu_11543-RA:exon:4;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 12240 12305 . + . ID=lacu_11543-RA:exon:5;Parent=lacu_11543-RA;
-------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Feb 4 17:38:05 2020 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 4 Feb 2020 17:38:05 -0700 Subject: [maker-devel] Avoiding re-indexing the same file In-Reply-To: References: Message-ID: <032EA515-1EAC-4374-9B8B-51D6ECC39B27@gmail.com> MAKER only indexes the input files during the first run. It will reuse the indexes after that. The indexes are in the *.maker.output/mpi_blastdb directory. If this is a RepeatMasker issue, it keeps its indexes under the .../RepeatMasker/Libraries/ directory and reuses them after indexing the first time.
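A minimal sketch of how one might check for (or clear) those cached indexes, assuming the layout described in this thread, i.e. an mpi_blastdb directory under each *.maker.output directory. The function names are hypothetical helpers, not part of MAKER, and the default is a dry run that deletes nothing:

```python
from pathlib import Path
import shutil


def find_index_dirs(run_dir):
    """Return the mpi_blastdb directories found under *.maker.output in run_dir."""
    pattern = "*.maker.output/mpi_blastdb"  # layout assumed from this thread
    return sorted(p for p in Path(run_dir).glob(pattern) if p.is_dir())


def clear_index_dirs(run_dir, dry_run=True):
    """List the index directories; delete them only when dry_run is False."""
    found = find_index_dirs(run_dir)
    if not dry_run:
        for path in found:
            shutil.rmtree(path)  # the next MAKER run would then rebuild the indexes
    return found
```

If the directory is listed, later runs on the same inputs should be able to reuse it; calling clear_index_dirs(".", dry_run=False) is the scripted equivalent of deleting mpi_blastdb by hand to force a rebuild.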
--Carson > On Jan 29, 2020, at 7:42 AM, H.DENISE wrote: > > Hi, > I'm new to Maker and need to compare the annotations with different features (+/- RepeatMasker, using different protein files, etc.). However, the first step seems to be the indexing of my files, and the RNASeq file I'm using is large, so Maker seems to take ages at this step. As it is a constant file for my applications, is there a way to provide the indexing file in order to avoid repeating this step? > Thanks in advance, Hubert > > > > Hubert DENISE, PhD > > Genome Data Analyst > R.Durbin's group > Department of Genetics > University of Cambridge > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From liorglic at mail.tau.ac.il Sun Feb 9 04:02:27 2020 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Sun, 9 Feb 2020 13:02:27 +0200 Subject: [maker-devel] Alternative splicing in MAKER Message-ID: Hello, I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence. In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run. In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction. The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene.
Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene. In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA. Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example). Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected? Thanks a lot! Lior -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: files.rar Type: application/octet-stream Size: 5380 bytes Desc: not available URL: From liorglic at mail.tau.ac.il Sun Feb 9 03:24:09 2020 From: liorglic at mail.tau.ac.il (Lior Glick) Date: Sun, 9 Feb 2020 12:24:09 +0200 Subject: [maker-devel] Alternative splicing in MAKER Message-ID: Hello, I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence. In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run. In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction. The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene.
Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene. In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA. Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example). Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected? Thanks a lot! Lior -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: annotation.gff Type: application/octet-stream Size: 2514 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: annotation_maker_opts.ctl Type: application/octet-stream Size: 5441 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: liftover.gff Type: application/octet-stream Size: 16168 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: liftover_maker_opts.ctl Type: application/octet-stream Size: 4643 bytes Desc: not available URL: From mbreitbach at hudsonalpha.org Tue Feb 11 09:12:23 2020 From: mbreitbach at hudsonalpha.org (Megan Breitbach) Date: Tue, 11 Feb 2020 10:12:23 -0600 Subject: [maker-devel] Maker Issue re-annotating Message-ID: Good morning, I'm trying to de novo annotate a genome with ~100,000 scaffolds and a scaffold N50 of 189,900 using Maker. 
I've been able to use MPICH to parallelize the first round of From devon.orourke at gmail.com Wed Feb 19 13:54:28 2020 From: devon.orourke at gmail.com (Devon O'Rourke) Date: Wed, 19 Feb 2020 15:54:28 -0500 Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail Message-ID: Hello, I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for. Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file: ``` DIED RANK 0:6:0:0 DIED COUNT 2 ``` (the master 'run.log' file just shows "DIED COUNT 2") I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step. Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue? 
In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell). However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced): ``` 0 evidence_0.gff query.fasta query.masked.fasta query.masked.fasta.index query.masked.gff run.log.child.0 scaffold22.0.final.section scaffold22.0.pred.raw.section scaffold22.0.raw.section scaffold22.gff.ann scaffold22.gff.def scaffold22.gff.seq ``` For the completed scaffold, there are many more files created: ``` 0 10 100 20 30 40 50 60 70 80 90 evidence_0.gff evidence_10.gff evidence_1.gff evidence_2.gff evidence_3.gff evidence_4.gff evidence_5.gff evidence_6.gff evidence_7.gff evidence_8.gff evidence_9.gff query.fasta query.masked.fasta query.masked.fasta.index query.masked.gff run.log.child.0 run.log.child.1 run.log.child.10 run.log.child.2 run.log.child.3 run.log.child.4 run.log.child.5 run.log.child.6 run.log.child.7 run.log.child.8 run.log.child.9 scaffold4.0-1.raw.section scaffold4.0.final.section scaffold4.0.pred.raw.section scaffold4.0.raw.section scaffold4.10.final.section scaffold4.10.pred.raw.section scaffold4.10.raw.section scaffold4.1-2.raw.section scaffold4.1.final.section scaffold4.1.pred.raw.section scaffold4.1.raw.section scaffold4.2-3.raw.section scaffold4.2.final.section scaffold4.2.pred.raw.section scaffold4.2.raw.section scaffold4.3-4.raw.section scaffold4.3.final.section scaffold4.3.pred.raw.section scaffold4.3.raw.section scaffold4.4-5.raw.section scaffold4.4.final.section scaffold4.4.pred.raw.section scaffold4.4.raw.section 
scaffold4.5-6.raw.section scaffold4.5.final.section scaffold4.5.pred.raw.section scaffold4.5.raw.section scaffold4.6-7.raw.section scaffold4.6.final.section scaffold4.6.pred.raw.section scaffold4.6.raw.section scaffold4.7-8.raw.section scaffold4.7.final.section scaffold4.7.pred.raw.section scaffold4.7.raw.section scaffold4.8-9.raw.section scaffold4.8.final.section scaffold4.8.pred.raw.section scaffold4.8.raw.section scaffold4.9-10.raw.section scaffold4.9.final.section scaffold4.9.pred.raw.section scaffold4.9.raw.section ``` Thanks for any troubleshooting tips you can offer. Cheers, Devon -- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork -------------- next part -------------- An HTML attachment was scrubbed... URL: From tayab.soomro at canada.ca Thu Feb 20 14:42:24 2020 From: tayab.soomro at canada.ca (Soomro, Tayab (AAFC/AAC)) Date: Thu, 20 Feb 2020 21:42:24 +0000 Subject: [maker-devel] Unassembled RNA-Seq data to Maker Message-ID: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data. From jason.stajich at gmail.com Thu Feb 20 14:53:14 2020 From: jason.stajich at gmail.com (Jason Stajich) Date: Thu, 20 Feb 2020 13:53:14 -0800 Subject: [maker-devel] Unassembled RNA-Seq data to Maker In-Reply-To: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> References: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> Message-ID: <0169feea-4c2c-4376-a27f-fab33fa5aa0f@Spark> It uses a transcript alignment approach (BLAST and Exonerate) that is optimized for long EST-to-genome alignments. You can build transcripts first by running Trinity to assemble the RNA-Seq reads.
On Feb 20, 2020, 1:42 PM -0800, Soomro, Tayab (AAFC/AAC) , wrote: > I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data. > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Feb 20 19:16:10 2020 From: scott at scottcain.net (Scott Cain) Date: Thu, 20 Feb 2020 18:16:10 -0800 Subject: [maker-devel] GMOD in Google Summer of Code Message-ID: Hello, I am very pleased to announce that GMOD in conjunction with Reactome, Galaxy and OICR/WormBase, together forming Open Genome Informatics, has been accepted for the Google Summer of Code. If you or someone you know might be a student interested in participating in GSoC, please take a look at http://gmod.org/wiki/GSOC_Project_Ideas_2020 where there are proposed projects that cover a fair number of technologies. Official proposals from students will be due in mid March (more on that later). But WAIT! There's more: if you might be interested in being a mentor and working with a student this summer, it's not too late! You can add new project ideas to the page above (contact me if you need an account), or you can even volunteer to add yourself to one of the existing ideas as a potential mentor. Please feel free to forward this to other mailing lists or people who might be interested. We are already an eclectic, dispersed group, so everyone is welcome. Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Wed Feb 26 12:05:31 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:05:31 -0700 Subject: [maker-devel] Unassembled RNA-Seq data to Maker In-Reply-To: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> References: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca> Message-ID: MAKER does not assemble the reads. It uses BLAST to align a sequence and then Exonerate to polish around splice sites. This allows identification of introns (exons aren't as useful for gene prediction hints). Unassembled reads are more likely to align spuriously, will not cross splice sites (unless for intron identification), and will not be assigned to the proper strand (intron-aware alignments allow proper strand assignment). MAKER was developed when older EST technology was the only option; mRNA-seq can be treated the same if it is assembled first. --Carson > On Feb 20, 2020, at 2:42 PM, Soomro, Tayab (AAFC/AAC) wrote: > > I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data. > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Feb 26 12:09:58 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:09:58 -0700 Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail In-Reply-To: References: Message-ID: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> If running under MPI, the reason for a failure may be further back in the STDERR (failures tend to snowball into other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If it's memory, you can also set all the blast_depth parameters in maker_bopts.ctl to a value like 20.
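For the memory suggestion, a small sketch that rewrites the depth cutoffs in a BLAST options control file. It assumes the file is the maker_bopts.ctl written by `maker -CTL` and that the relevant options are the lines whose names start with depth_ (depth_blastn, depth_blastx and depth_tblastx, as far as I know); check the names in your own file before relying on it. set_blast_depth is a hypothetical helper, not a MAKER command:

```python
import re

# Matches option lines such as "depth_blastn=0 #comment".
# Assumption: every depth cutoff option name starts with "depth_".
DEPTH_LINE = re.compile(r"^(depth_\w+)=([^#\s]*)(.*)$")


def set_blast_depth(ctl_text, depth=20):
    """Return ctl_text with every depth_* option set to `depth`, comments kept."""
    out = []
    for line in ctl_text.splitlines():
        match = DEPTH_LINE.match(line)
        if match:
            # Keep the option name and any trailing comment, swap only the value.
            line = f"{match.group(1)}={depth}{match.group(3)}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Read the control file, pass its text through set_blast_depth, and write the result back (keeping a copy of the original) before restarting the run; all other options are left untouched.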
--Carson > On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote: > > Hello, > > I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for. > > Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file: > ``` > DIED RANK 0:6:0:0 > DIED COUNT 2 > ``` > (the master 'run.log' file just shows "DIED COUNT 2") > > I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step. > > Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue?
> > In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell). > However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced): > ``` > 0 > evidence_0.gff > query.fasta > query.masked.fasta > query.masked.fasta.index > query.masked.gff > run.log.child.0 > scaffold22.0.final.section > scaffold22.0.pred.raw.section > scaffold22.0.raw.section > scaffold22.gff.ann > scaffold22.gff.def > scaffold22.gff.seq > ``` > > For the completed scaffold, there are many more files created: > ``` > 0 > 10 > 100 > 20 > 30 > 40 > 50 > 60 > 70 > 80 > 90 > evidence_0.gff > evidence_10.gff > evidence_1.gff > evidence_2.gff > evidence_3.gff > evidence_4.gff > evidence_5.gff > evidence_6.gff > evidence_7.gff > evidence_8.gff > evidence_9.gff > query.fasta > query.masked.fasta > query.masked.fasta.index > query.masked.gff > run.log.child.0 > run.log.child.1 > run.log.child.10 > run.log.child.2 > run.log.child.3 > run.log.child.4 > run.log.child.5 > run.log.child.6 > run.log.child.7 > run.log.child.8 > run.log.child.9 > scaffold4.0-1.raw.section > scaffold4.0.final.section > scaffold4.0.pred.raw.section > scaffold4.0.raw.section > scaffold4.10.final.section > scaffold4.10.pred.raw.section > scaffold4.10.raw.section > scaffold4.1-2.raw.section > scaffold4.1.final.section > scaffold4.1.pred.raw.section > scaffold4.1.raw.section > scaffold4.2-3.raw.section > scaffold4.2.final.section > scaffold4.2.pred.raw.section > scaffold4.2.raw.section > scaffold4.3-4.raw.section > scaffold4.3.final.section > scaffold4.3.pred.raw.section 
> scaffold4.3.raw.section > scaffold4.4-5.raw.section > scaffold4.4.final.section > scaffold4.4.pred.raw.section > scaffold4.4.raw.section > scaffold4.5-6.raw.section > scaffold4.5.final.section > scaffold4.5.pred.raw.section > scaffold4.5.raw.section > scaffold4.6-7.raw.section > scaffold4.6.final.section > scaffold4.6.pred.raw.section > scaffold4.6.raw.section > scaffold4.7-8.raw.section > scaffold4.7.final.section > scaffold4.7.pred.raw.section > scaffold4.7.raw.section > scaffold4.8-9.raw.section > scaffold4.8.final.section > scaffold4.8.pred.raw.section > scaffold4.8.raw.section > scaffold4.9-10.raw.section > scaffold4.9.final.section > scaffold4.9.pred.raw.section > scaffold4.9.raw.section > ``` > > Thanks for any troubleshooting tips you can offer. > > Cheers, > Devon > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ > twitter: @thesciencedork > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 26 12:10:59 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:10:59 -0700 Subject: [maker-devel] Maker Issue re-annotating In-Reply-To: References: Message-ID: <0546CBA9-9EB4-45B0-BB02-888E2F1B8AA9@gmail.com> Sorry for the slow reply. Please capture and send the STDERR from one of the failures. --Carson > On Feb 11, 2020, at 9:12 AM, Megan Breitbach wrote: > > Good morning, > > I'm trying to de novo annotate a genome with ~100,000 scaffolds and a scaffold N50 of 189,900 using Maker.
I've been able to use MPICH to parallelize the first round of > Here are the parameters used in the maker_opts.ctl file- > > #-----Genome (these are always required) > genome=blackbear_DNAzoo.FINAL.fasta #genome sequence (fasta file or fasta embeded in GFF3 file) > organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic > > #-----Re-annotation Using MAKER Derived GFF3 > maker_gff=blackbear_DNAzoo.FINAL.all.gff #MAKER derived GFF3 file > est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no > altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no > protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no > rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no > model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no > pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no > other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no > > #-----EST Evidence (for best results provide a file for at least one) > est=Ursus_maritimus.UrsMar_1.0.cdna.all.fa #set of ESTs or assembled mRNA-seq in fasta format > altest= #EST/cDNA sequence file in fasta format from an alternate organism > est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file > altest_gff= #aligned ESTs from a closly relate species in GFF3 format > > #-----Protein Homology Evidence (for best results provide a file for at least one) > protein=Ursus_maritimus.UrsMar_1.0.pep.all.fa #protein sequence file in fasta format (i.e. 
from mutiple organisms) > protein_gff= #aligned protein homology evidence from an external GFF3 file > > #-----Repeat Masking (leave values blank to skip repeat masking) > model_org=all #select a model organism for RepBase masking in RepeatMasker > rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker > repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner > rm_gff= #pre-identified repeat elements from an external GFF3 file > prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no > softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering) > > #-----Gene Prediction > snaphmm=blackbear.hmm #SNAP HMM file > gmhmm= #GeneMark HMM file > augustus_species= #Augustus gene prediction species model > fgenesh_par_file= #FGENESH parameter file > pred_gff= #ab-initio predictions from an external GFF3 file > model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) > run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no > est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no > protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no > trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no > snoscan_rrna= #rRNA file to have Snoscan find snoRNAs > snoscan_meth= #-O-methylation site fileto have Snoscan find snoRNAs > unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no > allow_overlap=0 #allowed gene overlap fraction (value from 0 to 1, blank for default) > > #-----Other Annotation Feature Types (features MAKER doesn't recognize) > other_gff= #extra features to pass-through to final MAKER generated GFF3 file > > #-----External Application Behavior Options > alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases > cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) > > #-----MAKER 
Behavior Options > max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) > min_contig=1 #skip genome contigs below this length (under 10kb are often useless) > > pred_flank=200 #flank for extending evidence clusters sent to gene predictors > pred_stats=1 #report AED and QI statistics for all predictions as well as models > AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) > min_protein=0 #require at least this many amino acids in predicted proteins > alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no > always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no > map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no > keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) > > split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments) > min_intron=20 #minimum intron length (used for alignment polishing) > single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no > single_length=250 #min length required for single exon ESTs if 'single_exon is enabled' > correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes > > tries=2 #number of times to try a contig if there is a failure for some reason > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no > TMP= #specify a directory other than the system default temporary directory for temporary files > > Thanks, > -- > Megan Ramaker, PhD > Postdoctoral Trainee > HudsonAlpha Institute for Biotechnology > 601 Genome Way > Huntsville, AL 35806 > 478-284-6723 > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org 
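Carson's request in this thread is for the STDERR of a failing run. A minimal Python sketch of capturing it to a file follows. The helper name `run_and_capture_stderr`, the log file name, and the stand-in command are hypothetical; a real run would pass the actual `maker` (or `mpiexec ... maker`) command line instead.

```python
import os
import subprocess
import sys
import tempfile

def run_and_capture_stderr(cmd, log_path):
    """Run cmd, write everything it prints to STDERR into log_path,
    and return the process exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stderr=log)
    return result.returncode

# Hypothetical log location; any writable path would do.
log_path = os.path.join(tempfile.gettempdir(), "maker_stderr.log")

# Stand-in command that writes one line to STDERR and exits with status 3.
demo_cmd = [sys.executable, "-c",
            "import sys; sys.stderr.write('demo failure\\n'); sys.exit(3)"]
exit_code = run_and_capture_stderr(demo_cmd, log_path)

with open(log_path) as log:
    captured = log.read()
print(exit_code, captured, end="")
```

Because later failures can be side effects of the first one, it is usually worth reading the captured log from the top rather than from the end.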
-------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 26 12:19:59 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:19:59 -0700 Subject: [maker-devel] Alternative splicing in MAKER In-Reply-To: References: Message-ID: est2genome=1 together with alt_splice=1 can cause weird behavior. Because est2genome just cuts and pastes an alignment into a gene model, the model will always be 100% supported by the evidence (itself as an alignment), and anything that overlaps will be clustered into the same gene, which can be messy if the models you are moving forward align to multiple locations. You can add est_forward=1 (add it manually; it's undocumented) to maker_opts.ctl to get MAKER to do a few extra things. It will keep the names from the est2genome alignments (rather than renaming them to MAKER names), and if you add hints like gene_id= to the FASTA header it will only cluster things with the same gene ID rather than clustering by overlap alone. You can also add maker_coor= to the header to restrict alignments to specific contigs or even contig regions. --Carson > On Feb 9, 2020, at 3:24 AM, Lior Glick wrote: > > Hello, > I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence. > In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run. > > In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction. > The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene.
Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene. > In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA. > Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example). > > Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected? > > Thanks a lot! > Lior > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 26 12:27:43 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:27:43 -0700 Subject: [maker-devel] Multiple UTR ? In-Reply-To: References: Message-ID: Sorry for the very slow reply. I found this way, way down in my inbox. The UTR features are the parts of the exons that are not CDS. So multiple UTR features mean the UTR spans multiple exons, and the pieces must be assembled to generate the full UTR in a browser. Any exon that is fully non-coding will produce a UTR feature that mirrors the exon's coordinates, and if the exon is partially coding the UTR will share the same start or end but will terminate somewhere in the middle, with a CDS filling the remaining coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model. --Carson > On Dec 18, 2019, at 7:19 AM, Patrick Tran Van wrote: > > Hi Carson, > > I have seen something strange in my annotation: multiple UTR. How can we explain this ? Thanks!
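Carson's description can be checked against the GENE_02395-RA record in Patrick's message. The Python sketch below is an illustration, not MAKER's own code: it assumes 1-based inclusive GFF3 coordinates and a single contiguous coding span, and the function name `utr_segments` is hypothetical. Exon pieces outside the CDS span become UTR; on the minus strand the pieces at higher coordinates are the 5' UTR.

```python
def utr_segments(exons, cds, strand):
    """Split exon intervals into (five_prime_utr, three_prime_utr) lists.

    exons and cds are lists of (start, end) tuples, 1-based inclusive.
    Any exon portion lying outside the overall CDS span is reported as UTR.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    low_side, high_side = [], []
    for start, end in sorted(exons):
        if start < cds_start:   # exon (or part of it) lies before the CDS
            low_side.append((start, min(end, cds_start - 1)))
        if end > cds_end:       # exon (or part of it) lies after the CDS
            high_side.append((max(start, cds_end + 1), end))
    # On the minus strand the transcript starts at the high coordinates.
    return (low_side, high_side) if strand == "+" else (high_side, low_side)

# Exon and CDS coordinates of GENE_02395-RA, copied from the GFF3 in this thread.
exons = [(12128112, 12128433), (12117462, 12118046), (12118141, 12118301),
         (12118386, 12118539), (12118818, 12122493), (12123591, 12123893),
         (12123995, 12124303), (12125119, 12125418), (12126005, 12126313),
         (12127460, 12127687)]
cds = [(12118818, 12118881), (12118386, 12118539),
       (12118141, 12118301), (12117709, 12118046)]

five_prime, three_prime = utr_segments(exons, cds, "-")
print(len(five_prime), "five_prime_UTR pieces:", five_prime)
print(len(three_prime), "three_prime_UTR pieces:", three_prime)
```

Under these assumptions the sketch yields seven 5' UTR pieces and one 3' UTR piece, with the same coordinates as the five_prime_UTR and three_prime_UTR features in the posted record, including the exon at 12118818-12122493 that is split between CDS and UTR.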
>
> Scaffold maker mRNA 12117462 12128433 . - . ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function;
> Scaffold maker exon 12128112 12128433 . - . ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA;
> Scaffold maker exon 12117462 12118046 . - . ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12118141 12118301 . - . ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12118386 12118539 . - . ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA;
> Scaffold maker exon 12118818 12122493 . - . ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA;
> Scaffold maker exon 12123591 12123893 . - . ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12123995 12124303 . - . ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12125119 12125418 . - . ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12126005 12126313 . - . ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12127460 12127687 . - . ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker five_prime_UTR 12128112 12128433 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12127460 12127687 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12126005 12126313 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12125119 12125418 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12123995 12124303 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12123591 12123893 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12118882 12122493 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118818 12118881 . - 0 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118386 12118539 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118141 12118301 . - 1 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12117709 12118046 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker three_prime_UTR 12117462 12117708 . - . ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA;
>
> Patrick Tran Van > > Bioinformatician: Lab Chapuisat & Schwander > Department of Ecology and Evolution > University of Lausanne > Lausanne - Switzerland > Office 3206 > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed Feb 26 12:54:32 2020 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 26 Feb 2020 12:54:32 -0700 Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail In-Reply-To: References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> Message-ID: <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> Try adding these few options right after "mpiexec"
in your batch script (this will fix InfiniBand-related segfaults as well as some fork-related segfaults) --> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 Also remove the -q in the maker command to get full command lines for subprocesses in the STDERR (this lets you run some commands outside of MAKER to test the source of failures if, for example, BLAST or Exonerate is causing the segfault). Example --> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu -fix_nucleotides One alternate possibility is that OpenMPI is the problem. I've seen a few systems where it has an issue with Perl itself, and the only way to get around it is to install your own version of Perl without Perl threads enabled and install MAKER with that version of Perl (then OpenMPI seems to be OK again). If that's the case, it is often easier to switch to MPICH2 or Intel MPI as the MPI launcher, if they are available, and then reinstall MAKER with that MPI flavor. --Carson > On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote: > > Thanks very much for the reply Carson, > I've attached few files file of the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there are a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result. > I certainly have access to more memory if needed. I'm using a pretty typical (I think?) cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university.
I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory. > Appreciate your insights; hope the weather in UT is filled with sun or snow or both. > Devon > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ > twitter: @thesciencedork > -------------- next part -------------- An HTML attachment was scrubbed... URL: From devon.orourke at gmail.com Wed Feb 26 12:36:25 2020 From: devon.orourke at gmail.com (Devon O'Rourke) Date: Wed, 26 Feb 2020 14:36:25 -0500 Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail In-Reply-To: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> Message-ID: Thanks very much for the reply Carson, I've attached few files file of the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there are a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result. I certainly have access to more memory if needed. I'm using a pretty typical (I think?)
cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university. I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory. Appreciate your insights; hope the weather in UT is filled with sun or snow or both. Devon -- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fail-1a.log.gz Type: application/x-gzip Size: 21751 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fail-1b.log.gz Type: application/x-gzip Size: 2175 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: run1_maker_opts.ctl Type: application/octet-stream Size: 3719 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: run1_slurm.sh
Type: application/x-sh
Size: 787 bytes
Desc: not available
URL:

From devon.orourke at gmail.com Wed Feb 26 13:15:08 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Wed, 26 Feb 2020 15:15:08 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
Message-ID:

Much appreciated Carson,
I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one).
Thanks,
Devon

On Wed, Feb 26, 2020 at 2:54 PM Carson Holt wrote:

> Try adding these few options right after "mpiexec" in your batch script
> (this will fix infiniband related segfaults as well as some fork related
> segfaults) --> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca
> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca
> mpi_warn_on_fork 0
>
> Also remove the -q in the maker command to get full command lines for
> subprocesses in the STDERR (allows you to run some commands outside of
> MAKER to test the source of failures if, for example, BLAST or Exonerate is
> causing the segfault).
>
> Example -->
> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca
> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca
> mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu
> -fix_nucleotides
>
>
> One alternate possibility is that OpenMPI is the problem. I've seen a few
> systems where it has an issue with perl itself, and the only way to get
> around it is to install your own version of perl without perl threads
> enabled and install MAKER with that version of Perl (then OpenMPI seems to
> be ok again). If that's the case it is often easier to switch to MPICH2 or
> Intel MPI as the MPI launcher if they are available and then reinstall
> MAKER with that MPI flavor.
>
> --Carson
>
>
>
> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote:
>
> Thanks very much for the reply Carson,
> I've attached a few files from the most recently failed run: the shell
> script submitted to Slurm, the _opts.ctl file, and the pair of log files
> generated from the job. The reason there are a 1a and 1b pair of files is
> that I had initially set the number of cpus in the _opts.ctl file to "60",
> but then tried re-running it after setting it to "28". Both seem to have
> the same result.
> I certainly have access to more memory if needed. I'm using a pretty
> typical (I think?) cluster that controls jobs with Slurm using a Lustre
> file system - it's the main high performance computing center at our
> university. I have access to plenty of nodes that contain about 120-150g of
> RAM each with between 24-28 cpus each, as well as a handful of higher memory
> nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a
> similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over
> 32 cpus; if that fails, I could certainly run again with even more memory.
> Appreciate your insights; hope the weather in UT is filled with sun or
> snow or both.
> Devon
>
> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote:
>
>> If running under MPI, the reason for a failure may be further back in the
>> STDERR (failures tend to snowball into other failures, so the initial cause
>> is often way back). If you can capture the STDERR and send it, that would be
>> the most informative. If it's memory, you can also set all the blast_depth
>> parameters in maker_bopts.ctl to a value like 20.
>>
>> --Carson
>>
>>
>>
>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote:
>>
>> Hello,
>>
>> I apologize for not posting directly to the archived forum but it appears
>> that the option to enter new posts is disabled. Perhaps this is by design
>> so emails go directly to this address. I hope this is what you are looking
>> for.
>>
>> Thank you for your continued support of Maker and your responses to the
>> forum posts. I have been running Maker (V3.01.02-beta) to annotate a
>> mammalian genome that consists of 22 chromosome-length scaffolds (between
>> ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length.
>> In my various tests in running Maker, the vast majority of the smaller
>> fragments are annotated successfully, but nearly all the large scaffolds
>> fail with the same error code when I look at the 'run.log.child.0' file:
>> ```
>> DIED RANK 0:6:0:0
>> DIED COUNT 2
>> ```
>> (the master 'run.log' file just shows "DIED COUNT 2")
>>
>> I struggled to find this exact error code anywhere on the forum and was
>> hoping you might be able to help me determine where I should start
>> troubleshooting. I thought perhaps it was an error concerning memory
>> requirements, so I altered the chunk size from the default to a few larger
>> sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the
>> same outcome). I've tried running the program with parallel support using
>> either openMPI or mpich. I've tried running on a single node using 24 cpus
>> and 120g of RAM. It always stalls at the same step.
>> >> Interestingly, one of the 22 large scaffolds always finishes and produces >> the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but >> the other 21 of 22 large scaffolds fail. This makes me think perhaps it's >> not a memory issue? >> >> In the case of both the completed and failed scaffolds, the >> "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, >> .specific.ori.out, .specific.cat.gz, .specific.out, >> te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest >> *fasta.tblastx, and protein *fasta.blastx files are all present (and appear >> finished from what I can tell). >> However, the particular contents in the parent directory to the >> "theVoid.scaffold" folder differ. For the failed scaffolds, the contents >> generally always look something like this (that is, they stall with the >> same kind of files produced): >> ``` >> 0 >> evidence_0.gff >> query.fasta >> query.masked.fasta >> query.masked.fasta.index >> query.masked.gff >> run.log.child.0 >> scaffold22.0.final.section >> scaffold22.0.pred.raw.section >> scaffold22.0.raw.section >> scaffold22.gff.ann >> scaffold22.gff.def >> scaffold22.gff.seq >> ``` >> >> For the completed scaffold, there are many more files created: >> ``` >> 0 >> 10 >> 100 >> 20 >> 30 >> 40 >> 50 >> 60 >> 70 >> 80 >> 90 >> evidence_0.gff >> evidence_10.gff >> evidence_1.gff >> evidence_2.gff >> evidence_3.gff >> evidence_4.gff >> evidence_5.gff >> evidence_6.gff >> evidence_7.gff >> evidence_8.gff >> evidence_9.gff >> query.fasta >> query.masked.fasta >> query.masked.fasta.index >> query.masked.gff >> run.log.child.0 >> run.log.child.1 >> run.log.child.10 >> run.log.child.2 >> run.log.child.3 >> run.log.child.4 >> run.log.child.5 >> run.log.child.6 >> run.log.child.7 >> run.log.child.8 >> run.log.child.9 >> scaffold4.0-1.raw.section >> scaffold4.0.final.section >> scaffold4.0.pred.raw.section >> scaffold4.0.raw.section >> scaffold4.10.final.section >> 
scaffold4.10.pred.raw.section >> scaffold4.10.raw.section >> scaffold4.1-2.raw.section >> scaffold4.1.final.section >> scaffold4.1.pred.raw.section >> scaffold4.1.raw.section >> scaffold4.2-3.raw.section >> scaffold4.2.final.section >> scaffold4.2.pred.raw.section >> scaffold4.2.raw.section >> scaffold4.3-4.raw.section >> scaffold4.3.final.section >> scaffold4.3.pred.raw.section >> scaffold4.3.raw.section >> scaffold4.4-5.raw.section >> scaffold4.4.final.section >> scaffold4.4.pred.raw.section >> scaffold4.4.raw.section >> scaffold4.5-6.raw.section >> scaffold4.5.final.section >> scaffold4.5.pred.raw.section >> scaffold4.5.raw.section >> scaffold4.6-7.raw.section >> scaffold4.6.final.section >> scaffold4.6.pred.raw.section >> scaffold4.6.raw.section >> scaffold4.7-8.raw.section >> scaffold4.7.final.section >> scaffold4.7.pred.raw.section >> scaffold4.7.raw.section >> scaffold4.8-9.raw.section >> scaffold4.8.final.section >> scaffold4.8.pred.raw.section >> scaffold4.8.raw.section >> scaffold4.9-10.raw.section >> scaffold4.9.final.section >> scaffold4.9.pred.raw.section >> scaffold4.9.raw.section >> ``` >> >> Thanks for any troubleshooting tips you can offer. >> >> Cheers, >> Devon >> >> -- >> Devon O'Rourke >> Postdoctoral researcher, Northern Arizona University >> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >> twitter: @thesciencedork >> _______________________________________________ >> maker-devel mailing list >> maker-devel at yandell-lab.org >> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ > twitter: @thesciencedork > > > > -- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork -------------- next part -------------- An HTML attachment was scrubbed... 
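[Editor's sketch] The launcher line discussed in the message above can be put together in a batch script roughly as follows. This is a dry run that only prints the command; the MCA options, process count, and MAKER path are copied from the thread, while the helper name build_maker_cmd and the log file name maker.stderr.log are assumptions made for illustration.

```sh
#!/bin/sh
# Dry-run sketch: assemble (but do not execute) the Open MPI launcher line.

# MCA options exactly as quoted in the thread.
MCA_OPTS="--mca btl vader,tcp,self \
--mca btl_tcp_if_include ib0 \
--mca orte_base_help_aggregate 0 \
--mca btl_openib_want_fork_support 1 \
--mca mpi_warn_on_fork 0"

# build_maker_cmd NPROCS MAKER_BIN
# Hypothetical helper (not part of MAKER): prints the full launcher line.
# The -q flag is left out on purpose so subprocess command lines reach STDERR.
build_maker_cmd() {
    printf 'mpiexec %s -n %s %s -base lu -fix_nucleotides\n' "$MCA_OPTS" "$1" "$2"
}

# Show the command with STDERR captured to a file (file name is an assumption).
printf '%s 2> maker.stderr.log\n' \
    "$(build_maker_cmd 28 /packages/maker/3.01.02-beta/bin/maker)"
```

Once the printed line looks right, it can replace the existing mpiexec call in the Slurm script; keeping STDERR in its own file makes it easier to send to the list.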
URL:

From carsonhh at gmail.com Wed Feb 26 13:18:34 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 13:18:34 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
Message-ID: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>

For Intel MPI, export an environment variable right before running MAKER --> "export I_MPI_FABRICS=shm:tcp"

Intel MPI has an infiniband segfault issue similar to OpenMPI's when running Perl scripts, but a different workaround.

--Carson

> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one).
> Thanks,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T.
Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From devon.orourke at gmail.com Fri Feb 28 05:50:27 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Fri, 28 Feb 2020 07:50:27 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

Hi Carson,
I had previously tried sending this email yesterday but received a notification about the text body size being too large. I thought perhaps it was related to the attached log file I sent in the earlier message. You can see the same file here: https://osf.io/cuxg8/download.
Thanks!

(previous message below)

....

Two steps forward, one step back, I suppose?
After incorporating the additional MPI-related parameters the job moved further ahead than previous iterations, however it still failed prior to completing the job. It appears that all but the six longest scaffolds were annotated (except for a small few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop).
I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem. Very much appreciate your help.
Devon

On Wed, Feb 26, 2020 at 3:18 PM Carson Holt wrote:

> For Intel MPI, export an environment variable right before running MAKER
> --> "export I_MPI_FABRICS=shm:tcp"
>
> Intel MPI has an infiniband segfault issue similar to OpenMPI's when running
> Perl scripts, but a different workaround.
>
> --Carson
>
>
> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post
> the outcome.
We definitely have two of three MPI options you've described
> on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to
> advise my cluster admins to use whichever software you prefer (should there
> be one).
> Thanks,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
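[Editor's sketch] Because the first failure in an MPI run is often buried well before the final error, one way to triage a captured STDERR file is to look for the earliest suspicious line rather than the last. The sketch below assumes the log was saved to a plain-text file; the helper name first_failure and the search patterns (segfault, died, error) are illustrative guesses, not strings MAKER is guaranteed to print.

```sh
#!/bin/sh
# first_failure LOGFILE
# Hypothetical helper: print the earliest line (prefixed with its line number)
# that matches a failure-like pattern, ignoring case.
first_failure() {
    grep -n -i -E 'segfault|segmentation fault|died|error' "$1" | head -n 1
}
```

Calling it as `first_failure maker.stderr.log` (the file name is an assumption) gives a line number to start reading from; the surrounding lines usually show which subprocess was running at the time.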
URL:

From devon.orourke at gmail.com Sat Feb 29 10:27:16 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Sat, 29 Feb 2020 12:27:16 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

Hi once again Carson,
Our administrators tried installing Maker with a different version of OpenMPI, and the change allowed the job to complete normally. The change was from a newer version (3.1.3) to an older version (1.6.5) of OpenMPI. I needed to make one tweak to the various MPI arguments you provided after that downgrade in version number, as v-1.6.5 didn't use Vader yet. Other than that, the terms appeared to allow the job to run to completion.
Thanks for your assistance,
Devon

On Fri, Feb 28, 2020 at 7:50 AM Devon O'Rourke wrote:

> Hi Carson,
> I had previously tried sending this email yesterday but received a
> notification about the text body size being too large. I thought perhaps it
> was related to the attached log file I sent in the earlier message. You can
> see the same file here: https://osf.io/cuxg8/download.
> Thanks!
>
> (previous message below)
>
> ....
>
> Two steps forward, one step back, I suppose?
> After incorporating the additional MPI-related parameters the job moved
> further ahead than previous iterations, however it still failed prior to
> completing the job. It appears that all but the six longest scaffolds were
> annotated (except for a small few short scaffolds which simply weren't
> finished by the time the error triggered the entire run to stop).
> I've attached the .log file in hopes that you might find any additional
> nuggets to help diagnose the problem. Very much appreciate your help.
> Devon
>
> On Wed, Feb 26, 2020 at 3:18 PM Carson Holt wrote:
>
>> For Intel MPI, export an environment variable right before running
>> MAKER --> "export I_MPI_FABRICS=shm:tcp"
>>
>> Intel MPI has an infiniband segfault issue similar to OpenMPI's when running
>> Perl scripts, but a different workaround.
>>
>> --Carson
>>
>>
>> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>>
>> Much appreciated Carson,
>> I've submitted a job using the parameters you've suggested and will post
>> the outcome. We definitely have two of three MPI options you've described
>> on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to
>> advise my cluster admins to use whichever software you prefer (should there
>> be one).
>> Thanks,
>> Devon
>>
>> On Wed, Feb 26, 2020 at 2:54 PM Carson Holt wrote:
>>
>>> Try adding these few options right after "mpiexec" in your batch
>>> script (this will fix infiniband related segfaults as well as some fork
>>> related segfaults) --> --mca btl vader,tcp,self --mca btl_tcp_if_include
>>> ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1
>>> --mca mpi_warn_on_fork 0
>>>
>>> Also remove the -q in the maker command to get full command lines for
>>> subprocesses in the STDERR (allows you to run some commands outside of
>>> MAKER to test the source of failures if, for example, BLAST or Exonerate is
>>> causing the segfault).
>>>
>>> Example -->
>>> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca
>>> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca
>>> mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base
>>> lu -fix_nucleotides
>>>
>>>
>>> One alternate possibility is that OpenMPI is the problem. I've seen a
>>> few systems where it has an issue with perl itself, and the only way to get
>>> around it is to install your own version of perl without perl threads
>>> enabled and install MAKER with that version of Perl (then OpenMPI seems to
>>> be ok again). If that's the case it is often easier to switch to MPICH2 or
>>> Intel MPI as the MPI launcher if they are available and then reinstall
>>> MAKER with that MPI flavor.
>>>
>>> --Carson
>>>
>>>
>>>
>>> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote:
>>>
>>> Thanks very much for the reply Carson,
>>> I've attached a few files from the most recently failed run: the shell
>>> script submitted to Slurm, the _opts.ctl file, and the pair of log files
>>> generated from the job. The reason there are a 1a and 1b pair of files is
>>> that I had initially set the number of cpus in the _opts.ctl file to "60",
>>> but then tried re-running it after setting it to "28". Both seem to have
>>> the same result.
>>> I certainly have access to more memory if needed. I'm using a pretty
>>> typical (I think?) cluster that controls jobs with Slurm using a Lustre
>>> file system - it's the main high performance computing center at our
>>> university. I have access to plenty of nodes that contain about 120-150g of
>>> RAM each with between 24-28 cpus each, as well as a handful of higher memory
>>> nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a
>>> similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over
>>> 32 cpus; if that fails, I could certainly run again with even more memory.
>>> Appreciate your insights; hope the weather in UT is filled with sun or
>>> snow or both.
>>> Devon
>>>
>>> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote:
>>>
>>>> If running under MPI, the reason for a failure may be further back in
>>>> the STDERR (failures tend to snowball into other failures, so the initial
>>>> cause is often way back). If you can capture the STDERR and send it, that
>>>> would be the most informative. If it's memory, you can also set all the
>>>> blast_depth parameters in maker_bopts.ctl to a value like 20.
>>>> >>>> ?Carson >>>> >>>> >>>> >>>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke >>>> wrote: >>>> >>>> Hello, >>>> >>>> I apologize for not posting directly to the archived forum but it >>>> appears that the option to enter new posts is disabled. Perhaps this is by >>>> design so emails go directly to this address. I hope this is what you are >>>> looking for. >>>> >>>> Thank you for your continued support of Maker and your responses to the >>>> forum posts. I have been running Maker (V3.01.02-beta) to annotate a >>>> mammalian genome that consists of 22 chromosome-length scaffolds (between >>>> ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. >>>> In my various tests in running Maker, the vast majority of the smaller >>>> fragments are annotated successfully, but nearly all the large scaffolds >>>> fail with the same error code when I look at the 'run.log.child.0' file: >>>> ``` >>>> DIED RANK 0:6:0:0 >>>> DIED COUNT 2 >>>> ``` >>>> (the master 'run.log' file just shows "DIED COUNT 2") >>>> >>>> I struggled to find this exact error code anywhere on the forum and was >>>> hoping you might be able to help me determine where I should start >>>> troubleshooting. I thought perhaps it was an error concerning memory >>>> requirements, so I altered the chunk size from the default to a few larger >>>> sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the >>>> same outcome). I've tried running the program with parallel support using >>>> either openMPI or mpich. I've tried running on a single node using 24 cpus >>>> and 120g of RAM. It always stalls at the same step. >>>> >>>> Interestingly, one of the 22 large scaffolds always finishes and >>>> produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff >>>> files, but the other 21 of 22 large scaffolds fail. This makes me think >>>> perhaps it's not a memory issue? 
>>>> >>>> In the case of both the completed and failed scaffolds, the >>>> "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, >>>> .specific.ori.out, .specific.cat.gz, .specific.out, >>>> te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest >>>> *fasta.tblastx, and protein *fasta.blastx files are all present (and appear >>>> finished from what I can tell). >>>> However, the particular contents in the parent directory to the >>>> "theVoid.scaffold" folder differ. For the failed scaffolds, the contents >>>> generally always look something like this (that is, they stall with the >>>> same kind of files produced): >>>> ``` >>>> 0 >>>> evidence_0.gff >>>> query.fasta >>>> query.masked.fasta >>>> query.masked.fasta.index >>>> query.masked.gff >>>> run.log.child.0 >>>> scaffold22.0.final.section >>>> scaffold22.0.pred.raw.section >>>> scaffold22.0.raw.section >>>> scaffold22.gff.ann >>>> scaffold22.gff.def >>>> scaffold22.gff.seq >>>> ``` >>>> >>>> For the completed scaffold, there are many more files created: >>>> ``` >>>> 0 >>>> 10 >>>> 100 >>>> 20 >>>> 30 >>>> 40 >>>> 50 >>>> 60 >>>> 70 >>>> 80 >>>> 90 >>>> evidence_0.gff >>>> evidence_10.gff >>>> evidence_1.gff >>>> evidence_2.gff >>>> evidence_3.gff >>>> evidence_4.gff >>>> evidence_5.gff >>>> evidence_6.gff >>>> evidence_7.gff >>>> evidence_8.gff >>>> evidence_9.gff >>>> query.fasta >>>> query.masked.fasta >>>> query.masked.fasta.index >>>> query.masked.gff >>>> run.log.child.0 >>>> run.log.child.1 >>>> run.log.child.10 >>>> run.log.child.2 >>>> run.log.child.3 >>>> run.log.child.4 >>>> run.log.child.5 >>>> run.log.child.6 >>>> run.log.child.7 >>>> run.log.child.8 >>>> run.log.child.9 >>>> scaffold4.0-1.raw.section >>>> scaffold4.0.final.section >>>> scaffold4.0.pred.raw.section >>>> scaffold4.0.raw.section >>>> scaffold4.10.final.section >>>> scaffold4.10.pred.raw.section >>>> scaffold4.10.raw.section >>>> scaffold4.1-2.raw.section >>>> scaffold4.1.final.section >>>> 
scaffold4.1.pred.raw.section >>>> scaffold4.1.raw.section >>>> scaffold4.2-3.raw.section >>>> scaffold4.2.final.section >>>> scaffold4.2.pred.raw.section >>>> scaffold4.2.raw.section >>>> scaffold4.3-4.raw.section >>>> scaffold4.3.final.section >>>> scaffold4.3.pred.raw.section >>>> scaffold4.3.raw.section >>>> scaffold4.4-5.raw.section >>>> scaffold4.4.final.section >>>> scaffold4.4.pred.raw.section >>>> scaffold4.4.raw.section >>>> scaffold4.5-6.raw.section >>>> scaffold4.5.final.section >>>> scaffold4.5.pred.raw.section >>>> scaffold4.5.raw.section >>>> scaffold4.6-7.raw.section >>>> scaffold4.6.final.section >>>> scaffold4.6.pred.raw.section >>>> scaffold4.6.raw.section >>>> scaffold4.7-8.raw.section >>>> scaffold4.7.final.section >>>> scaffold4.7.pred.raw.section >>>> scaffold4.7.raw.section >>>> scaffold4.8-9.raw.section >>>> scaffold4.8.final.section >>>> scaffold4.8.pred.raw.section >>>> scaffold4.8.raw.section >>>> scaffold4.9-10.raw.section >>>> scaffold4.9.final.section >>>> scaffold4.9.pred.raw.section >>>> scaffold4.9.raw.section >>>> ``` >>>> >>>> Thanks for any troubleshooting tips you can offer. >>>> >>>> Cheers, >>>> Devon >>>> >>>> -- >>>> Devon O'Rourke >>>> Postdoctoral researcher, Northern Arizona University >>>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >>>> twitter: @thesciencedork >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at yandell-lab.org >>>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>>> >>> >>> -- >>> Devon O'Rourke >>> Postdoctoral researcher, Northern Arizona University >>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >>> twitter: @thesciencedork >>> >>> >>> >>> >> >> -- >> Devon O'Rourke >> Postdoctoral researcher, Northern Arizona University >> Lab of Jeffrey T. 
Foster - https://fozlab.weebly.com/
>> twitter: @thesciencedork
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From devon.orourke at gmail.com Thu Feb 27 06:26:20 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Thu, 27 Feb 2020 08:26:20 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID: 

Hi Carson,
Two steps forward, one step back, I suppose?
After incorporating the additional MPI-related parameters the job moved further ahead than previous iterations, however it still failed prior to completing the job. It appears that all but the six longest scaffolds were annotated (except for a small few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop).
I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem. Very much appreciate your help.
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LUmaker.log.gz
Type: application/x-gzip
Size: 4808331 bytes
Desc: not available
URL: 
From gongyuan.cao at duke.edu Sat Feb 29 10:44:24 2020
From: gongyuan.cao at duke.edu (Gongyuan Cao)
Date: Sat, 29 Feb 2020 17:44:24 +0000
Subject: [maker-devel] maker_functional_gff error
Message-ID: 

Hi,

I'm running maker_functional_gff and got this error:

Can't use string ("") as a HASH ref while "strict refs" in use at /root/maker/bin/maker_functional_gff line 55, <$IN> line 3.

I've checked the gff file and there are no missing "ID=" tags. What could be the problem?

head of blastp output:

lacu_11543-RA A4GSN8 49.643 2099 951 36 1 2026 1 2066 0.0 1724
lacu_11544-RA F4IF36 75.473 1268 273 6 33 1263 29 1295 0.0 1949
lacu_11548-RA O81123 51.316 380 144 10 24 401 15 355 2.29e-119 353
lacu_11549-RA Q9SA32 60.767 339 130 3 328 664 58 395 1.54e-141 421
lacu_11547-RA Q9SLK2 72.493 349 96 0 1 349 1 349 0.0 518
lacu_11558-RA Q9LTV6 76.689 296 69 0 5 300 3 298 2.21e-158 446
lacu_11557-RA Q9C9U5 40.441 272 145 6 866 1134 746 1003 7.55e-50 196
lacu_11552-RA Q96GG9 44.715 246 128 3 58 296 2 246 2.30e-73 229
lacu_11560-RA Q42961 89.375 480 47 2 2 480 4 480 0.0 855
lacu_11561-RA Q42962 91.022 401 36 0 1 401 1 401 0.0 731

head of gff:

##gff-version 3
Linkage_group_5 . contig 1 30484050 . . . ID=Linkage_group_5;Name=Linkage_group_5
Linkage_group_5 maker gene 10601 29761 . + . ID=lacu_11543;Name=lacu_11543;Alias=maker-Linkage_group_5-pred_gff_est2genome-gene-0.188;score=1168;
Linkage_group_5 maker mRNA 10601 29761 6483 + . ID=lacu_11543-RA;Parent=lacu_11543;Name=lacu_11543-RA;Alias=maker-Linkage_group_5-pred_gff_est2genome-gene-0.188-mRNA-1;_AED=0.00;_QI=105|1|1|1|1|1|48|246|2043;_eAED=0.00;score=1168;
Linkage_group_5 maker exon 10601 11011 . + . ID=lacu_11543-RA:exon:0;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11129 11275 . + . ID=lacu_11543-RA:exon:1;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11403 11501 . + . ID=lacu_11543-RA:exon:2;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11835 11963 . + . ID=lacu_11543-RA:exon:3;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 12054 12146 . + . ID=lacu_11543-RA:exon:4;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 12240 12305 . + . ID=lacu_11543-RA:exon:5;Parent=lacu_11543-RA;
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From carsonhh at gmail.com Tue Feb 4 17:38:05 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 4 Feb 2020 17:38:05 -0700
Subject: [maker-devel] Avoiding re-indexing the same file
In-Reply-To: 
References: 
Message-ID: <032EA515-1EAC-4374-9B8B-51D6ECC39B27@gmail.com>

MAKER only indexes the input files during the first run. It will reuse the indexes after that. The indexes are in the *.maker.output/mpi_blastdb directory.

If this is a RepeatMasker issue, it keeps its indexes under the .../RepeatMasker/Libraries/ directory and reuses them after indexing the first time.

--Carson

> On Jan 29, 2020, at 7:42 AM, H.DENISE wrote:
>
> Hi,
> I'm new to Maker and need to compare the annotations with different features (+/- RepeatMasker, using different protein files, etc.). However, the first step seems to be the indexing of my files, and the RNASeq file I'm using is large, so Maker seems to take ages at this step. As it is a constant file for my applications, is there a way to provide the indexing file in order to avoid repeating this step?
> Thanks in advance, Hubert
>
> Hubert DENISE, PhD
>
> Genome Data Analyst
> R.Durbin's group
> Department of Genetics
> University of Cambridge
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From liorglic at mail.tau.ac.il Sun Feb 9 04:02:27 2020
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 9 Feb 2020 13:02:27 +0200
Subject: [maker-devel] Alternative splicing in MAKER
Message-ID: 

Hello,

I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence.

In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run.

In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction. The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene. Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene.

In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA.

Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example).

Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected?

Thanks a lot!
Lior
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: files.rar
Type: application/octet-stream
Size: 5380 bytes
Desc: not available
URL: 
From liorglic at mail.tau.ac.il Sun Feb 9 03:24:09 2020
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 9 Feb 2020 12:24:09 +0200
Subject: [maker-devel] Alternative splicing in MAKER
Message-ID: 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: annotation.gff
Type: application/octet-stream
Size: 2515 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: annotation_maker_opts.ctl
Type: application/octet-stream
Size: 5442 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: liftover.gff
Type: application/octet-stream
Size: 16169 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: liftover_maker_opts.ctl
Type: application/octet-stream
Size: 4644 bytes
Desc: not available
URL: 
From mbreitbach at hudsonalpha.org Tue Feb 11 09:12:23 2020
From: mbreitbach at hudsonalpha.org (Megan Breitbach)
Date: Tue, 11 Feb 2020 10:12:23 -0600
Subject: [maker-devel] Maker Issue re-annotating
Message-ID: 

Good morning,

I'm trying to de novo annotate a genome with ~100,000 scaffolds and a scaffold N50 of 189,900 using Maker.
I've been able to use MPICH to parallelize the first round of From devon.orourke at gmail.com Wed Feb 19 13:54:28 2020 From: devon.orourke at gmail.com (Devon O'Rourke) Date: Wed, 19 Feb 2020 15:54:28 -0500 Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail Message-ID: Hello, I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for. Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file: ``` DIED RANK 0:6:0:0 DIED COUNT 2 ``` (the master 'run.log' file just shows "DIED COUNT 2") I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step. Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue? 
In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell). However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced): ``` 0 evidence_0.gff query.fasta query.masked.fasta query.masked.fasta.index query.masked.gff run.log.child.0 scaffold22.0.final.section scaffold22.0.pred.raw.section scaffold22.0.raw.section scaffold22.gff.ann scaffold22.gff.def scaffold22.gff.seq ``` For the completed scaffold, there are many more files created: ``` 0 10 100 20 30 40 50 60 70 80 90 evidence_0.gff evidence_10.gff evidence_1.gff evidence_2.gff evidence_3.gff evidence_4.gff evidence_5.gff evidence_6.gff evidence_7.gff evidence_8.gff evidence_9.gff query.fasta query.masked.fasta query.masked.fasta.index query.masked.gff run.log.child.0 run.log.child.1 run.log.child.10 run.log.child.2 run.log.child.3 run.log.child.4 run.log.child.5 run.log.child.6 run.log.child.7 run.log.child.8 run.log.child.9 scaffold4.0-1.raw.section scaffold4.0.final.section scaffold4.0.pred.raw.section scaffold4.0.raw.section scaffold4.10.final.section scaffold4.10.pred.raw.section scaffold4.10.raw.section scaffold4.1-2.raw.section scaffold4.1.final.section scaffold4.1.pred.raw.section scaffold4.1.raw.section scaffold4.2-3.raw.section scaffold4.2.final.section scaffold4.2.pred.raw.section scaffold4.2.raw.section scaffold4.3-4.raw.section scaffold4.3.final.section scaffold4.3.pred.raw.section scaffold4.3.raw.section scaffold4.4-5.raw.section scaffold4.4.final.section scaffold4.4.pred.raw.section scaffold4.4.raw.section 
scaffold4.5-6.raw.section scaffold4.5.final.section scaffold4.5.pred.raw.section scaffold4.5.raw.section scaffold4.6-7.raw.section scaffold4.6.final.section scaffold4.6.pred.raw.section scaffold4.6.raw.section scaffold4.7-8.raw.section scaffold4.7.final.section scaffold4.7.pred.raw.section scaffold4.7.raw.section scaffold4.8-9.raw.section scaffold4.8.final.section scaffold4.8.pred.raw.section scaffold4.8.raw.section scaffold4.9-10.raw.section scaffold4.9.final.section scaffold4.9.pred.raw.section scaffold4.9.raw.section
```

Thanks for any troubleshooting tips you can offer.

Cheers,
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork

From tayab.soomro at canada.ca Thu Feb 20 14:42:24 2020
From: tayab.soomro at canada.ca (Soomro, Tayab (AAFC/AAC))
Date: Thu, 20 Feb 2020 21:42:24 +0000
Subject: [maker-devel] Unassembled RNA-Seq data to Maker
Message-ID: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>

I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data.

From jason.stajich at gmail.com Thu Feb 20 14:53:14 2020
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 20 Feb 2020 13:53:14 -0800
Subject: [maker-devel] Unassembled RNA-Seq data to Maker
In-Reply-To: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>
References: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>
Message-ID: <0169feea-4c2c-4376-a27f-fab33fa5aa0f@Spark>

MAKER uses a transcript alignment approach (BLAST and Exonerate) that is optimized for long EST-to-genome alignments. You can build transcripts first by running Trinity to assemble the RNA-seq reads.
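
Since the aligners expect long, EST-like sequences, one quick sanity check before giving a FASTA file to MAKER as EST evidence is to look at its sequence lengths. The following is only a rough sketch and not part of MAKER: the helper names and the 300 bp median cutoff are assumptions picked for illustration, and a real data set may need a different threshold.

```python
from statistics import median


def fasta_lengths(lines):
    """Yield (record_id, sequence_length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            fields = line[1:].split()
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def looks_assembled(lines, min_median_len=300):
    """Guess whether a FASTA holds assembled transcripts rather than raw reads.

    The 300 bp cutoff is an arbitrary assumption for illustration only.
    """
    lengths = [n for _, n in fasta_lengths(lines)]
    return bool(lengths) and median(lengths) >= min_median_len


if __name__ == "__main__":
    # Invented toy records: two 100 bp "reads" and two longer "transcripts".
    reads = [">r1", "A" * 100, ">r2", "C" * 100]
    transcripts = [">t1 len=1200", "G" * 1200, ">t2", "T" * 450, "T" * 450]
    print(looks_assembled(reads))
    print(looks_assembled(transcripts))
```

With a real file, the same functions can be fed an open file handle, for example `looks_assembled(open("transcripts.fasta"))`, where the file name is again just a placeholder.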
On Feb 20, 2020, 1:42 PM -0800, Soomro, Tayab (AAFC/AAC) , wrote: > I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data. > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott at scottcain.net Thu Feb 20 19:16:10 2020 From: scott at scottcain.net (Scott Cain) Date: Thu, 20 Feb 2020 18:16:10 -0800 Subject: [maker-devel] GMOD in Google Summer of Code Message-ID: Hello, I am very pleased to announce that GMOD in conjunction with Reactome, Galaxy and OICR/WormBase, together forming Open Genome Informatics, has been accepted for the Google Summer of Code. If you or someone you know might be a student interested in participating in GSoC, please take a look at http://gmod.org/wiki/GSOC_Project_Ideas_2020 where there are proposed projects that cover a fair number of technologies. Official proposals from students will be due in mid March (more on that later). But WAIT! There's more: if you might be interested in being a mentor and working with a student this summer, it's not too late! You can add new project ideas to the page above (contact me if you need an account), or you can even volunteer to add yourself to one of the existing ideas as a potential mentor. Please feel free to forward this to other mailing lists or people who might be interested. We are already an eclectic, dispersed group, so everyone is welcome. Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Wed Feb 26 12:05:31 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:05:31 -0700
Subject: [maker-devel] Unassembled RNA-Seq data to Maker
In-Reply-To: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>
References: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>
Message-ID:

MAKER does not assemble the reads. It uses BLAST to align a sequence and then Exonerate to polish around splice sites. This allows identification of introns (exons aren't as useful for gene prediction hints). Unassembled reads are more likely to align spuriously, will not cross splice sites (unless for intron identification), and will not be assigned to the proper strand (intron-aware alignments allow proper strand assignment). MAKER was developed when older EST technology was the only option; mRNA-seq can be treated the same way if it is assembled first.

--Carson

> On Feb 20, 2020, at 2:42 PM, Soomro, Tayab (AAFC/AAC) wrote:
>
> I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data.

From carsonhh at gmail.com Wed Feb 26 12:09:58 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:09:58 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References:
Message-ID: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>

If running under MPI, the reason for a failure may be further back in the STDERR (failures tend to snowball into other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If it's memory, you can also set all the blast_depth parameters in maker_bopts.ctl to a value like 20.
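
Because the first failure is usually buried far back in the STDERR, it can help to scan a captured log for the earliest error-like line instead of reading from the end. The sketch below is a minimal illustration, not a MAKER tool: the function name, the sample log lines, and especially the error patterns are assumptions, and real MAKER, Perl, or MPI messages may be worded differently.

```python
import re

# Hypothetical patterns: real MAKER, Perl, or MPI messages may be worded differently.
ERROR_PATTERN = re.compile(
    r"error|fatal|segmentation fault|killed|out of memory|cannot allocate",
    re.IGNORECASE,
)


def first_error(lines, context=2):
    """Return (line_number, context_lines) for the earliest error-like line, or None."""
    lines = [line.rstrip("\n") for line in lines]
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, index - context)
            return index + 1, lines[start:index + context + 1]
    return None


if __name__ == "__main__":
    # Invented sample log, used only to show the call; it is not real MAKER output.
    sample = [
        "running blast search.",
        "ERROR: hypothetical first failure",
        "doing something else",
        "FATAL: hypothetical follow-on failure",
    ]
    print(first_error(sample, context=1))
```

In practice the lines would come from a file holding the job's STDERR, for example one written by a `2> maker.err` redirection in the batch script (the file name is a placeholder), opened and passed straight to `first_error`.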
--Carson

From carsonhh at gmail.com Wed Feb 26 12:10:59 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:10:59 -0700
Subject: [maker-devel] Maker Issue re-annotating
In-Reply-To:
References:
Message-ID: <0546CBA9-9EB4-45B0-BB02-888E2F1B8AA9@gmail.com>

Sorry for the slow reply. Please capture and send the STDERR from one of the failures.

--Carson

> On Feb 11, 2020, at 9:12 AM, Megan Breitbach wrote:
>
> Good morning,
>
> I'm trying to de novo annotate a genome with ~100,000 scaffolds and a scaffold N50 of 189,900 using Maker.
> I've been able to use MPICH to parallelize the first round of
> Here are the parameters used in the maker_opts.ctl file-
>
> #-----Genome (these are always required)
> genome=blackbear_DNAzoo.FINAL.fasta #genome sequence (fasta file or fasta embeded in GFF3 file)
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=blackbear_DNAzoo.FINAL.all.gff #MAKER derived GFF3 file
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est=Ursus_maritimus.UrsMar_1.0.cdna.all.fa #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=Ursus_maritimus.UrsMar_1.0.pep.all.fa #protein sequence file in fasta format (i.e. from mutiple organisms)
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org=all #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
>
> #-----Gene Prediction
> snaphmm=blackbear.hmm #SNAP HMM file
> gmhmm= #GeneMark HMM file
> augustus_species= #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> snoscan_meth= #-O-methylation site fileto have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> allow_overlap=0 #allowed gene overlap fraction (value from 0 to 1, blank for default)
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> min_intron=20 #minimum intron length (used for alignment polishing)
> single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
>
> Thanks,
> --
> Megan Ramaker, PhD
> Postdoctoral Trainee
> HudsonAlpha Institute for Biotechnology
> 601 Genome Way
> Huntsville, AL 35806
> 478-284-6723

From carsonhh at gmail.com Wed Feb 26 12:19:59 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:19:59 -0700
Subject: [maker-devel] Alternative splicing in MAKER
In-Reply-To:
References:
Message-ID:

est2genome=1 together with alt_splice=1 can cause weird behavior. Because est2genome is just a cut and paste of an alignment into a gene model, the model will always be 100% supported by the evidence (itself as an alignment), and anything that overlaps will be clustered into the same gene, which can be messy if the models you are moving forward align to multiple locations.

You can add est_forward=1 (add it manually; it's undocumented) to maker_opts.ctl to get MAKER to do a few extra behaviors. It will keep the names from the est2genome alignments (not rename them to maker names), and if you add hints like gene_id= to the fasta header it will only cluster things with the same gene ID and not just cluster by overlap. Also you can add maker_coor= to the header to restrict alignments to specific contigs or even contig regions.

--Carson

> On Feb 9, 2020, at 3:24 AM, Lior Glick wrote:
>
> Hello,
> I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence.
> In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run.
>
> In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction.
> The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene.
> Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene.
> In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA.
> Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example).
>
> Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected?
>
> Thanks a lot!
> Lior

From carsonhh at gmail.com Wed Feb 26 12:27:43 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:27:43 -0700
Subject: [maker-devel] Multiple UTR ?
In-Reply-To:
References:
Message-ID:

Sorry for the very slow reply. I found this way, way down in my inbox.

The UTR features are the parts of the exons that are not CDS. So multiple UTR features mean the UTR spans multiple exons, and they must be assembled to generate the full UTR in a browser. Any exon that is fully non-coding will produce a UTR feature that mirrors the exon's coordinates. If the exon is partially coding, the UTR will share the same start or end but will terminate somewhere in the middle, with a CDS filling up the remaining coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model.

--Carson

> On Dec 18, 2019, at 7:19 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> I have seen something strange in my annotation: multiple UTR. How can we explain this? Thanks!
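
The rule that UTR features are the exon parts left over once the CDS is taken out can be checked against the GENE_02395-RA record quoted in this message (minus strand). The sketch below is only an illustration under simple assumptions: the function name is hypothetical, it is not MAKER's own code, and it assumes a single contiguous coding span with no non-coding exon inside it. The exon and CDS coordinates are copied from the quoted GFF3.

```python
def utr_segments(exons, cds, strand):
    """Split exons into 5' and 3' UTR pieces given the CDS segments.

    Coordinates are 1-based inclusive (start, end) pairs, as in GFF3.
    """
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    below, above = [], []  # pieces before / after the coding span in genomic order
    for start, end in sorted(exons):
        if start < cds_start:
            below.append((start, min(end, cds_start - 1)))
        if end > cds_end:
            above.append((max(start, cds_end + 1), end))
    if strand == "+":
        return {"five_prime_UTR": below, "three_prime_UTR": above}
    # On the minus strand the transcript runs from high to low coordinates.
    return {"five_prime_UTR": above, "three_prime_UTR": below}


# Exons and CDS of GENE_02395-RA, taken from the GFF3 quoted in this message.
EXONS = [
    (12117462, 12118046), (12118141, 12118301), (12118386, 12118539),
    (12118818, 12122493), (12123591, 12123893), (12123995, 12124303),
    (12125119, 12125418), (12126005, 12126313), (12127460, 12127687),
    (12128112, 12128433),
]
CDS = [
    (12117709, 12118046), (12118141, 12118301),
    (12118386, 12118539), (12118818, 12118881),
]

if __name__ == "__main__":
    result = utr_segments(EXONS, CDS, "-")
    for feature, pieces in result.items():
        for start, end in pieces:
            print(feature, start, end)
```

Run on these coordinates, the five_prime_UTR pieces are one partial exon (12118882 to 12122493) plus six whole non-coding exons, and the single three_prime_UTR piece is 12117462 to 12117708, which matches the seven five_prime_UTR rows and one three_prime_UTR row in the quoted annotation.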
>
> Scaffold maker mRNA 12117462 12128433 . - . ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function;
> Scaffold maker exon 12128112 12128433 . - . ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA;
> Scaffold maker exon 12117462 12118046 . - . ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12118141 12118301 . - . ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12118386 12118539 . - . ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA;
> Scaffold maker exon 12118818 12122493 . - . ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA;
> Scaffold maker exon 12123591 12123893 . - . ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12123995 12124303 . - . ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12125119 12125418 . - . ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12126005 12126313 . - . ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12127460 12127687 . - . ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker five_prime_UTR 12128112 12128433 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12127460 12127687 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12126005 12126313 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12125119 12125418 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12123995 12124303 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12123591 12123893 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12118882 12122493 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118818 12118881 . - 0 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118386 12118539 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118141 12118301 . - 1 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12117709 12118046 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker three_prime_UTR 12117462 12117708 . - . ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA;
>
> Patrick Tran Van
>
> Bioinformatician: Lab Chapuisat & Schwander
> Department of Ecology and Evolution
> University of Lausanne
> Lausanne - Switzerland
> Office 3206

From carsonhh at gmail.com Wed Feb 26 12:54:32 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:54:32 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
Message-ID: <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>

Try adding these few options right after 'mpiexec'
in your batch script (this will fix infiniband related segfaults as well as some fork related segfaults) ?> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 Also remove the -q in the maker command to get full command lines for subprocesses in the STDERR (allows you to run some commands outside of MAKER to test the source of failures if for example BLASt or Exonerate is causing the segfault). Example ?> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu -fix_nucleotides One alternate possibility is that OpenMPI is the problem, I?ve seen a few systems where it has an issue with perl itself, and the only way to get around it is to install your own version of perl without perl threads enabled and install MAKER with that version of Perl (then OpenMPI seems to be ok again). If that?s the case it is often easier to switch to MPICH2 or Intel MPI as the MPI launcher if they are available and then reinstall MAKER with that MPI flavor. ?Carson > On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote: > > Thanks very much for the reply Carson, > I've attached few files file of the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there are a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result. > I certainly have access to more memory if needed. I'm using a pretty typical (I think?) cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university. 
I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory. > Appreciate your insights; hope the weather in UT is filled with sun or snow or both. > Devon > > On Wed, Feb 26, 2020 at 2:10 PM Carson Holt > wrote: > If running under MPI, the reason for a failure may be further back in the STDERR (failures tend snowball other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If its memory, you can also set all the blast_depth parameters in maker_botpts.ctl to a value like 20. > > ?Carson > > > >> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke > wrote: >> >> Hello, >> >> I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for. >> >> Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file: >> ``` >> DIED RANK 0:6:0:0 >> DIED COUNT 2 >> ``` >> (the master 'run.log' file just shows "DIED COUNT 2") >> >> I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. 
I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step. >> >> Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue? >> >> In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell). >> However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. 
For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced): >> ``` >> 0 >> evidence_0.gff >> query.fasta >> query.masked.fasta >> query.masked.fasta.index >> query.masked.gff >> run.log.child.0 >> scaffold22.0.final.section >> scaffold22.0.pred.raw.section >> scaffold22.0.raw.section >> scaffold22.gff.ann >> scaffold22.gff.def >> scaffold22.gff.seq >> ``` >> >> For the completed scaffold, there are many more files created: >> ``` >> 0 >> 10 >> 100 >> 20 >> 30 >> 40 >> 50 >> 60 >> 70 >> 80 >> 90 >> evidence_0.gff >> evidence_10.gff >> evidence_1.gff >> evidence_2.gff >> evidence_3.gff >> evidence_4.gff >> evidence_5.gff >> evidence_6.gff >> evidence_7.gff >> evidence_8.gff >> evidence_9.gff >> query.fasta >> query.masked.fasta >> query.masked.fasta.index >> query.masked.gff >> run.log.child.0 >> run.log.child.1 >> run.log.child.10 >> run.log.child.2 >> run.log.child.3 >> run.log.child.4 >> run.log.child.5 >> run.log.child.6 >> run.log.child.7 >> run.log.child.8 >> run.log.child.9 >> scaffold4.0-1.raw.section >> scaffold4.0.final.section >> scaffold4.0.pred.raw.section >> scaffold4.0.raw.section >> scaffold4.10.final.section >> scaffold4.10.pred.raw.section >> scaffold4.10.raw.section >> scaffold4.1-2.raw.section >> scaffold4.1.final.section >> scaffold4.1.pred.raw.section >> scaffold4.1.raw.section >> scaffold4.2-3.raw.section >> scaffold4.2.final.section >> scaffold4.2.pred.raw.section >> scaffold4.2.raw.section >> scaffold4.3-4.raw.section >> scaffold4.3.final.section >> scaffold4.3.pred.raw.section >> scaffold4.3.raw.section >> scaffold4.4-5.raw.section >> scaffold4.4.final.section >> scaffold4.4.pred.raw.section >> scaffold4.4.raw.section >> scaffold4.5-6.raw.section >> scaffold4.5.final.section >> scaffold4.5.pred.raw.section >> scaffold4.5.raw.section >> scaffold4.6-7.raw.section >> scaffold4.6.final.section >> scaffold4.6.pred.raw.section >> scaffold4.6.raw.section >> 
scaffold4.7-8.raw.section >> scaffold4.7.final.section >> scaffold4.7.pred.raw.section >> scaffold4.7.raw.section >> scaffold4.8-9.raw.section >> scaffold4.8.final.section >> scaffold4.8.pred.raw.section >> scaffold4.8.raw.section >> scaffold4.9-10.raw.section >> scaffold4.9.final.section >> scaffold4.9.pred.raw.section >> scaffold4.9.raw.section >> ``` >> >> Thanks for any troubleshooting tips you can offer. >> >> Cheers, >> Devon >> >> -- >> Devon O'Rourke >> Postdoctoral researcher, Northern Arizona University >> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >> twitter: @thesciencedork >> _______________________________________________ >> maker-devel mailing list >> maker-devel at yandell-lab.org >> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > > > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ > twitter: @thesciencedork >
-------------- next part --------------
An HTML attachment was scrubbed...

From devon.orourke at gmail.com Wed Feb 26 12:36:25 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Wed, 26 Feb 2020 14:36:25 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
Message-ID:

Thanks very much for the reply Carson,

I've attached a few files from the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. There is a 1a and 1b pair of log files because I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result.

I certainly have access to more memory if needed. I'm using a pretty typical (I think?)
cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university. I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well as a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory.

Appreciate your insights; hope the weather in UT is filled with sun or snow or both.
Devon

On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote:

> If running under MPI, the reason for a failure may be further back in the
> STDERR (failures tend to snowball into other failures, so the initial cause is
> often way back). If you can capture the STDERR and send it, that would be
> the most informative. If it's memory, you can also set all the blast_depth
> parameters in maker_bopts.ctl to a value like 20.
>
> --Carson

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fail-1a.log.gz
Type: application/x-gzip
Size: 21751 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fail-1b.log.gz
Type: application/x-gzip
Size: 2175 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run1_maker_opts.ctl
Type: application/octet-stream
Size: 3720 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run1_slurm.sh
Type: application/x-sh
Size: 788 bytes
Desc: not available

From devon.orourke at gmail.com Wed Feb 26 13:15:08 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Wed, 26 Feb 2020 15:15:08 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
Message-ID:

Much appreciated Carson,

I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of the three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one).

Thanks,
Devon

On Wed, Feb 26, 2020 at 2:54 PM Carson Holt wrote:

> Try adding a few options right after "mpiexec" in your batch script
> (this will fix infiniband related segfaults as well as some fork related
> segfaults) -->
> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0
>
> Also remove the -q in the maker command to get full command lines for
> subprocesses in the STDERR (allows you to run some commands outside of
> MAKER to test the source of failures if, for example, BLAST or Exonerate is
> causing the segfault).
>
> Example -->
> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu -fix_nucleotides
>
> One alternate possibility is that OpenMPI is the problem. I've seen a few
> systems where it has an issue with perl itself, and the only way to get
> around it is to install your own version of perl without perl threads
> enabled and install MAKER with that version of Perl (then OpenMPI seems to
> be ok again). If that's the case it is often easier to switch to MPICH2 or
> Intel MPI as the MPI launcher if they are available and then reinstall
> MAKER with that MPI flavor.
>
> --Carson

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
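The launcher advice quoted in the message above (pass the extra Open MPI options right after mpiexec, drop -q, and capture STDERR so the earliest failure can be found) could be sketched roughly as follows. This is only an illustration, assuming Open MPI and a POSIX-compatible shell: the function names maker_mpi_cmd and first_failures, the log file name used in the note below, and the search pattern are hypothetical and not part of MAKER or of the thread; the mpiexec options and the maker command are the ones quoted above.

```shell
# Open MPI options quoted in the thread (infiniband and fork related workarounds).
MCA_OPTS="--mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0"

# Print the full launch command without running it (a dry run).
# $1 = number of MPI processes, $2 = path to the maker executable, $3 = value for -base.
# Note that -q is deliberately left out so subprocess command lines reach STDERR.
maker_mpi_cmd() {
  printf '%s\n' "mpiexec $MCA_OPTS -n $1 $2 -base $3 -fix_nucleotides"
}

# Show the earliest lines of a captured STDERR file that look like failures.
# $1 = log file, $2 = maximum number of matches to print (default 5).
# The pattern is an assumption; adjust it to whatever the real log contains.
first_failures() {
  grep -n -i -E 'error|died|segmentation fault' "$1" | head -n "${2:-5}"
}
```

In a batch script the printed command could be run with STDERR redirected, for example `$(maker_mpi_cmd 28 /packages/maker/3.01.02-beta/bin/maker lu) 2> maker_stderr.log`, followed by `first_failures maker_stderr.log` to look at the first few suspicious lines rather than the last ones.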
From carsonhh at gmail.com Wed Feb 26 13:18:34 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 13:18:34 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
Message-ID: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>

For Intel MPI, export an environment variable right before running MAKER --> "export I_MPI_FABRICS=shm:tcp"

Intel MPI has a similar infiniband segfault issue to OpenMPI when running Perl scripts, but a different workaround.

--Carson

> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one).
> Thanks,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T.
Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...

From devon.orourke at gmail.com Fri Feb 28 05:50:27 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Fri, 28 Feb 2020 07:50:27 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

Hi Carson,

I tried sending this email yesterday but received a notification that the text body was too large. I thought perhaps it was related to the attached log file I sent in the earlier message. You can see the same file here: https://osf.io/cuxg8/download. Thanks!

(previous message below)
....

Two steps forward, one step back, I suppose? After incorporating the additional MPI-related parameters the job moved further ahead than in previous iterations; however, it still failed before completing. It appears that all but the six longest scaffolds were annotated (except for a few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop). I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem.

Very much appreciate your help.
Devon

On Wed, Feb 26, 2020 at 3:18 PM Carson Holt wrote:

> For Intel MPI, export an environmental variable right before running MAKER
> --> "export I_MPI_FABRICS=shm:tcp"
>
> Intel MPI has a similar infiniband segfault issue as OpenMPI when running
> Perl scripts, but a different workaround.
>
> --Carson
>
>>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke
>>> wrote:
>>>
>>> Hello,
>>>
>>> I apologize for not posting directly to the archived forum but it
>>> appears that the option to enter new posts is disabled. Perhaps this is by
>>> design so emails go directly to this address. I hope this is what you are
>>> looking for.
>>>
>>> Thank you for your continued support of Maker and your responses to the
>>> forum posts.
I have been running Maker (V3.01.02-beta) to annotate a >>> mammalian genome that consists of 22 chromosome-length scaffolds (between >>> ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. >>> In my various tests in running Maker, the vast majority of the smaller >>> fragments are annotated successfully, but nearly all the large scaffolds >>> fail with the same error code when I look at the 'run.log.child.0' file: >>> ``` >>> DIED RANK 0:6:0:0 >>> DIED COUNT 2 >>> ``` >>> (the master 'run.log' file just shows "DIED COUNT 2") >>> >>> I struggled to find this exact error code anywhere on the forum and was >>> hoping you might be able to help me determine where I should start >>> troubleshooting. I thought perhaps it was an error concerning memory >>> requirements, so I altered the chunk size from the default to a few larger >>> sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the >>> same outcome). I've tried running the program with parallel support using >>> either openMPI or mpich. I've tried running on a single node using 24 cpus >>> and 120g of RAM. It always stalls at the same step. >>> >>> Interestingly, one of the 22 large scaffolds always finishes and >>> produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff >>> files, but the other 21 of 22 large scaffolds fail. This makes me think >>> perhaps it's not a memory issue? >>> >>> In the case of both the completed and failed scaffolds, the >>> "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, >>> .specific.ori.out, .specific.cat.gz, .specific.out, >>> te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest >>> *fasta.tblastx, and protein *fasta.blastx files are all present (and appear >>> finished from what I can tell). >>> However, the particular contents in the parent directory to the >>> "theVoid.scaffold" folder differ. 
For the failed scaffolds, the contents >>> generally always look something like this (that is, they stall with the >>> same kind of files produced): >>> ``` >>> 0 >>> evidence_0.gff >>> query.fasta >>> query.masked.fasta >>> query.masked.fasta.index >>> query.masked.gff >>> run.log.child.0 >>> scaffold22.0.final.section >>> scaffold22.0.pred.raw.section >>> scaffold22.0.raw.section >>> scaffold22.gff.ann >>> scaffold22.gff.def >>> scaffold22.gff.seq >>> ``` >>> >>> For the completed scaffold, there are many more files created: >>> ``` >>> 0 >>> 10 >>> 100 >>> 20 >>> 30 >>> 40 >>> 50 >>> 60 >>> 70 >>> 80 >>> 90 >>> evidence_0.gff >>> evidence_10.gff >>> evidence_1.gff >>> evidence_2.gff >>> evidence_3.gff >>> evidence_4.gff >>> evidence_5.gff >>> evidence_6.gff >>> evidence_7.gff >>> evidence_8.gff >>> evidence_9.gff >>> query.fasta >>> query.masked.fasta >>> query.masked.fasta.index >>> query.masked.gff >>> run.log.child.0 >>> run.log.child.1 >>> run.log.child.10 >>> run.log.child.2 >>> run.log.child.3 >>> run.log.child.4 >>> run.log.child.5 >>> run.log.child.6 >>> run.log.child.7 >>> run.log.child.8 >>> run.log.child.9 >>> scaffold4.0-1.raw.section >>> scaffold4.0.final.section >>> scaffold4.0.pred.raw.section >>> scaffold4.0.raw.section >>> scaffold4.10.final.section >>> scaffold4.10.pred.raw.section >>> scaffold4.10.raw.section >>> scaffold4.1-2.raw.section >>> scaffold4.1.final.section >>> scaffold4.1.pred.raw.section >>> scaffold4.1.raw.section >>> scaffold4.2-3.raw.section >>> scaffold4.2.final.section >>> scaffold4.2.pred.raw.section >>> scaffold4.2.raw.section >>> scaffold4.3-4.raw.section >>> scaffold4.3.final.section >>> scaffold4.3.pred.raw.section >>> scaffold4.3.raw.section >>> scaffold4.4-5.raw.section >>> scaffold4.4.final.section >>> scaffold4.4.pred.raw.section >>> scaffold4.4.raw.section >>> scaffold4.5-6.raw.section >>> scaffold4.5.final.section >>> scaffold4.5.pred.raw.section >>> scaffold4.5.raw.section >>> scaffold4.6-7.raw.section >>> 
scaffold4.6.final.section >>> scaffold4.6.pred.raw.section >>> scaffold4.6.raw.section >>> scaffold4.7-8.raw.section >>> scaffold4.7.final.section >>> scaffold4.7.pred.raw.section >>> scaffold4.7.raw.section >>> scaffold4.8-9.raw.section >>> scaffold4.8.final.section >>> scaffold4.8.pred.raw.section >>> scaffold4.8.raw.section >>> scaffold4.9-10.raw.section >>> scaffold4.9.final.section >>> scaffold4.9.pred.raw.section >>> scaffold4.9.raw.section >>> ``` >>> >>> Thanks for any troubleshooting tips you can offer. >>> >>> Cheers, >>> Devon >>> >>> -- >>> Devon O'Rourke >>> Postdoctoral researcher, Northern Arizona University >>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >>> twitter: @thesciencedork >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at yandell-lab.org >>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> -- >> Devon O'Rourke >> Postdoctoral researcher, Northern Arizona University >> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >> twitter: @thesciencedork >> >> >> >> > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ > twitter: @thesciencedork > > > -- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork -------------- next part -------------- An HTML attachment was scrubbed... 
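A minimal sketch of how the failed contigs described in this thread might be located: it walks an output tree and lists every run.log file (master or child) that contains a DIED line, such as the "DIED RANK" and "DIED COUNT" entries quoted above. The function name and the example directory are hypothetical; only the run.log file names and the DIED prefix come from the messages.

```shell
# Hypothetical helper: list run.log files that record a DIED event.
# $1 = directory to search (for example the *.maker.output directory; assumed layout)
find_died_logs() {
    # grep -l prints only the names of files with at least one matching line
    find "$1" -type f -name 'run.log*' -exec grep -l '^DIED' {} + | sort
}

# Example call (the path is an assumption, derived from "-base lu"):
#   find_died_logs lu.maker.output
```

Each path printed sits next to the per-contig files listed in the message, so it points at the scaffold directory to inspect first.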
From devon.orourke at gmail.com Sat Feb 29 10:27:16 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Sat, 29 Feb 2020 12:27:16 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

Hi once again Carson,
Our administrators tried installing Maker with a different version of OpenMPI, and the change allowed the job to complete normally. The change was from a newer version (3.1.3) to an older version (1.6.5) of OpenMPI. I needed to make one tweak to the various MPI arguments you provided after that downgrade in version number, as v-1.6.5 didn't use Vader yet. Other than that, the terms appeared to allow the job to run to completion.
Thanks for your assistance,
Devon

On Fri, Feb 28, 2020 at 7:50 AM Devon O'Rourke wrote:

> Hi Carson,
> I had previously tried sending this email yesterday but received a
> notification about the text body size being too large. I thought perhaps it
> was related to the attached log file I sent in the earlier message. You can
> see the same file here: https://osf.io/cuxg8/download.
> Thanks!
>
> (previous message below)
>
> ....
>
> Two steps forward, one step back, I suppose?
> After incorporating the additional MPI-related parameters the job moved
> further ahead than previous iterations, however it still failed prior to
> completing the job. It appears that all but the six longest scaffolds were
> annotated (except for a small few short scaffolds which simply weren't
> finished by the time the error triggered the entire run to stop).
> I've attached the .log file in hopes that you might find any additional
> nuggets to help diagnose the problem. Very much appreciate your help.
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork
>

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
From devon.orourke at gmail.com Thu Feb 27 06:26:20 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Thu, 27 Feb 2020 08:26:20 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

Hi Carson,
Two steps forward, one step back, I suppose?
After incorporating the additional MPI-related parameters the job moved further ahead than previous iterations, however it still failed prior to completing the job. It appears that all but the six longest scaffolds were annotated (except for a small few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop).
I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem. Very much appreciate your help.
Devon

On Wed, Feb 26, 2020 at 3:18 PM Carson Holt wrote:

> For Intel MPI, export an environmental variable right before running MAKER:
> "export I_MPI_FABRICS=shm:tcp"
>
> Intel MPI has a similar infiniband segfault issue as OpenMPI when running
> Perl scripts, but a different workaround.
>
> --Carson
>
>
> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke
> wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post
> the outcome.
> We definitely have two of three MPI options you've described
> on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to
> advise my cluster admins to use whichever software you prefer (should there
> be one).
> Thanks,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork
>

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
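The launcher advice given in this thread can be collected into one place. The sketch below only prints the command line it would run, so it can be checked before being pasted into a batch script. The function name and the flavor keywords are hypothetical, and using plain mpiexec for Intel MPI is an assumption; the MCA flags, the I_MPI_FABRICS setting, and the maker arguments are the ones quoted in the messages.

```shell
# Hypothetical helper: print (do not run) a MAKER launch line for an MPI flavor.
# $1 = flavor (openmpi|intelmpi), $2 = number of ranks, $3 = path to maker
maker_launch_line() {
    case "$1" in
        openmpi)
            # MCA options suggested in the thread for InfiniBand and fork segfaults
            echo "mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0" \
                 "--mca orte_base_help_aggregate 0" \
                 "--mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0" \
                 "-n $2 $3 -base lu -fix_nucleotides"
            ;;
        intelmpi)
            # Intel MPI workaround from the thread: set the fabric before launching
            echo "export I_MPI_FABRICS=shm:tcp; mpiexec -n $2 $3 -base lu -fix_nucleotides"
            ;;
        *)
            echo "unknown MPI flavor: $1" >&2
            return 1
            ;;
    esac
}

maker_launch_line openmpi 28 /packages/maker/3.01.02-beta/bin/maker
```

When the printed line is used for real, appending something like `2> maker.stderr.log` (file name hypothetical) keeps the STDERR that was asked for in the thread, and leaving out -q keeps the full subprocess command lines in it.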
Name: LUmaker.log.gz Type: application/x-gzip Size: 4808331 bytes Desc: not available URL: From gongyuan.cao at duke.edu Sat Feb 29 10:44:24 2020 From: gongyuan.cao at duke.edu (Gongyuan Cao) Date: Sat, 29 Feb 2020 17:44:24 +0000 Subject: [maker-devel] maker_functional_gff error Message-ID: Hi, I'm running maker_functional_gff and got this error: Can't use string ("") as a HASH ref while "strict refs" in use at /root/maker/bin/maker_functional_gff line 55, <$IN> line 3. I've checked the gff file and there are no missing "ID=" tags, what could be the problem? head of blastpoutput: lacu_11543-RA A4GSN8 49.643 2099 951 36 1 2026 1 2066 0.0 1724 lacu_11544-RA F4IF36 75.473 1268 273 6 33 1263 29 1295 0.0 1949 lacu_11548-RA O81123 51.316 380 144 10 24 401 15 355 2.29e-119 353 lacu_11549-RA Q9SA32 60.767 339 130 3 328 664 58 395 1.54e-141 421 lacu_11547-RA Q9SLK2 72.493 349 96 0 1 349 1 349 0.0 518 lacu_11558-RA Q9LTV6 76.689 296 69 0 5 300 3 298 2.21e-158 446 lacu_11557-RA Q9C9U5 40.441 272 145 6 866 1134 746 1003 7.55e-50 196 lacu_11552-RA Q96GG9 44.715 246 128 3 58 296 2 246 2.30e-73 229 lacu_11560-RA Q42961 89.375 480 47 2 2 480 4 480 0.0 855 lacu_11561-RA Q42962 91.022 401 36 0 1 401 1 401 0.0 731 head of gff: ##gff-version 3 Linkage_group_5 . contig 1 30484050 . . . ID=Linkage_group_5;Name=Linkage_group_5 Linkage_group_5 maker gene 10601 29761 . + . ID=lacu_11543;Name=lacu_11543;Alias=maker-Linkage_group_5-pred_gff_est2genome-gene-0.188;score=1168; Linkage_group_5 maker mRNA 10601 29761 6483 + . ID=lacu_11543-RA;Parent=lacu_11543;Name=lacu_11543-RA;Alias=maker-Linkage_group_5-pred_gff_est2genome-gene-0.188-mRNA-1;_AED=0.00;_QI=105|1|1|1|1|1|48|246|2043;_eAED=0.00;score=1168; Linkage_group_5 maker exon 10601 11011 . + . ID=lacu_11543-RA:exon:0;Parent=lacu_11543-RA; Linkage_group_5 maker exon 11129 11275 . + . ID=lacu_11543-RA:exon:1;Parent=lacu_11543-RA; Linkage_group_5 maker exon 11403 11501 . + . 
ID=lacu_11543-RA:exon:2;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11835 11963 . + . ID=lacu_11543-RA:exon:3;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 12054 12146 . + . ID=lacu_11543-RA:exon:4;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 12240 12305 . + . ID=lacu_11543-RA:exon:5;Parent=lacu_11543-RA;

From carsonhh at gmail.com Tue Feb 4 17:38:05 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 4 Feb 2020 17:38:05 -0700
Subject: [maker-devel] Avoiding re-indexing the same file
In-Reply-To:
References:
Message-ID: <032EA515-1EAC-4374-9B8B-51D6ECC39B27@gmail.com>

MAKER only indexes the input files during the first run. It will reuse the indexes after that. The indexes are in the *.maker.output/mpi_blastdb directory. If this is a RepeatMasker issue, it keeps its indexes under the …/RepeatMasker/Libraries/ directory and reuses them after indexing the first time.
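
A quick way to confirm that the indexes from a first run are still in place (and will therefore be reused) is to look for the index directory. The sketch below is only an illustration and is not part of MAKER; it assumes the layout described in this thread (an mpi_blastdb directory inside each *.maker.output directory), and the function name and the run directory argument are made up for the example.

```python
from pathlib import Path


def find_index_dirs(run_dir="."):
    """List mpi_blastdb directories inside any *.maker.output directory.

    Assumes the layout described in the thread; `run_dir` is the directory
    MAKER was launched from (hypothetical default: current directory).
    """
    pattern = "*.maker.output/mpi_blastdb"
    return sorted(p for p in Path(run_dir).glob(pattern) if p.is_dir())


for path in find_index_dirs("."):
    # Count the files so an empty (never filled) index directory is easy to spot.
    n_files = sum(1 for f in path.rglob("*") if f.is_file())
    print(f"{path}: {n_files} files")
```

If nothing is printed, no index directory was found under the current directory, and MAKER would presumably index again on the next run.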
--Carson

> On Jan 29, 2020, at 7:42 AM, H.DENISE wrote:
>
> Hi,
> I'm new to Maker and need to compare the annotations with different features (+/- RepeatMasker, using different protein files, etc.). However, the first step seems to be the indexing of my files, and the RNASeq file I'm using is large, so Maker seems to take ages at this step. As it is a constant file for my applications, is there a way to provide the indexing file in order to avoid repeating this step?
> Thanks in advance, Hubert
>
> Hubert DENISE, PhD
>
> Genome Data Analyst
> R.Durbin's group
> Department of Genetics
> University of Cambridge

From liorglic at mail.tau.ac.il Sun Feb 9 04:02:27 2020
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 9 Feb 2020 13:02:27 +0200
Subject: [maker-devel] Alternative splicing in MAKER
Message-ID:

Hello,
I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence.
In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run.

In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction.
The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene. Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene.
In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA.
Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example).

Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected?

Thanks a lot!
Lior
-------------- next part --------------
A non-text attachment was scrubbed...
Name: files.rar
Type: application/octet-stream
Size: 5380 bytes
Desc: not available
URL:

From liorglic at mail.tau.ac.il Sun Feb 9 03:24:09 2020
From: liorglic at mail.tau.ac.il (Lior Glick)
Date: Sun, 9 Feb 2020 12:24:09 +0200
Subject: [maker-devel] Alternative splicing in MAKER
Message-ID:

Hello,
I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence.
In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run.

In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction.
The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene. Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene.
In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA.
Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example).

Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected?

Thanks a lot!
Lior
-------------- next part --------------
A non-text attachment was scrubbed...
Name: annotation.gff
Type: application/octet-stream
Size: 2515 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: annotation_maker_opts.ctl
Type: application/octet-stream
Size: 5442 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: liftover.gff
Type: application/octet-stream
Size: 16169 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: liftover_maker_opts.ctl
Type: application/octet-stream
Size: 4644 bytes
Desc: not available
URL:

From mbreitbach at hudsonalpha.org Tue Feb 11 09:12:23 2020
From: mbreitbach at hudsonalpha.org (Megan Breitbach)
Date: Tue, 11 Feb 2020 10:12:23 -0600
Subject: [maker-devel] Maker Issue re-annotating
Message-ID:

Good morning,

I'm trying to de novo annotate a genome with ~100,000 scaffolds and a scaffold N50 of 189,900 using Maker.
I've been able to use MPICH to parallelize the first round of

From devon.orourke at gmail.com Wed Feb 19 13:54:28 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Wed, 19 Feb 2020 15:54:28 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
Message-ID:

Hello,

I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for.

Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file:
```
DIED RANK 0:6:0:0
DIED COUNT 2
```
(the master 'run.log' file just shows "DIED COUNT 2")

I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step.

Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue?

In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeatrunner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell).
However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced):
```
0
evidence_0.gff
query.fasta
query.masked.fasta
query.masked.fasta.index
query.masked.gff
run.log.child.0
scaffold22.0.final.section
scaffold22.0.pred.raw.section
scaffold22.0.raw.section
scaffold22.gff.ann
scaffold22.gff.def
scaffold22.gff.seq
```

For the completed scaffold, there are many more files created:
```
0
10
100
20
30
40
50
60
70
80
90
evidence_0.gff
evidence_10.gff
evidence_1.gff
evidence_2.gff
evidence_3.gff
evidence_4.gff
evidence_5.gff
evidence_6.gff
evidence_7.gff
evidence_8.gff
evidence_9.gff
query.fasta
query.masked.fasta
query.masked.fasta.index
query.masked.gff
run.log.child.0
run.log.child.1
run.log.child.10
run.log.child.2
run.log.child.3
run.log.child.4
run.log.child.5
run.log.child.6
run.log.child.7
run.log.child.8
run.log.child.9
scaffold4.0-1.raw.section
scaffold4.0.final.section
scaffold4.0.pred.raw.section
scaffold4.0.raw.section
scaffold4.10.final.section
scaffold4.10.pred.raw.section
scaffold4.10.raw.section
scaffold4.1-2.raw.section
scaffold4.1.final.section
scaffold4.1.pred.raw.section
scaffold4.1.raw.section
scaffold4.2-3.raw.section
scaffold4.2.final.section
scaffold4.2.pred.raw.section
scaffold4.2.raw.section
scaffold4.3-4.raw.section
scaffold4.3.final.section
scaffold4.3.pred.raw.section
scaffold4.3.raw.section
scaffold4.4-5.raw.section
scaffold4.4.final.section
scaffold4.4.pred.raw.section
scaffold4.4.raw.section
scaffold4.5-6.raw.section
scaffold4.5.final.section
scaffold4.5.pred.raw.section
scaffold4.5.raw.section
scaffold4.6-7.raw.section
scaffold4.6.final.section
scaffold4.6.pred.raw.section
scaffold4.6.raw.section
scaffold4.7-8.raw.section
scaffold4.7.final.section
scaffold4.7.pred.raw.section
scaffold4.7.raw.section
scaffold4.8-9.raw.section
scaffold4.8.final.section
scaffold4.8.pred.raw.section
scaffold4.8.raw.section
scaffold4.9-10.raw.section
scaffold4.9.final.section
scaffold4.9.pred.raw.section
scaffold4.9.raw.section
```

Thanks for any troubleshooting tips you can offer.

Cheers,
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork

From tayab.soomro at canada.ca Thu Feb 20 14:42:24 2020
From: tayab.soomro at canada.ca (Soomro, Tayab (AAFC/AAC))
Date: Thu, 20 Feb 2020 21:42:24 +0000
Subject: [maker-devel] Unassembled RNA-Seq data to Maker
Message-ID: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>

I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data.

From jason.stajich at gmail.com Thu Feb 20 14:53:14 2020
From: jason.stajich at gmail.com (Jason Stajich)
Date: Thu, 20 Feb 2020 13:53:14 -0800
Subject: [maker-devel] Unassembled RNA-Seq data to Maker
In-Reply-To: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>
References: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>
Message-ID: <0169feea-4c2c-4376-a27f-fab33fa5aa0f@Spark>

It uses a transcript alignment approach (BLAST and Exonerate), and these tools are optimized for long EST-to-genome alignments. You can build transcripts first by running Trinity to assemble the RNA-seq reads.
On Feb 20, 2020, 1:42 PM -0800, Soomro, Tayab (AAFC/AAC), wrote:
> I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data.

From scott at scottcain.net Thu Feb 20 19:16:10 2020
From: scott at scottcain.net (Scott Cain)
Date: Thu, 20 Feb 2020 18:16:10 -0800
Subject: [maker-devel] GMOD in Google Summer of Code
Message-ID:

Hello,

I am very pleased to announce that GMOD, in conjunction with Reactome, Galaxy and OICR/WormBase, together forming Open Genome Informatics, has been accepted for the Google Summer of Code. If you or someone you know might be a student interested in participating in GSoC, please take a look at http://gmod.org/wiki/GSOC_Project_Ideas_2020 where there are proposed projects that cover a fair number of technologies. Official proposals from students will be due in mid March (more on that later).

But WAIT! There's more: if you might be interested in being a mentor and working with a student this summer, it's not too late! You can add new project ideas to the page above (contact me if you need an account), or you can even volunteer to add yourself to one of the existing ideas as a potential mentor.

Please feel free to forward this to other mailing lists or people who might be interested. We are already an eclectic, dispersed group, so everyone is welcome.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From carsonhh at gmail.com Wed Feb 26 12:05:31 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:05:31 -0700
Subject: [maker-devel] Unassembled RNA-Seq data to Maker
In-Reply-To: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>
References: <9D5BC5EA-A69C-439E-85FF-2BBBCA74B8F3@canada.ca>
Message-ID:

MAKER does not assemble the reads. It uses BLAST to align a sequence and then Exonerate to polish around splice sites. This allows identification of introns (exons aren't as useful for gene prediction hints). Unassembled reads are more likely to align spuriously, will not cross splice sites (useless for intron identification), and will not be assigned to the proper strand (intron-aware alignments allow proper strand assignment). MAKER was developed when older EST technology was the only option; mRNA-seq can be treated the same way if it is assembled first.

--Carson

> On Feb 20, 2020, at 2:42 PM, Soomro, Tayab (AAFC/AAC) wrote:
>
> I am wondering why it is required for the RNA-Seq data to be assembled when passed to Maker and what would happen if I pass non-assembled Illumina RNA-Seq data.

From carsonhh at gmail.com Wed Feb 26 12:09:58 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:09:58 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References:
Message-ID: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>

If running under MPI, the reason for a failure may be further back in the STDERR (failures tend to snowball into other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If it's memory, you can also set all the BLAST depth parameters in maker_bopts.ctl to a value like 20.
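
Once the STDERR has been captured to a file, the earliest error-looking lines are usually the interesting ones, since later failures tend to be knock-on effects. The following is a minimal sketch and not part of MAKER; the function name and the list of patterns are assumptions to adjust for your own cluster and launcher, and the first sample line is invented for the illustration (only the DIED lines come from this thread).

```python
import re

# Assumed patterns only; MAKER and MPI launchers word their errors differently.
ERROR_PATTERN = re.compile(r"error|fatal|segmentation fault|died|killed", re.IGNORECASE)


def first_failures(lines, limit=5):
    """Return (line_number, text) for the earliest lines that look like errors."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if ERROR_PATTERN.search(text):
            hits.append((number, text.rstrip("\n")))
            if len(hits) == limit:
                break
    return hits


# Tiny illustration: the first line is made up, the DIED lines are the ones
# quoted in this thread.
sample = [
    "made-up earlier line: Segmentation fault in a child process\n",
    "DIED RANK 0:6:0:0\n",
    "DIED COUNT 2\n",
]
sample_hits = first_failures(sample)
for number, text in sample_hits:
    print(f"{number}: {text}")
```

To use it on a real log, pass an open file handle instead of the sample list, for example `first_failures(open("maker.err"))`, where maker.err is a hypothetical name for the file the STDERR was redirected to.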
--Carson

> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote:
>
> Hello,
>
> I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for.
>
> Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file:
> ```
> DIED RANK 0:6:0:0
> DIED COUNT 2
> ```
> (the master 'run.log' file just shows "DIED COUNT 2")
>
> I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting. I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step.
>
> Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue?
>
> In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeatrunner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell).
> However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced):
> ```
> 0
> evidence_0.gff
> query.fasta
> query.masked.fasta
> query.masked.fasta.index
> query.masked.gff
> run.log.child.0
> scaffold22.0.final.section
> scaffold22.0.pred.raw.section
> scaffold22.0.raw.section
> scaffold22.gff.ann
> scaffold22.gff.def
> scaffold22.gff.seq
> ```
>
> For the completed scaffold, there are many more files created:
> ```
> 0
> 10
> 100
> 20
> 30
> 40
> 50
> 60
> 70
> 80
> 90
> evidence_0.gff
> evidence_10.gff
> evidence_1.gff
> evidence_2.gff
> evidence_3.gff
> evidence_4.gff
> evidence_5.gff
> evidence_6.gff
> evidence_7.gff
> evidence_8.gff
> evidence_9.gff
> query.fasta
> query.masked.fasta
> query.masked.fasta.index
> query.masked.gff
> run.log.child.0
> run.log.child.1
> run.log.child.10
> run.log.child.2
> run.log.child.3
> run.log.child.4
> run.log.child.5
> run.log.child.6
> run.log.child.7
> run.log.child.8
> run.log.child.9
> scaffold4.0-1.raw.section
> scaffold4.0.final.section
> scaffold4.0.pred.raw.section
> scaffold4.0.raw.section
> scaffold4.10.final.section
> scaffold4.10.pred.raw.section
> scaffold4.10.raw.section
> scaffold4.1-2.raw.section
> scaffold4.1.final.section
> scaffold4.1.pred.raw.section
> scaffold4.1.raw.section
> scaffold4.2-3.raw.section
> scaffold4.2.final.section
> scaffold4.2.pred.raw.section
> scaffold4.2.raw.section
> scaffold4.3-4.raw.section
> scaffold4.3.final.section
> scaffold4.3.pred.raw.section
> scaffold4.3.raw.section
> scaffold4.4-5.raw.section
> scaffold4.4.final.section
> scaffold4.4.pred.raw.section
> scaffold4.4.raw.section
> scaffold4.5-6.raw.section
> scaffold4.5.final.section
> scaffold4.5.pred.raw.section
> scaffold4.5.raw.section
> scaffold4.6-7.raw.section
> scaffold4.6.final.section
> scaffold4.6.pred.raw.section
> scaffold4.6.raw.section
> scaffold4.7-8.raw.section
> scaffold4.7.final.section
> scaffold4.7.pred.raw.section
> scaffold4.7.raw.section
> scaffold4.8-9.raw.section
> scaffold4.8.final.section
> scaffold4.8.pred.raw.section
> scaffold4.8.raw.section
> scaffold4.9-10.raw.section
> scaffold4.9.final.section
> scaffold4.9.pred.raw.section
> scaffold4.9.raw.section
> ```
>
> Thanks for any troubleshooting tips you can offer.
>
> Cheers,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork

From carsonhh at gmail.com Wed Feb 26 12:10:59 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:10:59 -0700
Subject: [maker-devel] Maker Issue re-annotating
In-Reply-To:
References:
Message-ID: <0546CBA9-9EB4-45B0-BB02-888E2F1B8AA9@gmail.com>

Sorry for the slow reply. Please capture and send the STDERR from one of the failures.

--Carson

> On Feb 11, 2020, at 9:12 AM, Megan Breitbach wrote:
>
> Good morning,
>
> I'm trying to de novo annotate a genome with ~100,000 scaffolds and a scaffold N50 of 189,900 using Maker.
> I've been able to use MPICH to parallelize the first round of
> Here are the parameters used in the maker_opts.ctl file-
>
> #-----Genome (these are always required)
> genome=blackbear_DNAzoo.FINAL.fasta #genome sequence (fasta file or fasta embeded in GFF3 file)
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff=blackbear_DNAzoo.FINAL.all.gff #MAKER derived GFF3 file
> est_pass=1 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=1 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anyything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est=Ursus_maritimus.UrsMar_1.0.cdna.all.fa #set of ESTs or assembled mRNA-seq in fasta format
> altest= #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closly relate species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=Ursus_maritimus.UrsMar_1.0.pep.all.fa #protein sequence file in fasta format (i.e. from mutiple organisms)
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org=all #select a model organism for RepBase masking in RepeatMasker
> rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
>
> #-----Gene Prediction
> snaphmm=blackbear.hmm #SNAP HMM file
> gmhmm= #GeneMark HMM file
> augustus_species= #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
> est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> snoscan_meth= #-O-methylation site fileto have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> allow_overlap=0 #allowed gene overlap fraction (value from 0 to 1, blank for default)
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=1 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=1 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> min_intron=20 #minimum intron length (used for alignment polishing)
> single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=2 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP= #specify a directory other than the system default temporary directory for temporary files
>
> Thanks,
> --
> Megan Ramaker, PhD
> Postdoctoral Trainee
> HudsonAlpha Institute for Biotechnology
> 601 Genome Way
> Huntsville, AL 35806
> 478-284-6723

From carsonhh at gmail.com Wed Feb 26 12:19:59 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:19:59 -0700
Subject: [maker-devel] Alternative splicing in MAKER
In-Reply-To:
References:
Message-ID:

est2genome=1 together with alt_splice=1 can cause weird behavior, because est2genome is just a cut and paste of an alignment into a gene model. The model will always be 100% supported by the evidence (itself as an alignment), and anything that overlaps will be clustered into the same gene, which can be messy if the models you are moving forward align to multiple locations.

You can add est_forward=1 (add it manually; it's undocumented) to maker_opts.ctl to get MAKER to do a few extra things. It will keep the names from the est2genome alignments (not rename them to maker names), and if you add hints like gene_id= to the fasta header it will only cluster things with the same gene ID rather than clustering by overlap alone. You can also add maker_coor= to the header to restrict alignments to specific contigs or even contig regions.

--Carson

> On Feb 9, 2020, at 3:24 AM, Lior Glick wrote:
>
> Hello,
> I am working on a computational pipeline which involves genome annotation. Based on helpful advice I got in this mailing list before, I make two consecutive runs: the first is a liftover run with est2genome=1 and no ab-initio prediction, while the second run takes liftover results and adds ab-initio predictions, supported by protein and transcript evidence.
> In both runs, I get results which I find confusing regarding alternative splice variants prediction, but the behavior is different in each run.
>
> In the liftover run, I use est2genome=1, alt_splice=1 and no ab-initio prediction.
> The resulting gff indicates many overlapping genes, coming from ESTs (transcripts actually) of different splice products of the same gene. Of course MAKER has no way to know that, but I was expecting that since the genes are highly overlapping, they will be grouped together as different mRNA features under the same gene.
> In the second run, I use est2genome=0, alt_splice=1 and Augustus for gene prediction. Results of the liftover run are provided to the pred_gff parameter. In this case, it seems that overlapping genes are squished together, so I only get one gene with one mRNA.
> Please find attached maker_opts.ctl files for both runs, and GFF files demonstrating the issue (one gene example).
>
> Could anyone please explain how this works? Why is the behavior different between the runs? Any way to get MAKER to behave the way I expected?
>
> Thanks a lot!
> Lior

From carsonhh at gmail.com Wed Feb 26 12:27:43 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:27:43 -0700
Subject: [maker-devel] Multiple UTR ?
In-Reply-To:
References:
Message-ID:

Sorry for the very slow reply. I found this way, way down in my inbox.

The UTR features are the parts of the exons that are not CDS. So multiple UTR features mean the UTR spans multiple exons, and they must be assembled to generate the full UTR in a browser. Any exon that is fully non-coding will produce a UTR feature that mirrors the exon's coordinates, and if it's partially coding, the UTR will share the same start or end but will terminate somewhere in the middle, with a CDS filling up the remaining coordinates. The UTR and CDS features get tiled over the top of the exon features when assembling a gene model.

--Carson

> On Dec 18, 2019, at 7:19 AM, Patrick Tran Van wrote:
>
> Hi Carson,
>
> I have seen something strange in my annotation: multiple UTR. How can we explain this? Thanks!
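
To make the explanation above concrete, here is a small sketch (not MAKER's own code) that derives UTR pieces as "exon minus CDS" for the transcript in the quoted GFF3 that follows. The function name and data layout are assumptions made for the illustration, and it assumes a coding transcript with at least one CDS segment; the coordinates are copied from the GENE_02395-RA exon and CDS rows.

```python
def utr_segments(exons, cds, strand):
    """Return the exon pieces that lie outside the CDS span, split by UTR type.

    exons, cds: lists of (start, end) pairs, 1-based inclusive as in GFF3.
    strand: "+" or "-".
    """
    cds_low = min(start for start, _ in cds)
    cds_high = max(end for _, end in cds)
    below, above = [], []
    for start, end in sorted(exons):
        if start < cds_low:   # exon (or part of it) lies before the CDS span
            below.append((start, min(end, cds_low - 1)))
        if end > cds_high:    # exon (or part of it) lies after the CDS span
            above.append((max(start, cds_high + 1), end))
    if strand == "+":
        return {"five_prime_UTR": below, "three_prime_UTR": above}
    # On the minus strand the transcript starts at the high coordinates.
    return {"five_prime_UTR": above, "three_prime_UTR": below}


# Exon and CDS coordinates of GENE_02395-RA, copied from the quoted GFF3.
exons = [
    (12128112, 12128433), (12117462, 12118046), (12118141, 12118301),
    (12118386, 12118539), (12118818, 12122493), (12123591, 12123893),
    (12123995, 12124303), (12125119, 12125418), (12126005, 12126313),
    (12127460, 12127687),
]
cds = [
    (12118818, 12118881), (12118386, 12118539),
    (12118141, 12118301), (12117709, 12118046),
]

utr = utr_segments(exons, cds, "-")
for kind, pieces in utr.items():
    print(kind, pieces)
```

For this transcript the sketch yields seven five_prime_UTR pieces (six whole non-coding exons plus the non-coding part of the exon at 12118818-12122493) and one three_prime_UTR piece, which matches the UTR rows in the record.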
>
> Scaffold maker mRNA 12117462 12128433 . - . ID=GENE_02395-RA;Parent=GENE_02395;Name=GENE_02395-RA;Alias=maker-Scaffold-augustus-gene-40.12-mRNA-3;_AED=0.02;_QI=5383|1|1|1|0.88|0.9|10|247|238;_eAED=0.02;Note=Protein of unknown function;
> Scaffold maker exon 12128112 12128433 . - . ID=GENE_02395-RA:exon:571;Parent=GENE_02395-RA;
> Scaffold maker exon 12117462 12118046 . - . ID=GENE_02395-RB:exon:569;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12118141 12118301 . - . ID=GENE_02395-RB:exon:568;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12118386 12118539 . - . ID=GENE_02395-RB:exon:567;Parent=GENE_02395-RB,GENE_02395-RA;
> Scaffold maker exon 12118818 12122493 . - . ID=GENE_02395-RB:exon:566;Parent=GENE_02395-RB,GENE_02395-RA;
> Scaffold maker exon 12123591 12123893 . - . ID=GENE_02395-RB:exon:565;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12123995 12124303 . - . ID=GENE_02395-RB:exon:564;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12125119 12125418 . - . ID=GENE_02395-RB:exon:563;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12126005 12126313 . - . ID=GENE_02395-RB:exon:562;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker exon 12127460 12127687 . - . ID=GENE_02395-RB:exon:561;Parent=GENE_02395-RB,GENE_02395-RC,GENE_02395-RA;
> Scaffold maker five_prime_UTR 12128112 12128433 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12127460 12127687 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12126005 12126313 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12125119 12125418 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12123995 12124303 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12123591 12123893 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker five_prime_UTR 12118882 12122493 . - . ID=GENE_02395-RA:five_prime_utr;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118818 12118881 . - 0 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118386 12118539 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12118141 12118301 . - 1 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker CDS 12117709 12118046 . - 2 ID=GENE_02395-RA:cds;Parent=GENE_02395-RA;
> Scaffold maker three_prime_UTR 12117462 12117708 . - . ID=GENE_02395-RA:three_prime_utr;Parent=GENE_02395-RA;
>
> Patrick Tran Van
>
> Bioinformatician: Lab Chapuisat & Schwander
> Department of Ecology and Evolution
> University of Lausanne
> Lausanne - Switzerland
> Office 3206
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Feb 26 12:54:32 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 12:54:32 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
Message-ID: <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>

Try adding these few options right after "mpiexec"
in your batch script (this will fix InfiniBand-related segfaults as well as some fork-related segfaults):

    --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0

Also remove the -q in the maker command to get full command lines for subprocesses in the STDERR (this allows you to run some commands outside of MAKER to test the source of failures if, for example, BLAST or Exonerate is causing the segfault).

Example:

    mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu -fix_nucleotides

One alternate possibility is that OpenMPI is the problem. I've seen a few systems where it has an issue with Perl itself, and the only way to get around it is to install your own version of Perl without Perl threads enabled and install MAKER with that version of Perl (then OpenMPI seems to be OK again). If that's the case, it is often easier to switch to MPICH2 or Intel MPI as the MPI launcher, if they are available, and then reinstall MAKER with that MPI flavor.

--Carson

> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote:
>
> Thanks very much for the reply Carson,
> I've attached a few files from the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there are a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result.
> I certainly have access to more memory if needed. I'm using a pretty typical (I think?) cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university.
I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well as a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory.
> Appreciate your insights; hope the weather in UT is filled with sun or snow or both.
> Devon
>
> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote:
> If running under MPI, the reason for a failure may be further back in the STDERR (failures tend to snowball into other failures, so the initial cause is often way back). If you can capture the STDERR and send it, that would be the most informative. If it's memory, you can also set all the blast_depth parameters in maker_bopts.ctl to a value like 20.
>
> --Carson
>
>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote:
>>
>> Hello,
>>
>> I apologize for not posting directly to the archived forum but it appears that the option to enter new posts is disabled. Perhaps this is by design so emails go directly to this address. I hope this is what you are looking for.
>>
>> Thank you for your continued support of Maker and your responses to the forum posts. I have been running Maker (V3.01.02-beta) to annotate a mammalian genome that consists of 22 chromosome-length scaffolds (between ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. In my various tests in running Maker, the vast majority of the smaller fragments are annotated successfully, but nearly all the large scaffolds fail with the same error code when I look at the 'run.log.child.0' file:
>> ```
>> DIED RANK 0:6:0:0
>> DIED COUNT 2
>> ```
>> (the master 'run.log' file just shows "DIED COUNT 2")
>>
>> I struggled to find this exact error code anywhere on the forum and was hoping you might be able to help me determine where I should start troubleshooting.
I thought perhaps it was an error concerning memory requirements, so I altered the chunk size from the default to a few larger sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the same outcome). I've tried running the program with parallel support using either openMPI or mpich. I've tried running on a single node using 24 cpus and 120g of RAM. It always stalls at the same step. >> >> Interestingly, one of the 22 large scaffolds always finishes and produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff files, but the other 21 of 22 large scaffolds fail. This makes me think perhaps it's not a memory issue? >> >> In the case of both the completed and failed scaffolds, the "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, .specific.ori.out, .specific.cat.gz, .specific.out, te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest *fasta.tblastx, and protein *fasta.blastx files are all present (and appear finished from what I can tell). >> However, the particular contents in the parent directory to the "theVoid.scaffold" folder differ. 
For the failed scaffolds, the contents generally always look something like this (that is, they stall with the same kind of files produced): >> ``` >> 0 >> evidence_0.gff >> query.fasta >> query.masked.fasta >> query.masked.fasta.index >> query.masked.gff >> run.log.child.0 >> scaffold22.0.final.section >> scaffold22.0.pred.raw.section >> scaffold22.0.raw.section >> scaffold22.gff.ann >> scaffold22.gff.def >> scaffold22.gff.seq >> ``` >> >> For the completed scaffold, there are many more files created: >> ``` >> 0 >> 10 >> 100 >> 20 >> 30 >> 40 >> 50 >> 60 >> 70 >> 80 >> 90 >> evidence_0.gff >> evidence_10.gff >> evidence_1.gff >> evidence_2.gff >> evidence_3.gff >> evidence_4.gff >> evidence_5.gff >> evidence_6.gff >> evidence_7.gff >> evidence_8.gff >> evidence_9.gff >> query.fasta >> query.masked.fasta >> query.masked.fasta.index >> query.masked.gff >> run.log.child.0 >> run.log.child.1 >> run.log.child.10 >> run.log.child.2 >> run.log.child.3 >> run.log.child.4 >> run.log.child.5 >> run.log.child.6 >> run.log.child.7 >> run.log.child.8 >> run.log.child.9 >> scaffold4.0-1.raw.section >> scaffold4.0.final.section >> scaffold4.0.pred.raw.section >> scaffold4.0.raw.section >> scaffold4.10.final.section >> scaffold4.10.pred.raw.section >> scaffold4.10.raw.section >> scaffold4.1-2.raw.section >> scaffold4.1.final.section >> scaffold4.1.pred.raw.section >> scaffold4.1.raw.section >> scaffold4.2-3.raw.section >> scaffold4.2.final.section >> scaffold4.2.pred.raw.section >> scaffold4.2.raw.section >> scaffold4.3-4.raw.section >> scaffold4.3.final.section >> scaffold4.3.pred.raw.section >> scaffold4.3.raw.section >> scaffold4.4-5.raw.section >> scaffold4.4.final.section >> scaffold4.4.pred.raw.section >> scaffold4.4.raw.section >> scaffold4.5-6.raw.section >> scaffold4.5.final.section >> scaffold4.5.pred.raw.section >> scaffold4.5.raw.section >> scaffold4.6-7.raw.section >> scaffold4.6.final.section >> scaffold4.6.pred.raw.section >> scaffold4.6.raw.section >> 
scaffold4.7-8.raw.section
>> scaffold4.7.final.section
>> scaffold4.7.pred.raw.section
>> scaffold4.7.raw.section
>> scaffold4.8-9.raw.section
>> scaffold4.8.final.section
>> scaffold4.8.pred.raw.section
>> scaffold4.8.raw.section
>> scaffold4.9-10.raw.section
>> scaffold4.9.final.section
>> scaffold4.9.pred.raw.section
>> scaffold4.9.raw.section
>> ```
>>
>> Thanks for any troubleshooting tips you can offer.
>>
>> Cheers,
>> Devon
>>
>> --
>> Devon O'Rourke
>> Postdoctoral researcher, Northern Arizona University
>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
>> twitter: @thesciencedork
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at yandell-lab.org
>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From devon.orourke at gmail.com Wed Feb 26 12:36:25 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Wed, 26 Feb 2020 14:36:25 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com>
Message-ID:

Thanks very much for the reply Carson,
I've attached a few files from the most recently failed run: the shell script submitted to Slurm, the _opts.ctl file, and the pair of log files generated from the job. The reason there are a 1a and 1b pair of files is that I had initially set the number of cpus in the _opts.ctl file to "60", but then tried re-running it after setting it to "28". Both seem to have the same result.
I certainly have access to more memory if needed. I'm using a pretty typical (I think?) cluster that controls jobs with Slurm using a Lustre file system - it's the main high performance computing center at our university. I have access to plenty of nodes that contain about 120-150g of RAM each with between 24-28 cpus each, as well as a handful of higher memory nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over 32 cpus; if that fails, I could certainly run again with even more memory.
Appreciate your insights; hope the weather in UT is filled with sun or snow or both.
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fail-1a.log.gz
Type: application/x-gzip
Size: 21751 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fail-1b.log.gz
Type: application/x-gzip
Size: 2175 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run1_maker_opts.ctl
Type: application/octet-stream
Size: 3720 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run1_slurm.sh
Type: application/x-sh
Size: 788 bytes
Desc: not available
URL:

From devon.orourke at gmail.com Wed Feb 26 13:15:08 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Wed, 26 Feb 2020 15:15:08 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
Message-ID:

Much appreciated Carson,
I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one).
Thanks,
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Feb 26 13:18:34 2020
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 26 Feb 2020 13:18:34 -0700
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com>
Message-ID: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>

For Intel MPI, export an environment variable right before running MAKER:

    export I_MPI_FABRICS=shm:tcp

Intel MPI has an InfiniBand segfault issue similar to OpenMPI's when running Perl scripts, but with a different workaround.

--Carson

> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post the outcome. We definitely have two of three MPI options you've described on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to advise my cluster admins to use whichever software you prefer (should there be one).
> Thanks,
> Devon
>
> --
> Devon O'Rourke
> Postdoctoral researcher, Northern Arizona University
> Lab of Jeffrey T.
Foster - https://fozlab.weebly.com/
> twitter: @thesciencedork

From devon.orourke at gmail.com Fri Feb 28 05:50:27 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Fri, 28 Feb 2020 07:50:27 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

Hi Carson,
I tried sending this email yesterday but received a notification that the text body was too large. I thought perhaps it was related to the attached log file I sent in the earlier message. You can see the same file here: https://osf.io/cuxg8/download.
Thanks!

(previous message below)

....

Two steps forward, one step back, I suppose?
After incorporating the additional MPI-related parameters, the job moved further ahead than previous iterations, but it still failed before completing. It appears that all but the six longest scaffolds were annotated (except for a few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop).
I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem. Very much appreciate your help.
Devon

On Wed, Feb 26, 2020 at 3:18 PM Carson Holt wrote:

> For Intel MPI, export an environment variable right before running MAKER
> --> "export I_MPI_FABRICS=shm:tcp"
>
> Intel MPI has a similar InfiniBand segfault issue to OpenMPI when running
> Perl scripts, but a different workaround.
>
> --Carson
>
> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post
> the outcome. We definitely have two of the three MPI options you've described
> on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to
> advise my cluster admins to use whichever software you prefer (should there
> be one).
> Thanks,
> Devon
>
> On Wed, Feb 26, 2020 at 2:54 PM Carson Holt wrote:
>
>> Try adding these options right after "mpiexec" in your batch script
>> (this will fix InfiniBand-related segfaults as well as some fork-related
>> segfaults) --> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca
>> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca
>> mpi_warn_on_fork 0
>>
>> Also remove the -q in the maker command to get full command lines for
>> subprocesses in the STDERR (this allows you to run some commands outside of
>> MAKER to test the source of failures if, for example, BLAST or Exonerate is
>> causing the segfault).
>>
>> Example -->
>> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca
>> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca
>> mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu
>> -fix_nucleotides
>>
>> One alternate possibility is that OpenMPI is the problem. I've seen a few
>> systems where it has an issue with Perl itself, and the only way to get
>> around it is to install your own version of Perl without Perl threads
>> enabled and install MAKER with that version of Perl (then OpenMPI seems to
>> be OK again). If that's the case, it is often easier to switch to MPICH2 or
>> Intel MPI as the MPI launcher, if they are available, and then reinstall
>> MAKER with that MPI flavor.
>>
>> --Carson

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork
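The OpenMPI launch line quoted in the message above can be kept readable in a batch script by assembling it from variables. The following is only a minimal sketch under stated assumptions: the variable names (BTL_LIST, MCA_OPTS, NPROC, MAKER_BIN, BASE, RUN) and the log file name maker.stderr.log are hypothetical, while the option values, maker path, and -base name are copied from the example in the thread. By default the script only prints the command, so it can be inspected before a real submission.

```shell
#!/bin/sh
# Sketch: build the mpiexec command suggested in the thread.
# All variable names here are illustrative, not part of MAKER or OpenMPI.

NPROC=28                                          # MPI ranks, as in the quoted example
MAKER_BIN=/packages/maker/3.01.02-beta/bin/maker  # path from the quoted example
BASE=lu                                           # -base value from the quoted example
BTL_LIST="vader,tcp,self"                         # transports from the thread; may need
                                                  # adjusting for a given OpenMPI install

# MCA options from the thread (backslash-newline joins these into one line).
MCA_OPTS="--mca btl $BTL_LIST \
--mca btl_tcp_if_include ib0 \
--mca orte_base_help_aggregate 0 \
--mca btl_openib_want_fork_support 1 \
--mca mpi_warn_on_fork 0"

# No -q flag, so full subprocess command lines appear in STDERR.
CMD="mpiexec $MCA_OPTS -n $NPROC $MAKER_BIN -base $BASE -fix_nucleotides"

# For Intel MPI the thread suggests an environment variable instead:
# export I_MPI_FABRICS=shm:tcp

echo "$CMD"

# Set RUN=1 to launch for real; STDERR is captured so the first failure
# can be found further back in the log.
if [ "${RUN:-0}" = "1" ]; then
    $CMD 2> maker.stderr.log
fi
```

Keeping the transport list in its own variable means a site that has to change only that one argument does not need to touch the rest of the line.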
From devon.orourke at gmail.com Sat Feb 29 10:27:16 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Sat, 29 Feb 2020 12:27:16 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To:
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

Hi once again Carson,
Our administrators tried installing Maker with a different version of OpenMPI, and the change allowed the job to complete normally. The change was from a newer version (3.1.3) to an older version (1.6.5) of OpenMPI. After that downgrade I needed to make one tweak to the MPI arguments you provided, as v1.6.5 didn't use vader yet. Other than that, the arguments allowed the job to run to completion.
Thanks for your assistance,
Devon

--
Devon O'Rourke
Postdoctoral researcher, Northern Arizona University
Lab of Jeffrey T. Foster - https://fozlab.weebly.com/
twitter: @thesciencedork

From devon.orourke at gmail.com Thu Feb 27 06:26:20 2020
From: devon.orourke at gmail.com (Devon O'Rourke)
Date: Thu, 27 Feb 2020 08:26:20 -0500
Subject: [maker-devel] short scaffolds finish, long scaffolds (almost always) fail
In-Reply-To: <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
References: <55669676-819C-42D0-B5C2-82F2098BF946@gmail.com> <2A887181-113E-46D2-8113-FDF24CC64A2A@gmail.com> <34FA51F8-004F-4EFE-B4D5-AB86116FCAC3@gmail.com>
Message-ID:

Hi Carson,
Two steps forward, one step back, I suppose?
After incorporating the additional MPI-related parameters, the job moved further ahead than previous iterations, but it still failed before completing. It appears that all but the six longest scaffolds were annotated (except for a few short scaffolds which simply weren't finished by the time the error triggered the entire run to stop).
I've attached the .log file in hopes that you might find any additional nuggets to help diagnose the problem. Very much appreciate your help.
Devon

On Wed, Feb 26, 2020 at 3:18 PM Carson Holt wrote:

> For Intel MPI, export an environment variable right before running MAKER
> --> "export I_MPI_FABRICS=shm:tcp"
>
> Intel MPI has a similar InfiniBand segfault issue to OpenMPI when running
> Perl scripts, but a different workaround.
>
> --Carson
>
> On Feb 26, 2020, at 1:15 PM, Devon O'Rourke wrote:
>
> Much appreciated Carson,
> I've submitted a job using the parameters you've suggested and will post the outcome.
> We definitely have two of the three MPI options you've described
> on our cluster (OpenMPI and MPICH2); I'll check on Intel MPI. Happy to
> advise my cluster admins to use whichever software you prefer (should there
> be one).
> Thanks,
> Devon
>
> On Wed, Feb 26, 2020 at 2:54 PM Carson Holt wrote:
>
>> Try adding these options right after "mpiexec" in your batch script
>> (this will fix InfiniBand-related segfaults as well as some fork-related
>> segfaults) --> --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca
>> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca
>> mpi_warn_on_fork 0
>>
>> Also remove the -q in the maker command to get full command lines for
>> subprocesses in the STDERR (this allows you to run some commands outside of
>> MAKER to test the source of failures if, for example, BLAST or Exonerate is
>> causing the segfault).
>>
>> Example -->
>> mpiexec --mca btl vader,tcp,self --mca btl_tcp_if_include ib0 --mca
>> orte_base_help_aggregate 0 --mca btl_openib_want_fork_support 1 --mca
>> mpi_warn_on_fork 0 -n 28 /packages/maker/3.01.02-beta/bin/maker -base lu
>> -fix_nucleotides
>>
>> One alternate possibility is that OpenMPI is the problem. I've seen a few
>> systems where it has an issue with Perl itself, and the only way to get
>> around it is to install your own version of Perl without Perl threads
>> enabled and install MAKER with that version of Perl (then OpenMPI seems to
>> be OK again). If that's the case, it is often easier to switch to MPICH2 or
>> Intel MPI as the MPI launcher, if they are available, and then reinstall
>> MAKER with that MPI flavor.
>>
>> --Carson
>>
>> On Feb 26, 2020, at 12:36 PM, Devon O'Rourke wrote:
>>
>> Thanks very much for the reply Carson,
>> I've attached a few files from the most recently failed run: the shell
>> script submitted to Slurm, the _opts.ctl file, and the pair of log files
>> generated from the job.
>> The reason there is a 1a and 1b pair of files is
>> that I had initially set the number of cpus in the _opts.ctl file to "60",
>> but then tried re-running it after setting it to "28". Both seem to have
>> the same result.
>> I certainly have access to more memory if needed. I'm using a pretty
>> typical (I think?) cluster that controls jobs with Slurm using a Lustre
>> file system - it's the main high performance computing center at our
>> university. I have access to plenty of nodes that contain about 120-150g of
>> RAM each with between 24-28 cpus each, as well as a handful of higher memory
>> nodes with about 1.5tb of RAM. As I'm writing this email, I've submitted a
>> similar Maker job (i.e. same fasta/gff inputs) requesting 200g of RAM over
>> 32 cpus; if that fails, I could certainly run again with even more memory.
>> Appreciate your insights; hope the weather in UT is filled with sun or
>> snow or both.
>> Devon
>>
>> On Wed, Feb 26, 2020 at 2:10 PM Carson Holt wrote:
>>
>>> If running under MPI, the reason for a failure may be further back in
>>> the STDERR (failures tend to snowball into other failures, so the initial
>>> cause is often way back). If you can capture the STDERR and send it, that
>>> would be the most informative. If it's memory, you can also set all the
>>> blast_depth parameters in maker_bopts.ctl to a value like 20.
>>>
>>> --Carson
>>>
>>> On Feb 19, 2020, at 1:54 PM, Devon O'Rourke wrote:
>>>
>>> Hello,
>>>
>>> I apologize for not posting directly to the archived forum but it
>>> appears that the option to enter new posts is disabled. Perhaps this is by
>>> design so emails go directly to this address. I hope this is what you are
>>> looking for.
>>>
>>> Thank you for your continued support of Maker and your responses to the
>>> forum posts.
I have been running Maker (V3.01.02-beta) to annotate a >>> mammalian genome that consists of 22 chromosome-length scaffolds (between >>> ~200-20Mb) and about 10,000 smaller fragments from 1Mb to 10kb in length. >>> In my various tests in running Maker, the vast majority of the smaller >>> fragments are annotated successfully, but nearly all the large scaffolds >>> fail with the same error code when I look at the 'run.log.child.0' file: >>> ``` >>> DIED RANK 0:6:0:0 >>> DIED COUNT 2 >>> ``` >>> (the master 'run.log' file just shows "DIED COUNT 2") >>> >>> I struggled to find this exact error code anywhere on the forum and was >>> hoping you might be able to help me determine where I should start >>> troubleshooting. I thought perhaps it was an error concerning memory >>> requirements, so I altered the chunk size from the default to a few larger >>> sequence lengths (I've tried 1e6, 1e7, and 999,999,999 - all produce the >>> same outcome). I've tried running the program with parallel support using >>> either openMPI or mpich. I've tried running on a single node using 24 cpus >>> and 120g of RAM. It always stalls at the same step. >>> >>> Interestingly, one of the 22 large scaffolds always finishes and >>> produces the .maker.proteins.fasta, .maker.transcripts.fasta, and .gff >>> files, but the other 21 of 22 large scaffolds fail. This makes me think >>> perhaps it's not a memory issue? >>> >>> In the case of both the completed and failed scaffolds, the >>> "theVoid.scaffoldX" subdirectory(ies) containing the .rb.cat.gz, .rb.out, >>> .specific.ori.out, .specific.cat.gz, .specific.out, >>> te_proteins*fasta.repeat runner, the est *fasta.blastn, the altest >>> *fasta.tblastx, and protein *fasta.blastx files are all present (and appear >>> finished from what I can tell). >>> However, the particular contents in the parent directory to the >>> "theVoid.scaffold" folder differ. 
For the failed scaffolds, the contents >>> generally always look something like this (that is, they stall with the >>> same kind of files produced): >>> ``` >>> 0 >>> evidence_0.gff >>> query.fasta >>> query.masked.fasta >>> query.masked.fasta.index >>> query.masked.gff >>> run.log.child.0 >>> scaffold22.0.final.section >>> scaffold22.0.pred.raw.section >>> scaffold22.0.raw.section >>> scaffold22.gff.ann >>> scaffold22.gff.def >>> scaffold22.gff.seq >>> ``` >>> >>> For the completed scaffold, there are many more files created: >>> ``` >>> 0 >>> 10 >>> 100 >>> 20 >>> 30 >>> 40 >>> 50 >>> 60 >>> 70 >>> 80 >>> 90 >>> evidence_0.gff >>> evidence_10.gff >>> evidence_1.gff >>> evidence_2.gff >>> evidence_3.gff >>> evidence_4.gff >>> evidence_5.gff >>> evidence_6.gff >>> evidence_7.gff >>> evidence_8.gff >>> evidence_9.gff >>> query.fasta >>> query.masked.fasta >>> query.masked.fasta.index >>> query.masked.gff >>> run.log.child.0 >>> run.log.child.1 >>> run.log.child.10 >>> run.log.child.2 >>> run.log.child.3 >>> run.log.child.4 >>> run.log.child.5 >>> run.log.child.6 >>> run.log.child.7 >>> run.log.child.8 >>> run.log.child.9 >>> scaffold4.0-1.raw.section >>> scaffold4.0.final.section >>> scaffold4.0.pred.raw.section >>> scaffold4.0.raw.section >>> scaffold4.10.final.section >>> scaffold4.10.pred.raw.section >>> scaffold4.10.raw.section >>> scaffold4.1-2.raw.section >>> scaffold4.1.final.section >>> scaffold4.1.pred.raw.section >>> scaffold4.1.raw.section >>> scaffold4.2-3.raw.section >>> scaffold4.2.final.section >>> scaffold4.2.pred.raw.section >>> scaffold4.2.raw.section >>> scaffold4.3-4.raw.section >>> scaffold4.3.final.section >>> scaffold4.3.pred.raw.section >>> scaffold4.3.raw.section >>> scaffold4.4-5.raw.section >>> scaffold4.4.final.section >>> scaffold4.4.pred.raw.section >>> scaffold4.4.raw.section >>> scaffold4.5-6.raw.section >>> scaffold4.5.final.section >>> scaffold4.5.pred.raw.section >>> scaffold4.5.raw.section >>> scaffold4.6-7.raw.section >>> 
scaffold4.6.final.section >>> scaffold4.6.pred.raw.section >>> scaffold4.6.raw.section >>> scaffold4.7-8.raw.section >>> scaffold4.7.final.section >>> scaffold4.7.pred.raw.section >>> scaffold4.7.raw.section >>> scaffold4.8-9.raw.section >>> scaffold4.8.final.section >>> scaffold4.8.pred.raw.section >>> scaffold4.8.raw.section >>> scaffold4.9-10.raw.section >>> scaffold4.9.final.section >>> scaffold4.9.pred.raw.section >>> scaffold4.9.raw.section >>> ``` >>> >>> Thanks for any troubleshooting tips you can offer. >>> >>> Cheers, >>> Devon >>> >>> -- >>> Devon O'Rourke >>> Postdoctoral researcher, Northern Arizona University >>> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >>> twitter: @thesciencedork >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at yandell-lab.org >>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >>> >> >> -- >> Devon O'Rourke >> Postdoctoral researcher, Northern Arizona University >> Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ >> twitter: @thesciencedork >> >> >> >> > > -- > Devon O'Rourke > Postdoctoral researcher, Northern Arizona University > Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ > twitter: @thesciencedork > > > -- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: LUmaker.log.gz
Type: application/x-gzip
Size: 4808331 bytes
Desc: not available
URL:

From gongyuan.cao at duke.edu Sat Feb 29 10:44:24 2020
From: gongyuan.cao at duke.edu (Gongyuan Cao)
Date: Sat, 29 Feb 2020 17:44:24 +0000
Subject: [maker-devel] maker_functional_gff error
Message-ID:

Hi,

I'm running maker_functional_gff and got this error:

Can't use string ("") as a HASH ref while "strict refs" in use at /root/maker/bin/maker_functional_gff line 55, <$IN> line 3.

I've checked the gff file and there are no missing "ID=" tags. What could be the problem?

head of blastp output:

lacu_11543-RA A4GSN8 49.643 2099 951 36 1 2026 1 2066 0.0 1724
lacu_11544-RA F4IF36 75.473 1268 273 6 33 1263 29 1295 0.0 1949
lacu_11548-RA O81123 51.316 380 144 10 24 401 15 355 2.29e-119 353
lacu_11549-RA Q9SA32 60.767 339 130 3 328 664 58 395 1.54e-141 421
lacu_11547-RA Q9SLK2 72.493 349 96 0 1 349 1 349 0.0 518
lacu_11558-RA Q9LTV6 76.689 296 69 0 5 300 3 298 2.21e-158 446
lacu_11557-RA Q9C9U5 40.441 272 145 6 866 1134 746 1003 7.55e-50 196
lacu_11552-RA Q96GG9 44.715 246 128 3 58 296 2 246 2.30e-73 229
lacu_11560-RA Q42961 89.375 480 47 2 2 480 4 480 0.0 855
lacu_11561-RA Q42962 91.022 401 36 0 1 401 1 401 0.0 731

head of gff:

##gff-version 3
Linkage_group_5 . contig 1 30484050 . . . ID=Linkage_group_5;Name=Linkage_group_5
Linkage_group_5 maker gene 10601 29761 . + . ID=lacu_11543;Name=lacu_11543;Alias=maker-Linkage_group_5-pred_gff_est2genome-gene-0.188;score=1168;
Linkage_group_5 maker mRNA 10601 29761 6483 + . ID=lacu_11543-RA;Parent=lacu_11543;Name=lacu_11543-RA;Alias=maker-Linkage_group_5-pred_gff_est2genome-gene-0.188-mRNA-1;_AED=0.00;_QI=105|1|1|1|1|1|48|246|2043;_eAED=0.00;score=1168;
Linkage_group_5 maker exon 10601 11011 . + . ID=lacu_11543-RA:exon:0;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11129 11275 . + . ID=lacu_11543-RA:exon:1;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11403 11501 . + . ID=lacu_11543-RA:exon:2;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 11835 11963 . + . ID=lacu_11543-RA:exon:3;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 12054 12146 . + . ID=lacu_11543-RA:exon:4;Parent=lacu_11543-RA;
Linkage_group_5 maker exon 12240 12305 . + . ID=lacu_11543-RA:exon:5;Parent=lacu_11543-RA;
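
The thread does not record what caused this error. As a first sanity check on the two inputs pasted above, the following is a minimal sketch that cross-checks the query IDs in the tabular BLAST output against the mRNA IDs in the GFF. It assumes the BLAST file is the standard 12-column tab-delimited format and that the query IDs are meant to match the `ID=` attribute of `mRNA` features, as the pasted heads suggest. The function names are hypothetical and are not part of MAKER, and the sketch does not reproduce what maker_functional_gff does internally.

```python
def blast_query_ids(blast_text):
    """Collect query IDs (column 1) from 12-column, tab-delimited BLAST output."""
    ids = set()
    for line_no, line in enumerate(blast_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(
                f"BLAST line {line_no}: expected 12 columns, found {len(fields)}"
            )
        ids.add(fields[0])
    return ids


def gff_mrna_ids(gff_text):
    """Collect the ID attribute of every mRNA feature in GFF3 text."""
    ids = set()
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only well-formed mRNA rows are of interest here
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "ID" in attrs:
            ids.add(attrs["ID"])
    return ids


def unmatched_queries(blast_text, gff_text):
    """Return BLAST query IDs that have no matching mRNA ID in the GFF."""
    return sorted(blast_query_ids(blast_text) - gff_mrna_ids(gff_text))
```

To use it, read each file into a string (for example with `open(path).read()`) and pass both to `unmatched_queries`. An empty list means every BLAST query has a matching mRNA record. A `ValueError` points at a BLAST line that is not 12 tab-separated columns, which can happen if the file was pasted or re-saved with spaces instead of tabs.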