From carsonhh at gmail.com Mon Aug 4 15:27:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 04 Aug 2014 14:27:08 -0600
Subject: [maker-devel] Forks.pm error when running maker with dsindex
In-Reply-To:
References:
Message-ID:

Sorry for the slow reply. I was on vacation all last week.

Do you have the full STDERR? Sometimes the last error is irrelevant and is just the result of a failure further upstream. Also, are you running 20 independent maker jobs simultaneously?

--Carson

From: Jan Philip Oeyen
Date: Monday, July 28, 2014 at 6:22 AM
To:
Subject: [maker-devel] Forks.pm error when running maker with dsindex

Hi all,

we are currently having some unexpected errors when running maker on a genome which is split in several parts. Our cluster admin reported the following error message:

Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodules/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm line 2188.
SIGTERM received
SIGTERM received
SIGTERM received

We were using maker with the '-g' option on a single genome which is split into 20 parts, where 19 parts are equally large and the last contains about 20 sequences more. After that we ran Maker using dsindex to clean up the output. We are currently using maker v2.31 on 4 threads and forks v0.34.

If any further info is needed to clarify the problem, please let me know and I will provide as much as possible.

Thank you for your help!

Best regards,
Jan Philip Oeyen
ZFMK // ZMB // University of Bonn

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
From kevintsai at iis.sinica.edu.tw Tue Aug 5 05:59:45 2014
From: kevintsai at iis.sinica.edu.tw (Kevin Tsai)
Date: Tue, 5 Aug 2014 18:59:45 +0800
Subject: [maker-devel] Early obstacle with SplitDB
Message-ID:

Hello,

I'm a new user to Maker so I suspect this will be a simple question, but I am having trouble finding documentation on SplitDB. Our IT admin set up the application and I'm running into the following issue about 30 seconds after kickoff. Below is the debugged output:

STATUS: Parsing control files...
Calling GI::load_control_files at /usr/bin/maker line 452.
Calling GI::new_instance_temp at /usr/bin/maker line 463.
Calling GI::mount_check at /usr/bin/maker line 465.
Calling GI::set_global_temp at /usr/bin/maker line 483.
STATUS: Processing and indexing input FASTA files...
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling List::Util::shuffle at /usr/bin/maker line 529.
Calling GI::split_db at /usr/bin/maker line 536.
Calling File::Path::rmtree at /usr/bin/maker line 537.
Calling Iterator::Any::new at /usr/bin/maker line 537.
Calling Iterator::Any::nextDef at /usr/bin/maker line 537.
Calling Iterator::Any::new at /usr/bin/maker line 537.
Calling mkdir at /usr/bin/maker line 537.
Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537.
Calling system at /usr/bin/maker line 537.
ERROR: SplitDB not created correctly at /usr/local/share/perl5/GI.pm line 1144.
GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at /usr/bin/maker line 537
--> rank=NA, hostname=Za2.cglab

Any suggestions? Thank you in advance!

--
*Kevin Tsai*
www.linkedin.com/in/kevinjtsai/
Ph.D.
Candidate, Bioinformatics
Institute of Information Science, Academia Sinica

From carsonhh at gmail.com Tue Aug 5 15:21:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 05 Aug 2014 14:21:51 -0600
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID:

Were you using the GFF3 pass-through or correct_est_fusion options? When you rerun, do the same features still have lengths of zero (i.e., is it random or is it reproducible)?

--Carson

From: Marc Höppner
Date: Wednesday, July 30, 2014 at 4:44 AM
To:
Subject: [maker-devel] Maker GFF output with features of 0 length

Hi,

I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example:

scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1

This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with syncing data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head?

Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...

I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.
Regards,
Marc

From carsonhh at gmail.com Tue Aug 5 15:26:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 05 Aug 2014 14:26:51 -0600
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To:
References:
Message-ID:

Either you specified TMP= in your maker_opts.ctl file to be an NFS-mounted directory (it must be locally mounted), the drive containing the directory specified by TMP= (defaults to /tmp) is full or nearly full, your input file is not in proper FASTA format, or you are using an out-of-date version of BioPerl. Try the first three in the list, then look at BioPerl. The BioPerl version should be printed as part of the debug output.

--Carson

From carsonhh at gmail.com Tue Aug 5 15:49:33 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 05 Aug 2014 14:49:33 -0600
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To:
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID:

One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this, for example. Do you have non-CDS feature types where this happens, or any internal CDSs where this happens?

--Carson
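Carson's question above (are the start == end features only terminal CDS segments, or also internal CDS and non-CDS features?) can be answered mechanically. What follows is a minimal sketch, not part of MAKER: the function names `parse_gff3` and `single_base_features` are hypothetical, and it assumes a tab-separated GFF3 file in which each CDS line carries a `Parent` attribute naming its mRNA. Note that GFF3 coordinates are inclusive, so start == end describes a feature of length 1.

```python
import sys
from collections import defaultdict


def parse_gff3(lines):
    """Yield (seqid, type, start, end, parents) for each 9-column feature line."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parents = attrs["Parent"].split(",") if "Parent" in attrs else []
        yield cols[0], cols[2], int(cols[3]), int(cols[4]), parents


def single_base_features(lines):
    """Return (seqid, type, position, verdict) for every feature with start == end.

    A single-base CDS that is the first or last CDS segment of its parent
    is labelled 'terminal CDS'; everything else is labelled 'check'.
    """
    features = list(parse_gff3(lines))

    # Collect all CDS segments per parent transcript.
    cds_by_parent = defaultdict(list)
    for _seqid, ftype, start, end, parents in features:
        if ftype == "CDS":
            for parent in parents:
                cds_by_parent[parent].append((start, end))

    report = []
    for seqid, ftype, start, end, parents in features:
        if start != end:
            continue
        terminal = False
        if ftype == "CDS":
            for parent in parents:
                segments = sorted(cds_by_parent[parent])
                if (start, end) in (segments[0], segments[-1]):
                    terminal = True
        report.append((seqid, ftype, start, "terminal CDS" if terminal else "check"))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python single_base.py genome.all.gff
    with open(sys.argv[1]) as handle:
        for row in single_base_features(handle):
            print("\t".join(str(x) for x in row))
```

Lines labelled "check" (internal CDS segments, exons, UTRs) are the ones worth reporting to the list; "terminal CDS" lines match the behavior Carson describes as expected.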
From carsonhh at gmail.com Wed Aug 6 02:03:26 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 06 Aug 2014 01:03:26 -0600
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID:

If it is happening only with GFF3 pass-through, then it may be something I saw and fixed a while ago (there have been some GFF3 pass-through fixes since 2.31.4). Could you check and see if it still happens in 2.31.6? Also, if it is only the first or last CDS/exon, then Augustus can do that and it's not actually a bug. Basically it is truncating the model to the start/stop codon, so the first or last exon/CDS may appear short, but it's really just incomplete. If you can find any example of a non-CDS/exon feature, could you send it to me?

Thanks,
Carson

From carson.holt at genetics.utah.edu Wed Aug 6 02:15:04 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 6 Aug 2014 07:15:04 +0000
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To: <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com>
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com>
Message-ID:

Ok. I took a look and I'm relatively sure the issue you are seeing is caused by GFF3 pass-through combined with correct_est_fusion=1. This is something that only happens when both are used simultaneously and should be corrected in the current version of MAKER.

Thanks,
Carson

From: Marc Höppner
Date: Wednesday, August 6, 2014 at 12:14 AM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Maker GFF output with features of 0 length

Hi,

I suspect that Augustus plays a role, since the affected features are seeded by augustus (based on the name anyway). What I found was that this seems to only happen when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). And this seems to be quite reproducible - and doesn't only affect CDS features.

I have put the Maker output for a test scaffold here: https://dl.dropboxusercontent.com/u/1918141/maker_output.tar.bz2

The problematic lines:

scaffold_563 maker five_prime_UTR 38501 38501 . - . ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1
scaffold_563 maker exon 69967 69967 . - . ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1
scaffold_563 maker CDS 69967 69967 . - 1 ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1

Strange stuff?

Regards,
Marc

From j.wilbrandt at zfmk.de Wed Aug 6 07:40:19 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 06 Aug 2014 14:40:19 +0200
Subject: [maker-devel] Further split genome questions
Message-ID:

Hi Carson,

I ran into more conspicuous behavior running maker 2.31 on a genome which is split into 20 parts, using the -g flag and the same basename. Most of the jobs ran simultaneously on the same node; 17 seemed to finish normally, while the remaining three seemed to be stalled and produced 0B of output. Do you have any suggestion why this is happening?

After I stopped these stalled jobs, I checked the index.log and found that of 38.384 mentioned scaffolds, 154 appear only once in the log. The surprise is that 2/3 of these only appear as FINISHED (the rest only started). There are no models for these 'finished' scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., each of the 20 jobs contained scaffolds that 'did not start' but 'finished').
Should this be an issue of concern?
It might be an NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so we suspect something fishy going on...
Hope you can help,
best wishes,
Jeanne Wilbrandt

zmb // ZFMK // University of Bonn

From carsonhh at gmail.com Wed Aug 6 09:16:52 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 6 Aug 2014 08:16:52 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References:
Message-ID: <780B8D9B-94FB-4282-9611-632C7CB532DC@gmail.com>

If you are starting and restarting, or running multiple jobs, then the log can be partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 result file for the contig, then it is FINISHED. FASTA files will only exist for the contigs that have gene models. Small contigs will rarely contain models.

--Carson

Sent from my iPhone

From dence at genetics.utah.edu Wed Aug 6 09:18:28 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 6 Aug 2014 14:18:28 +0000
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References:
Message-ID: <736D63C9-1393-4FFB-8553-262454C44BC1@genetics.utah.edu>

Hi Jeanne, what's the average length of those 154 scaffolds that only appeared once in the log? Is the length pretty consistent among those scaffolds?

~Daniel

From j.wilbrandt at zfmk.de Wed Aug 6 10:01:02 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 06 Aug 2014 17:01:02 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References:
Message-ID:

Aha, so this explains that.
Daniel, the average is 5930.37 bp, but ranging from ~50 to more than 60,000, with roughly half of the sequences being shorter than 3,000 bp.

What do you think about this weird 'I am running but not really doing anything' behavior?

Thanks a lot!
Jeanne

From carsonhh at gmail.com Wed Aug 6 10:12:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 6 Aug 2014 09:12:50 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References:
Message-ID: <5C8B509A-7093-4626-92CE-6D09B570887C@gmail.com>

I think the freezing is because you are starting too many simultaneous jobs. You should try to use MPI to parallelize instead. The concurrent-job way of doing things can start to cause problems if you are running 10 or more jobs in the same directory. You could try splitting them into different directories.

--Carson

Sent from my iPhone

From j.wilbrandt at zfmk.de Wed Aug 6 10:33:07 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 06 Aug 2014 17:33:07 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

We are using MPI as well; each of the 20 parts gets assigned 4 threads. Our admin reports, however, that the processes seem to assemble more threads than they are allowed. It is not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why?

If I start the jobs in the same directory, how can I make sure they write to the same directory (as, I think, is required to put the pieces together in the end)? Does -basename take paths?

From carsonhh at gmail.com Wed Aug 6 10:45:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 6 Aug 2014 09:45:56 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID: <28DF9A41-8E59-4104-87A6-CD7CD9F436D8@gmail.com>

Is your admin counting processes or cpu usage? Each system call creates a separate process, so you can expect multiple processes but only a single cpu of usage per instance.

Use different directories if you are running that many jobs. You can concatenate the separate results when you're done. Use the gff3_merge script to help concatenate the separate GFF3 files generated from separate jobs.

--Carson

Sent from my iPhone

From carson.holt at genetics.utah.edu Wed Aug 6 12:18:22 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 6 Aug 2014 17:18:22 +0000
Subject: [maker-devel] Forks.pm error when running maker with dsindex
In-Reply-To:
References:
Message-ID:

It's better to run fewer jobs with more cpus given to MPI rather than many jobs with few cpus (i.e. mpiexec -n 4). To correct errors, you just restart MAKER. No need to set the -a flag unless you want to rerun everything, and not just the failed contigs.

--Carson

On 8/6/14, 3:03 AM, "Jeanne Wilbrandt" wrote:

>
>Hi!
>
>Yes, we are running 20 jobs simultaneously, almost, i.e., as much as our
>cluster can take. Do you think this is too much?
>
>Please find attached the output file (containing the STDERR) of the
>dsindex-run, and one example output of one of the pieces.
>
>Another quick question to make sure I understood the guides correctly: If
>a job did not finish properly, it should suffice to restart the same thing
>just with the -a flag and it should clean up and finish what it was
>supposed to, right? (i.e., it's not necessary to trace and delete the
>unfinished output manually?)
>
>Thank you again!
>Jeanne Wilbrandt
>
>zmb // ZFMK // University of Bonn
>
>
>
>On 08/05/2014 08:00 PM, maker-devel-request at yandell-lab.org wrote:
>>
>>
>> 1.
Re: Forks.pm error when running maker with dsindex (Carson Holt) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Mon, 04 Aug 2014 14:27:08 -0600 >> From: Carson Holt >> To: Jan Philip Oeyen , >> >> Subject: Re: [maker-devel] Forks.pm error when running maker with >> dsindex >> Message-ID: >> Content-Type: text/plain; charset="utf-8" >> >> Sorry for the slow reply. I was on vacation all last week. Do you >>have the >> full STDERR? sometimes the last error is irrelevant and it's just the >>result >> of a failure further upstream. Also are you running 20 independent maker >> jobs simultaneously? >> >> --Carson >> >> >> From: Jan Philip Oeyen >> Date: Monday, July 28, 2014 at 6:22 AM >> To: >> Subject: [maker-devel] Forks.pm error when running maker with dsindex >> >> Hi all, >> we are currently having some unexpected errors when running maker on a >> genome which is split in several parts. Our cluster admin reported the >> following error message: >> >> Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodu >> les/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm >> line 2188. >> SIGTERM received >> SIGTERM received >> SIGTERM received >> >> We were using maker with the '-g' option on a single genome which is >>split >> into 20 parts, where 19 parts are equally large and the last contains >>about >> 20 sequences more. After that we ran Maker using dsindex to clean up the >> output. We are currently using maker v2.31 on 4 threads and forks v0.34. >> >> If any further info is needed to clarify the problem, please let me >>know and >> I will provide as much as possible. >> >> Thank you for your help! 
>>
>> Best regards,
>> Jan Philip Oeyen
>> ZFMK // ZMB // University of Bonn
>>
From mphoeppner at gmail.com Wed Aug 6 01:14:23 2014
From: mphoeppner at gmail.com (Marc Höppner)
Date: Wed, 6 Aug 2014 08:14:23 +0200
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To:
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID: <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com>

Hi,

I suspect that Augustus plays a role, since the affected features are seeded by Augustus (based on the name anyway). What I found was that this seems to happen only when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). It also seems to be quite reproducible, and it doesn't only affect CDS features.

I have put the Maker output for a test scaffold here:

https://dl.dropboxusercontent.com/u/1918141/maker_output.tar.bz2

The problematic lines:

scaffold_563 maker five_prime_UTR 38501 38501 . - . ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1
scaffold_563 maker exon 69967 69967 . - . ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1
scaffold_563 maker CDS 69967 69967 . - 1 ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1

Strange stuff?

Regards,

Marc

On 05 Aug 2014, at 22:49, Carson Holt wrote:

> One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this for example. Do you have non-CDS feature types where this happens, or any internal CDS's where this happens?
>
> --Carson
>
>
> From: Carson Holt
> Date: Tuesday, August 5, 2014 at 2:21 PM
> To: Marc Höppner,
> Subject: Re: [maker-devel] Maker GFF output with features of 0 length
>
> Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)?
>
> --Carson
>
>
> From: Marc Höppner
> Date: Wednesday, July 30, 2014 at 4:44 AM
> To:
> Subject: [maker-devel] Maker GFF output with features of 0 length
>
> Hi,
>
> I've found, more by accident, that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions.
>
> For example:
>
> scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1
>
> This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning over every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...
>
> I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I have now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.
>
> Regards,
>
> Marc
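For anyone wanting to check their own output for the single-base features discussed in this thread, a minimal sketch follows. It assumes a standard nine-column, tab-delimited GFF3 file; the function name `single_base_features` and the shortened sample IDs are illustrative only and are not part of MAKER.

```python
import io
from collections import Counter


def single_base_features(handle):
    """Yield (seqid, type, position, attributes) for features whose start equals end.

    Assumes standard nine-column, tab-delimited GFF3 and stops at the ##FASTA pragma.
    """
    for line in handle:
        if line.startswith("##FASTA"):
            break  # the sequence section follows, no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip comments, pragmas and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines rather than guess at their layout
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            continue  # non-numeric coordinates: not a usable feature line
        if start == end:
            yield cols[0], cols[2], start, cols[8]


# Illustrative input modelled on the lines quoted in the thread (IDs shortened).
sample = io.StringIO(
    "##gff-version 3\n"
    "scaffold_563\tmaker\texon\t69967\t69967\t.\t-\t.\tID=a:exon:148;Parent=a\n"
    "scaffold_563\tmaker\tCDS\t69967\t69967\t.\t-\t1\tID=a:cds;Parent=a\n"
    "scaffold_563\tmaker\tgene\t60000\t70000\t.\t-\t.\tID=g\n"
)
hits = list(single_base_features(sample))
counts = Counter(ftype for _seqid, ftype, _pos, _attrs in hits)
print(counts)
```

Tallying by feature type helps separate terminal single-base CDS features, which Carson notes can be legitimate, from exons or UTRs that may deserve a closer look.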
From j.wilbrandt at zfmk.de Wed Aug 6 04:03:28 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 06 Aug 2014 11:03:28 +0200
Subject: [maker-devel] Forks.pm error when running maker with dsindex
Message-ID:

Hi!

Yes, we are running 20 jobs simultaneously, almost, i.e., as much as our cluster can take. Do you think this is too much?

Please find attached the output file (containing the STDERR) of the dsindex-run, and one example output of one of the pieces.

Another quick question to make sure I understood the guides correctly: If a job did not finish properly, it should suffice to restart the same thing just with the -a flag and it should clean up and finish what it was supposed to, right? (i.e., it's not necessary to trace and delete the unfinished output manually?)

Thank you again!
Jeanne Wilbrandt

zmb // ZFMK // University of Bonn

On 08/05/2014 08:00 PM, maker-devel-request at yandell-lab.org wrote:
>
> 1. Re: Forks.pm error when running maker with dsindex (Carson Holt)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 04 Aug 2014 14:27:08 -0600
> From: Carson Holt
> To: Jan Philip Oeyen ,
>
> Subject: Re: [maker-devel] Forks.pm error when running maker with
> dsindex
> Message-ID:
> Content-Type: text/plain; charset="utf-8"
>
> Sorry for the slow reply. I was on vacation all last week. Do you have the
> full STDERR? sometimes the last error is irrelevant and it's just the result
> of a failure further upstream. Also are you running 20 independent maker
> jobs simultaneously?
>
> --Carson
>
>
> From: Jan Philip Oeyen
> Date: Monday, July 28, 2014 at 6:22 AM
> To:
> Subject: [maker-devel] Forks.pm error when running maker with dsindex
>
> Hi all,
> we are currently having some unexpected errors when running maker on a
> genome which is split in several parts.
> Our cluster admin reported the
> following error message:
>
> Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodu
> les/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm
> line 2188.
> SIGTERM received
> SIGTERM received
> SIGTERM received
>
> We were using maker with the '-g' option on a single genome which is split
> into 20 parts, where 19 parts are equally large and the last contains about
> 20 sequences more. After that we ran Maker using dsindex to clean up the
> output. We are currently using maker v2.31 on 4 threads and forks v0.34.
>
> If any further info is needed to clarify the problem, please let me know and
> I will provide as much as possible.
>
> Thank you for your help!
>
> Best regards,
> Jan Philip Oeyen
> ZFMK // ZMB // University of Bonn
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: split_index.o2510
Type: application/octet-stream
Size: 1641 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_04.o2490
Type: application/octet-stream
Size: 8883704 bytes
Desc: not available

From dandence at gmail.com Wed Aug 6 08:50:43 2014
From: dandence at gmail.com (Daniel Ence)
Date: Wed, 6 Aug 2014 07:50:43 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References:
Message-ID:

Hi Jeanne, what's the average length of those 154 scaffolds that only appeared once in the log? Is the length pretty consistent?

~Daniel

On Aug 6, 2014, at 6:40 AM, Jeanne Wilbrandt wrote:

>
> Hi Carson,
>
> I ran into more conspicuous behavior running maker 2.31 on a genome which is split into
> 20 parts, using the -g flag and the same basename.
> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while
> the remaining three seemed to be stalled and produced 0B of output. Do you have any
> suggestion why this is happening?
>
> After I stopped these stalled jobs, I checked the index.log and found that of 38.384
> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these
> only appear as FINISHED (the rest only started). There are no models for these 'finished'
> scaffolds stored in the .db and they are distributed over all parts of the genome (i.e.,
> each of the 20 jobs contained scaffolds that 'did not start' but 'finished')
> Should this be an issue of concern?
> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so
> we suspect something fishy going on...
>
> Hope you can help,
> best wishes,
> Jeanne Wilbrandt
>
> zmb // ZFMK // University of Bonn

From carsonhh at gmail.com Mon Aug 11 11:11:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 11 Aug 2014 10:11:28 -0600
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To:
References:
Message-ID:

If you are updating every month to BioPerl live, don't. You should use the CPAN version of BioPerl or even the stable download. BioPerl live has broken several components MAKER uses at different times and, depending on which version you currently have, may be broken now.

Could you send me the Bio::Root::Version line from the initial debug output? Also could you send me this file --> /home/keceltes/maker2/final.fasta

The point of failure is actually very simple. At that point in the code, MAKER opens a file, reads it in one line at a time, writes it out to a new file, and then indexes it with BioPerl (the BioPerl index won't work on NFS drives because it uses Berkeley DB). For that reason, whenever it fails at that point, it is either a drive space issue, NFS issue, BioPerl issue, or file format issue.

Also are you running via MPI?
I ask because if you are using multiple nodes you will have to check the size of /tmp independently on each node (since the values will be different).

Thanks,
Carson

From: Kevin Tsai
Date: Monday, August 11, 2014 at 5:11 AM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Early obstacle with SplitDB

Hi Carson,

Thanks for the suggestions. I left TMP= empty, which as you mentioned defaults to /tmp. There seems to be a different error when using an NFS mounted directory (as I manually verified). My /tmp is also not full or nearly full, and I have verified proper fasta formatting, as I have run the fasta file through other statistics generating tools (e.g. Quast). We also update BioPerl monthly.

Do you think it could be anything else? Is there any more information I could provide that would be helpful?

On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote:
> Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted
> directory (must be locally mounted), the drive containing the directory specified
> by TMP= (defaults to /tmp) is full or nearly full, your input file is not
> proper fasta format, or you are using an out of date version of BioPerl.
>
> Try the first three in the list then look at BioPerl. The BioPerl version
> should be printed as part of the debug output.
>
> --Carson
>
>
> From: Kevin Tsai
> Date: Tuesday, August 5, 2014 at 4:59 AM
> To:
> Subject: [maker-devel] Early obstacle with SplitDB
>
> Hello,
> I'm a new user to Maker so I suspect this will be a simple question, but I am
> having trouble finding documentation on SplitDB. Our IT admin set up the
> application and I'm running into the following issue about 30 seconds after
> kickoff. Below is the debugged output:
>
> STATUS: Parsing control files...
> Calling GI::load_control_files at /usr/bin/maker line 452.
> Calling GI::new_instance_temp at /usr/bin/maker line 463.
> Calling GI::mount_check at /usr/bin/maker line 465.
> Calling GI::set_global_temp at /usr/bin/maker line 483. > STATUS: Processing and indexing input FASTA files... > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling List::Util::shuffle at /usr/bin/maker line 529. > Calling GI::split_db at /usr/bin/maker line 536. > Calling File::Path::rmtree at /usr/bin/maker line 537. > Calling Iterator::Any::new at /usr/bin/maker line 537. > Calling Iterator::Any::nextDef at /usr/bin/maker line 537. > Calling Iterator::Any::new at /usr/bin/maker line 537. > Calling mkdir at /usr/bin/maker line 537. > Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. > Calling system at /usr/bin/maker line 537. > ERROR: SplitDB not created correctly > > at /usr/local/share/perl5/GI.pm line 1144. > GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, > "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at > /usr/bin/maker line 537 > --> rank=NA, hostname=Za2.cglab > > Any suggestions? Thank you in advance! > -- > Kevin Tsai > www.linkedin.com/in/kevinjtsai/ > Ph.D. Candidate, Bioinformatics > Institute of Information Science, Academia Sinica > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica -------------- next part -------------- An HTML attachment was scrubbed... 
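Two of the causes Carson lists for the SplitDB failure (a full temporary drive and a malformed input FASTA) can be checked before launching MAKER. The sketch below is only an illustration under stated assumptions: the helper names `free_gib` and `fasta_problems` are hypothetical, the default alphabet `ACGTN` is an assumption about what counts as a recognized nucleotide, and nothing here detects whether a directory is NFS mounted or whether BioPerl is current.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def fasta_problems(path, alphabet="ACGTN"):
    """Return (line_number, message) pairs for basic FASTA format problems.

    `alphabet` is an assumption; widen it if your sequences use other codes.
    """
    problems = []
    seen_header = False
    allowed = set(alphabet.upper() + alphabet.lower())
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate blank lines
            if line.startswith(">"):
                seen_header = True
                if len(line) == 1:
                    problems.append((number, "empty header"))
            elif not seen_header:
                problems.append((number, "sequence before first header"))
            elif set(line) - allowed:
                problems.append((number, "unexpected characters"))
    if not seen_header:
        problems.append((0, "no FASTA header found"))
    return problems


# Small self-contained demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.fasta")
    with open(good, "w") as out:
        out.write(">scaffold_1\nACGTNNacgt\n")
    bad = os.path.join(tmp, "bad.fasta")
    with open(bad, "w") as out:
        out.write("ACGT\n>scaffold_2\nAC-GT\n")
    good_report = fasta_problems(good)
    bad_report = fasta_problems(bad)

tmp_free = free_gib(tempfile.gettempdir())
print(good_report, bad_report, round(tmp_free, 1))
```

When running over MPI on several nodes, the free-space check would have to be repeated on each node, since each node has its own /tmp.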
From a.priyam at qmul.ac.uk Wed Aug 13 04:30:39 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Wed, 13 Aug 2014 15:00:39 +0530
Subject: [maker-devel] does MAKER modify input FASTA
Message-ID:

Is it possible for the input FASTA file (containing the genome that is being annotated) and the FASTA sequences in the output GFF file (containing the resulting annotations + the genome) to be different?

-> It's fine if the ordering of the scaffolds, or width (for pretty formatting) are different.
-> But, will MAKER add 'NNN' or change the case to indicate masking? It doesn't seem so to me, but I have only one test set, so can't be sure.
-> Is it possible to get the masked genome out of MAKER?

-- Priyam

From j.wilbrandt at zfmk.de Wed Aug 13 04:32:38 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 13 Aug 2014 11:32:38 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Our admin counts processes. Do I understand you right, that one CPU handles several processes?

I'm still confused by the different directories (and I made a mistake when asking last time, I wanted to say 'If I do NOT start the jobs in the same directory...').
So, if I start each piece of a genome in its own directory (for example), then it gets a unique basename (because the output will be separate from all other pieces anyway) and I will not run dsindex but instead use gff3_merge for each piece's output and then once again to merge all resulting gff3-files?

Hope I got you right :)

Thanks for your help!
Jeanne

On Wed, 6 Aug 2014 15:45:56 +0000
Carson Holt wrote:
>Is your admin counting processes or cpu usage?
Because each system call creates a >separate process, so you can expect multiple processes (each system call generates a new >process) but only a single cpu of usage per instance. Use different directories if you >are running that many jobs. You can concatenate the separate results when your done. > Use gff3_merge script to help concatenate the separate GFF3 files generated from >separate jobs. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote: >> >> >> >> We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin >reports >> however, that the processes seem to assemble more threads than they are allowed. It is >> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why? >> >> If I start the jobs in the same directory, how can I make sure they write to the same >> directory (as, I think is required to put the pieces together in the end?)? das >-basename >> take paths? >> >> >> On Wed, 6 Aug 2014 15:12:50 +0000 >> Carson Holt wrote: >>> I think the freezing is because you are starting too many simultaneous jobs. You >should >>> try and use MPI to parallelize instead. The concurrent job way of doing things can >>> start to cause problems If you are running 10 or more jobs in the same directory. You >>> could try splitting them into different directories. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> aha, so this explains that. >>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, >roughly >>>> half of the sequences being shorter than 3,000 bp. >>>> >>>> What do you think about this weird 'I am running but not really doing >>> anything'-behavior? >>>> >>>> >>>> Thanks a lot! >>>> Jeanne >>>> >>>> >>>> >>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>> Carson Holt wrote: >>>>> If you are starting and restarting, or running multiple jobs then the log can be >>>>> partially rebuilt. 
On rebuild only the FINISHED entries are added. If there is a >>> GFF3 >>>>> result file for the contig, then it is FINISHED. FASTA files will only exist for >the >>>>> contigs that have gene models. Small contigs will rarely contain models. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>>> >>>>>> >>>>>> Hi Carson, >>>>>> >>>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>>> into >>>>>> 20 parts, using the -g flag and the same basename. >>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish >normally, >>>>> while >>>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have >any >>>>>> suggestion why this is happening? >>>>>> >>>>>> After I stopped these stalled jobs, I checked the index.log and found that of >38.384 >>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>>> these >>>>>> only appear as FINISHED (the rest only started). There are no models for these >>>>> 'finished' >>>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>>> (i.e., >>>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>>> Should this be an issue of concern? >>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >>> good, >>>>> so >>>>>> we suspect something fishy going on... 
>>>>>>
>>>>>> Hope you can help,
>>>>>> best wishes,
>>>>>> Jeanne Wilbrandt
>>>>>>
>>>>>> zmb // ZFMK // University of Bonn
>>
From dence at genetics.utah.edu Wed Aug 13 10:29:41 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 13 Aug 2014 15:29:41 +0000
Subject: [maker-devel] does MAKER modify input FASTA
In-Reply-To:
References:
Message-ID:

Hi Priyam,

After MAKER has completed its run and you've merged the results with gff3_merge, you can see the original fasta genome in the resulting gff3 file, below the ##FASTA pragma.

For each scaffold in your genome, the masked fasta can be found in its individual directory in the master_datastore that MAKER created to keep track of results. I'm pretty sure this will only be 'soft-masked' (lower-case letters) and not hard-masked ('N' characters).

Let me know whether this helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Anurag Priyam [a.priyam at qmul.ac.uk]
Sent: Wednesday, August 13, 2014 3:30 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] does MAKER modify input FASTA

Is it possible that the input FASTA file (containing the genome that is being annotated) and the FASTA sequences in the output GFF file (containing the resulting annotations + the genome) be different?

-> It's fine if the ordering of the scaffolds, or width (for pretty formatting) are different.
-> But, will MAKER add 'NNN' or change the case to indicate masking? It doesn't seem so to me, but I have only one test set, so can't be sure.
-> Is it possible to get masked genome out from MAKER? -- Priyam _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 13 10:46:27 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Aug 2014 09:46:27 -0600 Subject: [maker-devel] does MAKER modify input FASTA In-Reply-To: References: Message-ID: The output fasta will be letter for letter identical to the input fasta and will be all uppercase. Only if your input fasta contains unrecognized characters (for example 'Y' in the middle of the nucleotide sequence) and you use the --fix_nucleotides flag will those unrecognized characters be changed to 'N'. The masked fasta can be pulled out of theVoid directory if you really need it. It will be called query_masked.fasta. --Carson On 8/13/14, 3:30 AM, "Anurag Priyam" wrote: >Is it possible that the input FASTA file (containing the genome that >is being annotated) and the FASTA sequences in the output GFF file >(containing the resulting annotations + the genome) be different? > >-> It's fine if the ordering of the scaffolds, or width (for pretty >formatting) are different. >-> But, will MAKER add 'NNN' or change the case to indicate masking? >It doesn't seem so to me, but I have only one test set, so can't be >sure. >-> Is it possible to get masked genome out from MAKER? 
> >-- Priyam > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Aug 13 10:46:59 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 13 Aug 2014 15:46:59 +0000 Subject: [maker-devel] Further split genome questions In-Reply-To: References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>, Message-ID: Hi Jeanne, I believe that's right. You can pass gff3_merge either a list of gff3 files or a maker-created datastore index file. To compile the pieces for each of your different runs you would give gff3_merge the datastore index file. To put those resulting gff3 files together, you would pass gff3_merge the list of gff3 files that you want to merge. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jeanne Wilbrandt [j.wilbrandt at zfmk.de] Sent: Wednesday, August 13, 2014 3:32 AM To: Carson Holt; Wilbrandt Jeanne Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Further split genome questions Our admin counts processes. Do I understand you right, that one CPU handles several processes? I'm still confused by the different directories (and I made a mistake when asking last time, I wanted to say 'If I do NOT start the jobs in the same directory...). So, if I start each piece of a genome in its own directory (for example), then it gets a unique basename (because the output will be separate from all other pieces anyway) and I will not run dsindex but instead use gff3_merge for each piece's output and then once again to merge all resulting gff3-files? 
Hope I got you right :)

Thanks for your help!
Jeanne



On Wed, 6 Aug 2014 15:45:56 +0000
 Carson Holt wrote:
>Is your admin counting processes or cpu usage? Because each system call
>creates a separate process, so you can expect multiple processes (each
>system call generates a new process) but only a single cpu of usage per
>instance. Use different directories if you are running that many jobs.
>You can concatenate the separate results when you're done. Use gff3_merge
>script to help concatenate the separate GFF3 files generated from separate
>jobs.
>
>--Carson
>
>Sent from my iPhone
>
>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote:
>>
>> We are using MPI as well, each of the 20 parts gets assigned 4 threads.
>> Our admin reports however, that the processes seem to assemble more
>> threads than they are allowed. It is not Blast (which is set to 1 cpu in
>> the opts.ctl). Do you have a suggestion why?
>>
>> If I start the jobs in the same directory, how can I make sure they write
>> to the same directory (as, I think is required to put the pieces together
>> in the end?)? Does -basename take paths?
>>
>>
>> On Wed, 6 Aug 2014 15:12:50 +0000
>> Carson Holt wrote:
>>> I think the freezing is because you are starting too many simultaneous
>>> jobs. You should try and use MPI to parallelize instead. The concurrent
>>> job way of doing things can start to cause problems if you are running
>>> 10 or more jobs in the same directory. You could try splitting them into
>>> different directories.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote:
>>>>
>>>> aha, so this explains that.
>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than
>>>> 60,000, roughly half of the sequences being shorter than 3,000 bp.
>>>>
>>>> What do you think about this weird 'I am running but not really doing
>>>> anything'-behavior?
>>>>
>>>>
>>>> Thanks a lot!
>>>> Jeanne
>>>>
>>>>
>>>> On Wed, 6 Aug 2014 14:16:52 +0000
>>>> Carson Holt wrote:
>>>>> If you are starting and restarting, or running multiple jobs then the
>>>>> log can be partially rebuilt. On rebuild only the FINISHED entries are
>>>>> added. If there is a GFF3 result file for the contig, then it is
>>>>> FINISHED. FASTA files will only exist for the contigs that have gene
>>>>> models. Small contigs will rarely contain models.
>>>>>
>>>>> --Carson
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote:
>>>>>>
>>>>>> Hi Carson,
>>>>>>
>>>>>> I ran into more conspicuous behavior running maker 2.31 on a genome
>>>>>> which is split into 20 parts, using the -g flag and the same basename.
>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to
>>>>>> finish normally, while the remaining three seemed to be stalled and
>>>>>> produced 0B of output. Do you have any suggestion why this is
>>>>>> happening?
>>>>>>
>>>>>> After I stopped these stalled jobs, I checked the index.log and found
>>>>>> that of 38.384 mentioned scaffolds, 154 appear only once in the log.
>>>>>> The surprise is, that 2/3 of these only appear as FINISHED (the rest
>>>>>> only started). There are no models for these 'finished' scaffolds
>>>>>> stored in the .db and they are distributed over all parts of the
>>>>>> genome (i.e., each of the 20 jobs contained scaffolds that 'did not
>>>>>> start' but 'finished').
>>>>>> Should this be an issue of concern?
>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS
>>>>>> files look good, so we suspect something fishy going on...
>>>>>>
>>>>>> Hope you can help,
>>>>>> best wishes,
>>>>>> Jeanne Wilbrandt
>>>>>>
>>>>>> zmb // ZFMK // University of Bonn
>>
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com  Wed Aug 13 10:47:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Aug 2014 09:47:15 -0600
Subject: [maker-devel] does MAKER modify input FASTA
In-Reply-To: 
References: 
Message-ID: 

It will actually be a mixture of hard and soft masking depending on the
class of repeat.

--Carson

On 8/13/14, 9:29 AM, "Daniel Ence" wrote:

>Hi Priyam,
>
>After MAKER has completed its run and you've merged the results with
>gff3_merge, you can see the original fasta genome in the resulting gff3
>file, below the ##FASTA pragma.
>
>For each scaffold in your genome, the masked fasta can be found in its
>individual directory in the master_datastore that MAKER created to keep
>track of results. I'm pretty sure this will only be 'soft-masked'
>(lower-case letters) and not hard-masked ('N' characters).
>
>Let me know whether this helps,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>Anurag Priyam [a.priyam at qmul.ac.uk]
>Sent: Wednesday, August 13, 2014 3:30 AM
>To: maker-devel at yandell-lab.org
>Subject: [maker-devel] does MAKER modify input FASTA
>
>Is it possible that the input FASTA file (containing the genome that
>is being annotated) and the FASTA sequences in the output GFF file
>(containing the resulting annotations + the genome) be different?
>
>-> It's fine if the ordering of the scaffolds, or width (for pretty
>formatting) are different.
>-> But, will MAKER add 'NNN' or change the case to indicate masking?
>It doesn't seem so to me, but I have only one test set, so can't be
>sure.
>-> Is it possible to get masked genome out from MAKER?
>
>-- Priyam

From carsonhh at gmail.com  Wed Aug 13 10:52:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Aug 2014 09:52:34 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To: 
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
 <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID: 

Yes.
One cpu will have several processes; most are helper processes that
will use 0% CPU almost all of the time (for example there is a shared
variable manager process that will launch with MAKER but will also be
called 'maker' under top because it is technically its child and not a
separate script). Also system calls will launch a new process that will
use all CPU while the process calling it will drop to 0% CPU until it
finishes.

Yes. Your explanation is correct. You then use gff3_merge to merge the
GFF3 files.

--Carson

On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote:

>
>Our admin counts processes. Do I understand you right, that one CPU
>handles several processes?
>
>I'm still confused by the different directories (and I made a mistake
>when asking last time, I wanted to say 'If I do NOT start the jobs in
>the same directory...').
>So, if I start each piece of a genome in its own directory (for example),
>then it gets a unique basename (because the output will be separate from
>all other pieces anyway) and I will not run dsindex but instead use
>gff3_merge for each piece's output and then once again to merge all
>resulting gff3-files?
>
>Hope I got you right :)
>
>Thanks for your help!
>Jeanne

From cjfields at illinois.edu  Wed Aug 13 12:14:56 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 13 Aug 2014 17:14:56 +0000
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To: 
References: 
Message-ID: 

On Aug 11, 2014, at 11:11 AM, Carson Holt wrote:

> If you are updating every month to BioPerl live, don't. You should use the
> CPAN version of BioPerl or even the stable download. BioPerl live has
> actually broken several components MAKER uses at different times and depending
> on which version you currently have, may be broken now. Could you send me the
> Bio::Root::Version line from the initial debug output?

Exactly. Just a note, but the CPAN releases (now at 1.6.924) merge over all
changes from the master branch on a regular basis. The key parts that will
not work when running off master (such as Bio::Root, Bio::FeatureIO, etc)
have been split out into separate repos; it's entirely possible to add these
separately to a PERL5LIB but the intent is that we will release Bio-Root and
others to CPAN separately.

> Also could you send me this file --> /home/keceltes/maker2/final.fasta
>
> The point of failure is actually very simple. At that point in the code,
> MAKER opens a file, reads it in one line at a time, writes it out to a new
> file, and then indexes it with BioPerl (the BioPerl won't work with NFS drives
> because it uses Berkeley DB). For that reason whenever it fails at that point,
> it is either a drive space issue, NFS issue, BioPerl issue, or file format
> issue.

Re: Berkeley_DB, if you have a need to push this in a more NFS-portable
direction we are more than happy to let you experiment on what works best.
Mark Jensen actually started on this a while back but ran into problems. I
personally haven't had problems with Bio::DB::Fasta on our local GPFS to be
frank, but I'm sure that isn't working for everyone.

> Also are you running via MPI? I ask because if you are using multiple nodes
> you will have to check the size of /tmp independently on each node (since the
> values will be different).
>
> Thanks,
> Carson

chris

> From: Kevin Tsai
> Date: Monday, August 11, 2014 at 5:11 AM
> To: Carson Holt
> Cc:
> Subject: Re: [maker-devel] Early obstacle with SplitDB
>
> Hi Carson,
> Thanks for the suggestions.
>
> I left the TMP= empty, which as you mentioned defaults to /tmp. There seems
> to be a different error when using an NFS mounted directory (as I manually
> verified). My /tmp is also not full or nearly full, I have verified proper
> fasta formatting as I have run the fasta file through other statistics
> generating tools (e.g. Quast). We also update BioPerl monthly.
>
> Do you think it could be anything else? Do you think any more information
> that I might be able to provide will be more insightful?
>
>
> On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote:
>> Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted
>> directory (must be locally mounted), the drive containing directory specified
>> by TMP= (defaults to /tmp) is full or nearly full, your input file is not
>> proper fasta format, or you are using an out of date version of BioPerl.
>>
>> Try the first three in the list then look at BioPerl. The BioPerl version
>> should be printed as part of the debug output.
>>
>> --Carson
>>
>>
>> From: Kevin Tsai
>> Date: Tuesday, August 5, 2014 at 4:59 AM
>> To:
>> Subject: [maker-devel] Early obstacle with SplitDB
>>
>> Hello,
>> I'm a new user to Maker so I suspect this will be a simple question, but I am
>> having trouble finding documentation on SplitDB. Our IT admin set up the
>> application and I'm running into the following issue about 30 seconds after
>> kickoff. Below is the debugged output:
>>
>> STATUS: Parsing control files...
>> Calling GI::load_control_files at /usr/bin/maker line 452.
>> Calling GI::new_instance_temp at /usr/bin/maker line 463.
>> Calling GI::mount_check at /usr/bin/maker line 465.
>> Calling GI::set_global_temp at /usr/bin/maker line 483.
>> STATUS: Processing and indexing input FASTA files...
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling List::Util::shuffle at /usr/bin/maker line 529.
>> Calling GI::split_db at /usr/bin/maker line 536.
>> Calling File::Path::rmtree at /usr/bin/maker line 537.
>> Calling Iterator::Any::new at /usr/bin/maker line 537.
>> Calling Iterator::Any::nextDef at /usr/bin/maker line 537.
>> Calling Iterator::Any::new at /usr/bin/maker line 537.
>> Calling mkdir at /usr/bin/maker line 537.
>> Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537.
>> Calling system at /usr/bin/maker line 537.
>> ERROR: SplitDB not created correctly
>>
>>  at /usr/local/share/perl5/GI.pm line 1144.
>> GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1,
>> "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at
>> /usr/bin/maker line 537
>> --> rank=NA, hostname=Za2.cglab
>>
>> Any suggestions? Thank you in advance!
>> --
>> Kevin Tsai
>> www.linkedin.com/in/kevinjtsai/
>> Ph.D. Candidate, Bioinformatics
>> Institute of Information Science, Academia Sinica
>
> --
> Kevin Tsai
> www.linkedin.com/in/kevinjtsai/
> Ph.D. Candidate, Bioinformatics
> Institute of Information Science, Academia Sinica
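
The checks Carson lists in this thread (TMP on a locally mounted disk rather than NFS or tmpfs, enough free space, well-formed FASTA input) could be sketched as a small pre-flight script. This is only an illustration and not part of MAKER: the function names, the set of "risky" file-system types, the free-space threshold and the FASTA rules are all assumptions made for this example, and reading /proc/mounts only works on Linux.

```python
import os
import shutil

# Assumption drawn from the thread: NFS and tmpfs are the types to avoid for TMP.
RISKY_FS_TYPES = {"nfs", "nfs4", "tmpfs"}


def filesystem_type(path, mount_lines):
    """Return the fs type of the most specific mount point containing `path`.

    `mount_lines` is an iterable of lines in /proc/mounts layout
    (device, mount point, type, ...). Returns None if nothing matches.
    """
    best_mount, best_type = "", None
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # ">=" lets a later entry for the same mount point win (over-mounts).
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def fasta_problems(lines):
    """Return human-readable problems found in an iterable of FASTA lines."""
    problems = []
    seen_ids = set()
    have_header = False
    residues = 0  # residues seen since the most recent header
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if have_header and residues == 0:
                problems.append(f"line {number}: previous record has no sequence")
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            if not seq_id:
                problems.append(f"line {number}: empty header")
            elif seq_id in seen_ids:
                problems.append(f"line {number}: duplicate ID {seq_id}")
            seen_ids.add(seq_id)
            have_header = True
            residues = 0
        elif not have_header:
            problems.append(f"line {number}: sequence data before first header")
        else:
            residues += len(line)
    if have_header and residues == 0:
        problems.append("last record has no sequence")
    if not have_header:
        problems.append("no FASTA records found")
    return problems


def preflight(tmp_dir, fasta_path, min_free_bytes=1 << 30):
    """Collect warnings before a run; the 1 GiB default is an arbitrary choice."""
    warnings = []
    tmp_dir = os.path.realpath(tmp_dir)
    with open("/proc/mounts") as mounts:  # Linux only
        fs_type = filesystem_type(tmp_dir, mounts)
    if fs_type in RISKY_FS_TYPES:
        warnings.append(f"{tmp_dir} is on {fs_type}, not a local disk")
    if shutil.disk_usage(tmp_dir).free < min_free_bytes:
        warnings.append(f"{tmp_dir} has little free space")
    with open(fasta_path) as fasta:
        warnings.extend(fasta_problems(fasta))
    return warnings
```

When running via MPI, something like `preflight("/tmp", "final.fasta")` would have to be run on each node separately, since /tmp differs from node to node. An empty list means none of these particular checks fired; it does not rule out a BioPerl version problem.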
From carsonhh at gmail.com  Wed Aug 13 13:19:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Aug 2014 12:19:50 -0600
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To: 
References: 
Message-ID: 

The Berkeley DB/NFS issues happen more often for large index files or NFS
systems with a slow response. Such issues also happen almost exclusively
during index creation. There is a way you can tell MAKER to have BioPerl use
something other than Berkeley DB for indexing if you suspect that's the issue.
You can give it a flag during the initial MAKER setup and installation.

#use GDBM library
cd .../maker/src
perl Build.PL --AnyDBM_ISA GDBM_File
./Build install

#use SDBM files
cd .../maker/src
perl Build.PL --AnyDBM_ISA SDBM_File
./Build install

#use Berkeley DB (default)
cd .../maker/src
perl Build.PL --AnyDBM_ISA DB_File
./Build install

However, I find that the alternatives to Berkeley DB can be more flakey. Also
make sure /tmp is not tmpfs (which it may be on some systems). I've also
seen weird behavior trying to index files on tmpfs storage on some systems.

Thanks,
Carson

From j.wilbrandt at zfmk.de  Thu Aug 14 10:40:04 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Thu, 14 Aug 2014 17:40:04 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
 <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
 <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID: 

Thank you so much!

However, I'm still struggling, I'm afraid: I tried this 'two-step merging'
approach with a subset of scaffolds and got duplicate IDs.

Here is what I did:
- divided input scaffolds into two files
- ran maker separately on these files (-> separate output dirs)
-- additional input: maker-generated gff3 from previous (singular) run
-- repeatmasking, snaphmm, gmhmm, augustus_species are given
-- map_forward=0 / 1 (I tried both, to the same effect)
- gff3_merge two times using index-log
- gff3_merge these two gff3 files

$ grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | uniq -c | sort -n | tail
      2 ID=snap_masked-scf7180005140699-processed-gene-0.19
      2 ID=snap_masked-scf7180005140699-processed-gene-0.22
      2 ID=snap_masked-scf7180005140699-processed-gene-1.36
      2 ID=snap_masked-scf7180005140713-processed-gene-0.4
      2 ID=snap_masked-scf7180005140744-processed-gene-0.4
      2 ID=snap_masked-scf7180005140744-processed-gene-0.6
      2 ID=snap_masked-scf7180005140754-processed-gene-0.14
      2 ID=snap_masked-scf7180005140754-processed-gene-0.15
      2 ID=snap_masked-scf7180005140754-processed-gene-0.19
      2 ID=snap_masked-scf7180005181475-processed-gene-0.3

$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | grep "\sgene"
scf7180005181475 maker gene 9050 9385 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3

- found duplicates! i.e. the same ID for gene annotations in different areas
of the same scaffold (of 655 gene annotations, 51 appear twice)
-- this happens not only with gene, but also CDS and mRNA annotations, as far
as I can see (here, in one example, non-overlapping but close CDS snippets got
the same ID).

I suspected this might have to do with the map_forward flag, but I get the
same problem again (with genes at the same locations).
I attached one of the ctl files for you in case you want to have a look, the
other is analogous. Do you need something else?

What did I miss?
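
The duplicate check done with the grep/cut/sort/uniq pipeline above could also be sketched in Python, which makes it easy to list where each repeated ID occurs. This is an illustrative helper and not a MAKER tool: the name `duplicate_feature_ids` and its parameters are made up for this example, and it assumes tab-delimited GFF3 with nine columns.

```python
from collections import defaultdict


def duplicate_feature_ids(gff3_lines, feature_types=("gene",)):
    """Map each ID used on more than one line to its (seqid, start, end) list.

    Only features whose type (column 3) is in `feature_types` are considered.
    """
    locations = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\r\n").split("\t")
        if len(columns) != 9 or columns[2] not in feature_types:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attributes = dict(
            field.split("=", 1) for field in columns[8].split(";") if "=" in field
        )
        feature_id = attributes.get("ID")
        if feature_id is not None:
            locations[feature_id].append(
                (columns[0], int(columns[3]), int(columns[4]))
            )
    return {fid: locs for fid, locs in locations.items() if len(locs) > 1}
```

For example, `duplicate_feature_ids(open("merged_all.gff3"))` would return a dictionary keyed by the repeated gene IDs. One caveat when passing `feature_types=("CDS",)`: the GFF3 format lets a multi-line feature such as a CDS reuse one ID across its segments, so repeated CDS IDs are not necessarily an error on their own, whereas two gene lines sharing an ID at different locations are.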
This should not happen, right?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts_Lclav_splitrun_problem_01_mapfwd.ctl
Type: application/octet-stream
Size: 5859 bytes
Desc: not available
URL: 

From carsonhh at gmail.com  Thu Aug 14 10:46:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:46:44 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To: 
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
 <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
 <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID: 

What version of MAKER are you using? I'd also need to see the GFF3 files
before the merge. You may also need to turn off map_forward since you are
passing in GFF3 with MAKER names, creating new models with MAKER names and
then moving names from old models forward onto new ones (which may force names
to be used twice).

--Carson

On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote:

>
>Thank you so much!
>
>However, I'm still struggling, I'm afraid: I tried this 'two-step
>merging' approach with
>a subset of scaffolds and got duplicate IDs.
> >Here is what I did: >- divided input scaffolds in two files >- run maker separately on these files (-> separate output dirs) >-- additional input: maker-generated gff3 from previous (singular) run >-- repeatmasking, snaphmm, gmhmm, augustus_species are given >-- map_forward=0 / 1 (I tried both, to the same effect) >- gff3_merge two times using index-log >- gff3_merge these two gff3 files > >$ >grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | >uniq -c | sort -n >| tail > 2 ID=snap_masked-scf7180005140699-processed-gene-0.19 > 2 ID=snap_masked-scf7180005140699-processed-gene-0.22 > 2 ID=snap_masked-scf7180005140699-processed-gene-1.36 > 2 ID=snap_masked-scf7180005140713-processed-gene-0.4 > 2 ID=snap_masked-scf7180005140744-processed-gene-0.4 > 2 ID=snap_masked-scf7180005140744-processed-gene-0.6 > 2 ID=snap_masked-scf7180005140754-processed-gene-0.14 > 2 ID=snap_masked-scf7180005140754-processed-gene-0.15 > 2 ID=snap_masked-scf7180005140754-processed-gene-0.19 > 2 ID=snap_masked-scf7180005181475-processed-gene-0.3 > >$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | >grep "\sgene" >scf7180005181475 maker gene 9050 9385 . - . ID=snap_masked-scf718000518147 >5-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3 >scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf7180005181475 >-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3 > >- found duplicates! i.e. the same ID for gene annotations in different >areas of the same >scaffold (of 655 gene annotations, 51 appear twice) >-- this happens not only with gene, but also CDS and mRNA annotations, as >far as I can >see (here, in one example, non-everlapping but close CDS snippets got the >same ID). > > >I suspected this might have to do with the map_forward flag, but I get >the same problem >again (with genes at the same locations). 
>I attached one of the ctl files for you in case you want to have a look, >the other is >analogous. Do you need something else? > >What did I miss? This should not happen, right? > > > > >On Wed, 13 Aug 2014 15:52:34 +0000 > Carson Holt wrote: >>Yes. One cpu will have several processes, most are helper processes that >>will use 0% CPU almost all of the time (for example there is a shared >>variable manager process that will launch with MAKER but will also be >>called 'maker' under top because it is technically its child and not a >>separate script). Also system calls will launch a new process that will >>use all CPU while the process calling it will drop to 0% CPU until it >>finishes. >> >>Yes. Your explanation is correct. You then use gff3_merge to merge the >>GFF3 file. >> >>--Carson >> >> >> >>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: >> >>> >>>Our admin counts processes. Do I understand you right, that one CPU >>>handles several >>>processes? >>> >>>I'm still confused by the different directories (and I made a mistake >>>when asking last >>>time, I wanted to say 'If I do NOT start the jobs in the same >>>directory...). >>>So, if I start each piece of a genome in its own directory (for >>>example), >>>then it gets a >>>unique basename (because the output will be separate from all other >>>pieces anyway) and I >>>will not run dsindex but instead use gff3_merge for each piece's output >>>and then once >>>again to merge all resulting gff3-files? >>> >>>Hope I got you right :) >>> >>>Thanks fopr your help! >>>Jeanne >>> >>> >>> >>>On Wed, 6 Aug 2014 15:45:56 +0000 >>> Carson Holt wrote: >>>>Is your admin counting processes or cpu usage? Because each system >>>>call >>>>creates a >>>>separate process, so you can expect multiple processes (each system >>>>call >>>>generates a new >>>>process) but only a single cpu of usage per instance. Use different >>>>directories if you >>>>are running that many jobs. 
You can concatenate the separate results >>>>when your done. >>>> Use gff3_merge script to help concatenate the separate GFF3 files >>>>generated from >>>>separate jobs. >>>> >>>>--Carson >>>> >>>>Sent from my iPhone >>>> >>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>>wrote: >>>>> >>>>> >>>>> >>>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>>threads. Our admin >>>>reports >>>>> however, that the processes seem to assemble more threads than they >>>>>are allowed. It is >>>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>>suggestion why? >>>>> >>>>> If I start the jobs in the same directory, how can I make sure they >>>>>write to the same >>>>> directory (as, I think is required to put the pieces together in the >>>>>end?)? das >>>>-basename >>>>> take paths? >>>>> >>>>> >>>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>>> Carson Holt wrote: >>>>>> I think the freezing is because you are starting too many >>>>>>simultaneous jobs. You >>>>should >>>>>> try and use MPI to parallelize instead. The concurrent job way of >>>>>>doing things can >>>>>> start to cause problems If you are running 10 or more jobs in the >>>>>>same directory. You >>>>>> could try splitting them into different directories. >>>>>> >>>>>> --Carson >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>> >>>>>>>wrote: >>>>>>> >>>>>>> >>>>>>> aha, so this explains that. >>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>>than 60,000, >>>>roughly >>>>>>> half of the sequences being shorter than 3,000 bp. >>>>>>> >>>>>>> What do you think about this weird 'I am running but not really >>>>>>>doing >>>>>> anything'-behavior? >>>>>>> >>>>>>> >>>>>>> Thanks a lot! 
>>>>>>> Jeanne >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>>>> Carson Holt wrote: >>>>>>>> If you are starting and restarting, or running multiple jobs then >>>>>>>>the log can be >>>>>>>> partially rebuilt. On rebuild only the FINISHED entries are >>>>>>>>added. >>>>>>>> If there is a >>>>>> GFF3 >>>>>>>> result file for the contig, then it is FINISHED. FASTA files will >>>>>>>>only exist for >>>>the >>>>>>>> contigs that have gene models. Small contigs will rarely contain >>>>>>>>models. >>>>>>>> >>>>>>>> --Carson >>>>>>>> >>>>>>>> Sent from my iPhone >>>>>>>> >>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Carson, >>>>>>>>> >>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>>>genome which is split >>>>>>>> into >>>>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed >>>>>>>>>to >>>>>>>>>finish >>>>normally, >>>>>>>> while >>>>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>>>output. Do you have >>>>any >>>>>>>>> suggestion why this is happening? >>>>>>>>> >>>>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>>>found that of >>>>38.384 >>>>>>>>> mentioned scaffolds, 154 appear only once in the log. The >>>>>>>>>surprise >>>>>>>>>is, that 2/3 of >>>>>>>> these >>>>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>>>models for these >>>>>>>> 'finished' >>>>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>>>parts of the genome >>>>>>>> (i.e., >>>>>>>>> each of the 20 jobs contained scaffolds that 'did not start' but >>>>>>>>>'finished') >>>>>>>>> Should this be an issue of concern? >>>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the >>>>>>>>>NFS files look >>>>>> good, >>>>>>>> so >>>>>>>>> we suspect something fishy going on... 
>>>>>>>>>
>>>>>>>>> Hope you can help,
>>>>>>>>> best wishes,
>>>>>>>>> Jeanne Wilbrandt
>>>>>>>>>
>>>>>>>>> zmb // ZFMK // University of Bonn
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> maker-devel mailing list
>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Aug 14 10:55:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:55:15 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Which 2.31? Current is 2.31.6.

--Carson

On 8/14/14, 9:53 AM, "Jeanne Wilbrandt" wrote:

> It is version 2.31.
>
> My first try was done with map_forward=0, and (I just noticed) the duplicates are present in the separate gff3s already also in this case (one is attached).
>
> Has this something to do with the first-run-gff3 I fed it?

From carsonhh at gmail.com Thu Aug 14 10:57:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:57:39 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

For the file you just sent me, is that from the first run with map_forward=0 or with map_forward=1?

--Carson

On 8/14/14, 9:53 AM, "Jeanne Wilbrandt" wrote:

> It is version 2.31.
>
> My first try was done with map_forward=0, and (I just noticed) the duplicates are present in the separate gff3s already also in this case (one is attached).
>
> Has this something to do with the first-run-gff3 I fed it?
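
The duplicate-ID check used in this thread (grep, cut, sort, uniq -c) can also be sketched in Python, which makes it easy to list where each repeated ID sits. This is only a minimal sketch and not part of MAKER: the function name find_duplicate_ids is made up for illustration, the two duplicated gene rows are the ones reported above, and the third row is a hypothetical gene added for contrast. It assumes tab-separated GFF3 with nine columns. Note that the GFF3 format lets discontinuous features such as CDS share one ID across several rows, so the check is most meaningful for gene and mRNA features.

```python
from collections import defaultdict


def find_duplicate_ids(gff3_lines, feature_type="gene"):
    """Map each ID used by more than one `feature_type` row to its locations."""
    locations = defaultdict(list)
    for raw in gff3_lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated tag=value pairs.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "ID" in attrs:
            locations[attrs["ID"]].append((cols[0], int(cols[3]), int(cols[4])))
    return {fid: locs for fid, locs in locations.items() if len(locs) > 1}


GENE_ID = "snap_masked-scf7180005181475-processed-gene-0.3"
example = [
    "##gff-version 3",
    f"scf7180005181475\tmaker\tgene\t9050\t9385\t.\t-\t.\tID={GENE_ID};Name={GENE_ID}",
    f"scf7180005181475\tmaker\tgene\t846\t1088\t.\t-\t.\tID={GENE_ID};Name={GENE_ID}",
    # Hypothetical third gene with a unique ID, added for contrast.
    "scf7180005181475\tmaker\tgene\t2000\t2500\t.\t+\t.\tID=hypothetical-gene-1",
]

for fid, locs in find_duplicate_ids(example).items():
    print(len(locs), fid, locs)
```

To run it on real output, pass an open file handle instead of the example list, for instance the merged_all.gff3 mentioned above. Running it on each per-directory GFF3 before the final merge would show whether the duplicates already exist there, which is what the separate files in this thread suggest.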
> > > > >On Thu, 14 Aug 2014 15:46:44 +0000 > Carson Holt wrote: >>What version of MAKER are you using? I'd also need to see the GFF3 files >>before the merge. You may also need to turn off map_forward since you >>are >>passing in GFF3 with MAKER names, creating new models with MAKER names >>and >>then moving names from old models forward onto new ones (which may force >>names to be used twice). >> >>--Carson >> >> >>On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote: >> >>> >>>Thank you so much! >>> >>>However, I'm still, struggling, I'm afraid: I tried this 'two-step >>>merging' approach with >>>a subset of scaffolds and got duplicate IDs. >>> >>>Here is what I did: >>>- divided input scaffolds in two files >>>- run maker separately on these files (-> separate output dirs) >>>-- additional input: maker-generated gff3 from previous (singular) run >>>-- repeatmasking, snaphmm, gmhmm, augustus_species are given >>>-- map_forward=0 / 1 (I tried both, to the same effect) >>>- gff3_merge two times using index-log >>>- gff3_merge these two gff3 files >>> >>>$ >>>grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | >>>uniq -c | sort -n >>>| tail >>> 2 ID=snap_masked-scf7180005140699-processed-gene-0.19 >>> 2 ID=snap_masked-scf7180005140699-processed-gene-0.22 >>> 2 ID=snap_masked-scf7180005140699-processed-gene-1.36 >>> 2 ID=snap_masked-scf7180005140713-processed-gene-0.4 >>> 2 ID=snap_masked-scf7180005140744-processed-gene-0.4 >>> 2 ID=snap_masked-scf7180005140744-processed-gene-0.6 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.14 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.15 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.19 >>> 2 ID=snap_masked-scf7180005181475-processed-gene-0.3 >>> >>>$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | >>>grep "\sgene" >>>scf7180005181475 maker gene 9050 9385 . - . 
ID=snap_masked-scf7180005181 >>>47 >>>5-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0. >>>3 >>>scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf71800051814 >>>75 >>>-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3 >>> >>>- found duplicates! i.e. the same ID for gene annotations in different >>>areas of the same >>>scaffold (of 655 gene annotations, 51 appear twice) >>>-- this happens not only with gene, but also CDS and mRNA annotations, >>>as >>>far as I can >>>see (here, in one example, non-everlapping but close CDS snippets got >>>the >>>same ID). >>> >>> >>>I suspected this might have to do with the map_forward flag, but I get >>>the same problem >>>again (with genes at the same locations). >>>I attached one of the ctl files for you in case you want to have a look, >>>the other is >>>analogous. Do you need something else? >>> >>>What did I miss? This should not happen, right? >>> >>> >>> >>> >>>On Wed, 13 Aug 2014 15:52:34 +0000 >>> Carson Holt wrote: >>>>Yes. One cpu will have several processes, most are helper processes >>>>that >>>>will use 0% CPU almost all of the time (for example there is a shared >>>>variable manager process that will launch with MAKER but will also be >>>>called 'maker' under top because it is technically its child and not a >>>>separate script). Also system calls will launch a new process that >>>>will >>>>use all CPU while the process calling it will drop to 0% CPU until it >>>>finishes. >>>> >>>>Yes. Your explanation is correct. You then use gff3_merge to merge the >>>>GFF3 file. >>>> >>>>--Carson >>>> >>>> >>>> >>>>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: >>>> >>>>> >>>>>Our admin counts processes. Do I understand you right, that one CPU >>>>>handles several >>>>>processes? >>>>> >>>>>I'm still confused by the different directories (and I made a mistake >>>>>when asking last >>>>>time, I wanted to say 'If I do NOT start the jobs in the same >>>>>directory...). 
>>>>>So, if I start each piece of a genome in its own directory (for >>>>>example), >>>>>then it gets a >>>>>unique basename (because the output will be separate from all other >>>>>pieces anyway) and I >>>>>will not run dsindex but instead use gff3_merge for each piece's >>>>>output >>>>>and then once >>>>>again to merge all resulting gff3-files? >>>>> >>>>>Hope I got you right :) >>>>> >>>>>Thanks fopr your help! >>>>>Jeanne >>>>> >>>>> >>>>> >>>>>On Wed, 6 Aug 2014 15:45:56 +0000 >>>>> Carson Holt wrote: >>>>>>Is your admin counting processes or cpu usage? Because each system >>>>>>call >>>>>>creates a >>>>>>separate process, so you can expect multiple processes (each system >>>>>>call >>>>>>generates a new >>>>>>process) but only a single cpu of usage per instance. Use different >>>>>>directories if you >>>>>>are running that many jobs. You can concatenate the separate results >>>>>>when your done. >>>>>> Use gff3_merge script to help concatenate the separate GFF3 files >>>>>>generated from >>>>>>separate jobs. >>>>>> >>>>>>--Carson >>>>>> >>>>>>Sent from my iPhone >>>>>> >>>>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>>>> >>>>>>>wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>>>>threads. Our admin >>>>>>reports >>>>>>> however, that the processes seem to assemble more threads than they >>>>>>>are allowed. It is >>>>>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>>>>suggestion why? >>>>>>> >>>>>>> If I start the jobs in the same directory, how can I make sure they >>>>>>>write to the same >>>>>>> directory (as, I think is required to put the pieces together in >>>>>>>the >>>>>>>end?)? das >>>>>>-basename >>>>>>> take paths? >>>>>>> >>>>>>> >>>>>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>>>>> Carson Holt wrote: >>>>>>>> I think the freezing is because you are starting too many >>>>>>>>simultaneous jobs. You >>>>>>should >>>>>>>> try and use MPI to parallelize instead. 
The concurrent job way of >>>>>>>>doing things can >>>>>>>> start to cause problems If you are running 10 or more jobs in the >>>>>>>>same directory. You >>>>>>>> could try splitting them into different directories. >>>>>>>> >>>>>>>> --Carson >>>>>>>> >>>>>>>> Sent from my iPhone >>>>>>>> >>>>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>>>> >>>>>>>>>wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> aha, so this explains that. >>>>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>>>>than 60,000, >>>>>>roughly >>>>>>>>> half of the sequences being shorter than 3,000 bp. >>>>>>>>> >>>>>>>>> What do you think about this weird 'I am running but not really >>>>>>>>>doing >>>>>>>> anything'-behavior? >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks a lot! >>>>>>>>> Jeanne >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>>>>>> Carson Holt wrote: >>>>>>>>>> If you are starting and restarting, or running multiple jobs >>>>>>>>>>then >>>>>>>>>>the log can be >>>>>>>>>> partially rebuilt. On rebuild only the FINISHED entries are >>>>>>>>>>added. >>>>>>>>>> If there is a >>>>>>>> GFF3 >>>>>>>>>> result file for the contig, then it is FINISHED. FASTA files >>>>>>>>>>will >>>>>>>>>>only exist for >>>>>>the >>>>>>>>>> contigs that have gene models. Small contigs will rarely contain >>>>>>>>>>models. >>>>>>>>>> >>>>>>>>>> --Carson >>>>>>>>>> >>>>>>>>>> Sent from my iPhone >>>>>>>>>> >>>>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Carson, >>>>>>>>>>> >>>>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>>>>>genome which is split >>>>>>>>>> into >>>>>>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed >>>>>>>>>>>to >>>>>>>>>>>finish >>>>>>normally, >>>>>>>>>> while >>>>>>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>>>>>output. 
Do you have >>>>>>any >>>>>>>>>>> suggestion why this is happening? >>>>>>>>>>> >>>>>>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>>>>>found that of >>>>>>38.384 >>>>>>>>>>> mentioned scaffolds, 154 appear only once in the log. The >>>>>>>>>>>surprise >>>>>>>>>>>is, that 2/3 of >>>>>>>>>> these >>>>>>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>>>>>models for these >>>>>>>>>> 'finished' >>>>>>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>>>>>parts of the genome >>>>>>>>>> (i.e., >>>>>>>>>>> each of the 20 jobs contained scaffolds that 'did not start' >>>>>>>>>>>but >>>>>>>>>>>'finished') >>>>>>>>>>> Should this be an issue of concern? >>>>>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but >>>>>>>>>>>the >>>>>>>>>>>NFS files look >>>>>>>> good, >>>>>>>>>> so >>>>>>>>>>> we suspect something fishy going on... >>>>>>>>>>> >>>>>>>>>>> Hope you can help, >>>>>>>>>>> best wishes, >>>>>>>>>>> Jeanne Wilbrandt >>>>>>>>>>> >>>>>>>>>>> zmb // ZFMK // University of Bonn >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> maker-devel mailing list >>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>> >>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell- >>>>>>>>>>>la >>>>>>>>>>>b. 
>>>>>>>>>>>org >>>>>>> >>>>> >>>> >>>> >>> >> >> > From j.wilbrandt at zfmk.de Thu Aug 14 10:53:38 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Thu, 14 Aug 2014 17:53:38 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de> References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: It is version 2.31. My first try was done with map_forward=0, and (I just noticed) the duplicates are present in the separate gff3s already also in this case (one is attached). Has this something to do with the first-run-gff3 I fed it? On Thu, 14 Aug 2014 15:46:44 +0000 Carson Holt wrote: >What version of MAKER are you using? I'd also need to see the GFF3 files >before the merge. You may also need to turn off map_forward since you are >passing in GFF3 with MAKER names, creating new models with MAKER names and >then moving names from old models forward onto new ones (which may force >names to be used twice). > >--Carson > > >On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote: > >> >>Thank you so much! >> >>However, I'm still, struggling, I'm afraid: I tried this 'two-step >>merging' approach with >>a subset of scaffolds and got duplicate IDs. 
>> >>Here is what I did: >>- divided input scaffolds in two files >>- run maker separately on these files (-> separate output dirs) >>-- additional input: maker-generated gff3 from previous (singular) run >>-- repeatmasking, snaphmm, gmhmm, augustus_species are given >>-- map_forward=0 / 1 (I tried both, to the same effect) >>- gff3_merge two times using index-log >>- gff3_merge these two gff3 files >> >>$ >>grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | >>uniq -c | sort -n >>| tail >> 2 ID=snap_masked-scf7180005140699-processed-gene-0.19 >> 2 ID=snap_masked-scf7180005140699-processed-gene-0.22 >> 2 ID=snap_masked-scf7180005140699-processed-gene-1.36 >> 2 ID=snap_masked-scf7180005140713-processed-gene-0.4 >> 2 ID=snap_masked-scf7180005140744-processed-gene-0.4 >> 2 ID=snap_masked-scf7180005140744-processed-gene-0.6 >> 2 ID=snap_masked-scf7180005140754-processed-gene-0.14 >> 2 ID=snap_masked-scf7180005140754-processed-gene-0.15 >> 2 ID=snap_masked-scf7180005140754-processed-gene-0.19 >> 2 ID=snap_masked-scf7180005181475-processed-gene-0.3 >> >>$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | >>grep "\sgene" >>scf7180005181475 maker gene 9050 9385 . - . ID=snap_masked-scf718000518147 >>5-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3 >>scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf7180005181475 >>-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3 >> >>- found duplicates! i.e. the same ID for gene annotations in different >>areas of the same >>scaffold (of 655 gene annotations, 51 appear twice) >>-- this happens not only with gene, but also CDS and mRNA annotations, as >>far as I can >>see (here, in one example, non-everlapping but close CDS snippets got the >>same ID). >> >> >>I suspected this might have to do with the map_forward flag, but I get >>the same problem >>again (with genes at the same locations). 
>>I attached one of the ctl files for you in case you want to have a look, >>the other is >>analogous. Do you need something else? >> >>What did I miss? This should not happen, right? >> >> >> >> >>On Wed, 13 Aug 2014 15:52:34 +0000 >> Carson Holt wrote: >>>Yes. One cpu will have several processes, most are helper processes that >>>will use 0% CPU almost all of the time (for example there is a shared >>>variable manager process that will launch with MAKER but will also be >>>called 'maker' under top because it is technically its child and not a >>>separate script). Also system calls will launch a new process that will >>>use all CPU while the process calling it will drop to 0% CPU until it >>>finishes. >>> >>>Yes. Your explanation is correct. You then use gff3_merge to merge the >>>GFF3 file. >>> >>>--Carson >>> >>> >>> >>>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: >>> >>>> >>>>Our admin counts processes. Do I understand you right, that one CPU >>>>handles several >>>>processes? >>>> >>>>I'm still confused by the different directories (and I made a mistake >>>>when asking last >>>>time, I wanted to say 'If I do NOT start the jobs in the same >>>>directory...). >>>>So, if I start each piece of a genome in its own directory (for >>>>example), >>>>then it gets a >>>>unique basename (because the output will be separate from all other >>>>pieces anyway) and I >>>>will not run dsindex but instead use gff3_merge for each piece's output >>>>and then once >>>>again to merge all resulting gff3-files? >>>> >>>>Hope I got you right :) >>>> >>>>Thanks fopr your help! >>>>Jeanne >>>> >>>> >>>> >>>>On Wed, 6 Aug 2014 15:45:56 +0000 >>>> Carson Holt wrote: >>>>>Is your admin counting processes or cpu usage? Because each system >>>>>call >>>>>creates a >>>>>separate process, so you can expect multiple processes (each system >>>>>call >>>>>generates a new >>>>>process) but only a single cpu of usage per instance. 
Use different >>>>>directories if you >>>>>are running that many jobs. You can concatenate the separate results >>>>>when your done. >>>>> Use gff3_merge script to help concatenate the separate GFF3 files >>>>>generated from >>>>>separate jobs. >>>>> >>>>>--Carson >>>>> >>>>>Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>>>wrote: >>>>>> >>>>>> >>>>>> >>>>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>>>threads. Our admin >>>>>reports >>>>>> however, that the processes seem to assemble more threads than they >>>>>>are allowed. It is >>>>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>>>suggestion why? >>>>>> >>>>>> If I start the jobs in the same directory, how can I make sure they >>>>>>write to the same >>>>>> directory (as, I think is required to put the pieces together in the >>>>>>end?)? das >>>>>-basename >>>>>> take paths? >>>>>> >>>>>> >>>>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>>>> Carson Holt wrote: >>>>>>> I think the freezing is because you are starting too many >>>>>>>simultaneous jobs. You >>>>>should >>>>>>> try and use MPI to parallelize instead. The concurrent job way of >>>>>>>doing things can >>>>>>> start to cause problems If you are running 10 or more jobs in the >>>>>>>same directory. You >>>>>>> could try splitting them into different directories. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>>> >>>>>>>>wrote: >>>>>>>> >>>>>>>> >>>>>>>> aha, so this explains that. >>>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>>>than 60,000, >>>>>roughly >>>>>>>> half of the sequences being shorter than 3,000 bp. >>>>>>>> >>>>>>>> What do you think about this weird 'I am running but not really >>>>>>>>doing >>>>>>> anything'-behavior? >>>>>>>> >>>>>>>> >>>>>>>> Thanks a lot! 
>>>>>>>> Jeanne
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000
>>>>>>>> Carson Holt wrote:
>>>>>>>>> If you are starting and restarting, or running multiple jobs then
>>>>>>>>> the log can be partially rebuilt. On rebuild only the FINISHED
>>>>>>>>> entries are added.
>>>>>>>>> If there is a GFF3 result file for the contig, then it is
>>>>>>>>> FINISHED. FASTA files will only exist for the contigs that have
>>>>>>>>> gene models. Small contigs will rarely contain models.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Carson,
>>>>>>>>>>
>>>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a
>>>>>>>>>> genome which is split into 20 parts, using the -g flag and the
>>>>>>>>>> same basename.
>>>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed
>>>>>>>>>> to finish normally, while the remaining three seemed to be
>>>>>>>>>> stalled and produced 0B of output. Do you have any suggestion
>>>>>>>>>> why this is happening?
>>>>>>>>>>
>>>>>>>>>> After I stopped these stalled jobs, I checked the index.log and
>>>>>>>>>> found that of 38,384 mentioned scaffolds, 154 appear only once
>>>>>>>>>> in the log. The surprise is, that 2/3 of these only appear as
>>>>>>>>>> FINISHED (the rest only started). There are no models for these
>>>>>>>>>> 'finished' scaffolds stored in the .db and they are distributed
>>>>>>>>>> over all parts of the genome (i.e., each of the 20 jobs
>>>>>>>>>> contained scaffolds that 'did not start' but 'finished')
>>>>>>>>>> Should this be an issue of concern?
>>>>>>>>>> It might be an NFS lock problem, as NFS is heavily loaded, but
>>>>>>>>>> the NFS files look good, so we suspect something fishy going
>>>>>>>>>> on...
>>>>>>>>>>
>>>>>>>>>> Hope you can help,
>>>>>>>>>> best wishes,
>>>>>>>>>> Jeanne Wilbrandt
>>>>>>>>>>
>>>>>>>>>> zmb // ZFMK // University of Bonn
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> maker-devel mailing list
>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: splitrun_problem_01_all.gff3
Type: application/octet-stream
Size: 4967463 bytes
Desc: not available
URL:

From daniel.standage at gmail.com Thu Aug 21 10:33:33 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Thu, 21 Aug 2014 11:33:33 -0400
Subject: [maker-devel] tRNAscan GFF3
Message-ID:

Greetings!

I have a quick question about Maker's handling of tRNAscan output,
particularly tRNAs containing introns. If I haven't missed something, it
looks like Maker reports the second exon on the opposite strand as the
first exon, the tRNA feature, and the gene feature? Am I reading this
correctly?

I don't think this representation makes sense. The second exon is
complementary to the first (hence the folding), but it is not encoded on or
transcribed from the opposite strand. Unless I've misunderstood something,
I would suggest that the correct representation would be to have all
features on the same strand.

Thanks,
Daniel

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Thu Aug 21 10:35:16 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 21 Aug 2014 09:35:16 -0600 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: It should be on the same strand. Which MAKER version are you using? --Carson From: Daniel Standage Date: Thursday, August 21, 2014 at 9:33 AM To: Maker Mailing List Subject: [maker-devel] tRNAscan GFF3 Greetings! I have a quick question about Maker's handling of tRNAscan output, particularly tRNAs containing introns. If I haven't missed something, it looks like Maker reports the second exon on the opposite strand as the first exon, the tRNA feature, and the gene feature? Am I reading this correctly? I don't think this representation makes sense. The second exon is complementary to the first (hence the folding), but it is not encoded on or transcribed from the opposite strand. Unless I've misunderstood something, I would suggest that the correct representation would be to have all features on the same strand. Thanks, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Thu Aug 21 10:36:41 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 21 Aug 2014 11:36:41 -0400 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: This annotation was generated using Maker 2.31.3. -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: > It should be on the same strand. Which MAKER version are you using? 
> > --Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:33 AM > To: Maker Mailing List > Subject: [maker-devel] tRNAscan GFF3 > > Greetings! > > I have a quick question about Maker's handling of tRNAscan output, > particularly tRNAs containing introns. If I haven't missed something, it > looks like Maker reports the second exon on the opposite strand as the > first exon, the tRNA feature, and the gene feature? Am I reading this > correctly? > > I don't think this representation makes sense. The second exon is > complementary to the first (hence the folding), but it is not encoded on or > transcribed from the opposite strand. Unless I've misunderstood something, > I would suggest that the correct representation would be to have all > features on the same strand. > > Thanks, > Daniel > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Aug 21 10:49:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 21 Aug 2014 09:49:36 -0600 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: I half way remember some tRNAscan bugs being fixed in several of the sub versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I believe and most 2.31 updates were related to tRNAscan). Current version is 2.31.6. Could you give it a try and see if it is still giving you the issue. 
I did a quick look through the archives and I think this was found and
fixed -->
https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ

Thanks,
Carson


From: Daniel Standage
Date: Thursday, August 21, 2014 at 9:36 AM
To: Carson Holt
Cc: Maker Mailing List
Subject: Re: [maker-devel] tRNAscan GFF3

This annotation was generated using Maker 2.31.3.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote:
> It should be on the same strand. Which MAKER version are you using?
>
> --Carson
>
>
> From: Daniel Standage
> Date: Thursday, August 21, 2014 at 9:33 AM
> To: Maker Mailing List
> Subject: [maker-devel] tRNAscan GFF3
>
> Greetings!
>
> I have a quick question about Maker's handling of tRNAscan output,
> particularly tRNAs containing introns. If I haven't missed something, it
> looks like Maker reports the second exon on the opposite strand as the
> first exon, the tRNA feature, and the gene feature? Am I reading this
> correctly?
>
> I don't think this representation makes sense. The second exon is
> complementary to the first (hence the folding), but it is not encoded on or
> transcribed from the opposite strand. Unless I've misunderstood something,
> I would suggest that the correct representation would be to have all
> features on the same strand.
>
> Thanks,
> Daniel
>
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
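
The symptom discussed in this thread, an exon reported on a different strand from its parent tRNA and gene features, can be checked for mechanically. What follows is only a minimal sketch, assuming a tab-separated GFF3 file whose ninth column carries the standard ID= and Parent= attributes. The function name check_strands and the file name in the usage line are hypothetical, and the function only reports mismatches; it does not correct them.

```shell
# Sketch: list every feature whose strand differs from the strand of a parent.
# Assumes tab-separated GFF3 with ID= and Parent= attributes in column 9.
check_strands() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }                 # skip comments and short lines
        {
            id = ""; parent = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)     id = substr(attrs[i], 4)
                if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
            }
            if (id != "") strand[id] = $7       # remember strand of each ID
            if (parent != "") {                 # remember each child for later
                count++
                child_parent[count] = parent
                child_strand[count] = $7
                child_desc[count] = $1 "\t" $3 "\t" $4 "\t" $5
            }
        }
        END {
            # Compare at the end so parents may appear after their children.
            for (c = 1; c <= count; c++) {
                m = split(child_parent[c], parents, ",")
                for (j = 1; j <= m; j++) {
                    p = parents[j]
                    if ((p in strand) && strand[p] != child_strand[c])
                        print child_desc[c] "\t" child_strand[c] "\tparent " p " is on " strand[p]
                }
            }
        }
    ' "$1"
}
```

Usage, with a hypothetical file name: check_strands my_genome.all.gff. An empty result means no child feature disagrees with a parent on strand; any line printed names the sequence, feature type, coordinates and the two conflicting strands.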
URL: From rens.holmer at wur.nl Tue Aug 19 04:19:08 2014 From: rens.holmer at wur.nl (rens holmer) Date: Tue, 19 Aug 2014 11:19:08 +0200 Subject: [maker-devel] Maker error mpiexec Message-ID: Hi, I am trying to run maker using MPI, and I get an error I do not understand. Maker version: 2.13.6 mpiexec version: mpiexec (OpenRTE) 1.6.5 When I run ./Build status it is reported that MPI is enabled. When I run mpiexec -n 40 maker I get the following errors: [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? 
(ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) -------------------------------------------------------------------------- It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): opal_shmem_base_select failed --> Returned value -1 instead of OPAL_SUCCESS -------------------------------------------------------------------------- -------------------------------------------------------------------------- Etcetera etcetera. However: when I search for the files reported as missing I do find them, and I don't believe they are from a different version of MPI? Am I using a wrong version of MPI? Any help would be appreciated, Sincerely, Rens Holmer -------------- next part -------------- An HTML attachment was scrubbed... 
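
The errors above ask whether components were "compiled for a different version of Open MPI". One way to start narrowing that down is to check whether the mpiexec used at run time and the mpicc used at build time come from the same installation. The following is a minimal sketch, assuming both are ordinary executables on PATH and that an installation keeps its tools under one prefix/bin directory. The helper names tool_prefix and same_prefix are hypothetical, and a matching prefix does not by itself prove the versions match (symlinked or distribution-packaged installs can mislead it).

```shell
# Print the install prefix implied by a tool found on PATH,
# e.g. /opt/openmpi for /opt/openmpi/bin/mpicc.
tool_prefix() {
    tool_path=$(command -v "$1") || return 1
    dirname "$(dirname "$tool_path")"
}

# Compare the prefixes of two tools and report whether they match.
# Returns 0 on match, 1 on mismatch, 2 if either tool is not found.
same_prefix() {
    first=$(tool_prefix "$1") || { echo "$1 not found on PATH"; return 2; }
    second=$(tool_prefix "$2") || { echo "$2 not found on PATH"; return 2; }
    if [ "$first" = "$second" ]; then
        echo "match: $first"
    else
        echo "mismatch: $1 in $first, $2 in $second"
        return 1
    fi
}
```

Usage: same_prefix mpiexec mpicc. A mismatch suggests the build and the run are picking up different MPI installations, which is worth ruling out before looking further.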
URL:

From Timothy.Stitt at tgac.ac.uk Thu Aug 21 15:05:46 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 21 Aug 2014 20:05:46 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
Message-ID:

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000
system (with over 2000 cores) and the application appears to be generating
large amounts of 'ps' processes that are overwhelming the system and causing
the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is
there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Aug 21 15:17:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:17:22 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
Message-ID:

MAKER uses 'ps' every so often to check on certain processes to make sure
they haven't failed or become zombies. On your system these 'ps' calls may
be hanging which would cause them to build up over time. You can try and run
MAKER with the '-nolock' flag, since it is the NFS file locking that
requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and
change it as follows.
Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;

        bless($self, $class);
        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but it
may experience the same hang as 'ps' is experiencing. Also you will need to
install 'Proc::ProcessTable' via CPAN for it to work, and that particular
module may not install on some Linux systems.

--Carson


From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000
system (with over 2000 cores) and the application appears to be generating
large amounts of 'ps' processes that are overwhelming the system and causing
the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is
there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Aug 21 15:21:19 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:21:19 -0600
Subject: [maker-devel] Maker error mpiexec
In-Reply-To:
References:
Message-ID:

You need to make sure the same version of MPI is used to compile and run MAKER.
When installing MAKER make sure the mpi.h and mpicc indicated during
configuration come from the same version of OpenMPI as the mpiexec command
you are using now.

Also for OpenMPI run the following command before setting up or launching
MAKER -->
export LD_PRELOAD=.../openmpi_location/lib/libmpi.so

Replace openmpi_location in the above command with the location of your
OpenMPI.

Setting LD_PRELOAD is required for OpenMPI to work correctly with shared
libraries.

Also you may need to add the following to your MPI command before running
MAKER.
--> -mca btl ^openib
Example --> mpiexec -mca btl ^openib -n 40 maker

Thanks,
Carson


From: rens holmer
Date: Tuesday, August 19, 2014 at 3:19 AM
To:
Subject: [maker-devel] Maker error mpiexec

Hi,

I am trying to run maker using MPI, and I get an error I do not understand.

Maker version: 2.13.6
mpiexec version: mpiexec (OpenRTE) 1.6.5

When I run ./Build status it is reported that MPI is enabled.

When I run mpiexec -n 40 maker I get the following errors:

[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI?
(ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) -------------------------------------------------------------------------- It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. 
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): opal_shmem_base_select failed --> Returned value -1 instead of OPAL_SUCCESS -------------------------------------------------------------------------- -------------------------------------------------------------------------- Etcetera etcetera. However: when I search for the files reported as missing I do find them, and I don't believe they are from a different version of MPI? Am I using a wrong version of MPI? Any help would be appreciated, Sincerely, Rens Holmer _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Aug 21 15:27:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 21 Aug 2014 14:27:14 -0600 Subject: [maker-devel] MAKER and large number of 'ps' processes In-Reply-To: References: Message-ID: FYI. If you use the -nolock flag, never start MAKER more than once in the same directory. The lack of file locks means MAKER won't detect the other active process and they can end up overwriting each others output. So do any parallelization via MPI instead. Thanks, Carson From: Carson Holt Date: Thursday, August 21, 2014 at 2:17 PM To: "Timothy Stitt (TGAC)" , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. 
Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;

        bless($self, $class);
        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but it
may experience the same hang as 'ps' is experiencing. Also you will need to
install 'Proc::ProcessTable' via CPAN for it to work, and that particular
module may not install on some Linux systems.

--Carson


From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000
system (with over 2000 cores) and the application appears to be generating
large amounts of 'ps' processes that are overwhelming the system and causing
the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is
there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rens.holmer at wur.nl Fri Aug 22 05:43:20 2014
From: rens.holmer at wur.nl (rens holmer)
Date: Fri, 22 Aug 2014 12:43:20 +0200
Subject: [maker-devel] Maker error mpiexec
In-Reply-To:
References:
Message-ID:

Thank you!
export LD_PRELOAD=?/openmpi_location/lib/libmpi.so mpiexec -mca btl ^openib -n 40 maker Those two tweaks did the trick! Sincerely, Rens Holmer On Thu, Aug 21, 2014 at 10:21 PM, Carson Holt wrote: > You need to make sure the same version of MPI is used to compile and run > MAKER. When installing MAKER make sure the mpi.h and mpicc indicated > during configuration come from the same version of OpenMPI as the mpiexec > command you are using now. > > Also for OpenMPI run the following command before setting up or launching > MAKER --> > export LD_PRELOAD=?/openmpi_location/lib/libmpi.so > > replace openmpi_location in the above command with the location of your > OpenMPI. > > Setting LD_PRELOAD preload is required for OpenMPI to work correctly with > shared libraries. > > > Also you may need to add the following to your MPI command before running > MAKER. > --> -mca btl ^openib > Example --> mpiexec -mca btl ^openib -n 40 maker > > Thanks, > Carson > > > > From: rens holmer > Date: Tuesday, August 19, 2014 at 3:19 AM > To: > Subject: [maker-devel] Maker error mpiexec > > Hi, > > I am trying to run maker using MPI, and I get an error I do not understand. > > Maker version: 2.13.6 > mpiexec version: mpiexec (OpenRTE) 1.6.5 > > When I run ./Build status it is reported that MPI is enabled. > > When I run mpiexec -n 40 maker I get the following errors: > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, > or compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, > or compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing > symbol, or compiled for a different version of Open MPI? 
(ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing > symbol, or compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > -------------------------------------------------------------------------- > > It looks like opal_init failed for some reason; your parallel process is > > likely to abort. 
There are many reasons that a parallel process can > > fail during opal_init; some of which are due to configuration or > > environment problems. This failure appears to be an internal failure; > > here's some additional information (which may only be relevant to an > > Open MPI developer): > > > opal_shmem_base_select failed > > --> Returned value -1 instead of OPAL_SUCCESS > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > > > Etcetera etcetera. > > However: when I search for the files reported as missing I do find them, > and I don't believe they are from a different version of MPI? > > Am I using a wrong version of MPI? > > Any help would be appreciated, > > Sincerely, > > > Rens Holmer > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ranjani at uga.edu Tue Aug 26 09:53:25 2014 From: ranjani at uga.edu (Sivaranjani Namasivayam) Date: Tue, 26 Aug 2014 14:53:25 +0000 Subject: [maker-devel] MAKER run error -with blast Message-ID: <1409064805543.27602@uga.edu> Hi, I have been using MAKER for a while and its been running fine. Recently I am encountering an error (attaching the error from the error log file - error1.txt). As input I am providing the fasta file of a scaffold, a transcriptome dataset(in gff) and a protein dataset (as fasta). These kind of input files have run successfully in the past. The file that is reported as 'No such file or directory at' in the error ouptut changes in different runs. To make sure I wasn't doing something wrong, I reran a dataset that had run successfully before, but I get an error with that too. (error log attached as error2.txt). 
The only difference in this run, previously I ran it for the entire genome, and now I am testing it on just one scaffold. Would you have any idea of why this might be happening? Thanks, Ranjani -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: error1.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: error2.txt URL: From carsonhh at gmail.com Tue Aug 26 10:03:28 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 Aug 2014 09:03:28 -0600 Subject: [maker-devel] MAKER run error -with blast Message-ID: Make sure you are not setting TMP= in the maker_opts.ctl file to an NFS mounted location. Also check your /tmp directory to see if it is full or nearly full (it will be mounted on a different drive than your working directory). Also if it is being caused by slow NFS response you can set clean_try=1 and it will do complete retry on the contig rather than trying to recover partial files. --Carson From: Sivaranjani Namasivayam Date: Tuesday, August 26, 2014 at 8:53 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] MAKER run error -with blast Hi, I have been using MAKER for a while and its been running fine. Recently I am encountering an error (attaching the error from the error log file - error1.txt). As input I am providing the fasta file of a scaffold, a transcriptome dataset(in gff) and a protein dataset (as fasta). These kind of input files have run successfully in the past. The file that is reported as 'No such file or directory at' in the error ouptut changes in different runs. To make sure I wasn't doing something wrong, I reran a dataset that had run successfully before, but I get an error with that too. (error log attached as error2.txt). 
The only difference in this run, previously I ran it for the entire genome, and now I am testing it on just one scaffold. Would you have any idea of why this might be happening? Thanks, Ranjani _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Tue Aug 26 10:55:40 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Tue, 26 Aug 2014 11:55:40 -0400 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: Sorry for the delayed response. In the mean time, I wrote a tiny script to correct the erroneous tRNA annotations. I just now took a few minutes to download 2.31.6, and can confirm that the tRNA exon strands are consistent. Best, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:49 AM, Carson Holt wrote: > I half way remember some tRNAscan bugs being fixed in several of the sub > versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I > believe and most 2.31 updates were related to tRNAscan). Current version > is 2.31.6. Could you give it a try and see if it is still giving you the > issue. > > I did a quick look through the archives and I think this was found and > fixed --> > https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ > > Thanks, > Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:36 AM > To: Carson Holt > Cc: Maker Mailing List > Subject: Re: [maker-devel] tRNAscan GFF3 > > This annotation was generated using Maker 2.31.3. > > > -- > Daniel S. Standage > Ph.D. 
Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: > >> It should be on the same strand. Which MAKER version are you using? >> >> --Carson >> >> >> From: Daniel Standage >> Date: Thursday, August 21, 2014 at 9:33 AM >> To: Maker Mailing List >> Subject: [maker-devel] tRNAscan GFF3 >> >> Greetings! >> >> I have a quick question about Maker's handling of tRNAscan output, >> particularly tRNAs containing introns. If I haven't missed something, it >> looks like Maker reports the second exon on the opposite strand as the >> first exon, the tRNA feature, and the gene feature? Am I reading this >> correctly? >> >> I don't think this representation makes sense. The second exon is >> complementary to the first (hence the folding), but it is not encoded on or >> transcribed from the opposite strand. Unless I've misunderstood something, >> I would suggest that the correct representation would be to have all >> features on the same strand. >> >> Thanks, >> Daniel >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Aug 26 11:06:26 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 Aug 2014 10:06:26 -0600 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: Thanks. --Carson From: Daniel Standage Date: Tuesday, August 26, 2014 at 9:55 AM To: Carson Holt Cc: Maker Mailing List Subject: Re: [maker-devel] tRNAscan GFF3 Sorry for the delayed response. In the mean time, I wrote a tiny script to correct the erroneous tRNA annotations. 
I just now took a few minutes to download 2.31.6, and can confirm that the tRNA exon strands are consistent. Best, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:49 AM, Carson Holt wrote: > I half way remember some tRNAscan bugs being fixed in several of the sub > versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I believe > and most 2.31 updates were related to tRNAscan). Current version is 2.31.6. > Could you give it a try and see if it is still giving you the issue. > > I did a quick look through the archives and I think this was found and fixed > --> > https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-dev > el/Z-kvf_V2ynU/vstSNjHgyJQJ > > Thanks, > Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:36 AM > To: Carson Holt > Cc: Maker Mailing List > Subject: Re: [maker-devel] tRNAscan GFF3 > > This annotation was generated using Maker 2.31.3. > > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: >> It should be on the same strand. Which MAKER version are you using? >> >> --Carson >> >> >> From: Daniel Standage >> Date: Thursday, August 21, 2014 at 9:33 AM >> To: Maker Mailing List >> Subject: [maker-devel] tRNAscan GFF3 >> >> Greetings! >> >> I have a quick question about Maker's handling of tRNAscan output, >> particularly tRNAs containing introns. If I haven't missed something, it >> looks like Maker reports the second exon on the opposite strand as the first >> exon, the tRNA feature, and the gene feature? Am I reading this correctly? >> >> I don't think this representation makes sense. The second exon is >> complementary to the first (hence the folding), but it is not encoded on or >> transcribed from the opposite strand. 
Unless I've misunderstood something, I >> would suggest that the correct representation would be to have all features >> on the same strand. >> >> Thanks, >> Daniel >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University From Hossein.Borhan at AGR.GC.CA Wed Aug 27 10:52:54 2014 From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein) Date: Wed, 27 Aug 2014 15:52:54 +0000 Subject: [maker-devel] non-redundant fasta and gff Message-ID: Hi, Is there a way to produce a FASTA file and GFF for a set of non-redundant genes predicted by the Maker software? Fasta-merge and gff-merge generate a file that has different predictions (e.g. generated by Augustus, GeneMark etc.) for the same gene as individual genes. Regards, Hossein From carsonhh at gmail.com Wed Aug 27 10:57:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 27 Aug 2014 09:57:10 -0600 Subject: [maker-devel] non-redundant fasta and gff Message-ID: The FASTA files created for augustus, snap, etc. are only for reference purposes. They are the raw ab initio predictions produced by these algorithms run by themselves (they are match/match_part features in the GFF3 file). The files you want are maker.transcripts.fasta and maker.proteins.fasta. They contain the non-redundant final annotations, the same ones that are marked as gene/mRNA/exon/CDS features in the GFF3 file. --Carson On 8/27/14, 9:52 AM, "Borhan, Hossein" wrote: >Hi > > >Is there a way to produce a fasta file and gff for a set of non-redundant >genes predicted by the Maker software. Fasta-merge and gff-merge generate >a file that has different prediction (e.g generated by Augustus, >GeneMark etc.
) for the same gene as individual genes. > > > >Regards > > >Hossein From carsonhh at gmail.com Wed Aug 27 10:58:47 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 27 Aug 2014 09:58:47 -0600 Subject: [maker-devel] non-redundant fasta and gff In-Reply-To: References: Message-ID: Please see the documentation wiki for explanations of how to read and use MAKER's output. http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#MAKER.27s_Output Thanks, Carson On 8/27/14, 9:57 AM, "Carson Holt" wrote: >The fasta files created for augustus, snap, etc. are only for reference >purposes. They are the raw ab initio prediction produced by these >algorithms ran by themselves (they are match/match_part features in the >GFF3 file). The file you want is the maker.transcripts.fasta and >maker.proteins.fasta files. They contain the non-redundant final >annotations. They are the same ones that are marked as gene/mRNA/exon/CDS >features in the GFF3 file. > >--Carson > > >On 8/27/14, 9:52 AM, "Borhan, Hossein" wrote: > >>Hi >> >> >>Is there a way to produce a fasta file and gff for a set of non-redundant >>genes predicted by the Maker software. Fasta-merge and gff-merge generate >>a file that has different prediction (e.g generated by Augustus, >>GeneMark etc. ) for the same gene as individual genes.
>> >> >> >>Regards >> >> >>Hossein From carsonhh at gmail.com Tue Aug 5 14:21:51 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 05 Aug 2014 14:21:51 -0600 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> Message-ID: Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)? --Carson From: Marc Höppner Date: Wednesday, July 30, 2014 at 4:44 AM To: Subject: [maker-devel] Maker GFF output with features of 0 length Hi, I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example: scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...
I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. Regards, Marc From carsonhh at gmail.com Tue Aug 5 14:26:51 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 05 Aug 2014 14:26:51 -0600 Subject: [maker-devel] Early obstacle with SplitDB In-Reply-To: References: Message-ID: There are four likely causes: (1) TMP= in your maker_opts.ctl file points to an NFS-mounted directory (it must be locally mounted); (2) the drive containing the directory specified by TMP= (defaults to /tmp) is full or nearly full; (3) your input file is not in proper FASTA format; or (4) you are using an out-of-date version of BioPerl. Check the first three, then look at BioPerl. The BioPerl version should be printed as part of the debug output. --Carson From: Kevin Tsai Date: Tuesday, August 5, 2014 at 4:59 AM To: Subject: [maker-devel] Early obstacle with SplitDB Hello, I'm a new user to Maker so I suspect this will be a simple question, but I am having trouble finding documentation on SplitDB. Our IT admin set up the application and I'm running into the following issue about 30 seconds after kickoff. Below is the debugged output: STATUS: Parsing control files... Calling GI::load_control_files at /usr/bin/maker line 452. Calling GI::new_instance_temp at /usr/bin/maker line 463. Calling GI::mount_check at /usr/bin/maker line 465. Calling GI::set_global_temp at /usr/bin/maker line 483. STATUS: Processing and indexing input FASTA files... Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling List::Util::shuffle at /usr/bin/maker line 529. Calling GI::split_db at /usr/bin/maker line 536. Calling File::Path::rmtree at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling Iterator::Any::nextDef at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling mkdir at /usr/bin/maker line 537. Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. Calling system at /usr/bin/maker line 537. ERROR: SplitDB not created correctly at /usr/local/share/perl5/GI.pm line 1144. GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at /usr/bin/maker line 537 --> rank=NA, hostname=Za2.cglab Any suggestions? Thank you in advance! -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica From carsonhh at gmail.com Tue Aug 5 14:49:33 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 05 Aug 2014 14:49:33 -0600 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> Message-ID: One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this, for example. Do you have non-CDS feature types where this happens, or any internal CDSs where this happens?
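The question above (which feature types have identical start and end, and whether any of them are internal CDS segments) can be checked with a short script. The following is a minimal sketch in Python and is not part of MAKER: the function name `single_base_features` is hypothetical, and it assumes a standard nine-column, tab-separated GFF3 file in which each CDS row carries a single `Parent` attribute.

```python
from collections import Counter, defaultdict


def single_base_features(gff_lines):
    """Summarise GFF3 features whose start equals their end.

    Returns (counts, internal_cds): counts maps feature type to the number
    of single-base features; internal_cds lists (seqid, position, parent)
    for single-base CDS rows that are neither first nor last in their mRNA.
    """
    counts = Counter()
    cds_by_parent = defaultdict(list)

    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature row
        seqid, _source, ftype, start, end = cols[:5]
        start, end = int(start), int(end)
        if start == end:
            counts[ftype] += 1
        if ftype == "CDS":
            attrs = dict(
                field.split("=", 1) for field in cols[8].split(";") if "=" in field
            )
            parent = attrs.get("Parent", "")  # assumption: one parent per CDS
            cds_by_parent[(seqid, parent)].append((start, end))

    internal_cds = []
    for (seqid, parent), segments in cds_by_parent.items():
        segments.sort()  # coordinate order; terminal segments are first and last
        for start, end in segments[1:-1]:
            if start == end:
                internal_cds.append((seqid, start, parent))
    return counts, internal_cds
```

Passing an open GFF3 file handle to `single_base_features` would show at a glance whether only terminal CDS/exon rows are affected (which Augustus can legitimately produce) or whether other feature types and internal CDS segments are affected as well.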
--Carson From: Carson Holt Date: Tuesday, August 5, 2014 at 2:21 PM To: Marc Höppner Subject: Re: [maker-devel] Maker GFF output with features of 0 length Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)? --Carson From: Marc Höppner Date: Wednesday, July 30, 2014 at 4:44 AM To: Subject: [maker-devel] Maker GFF output with features of 0 length Hi, I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example: scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still... I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. Regards, Marc -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Wed Aug 6 01:03:26 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 06 Aug 2014 01:03:26 -0600 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> Message-ID: If it is happening only with GFF3 pass-through, then it may be something I saw and fixed a while ago (there were some GFF3 pass-through fixes since 2.31.4). Could you check and see if it still happens in 2.31.6? Also if it is only the first or last CDS/exon, then Augustus can do that and it's not actually a bug. Basically it is truncating the model to the start/stop codon so the first or last exon/CDS may appear short, but it's really just incomplete. If you can find any example of a non-CDS/exon feature then could you send it to me? Thanks, Carson From: Marc Höppner Date: Wednesday, July 30, 2014 at 4:44 AM To: Subject: [maker-devel] Maker GFF output with features of 0 length Hi, I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example: scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...
I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. Regards, Marc From carson.holt at genetics.utah.edu Wed Aug 6 01:15:04 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 6 Aug 2014 07:15:04 +0000 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com> References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com> Message-ID: Ok. I took a look and I'm relatively sure the issue you are seeing is caused by GFF3 pass-through combined with correct_est_fusion=1. This is something that only happens when both are used simultaneously and should be corrected in the current version of MAKER. Thanks, Carson From: Marc Höppner Date: Wednesday, August 6, 2014 at 12:14 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Maker GFF output with features of 0 length Hi, I suspect that Augustus plays a role, since the affected features are seeded by augustus (based on the name anyway). What I found was that this seems to only happen when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). And this seems to be quite reproducible - and doesn't only affect CDS features. I have put the Maker output for a test scaffold here: https://dl.dropboxusercontent.com/u/1918141/maker_output.tar.bz2 The problematic lines: scaffold_563 maker five_prime_UTR 38501 38501 . - .
ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1 scaffold_563 maker exon 69967 69967 . - . ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1 scaffold_563 maker CDS 69967 69967 . - 1 ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1 Strange stuff? Regards, Marc On 05 Aug 2014, at 22:49, Carson Holt wrote: One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this, for example. Do you have non-CDS feature types where this happens, or any internal CDSs where this happens? --Carson From: Carson Holt Date: Tuesday, August 5, 2014 at 2:21 PM To: Marc Höppner Subject: Re: [maker-devel] Maker GFF output with features of 0 length Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)? --Carson From: Marc Höppner Date: Wednesday, July 30, 2014 at 4:44 AM To: Subject: [maker-devel] Maker GFF output with features of 0 length Hi, I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example: scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data?
Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still... I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. Regards, Marc From j.wilbrandt at zfmk.de Wed Aug 6 06:40:19 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 06 Aug 2014 14:40:19 +0200 Subject: [maker-devel] Further split genome questions Message-ID: Hi Carson, I ran into more conspicuous behavior running maker 2.31 on a genome which is split into 20 parts, using the -g flag and the same basename. Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while the remaining three seemed to be stalled and produced 0B of output. Do you have any suggestion why this is happening? After I stopped these stalled jobs, I checked the index.log and found that of 38.384 mentioned scaffolds, 154 appear only once in the log. The surprise is that 2/3 of these only appear as FINISHED (the rest only started). There are no models for these 'finished' scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., each of the 20 jobs contained scaffolds that 'did not start' but 'finished'). Should this be an issue of concern? It might be an NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so we suspect something fishy going on...
Hope you can help, best wishes, Jeanne Wilbrandt zmb // ZFMK // University of Bonn From carsonhh at gmail.com Wed Aug 6 08:16:52 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 6 Aug 2014 08:16:52 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: <780B8D9B-94FB-4282-9611-632C7CB532DC@gmail.com> If you are starting and restarting, or running multiple jobs then the log can be partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 result file for the contig, then it is FINISHED. FASTA files will only exist for the contigs that have gene models. Small contigs will rarely contain models. --Carson Sent from my iPhone > On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: > > > Hi Carson, > > I ran into more conspicuous behavior running maker 2.31 on a genome which is split into > 20 parts, using the -g flag and the same basename. > Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while > the remaining three seemed to be stalled and produced 0B of output. Do you have any > suggestion why this is happening? > > After I stopped these stalled jobs, I checked the index.log and found that of 38.384 > mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these > only appear as FINISHED (the rest only started). There are no models for these 'finished' > scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., > each of the 20 jobs contained scaffolds that 'did not start' but 'finished') > Should this be an issue of concern? > It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so > we suspect something fishy going on... 
> > Hope you can help, > best wishes, > Jeanne Wilbrandt > > zmb // ZFMK // University of Bonn From dence at genetics.utah.edu Wed Aug 6 08:18:28 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 6 Aug 2014 14:18:28 +0000 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: <736D63C9-1393-4FFB-8553-262454C44BC1@genetics.utah.edu> Hi Jeanne, what's the average length of those 154 scaffolds that only appeared once in the log? Is the length pretty consistent among those scaffolds? ~Daniel On Aug 6, 2014, at 6:40 AM, Jeanne Wilbrandt wrote: > > Hi Carson, > > I ran into more conspicuous behavior running maker 2.31 on a genome which is split into > 20 parts, using the -g flag and the same basename. > Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while > the remaining three seemed to be stalled and produced 0B of output. Do you have any > suggestion why this is happening? > > After I stopped these stalled jobs, I checked the index.log and found that of 38.384 > mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these > only appear as FINISHED (the rest only started). There are no models for these 'finished' > scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., > each of the 20 jobs contained scaffolds that 'did not start' but 'finished') > Should this be an issue of concern? > It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so > we suspect something fishy going on...
> > Hope you can help, > best wishes, > Jeanne Wilbrandt > > zmb // ZFMK // University of Bonn > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From j.wilbrandt at zfmk.de Wed Aug 6 09:01:02 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 06 Aug 2014 17:01:02 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: aha, so this explains that. Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly half of the sequences being shorter than 3,000 bp. What do you think about this weird 'I am running but not really doing anything'-behavior? Thanks a lot! Jeanne On Wed, 6 Aug 2014 14:16:52 +0000 Carson Holt wrote: >If you are starting and restarting, or running multiple jobs then the log can be >partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 >result file for the contig, then it is FINISHED. FASTA files will only exist for the >contigs that have gene models. Small contigs will rarely contain models. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >> >> >> Hi Carson, >> >> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >into >> 20 parts, using the -g flag and the same basename. >> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >while >> the remaining three seemed to be stalled and produced 0B of output. Do you have any >> suggestion why this is happening? >> >> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >these >> only appear as FINISHED (the rest only started). 
There are no models for these >'finished' >> scaffolds stored in the .db and they are distributed over all parts of the genome >(i.e., >> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >> Should this be an issue of concern? >> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, >so >> we suspect something fishy going on... >> >> Hope you can help, >> best wishes, >> Jeanne Wilbrandt >> >> zmb // ZFMK // University of Bonn >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 6 09:12:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 6 Aug 2014 09:12:50 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: <5C8B509A-7093-4626-92CE-6D09B570887C@gmail.com> I think the freezing is because you are starting too many simultaneous jobs. You should try and use MPI to parallelize instead. The concurrent job way of doing things can start to cause problems If you are running 10 or more jobs in the same directory. You could try splitting them into different directories. --Carson Sent from my iPhone > On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: > > > aha, so this explains that. > Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly > half of the sequences being shorter than 3,000 bp. > > What do you think about this weird 'I am running but not really doing anything'-behavior? > > > Thanks a lot! > Jeanne > > > > On Wed, 6 Aug 2014 14:16:52 +0000 > Carson Holt wrote: >> If you are starting and restarting, or running multiple jobs then the log can be >> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 >> result file for the contig, then it is FINISHED. 
FASTA files will only exist for the >> contigs that have gene models. Small contigs will rarely contain models. >> >> --Carson >> >> Sent from my iPhone >> >>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>> >>> >>> Hi Carson, >>> >>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >> into >>> 20 parts, using the -g flag and the same basename. >>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >> while >>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>> suggestion why this is happening? >>> >>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >> these >>> only appear as FINISHED (the rest only started). There are no models for these >> 'finished' >>> scaffolds stored in the .db and they are distributed over all parts of the genome >> (i.e., >>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>> Should this be an issue of concern? >>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, >> so >>> we suspect something fishy going on... 
>>>
>>> Hope you can help,
>>> best wishes,
>>> Jeanne Wilbrandt
>>>
>>> zmb // ZFMK // University of Bonn
>>>
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From j.wilbrandt at zfmk.de Wed Aug 6 09:33:07 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 06 Aug 2014 17:33:07 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

We are using MPI as well; each of the 20 parts gets assigned 4 threads. Our admin reports, however, that the processes seem to assemble more threads than they are allowed. It is not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why?

If I start the jobs in the same directory, how can I make sure they write to the same directory (as, I think, is required to put the pieces together in the end)? Does -basename take paths?


On Wed, 6 Aug 2014 15:12:50 +0000
Carson Holt wrote:
>I think the freezing is because you are starting too many simultaneous jobs. You should
>try and use MPI to parallelize instead. The concurrent job way of doing things can
>start to cause problems if you are running 10 or more jobs in the same directory. You
>could try splitting them into different directories.
>
>--Carson
>
>Sent from my iPhone
>
>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote:
>>
>>
>> aha, so this explains that.
>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly
>> half of the sequences being shorter than 3,000 bp.
>>
>> What do you think about this weird 'I am running but not really doing anything'-behavior?
>>
>>
>> Thanks a lot!
>> Jeanne >> >> >> >> On Wed, 6 Aug 2014 14:16:52 +0000 >> Carson Holt wrote: >>> If you are starting and restarting, or running multiple jobs then the log can be >>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a >GFF3 >>> result file for the contig, then it is FINISHED. FASTA files will only exist for the >>> contigs that have gene models. Small contigs will rarely contain models. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> Hi Carson, >>>> >>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>> into >>>> 20 parts, using the -g flag and the same basename. >>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >>> while >>>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>>> suggestion why this is happening? >>>> >>>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>> these >>>> only appear as FINISHED (the rest only started). There are no models for these >>> 'finished' >>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>> (i.e., >>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>> Should this be an issue of concern? >>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >good, >>> so >>>> we suspect something fishy going on... 
>>>>
>>>> Hope you can help,
>>>> best wishes,
>>>> Jeanne Wilbrandt
>>>>
>>>> zmb // ZFMK // University of Bonn
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>

From carsonhh at gmail.com Wed Aug 6 09:45:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 6 Aug 2014 09:45:56 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID: <28DF9A41-8E59-4104-87A6-CD7CD9F436D8@gmail.com>

Is your admin counting processes or CPU usage? Each system call creates a separate process, so you can expect multiple processes but only a single CPU of usage per instance.

Use different directories if you are running that many jobs. You can concatenate the separate results when you're done. Use the gff3_merge script to help concatenate the separate GFF3 files generated from separate jobs.

--Carson

Sent from my iPhone

> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote:
>
>
>
> We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin reports
> however, that the processes seem to assemble more threads than they are allowed. It is
> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why?
>
> If I start the jobs in the same directory, how can I make sure they write to the same
> directory (as, I think is required to put the pieces together in the end?)? Does -basename
> take paths?
>
>
> On Wed, 6 Aug 2014 15:12:50 +0000
> Carson Holt wrote:
>> I think the freezing is because you are starting too many simultaneous jobs. You should
>> try and use MPI to parallelize instead. The concurrent job way of doing things can
>> start to cause problems if you are running 10 or more jobs in the same directory.
You >> could try splitting them into different directories. >> >> --Carson >> >> Sent from my iPhone >> >>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>> >>> >>> aha, so this explains that. >>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly >>> half of the sequences being shorter than 3,000 bp. >>> >>> What do you think about this weird 'I am running but not really doing >> anything'-behavior? >>> >>> >>> Thanks a lot! >>> Jeanne >>> >>> >>> >>> On Wed, 6 Aug 2014 14:16:52 +0000 >>> Carson Holt wrote: >>>> If you are starting and restarting, or running multiple jobs then the log can be >>>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a >> GFF3 >>>> result file for the contig, then it is FINISHED. FASTA files will only exist for the >>>> contigs that have gene models. Small contigs will rarely contain models. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>> >>>>> >>>>> Hi Carson, >>>>> >>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>> into >>>>> 20 parts, using the -g flag and the same basename. >>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >>>> while >>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>>>> suggestion why this is happening? >>>>> >>>>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>> these >>>>> only appear as FINISHED (the rest only started). There are no models for these >>>> 'finished' >>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>> (i.e., >>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>> Should this be an issue of concern? 
>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >> good, >>>> so >>>>> we suspect something fishy going on... >>>>> >>>>> Hope you can help, >>>>> best wishes, >>>>> Jeanne Wilbrandt >>>>> >>>>> zmb // ZFMK // University of Bonn >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carson.holt at genetics.utah.edu Wed Aug 6 11:18:22 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 6 Aug 2014 17:18:22 +0000 Subject: [maker-devel] Forks.pm error when running maker with dsindex In-Reply-To: References: Message-ID: It's better to run fewer jobs with more cpus given to MPI rather than many jobs with few cpus (i.e. mpiexec -n 4). To correct errors, you just restart MAKER. No need to set the -a flag unless you want to rerun everything, and not just the failed contigs. --Carson On 8/6/14, 3:03 AM, "Jeanne Wilbrandt" wrote: > >Hi! > >Yes, we are running 20 jobs simultaneously, almost, i.e., as much as our >cluster can >take. Do you think this is too much? > >Please find attached the output file (containing the STDERR) of the >dsindex-run, and one >example output of one of the pieces. > >Another quick question to make sure I understood the guides correctly: If >a job did not >finish properly, it should suffice to restart the same thing just with >the -a flag and it >should clean up and finish what it was supposed to, right? (i.e., it's >not necessary to >trace and delete the unfinished output manually?) > >Thank you again! >Jeanne Wilbrandt > >zmb // ZFMK // University of Bonn > > > >On 08/05/2014 08:00 PM, maker-devel-request at yandell-lab.org wrote: >> >> >> 1. 
Re: Forks.pm error when running maker with dsindex (Carson Holt) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Mon, 04 Aug 2014 14:27:08 -0600 >> From: Carson Holt >> To: Jan Philip Oeyen , >> >> Subject: Re: [maker-devel] Forks.pm error when running maker with >> dsindex >> Message-ID: >> Content-Type: text/plain; charset="utf-8" >> >> Sorry for the slow reply. I was on vacation all last week. Do you >>have the >> full STDERR? sometimes the last error is irrelevant and it's just the >>result >> of a failure further upstream. Also are you running 20 independent maker >> jobs simultaneously? >> >> --Carson >> >> >> From: Jan Philip Oeyen >> Date: Monday, July 28, 2014 at 6:22 AM >> To: >> Subject: [maker-devel] Forks.pm error when running maker with dsindex >> >> Hi all, >> we are currently having some unexpected errors when running maker on a >> genome which is split in several parts. Our cluster admin reported the >> following error message: >> >> Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodu >> les/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm >> line 2188. >> SIGTERM received >> SIGTERM received >> SIGTERM received >> >> We were using maker with the '-g' option on a single genome which is >>split >> into 20 parts, where 19 parts are equally large and the last contains >>about >> 20 sequences more. After that we ran Maker using dsindex to clean up the >> output. We are currently using maker v2.31 on 4 threads and forks v0.34. >> >> If any further info is needed to clarify the problem, please let me >>know and >> I will provide as much as possible. >> >> Thank you for your help! 
>>
>> Best regards,
>> Jan Philip Oeyen
>> ZFMK // ZMB // University of Bonn
>>

From mphoeppner at gmail.com Wed Aug 6 00:14:23 2014
From: mphoeppner at gmail.com (Marc Höppner)
Date: Wed, 6 Aug 2014 08:14:23 +0200
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To:
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID: <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com>

Hi,

I suspect that Augustus plays a role, since the affected features are seeded by augustus (based on the name anyway). What I found was that this seems to only happen when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). And this seems to be quite reproducible, and it doesn't only affect CDS features.

I have put the Maker output for a test scaffold here: https://dl.dropboxusercontent.com/u/1918141/maker_output.tar.bz2

The problematic lines:

scaffold_563 maker five_prime_UTR 38501 38501 . - . ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1
scaffold_563 maker exon 69967 69967 . - . ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1
scaffold_563 maker CDS 69967 69967 . - 1 ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1

Strange stuff?

Regards,

Marc

On 05 Aug 2014, at 22:49, Carson Holt wrote:

> One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this, for example. Do you have non-CDS feature types where this happens, or any internal CDSs where this happens?
>
> --Carson
>
>
> From: Carson Holt
> Date: Tuesday, August 5, 2014 at 2:21 PM
> To: Marc Höppner ,
> Subject: Re: [maker-devel] Maker GFF output with features of 0 length
>
> Were you using GFF3 pass-through or correct_est_fusion options? When you rerun, do the same features still have lengths of zero (i.e. is it random or is it reproducible)?
>
> --Carson
>
>
> From: Marc Höppner
> Date: Wednesday, July 30, 2014 at 4:44 AM
> To:
> Subject: [maker-devel] Maker GFF output with features of 0 length
>
> Hi,
>
> I've, more by accident, found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions.
>
> For example:
>
> scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1
>
>
> This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with syncing data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head? Regarding the frequency, out of 450,000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...
>
> I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.
>
> Regards,
>
> Marc
>
>
>
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
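A minimal sketch of how the single-base features discussed in this thread could be listed for review, assuming a standard nine-column, tab-separated GFF3 file. The function name `single_base_features` and the shortened sample IDs are hypothetical and not part of MAKER; the coordinates are taken from the lines quoted above.

```python
from collections import Counter


def single_base_features(lines):
    """Yield (seqid, type, position, attributes) for features with start == end."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # the embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            continue  # non-numeric coordinates; ignore in this sketch
        if start == end:
            yield cols[0], cols[2], start, cols[8]


# Hypothetical sample modelled on the lines quoted in the thread (IDs shortened).
SAMPLE_GFF3 = [
    "##gff-version 3",
    "scaffold_563\tmaker\texon\t69967\t69967\t.\t-\t.\tID=e1;Parent=m1",
    "scaffold_563\tmaker\tCDS\t69967\t69967\t.\t-\t1\tID=c1;Parent=m1",
    "scaffold_563\tmaker\texon\t70100\t70250\t.\t-\t.\tID=e2;Parent=m1",
    "##FASTA",
    ">scaffold_563",
    "ACGT",
]

hits = list(single_base_features(SAMPLE_GFF3))
counts = Counter(feature_type for _, feature_type, _, _ in hits)
for seqid, feature_type, position, attributes in hits:
    print(seqid, feature_type, position, attributes)
```

As noted in the thread, a terminal CDS can legitimately be one base long, so hits from a scan like this are candidates to inspect rather than confirmed errors; the per-type tally in `counts` helps separate terminal CDS cases from exons and UTRs.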
URL: From j.wilbrandt at zfmk.de Wed Aug 6 03:03:28 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 06 Aug 2014 11:03:28 +0200 Subject: [maker-devel] Forks.pm error when running maker with dsindex Message-ID: Hi! Yes, we are running 20 jobs simultaneously, almost, i.e., as much as our cluster can take. Do you think this is too much? Please find attached the output file (containing the STDERR) of the dsindex-run, and one example output of one of the pieces. Another quick question to make sure I understood the guides correctly: If a job did not finish properly, it should suffice to restart the same thing just with the -a flag and it should clean up and finish what it was supposed to, right? (i.e., it's not necessary to trace and delete the unfinished output manually?) Thank you again! Jeanne Wilbrandt zmb // ZFMK // University of Bonn On 08/05/2014 08:00 PM, maker-devel-request at yandell-lab.org wrote: > > > 1. Re: Forks.pm error when running maker with dsindex (Carson Holt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 04 Aug 2014 14:27:08 -0600 > From: Carson Holt > To: Jan Philip Oeyen , > > Subject: Re: [maker-devel] Forks.pm error when running maker with > dsindex > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Sorry for the slow reply. I was on vacation all last week. Do you have the > full STDERR? sometimes the last error is irrelevant and it's just the result > of a failure further upstream. Also are you running 20 independent maker > jobs simultaneously? > > --Carson > > > From: Jan Philip Oeyen > Date: Monday, July 28, 2014 at 6:22 AM > To: > Subject: [maker-devel] Forks.pm error when running maker with dsindex > > Hi all, > we are currently having some unexpected errors when running maker on a > genome which is split in several parts. 
Our cluster admin reported the
> following error message:
>
> Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodules/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm
> line 2188.
> SIGTERM received
> SIGTERM received
> SIGTERM received
>
> We were using maker with the '-g' option on a single genome which is split
> into 20 parts, where 19 parts are equally large and the last contains about
> 20 sequences more. After that we ran Maker using dsindex to clean up the
> output. We are currently using maker v2.31 on 4 threads and forks v0.34.
>
> If any further info is needed to clarify the problem, please let me know and
> I will provide as much as possible.
>
> Thank you for your help!
>
> Best regards,
> Jan Philip Oeyen
> ZFMK // ZMB // University of Bonn
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: split_index.o2510
Type: application/octet-stream
Size: 1641 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_04.o2490
Type: application/octet-stream
Size: 8883704 bytes
Desc: not available
URL:

From dandence at gmail.com Wed Aug 6 07:50:43 2014
From: dandence at gmail.com (Daniel Ence)
Date: Wed, 6 Aug 2014 07:50:43 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References:
Message-ID:

Hi Jeanne, what's the average length of those 154 scaffolds that only appeared once in the log? Is the length pretty consistent?

~Daniel

On Aug 6, 2014, at 6:40 AM, Jeanne Wilbrandt wrote:

>
> Hi Carson,
>
> I ran into more conspicuous behavior running maker 2.31 on a genome which is split into
> 20 parts, using the -g flag and the same basename.
> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while
> the remaining three seemed to be stalled and produced 0B of output. Do you have any
> suggestion why this is happening?
>
> After I stopped these stalled jobs, I checked the index.log and found that of 38.384
> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these
> only appear as FINISHED (the rest only started). There are no models for these 'finished'
> scaffolds stored in the .db and they are distributed over all parts of the genome (i.e.,
> each of the 20 jobs contained scaffolds that 'did not start' but 'finished')
> Should this be an issue of concern?
> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so
> we suspect something fishy going on...
>
> Hope you can help,
> best wishes,
> Jeanne Wilbrandt
>
> zmb // ZFMK // University of Bonn
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Aug 11 10:11:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 11 Aug 2014 10:11:28 -0600
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To:
References:
Message-ID:

If you are updating every month to BioPerl live, don't. You should use the CPAN version of BioPerl or even the stable download. BioPerl live has actually broken several components MAKER uses at different times and, depending on which version you currently have, may be broken now. Could you send me the Bio::Root::Version line from the initial debug output?

Also could you send me this file --> /home/keceltes/maker2/final.fasta

The point of failure is actually very simple. At that point in the code, MAKER opens a file, reads it in one line at a time, writes it out to a new file, and then indexes it with BioPerl (the BioPerl index won't work on NFS drives because it uses Berkeley DB). For that reason, whenever it fails at that point, it is either a drive space issue, NFS issue, BioPerl issue, or file format issue.

Also are you running via MPI? I ask because if you are using multiple nodes you will have to check the size of /tmp independently on each node (since the values will be different).

Thanks,
Carson


From: Kevin Tsai
Date: Monday, August 11, 2014 at 5:11 AM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Early obstacle with SplitDB

Hi Carson,
Thanks for the suggestions. I left TMP= empty, which as you mentioned defaults to /tmp. There seems to be a different error when using an NFS mounted directory (as I manually verified). My /tmp is also not full or nearly full, and I have verified proper fasta formatting, as I have run the fasta file through other statistics generating tools (e.g. Quast). We also update BioPerl monthly.

Do you think it could be anything else? Is there any more information I could provide that would be helpful?


On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote:
> Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted
> directory (must be locally mounted), the drive containing the directory specified
> by TMP= (defaults to /tmp) is full or nearly full, your input file is not
> proper fasta format, or you are using an out of date version of BioPerl.
>
> Try the first three in the list then look at BioPerl. The BioPerl version
> should be printed as part of the debug output.
>
> --Carson
>
>
> From: Kevin Tsai
> Date: Tuesday, August 5, 2014 at 4:59 AM
> To:
> Subject: [maker-devel] Early obstacle with SplitDB
>
> Hello,
> I'm a new user to Maker so I suspect this will be a simple question, but I am
> having trouble finding documentation on SplitDB. Our IT admin set up the
> application and I'm running into the following issue about 30 seconds after
> kickoff. Below is the debugged output:
>
> STATUS: Parsing control files...
> Calling GI::load_control_files at /usr/bin/maker line 452.
> Calling GI::new_instance_temp at /usr/bin/maker line 463.
> Calling GI::mount_check at /usr/bin/maker line 465.
> Calling GI::set_global_temp at /usr/bin/maker line 483. > STATUS: Processing and indexing input FASTA files... > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling List::Util::shuffle at /usr/bin/maker line 529. > Calling GI::split_db at /usr/bin/maker line 536. > Calling File::Path::rmtree at /usr/bin/maker line 537. > Calling Iterator::Any::new at /usr/bin/maker line 537. > Calling Iterator::Any::nextDef at /usr/bin/maker line 537. > Calling Iterator::Any::new at /usr/bin/maker line 537. > Calling mkdir at /usr/bin/maker line 537. > Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. > Calling system at /usr/bin/maker line 537. > ERROR: SplitDB not created correctly > > at /usr/local/share/perl5/GI.pm line 1144. > GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, > "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at > /usr/bin/maker line 537 > --> rank=NA, hostname=Za2.cglab > > Any suggestions? Thank you in advance! > -- > Kevin Tsai > www.linkedin.com/in/kevinjtsai/ > Ph.D. Candidate, Bioinformatics > Institute of Information Science, Academia Sinica > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica -------------- next part -------------- An HTML attachment was scrubbed... 
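A minimal sketch of two of the checks suggested in this thread (free space on the temporary drive, and basic FASTA sanity), assuming Python 3 is available on the node being tested. The helper names `free_fraction` and `fasta_problems` are hypothetical, and the accepted alphabet (A, C, G, T, N in either case) is an assumption for illustration, not MAKER's own validation rule. It does not detect NFS mounts or check the BioPerl version.

```python
import shutil
import tempfile

# Assumed alphabet for this sketch only; MAKER's real rule may differ.
ALLOWED = set("ACGTNacgtn")


def free_fraction(path):
    """Return the fraction of the drive holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def fasta_problems(lines, max_report=5):
    """Return up to `max_report` (line_number, message) tuples for a FASTA input."""
    problems = []
    seen_header = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank lines are ignored in this sketch
        if line.startswith(">"):
            seen_header = True
            if len(line) == 1:
                problems.append((number, "empty header"))
        elif not seen_header:
            problems.append((number, "sequence before first header"))
        else:
            bad = set(line) - ALLOWED
            if bad:
                problems.append(
                    (number, "unexpected characters: " + "".join(sorted(bad)))
                )
        if len(problems) >= max_report:
            break
    return problems


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    print("free fraction of", tmp, "=", round(free_fraction(tmp), 3))
    print(fasta_problems([">contig1", "ACGTNNacgt", "ACYT"]))
```

When running over MPI on several nodes, the free-space check would need to be run on each node separately, since each node has its own /tmp.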

From a.priyam at qmul.ac.uk Wed Aug 13 03:30:39 2014
From: a.priyam at qmul.ac.uk (Anurag Priyam)
Date: Wed, 13 Aug 2014 15:00:39 +0530
Subject: [maker-devel] does MAKER modify input FASTA
Message-ID:

Is it possible that the input FASTA file (containing the genome that is being annotated) and the FASTA sequences in the output GFF file (containing the resulting annotations + the genome) are different?

-> It's fine if the ordering of the scaffolds, or width (for pretty formatting) are different.
-> But, will MAKER add 'NNN' or change the case to indicate masking? It doesn't seem so to me, but I have only one test set, so can't be sure.
-> Is it possible to get masked genome out from MAKER?

-- Priyam

From j.wilbrandt at zfmk.de Wed Aug 13 03:32:38 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 13 Aug 2014 11:32:38 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Our admin counts processes. Do I understand you right, that one CPU handles several processes?

I'm still confused by the different directories (and I made a mistake when asking last time, I wanted to say 'If I do NOT start the jobs in the same directory...').
So, if I start each piece of a genome in its own directory (for example), then it gets a unique basename (because the output will be separate from all other pieces anyway) and I will not run dsindex but instead use gff3_merge for each piece's output and then once again to merge all resulting gff3-files?

Hope I got you right :)

Thanks for your help!
Jeanne


On Wed, 6 Aug 2014 15:45:56 +0000
Carson Holt wrote:
>Is your admin counting processes or cpu usage?
Because each system call creates a >separate process, so you can expect multiple processes (each system call generates a new >process) but only a single cpu of usage per instance. Use different directories if you >are running that many jobs. You can concatenate the separate results when your done. > Use gff3_merge script to help concatenate the separate GFF3 files generated from >separate jobs. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote: >> >> >> >> We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin >reports >> however, that the processes seem to assemble more threads than they are allowed. It is >> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why? >> >> If I start the jobs in the same directory, how can I make sure they write to the same >> directory (as, I think is required to put the pieces together in the end?)? das >-basename >> take paths? >> >> >> On Wed, 6 Aug 2014 15:12:50 +0000 >> Carson Holt wrote: >>> I think the freezing is because you are starting too many simultaneous jobs. You >should >>> try and use MPI to parallelize instead. The concurrent job way of doing things can >>> start to cause problems If you are running 10 or more jobs in the same directory. You >>> could try splitting them into different directories. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> aha, so this explains that. >>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, >roughly >>>> half of the sequences being shorter than 3,000 bp. >>>> >>>> What do you think about this weird 'I am running but not really doing >>> anything'-behavior? >>>> >>>> >>>> Thanks a lot! >>>> Jeanne >>>> >>>> >>>> >>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>> Carson Holt wrote: >>>>> If you are starting and restarting, or running multiple jobs then the log can be >>>>> partially rebuilt. 
On rebuild only the FINISHED entries are added. If there is a >>> GFF3 >>>>> result file for the contig, then it is FINISHED. FASTA files will only exist for >the >>>>> contigs that have gene models. Small contigs will rarely contain models. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>>> >>>>>> >>>>>> Hi Carson, >>>>>> >>>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>>> into >>>>>> 20 parts, using the -g flag and the same basename. >>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish >normally, >>>>> while >>>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have >any >>>>>> suggestion why this is happening? >>>>>> >>>>>> After I stopped these stalled jobs, I checked the index.log and found that of >38.384 >>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>>> these >>>>>> only appear as FINISHED (the rest only started). There are no models for these >>>>> 'finished' >>>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>>> (i.e., >>>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>>> Should this be an issue of concern? >>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >>> good, >>>>> so >>>>>> we suspect something fishy going on... 
>>>>>>
>>>>>> Hope you can help,
>>>>>> best wishes,
>>>>>> Jeanne Wilbrandt
>>>>>>
>>>>>> zmb // ZFMK // University of Bonn
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> maker-devel mailing list
>>>>>> maker-devel at box290.bluehost.com
>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>

From dence at genetics.utah.edu Wed Aug 13 09:29:41 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 13 Aug 2014 15:29:41 +0000
Subject: [maker-devel] does MAKER modify input FASTA
In-Reply-To:
References:
Message-ID:

Hi Priyam,

After MAKER has completed its run and you've merged the results with gff3_merge, you can see the original fasta genome in the resulting gff3 file, below the ##FASTA pragma.

For each scaffold in your genome, the masked fasta can be found in its individual directory in the master_datastore that MAKER created to keep track of results. I'm pretty sure this will only be 'soft-masked' (lower-case letters) and not hard-masked ('N' characters).

Let me know whether this helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Anurag Priyam [a.priyam at qmul.ac.uk]
Sent: Wednesday, August 13, 2014 3:30 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] does MAKER modify input FASTA

Is it possible that the input FASTA file (containing the genome that is being annotated) and the FASTA sequences in the output GFF file (containing the resulting annotations + the genome) be different?

-> It's fine if the ordering of the scaffolds, or width (for pretty formatting) are different.
-> But, will MAKER add 'NNN' or change the case to indicate masking? It doesn't seem so to me, but I have only one test set, so can't be sure.
-> Is it possible to get masked genome out from MAKER?

-- Priyam

From carsonhh at gmail.com Wed Aug 13 09:46:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Aug 2014 09:46:27 -0600
Subject: [maker-devel] does MAKER modify input FASTA
In-Reply-To:
References:
Message-ID:

The output fasta will be letter for letter identical to the input fasta and will be all uppercase. Only if your input fasta contains unrecognized characters (for example 'Y' in the middle of the nucleotide sequence) and you use the --fix_nucleotides flag will those unrecognized characters be changed to 'N'.

The masked fasta can be pulled out of theVoid directory if you really need it. It will be called query_masked.fasta.

--Carson

On 8/13/14, 3:30 AM, "Anurag Priyam" wrote:

>Is it possible that the input FASTA file (containing the genome that
>is being annotated) and the FASTA sequences in the output GFF file
>(containing the resulting annotations + the genome) be different?
>
>-> It's fine if the ordering of the scaffolds, or width (for pretty
>formatting) are different.
>-> But, will MAKER add 'NNN' or change the case to indicate masking?
>It doesn't seem so to me, but I have only one test set, so can't be
>sure.
>-> Is it possible to get masked genome out from MAKER?
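A minimal sketch of how one might check what kind of masking a FASTA file carries, for example query_masked.fasta or the sequence below the ##FASTA pragma. It assumes, as described in this thread, that soft masking shows up as lower-case bases and hard masking as 'N'. The function name masking_summary is hypothetical and nothing here is part of MAKER; since 'N' can also be a genuine assembly gap, the "hard" count is only an upper bound on masked bases.

```python
def masking_summary(fasta_text):
    """Count unmasked, soft-masked (lower-case) and hard-masked (N) bases per record."""
    summary = {}
    counts = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts = summary.setdefault(
                name, {"unmasked": 0, "soft": 0, "hard": 0}
            )
        elif counts is not None:
            for base in line:
                if base in "Nn":
                    counts["hard"] += 1      # N: hard-masked or an assembly gap
                elif base.islower():
                    counts["soft"] += 1      # lower-case: soft-masked
                else:
                    counts["unmasked"] += 1  # upper-case: left as-is
    return summary


if __name__ == "__main__":
    demo = ">scf1 demo\nACGTacgtNNNN\nACGT\n"
    for record, counts in masking_summary(demo).items():
        print(record, counts)
```

Running the same function on the input genome and on the sequence extracted from the merged GFF3 would show whether the two differ in case or 'N' content.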
>
>-- Priyam

From dence at genetics.utah.edu Wed Aug 13 09:46:59 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 13 Aug 2014 15:46:59 +0000
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>,
Message-ID:

Hi Jeanne,

I believe that's right. You can pass gff3_merge either a list of gff3 files or a maker-created datastore index file. To compile the pieces for each of your different runs you would give gff3_merge the datastore index file. To put those resulting gff3 files together, you would pass gff3_merge the list of gff3 files that you want to merge.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jeanne Wilbrandt [j.wilbrandt at zfmk.de]
Sent: Wednesday, August 13, 2014 3:32 AM
To: Carson Holt; Wilbrandt Jeanne
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Further split genome questions

Our admin counts processes. Do I understand you right, that one CPU handles several processes?

I'm still confused by the different directories (and I made a mistake when asking last time, I wanted to say 'If I do NOT start the jobs in the same directory...).

So, if I start each piece of a genome in its own directory (for example), then it gets a unique basename (because the output will be separate from all other pieces anyway) and I will not run dsindex but instead use gff3_merge for each piece's output and then once again to merge all resulting gff3-files?
From carsonhh at gmail.com Wed Aug 13 09:47:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Aug 2014 09:47:15 -0600
Subject: [maker-devel] does MAKER modify input FASTA
In-Reply-To:
References:
Message-ID:

It will actually be a mixture of hard and soft masking depending on the class of repeat.

--Carson

On 8/13/14, 9:29 AM, "Daniel Ence" wrote:

>Hi Priyam,
>
>After MAKER has completed its run and you've merged the results with
>gff3_merge, you can see the original fasta genome in the resulting gff3
>file, below the ##FASTA pragma.
>
>For each scaffold in your genome, the masked fasta can be found in its
>individual directory in the master_datastore that MAKER created to keep
>track of results. I'm pretty sure this will only be 'soft-masked'
>(lower-case letters) and not hard-masked ('N' characters).
>
>Let me know whether this helps,
>Daniel
>
>
>Daniel Ence
>Graduate Student
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>________________________________________
>From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of
>Anurag Priyam [a.priyam at qmul.ac.uk]
>Sent: Wednesday, August 13, 2014 3:30 AM
>To: maker-devel at yandell-lab.org
>Subject: [maker-devel] does MAKER modify input FASTA
>
>Is it possible that the input FASTA file (containing the genome that
>is being annotated) and the FASTA sequences in the output GFF file
>(containing the resulting annotations + the genome) be different?
>
>-> It's fine if the ordering of the scaffolds, or width (for pretty
>formatting) are different.
>-> But, will MAKER add 'NNN' or change the case to indicate masking?
>It doesn't seem so to me, but I have only one test set, so can't be
>sure.
>-> Is it possible to get masked genome out from MAKER?
>
>-- Priyam

From carsonhh at gmail.com Wed Aug 13 09:52:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Aug 2014 09:52:34 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Yes.
One CPU will have several processes; most are helper processes that will use 0% CPU almost all of the time (for example, there is a shared variable manager process that will launch with MAKER but will also be called 'maker' under top because it is technically its child and not a separate script). Also, system calls will launch a new process that will use all CPU while the process calling it will drop to 0% CPU until it finishes.

Yes. Your explanation is correct. You then use gff3_merge to merge the GFF3 file.

--Carson

On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote:

>
>Our admin counts processes. Do I understand you right, that one CPU
>handles several processes?
>
>I'm still confused by the different directories (and I made a mistake
>when asking last time, I wanted to say 'If I do NOT start the jobs in
>the same directory...).
>So, if I start each piece of a genome in its own directory (for example),
>then it gets a unique basename (because the output will be separate from
>all other pieces anyway) and I will not run dsindex but instead use
>gff3_merge for each piece's output and then once again to merge all
>resulting gff3-files?
>
>Hope I got you right :)
>
>Thanks for your help!
>Jeanne
>
>
>
>On Wed, 6 Aug 2014 15:45:56 +0000
> Carson Holt wrote:
>>Is your admin counting processes or cpu usage? Because each system call
>>creates a separate process, so you can expect multiple processes (each
>>system call generates a new process) but only a single cpu of usage per
>>instance. Use different directories if you are running that many jobs.
>>You can concatenate the separate results when you're done.
>> Use gff3_merge script to help concatenate the separate GFF3 files
>>generated from separate jobs.
>>
>>--Carson
>>
>>Sent from my iPhone
>>
>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote:
>>>
>>>
>>>
>>> We are using MPI as well, each of the 20 parts gets assigned 4 threads.
From cjfields at illinois.edu Wed Aug 13 11:14:56 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 13 Aug 2014 17:14:56 +0000
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To:
References:
Message-ID:

On Aug 11, 2014, at 11:11 AM, Carson Holt wrote:

> If you are updating every month to BioPerl live, don't. You should use the
> CPAN version of BioPerl or even the stable download. BioPerl live has
> actually broken several components MAKER uses at different times and depending
> on which version you currently have, may be broken now. Could you send me the
> Bio::Root::Version line from the initial debug output?

Exactly. Just a note, but the CPAN releases (now at 1.6.924) merge over all changes from the master branch on a regular basis. The key parts that will not work when running off master (such as Bio::Root, Bio::FeatureIO, etc) have been split out into separate repos; it's entirely possible to add these separately to a PERL5LIB but the intent is that we will release Bio-Root and others to CPAN separately.

> Also could you send me this file --> /home/keceltes/maker2/final.fasta
>
> The point of failure is actually very simple. At that point in the code,
> MAKER opens a file, reads it in one line at a time, writes it out to a new
> file, and then indexes it with BioPerl (the BioPerl won't work with NFS drives
> because it uses Berkeley DB). For that reason whenever it fails at that point,
> it is either a drive space issue, NFS issue, BioPerl issue, or file format
> issue.

Re: Berkeley_DB, if you have a need to push this in a more NFS-portable direction we are more than happy to let you experiment on what works best. Mark Jensen actually started on this a while back but ran into problems. I personally haven't had problems with Bio::DB::Fasta on our local GPFS to be frank, but I'm sure that isn't working for everyone.

> Also are you running via MPI?
> I ask because if you are using multiple nodes
> you will have to check the size of /tmp independently on each node (since the
> values will be different).
>
> Thanks,
> Carson

chris

From: Kevin Tsai
Date: Monday, August 11, 2014 at 5:11 AM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Early obstacle with SplitDB

Hi Carson,
Thanks for the suggestions.

I left the TMP= empty, which as you mentioned defaults to /tmp. There seems to be a different error when using an NFS mounted directory (as I manually verified). My /tmp is also not full or nearly full, and I have verified proper fasta formatting, as I have run the fasta file through other statistics generating tools (e.g. Quast). We also update BioPerl monthly.

Do you think it could be anything else? Do you think any more information that I might be able to provide will be more insightful?

On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote:

Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted directory (must be locally mounted), the drive containing directory specified by TMP= (defaults to /tmp) is full or nearly full, your input file is not proper fasta format, or you are using an out of date version of BioPerl.

Try the first three in the list then look at BioPerl. The BioPerl version should be printed as part of the debug output.

--Carson

From: Kevin Tsai
Date: Tuesday, August 5, 2014 at 4:59 AM
To:
Subject: [maker-devel] Early obstacle with SplitDB

Hello,
I'm a new user to Maker so I suspect this will be a simple question, but I am having trouble finding documentation on SplitDB. Our IT admin set up the application and I'm running into the following issue about 30 seconds after kickoff. Below is the debugged output:

STATUS: Parsing control files...
Calling GI::load_control_files at /usr/bin/maker line 452.
Calling GI::new_instance_temp at /usr/bin/maker line 463.
Calling GI::mount_check at /usr/bin/maker line 465.
Calling GI::set_global_temp at /usr/bin/maker line 483.
STATUS: Processing and indexing input FASTA files...
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling List::Util::shuffle at /usr/bin/maker line 529.
Calling GI::split_db at /usr/bin/maker line 536.
Calling File::Path::rmtree at /usr/bin/maker line 537.
Calling Iterator::Any::new at /usr/bin/maker line 537.
Calling Iterator::Any::nextDef at /usr/bin/maker line 537.
Calling Iterator::Any::new at /usr/bin/maker line 537.
Calling mkdir at /usr/bin/maker line 537.
Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537.
Calling system at /usr/bin/maker line 537.
ERROR: SplitDB not created correctly

 at /usr/local/share/perl5/GI.pm line 1144.
 GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at /usr/bin/maker line 537
--> rank=NA, hostname=Za2.cglab

Any suggestions? Thank you in advance!
--
Kevin Tsai
www.linkedin.com/in/kevinjtsai/
Ph.D. Candidate, Bioinformatics
Institute of Information Science, Academia Sinica
From carsonhh at gmail.com Wed Aug 13 12:19:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Aug 2014 12:19:50 -0600
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To:
References:
Message-ID:

The Berkeley_DB/NFS issues happen more often for large index files or NFS systems with a slow response. Such issues also happen almost exclusively during index creation. There is a way you can tell MAKER to have BioPerl use something other than Berkeley DB for indexing if you suspect that's the issue. You can give it a flag during the initial MAKER setup and installation.

#use GDBM library
cd .../maker/src
perl Build.PL --AnyDBM_ISA GDBM_File
./Build install

#use SDBM files
cd .../maker/src
perl Build.PL --AnyDBM_ISA SDBM_File
./Build install

#use Berkeley DB (default)
cd .../maker/src
perl Build.PL --AnyDBM_ISA DB_File
./Build install

However, I find that the alternatives to Berkeley DB can be more flaky. Also make sure /tmp is not tmpfs (which it may be on some systems). I've also seen weird behavior trying to index files on tmpfs storage on some systems.

Thanks,
Carson

From: "Fields, Christopher J"
Date: Wednesday, August 13, 2014 at 11:14 AM
To: Carson Holt
Cc: Kevin Tsai, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Early obstacle with SplitDB

On Aug 11, 2014, at 11:11 AM, Carson Holt wrote:

> If you are updating every month to BioPerl live, don't. You should use the
> CPAN version of BioPerl or even the stable download. BioPerl live has
> actually broken several components MAKER uses at different times and depending
> on which version you currently have, may be broken now. Could you send me the
> Bio::Root::Version line from the initial debug output?

Exactly. Just a note, but the CPAN releases (now at 1.6.924) merge over all changes from the master branch on a regular basis.
From j.wilbrandt at zfmk.de Thu Aug 14 09:40:04 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Thu, 14 Aug 2014 17:40:04 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Thank you so much!

However, I'm still struggling, I'm afraid: I tried this 'two-step merging' approach with a subset of scaffolds and got duplicate IDs.
Here is what I did:
- divided input scaffolds in two files
- run maker separately on these files (-> separate output dirs)
-- additional input: maker-generated gff3 from previous (singular) run
-- repeatmasking, snaphmm, gmhmm, augustus_species are given
-- map_forward=0 / 1 (I tried both, to the same effect)
- gff3_merge two times using index-log
- gff3_merge these two gff3 files

$ grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | uniq -c | sort -n | tail
2 ID=snap_masked-scf7180005140699-processed-gene-0.19
2 ID=snap_masked-scf7180005140699-processed-gene-0.22
2 ID=snap_masked-scf7180005140699-processed-gene-1.36
2 ID=snap_masked-scf7180005140713-processed-gene-0.4
2 ID=snap_masked-scf7180005140744-processed-gene-0.4
2 ID=snap_masked-scf7180005140744-processed-gene-0.6
2 ID=snap_masked-scf7180005140754-processed-gene-0.14
2 ID=snap_masked-scf7180005140754-processed-gene-0.15
2 ID=snap_masked-scf7180005140754-processed-gene-0.19
2 ID=snap_masked-scf7180005181475-processed-gene-0.3

$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | grep "\sgene"
scf7180005181475 maker gene 9050 9385 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3

- found duplicates! i.e. the same ID for gene annotations in different areas of the same scaffold (of 655 gene annotations, 51 appear twice)
-- this happens not only with gene, but also CDS and mRNA annotations, as far as I can see (here, in one example, non-overlapping but close CDS snippets got the same ID).

I suspected this might have to do with the map_forward flag, but I get the same problem again (with genes at the same locations).

I attached one of the ctl files for you in case you want to have a look, the other is analogous.
Do you need something else?
What did I miss?
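The grep pipeline above can also be sketched in Python, which additionally reports where each repeated ID sits. This is only an illustration: duplicate_ids is a hypothetical helper, not a MAKER script, and it assumes a standard nine-column, tab-separated GFF3 file. Note that GFF3 lets the rows of one discontinuous feature (typically CDS) share an ID, so the check is most meaningful for gene and mRNA rows.

```python
from collections import defaultdict


def duplicate_ids(gff3_lines, feature_type="gene"):
    """Return {ID: [(seqid, start, end), ...]} for IDs used on more than one row."""
    seen = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature rows
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other pragmas
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            seen[attrs["ID"]].append((cols[0], int(cols[3]), int(cols[4])))
    return {fid: locs for fid, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    gene_id = "snap_masked-scf7180005181475-processed-gene-0.3"
    rows = [
        "##gff-version 3",
        f"scf7180005181475\tmaker\tgene\t9050\t9385\t.\t-\t.\tID={gene_id};Name={gene_id}",
        f"scf7180005181475\tmaker\tgene\t846\t1088\t.\t-\t.\tID={gene_id};Name={gene_id}",
        "scf1\tmaker\tgene\t1\t100\t.\t+\t.\tID=unique-gene;Name=unique-gene",
    ]
    for fid, locations in duplicate_ids(rows).items():
        print(fid, locations)
```

For a real file one would pass an open file handle, for example duplicate_ids(open("merged_all.gff3")), and repeat the call with feature_type="mRNA".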
This should not happen, right?

On Wed, 13 Aug 2014 15:52:34 +0000
 Carson Holt wrote:
>Yes. One cpu will have several processes, most are helper processes that
>will use 0% CPU almost all of the time (for example there is a shared
>variable manager process that will launch with MAKER but will also be
>called 'maker' under top because it is technically its child and not a
>separate script). Also system calls will launch a new process that will
>use all CPU while the process calling it will drop to 0% CPU until it
>finishes.
>
>Yes. Your explanation is correct. You then use gff3_merge to merge the
>GFF3 file.
>
>--Carson
>
>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote:
>
>>Our admin counts processes. Do I understand you right, that one CPU
>>handles several processes?
>>
>>I'm still confused by the different directories (and I made a mistake
>>when asking last time, I wanted to say 'If I do NOT start the jobs in
>>the same directory...').
>>So, if I start each piece of a genome in its own directory (for example),
>>then it gets a unique basename (because the output will be separate from
>>all other pieces anyway) and I will not run dsindex but instead use
>>gff3_merge for each piece's output and then once again to merge all
>>resulting gff3-files?
>>
>>Hope I got you right :)
>>
>>Thanks for your help!
>>Jeanne
>>
>>On Wed, 6 Aug 2014 15:45:56 +0000
>> Carson Holt wrote:
>>>Is your admin counting processes or cpu usage? Because each system call
>>>creates a separate process, so you can expect multiple processes (each
>>>system call generates a new process) but only a single cpu of usage per
>>>instance. Use different directories if you are running that many jobs.
>>>You can concatenate the separate results when you're done.
>>>Use gff3_merge script to help concatenate the separate GFF3 files
>>>generated from separate jobs.
>>>
>>>--Carson
>>>
>>>Sent from my iPhone
>>>
>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote:
>>>>
>>>> We are using MPI as well, each of the 20 parts gets assigned 4
>>>> threads. Our admin reports, however, that the processes seem to
>>>> assemble more threads than they are allowed. It is not Blast (which
>>>> is set to 1 cpu in the opts.ctl). Do you have a suggestion why?
>>>>
>>>> If I start the jobs in the same directory, how can I make sure they
>>>> write to the same directory (as, I think, is required to put the
>>>> pieces together in the end)? Does -basename take paths?
>>>>
>>>> On Wed, 6 Aug 2014 15:12:50 +0000
>>>> Carson Holt wrote:
>>>>> I think the freezing is because you are starting too many
>>>>> simultaneous jobs. You should try and use MPI to parallelize
>>>>> instead. The concurrent job way of doing things can start to cause
>>>>> problems if you are running 10 or more jobs in the same directory.
>>>>> You could try splitting them into different directories.
>>>>>
>>>>> --Carson
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote:
>>>>>>
>>>>>> aha, so this explains that.
>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more
>>>>>> than 60,000, roughly half of the sequences being shorter than
>>>>>> 3,000 bp.
>>>>>>
>>>>>> What do you think about this weird 'I am running but not really
>>>>>> doing anything'-behavior?
>>>>>>
>>>>>> Thanks a lot!
>>>>>> Jeanne
>>>>>>
>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000
>>>>>> Carson Holt wrote:
>>>>>>> If you are starting and restarting, or running multiple jobs then
>>>>>>> the log can be partially rebuilt. On rebuild only the FINISHED
>>>>>>> entries are added. If there is a GFF3 result file for the contig,
>>>>>>> then it is FINISHED. FASTA files will only exist for the contigs
>>>>>>> that have gene models. Small contigs will rarely contain models.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote:
>>>>>>>>
>>>>>>>> Hi Carson,
>>>>>>>>
>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a
>>>>>>>> genome which is split into 20 parts, using the -g flag and the
>>>>>>>> same basename.
>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed
>>>>>>>> to finish normally, while the remaining three seemed to be
>>>>>>>> stalled and produced 0B of output. Do you have any suggestion
>>>>>>>> why this is happening?
>>>>>>>>
>>>>>>>> After I stopped these stalled jobs, I checked the index.log and
>>>>>>>> found that of 38,384 mentioned scaffolds, 154 appear only once
>>>>>>>> in the log. The surprise is that 2/3 of these only appear as
>>>>>>>> FINISHED (the rest only started). There are no models for these
>>>>>>>> 'finished' scaffolds stored in the .db and they are distributed
>>>>>>>> over all parts of the genome (i.e., each of the 20 jobs
>>>>>>>> contained scaffolds that 'did not start' but 'finished').
>>>>>>>> Should this be an issue of concern?
>>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but
>>>>>>>> the NFS files look good, so we suspect something fishy going
>>>>>>>> on...
>>>>>>>>
>>>>>>>> Hope you can help,
>>>>>>>> best wishes,
>>>>>>>> Jeanne Wilbrandt
>>>>>>>>
>>>>>>>> zmb // ZFMK // University of Bonn

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts_Lclav_splitrun_problem_01_mapfwd.ctl
Type: application/octet-stream
Size: 5859 bytes
Desc: not available
URL:

From carsonhh at gmail.com Thu Aug 14 09:46:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:46:44 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

What version of MAKER are you using? I'd also need to see the GFF3 files before the merge. You may also need to turn off map_forward since you are passing in GFF3 with MAKER names, creating new models with MAKER names and then moving names from old models forward onto new ones (which may force names to be used twice).

--Carson

On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote:

>Thank you so much!
>
>However, I'm still struggling, I'm afraid: I tried this 'two-step
>merging' approach with a subset of scaffolds and got duplicate IDs.

From carsonhh at gmail.com Thu Aug 14 09:55:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:55:15 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Which 2.31? Current is 2.31.6.

--Carson

On 8/14/14, 9:53 AM, "Jeanne Wilbrandt" wrote:

>It is version 2.31.
>
>My first try was done with map_forward=0, and (I just noticed) the
>duplicates are present in the separate gff3s already also in this case
>(one is attached).
>
>Has this something to do with the first-run-gff3 I fed it?
>
>On Thu, 14 Aug 2014 15:46:44 +0000
> Carson Holt wrote:
>>What version of MAKER are you using? I'd also need to see the GFF3 files
>>before the merge. You may also need to turn off map_forward since you
>>are passing in GFF3 with MAKER names, creating new models with MAKER
>>names and then moving names from old models forward onto new ones
>>(which may force names to be used twice).
>>
>>--Carson
>>
>>On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote:
>>
>>>Thank you so much!
>>>
>>>However, I'm still struggling, I'm afraid: I tried this 'two-step
>>>merging' approach with a subset of scaffolds and got duplicate IDs.

From carsonhh at gmail.com Thu Aug 14 09:57:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:57:39 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

For the file you just sent me, is that from the first run with map_forward=0 or with map_forward=1?

--Carson

On 8/14/14, 9:53 AM, "Jeanne Wilbrandt" wrote:

>It is version 2.31.
>
>My first try was done with map_forward=0, and (I just noticed) the
>duplicates are present in the separate gff3s already also in this case
>(one is attached).
>
>Has this something to do with the first-run-gff3 I fed it?
> > > > >On Thu, 14 Aug 2014 15:46:44 +0000 > Carson Holt wrote: >>What version of MAKER are you using? I'd also need to see the GFF3 files >>before the merge. You may also need to turn off map_forward since you >>are >>passing in GFF3 with MAKER names, creating new models with MAKER names >>and >>then moving names from old models forward onto new ones (which may force >>names to be used twice). >> >>--Carson >> >> >>On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote: >> >>> >>>Thank you so much! >>> >>>However, I'm still, struggling, I'm afraid: I tried this 'two-step >>>merging' approach with >>>a subset of scaffolds and got duplicate IDs. >>> >>>Here is what I did: >>>- divided input scaffolds in two files >>>- run maker separately on these files (-> separate output dirs) >>>-- additional input: maker-generated gff3 from previous (singular) run >>>-- repeatmasking, snaphmm, gmhmm, augustus_species are given >>>-- map_forward=0 / 1 (I tried both, to the same effect) >>>- gff3_merge two times using index-log >>>- gff3_merge these two gff3 files >>> >>>$ >>>grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | >>>uniq -c | sort -n >>>| tail >>> 2 ID=snap_masked-scf7180005140699-processed-gene-0.19 >>> 2 ID=snap_masked-scf7180005140699-processed-gene-0.22 >>> 2 ID=snap_masked-scf7180005140699-processed-gene-1.36 >>> 2 ID=snap_masked-scf7180005140713-processed-gene-0.4 >>> 2 ID=snap_masked-scf7180005140744-processed-gene-0.4 >>> 2 ID=snap_masked-scf7180005140744-processed-gene-0.6 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.14 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.15 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.19 >>> 2 ID=snap_masked-scf7180005181475-processed-gene-0.3 >>> >>>$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | >>>grep "\sgene" >>>scf7180005181475 maker gene 9050 9385 . - . 
ID=snap_masked-scf7180005181 >>>47 >>>5-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0. >>>3 >>>scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf71800051814 >>>75 >>>-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3 >>> >>>- found duplicates! i.e. the same ID for gene annotations in different >>>areas of the same >>>scaffold (of 655 gene annotations, 51 appear twice) >>>-- this happens not only with gene, but also CDS and mRNA annotations, >>>as >>>far as I can >>>see (here, in one example, non-everlapping but close CDS snippets got >>>the >>>same ID). >>> >>> >>>I suspected this might have to do with the map_forward flag, but I get >>>the same problem >>>again (with genes at the same locations). >>>I attached one of the ctl files for you in case you want to have a look, >>>the other is >>>analogous. Do you need something else? >>> >>>What did I miss? This should not happen, right? >>> >>> >>> >>> >>>On Wed, 13 Aug 2014 15:52:34 +0000 >>> Carson Holt wrote: >>>>Yes. One cpu will have several processes, most are helper processes >>>>that >>>>will use 0% CPU almost all of the time (for example there is a shared >>>>variable manager process that will launch with MAKER but will also be >>>>called 'maker' under top because it is technically its child and not a >>>>separate script). Also system calls will launch a new process that >>>>will >>>>use all CPU while the process calling it will drop to 0% CPU until it >>>>finishes. >>>> >>>>Yes. Your explanation is correct. You then use gff3_merge to merge the >>>>GFF3 file. >>>> >>>>--Carson >>>> >>>> >>>> >>>>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: >>>> >>>>> >>>>>Our admin counts processes. Do I understand you right, that one CPU >>>>>handles several >>>>>processes? >>>>> >>>>>I'm still confused by the different directories (and I made a mistake >>>>>when asking last >>>>>time, I wanted to say 'If I do NOT start the jobs in the same >>>>>directory...). 
>>>>>So, if I start each piece of a genome in its own directory (for >>>>>example), >>>>>then it gets a >>>>>unique basename (because the output will be separate from all other >>>>>pieces anyway) and I >>>>>will not run dsindex but instead use gff3_merge for each piece's >>>>>output >>>>>and then once >>>>>again to merge all resulting gff3-files? >>>>> >>>>>Hope I got you right :) >>>>> >>>>>Thanks fopr your help! >>>>>Jeanne >>>>> >>>>> >>>>> >>>>>On Wed, 6 Aug 2014 15:45:56 +0000 >>>>> Carson Holt wrote: >>>>>>Is your admin counting processes or cpu usage? Because each system >>>>>>call >>>>>>creates a >>>>>>separate process, so you can expect multiple processes (each system >>>>>>call >>>>>>generates a new >>>>>>process) but only a single cpu of usage per instance. Use different >>>>>>directories if you >>>>>>are running that many jobs. You can concatenate the separate results >>>>>>when your done. >>>>>> Use gff3_merge script to help concatenate the separate GFF3 files >>>>>>generated from >>>>>>separate jobs. >>>>>> >>>>>>--Carson >>>>>> >>>>>>Sent from my iPhone >>>>>> >>>>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>>>> >>>>>>>wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>>>>threads. Our admin >>>>>>reports >>>>>>> however, that the processes seem to assemble more threads than they >>>>>>>are allowed. It is >>>>>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>>>>suggestion why? >>>>>>> >>>>>>> If I start the jobs in the same directory, how can I make sure they >>>>>>>write to the same >>>>>>> directory (as, I think is required to put the pieces together in >>>>>>>the >>>>>>>end?)? das >>>>>>-basename >>>>>>> take paths? >>>>>>> >>>>>>> >>>>>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>>>>> Carson Holt wrote: >>>>>>>> I think the freezing is because you are starting too many >>>>>>>>simultaneous jobs. You >>>>>>should >>>>>>>> try and use MPI to parallelize instead. 
The concurrent job way of >>>>>>>>doing things can >>>>>>>> start to cause problems If you are running 10 or more jobs in the >>>>>>>>same directory. You >>>>>>>> could try splitting them into different directories. >>>>>>>> >>>>>>>> --Carson >>>>>>>> >>>>>>>> Sent from my iPhone >>>>>>>> >>>>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>>>> >>>>>>>>>wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> aha, so this explains that. >>>>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>>>>than 60,000, >>>>>>roughly >>>>>>>>> half of the sequences being shorter than 3,000 bp. >>>>>>>>> >>>>>>>>> What do you think about this weird 'I am running but not really >>>>>>>>>doing >>>>>>>> anything'-behavior? >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks a lot! >>>>>>>>> Jeanne >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>>>>>> Carson Holt wrote: >>>>>>>>>> If you are starting and restarting, or running multiple jobs >>>>>>>>>>then >>>>>>>>>>the log can be >>>>>>>>>> partially rebuilt. On rebuild only the FINISHED entries are >>>>>>>>>>added. >>>>>>>>>> If there is a >>>>>>>> GFF3 >>>>>>>>>> result file for the contig, then it is FINISHED. FASTA files >>>>>>>>>>will >>>>>>>>>>only exist for >>>>>>the >>>>>>>>>> contigs that have gene models. Small contigs will rarely contain >>>>>>>>>>models. >>>>>>>>>> >>>>>>>>>> --Carson >>>>>>>>>> >>>>>>>>>> Sent from my iPhone >>>>>>>>>> >>>>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Carson, >>>>>>>>>>> >>>>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>>>>>genome which is split >>>>>>>>>> into >>>>>>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed >>>>>>>>>>>to >>>>>>>>>>>finish >>>>>>normally, >>>>>>>>>> while >>>>>>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>>>>>output. 
Do you have >>>>>>any >>>>>>>>>>> suggestion why this is happening? >>>>>>>>>>> >>>>>>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>>>>>found that of >>>>>>38.384 >>>>>>>>>>> mentioned scaffolds, 154 appear only once in the log. The >>>>>>>>>>>surprise >>>>>>>>>>>is, that 2/3 of >>>>>>>>>> these >>>>>>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>>>>>models for these >>>>>>>>>> 'finished' >>>>>>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>>>>>parts of the genome >>>>>>>>>> (i.e., >>>>>>>>>>> each of the 20 jobs contained scaffolds that 'did not start' >>>>>>>>>>>but >>>>>>>>>>>'finished') >>>>>>>>>>> Should this be an issue of concern? >>>>>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but >>>>>>>>>>>the >>>>>>>>>>>NFS files look >>>>>>>> good, >>>>>>>>>> so >>>>>>>>>>> we suspect something fishy going on... >>>>>>>>>>> >>>>>>>>>>> Hope you can help, >>>>>>>>>>> best wishes, >>>>>>>>>>> Jeanne Wilbrandt >>>>>>>>>>> >>>>>>>>>>> zmb // ZFMK // University of Bonn >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> maker-devel mailing list >>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>> >>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell- >>>>>>>>>>>la >>>>>>>>>>>b. 
From j.wilbrandt at zfmk.de Thu Aug 14 09:53:38 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Thu, 14 Aug 2014 17:53:38 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
 <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>
 <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
 <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID: 

It is version 2.31.
My first try was done with map_forward=0, and (I just noticed) the duplicates are already present in the separate gff3s in this case as well (one is attached). Does this have something to do with the first-run gff3 I fed it?

On Thu, 14 Aug 2014 15:46:44 +0000 Carson Holt wrote:
>What version of MAKER are you using? I'd also need to see the GFF3 files
>before the merge. You may also need to turn off map_forward since you are
>passing in GFF3 with MAKER names, creating new models with MAKER names and
>then moving names from old models forward onto new ones (which may force
>names to be used twice).
>
>--Carson
>
>On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote:
>
>>Thank you so much!
>>
>>However, I'm still struggling, I'm afraid: I tried this 'two-step
>>merging' approach with a subset of scaffolds and got duplicate IDs.
>>
>>Here is what I did:
>>- divided input scaffolds in two files
>>- run maker separately on these files (-> separate output dirs)
>>-- additional input: maker-generated gff3 from previous (singular) run
>>-- repeatmasking, snaphmm, gmhmm, augustus_species are given
>>-- map_forward=0 / 1 (I tried both, to the same effect)
>>- gff3_merge two times using index-log
>>- gff3_merge these two gff3 files
>>
>>$ grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | uniq -c | sort -n | tail
>>      2 ID=snap_masked-scf7180005140699-processed-gene-0.19
>>      2 ID=snap_masked-scf7180005140699-processed-gene-0.22
>>      2 ID=snap_masked-scf7180005140699-processed-gene-1.36
>>      2 ID=snap_masked-scf7180005140713-processed-gene-0.4
>>      2 ID=snap_masked-scf7180005140744-processed-gene-0.4
>>      2 ID=snap_masked-scf7180005140744-processed-gene-0.6
>>      2 ID=snap_masked-scf7180005140754-processed-gene-0.14
>>      2 ID=snap_masked-scf7180005140754-processed-gene-0.15
>>      2 ID=snap_masked-scf7180005140754-processed-gene-0.19
>>      2 ID=snap_masked-scf7180005181475-processed-gene-0.3
>>
>>$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | grep "\sgene"
>>scf7180005181475 maker gene 9050 9385 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
>>scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
>>
>>- found duplicates! i.e. the same ID for gene annotations in different
>>areas of the same scaffold (of 655 gene annotations, 51 appear twice)
>>-- this happens not only with gene, but also CDS and mRNA annotations, as
>>far as I can see (here, in one example, non-overlapping but close CDS
>>snippets got the same ID).
>>
>>I suspected this might have to do with the map_forward flag, but I get
>>the same problem again (with genes at the same locations).
>>I attached one of the ctl files for you in case you want to have a look,
>>the other is analogous. Do you need something else?
>>
>>What did I miss? This should not happen, right?
>>
>>On Wed, 13 Aug 2014 15:52:34 +0000 Carson Holt wrote:
>>>Yes. One cpu will have several processes, most are helper processes that
>>>will use 0% CPU almost all of the time (for example there is a shared
>>>variable manager process that will launch with MAKER but will also be
>>>called 'maker' under top because it is technically its child and not a
>>>separate script). Also system calls will launch a new process that will
>>>use all CPU while the process calling it will drop to 0% CPU until it
>>>finishes.
>>>
>>>Yes. Your explanation is correct. You then use gff3_merge to merge the
>>>GFF3 file.
>>>
>>>--Carson
>>>
>>>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote:
>>>
>>>>Our admin counts processes. Do I understand you right, that one CPU
>>>>handles several processes?
>>>>
>>>>I'm still confused by the different directories (and I made a mistake
>>>>when asking last time, I wanted to say 'If I do NOT start the jobs in
>>>>the same directory...').
>>>>So, if I start each piece of a genome in its own directory (for
>>>>example), then it gets a unique basename (because the output will be
>>>>separate from all other pieces anyway) and I will not run dsindex but
>>>>instead use gff3_merge for each piece's output and then once again to
>>>>merge all resulting gff3-files?
>>>>
>>>>Hope I got you right :)
>>>>
>>>>Thanks for your help!
>>>>Jeanne
>>>>
>>>>On Wed, 6 Aug 2014 15:45:56 +0000 Carson Holt wrote:
>>>>>Is your admin counting processes or cpu usage? Because each system
>>>>>call creates a separate process, so you can expect multiple processes
>>>>>(each system call generates a new process) but only a single cpu of
>>>>>usage per instance. Use different directories if you are running that
>>>>>many jobs. You can concatenate the separate results when you're done.
>>>>>Use gff3_merge script to help concatenate the separate GFF3 files
>>>>>generated from separate jobs.
>>>>>
>>>>>--Carson
>>>>>
>>>>>Sent from my iPhone
>>>>>
>>>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote:
>>>>>>
>>>>>> We are using MPI as well, each of the 20 parts gets assigned 4
>>>>>> threads. Our admin reports however, that the processes seem to
>>>>>> assemble more threads than they are allowed. It is not Blast (which
>>>>>> is set to 1 cpu in the opts.ctl). Do you have a suggestion why?
>>>>>>
>>>>>> If I start the jobs in the same directory, how can I make sure they
>>>>>> write to the same directory (as, I think, is required to put the
>>>>>> pieces together in the end)? Does -basename take paths?
>>>>>>
>>>>>> On Wed, 6 Aug 2014 15:12:50 +0000 Carson Holt wrote:
>>>>>>> I think the freezing is because you are starting too many
>>>>>>> simultaneous jobs. You should try and use MPI to parallelize
>>>>>>> instead. The concurrent job way of doing things can start to cause
>>>>>>> problems if you are running 10 or more jobs in the same directory.
>>>>>>> You could try splitting them into different directories.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote:
>>>>>>>>
>>>>>>>> aha, so this explains that.
>>>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more
>>>>>>>> than 60,000, roughly half of the sequences being shorter than
>>>>>>>> 3,000 bp.
>>>>>>>>
>>>>>>>> What do you think about this weird 'I am running but not really
>>>>>>>> doing anything'-behavior?
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>> Jeanne
>>>>>>>>
>>>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 Carson Holt wrote:
>>>>>>>>> If you are starting and restarting, or running multiple jobs, then
>>>>>>>>> the log can be partially rebuilt. On rebuild only the FINISHED
>>>>>>>>> entries are added. If there is a GFF3 result file for the contig,
>>>>>>>>> then it is FINISHED. FASTA files will only exist for the contigs
>>>>>>>>> that have gene models. Small contigs will rarely contain models.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Carson,
>>>>>>>>>>
>>>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a
>>>>>>>>>> genome which is split into 20 parts, using the -g flag and the
>>>>>>>>>> same basename.
>>>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed
>>>>>>>>>> to finish normally, while the remaining three seemed to be
>>>>>>>>>> stalled and produced 0B of output. Do you have any suggestion why
>>>>>>>>>> this is happening?
>>>>>>>>>>
>>>>>>>>>> After I stopped these stalled jobs, I checked the index.log and
>>>>>>>>>> found that of 38.384 mentioned scaffolds, 154 appear only once in
>>>>>>>>>> the log. The surprise is that 2/3 of these only appear as
>>>>>>>>>> FINISHED (the rest only started). There are no models for these
>>>>>>>>>> 'finished' scaffolds stored in the .db and they are distributed
>>>>>>>>>> over all parts of the genome (i.e., each of the 20 jobs contained
>>>>>>>>>> scaffolds that 'did not start' but 'finished').
>>>>>>>>>> Should this be an issue of concern?
>>>>>>>>>> It might be an NFS lock problem, as NFS is heavily loaded, but the
>>>>>>>>>> NFS files look good, so we suspect something fishy going on...
>>>>>>>>>>
>>>>>>>>>> Hope you can help,
>>>>>>>>>> best wishes,
>>>>>>>>>> Jeanne Wilbrandt
>>>>>>>>>>
>>>>>>>>>> zmb // ZFMK // University of Bonn

-------------- next part --------------
A non-text attachment was scrubbed...
Name: splitrun_problem_01_all.gff3
Type: application/octet-stream
Size: 4967463 bytes
Desc: not available
URL: 

From daniel.standage at gmail.com Thu Aug 21 09:33:33 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Thu, 21 Aug 2014 11:33:33 -0400
Subject: [maker-devel] tRNAscan GFF3
Message-ID: 

Greetings!

I have a quick question about Maker's handling of tRNAscan output, particularly tRNAs containing introns. If I haven't missed something, it looks like Maker reports the second exon on the opposite strand from the first exon, the tRNA feature, and the gene feature? Am I reading this correctly?

I don't think this representation makes sense. The second exon is complementary to the first (hence the folding), but it is not encoded on or transcribed from the opposite strand. Unless I've misunderstood something, I would suggest that the correct representation would be to have all features on the same strand.

Thanks,
Daniel

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University
From carsonhh at gmail.com Thu Aug 21 09:35:16 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 09:35:16 -0600
Subject: [maker-devel] tRNAscan GFF3
In-Reply-To: 
References: 
Message-ID: 

It should be on the same strand. Which MAKER version are you using?

--Carson

From daniel.standage at gmail.com Thu Aug 21 09:36:41 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Thu, 21 Aug 2014 11:36:41 -0400
Subject: [maker-devel] tRNAscan GFF3
In-Reply-To: 
References: 
Message-ID: 

This annotation was generated using Maker 2.31.3.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote:
> It should be on the same strand. Which MAKER version are you using?
>
> --Carson

From carsonhh at gmail.com Thu Aug 21 09:49:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 09:49:36 -0600
Subject: [maker-devel] tRNAscan GFF3
In-Reply-To: 
References: 
Message-ID: 

I halfway remember some tRNAscan bugs being fixed in several of the sub-versions of 2.31 (tRNAscan was only introduced as an option in 2.30, I believe, and most 2.31 updates were related to tRNAscan). The current version is 2.31.6. Could you give it a try and see if it is still giving you the issue? I did a quick look through the archives and I think this was found and fixed -->
https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ

Thanks,
Carson

From: Daniel Standage
Date: Thursday, August 21, 2014 at 9:36 AM
To: Carson Holt
Cc: Maker Mailing List
Subject: Re: [maker-devel] tRNAscan GFF3

This annotation was generated using Maker 2.31.3.
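One way to check whether a given MAKER GFF3 still shows the strand problem discussed in this thread is to compare each child feature's strand with its parent's. The following is only a minimal sketch, assuming a standard tab-separated GFF3 file with ID and Parent attributes in column 9; the function name gff3_strand_mismatches and the file name in the usage comment are hypothetical and not part of MAKER.

```shell
# Hypothetical helper, not part of MAKER: print every child feature whose
# strand (column 7) differs from the strand of a feature named in its
# Parent attribute.
# Usage (file name is an assumption): gff3_strand_mismatches genome.all.gff
gff3_strand_mismatches() {
    awk -F'\t' '
        # Return the value of attribute "key" from a GFF3 column-9 string.
        function attr(s, key,    n, i, parts) {
            n = split(s, parts, ";")
            for (i = 1; i <= n; i++)
                if (index(parts[i], key "=") == 1)
                    return substr(parts[i], length(key) + 2)
            return ""
        }
        # Skip comments, directives and anything that is not a feature line.
        /^#/ || NF < 9 { next }
        {
            id = attr($9, "ID")
            if (id != "") strand[id] = $7
            parent = attr($9, "Parent")
            if (parent != "") {
                n++
                child[n] = $0
                cparent[n] = parent
                cstrand[n] = $7
            }
        }
        END {
            for (i = 1; i <= n; i++) {
                # Parent may list several IDs separated by commas.
                m = split(cparent[i], ps, ",")
                for (j = 1; j <= m; j++)
                    if ((ps[j] in strand) && strand[ps[j]] != cstrand[i])
                        print child[i]
            }
        }
    ' "$1"
}
```

If the function prints nothing, no exon, tRNA or mRNA line disagrees with its parent's strand; any printed line is a candidate for the issue Daniel describes.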
From rens.holmer at wur.nl Tue Aug 19 03:19:08 2014
From: rens.holmer at wur.nl (rens holmer)
Date: Tue, 19 Aug 2014 11:19:08 +0200
Subject: [maker-devel] Maker error mpiexec
Message-ID: 

Hi,

I am trying to run maker using MPI, and I get an error I do not understand.

Maker version: 2.13.6
mpiexec version: mpiexec (OpenRTE) 1.6.5

When I run ./Build status it is reported that MPI is enabled.

When I run mpiexec -n 40 maker I get the following errors:

[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------

Etcetera etcetera.

However: when I search for the files reported as missing I do find them, and I don't believe they are from a different version of MPI?

Am I using a wrong version of MPI?

Any help would be appreciated,

Sincerely,

Rens Holmer
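The log above hints at components "compiled for a different version of Open MPI". A rough way to look into that is to compare the version reported by the launcher with the version reported by the compiler wrapper found first on PATH. This is only a sketch: it assumes Open MPI (whose wrapper compilers accept --showme:version), the helper names first_version and check_ompi_versions are hypothetical, and it cannot tell which mpi.h or mpicc a given MAKER build was actually configured with.

```shell
# Hypothetical helper: print the first dotted version number found on stdin,
# e.g. "1.6.5" from "mpiexec (OpenRTE) 1.6.5".
first_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Report which mpiexec and mpicc are first on PATH and whether their
# version numbers agree. Returns non-zero if they differ or are missing.
check_ompi_versions() {
    run_ver=$(mpiexec --version 2>&1 | first_version)
    build_ver=$(mpicc --showme:version 2>&1 | first_version)
    echo "mpiexec: $(command -v mpiexec) ($run_ver)"
    echo "mpicc:   $(command -v mpicc) ($build_ver)"
    [ -n "$run_ver" ] && [ "$run_ver" = "$build_ver" ]
}
```

Running check_ompi_versions in the same shell that launches MAKER shows whether two different Open MPI installations are being mixed on PATH; a mismatch would be consistent with the "different version" messages.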
From Timothy.Stitt at tgac.ac.uk Thu Aug 21 14:05:46 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 21 Aug 2014 20:05:46 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
Message-ID: 

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Thu Aug 21 14:17:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:17:22 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
Message-ID: 

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;

            bless($self, $class);

            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From carsonhh at gmail.com Thu Aug 21 14:21:19 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:21:19 -0600
Subject: [maker-devel] Maker error mpiexec
In-Reply-To: 
References: 
Message-ID: 

You need to make sure the same version of MPI is used to compile and run MAKER.
When installing MAKER make sure the mpi.h and mpicc indicated during configuration come from the same version of OpenMPI as the mpiexec command you are using now.

Also for OpenMPI run the following command before setting up or launching MAKER -->
export LD_PRELOAD=?/openmpi_location/lib/libmpi.so

Replace openmpi_location in the above command with the location of your OpenMPI.

Setting LD_PRELOAD is required for OpenMPI to work correctly with shared libraries.

Also you may need to add the following to your MPI command before running MAKER.
--> -mca btl ^openib
Example --> mpiexec -mca btl ^openib -n 40 maker

Thanks,
Carson

From carsonhh at gmail.com Thu Aug 21 14:27:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:27:14 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To: 
References: 
Message-ID: 

FYI. If you use the -nolock flag, never start MAKER more than once in the same directory. The lack of file locks means MAKER won't detect the other active process and they can end up overwriting each other's output. So do any parallelization via MPI instead.

Thanks,
Carson

From rens.holmer at wur.nl Fri Aug 22 04:43:20 2014
From: rens.holmer at wur.nl (rens holmer)
Date: Fri, 22 Aug 2014 12:43:20 +0200
Subject: [maker-devel] Maker error mpiexec
In-Reply-To: 
References: 
Message-ID: 

Thank you!
export LD_PRELOAD=?/openmpi_location/lib/libmpi.so mpiexec -mca btl ^openib -n 40 maker Those two tweaks did the trick! Sincerely, Rens Holmer On Thu, Aug 21, 2014 at 10:21 PM, Carson Holt wrote: > You need to make sure the same version of MPI is used to compile and run > MAKER. When installing MAKER make sure the mpi.h and mpicc indicated > during configuration come from the same version of OpenMPI as the mpiexec > command you are using now. > > Also for OpenMPI run the following command before setting up or launching > MAKER --> > export LD_PRELOAD=?/openmpi_location/lib/libmpi.so > > replace openmpi_location in the above command with the location of your > OpenMPI. > > Setting LD_PRELOAD preload is required for OpenMPI to work correctly with > shared libraries. > > > Also you may need to add the following to your MPI command before running > MAKER. > --> -mca btl ^openib > Example --> mpiexec -mca btl ^openib -n 40 maker > > Thanks, > Carson > > > > From: rens holmer > Date: Tuesday, August 19, 2014 at 3:19 AM > To: > Subject: [maker-devel] Maker error mpiexec > > Hi, > > I am trying to run maker using MPI, and I get an error I do not understand. > > Maker version: 2.13.6 > mpiexec version: mpiexec (OpenRTE) 1.6.5 > > When I run ./Build status it is reported that MPI is enabled. > > When I run mpiexec -n 40 maker I get the following errors: > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, > or compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, > or compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing > symbol, or compiled for a different version of Open MPI? 
(ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing > symbol, or compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > -------------------------------------------------------------------------- > > It looks like opal_init failed for some reason; your parallel process is > > likely to abort. 
There are many reasons that a parallel process can
>
> fail during opal_init; some of which are due to configuration or
>
> environment problems. This failure appears to be an internal failure;
>
> here's some additional information (which may only be relevant to an
>
> Open MPI developer):
>
>
> opal_shmem_base_select failed
>
> --> Returned value -1 instead of OPAL_SUCCESS
>
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
>
>
>
> Etcetera etcetera.
>
> However: when I search for the files reported as missing I do find them,
> and I don't believe they are from a different version of MPI?
>
> Am I using a wrong version of MPI?
>
> Any help would be appreciated,
>
> Sincerely,
>
>
> Rens Holmer
>
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ranjani at uga.edu Tue Aug 26 08:53:25 2014
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 26 Aug 2014 14:53:25 +0000
Subject: [maker-devel] MAKER run error -with blast
Message-ID: <1409064805543.27602@uga.edu>

Hi,

I have been using MAKER for a while and it's been running fine. Recently I am encountering an error (attaching the error from the error log file - error1.txt). As input I am providing the fasta file of a scaffold, a transcriptome dataset (in gff) and a protein dataset (as fasta). These kinds of input files have run successfully in the past. The file that is reported as 'No such file or directory at' in the error output changes in different runs. To make sure I wasn't doing something wrong, I reran a dataset that had run successfully before, but I get an error with that too. (error log attached as error2.txt).
The only difference in this run is that previously I ran it on the entire genome, and now I am testing it on just one scaffold. Would you have any idea of why this might be happening?

Thanks,
Ranjani
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: error1.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: error2.txt
URL: 

From carsonhh at gmail.com Tue Aug 26 09:03:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 26 Aug 2014 09:03:28 -0600
Subject: [maker-devel] MAKER run error -with blast
Message-ID: 

Make sure you are not setting TMP= in the maker_opts.ctl file to an NFS mounted location. Also check your /tmp directory to see if it is full or nearly full (it will be mounted on a different drive than your working directory). Also if it is being caused by slow NFS response you can set clean_try=1 and it will do a complete retry on the contig rather than trying to recover partial files.

--Carson

From: Sivaranjani Namasivayam
Date: Tuesday, August 26, 2014 at 8:53 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER run error -with blast

Hi,

I have been using MAKER for a while and it's been running fine. Recently I am encountering an error (attaching the error from the error log file - error1.txt). As input I am providing the fasta file of a scaffold, a transcriptome dataset (in gff) and a protein dataset (as fasta). These kinds of input files have run successfully in the past. The file that is reported as 'No such file or directory at' in the error output changes in different runs. To make sure I wasn't doing something wrong, I reran a dataset that had run successfully before, but I get an error with that too. (error log attached as error2.txt).
The only difference in this run, previously I ran it for the entire genome, and now I am testing it on just one scaffold. Would you have any idea of why this might be happening? Thanks, Ranjani _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Tue Aug 26 09:55:40 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Tue, 26 Aug 2014 11:55:40 -0400 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: Sorry for the delayed response. In the mean time, I wrote a tiny script to correct the erroneous tRNA annotations. I just now took a few minutes to download 2.31.6, and can confirm that the tRNA exon strands are consistent. Best, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:49 AM, Carson Holt wrote: > I half way remember some tRNAscan bugs being fixed in several of the sub > versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I > believe and most 2.31 updates were related to tRNAscan). Current version > is 2.31.6. Could you give it a try and see if it is still giving you the > issue. > > I did a quick look through the archives and I think this was found and > fixed --> > https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ > > Thanks, > Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:36 AM > To: Carson Holt > Cc: Maker Mailing List > Subject: Re: [maker-devel] tRNAscan GFF3 > > This annotation was generated using Maker 2.31.3. > > > -- > Daniel S. Standage > Ph.D. 
Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: > >> It should be on the same strand. Which MAKER version are you using? >> >> --Carson >> >> >> From: Daniel Standage >> Date: Thursday, August 21, 2014 at 9:33 AM >> To: Maker Mailing List >> Subject: [maker-devel] tRNAscan GFF3 >> >> Greetings! >> >> I have a quick question about Maker's handling of tRNAscan output, >> particularly tRNAs containing introns. If I haven't missed something, it >> looks like Maker reports the second exon on the opposite strand as the >> first exon, the tRNA feature, and the gene feature? Am I reading this >> correctly? >> >> I don't think this representation makes sense. The second exon is >> complementary to the first (hence the folding), but it is not encoded on or >> transcribed from the opposite strand. Unless I've misunderstood something, >> I would suggest that the correct representation would be to have all >> features on the same strand. >> >> Thanks, >> Daniel >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Aug 26 10:06:26 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 Aug 2014 10:06:26 -0600 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: Thanks. --Carson From: Daniel Standage Date: Tuesday, August 26, 2014 at 9:55 AM To: Carson Holt Cc: Maker Mailing List Subject: Re: [maker-devel] tRNAscan GFF3 Sorry for the delayed response. In the mean time, I wrote a tiny script to correct the erroneous tRNA annotations. 
I just now took a few minutes to download 2.31.6, and can confirm that the tRNA exon strands are consistent. Best, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:49 AM, Carson Holt wrote: > I half way remember some tRNAscan bugs being fixed in several of the sub > versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I believe > and most 2.31 updates were related to tRNAscan). Current version is 2.31.6. > Could you give it a try and see if it is still giving you the issue. > > I did a quick look through the archives and I think this was found and fixed > --> > https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-dev > el/Z-kvf_V2ynU/vstSNjHgyJQJ > > Thanks, > Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:36 AM > To: Carson Holt > Cc: Maker Mailing List > Subject: Re: [maker-devel] tRNAscan GFF3 > > This annotation was generated using Maker 2.31.3. > > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: >> It should be on the same strand. Which MAKER version are you using? >> >> --Carson >> >> >> From: Daniel Standage >> Date: Thursday, August 21, 2014 at 9:33 AM >> To: Maker Mailing List >> Subject: [maker-devel] tRNAscan GFF3 >> >> Greetings! >> >> I have a quick question about Maker's handling of tRNAscan output, >> particularly tRNAs containing introns. If I haven't missed something, it >> looks like Maker reports the second exon on the opposite strand as the first >> exon, the tRNA feature, and the gene feature? Am I reading this correctly? >> >> I don't think this representation makes sense. The second exon is >> complementary to the first (hence the folding), but it is not encoded on or >> transcribed from the opposite strand. 
Unless I've misunderstood something, I
>> would suggest that the correct representation would be to have all features
>> on the same strand.
>>
>> Thanks,
>> Daniel
>>
>> --
>> Daniel S. Standage
>> Ph.D. Candidate
>> Computational Genome Science Laboratory
>> Indiana University
>> _______________________________________________ maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Hossein.Borhan at AGR.GC.CA Wed Aug 27 09:52:54 2014
From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein)
Date: Wed, 27 Aug 2014 15:52:54 +0000
Subject: [maker-devel] non-redundant fasta and gff
Message-ID: 

Hi

Is there a way to produce a fasta file and gff for a set of non-redundant genes predicted by the Maker software? Fasta-merge and gff-merge generate a file that has different predictions (e.g. generated by Augustus, GeneMark, etc.) for the same gene as individual genes.

Regards

Hossein

From carsonhh at gmail.com Wed Aug 27 09:57:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Aug 2014 09:57:10 -0600
Subject: [maker-devel] non-redundant fasta and gff
Message-ID: 

The fasta files created for augustus, snap, etc. are only for reference purposes. They are the raw ab initio predictions produced by these algorithms run by themselves (they are match/match_part features in the GFF3 file). The files you want are the maker.transcripts.fasta and maker.proteins.fasta files. They contain the non-redundant final annotations. They are the same ones that are marked as gene/mRNA/exon/CDS features in the GFF3 file.

--Carson


On 8/27/14, 9:52 AM, "Borhan, Hossein" wrote:

>Hi
>
>
>Is there a way to produce a fasta file and gff for a set of non-redundant
>genes predicted by the Maker software. Fasta-merge and gff-merge generate
>a file that has different prediction (e.g generated by Augustus,
>GeneMark etc.
) for the same gene as individual genes.
>
>
>
>Regards
>
>
>Hossein
>
>
>
>
>
>
>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Aug 27 09:58:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Aug 2014 09:58:47 -0600
Subject: [maker-devel] non-redundant fasta and gff
In-Reply-To: 
References: 
Message-ID: 

Please see the documentation wiki for explanations of how to read and use MAKER's output.

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#MAKER.27s_Output

Thanks,
Carson


On 8/27/14, 9:57 AM, "Carson Holt" wrote:

>The fasta files created for augustus, snap, etc. are only for reference
>purposes. They are the raw ab initio predictions produced by these
>algorithms run by themselves (they are match/match_part features in the
>GFF3 file). The files you want are the maker.transcripts.fasta and
>maker.proteins.fasta files. They contain the non-redundant final
>annotations. They are the same ones that are marked as gene/mRNA/exon/CDS
>features in the GFF3 file.
>
>--Carson
>
>
>On 8/27/14, 9:52 AM, "Borhan, Hossein" wrote:
>
>>Hi
>>
>>
>>Is there a way to produce a fasta file and gff for a set of non-redundant
>>genes predicted by the Maker software? Fasta-merge and gff-merge generate
>>a file that has different predictions (e.g. generated by Augustus,
>>GeneMark, etc.) for the same gene as individual genes.
>> >> >> >>Regards >> >> >>Hossein >> >> >> >> >> >> >> >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Mon Aug 4 14:27:08 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 04 Aug 2014 14:27:08 -0600 Subject: [maker-devel] Forks.pm error when running maker with dsindex In-Reply-To: References: Message-ID: Sorry for the slow reply. I was on vacation all last week. Do you have the full STDERR? sometimes the last error is irrelevant and it's just the result of a failure further upstream. Also are you running 20 independent maker jobs simultaneously? --Carson From: Jan Philip Oeyen Date: Monday, July 28, 2014 at 6:22 AM To: Subject: [maker-devel] Forks.pm error when running maker with dsindex Hi all, we are currently having some unexpected errors when running maker on a genome which is split in several parts. Our cluster admin reported the following error message: Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodu les/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm line 2188. SIGTERM received SIGTERM received SIGTERM received We were using maker with the '-g' option on a single genome which is split into 20 parts, where 19 parts are equally large and the last contains about 20 sequences more. After that we ran Maker using dsindex to clean up the output. We are currently using maker v2.31 on 4 threads and forks v0.34. If any further info is needed to clarify the problem, please let me know and I will provide as much as possible. Thank you for your help! 
Best regards,
Jan Philip Oeyen
ZFMK // ZMB // University of Bonn
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Tue Aug 5 14:21:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 05 Aug 2014 14:21:51 -0600
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID: 

Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)?

--Carson

From: Marc Höppner
Date: Wednesday, July 30, 2014 at 4:44 AM
To: 
Subject: [maker-devel] Maker GFF output with features of 0 length

Hi,

I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example:

scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1

This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies.

Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head?

Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...
I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.

Regards,

Marc
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Tue Aug 5 14:26:51 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 05 Aug 2014 14:26:51 -0600
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To: 
References: 
Message-ID: 

Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted directory (it must be locally mounted), the drive containing the directory specified by TMP= (defaults to /tmp) is full or nearly full, your input file is not in proper FASTA format, or you are using an out-of-date version of BioPerl. Try the first three in the list, then look at BioPerl. The BioPerl version should be printed as part of the debug output.

--Carson

From: Kevin Tsai
Date: Tuesday, August 5, 2014 at 4:59 AM
To: 
Subject: [maker-devel] Early obstacle with SplitDB

Hello,

I'm a new user to Maker so I suspect this will be a simple question, but I am having trouble finding documentation on SplitDB. Our IT admin set up the application and I'm running into the following issue about 30 seconds after kickoff. Below is the debugged output:

STATUS: Parsing control files...
Calling GI::load_control_files at /usr/bin/maker line 452.
Calling GI::new_instance_temp at /usr/bin/maker line 463.
Calling GI::mount_check at /usr/bin/maker line 465.
Calling GI::set_global_temp at /usr/bin/maker line 483.
STATUS: Processing and indexing input FASTA files...
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling GI::s_abs_path at /usr/bin/maker line 519.
Calling List::Util::shuffle at /usr/bin/maker line 529.
Calling GI::split_db at /usr/bin/maker line 536.
Calling File::Path::rmtree at /usr/bin/maker line 537.
Calling Iterator::Any::new at /usr/bin/maker line 537.
Calling Iterator::Any::nextDef at /usr/bin/maker line 537.
Calling Iterator::Any::new at /usr/bin/maker line 537.
Calling mkdir at /usr/bin/maker line 537.
Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537.
Calling system at /usr/bin/maker line 537.
ERROR: SplitDB not created correctly at /usr/local/share/perl5/GI.pm line 1144.
GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at /usr/bin/maker line 537
--> rank=NA, hostname=Za2.cglab

Any suggestions? Thank you in advance!

--
Kevin Tsai
www.linkedin.com/in/kevinjtsai/
Ph.D. Candidate, Bioinformatics
Institute of Information Science, Academia Sinica
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Tue Aug 5 14:49:33 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 05 Aug 2014 14:49:33 -0600
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To: 
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID: 

One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this, for example. Do you have non-CDS feature types where this happens, or any internal CDS's where this happens?
--Carson

From: Carson Holt
Date: Tuesday, August 5, 2014 at 2:21 PM
To: Marc Höppner ,
Subject: Re: [maker-devel] Maker GFF output with features of 0 length

Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)?

--Carson

From: Marc Höppner
Date: Wednesday, July 30, 2014 at 4:44 AM
To: 
Subject: [maker-devel] Maker GFF output with features of 0 length

Hi,

I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example:

scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1

This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies.

Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head?

Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...

I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.

Regards,

Marc
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Wed Aug 6 01:03:26 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 06 Aug 2014 01:03:26 -0600
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID: 

If it is happening only with GFF3 pass-through, then it may be something I saw and fixed a while ago (there were some GFF3 passthrough fixes since 2.31.4). Could you check and see if it still happens in 2.31.6?

Also if it is only the first or last CDS/exon, then Augustus can do that and it's not actually a bug. Basically it is truncating the model to the start/stop codon so the first or last exon/CDS may appear short, but it's really just incomplete. If you can find any example of a non-CDS/exon feature then could you send it to me?

Thanks,
Carson

From: Marc Höppner
Date: Wednesday, July 30, 2014 at 4:44 AM
To: 
Subject: [maker-devel] Maker GFF output with features of 0 length

Hi,

I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example:

scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1

This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies.

Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head?

Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...
I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.

Regards,

Marc
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carson.holt at genetics.utah.edu Wed Aug 6 01:15:04 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 6 Aug 2014 07:15:04 +0000
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To: <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com>
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com>
Message-ID: 

Ok. I took a look and I'm relatively sure the issue you are seeing is caused by GFF3 passthrough combined with correct_est_fusion=1. This is something that only happens when both are used simultaneously and should be corrected in the current version of MAKER.

Thanks,
Carson

From: Marc Höppner
Date: Wednesday, August 6, 2014 at 12:14 AM
To: Carson Holt
Cc: 
Subject: Re: [maker-devel] Maker GFF output with features of 0 length

Hi,

I suspect that Augustus plays a role, since the affected features are seeded by augustus (based on the name anyway). What I found was that this seems to only happen when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). And this seems to be quite reproducible - and doesn't only affect CDS features.

I have put the Maker output for a test scaffold here: https://dl.dropboxusercontent.com/u/1918141/maker_output.tar.bz2

The problematic lines:

scaffold_563 maker five_prime_UTR 38501 38501 . - .
ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1
scaffold_563 maker exon 69967 69967 . - . ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1
scaffold_563 maker CDS 69967 69967 . - 1 ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1

Strange stuff?

Regards,

Marc

On 05 Aug 2014, at 22:49, Carson Holt wrote:

One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this, for example. Do you have non-CDS feature types where this happens, or any internal CDS's where this happens?

--Carson

From: Carson Holt
Date: Tuesday, August 5, 2014 at 2:21 PM
To: Marc Höppner ,
Subject: Re: [maker-devel] Maker GFF output with features of 0 length

Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)?

--Carson

From: Marc Höppner
Date: Wednesday, July 30, 2014 at 4:44 AM
To: 
Subject: [maker-devel] Maker GFF output with features of 0 length

Hi,

I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example:

scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1

This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies.

Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data?
Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only rear their head on these occasions. Regarding the frequency, out of 450,000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still...

I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I have now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.

Regards,
Marc

From j.wilbrandt at zfmk.de Wed Aug 6 06:40:19 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 06 Aug 2014 14:40:19 +0200
Subject: [maker-devel] Further split genome questions
Message-ID:

Hi Carson,

I ran into more conspicuous behavior running maker 2.31 on a genome which is split into 20 parts, using the -g flag and the same basename. Most of the jobs ran simultaneously on the same node; 17 seemed to finish normally, while the remaining three seemed to be stalled and produced 0B of output. Do you have any suggestion why this is happening?

After I stopped these stalled jobs, I checked the index.log and found that of 38,384 mentioned scaffolds, 154 appear only once in the log. The surprise is that 2/3 of these only appear as FINISHED (the rest only as started). There are no models for these 'finished' scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., each of the 20 jobs contained scaffolds that 'did not start' but 'finished'). Should this be an issue of concern?

It might be an NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so we suspect something fishy going on...
Hope you can help,
best wishes,
Jeanne Wilbrandt

zmb // ZFMK // University of Bonn

From carsonhh at gmail.com Wed Aug 6 08:16:52 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 6 Aug 2014 08:16:52 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References:
Message-ID: <780B8D9B-94FB-4282-9611-632C7CB532DC@gmail.com>

If you are starting and restarting, or running multiple jobs then the log can be partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 result file for the contig, then it is FINISHED. FASTA files will only exist for the contigs that have gene models. Small contigs will rarely contain models.

--Carson

Sent from my iPhone

> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote:
>
>
> Hi Carson,
>
> I ran into more conspicuous behavior running maker 2.31 on a genome which is split into
> 20 parts, using the -g flag and the same basename.
> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while
> the remaining three seemed to be stalled and produced 0B of output. Do you have any
> suggestion why this is happening?
>
> After I stopped these stalled jobs, I checked the index.log and found that of 38.384
> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these
> only appear as FINISHED (the rest only started). There are no models for these 'finished'
> scaffolds stored in the .db and they are distributed over all parts of the genome (i.e.,
> each of the 20 jobs contained scaffolds that 'did not start' but 'finished')
> Should this be an issue of concern?
> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so
> we suspect something fishy going on...
> > Hope you can help, > best wishes, > Jeanne Wilbrandt > > zmb // ZFMK // University of Bonn > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Aug 6 08:18:28 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 6 Aug 2014 14:18:28 +0000 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: <736D63C9-1393-4FFB-8553-262454C44BC1@genetics.utah.edu> Hi Jeanne, what?s the average length of those 154 scaffolds that only appeared once in the log? Is the length pretty consistent among those scaffolds? ~Daniel On Aug 6, 2014, at 6:40 AM, Jeanne Wilbrandt wrote: > > Hi Carson, > > I ran into more conspicuous behavior running maker 2.31 on a genome which is split into > 20 parts, using the -g flag and the same basename. > Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while > the remaining three seemed to be stalled and produced 0B of output. Do you have any > suggestion why this is happening? > > After I stopped these stalled jobs, I checked the index.log and found that of 38.384 > mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these > only appear as FINISHED (the rest only started). There are no models for these 'finished' > scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., > each of the 20 jobs contained scaffolds that 'did not start' but 'finished') > Should this be an issue of concern? > It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so > we suspect something fishy going on... 
> > Hope you can help, > best wishes, > Jeanne Wilbrandt > > zmb // ZFMK // University of Bonn > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From j.wilbrandt at zfmk.de Wed Aug 6 09:01:02 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 06 Aug 2014 17:01:02 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: aha, so this explains that. Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly half of the sequences being shorter than 3,000 bp. What do you think about this weird 'I am running but not really doing anything'-behavior? Thanks a lot! Jeanne On Wed, 6 Aug 2014 14:16:52 +0000 Carson Holt wrote: >If you are starting and restarting, or running multiple jobs then the log can be >partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 >result file for the contig, then it is FINISHED. FASTA files will only exist for the >contigs that have gene models. Small contigs will rarely contain models. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >> >> >> Hi Carson, >> >> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >into >> 20 parts, using the -g flag and the same basename. >> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >while >> the remaining three seemed to be stalled and produced 0B of output. Do you have any >> suggestion why this is happening? >> >> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >these >> only appear as FINISHED (the rest only started). 
There are no models for these
>'finished'
>> scaffolds stored in the .db and they are distributed over all parts of the genome
>(i.e.,
>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished')
>> Should this be an issue of concern?
>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good,
>so
>> we suspect something fishy going on...
>>
>> Hope you can help,
>> best wishes,
>> Jeanne Wilbrandt
>>
>> zmb // ZFMK // University of Bonn
>>
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Aug 6 09:12:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 6 Aug 2014 09:12:50 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References:
Message-ID: <5C8B509A-7093-4626-92CE-6D09B570887C@gmail.com>

I think the freezing is because you are starting too many simultaneous jobs. You should try to use MPI to parallelize instead. Running concurrent jobs can start to cause problems if you are running 10 or more jobs in the same directory. You could try splitting them into different directories.

--Carson

Sent from my iPhone

> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote:
>
>
> aha, so this explains that.
> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly
> half of the sequences being shorter than 3,000 bp.
>
> What do you think about this weird 'I am running but not really doing anything'-behavior?
>
>
> Thanks a lot!
> Jeanne
>
>
>
> On Wed, 6 Aug 2014 14:16:52 +0000
> Carson Holt wrote:
>> If you are starting and restarting, or running multiple jobs then the log can be
>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3
>> result file for the contig, then it is FINISHED.
FASTA files will only exist for the >> contigs that have gene models. Small contigs will rarely contain models. >> >> --Carson >> >> Sent from my iPhone >> >>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>> >>> >>> Hi Carson, >>> >>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >> into >>> 20 parts, using the -g flag and the same basename. >>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >> while >>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>> suggestion why this is happening? >>> >>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >> these >>> only appear as FINISHED (the rest only started). There are no models for these >> 'finished' >>> scaffolds stored in the .db and they are distributed over all parts of the genome >> (i.e., >>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>> Should this be an issue of concern? >>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, >> so >>> we suspect something fishy going on... 
>>>
>>> Hope you can help,
>>> best wishes,
>>> Jeanne Wilbrandt
>>>
>>> zmb // ZFMK // University of Bonn
>>>
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From j.wilbrandt at zfmk.de Wed Aug 6 09:33:07 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 06 Aug 2014 17:33:07 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

We are using MPI as well; each of the 20 parts gets assigned 4 threads. Our admin reports, however, that the processes seem to assemble more threads than they are allowed. It is not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why?

If I start the jobs in the same directory, how can I make sure they write to the same directory (as, I think, is required to put the pieces together in the end)? Does -basename take paths?

On Wed, 6 Aug 2014 15:12:50 +0000
Carson Holt wrote:
>I think the freezing is because you are starting too many simultaneous jobs. You should
>try and use MPI to parallelize instead. The concurrent job way of doing things can
>start to cause problems If you are running 10 or more jobs in the same directory. You
>could try splitting them into different directories.
>
>--Carson
>
>Sent from my iPhone
>
>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote:
>>
>>
>> aha, so this explains that.
>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly
>> half of the sequences being shorter than 3,000 bp.
>>
>> What do you think about this weird 'I am running but not really doing
>anything'-behavior?
>>
>>
>> Thanks a lot!
>> Jeanne >> >> >> >> On Wed, 6 Aug 2014 14:16:52 +0000 >> Carson Holt wrote: >>> If you are starting and restarting, or running multiple jobs then the log can be >>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a >GFF3 >>> result file for the contig, then it is FINISHED. FASTA files will only exist for the >>> contigs that have gene models. Small contigs will rarely contain models. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> Hi Carson, >>>> >>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>> into >>>> 20 parts, using the -g flag and the same basename. >>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >>> while >>>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>>> suggestion why this is happening? >>>> >>>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>> these >>>> only appear as FINISHED (the rest only started). There are no models for these >>> 'finished' >>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>> (i.e., >>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>> Should this be an issue of concern? >>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >good, >>> so >>>> we suspect something fishy going on... 
>>>>
>>>> Hope you can help,
>>>> best wishes,
>>>> Jeanne Wilbrandt
>>>>
>>>> zmb // ZFMK // University of Bonn
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>

From carsonhh at gmail.com Wed Aug 6 09:45:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 6 Aug 2014 09:45:56 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID: <28DF9A41-8E59-4104-87A6-CD7CD9F436D8@gmail.com>

Is your admin counting processes or cpu usage? Each system call creates a separate process, so you can expect multiple processes but only a single cpu of usage per instance.

Use different directories if you are running that many jobs. You can concatenate the separate results when you're done. Use the gff3_merge script to help concatenate the separate GFF3 files generated from separate jobs.

--Carson

Sent from my iPhone

> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote:
>
>
>
> We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin reports
> however, that the processes seem to assemble more threads than they are allowed. It is
> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why?
>
> If I start the jobs in the same directory, how can I make sure they write to the same
> directory (as, I think is required to put the pieces together in the end?)? das -basename
> take paths?
>
>
> On Wed, 6 Aug 2014 15:12:50 +0000
> Carson Holt wrote:
>> I think the freezing is because you are starting too many simultaneous jobs. You should
>> try and use MPI to parallelize instead. The concurrent job way of doing things can
>> start to cause problems If you are running 10 or more jobs in the same directory.
You >> could try splitting them into different directories. >> >> --Carson >> >> Sent from my iPhone >> >>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>> >>> >>> aha, so this explains that. >>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly >>> half of the sequences being shorter than 3,000 bp. >>> >>> What do you think about this weird 'I am running but not really doing >> anything'-behavior? >>> >>> >>> Thanks a lot! >>> Jeanne >>> >>> >>> >>> On Wed, 6 Aug 2014 14:16:52 +0000 >>> Carson Holt wrote: >>>> If you are starting and restarting, or running multiple jobs then the log can be >>>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a >> GFF3 >>>> result file for the contig, then it is FINISHED. FASTA files will only exist for the >>>> contigs that have gene models. Small contigs will rarely contain models. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>> >>>>> >>>>> Hi Carson, >>>>> >>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>> into >>>>> 20 parts, using the -g flag and the same basename. >>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >>>> while >>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>>>> suggestion why this is happening? >>>>> >>>>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>> these >>>>> only appear as FINISHED (the rest only started). There are no models for these >>>> 'finished' >>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>> (i.e., >>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>> Should this be an issue of concern? 
>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look
>> good,
>>>> so
>>>>> we suspect something fishy going on...
>>>>>
>>>>> Hope you can help,
>>>>> best wishes,
>>>>> Jeanne Wilbrandt
>>>>>
>>>>> zmb // ZFMK // University of Bonn
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> maker-devel mailing list
>>>>> maker-devel at box290.bluehost.com
>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From carson.holt at genetics.utah.edu Wed Aug 6 11:18:22 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 6 Aug 2014 17:18:22 +0000
Subject: [maker-devel] Forks.pm error when running maker with dsindex
In-Reply-To:
References:
Message-ID:

It's better to run fewer jobs with more cpus given to MPI than many jobs with few cpus (i.e. mpiexec -n 4).

To correct errors, you just restart MAKER. There is no need to set the -a flag unless you want to rerun everything, and not just the failed contigs.

--Carson

On 8/6/14, 3:03 AM, "Jeanne Wilbrandt" wrote:

>
>Hi!
>
>Yes, we are running 20 jobs simultaneously, almost, i.e., as much as our
>cluster can
>take. Do you think this is too much?
>
>Please find attached the output file (containing the STDERR) of the
>dsindex-run, and one
>example output of one of the pieces.
>
>Another quick question to make sure I understood the guides correctly: If
>a job did not
>finish properly, it should suffice to restart the same thing just with
>the -a flag and it
>should clean up and finish what it was supposed to, right? (i.e., it's
>not necessary to
>trace and delete the unfinished output manually?)
>
>Thank you again!
>Jeanne Wilbrandt
>
>zmb // ZFMK // University of Bonn
>
>
>
>On 08/05/2014 08:00 PM, maker-devel-request at yandell-lab.org wrote:
>>
>>
>> 1.
Re: Forks.pm error when running maker with dsindex (Carson Holt) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Mon, 04 Aug 2014 14:27:08 -0600 >> From: Carson Holt >> To: Jan Philip Oeyen , >> >> Subject: Re: [maker-devel] Forks.pm error when running maker with >> dsindex >> Message-ID: >> Content-Type: text/plain; charset="utf-8" >> >> Sorry for the slow reply. I was on vacation all last week. Do you >>have the >> full STDERR? sometimes the last error is irrelevant and it's just the >>result >> of a failure further upstream. Also are you running 20 independent maker >> jobs simultaneously? >> >> --Carson >> >> >> From: Jan Philip Oeyen >> Date: Monday, July 28, 2014 at 6:22 AM >> To: >> Subject: [maker-devel] Forks.pm error when running maker with dsindex >> >> Hi all, >> we are currently having some unexpected errors when running maker on a >> genome which is split in several parts. Our cluster admin reported the >> following error message: >> >> Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodu >> les/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm >> line 2188. >> SIGTERM received >> SIGTERM received >> SIGTERM received >> >> We were using maker with the '-g' option on a single genome which is >>split >> into 20 parts, where 19 parts are equally large and the last contains >>about >> 20 sequences more. After that we ran Maker using dsindex to clean up the >> output. We are currently using maker v2.31 on 4 threads and forks v0.34. >> >> If any further info is needed to clarify the problem, please let me >>know and >> I will provide as much as possible. >> >> Thank you for your help! 
>>
>> Best regards,
>> Jan Philip Oeyen
>> ZFMK // ZMB // University of Bonn
>>

From mphoeppner at gmail.com Wed Aug 6 00:14:23 2014
From: mphoeppner at gmail.com (Marc Höppner)
Date: Wed, 6 Aug 2014 08:14:23 +0200
Subject: [maker-devel] Maker GFF output with features of 0 length
In-Reply-To:
References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com>
Message-ID: <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com>

Hi,

I suspect that Augustus plays a role, since the affected features are seeded by augustus (based on the name anyway). What I found was that this seems to only happen when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). And this seems to be quite reproducible - and doesn't only affect CDS features.

I have put the Maker output for a test scaffold here:

https://dl.dropboxusercontent.com/u/1918141/maker_output.tar.bz2

The problematic lines:

scaffold_563 maker five_prime_UTR 38501 38501 . - . ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1
scaffold_563 maker exon 69967 69967 . - . ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1
scaffold_563 maker CDS 69967 69967 . - 1 ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1

Strange stuff?

Regards,
Marc

On 05 Aug 2014, at 22:49, Carson Holt wrote:

> One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this for example. Do you have non-CDS feature types where this happens, or any internal CDS's where this happens?
> > --Carson > > > From: Carson Holt > Date: Tuesday, August 5, 2014 at 2:21 PM > To: Marc H?ppner , > Subject: Re: [maker-devel] Maker GFF output with features of 0 length > > Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (I.e. is it random or is it reproducable)? > > --Carson > > > From: Marc H?ppner > Date: Wednesday, July 30, 2014 at 4:44 AM > To: > Subject: [maker-devel] Maker GFF output with features of 0 length > > Hi, > > I?ve - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. > > For example: > > scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 > > > This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with synching data? Maker runs fine on our system, but that doesn?t mean that there aren?t any cryptic issues that only on these occasions read their head? Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still... > > I am currently using Maker with openmpi-1.7.4 and the file system is mounter of NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. > > Regards, > > Marc > > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
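For anyone who wants to check their own output for the pattern discussed in this thread, here is a minimal sketch (not part of MAKER) that lists features whose start and end coordinates are identical. It assumes standard nine-column, tab-separated GFF3, and the function names are made up for illustration. Keep in mind that GFF3 coordinates are 1-based and inclusive, so start == end describes a 1 bp feature, which, as Carson notes, can be legitimate for a terminal CDS.

```python
from collections import Counter


def single_position_features(gff_lines):
    """Yield (seqid, type, position, ID) for features whose start equals end."""
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature row
        seqid, _source, ftype, start, end = cols[:5]
        if start == end:
            # Column 9 holds semicolon-separated key=value attributes.
            attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
            yield seqid, ftype, int(start), attrs.get("ID", "")


def count_by_type(gff_lines):
    """Count single-position features per feature type (exon, CDS, ...)."""
    return Counter(hit[1] for hit in single_position_features(gff_lines))
```

Reading a real file would look like `with open("my_genome.gff") as fh: print(count_by_type(fh))`, where the file name is only a placeholder. Whether a reported feature is a problem still has to be judged by hand, for example by checking whether a 1 bp CDS is the first or last CDS of its transcript.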
From j.wilbrandt at zfmk.de Wed Aug 6 03:03:28 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Wed, 06 Aug 2014 11:03:28 +0200
Subject: [maker-devel] Forks.pm error when running maker with dsindex
Message-ID:

Hi!

Yes, we are running 20 jobs simultaneously, almost, i.e., as many as our cluster can take. Do you think this is too much?

Please find attached the output file (containing the STDERR) of the dsindex-run, and one example output of one of the pieces.

Another quick question to make sure I understood the guides correctly: If a job did not finish properly, it should suffice to restart the same thing just with the -a flag and it should clean up and finish what it was supposed to, right? (i.e., it's not necessary to trace and delete the unfinished output manually?)

Thank you again!
Jeanne Wilbrandt

zmb // ZFMK // University of Bonn

On 08/05/2014 08:00 PM, maker-devel-request at yandell-lab.org wrote:
>
>
> 1. Re: Forks.pm error when running maker with dsindex (Carson Holt)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 04 Aug 2014 14:27:08 -0600
> From: Carson Holt
> To: Jan Philip Oeyen ,
>
> Subject: Re: [maker-devel] Forks.pm error when running maker with
> dsindex
> Message-ID:
> Content-Type: text/plain; charset="utf-8"
>
> Sorry for the slow reply. I was on vacation all last week. Do you have the
> full STDERR? sometimes the last error is irrelevant and it's just the result
> of a failure further upstream. Also are you running 20 independent maker
> jobs simultaneously?
>
> --Carson
>
>
> From: Jan Philip Oeyen
> Date: Monday, July 28, 2014 at 6:22 AM
> To:
> Subject: [maker-devel] Forks.pm error when running maker with dsindex
>
> Hi all,
> we are currently having some unexpected errors when running maker on a
> genome which is split in several parts.
Our cluster admin reported the > following error message: > > Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodu > les/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm > line 2188. > SIGTERM received > SIGTERM received > SIGTERM received > > We were using maker with the '-g' option on a single genome which is split > into 20 parts, where 19 parts are equally large and the last contains about > 20 sequences more. After that we ran Maker using dsindex to clean up the > output. We are currently using maker v2.31 on 4 threads and forks v0.34. > > If any further info is needed to clarify the problem, please let me know and > I will provide as much as possible. > > Thank you for your help! > > Best regards, > Jan Philip Oeyen > ZFMK // ZMB // University of Bonn > -------------- next part -------------- A non-text attachment was scrubbed... Name: split_index.o2510 Type: application/octet-stream Size: 1641 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_04.o2490 Type: application/octet-stream Size: 8883704 bytes Desc: not available URL: From dandence at gmail.com Wed Aug 6 07:50:43 2014 From: dandence at gmail.com (Daniel Ence) Date: Wed, 6 Aug 2014 07:50:43 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: Hi Jeanne, what?s the average length of those 154 scaffolds that only appeared once in the log? Is the length pretty consistent? ~Daniel On Aug 6, 2014, at 6:40 AM, Jeanne Wilbrandt wrote: > > Hi Carson, > > I ran into more conspicuous behavior running maker 2.31 on a genome which is split into > 20 parts, using the -g flag and the same basename. > Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while > the remaining three seemed to be stalled and produced 0B of output. Do you have any > suggestion why this is happening? 
>
> After I stopped these stalled jobs, I checked the index.log and found that of 38.384
> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these
> only appear as FINISHED (the rest only started). There are no models for these 'finished'
> scaffolds stored in the .db and they are distributed over all parts of the genome (i.e.,
> each of the 20 jobs contained scaffolds that 'did not start' but 'finished')
> Should this be an issue of concern?
> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so
> we suspect something fishy going on...
>
> Hope you can help,
> best wishes,
> Jeanne Wilbrandt
>
> zmb // ZFMK // University of Bonn
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon Aug 11 10:11:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 11 Aug 2014 10:11:28 -0600
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To:
References:
Message-ID:

If you are updating every month to BioPerl live, don't. You should use the CPAN version of BioPerl or even the stable download. BioPerl live has broken several components MAKER uses at different times and, depending on which version you currently have, may be broken now.

Could you send me the Bio::Root::Version line from the initial debug output? Also, could you send me this file --> /home/keceltes/maker2/final.fasta

The point of failure is actually very simple. At that point in the code, MAKER opens a file, reads it in one line at a time, writes it out to a new file, and then indexes it with BioPerl (the BioPerl indexing won't work on NFS drives because it uses Berkeley DB). For that reason, whenever it fails at that point, it is either a drive space issue, NFS issue, BioPerl issue, or file format issue.

Also are you running via MPI?
I ask because if you are using multiple nodes you will have to check the size of /tmp independently on each node (since the values will be different).

Thanks,
Carson

From: Kevin Tsai
Date: Monday, August 11, 2014 at 5:11 AM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Early obstacle with SplitDB

Hi Carson,

Thanks for the suggestions. I left the TMP= empty, which as you mentioned defaults to /tmp. There seems to be a different error when using an NFS mounted directory (as I manually verified). My /tmp is also not full or nearly full. I have verified proper fasta formatting, as I have run the fasta file through other statistics generating tools (e.g. Quast). We also update BioPerl monthly.

Do you think it could be anything else? Is there any more information I could provide that would be helpful?

On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote:
> Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted
> directory (must be locally mounted), the drive containing directory specified
> by TMP= (defaults to /tmp) is full or nearly full, your input file is not
> proper fasta format, or you are using an out of date version of BioPerl.
>
> Try the first three in the list then look at BioPerl. The BioPerl version
> should be printed as part of the debug output.
>
> --Carson
>
>
> From: Kevin Tsai
> Date: Tuesday, August 5, 2014 at 4:59 AM
> To:
> Subject: [maker-devel] Early obstacle with SplitDB
>
> Hello,
> I'm a new user to Maker so I suspect this will be a simple question, but I am
> having trouble finding documentation on SplitDB. Our IT admin set up the
> application and I'm running into the following issue about 30 seconds after
> kickoff. Below is the debugged output:
>
> STATUS: Parsing control files...
> Calling GI::load_control_files at /usr/bin/maker line 452.
> Calling GI::new_instance_temp at /usr/bin/maker line 463.
> Calling GI::mount_check at /usr/bin/maker line 465.
> Calling GI::set_global_temp at /usr/bin/maker line 483. > STATUS: Processing and indexing input FASTA files... > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling List::Util::shuffle at /usr/bin/maker line 529. > Calling GI::split_db at /usr/bin/maker line 536. > Calling File::Path::rmtree at /usr/bin/maker line 537. > Calling Iterator::Any::new at /usr/bin/maker line 537. > Calling Iterator::Any::nextDef at /usr/bin/maker line 537. > Calling Iterator::Any::new at /usr/bin/maker line 537. > Calling mkdir at /usr/bin/maker line 537. > Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. > Calling system at /usr/bin/maker line 537. > ERROR: SplitDB not created correctly > > at /usr/local/share/perl5/GI.pm line 1144. > GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, > "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at > /usr/bin/maker line 537 > --> rank=NA, hostname=Za2.cglab > > Any suggestions? Thank you in advance! > -- > Kevin Tsai > www.linkedin.com/in/kevinjtsai/ > Ph.D. Candidate, Bioinformatics > Institute of Information Science, Academia Sinica > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica -------------- next part -------------- An HTML attachment was scrubbed... 
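The checks suggested in this thread (TMP= pointing at a locally mounted disk that is not NFS or tmpfs, with free space left) can be sketched as a small shell helper. This is only a minimal sketch, assuming GNU coreutils `stat` and a POSIX `df`; the function name `check_tmp_dir` is hypothetical and is not part of MAKER.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): report the filesystem type and
# free space of the directory MAKER would use as TMP= (default /tmp).
check_tmp_dir() {
    dir="${1:-/tmp}"
    # GNU stat: with -f, %T prints the filesystem type in human-readable form
    fstype=$(stat -f -c %T "$dir") || return 1
    # POSIX df output: field 4 of the second line is available space (1K blocks)
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    echo "dir=$dir fstype=$fstype avail_kb=$avail_kb"
    case "$fstype" in
        nfs*|tmpfs)
            echo "WARNING: $dir is on $fstype; the thread advises a locally mounted disk"
            ;;
    esac
}

check_tmp_dir /tmp
```

When running under MPI across several nodes, the same check would have to be run on each node separately, since /tmp differs from node to node.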
URL: From a.priyam at qmul.ac.uk Wed Aug 13 03:30:39 2014 From: a.priyam at qmul.ac.uk (Anurag Priyam) Date: Wed, 13 Aug 2014 15:00:39 +0530 Subject: [maker-devel] does MAKER modify input FASTA Message-ID: Is it possible that the input FASTA file (containing the genome that is being annotated) and the FASTA sequences in the output GFF file (containing the resulting annotations + the genome) be different? -> It's fine if the ordering of the scaffolds, or width (for pretty formatting) are different. -> But, will MAKER add 'NNN' or change the case to indicate masking? It doesn't seem so to me, but I have only one test set, so can't be sure. -> Is it possible to get a masked genome out from MAKER? -- Priyam From j.wilbrandt at zfmk.de Wed Aug 13 03:32:38 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 13 Aug 2014 11:32:38 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: Our admin counts processes. Do I understand you right, that one CPU handles several processes? I'm still confused by the different directories (and I made a mistake when asking last time, I wanted to say 'If I do NOT start the jobs in the same directory...'). So, if I start each piece of a genome in its own directory (for example), then it gets a unique basename (because the output will be separate from all other pieces anyway) and I will not run dsindex but instead use gff3_merge for each piece's output and then once again to merge all resulting gff3-files? Hope I got you right :) Thanks for your help! Jeanne On Wed, 6 Aug 2014 15:45:56 +0000 Carson Holt wrote: >Is your admin counting processes or cpu usage?
Because each system call creates a >separate process, so you can expect multiple processes (each system call generates a new >process) but only a single cpu of usage per instance. Use different directories if you >are running that many jobs. You can concatenate the separate results when your done. > Use gff3_merge script to help concatenate the separate GFF3 files generated from >separate jobs. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote: >> >> >> >> We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin >reports >> however, that the processes seem to assemble more threads than they are allowed. It is >> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why? >> >> If I start the jobs in the same directory, how can I make sure they write to the same >> directory (as, I think is required to put the pieces together in the end?)? das >-basename >> take paths? >> >> >> On Wed, 6 Aug 2014 15:12:50 +0000 >> Carson Holt wrote: >>> I think the freezing is because you are starting too many simultaneous jobs. You >should >>> try and use MPI to parallelize instead. The concurrent job way of doing things can >>> start to cause problems If you are running 10 or more jobs in the same directory. You >>> could try splitting them into different directories. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> aha, so this explains that. >>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, >roughly >>>> half of the sequences being shorter than 3,000 bp. >>>> >>>> What do you think about this weird 'I am running but not really doing >>> anything'-behavior? >>>> >>>> >>>> Thanks a lot! >>>> Jeanne >>>> >>>> >>>> >>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>> Carson Holt wrote: >>>>> If you are starting and restarting, or running multiple jobs then the log can be >>>>> partially rebuilt. 
On rebuild only the FINISHED entries are added. If there is a >>> GFF3 >>>>> result file for the contig, then it is FINISHED. FASTA files will only exist for >the >>>>> contigs that have gene models. Small contigs will rarely contain models. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>>> >>>>>> >>>>>> Hi Carson, >>>>>> >>>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>>> into >>>>>> 20 parts, using the -g flag and the same basename. >>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish >normally, >>>>> while >>>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have >any >>>>>> suggestion why this is happening? >>>>>> >>>>>> After I stopped these stalled jobs, I checked the index.log and found that of >38.384 >>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>>> these >>>>>> only appear as FINISHED (the rest only started). There are no models for these >>>>> 'finished' >>>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>>> (i.e., >>>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>>> Should this be an issue of concern? >>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >>> good, >>>>> so >>>>>> we suspect something fishy going on... 
>>>>>> >>>>>> Hope you can help, >>>>>> best wishes, >>>>>> Jeanne Wilbrandt >>>>>> >>>>>> zmb // ZFMK // University of Bonn >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com >>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> From dence at genetics.utah.edu Wed Aug 13 09:29:41 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 13 Aug 2014 15:29:41 +0000 Subject: [maker-devel] does MAKER modify input FASTA In-Reply-To: References: Message-ID: Hi Priyam, After MAKER has completed its run and you've merged the results with gff3_merge, you can see the original fasta genome in the resulting gff3 file, below the ##FASTA pragma. For each scaffold in your genome, the masked fasta can be found in its individual directory in the master_datastore that MAKER created to keep track of results. I'm pretty sure this will only be 'soft-masked' (lower-case letters) and not hard-masked ('N' characters). Let me know whether this helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Anurag Priyam [a.priyam at qmul.ac.uk] Sent: Wednesday, August 13, 2014 3:30 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] does MAKER modify input FASTA Is it possible that the input FASTA file (containing the genome that is being annotated) and the FASTA sequences in the output GFF file (containing the resulting annotations + the genome) be different? -> It's fine if the ordering of the scaffolds, or width (for pretty formatting) are different. -> But, will MAKER add 'NNN' or change the case to indicate masking? It doesn't seem so to me, but I have only one test set, so can't be sure.
-> Is it possible to get masked genome out from MAKER? -- Priyam _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 13 09:46:27 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Aug 2014 09:46:27 -0600 Subject: [maker-devel] does MAKER modify input FASTA In-Reply-To: References: Message-ID: The output fasta will be letter for letter identical to the input fasta and will be all uppercase. Only if your input fasta contains unrecognized characters (for example 'Y' in the middle of the nucleotide sequence) and you use the --fix_nucleotides flag will those unrecognized characters be changed to 'N'. The masked fasta can be pulled out of theVoid directory if you really need it. It will be called query_masked.fasta. --Carson On 8/13/14, 3:30 AM, "Anurag Priyam" wrote: >Is it possible that the input FASTA file (containing the genome that >is being annotated) and the FASTA sequences in the output GFF file >(containing the resulting annotations + the genome) be different? > >-> It's fine if the ordering of the scaffolds, or width (for pretty >formatting) are different. >-> But, will MAKER add 'NNN' or change the case to indicate masking? >It doesn't seem so to me, but I have only one test set, so can't be >sure. >-> Is it possible to get masked genome out from MAKER? 
> >-- Priyam > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Aug 13 09:46:59 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 13 Aug 2014 15:46:59 +0000 Subject: [maker-devel] Further split genome questions In-Reply-To: References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>, Message-ID: Hi Jeanne, I believe that's right. You can pass gff3_merge either a list of gff3 files or a maker-created datastore index file. To compile the pieces for each of your different runs you would give gff3_merge the datastore index file. To put those resulting gff3 files together, you would pass gff3_merge the list of gff3 files that you want to merge. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jeanne Wilbrandt [j.wilbrandt at zfmk.de] Sent: Wednesday, August 13, 2014 3:32 AM To: Carson Holt; Wilbrandt Jeanne Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Further split genome questions Our admin counts processes. Do I understand you right, that one CPU handles several processes? I'm still confused by the different directories (and I made a mistake when asking last time, I wanted to say 'If I do NOT start the jobs in the same directory...). So, if I start each piece of a genome in its own directory (for example), then it gets a unique basename (because the output will be separate from all other pieces anyway) and I will not run dsindex but instead use gff3_merge for each piece's output and then once again to merge all resulting gff3-files? 
Hope I got you right :) Thanks fopr your help! Jeanne On Wed, 6 Aug 2014 15:45:56 +0000 Carson Holt wrote: >Is your admin counting processes or cpu usage? Because each system call creates a >separate process, so you can expect multiple processes (each system call generates a new >process) but only a single cpu of usage per instance. Use different directories if you >are running that many jobs. You can concatenate the separate results when your done. > Use gff3_merge script to help concatenate the separate GFF3 files generated from >separate jobs. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote: >> >> >> >> We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin >reports >> however, that the processes seem to assemble more threads than they are allowed. It is >> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why? >> >> If I start the jobs in the same directory, how can I make sure they write to the same >> directory (as, I think is required to put the pieces together in the end?)? das >-basename >> take paths? >> >> >> On Wed, 6 Aug 2014 15:12:50 +0000 >> Carson Holt wrote: >>> I think the freezing is because you are starting too many simultaneous jobs. You >should >>> try and use MPI to parallelize instead. The concurrent job way of doing things can >>> start to cause problems If you are running 10 or more jobs in the same directory. You >>> could try splitting them into different directories. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> aha, so this explains that. >>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, >roughly >>>> half of the sequences being shorter than 3,000 bp. >>>> >>>> What do you think about this weird 'I am running but not really doing >>> anything'-behavior? >>>> >>>> >>>> Thanks a lot! 
>>>> Jeanne >>>> >>>> >>>> >>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>> Carson Holt wrote: >>>>> If you are starting and restarting, or running multiple jobs then the log can be >>>>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a >>> GFF3 >>>>> result file for the contig, then it is FINISHED. FASTA files will only exist for >the >>>>> contigs that have gene models. Small contigs will rarely contain models. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>>> >>>>>> >>>>>> Hi Carson, >>>>>> >>>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>>> into >>>>>> 20 parts, using the -g flag and the same basename. >>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish >normally, >>>>> while >>>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have >any >>>>>> suggestion why this is happening? >>>>>> >>>>>> After I stopped these stalled jobs, I checked the index.log and found that of >38.384 >>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>>> these >>>>>> only appear as FINISHED (the rest only started). There are no models for these >>>>> 'finished' >>>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>>> (i.e., >>>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>>> Should this be an issue of concern? >>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >>> good, >>>>> so >>>>>> we suspect something fishy going on... 
>>>>>> >>>>>> Hope you can help, >>>>>> best wishes, >>>>>> Jeanne Wilbrandt >>>>>> >>>>>> zmb // ZFMK // University of Bonn >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com >>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 13 09:47:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Aug 2014 09:47:15 -0600 Subject: [maker-devel] does MAKER modify input FASTA In-Reply-To: References: Message-ID: It will actually be a mixture of hard and soft masking depending on the class of repeat. --Carson On 8/13/14, 9:29 AM, "Daniel Ence" wrote: >Hi Priyam, > >After MAKER has completed it's run and you've merged the results with >gff3_merge, you can see the original fasta genome in the resulting gff3 >file, below the ##FASTA pragma. > >For each scaffold in your genome, the masked fasta can be found in it's >individual directory in the master_datastore that MAKER created to keep >track of results. I'm pretty sure this will only be 'soft-masked' >(lower-case letters) and not hard-masked ('N' characters). 
> >Let me know whether this helps, >Daniel > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >Anurag Priyam [a.priyam at qmul.ac.uk] >Sent: Wednesday, August 13, 2014 3:30 AM >To: maker-devel at yandell-lab.org >Subject: [maker-devel] does MAKER modify input FASTA > >Is it possible that the input FASTA file (containing the genome that >is being annotated) and the FASTA sequences in the output GFF file >(containing the resulting annotations + the genome) be different? > >-> It's fine if the ordering of the scaffolds, or width (for pretty >formatting) are different. >-> But, will MAKER add 'NNN' or change the case to indicate masking? >It doesn't seem so to me, but I have only one test set, so can't be >sure. >-> Is it possible to get masked genome out from MAKER? > >-- Priyam > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 13 09:52:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Aug 2014 09:52:34 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: Yes. 
One cpu will have several processes, most are helper processes that will use 0% CPU almost all of the time (for example there is a shared variable manager process that will launch with MAKER but will also be called 'maker' under top because it is technically its child and not a separate script). Also system calls will launch a new process that will use all CPU while the process calling it will drop to 0% CPU until it finishes. Yes. Your explanation is correct. You then use gff3_merge to merge the GFF3 file. --Carson On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: > >Our admin counts processes. Do I understand you right, that one CPU >handles several >processes? > >I'm still confused by the different directories (and I made a mistake >when asking last >time, I wanted to say 'If I do NOT start the jobs in the same >directory...). >So, if I start each piece of a genome in its own directory (for example), >then it gets a >unique basename (because the output will be separate from all other >pieces anyway) and I >will not run dsindex but instead use gff3_merge for each piece's output >and then once >again to merge all resulting gff3-files? > >Hope I got you right :) > >Thanks fopr your help! >Jeanne > > > >On Wed, 6 Aug 2014 15:45:56 +0000 > Carson Holt wrote: >>Is your admin counting processes or cpu usage? Because each system call >>creates a >>separate process, so you can expect multiple processes (each system call >>generates a new >>process) but only a single cpu of usage per instance. Use different >>directories if you >>are running that many jobs. You can concatenate the separate results >>when your done. >> Use gff3_merge script to help concatenate the separate GFF3 files >>generated from >>separate jobs. >> >>--Carson >> >>Sent from my iPhone >> >>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>wrote: >>> >>> >>> >>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>threads. 
Our admin >>reports >>> however, that the processes seem to assemble more threads than they >>>are allowed. It is >>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>suggestion why? >>> >>> If I start the jobs in the same directory, how can I make sure they >>>write to the same >>> directory (as, I think is required to put the pieces together in the >>>end?)? das >>-basename >>> take paths? >>> >>> >>> On Wed, 6 Aug 2014 15:12:50 +0000 >>> Carson Holt wrote: >>>> I think the freezing is because you are starting too many >>>>simultaneous jobs. You >>should >>>> try and use MPI to parallelize instead. The concurrent job way of >>>>doing things can >>>> start to cause problems If you are running 10 or more jobs in the >>>>same directory. You >>>> could try splitting them into different directories. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>wrote: >>>>> >>>>> >>>>> aha, so this explains that. >>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>than 60,000, >>roughly >>>>> half of the sequences being shorter than 3,000 bp. >>>>> >>>>> What do you think about this weird 'I am running but not really doing >>>> anything'-behavior? >>>>> >>>>> >>>>> Thanks a lot! >>>>> Jeanne >>>>> >>>>> >>>>> >>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>> Carson Holt wrote: >>>>>> If you are starting and restarting, or running multiple jobs then >>>>>>the log can be >>>>>> partially rebuilt. On rebuild only the FINISHED entries are added. >>>>>> If there is a >>>> GFF3 >>>>>> result file for the contig, then it is FINISHED. FASTA files will >>>>>>only exist for >>the >>>>>> contigs that have gene models. Small contigs will rarely contain >>>>>>models. 
>>>>>> >>>>>> --Carson >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Carson, >>>>>>> >>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>genome which is split >>>>>> into >>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to >>>>>>>finish >>normally, >>>>>> while >>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>output. Do you have >>any >>>>>>> suggestion why this is happening? >>>>>>> >>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>found that of >>38.384 >>>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise >>>>>>>is, that 2/3 of >>>>>> these >>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>models for these >>>>>> 'finished' >>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>parts of the genome >>>>>> (i.e., >>>>>>> each of the 20 jobs contained scaffolds that 'did not start' but >>>>>>>'finished') >>>>>>> Should this be an issue of concern? >>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the >>>>>>>NFS files look >>>> good, >>>>>> so >>>>>>> we suspect something fishy going on... >>>>>>> >>>>>>> Hope you can help, >>>>>>> best wishes, >>>>>>> Jeanne Wilbrandt >>>>>>> >>>>>>> zmb // ZFMK // University of Bonn >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> >>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab. 
>>>>>>>org >>> > From cjfields at illinois.edu Wed Aug 13 11:14:56 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 13 Aug 2014 17:14:56 +0000 Subject: [maker-devel] Early obstacle with SplitDB In-Reply-To: References: Message-ID: On Aug 11, 2014, at 11:11 AM, Carson Holt wrote: If you are updating every month to BioPerl live, don't. You should use the CPAN version of BioPerl or even the stable download. BioPerl live has actually broken several components MAKER uses at different times and depending on which version you currently have, may be broken now. Could you send me the Bio::Root::Version line from the initial debug output? Exactly. Just a note, but the CPAN releases (now at 1.6.924) merge over all changes from the master branch on a regular basis. The key parts that will not work when running off master (such as Bio::Root, Bio::FeatureIO, etc) have been split out into separate repos; it's entirely possible to add these separately to a PERL5LIB but the intent is that we will release Bio-Root and others to CPAN separately. Also could you send me this file --> /home/keceltes/maker2/final.fasta The point of failure is actually very simple. At that point in the code, MAKER opens a file, reads it in one line at a time, writes it out to a new file, and then indexes it with BioPerl (the BioPerl won't work with NFS drives because it uses Berkeley DB). For that reason whenever it fails at that point, it is either a drive space issue, NFS issue, BioPerl issue, or file format issue. Re: Berkeley_DB, if you have a need to push this in a more NFS-portable direction we are more than happy to let you experiment on what works best. Mark Jensen actually started on this a while back but ran into problems. I personally haven't had problems with Bio::DB::Fasta on our local GPFS to be frank, but I'm sure that isn't working for everyone. Also are you running via MPI?
I ask because if you are using multiple nodes you will have to check the size of /tmp independently on each node (since the values will be different). Thanks, Carson chris From: Kevin Tsai Date: Monday, August 11, 2014 at 5:11 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Early obstacle with SplitDB Hi Carson, Thanks for the suggestions. I left the TMP= empty, which as you mentioned defaults to /tmp. There seems to be a different error when using an NFS mounted directory (as I manually verified). My /tmp is also not full or nearly full, and I have verified proper fasta formatting, as I have run the fasta file through other statistics generating tools (e.g. Quast). We also update BioPerl monthly. Do you think it could be anything else? Do you think any more information that I might be able to provide will be more insightful? On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote: Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted directory (must be locally mounted), the drive containing directory specified by TMP= (defaults to /tmp) is full or nearly full, your input file is not proper fasta format, or you are using an out of date version of BioPerl. Try the first three in the list then look at BioPerl. The BioPerl version should be printed as part of the debug output. --Carson From: Kevin Tsai Date: Tuesday, August 5, 2014 at 4:59 AM To: Subject: [maker-devel] Early obstacle with SplitDB Hello, I'm a new user to Maker so I suspect this will be a simple question, but I am having trouble finding documentation on SplitDB. Our IT admin set up the application and I'm running into the following issue about 30 seconds after kickoff. Below is the debugged output: STATUS: Parsing control files... Calling GI::load_control_files at /usr/bin/maker line 452. Calling GI::new_instance_temp at /usr/bin/maker line 463. Calling GI::mount_check at /usr/bin/maker line 465. Calling GI::set_global_temp at /usr/bin/maker line 483.
STATUS: Processing and indexing input FASTA files... Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling List::Util::shuffle at /usr/bin/maker line 529. Calling GI::split_db at /usr/bin/maker line 536. Calling File::Path::rmtree at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling Iterator::Any::nextDef at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling mkdir at /usr/bin/maker line 537. Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. Calling system at /usr/bin/maker line 537. ERROR: SplitDB not created correctly at /usr/local/share/perl5/GI.pm line 1144. GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at /usr/bin/maker line 537 --> rank=NA, hostname=Za2.cglab Any suggestions? Thank you in advance! -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
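Since the thread asks which BioPerl version is installed, one way to print it outside of MAKER's debug output can be sketched as follows. This is a hedged sketch: it assumes the installed BioPerl exposes `$Bio::Root::Version::VERSION` (as the CPAN releases discussed here do), and the function name `bioperl_version` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER or BioPerl): print the version of
# the BioPerl installation visible to the default perl, or a notice if the
# module cannot be loaded.
bioperl_version() {
    perl -MBio::Root::Version -e 'print "$Bio::Root::Version::VERSION\n"' 2>/dev/null \
        || echo "BioPerl not found by this perl"
}

bioperl_version
```

The perl that MAKER was installed with may differ from the default one on the PATH, so the result is only meaningful when both are the same interpreter.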
URL: From carsonhh at gmail.com Wed Aug 13 12:19:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Aug 2014 12:19:50 -0600 Subject: [maker-devel] Early obstacle with SplitDB In-Reply-To: References: Message-ID: The Berkeley DB/NFS issues happen more often for large index files or NFS systems with a slow response. Such issues also happen almost exclusively during index creation. There is a way you can tell MAKER to have BioPerl use something other than Berkeley DB for indexing if you suspect that's the issue. You can give it a flag during the initial MAKER setup and installation.

#use GDBM library
cd .../maker/src
perl Build.PL --AnyDBM_ISA GDBM_File
./Build install

#use SDBM files
cd .../maker/src
perl Build.PL --AnyDBM_ISA SDBM_File
./Build install

#use Berkeley DB (default)
cd .../maker/src
perl Build.PL --AnyDBM_ISA DB_File
./Build install

However, I find that the alternatives to Berkeley DB can be more flaky. Also make sure /tmp is not tmpfs (which it may be on some systems). I've also seen weird behavior trying to index files on tmpfs storage on some systems. Thanks, Carson From: "Fields, Christopher J" Date: Wednesday, August 13, 2014 at 11:14 AM To: Carson Holt Cc: Kevin Tsai , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Early obstacle with SplitDB On Aug 11, 2014, at 11:11 AM, Carson Holt wrote: > If you are updating every month to BioPerl live, don't. You should use the > CPAN version of BioPerl or even the stable download. BioPerl live has > actually broken several components MAKER uses at different times and depending > on which version you currently have, may be broken now. Could you send me the > Bio::Root::Version line from the initial debug output? Exactly. Just a note, but the CPAN releases (now at 1.6.924) merge over all changes from the master branch on a regular basis.
The key parts that will not work when running off master (such as Bio::Root, Bio::FeatureIO, etc.) have been split out into separate repos; it's entirely possible to add these separately to a PERL5LIB, but the intent is that we will release Bio-Root and others to CPAN separately.

> Also could you send me this file --> /home/keceltes/maker2/final.fasta
>
> The point of failure is actually very simple. At that point in the code,
> MAKER opens a file, reads it in one line at a time, writes it out to a new
> file, and then indexes it with BioPerl (the BioPerl won't work with NFS drives
> because it uses Berkeley DB). For that reason whenever it fails at that point,
> it is either a drive space issue, NFS issue, BioPerl issue, or file format
> issue.

Re: Berkeley_DB, if you have a need to push this in a more NFS-portable direction we are more than happy to let you experiment on what works best. Mark Jensen actually started on this a while back but ran into problems. I personally haven't had problems with Bio::DB::Fasta on our local GPFS to be frank, but I'm sure that isn't working for everyone.

> Also are you running via MPI? I ask because if you are using multiple nodes
> you will have to check the size of /tmp independently on each node (since the
> values will be different).
>
> Thanks,
> Carson

chris

> From: Kevin Tsai
> Date: Monday, August 11, 2014 at 5:11 AM
> To: Carson Holt
> Cc:
> Subject: Re: [maker-devel] Early obstacle with SplitDB
>
> Hi Carson,
> Thanks for the suggestions.
>
> I left the TMP= empty, which as you mentioned defaults to /tmp. There seems
> to be a different error when using an NFS mounted directory (as I manually
> verified). My /tmp is also not full or nearly full, and I have verified proper
> fasta formatting, as I have run the fasta file through other statistics
> generating tools (e.g. Quast). We also update BioPerl monthly.
>
> Do you think it could be anything else?
Do you think any more information
> that I might be able to provide will be more insightful?
>
>
> On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote:
>> Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted
>> directory (must be locally mounted), the drive containing directory specified
>> by TMP= (defaults to /tmp) is full or nearly full, your input file is not
>> proper fasta format, or you are using an out of date version of BioPerl.
>>
>> Try the first three in the list then look at BioPerl. The BioPerl version
>> should be printed as part of the debug output.
>>
>> --Carson
>>
>>
>> From: Kevin Tsai
>> Date: Tuesday, August 5, 2014 at 4:59 AM
>> To:
>> Subject: [maker-devel] Early obstacle with SplitDB
>>
>> Hello,
>> I'm a new user to Maker so I suspect this will be a simple question, but I am
>> having trouble finding documentation on SplitDB. Our IT admin set up the
>> application and I'm running into the following issue about 30 seconds after
>> kickoff. Below is the debugged output:
>>
>> STATUS: Parsing control files...
>> Calling GI::load_control_files at /usr/bin/maker line 452.
>> Calling GI::new_instance_temp at /usr/bin/maker line 463.
>> Calling GI::mount_check at /usr/bin/maker line 465.
>> Calling GI::set_global_temp at /usr/bin/maker line 483.
>> STATUS: Processing and indexing input FASTA files...
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling GI::s_abs_path at /usr/bin/maker line 519.
>> Calling List::Util::shuffle at /usr/bin/maker line 529.
>> Calling GI::split_db at /usr/bin/maker line 536.
>> Calling File::Path::rmtree at /usr/bin/maker line 537.
>> Calling Iterator::Any::new at /usr/bin/maker line 537.
>> Calling Iterator::Any::nextDef at /usr/bin/maker line 537.
>> Calling Iterator::Any::new at /usr/bin/maker line 537.
>> Calling mkdir at /usr/bin/maker line 537.
>> Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537.
>> Calling system at /usr/bin/maker line 537.
>> ERROR: SplitDB not created correctly
>>
>> at /usr/local/share/perl5/GI.pm line 1144.
>> GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1,
>> "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at
>> /usr/bin/maker line 537
>> --> rank=NA, hostname=Za2.cglab
>>
>> Any suggestions? Thank you in advance!
>> --
>> Kevin Tsai
>> www.linkedin.com/in/kevinjtsai/
>> Ph.D. Candidate, Bioinformatics
>> Institute of Information Science, Academia Sinica
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
> --
> Kevin Tsai
> www.linkedin.com/in/kevinjtsai/
> Ph.D. Candidate, Bioinformatics
> Institute of Information Science, Academia Sinica
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From j.wilbrandt at zfmk.de Thu Aug 14 09:40:04 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Thu, 14 Aug 2014 17:40:04 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Thank you so much!

However, I'm still struggling, I'm afraid: I tried this 'two-step merging' approach with a subset of scaffolds and got duplicate IDs.

Here is what I did:
- divided input scaffolds into two files
- ran maker separately on these files (-> separate output dirs)
-- additional input: maker-generated gff3 from previous (singular) run
-- repeatmasking, snaphmm, gmhmm, augustus_species are given
-- map_forward=0 / 1 (I tried both, to the same effect)
- gff3_merge two times using index-log
- gff3_merge these two gff3 files

$ grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | uniq -c | sort -n | tail
      2 ID=snap_masked-scf7180005140699-processed-gene-0.19
      2 ID=snap_masked-scf7180005140699-processed-gene-0.22
      2 ID=snap_masked-scf7180005140699-processed-gene-1.36
      2 ID=snap_masked-scf7180005140713-processed-gene-0.4
      2 ID=snap_masked-scf7180005140744-processed-gene-0.4
      2 ID=snap_masked-scf7180005140744-processed-gene-0.6
      2 ID=snap_masked-scf7180005140754-processed-gene-0.14
      2 ID=snap_masked-scf7180005140754-processed-gene-0.15
      2 ID=snap_masked-scf7180005140754-processed-gene-0.19
      2 ID=snap_masked-scf7180005181475-processed-gene-0.3

$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | grep "\sgene"
scf7180005181475 maker gene 9050 9385 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3

- found duplicates! i.e. the same ID for gene annotations in different areas of the same scaffold (of 655 gene annotations, 51 appear twice)
-- this happens not only with gene, but also CDS and mRNA annotations, as far as I can see (here, in one example, non-overlapping but close CDS snippets got the same ID).

I suspected this might have to do with the map_forward flag, but I get the same problem again (with genes at the same locations).
I attached one of the ctl files for you in case you want to have a look, the other is analogous. Do you need something else?

What did I miss?
This should not happen, right?

On Wed, 13 Aug 2014 15:52:34 +0000 Carson Holt wrote:
>Yes. One cpu will have several processes, most are helper processes that
>will use 0% CPU almost all of the time (for example there is a shared
>variable manager process that will launch with MAKER but will also be
>called 'maker' under top because it is technically its child and not a
>separate script). Also system calls will launch a new process that will
>use all CPU while the process calling it will drop to 0% CPU until it
>finishes.
>
>Yes. Your explanation is correct. You then use gff3_merge to merge the
>GFF3 file.
>
>--Carson
>
>
>
>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote:
>
>>
>>Our admin counts processes. Do I understand you right, that one CPU
>>handles several processes?
>>
>>I'm still confused by the different directories (and I made a mistake
>>when asking last time, I wanted to say 'If I do NOT start the jobs in
>>the same directory...').
>>So, if I start each piece of a genome in its own directory (for example),
>>then it gets a unique basename (because the output will be separate from
>>all other pieces anyway) and I will not run dsindex but instead use
>>gff3_merge for each piece's output and then once again to merge all
>>resulting gff3-files?
>>
>>Hope I got you right :)
>>
>>Thanks for your help!
>>Jeanne
>>
>>
>>
>>On Wed, 6 Aug 2014 15:45:56 +0000
>> Carson Holt wrote:
>>>Is your admin counting processes or cpu usage? Because each system call
>>>creates a separate process, so you can expect multiple processes (each
>>>system call generates a new process) but only a single cpu of usage per
>>>instance. Use different directories if you are running that many jobs.
>>>You can concatenate the separate results when you're done.
>>> Use gff3_merge script to help concatenate the separate GFF3 files
>>>generated from separate jobs.
>>> >>>--Carson >>> >>>Sent from my iPhone >>> >>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>wrote: >>>> >>>> >>>> >>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>threads. Our admin >>>reports >>>> however, that the processes seem to assemble more threads than they >>>>are allowed. It is >>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>suggestion why? >>>> >>>> If I start the jobs in the same directory, how can I make sure they >>>>write to the same >>>> directory (as, I think is required to put the pieces together in the >>>>end?)? das >>>-basename >>>> take paths? >>>> >>>> >>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>> Carson Holt wrote: >>>>> I think the freezing is because you are starting too many >>>>>simultaneous jobs. You >>>should >>>>> try and use MPI to parallelize instead. The concurrent job way of >>>>>doing things can >>>>> start to cause problems If you are running 10 or more jobs in the >>>>>same directory. You >>>>> could try splitting them into different directories. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>wrote: >>>>>> >>>>>> >>>>>> aha, so this explains that. >>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>than 60,000, >>>roughly >>>>>> half of the sequences being shorter than 3,000 bp. >>>>>> >>>>>> What do you think about this weird 'I am running but not really doing >>>>> anything'-behavior? >>>>>> >>>>>> >>>>>> Thanks a lot! >>>>>> Jeanne >>>>>> >>>>>> >>>>>> >>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>>> Carson Holt wrote: >>>>>>> If you are starting and restarting, or running multiple jobs then >>>>>>>the log can be >>>>>>> partially rebuilt. On rebuild only the FINISHED entries are added. >>>>>>> If there is a >>>>> GFF3 >>>>>>> result file for the contig, then it is FINISHED. FASTA files will >>>>>>>only exist for >>>the >>>>>>> contigs that have gene models. 
Small contigs will rarely contain >>>>>>>models. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi Carson, >>>>>>>> >>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>>genome which is split >>>>>>> into >>>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to >>>>>>>>finish >>>normally, >>>>>>> while >>>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>>output. Do you have >>>any >>>>>>>> suggestion why this is happening? >>>>>>>> >>>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>>found that of >>>38.384 >>>>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise >>>>>>>>is, that 2/3 of >>>>>>> these >>>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>>models for these >>>>>>> 'finished' >>>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>>parts of the genome >>>>>>> (i.e., >>>>>>>> each of the 20 jobs contained scaffolds that 'did not start' but >>>>>>>>'finished') >>>>>>>> Should this be an issue of concern? >>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the >>>>>>>>NFS files look >>>>> good, >>>>>>> so >>>>>>>> we suspect something fishy going on... >>>>>>>> >>>>>>>> Hope you can help, >>>>>>>> best wishes, >>>>>>>> Jeanne Wilbrandt >>>>>>>> >>>>>>>> zmb // ZFMK // University of Bonn >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> maker-devel mailing list >>>>>>>> maker-devel at box290.bluehost.com >>>>>>>> >>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab. >>>>>>>>org >>>> >> > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: maker_opts_Lclav_splitrun_problem_01_mapfwd.ctl
Type: application/octet-stream
Size: 5859 bytes
Desc: not available
URL:
From carsonhh at gmail.com Thu Aug 14 09:46:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:46:44 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

What version of MAKER are you using? I'd also need to see the GFF3 files before the merge. You may also need to turn off map_forward since you are passing in GFF3 with MAKER names, creating new models with MAKER names and then moving names from old models forward onto new ones (which may force names to be used twice).

--Carson

From carsonhh at gmail.com Thu Aug 14 09:55:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:55:15 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Which 2.31? Current is 2.31.6.

--Carson

On 8/14/14, 9:53 AM, "Jeanne Wilbrandt" wrote:

>
>It is version 2.31.
>
>My first try was done with map_forward=0, and (I just noticed) the
>duplicates are present
>in the separate gff3s already also in this case (one is attached).
>
>Has this something to do with the first-run-gff3 I fed it?

From carsonhh at gmail.com Thu Aug 14 09:57:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:57:39 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

For the file you just sent me, is that from the first run with map_forward=0 or with map_forward=1?

--Carson

On 8/14/14, 9:53 AM, "Jeanne Wilbrandt" wrote:

>
>It is version 2.31.
>
>My first try was done with map_forward=0, and (I just noticed) the
>duplicates are present
>in the separate gff3s already also in this case (one is attached).
>
>Has this something to do with the first-run-gff3 I fed it?
> > > > >On Thu, 14 Aug 2014 15:46:44 +0000 > Carson Holt wrote: >>What version of MAKER are you using? I'd also need to see the GFF3 files >>before the merge. You may also need to turn off map_forward since you >>are >>passing in GFF3 with MAKER names, creating new models with MAKER names >>and >>then moving names from old models forward onto new ones (which may force >>names to be used twice). >> >>--Carson >> >> >>On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote: >> >>> >>>Thank you so much! >>> >>>However, I'm still, struggling, I'm afraid: I tried this 'two-step >>>merging' approach with >>>a subset of scaffolds and got duplicate IDs. >>> >>>Here is what I did: >>>- divided input scaffolds in two files >>>- run maker separately on these files (-> separate output dirs) >>>-- additional input: maker-generated gff3 from previous (singular) run >>>-- repeatmasking, snaphmm, gmhmm, augustus_species are given >>>-- map_forward=0 / 1 (I tried both, to the same effect) >>>- gff3_merge two times using index-log >>>- gff3_merge these two gff3 files >>> >>>$ >>>grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | >>>uniq -c | sort -n >>>| tail >>> 2 ID=snap_masked-scf7180005140699-processed-gene-0.19 >>> 2 ID=snap_masked-scf7180005140699-processed-gene-0.22 >>> 2 ID=snap_masked-scf7180005140699-processed-gene-1.36 >>> 2 ID=snap_masked-scf7180005140713-processed-gene-0.4 >>> 2 ID=snap_masked-scf7180005140744-processed-gene-0.4 >>> 2 ID=snap_masked-scf7180005140744-processed-gene-0.6 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.14 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.15 >>> 2 ID=snap_masked-scf7180005140754-processed-gene-0.19 >>> 2 ID=snap_masked-scf7180005181475-processed-gene-0.3 >>> >>>$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | >>>grep "\sgene" >>>scf7180005181475 maker gene 9050 9385 . - . 
ID=snap_masked-scf7180005181 >>>47 >>>5-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0. >>>3 >>>scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf71800051814 >>>75 >>>-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3 >>> >>>- found duplicates! i.e. the same ID for gene annotations in different >>>areas of the same >>>scaffold (of 655 gene annotations, 51 appear twice) >>>-- this happens not only with gene, but also CDS and mRNA annotations, >>>as >>>far as I can >>>see (here, in one example, non-everlapping but close CDS snippets got >>>the >>>same ID). >>> >>> >>>I suspected this might have to do with the map_forward flag, but I get >>>the same problem >>>again (with genes at the same locations). >>>I attached one of the ctl files for you in case you want to have a look, >>>the other is >>>analogous. Do you need something else? >>> >>>What did I miss? This should not happen, right? >>> >>> >>> >>> >>>On Wed, 13 Aug 2014 15:52:34 +0000 >>> Carson Holt wrote: >>>>Yes. One cpu will have several processes, most are helper processes >>>>that >>>>will use 0% CPU almost all of the time (for example there is a shared >>>>variable manager process that will launch with MAKER but will also be >>>>called 'maker' under top because it is technically its child and not a >>>>separate script). Also system calls will launch a new process that >>>>will >>>>use all CPU while the process calling it will drop to 0% CPU until it >>>>finishes. >>>> >>>>Yes. Your explanation is correct. You then use gff3_merge to merge the >>>>GFF3 file. >>>> >>>>--Carson >>>> >>>> >>>> >>>>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: >>>> >>>>> >>>>>Our admin counts processes. Do I understand you right, that one CPU >>>>>handles several >>>>>processes? >>>>> >>>>>I'm still confused by the different directories (and I made a mistake >>>>>when asking last >>>>>time, I wanted to say 'If I do NOT start the jobs in the same >>>>>directory...). 
>>>>>So, if I start each piece of a genome in its own directory (for >>>>>example), >>>>>then it gets a >>>>>unique basename (because the output will be separate from all other >>>>>pieces anyway) and I >>>>>will not run dsindex but instead use gff3_merge for each piece's >>>>>output >>>>>and then once >>>>>again to merge all resulting gff3-files? >>>>> >>>>>Hope I got you right :) >>>>> >>>>>Thanks fopr your help! >>>>>Jeanne >>>>> >>>>> >>>>> >>>>>On Wed, 6 Aug 2014 15:45:56 +0000 >>>>> Carson Holt wrote: >>>>>>Is your admin counting processes or cpu usage? Because each system >>>>>>call >>>>>>creates a >>>>>>separate process, so you can expect multiple processes (each system >>>>>>call >>>>>>generates a new >>>>>>process) but only a single cpu of usage per instance. Use different >>>>>>directories if you >>>>>>are running that many jobs. You can concatenate the separate results >>>>>>when your done. >>>>>> Use gff3_merge script to help concatenate the separate GFF3 files >>>>>>generated from >>>>>>separate jobs. >>>>>> >>>>>>--Carson >>>>>> >>>>>>Sent from my iPhone >>>>>> >>>>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>>>> >>>>>>>wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>>>>threads. Our admin >>>>>>reports >>>>>>> however, that the processes seem to assemble more threads than they >>>>>>>are allowed. It is >>>>>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>>>>suggestion why? >>>>>>> >>>>>>> If I start the jobs in the same directory, how can I make sure they >>>>>>>write to the same >>>>>>> directory (as, I think is required to put the pieces together in >>>>>>>the >>>>>>>end?)? das >>>>>>-basename >>>>>>> take paths? >>>>>>> >>>>>>> >>>>>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>>>>> Carson Holt wrote: >>>>>>>> I think the freezing is because you are starting too many >>>>>>>>simultaneous jobs. You >>>>>>should >>>>>>>> try and use MPI to parallelize instead. 
The concurrent job way of >>>>>>>>doing things can >>>>>>>> start to cause problems If you are running 10 or more jobs in the >>>>>>>>same directory. You >>>>>>>> could try splitting them into different directories. >>>>>>>> >>>>>>>> --Carson >>>>>>>> >>>>>>>> Sent from my iPhone >>>>>>>> >>>>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>>>> >>>>>>>>>wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> aha, so this explains that. >>>>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>>>>than 60,000, >>>>>>roughly >>>>>>>>> half of the sequences being shorter than 3,000 bp. >>>>>>>>> >>>>>>>>> What do you think about this weird 'I am running but not really >>>>>>>>>doing >>>>>>>> anything'-behavior? >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks a lot! >>>>>>>>> Jeanne >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>>>>>> Carson Holt wrote: >>>>>>>>>> If you are starting and restarting, or running multiple jobs >>>>>>>>>>then >>>>>>>>>>the log can be >>>>>>>>>> partially rebuilt. On rebuild only the FINISHED entries are >>>>>>>>>>added. >>>>>>>>>> If there is a >>>>>>>> GFF3 >>>>>>>>>> result file for the contig, then it is FINISHED. FASTA files >>>>>>>>>>will >>>>>>>>>>only exist for >>>>>>the >>>>>>>>>> contigs that have gene models. Small contigs will rarely contain >>>>>>>>>>models. >>>>>>>>>> >>>>>>>>>> --Carson >>>>>>>>>> >>>>>>>>>> Sent from my iPhone >>>>>>>>>> >>>>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi Carson, >>>>>>>>>>> >>>>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>>>>>genome which is split >>>>>>>>>> into >>>>>>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed >>>>>>>>>>>to >>>>>>>>>>>finish >>>>>>normally, >>>>>>>>>> while >>>>>>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>>>>>output. 
Do you have >>>>>>any >>>>>>>>>>> suggestion why this is happening? >>>>>>>>>>> >>>>>>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>>>>>found that of >>>>>>38.384 >>>>>>>>>>> mentioned scaffolds, 154 appear only once in the log. The >>>>>>>>>>>surprise >>>>>>>>>>>is, that 2/3 of >>>>>>>>>> these >>>>>>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>>>>>models for these >>>>>>>>>> 'finished' >>>>>>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>>>>>parts of the genome >>>>>>>>>> (i.e., >>>>>>>>>>> each of the 20 jobs contained scaffolds that 'did not start' >>>>>>>>>>>but >>>>>>>>>>>'finished') >>>>>>>>>>> Should this be an issue of concern? >>>>>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but >>>>>>>>>>>the >>>>>>>>>>>NFS files look >>>>>>>> good, >>>>>>>>>> so >>>>>>>>>>> we suspect something fishy going on... >>>>>>>>>>> >>>>>>>>>>> Hope you can help, >>>>>>>>>>> best wishes, >>>>>>>>>>> Jeanne Wilbrandt >>>>>>>>>>> >>>>>>>>>>> zmb // ZFMK // University of Bonn >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> maker-devel mailing list >>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>> >>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell- >>>>>>>>>>>la >>>>>>>>>>>b. 
>>>>>>>>>>>org >>>>>>> >>>>> >>>> >>>> >>> >> >> > From j.wilbrandt at zfmk.de Thu Aug 14 09:53:38 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Thu, 14 Aug 2014 17:53:38 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de> References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: It is version 2.31. My first try was done with map_forward=0, and (I just noticed) the duplicates are present in the separate gff3s already also in this case (one is attached). Has this something to do with the first-run-gff3 I fed it? On Thu, 14 Aug 2014 15:46:44 +0000 Carson Holt wrote: >What version of MAKER are you using? I'd also need to see the GFF3 files >before the merge. You may also need to turn off map_forward since you are >passing in GFF3 with MAKER names, creating new models with MAKER names and >then moving names from old models forward onto new ones (which may force >names to be used twice). > >--Carson > > >On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote: > >> >>Thank you so much! >> >>However, I'm still, struggling, I'm afraid: I tried this 'two-step >>merging' approach with >>a subset of scaffolds and got duplicate IDs. 
>>
>>Here is what I did:
>>- divided input scaffolds into two files
>>- ran maker separately on these files (-> separate output dirs)
>>-- additional input: maker-generated gff3 from previous (singular) run
>>-- repeatmasking, snaphmm, gmhmm, augustus_species are given
>>-- map_forward=0 / 1 (I tried both, to the same effect)
>>- gff3_merge two times using index-log
>>- gff3_merge these two gff3 files
>>
>>$ grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | uniq -c | sort -n | tail
>>      2 ID=snap_masked-scf7180005140699-processed-gene-0.19
>>      2 ID=snap_masked-scf7180005140699-processed-gene-0.22
>>      2 ID=snap_masked-scf7180005140699-processed-gene-1.36
>>      2 ID=snap_masked-scf7180005140713-processed-gene-0.4
>>      2 ID=snap_masked-scf7180005140744-processed-gene-0.4
>>      2 ID=snap_masked-scf7180005140744-processed-gene-0.6
>>      2 ID=snap_masked-scf7180005140754-processed-gene-0.14
>>      2 ID=snap_masked-scf7180005140754-processed-gene-0.15
>>      2 ID=snap_masked-scf7180005140754-processed-gene-0.19
>>      2 ID=snap_masked-scf7180005181475-processed-gene-0.3
>>
>>$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | grep "\sgene"
>>scf7180005181475 maker gene 9050 9385 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
>>scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
>>
>>- found duplicates! i.e. the same ID for gene annotations in different
>>areas of the same scaffold (of 655 gene annotations, 51 appear twice)
>>-- this happens not only with gene, but also CDS and mRNA annotations, as
>>far as I can see (here, in one example, non-overlapping but close CDS
>>snippets got the same ID).
>>
>>
>>I suspected this might have to do with the map_forward flag, but I get
>>the same problem again (with genes at the same locations).
>>I attached one of the ctl files for you in case you want to have a look, >>the other is >>analogous. Do you need something else? >> >>What did I miss? This should not happen, right? >> >> >> >> >>On Wed, 13 Aug 2014 15:52:34 +0000 >> Carson Holt wrote: >>>Yes. One cpu will have several processes, most are helper processes that >>>will use 0% CPU almost all of the time (for example there is a shared >>>variable manager process that will launch with MAKER but will also be >>>called 'maker' under top because it is technically its child and not a >>>separate script). Also system calls will launch a new process that will >>>use all CPU while the process calling it will drop to 0% CPU until it >>>finishes. >>> >>>Yes. Your explanation is correct. You then use gff3_merge to merge the >>>GFF3 file. >>> >>>--Carson >>> >>> >>> >>>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: >>> >>>> >>>>Our admin counts processes. Do I understand you right, that one CPU >>>>handles several >>>>processes? >>>> >>>>I'm still confused by the different directories (and I made a mistake >>>>when asking last >>>>time, I wanted to say 'If I do NOT start the jobs in the same >>>>directory...). >>>>So, if I start each piece of a genome in its own directory (for >>>>example), >>>>then it gets a >>>>unique basename (because the output will be separate from all other >>>>pieces anyway) and I >>>>will not run dsindex but instead use gff3_merge for each piece's output >>>>and then once >>>>again to merge all resulting gff3-files? >>>> >>>>Hope I got you right :) >>>> >>>>Thanks fopr your help! >>>>Jeanne >>>> >>>> >>>> >>>>On Wed, 6 Aug 2014 15:45:56 +0000 >>>> Carson Holt wrote: >>>>>Is your admin counting processes or cpu usage? Because each system >>>>>call >>>>>creates a >>>>>separate process, so you can expect multiple processes (each system >>>>>call >>>>>generates a new >>>>>process) but only a single cpu of usage per instance. 
Use different >>>>>directories if you >>>>>are running that many jobs. You can concatenate the separate results >>>>>when your done. >>>>> Use gff3_merge script to help concatenate the separate GFF3 files >>>>>generated from >>>>>separate jobs. >>>>> >>>>>--Carson >>>>> >>>>>Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>>>wrote: >>>>>> >>>>>> >>>>>> >>>>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>>>threads. Our admin >>>>>reports >>>>>> however, that the processes seem to assemble more threads than they >>>>>>are allowed. It is >>>>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>>>suggestion why? >>>>>> >>>>>> If I start the jobs in the same directory, how can I make sure they >>>>>>write to the same >>>>>> directory (as, I think is required to put the pieces together in the >>>>>>end?)? das >>>>>-basename >>>>>> take paths? >>>>>> >>>>>> >>>>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>>>> Carson Holt wrote: >>>>>>> I think the freezing is because you are starting too many >>>>>>>simultaneous jobs. You >>>>>should >>>>>>> try and use MPI to parallelize instead. The concurrent job way of >>>>>>>doing things can >>>>>>> start to cause problems If you are running 10 or more jobs in the >>>>>>>same directory. You >>>>>>> could try splitting them into different directories. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>>> >>>>>>>>wrote: >>>>>>>> >>>>>>>> >>>>>>>> aha, so this explains that. >>>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>>>than 60,000, >>>>>roughly >>>>>>>> half of the sequences being shorter than 3,000 bp. >>>>>>>> >>>>>>>> What do you think about this weird 'I am running but not really >>>>>>>>doing >>>>>>> anything'-behavior? >>>>>>>> >>>>>>>> >>>>>>>> Thanks a lot! 
>>>>>>>> Jeanne >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>>>>> Carson Holt wrote: >>>>>>>>> If you are starting and restarting, or running multiple jobs then >>>>>>>>>the log can be >>>>>>>>> partially rebuilt. On rebuild only the FINISHED entries are >>>>>>>>>added. >>>>>>>>> If there is a >>>>>>> GFF3 >>>>>>>>> result file for the contig, then it is FINISHED. FASTA files will >>>>>>>>>only exist for >>>>>the >>>>>>>>> contigs that have gene models. Small contigs will rarely contain >>>>>>>>>models. >>>>>>>>> >>>>>>>>> --Carson >>>>>>>>> >>>>>>>>> Sent from my iPhone >>>>>>>>> >>>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Carson, >>>>>>>>>> >>>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>>>>genome which is split >>>>>>>>> into >>>>>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed >>>>>>>>>>to >>>>>>>>>>finish >>>>>normally, >>>>>>>>> while >>>>>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>>>>output. Do you have >>>>>any >>>>>>>>>> suggestion why this is happening? >>>>>>>>>> >>>>>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>>>>found that of >>>>>38.384 >>>>>>>>>> mentioned scaffolds, 154 appear only once in the log. The >>>>>>>>>>surprise >>>>>>>>>>is, that 2/3 of >>>>>>>>> these >>>>>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>>>>models for these >>>>>>>>> 'finished' >>>>>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>>>>parts of the genome >>>>>>>>> (i.e., >>>>>>>>>> each of the 20 jobs contained scaffolds that 'did not start' but >>>>>>>>>>'finished') >>>>>>>>>> Should this be an issue of concern? 
>>>>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the >>>>>>>>>>NFS files look >>>>>>> good, >>>>>>>>> so >>>>>>>>>> we suspect something fishy going on... >>>>>>>>>> >>>>>>>>>> Hope you can help, >>>>>>>>>> best wishes, >>>>>>>>>> Jeanne Wilbrandt >>>>>>>>>> >>>>>>>>>> zmb // ZFMK // University of Bonn >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> maker-devel mailing list >>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>> >>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-la >>>>>>>>>>b. >>>>>>>>>>org >>>>>> >>>> >>> >>> >> > > -------------- next part -------------- A non-text attachment was scrubbed... Name: splitrun_problem_01_all.gff3 Type: application/octet-stream Size: 4967463 bytes Desc: not available URL: From daniel.standage at gmail.com Thu Aug 21 09:33:33 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 21 Aug 2014 11:33:33 -0400 Subject: [maker-devel] tRNAscan GFF3 Message-ID: Greetings! I have a quick question about Maker's handling of tRNAscan output, particularly tRNAs containing introns. If I haven't missed something, it looks like Maker reports the second exon on the opposite strand as the first exon, the tRNA feature, and the gene feature? Am I reading this correctly? I don't think this representation makes sense. The second exon is complementary to the first (hence the folding), but it is not encoded on or transcribed from the opposite strand. Unless I've misunderstood something, I would suggest that the correct representation would be to have all features on the same strand. Thanks, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University -------------- next part -------------- An HTML attachment was scrubbed... 
From carsonhh at gmail.com Thu Aug 21 09:35:16 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 09:35:16 -0600
Subject: [maker-devel] tRNAscan GFF3
In-Reply-To:
References:
Message-ID:

It should be on the same strand. Which MAKER version are you using?

--Carson

From: Daniel Standage
Date: Thursday, August 21, 2014 at 9:33 AM
To: Maker Mailing List
Subject: [maker-devel] tRNAscan GFF3

Greetings!

I have a quick question about Maker's handling of tRNAscan output,
particularly tRNAs containing introns. If I haven't missed something, it
looks like Maker reports the second exon on the opposite strand as the
first exon, the tRNA feature, and the gene feature? Am I reading this
correctly?

I don't think this representation makes sense. The second exon is
complementary to the first (hence the folding), but it is not encoded on or
transcribed from the opposite strand. Unless I've misunderstood something,
I would suggest that the correct representation would be to have all
features on the same strand.

Thanks,
Daniel

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From daniel.standage at gmail.com Thu Aug 21 09:36:41 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Thu, 21 Aug 2014 11:36:41 -0400
Subject: [maker-devel] tRNAscan GFF3
In-Reply-To:
References:
Message-ID:

This annotation was generated using Maker 2.31.3.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote:

> It should be on the same strand. Which MAKER version are you using?
>
> --Carson
>
>
> From: Daniel Standage
> Date: Thursday, August 21, 2014 at 9:33 AM
> To: Maker Mailing List
> Subject: [maker-devel] tRNAscan GFF3
>
> Greetings!
>
> I have a quick question about Maker's handling of tRNAscan output,
> particularly tRNAs containing introns. If I haven't missed something, it
> looks like Maker reports the second exon on the opposite strand as the
> first exon, the tRNA feature, and the gene feature? Am I reading this
> correctly?
>
> I don't think this representation makes sense. The second exon is
> complementary to the first (hence the folding), but it is not encoded on or
> transcribed from the opposite strand. Unless I've misunderstood something,
> I would suggest that the correct representation would be to have all
> features on the same strand.
>
> Thanks,
> Daniel
>
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University

From carsonhh at gmail.com Thu Aug 21 09:49:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 09:49:36 -0600
Subject: [maker-devel] tRNAscan GFF3
In-Reply-To:
References:
Message-ID:

I halfway remember some tRNAscan bugs being fixed in several of the sub
versions of 2.31 (tRNAscan was only introduced as an option in 2.30, I
believe, and most 2.31 updates were related to tRNAscan). The current
version is 2.31.6. Could you give it a try and see if it is still giving
you the issue?

I did a quick look through the archives and I think this was found and
fixed -->
https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ

Thanks,
Carson

From: Daniel Standage
Date: Thursday, August 21, 2014 at 9:36 AM
To: Carson Holt
Cc: Maker Mailing List
Subject: Re: [maker-devel] tRNAscan GFF3

This annotation was generated using Maker 2.31.3.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote:
> It should be on the same strand. Which MAKER version are you using?
>
> --Carson
>
>
> From: Daniel Standage
> Date: Thursday, August 21, 2014 at 9:33 AM
> To: Maker Mailing List
> Subject: [maker-devel] tRNAscan GFF3
>
> Greetings!
>
> I have a quick question about Maker's handling of tRNAscan output,
> particularly tRNAs containing introns. If I haven't missed something, it
> looks like Maker reports the second exon on the opposite strand as the
> first exon, the tRNA feature, and the gene feature? Am I reading this
> correctly?
>
> I don't think this representation makes sense. The second exon is
> complementary to the first (hence the folding), but it is not encoded on or
> transcribed from the opposite strand. Unless I've misunderstood something,
> I would suggest that the correct representation would be to have all
> features on the same strand.
>
> Thanks,
> Daniel
>
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
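
For anyone wanting to check whether a given MAKER release still emits
mixed-strand tRNA models, one option is to compare each feature's strand
with the strand of its parent. The following is only a rough sketch and is
not part of MAKER: the function name check_strands is made up for
illustration, and it assumes ordinary tab-separated GFF3 with the strand in
column 7, ID= and Parent= attributes in column 9, and a single parent per
feature (features listing several parents are silently skipped).

```shell
# check_strands FILE: print every GFF3 feature whose strand differs from
# the strand of its Parent feature (hypothetical helper, not a MAKER script).
check_strands() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }                 # skip comments and short lines
        {
            id = ""; parent = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)     id = substr(attrs[i], 4)
                if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
            }
            if (id != "") strand[id] = $7       # remember the strand of each ID
            if (parent != "") {                 # queue children for the END pass
                nchild++
                cparent[nchild] = parent
                cstrand[nchild] = $7
                cline[nchild] = $0
            }
        }
        END {
            # compare after reading everything, so parent/child order is irrelevant
            for (c = 1; c <= nchild; c++)
                if ((cparent[c] in strand) && strand[cparent[c]] != cstrand[c])
                    print cline[c]
        }
    ' "$1"
}
```

Calling it as check_strands my_genome.all.gff (file name hypothetical)
prints nothing when every child feature sits on its parent's strand, and
otherwise prints the offending lines so they can be inspected by hand.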
From rens.holmer at wur.nl Tue Aug 19 03:19:08 2014
From: rens.holmer at wur.nl (rens holmer)
Date: Tue, 19 Aug 2014 11:19:08 +0200
Subject: [maker-devel] Maker error mpiexec
Message-ID:

Hi,

I am trying to run maker using MPI, and I get an error I do not understand.

Maker version: 2.13.6
mpiexec version: mpiexec (OpenRTE) 1.6.5

When I run ./Build status it is reported that MPI is enabled.

When I run mpiexec -n 40 maker I get the following errors:

[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------

Etcetera etcetera.

However: when I search for the files reported as missing I do find them,
and I don't believe they are from a different version of MPI?

Am I using a wrong version of MPI?

Any help would be appreciated,

Sincerely,
Rens Holmer
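
Because the messages above hint at a mismatch between the Open MPI used at
build time and the one used at run time, one thing that may be worth ruling
out is that the mpiexec and mpicc found on the PATH belong to different
installs. The lines below are only a sketch under assumptions: the helper
names install_prefix and same_mpi_prefix are invented here, and the check
assumes both tools sit in the bin/ directory of their install prefix and
are not reached through symlinks.

```shell
# install_prefix PATH: strip the trailing /bin/<tool> from an executable path.
install_prefix() {
    dirname "$(dirname "$1")"
}

# same_mpi_prefix EXE1 EXE2: succeed if both executables share an install prefix.
same_mpi_prefix() {
    [ "$(install_prefix "$1")" = "$(install_prefix "$2")" ]
}

# Example use with whatever is first on the PATH (skipped if either is missing).
if command -v mpiexec >/dev/null 2>&1 && command -v mpicc >/dev/null 2>&1; then
    if same_mpi_prefix "$(command -v mpiexec)" "$(command -v mpicc)"; then
        echo "mpiexec and mpicc share an install prefix"
    else
        echo "mpiexec and mpicc come from different prefixes"
    fi
fi
```

A matching prefix does not prove the versions agree, but differing prefixes
are a strong hint that MAKER may have been compiled against one Open MPI
and launched with another.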
From Timothy.Stitt at tgac.ac.uk Thu Aug 21 14:05:46 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 21 Aug 2014 20:05:46 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
Message-ID:

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000
system (with over 2000 cores) and the application appears to be generating
large amounts of 'ps' processes that are overwhelming the system and
causing the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is
there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Thu Aug 21 14:17:22 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:17:22 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
Message-ID:

MAKER uses 'ps' every so often to check on certain processes to make sure
they haven't failed or become zombies. On your system these 'ps' calls may
be hanging which would cause them to build up over time. You can try and
run MAKER with the '-nolock' flag, since it is the NFS file locking that
requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and
change it as follows. Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;
        bless($self, $class);
        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but
it may experience the same hang as 'ps' is experiencing. Also you will need
to install 'Proc::ProcessTable' via CPAN for it to work, and that
particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000
system (with over 2000 cores) and the application appears to be generating
large amounts of 'ps' processes that are overwhelming the system and
causing the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is
there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Thu Aug 21 14:21:19 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:21:19 -0600
Subject: [maker-devel] Maker error mpiexec
In-Reply-To:
References:
Message-ID:

You need to make sure the same version of MPI is used to compile and run MAKER.
When installing MAKER, make sure the mpi.h and mpicc indicated during configuration come from the same version of OpenMPI as the mpiexec command you are using now.

Also for OpenMPI run the following command before setting up or launching MAKER -->
export LD_PRELOAD=/openmpi_location/lib/libmpi.so

Replace openmpi_location in the above command with the location of your OpenMPI.

Setting LD_PRELOAD is required for OpenMPI to work correctly with shared libraries.

Also you may need to add the following to your MPI command before running MAKER.
--> -mca btl ^openib
Example --> mpiexec -mca btl ^openib -n 40 maker

Thanks,
Carson

From: rens holmer
Date: Tuesday, August 19, 2014 at 3:19 AM
To:
Subject: [maker-devel] Maker error mpiexec

Hi,

I am trying to run maker using MPI, and I get an error I do not understand.

Maker version: 2.13.6
mpiexec version: mpiexec (OpenRTE) 1.6.5

When I run ./Build status it is reported that MPI is enabled.

When I run mpiexec -n 40 maker I get the following errors:

[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI?
(ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) -------------------------------------------------------------------------- It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. 
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------

Etcetera etcetera.

However: when I search for the files reported as missing I do find them, and I don't believe they are from a different version of MPI?

Am I using a wrong version of MPI?

Any help would be appreciated,

Sincerely,

Rens Holmer

From carsonhh at gmail.com Thu Aug 21 14:27:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:27:14 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

FYI. If you use the -nolock flag, never start MAKER more than once in the same directory. The lack of file locks means MAKER won't detect the other active process, and they can end up overwriting each other's output. So do any parallelization via MPI instead.

Thanks,
Carson

From rens.holmer at wur.nl Fri Aug 22 04:43:20 2014
From: rens.holmer at wur.nl (rens holmer)
Date: Fri, 22 Aug 2014 12:43:20 +0200
Subject: [maker-devel] Maker error mpiexec
In-Reply-To:
References:
Message-ID:

Thank you!
export LD_PRELOAD=/openmpi_location/lib/libmpi.so
mpiexec -mca btl ^openib -n 40 maker

Those two tweaks did the trick!

Sincerely,

Rens Holmer

From ranjani at uga.edu Tue Aug 26 08:53:25 2014
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 26 Aug 2014 14:53:25 +0000
Subject: [maker-devel] MAKER run error -with blast
Message-ID: <1409064805543.27602@uga.edu>

Hi,

I have been using MAKER for a while and it's been running fine. Recently I have been encountering an error (attaching the error from the error log file - error1.txt). As input I am providing the fasta file of a scaffold, a transcriptome dataset (in gff) and a protein dataset (as fasta). These kinds of input files have run successfully in the past. The file that is reported as 'No such file or directory at' in the error output changes between runs.

To make sure I wasn't doing something wrong, I reran a dataset that had run successfully before, but I get an error with that too (error log attached as error2.txt).
The only difference in this run, previously I ran it for the entire genome, and now I am testing it on just one scaffold. Would you have any idea of why this might be happening? Thanks, Ranjani -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- total clusters:6 now processing 0 prepare section files Gathering GFF3 input into hits - chunk:18 prepare section files Gathering GFF3 input into hits - chunk:19 Removing file: /lustre1/escratch1/ranjani_Jul_23/sn1_comparisons/maker_with_sn1prot/correct_uniprot_sn1prot_gff/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/scaffold00001.7.end.section.holdover ERROR: No such file or directory at /lustre1/escratch1/ranjani_Jul_23/sn1_comparisons/maker_with_sn1prot/correct_uniprot_sn1prot_gff/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/scaffold00001.7.end.section.holdover at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 4482. Process::MpiChunk::__ANON__() called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Error.pm line 408 eval {...} called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Error.pm line 407 Error::subs::try(CODE(0x12f4d0f8), HASH(0x129acca0)) called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 4491 Process::MpiChunk::retrieve("/lustre1/escratch1/ranjani_Jul_23/sn1_comparisons/maker_with_"...) 
called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 3311 Process::MpiChunk::__ANON__() called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Error.pm line 415 eval {...} called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Error.pm line 407 Error::subs::try(CODE(0x1262ed50), HASH(0x12409548)) called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 4215 Process::MpiChunk::_go(Process::MpiChunk=HASH(0x12eb0f70), "run", HASH(0x12958360), 12, 3) called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 341 Process::MpiChunk::run(Process::MpiChunk=HASH(0x12eb0f70), 5) called at /usr/local/maker/latest/bin/maker line 979 --> rank=5, hostname=compute-6-5.local --> rank=5, hostname=compute-6-5.local ERROR: Failed while prepare section files ERROR: Chunk failed at level:12, tier_type:3 FAILED CONTIG:scaffold00001 -------------- next part -------------- STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore To access files for individual sequences use the datastore index: /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: scaffold00001 Length: 11360811 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking doing repeat masking doing repeat masking doing repeat masking doing repeat masking doing repeat masking doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_8GDzH7; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.0.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_1sNfMC; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.6.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.1.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.2.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.5.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.3.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.4.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the reqWuested dUBlastSearcatabase to seahEngine:rch! :search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. 
Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! 
WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) ERROR: RepeatMasker failed --> rank=7, hostname=compute-13-7.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:scaffold00001 ERROR: RepeatMasker failed --> rank=1, hostname=compute-9-15.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. 
Retrying with larger minmatch (10) ERROR: RepeatMasker failed --> rank=3, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 ERROR: RepeatMasker failed --> rank=2, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 WUBlastSWUBlastSearchEngiearchEngine::search: FAne::searcTAL: Thh: FATere is AL: Tnothinghere i in thes noth requesing inted dat the requested database abase to seato searchrch! ! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! ERROR: RepeatMasker failed --> rank=6, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 ERROR: RepeatMasker failed --> rank=5, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 ERROR: RepeatMasker failed --> rank=4, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 From carsonhh at gmail.com Tue Aug 26 09:03:28 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 Aug 2014 09:03:28 -0600 Subject: [maker-devel] MAKER run error -with blast Message-ID: Make sure you are not setting TMP= in the maker_opts.ctl file to an NFS mounted location. Also check your /tmp directory to see if it is full or nearly full (it will be mounted on a different drive than your working directory). Also if it is being caused by slow NFS response you can set clean_try=1 and it will do complete retry on the contig rather than trying to recover partial files. 
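
The checks suggested above could be scripted along these lines. This is a minimal sketch rather than anything shipped with MAKER: the helper names (ctl_value, used_percent, fs_source) are made up for illustration, the control file is assumed to be maker_opts.ctl in the current directory, and /tmp is assumed to be the temporary directory in use when TMP= is left empty.

```shell
#!/bin/sh
# Sketch only. The helper names below are hypothetical and not part of MAKER.

# Print the value assigned to KEY in a control file such as maker_opts.ctl,
# dropping any trailing "#comment" and surrounding whitespace.
ctl_value() {
    sed -n "s/^$1=\([^#]*\).*/\1/p" "$2" |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        head -n 1
}

# Print how full (in percent) the filesystem holding DIR is.
used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the device or export backing DIR.
# NFS mounts usually show up in the form "server:/path".
fs_source() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

ctl=${1:-maker_opts.ctl}        # assumed location of the control file
if [ -f "$ctl" ]; then
    tmp_dir=$(ctl_value TMP "$ctl")
    tmp_dir=${tmp_dir:-/tmp}    # assumption: /tmp is used when TMP= is empty
    echo "temporary directory: $tmp_dir"
    echo "backed by:           $(fs_source "$tmp_dir")"
    echo "percent used:        $(used_percent "$tmp_dir")"
    echo "clean_try:           $(ctl_value clean_try "$ctl")"
fi
```

If the "backed by" line shows something of the form server:/export, the temporary directory is probably on a network mount, and a high "percent used" value points at the full /tmp case mentioned above.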
--Carson

From daniel.standage at gmail.com Tue Aug 26 09:55:40 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Tue, 26 Aug 2014 11:55:40 -0400
Subject: [maker-devel] tRNAscan GFF3
In-Reply-To:
References:
Message-ID:

Sorry for the delayed response. In the meantime, I wrote a tiny script to correct the erroneous tRNA annotations. I just now took a few minutes to download 2.31.6, and can confirm that the tRNA exon strands are consistent.

Best,
Daniel

--
Daniel S. Standage
Ph.D.
Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:49 AM, Carson Holt wrote: > I half way remember some tRNAscan bugs being fixed in several of the sub > versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I > believe and most 2.31 updates were related to tRNAscan). Current version > is 2.31.6. Could you give it a try and see if it is still giving you the > issue. > > I did a quick look through the archives and I think this was found and > fixed --> > https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ > > Thanks, > Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:36 AM > To: Carson Holt > Cc: Maker Mailing List > Subject: Re: [maker-devel] tRNAscan GFF3 > > This annotation was generated using Maker 2.31.3. > > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: > >> It should be on the same strand. Which MAKER version are you using? >> >> --Carson >> >> >> From: Daniel Standage >> Date: Thursday, August 21, 2014 at 9:33 AM >> To: Maker Mailing List >> Subject: [maker-devel] tRNAscan GFF3 >> >> Greetings! >> >> I have a quick question about Maker's handling of tRNAscan output, >> particularly tRNAs containing introns. If I haven't missed something, it >> looks like Maker reports the second exon on the opposite strand as the >> first exon, the tRNA feature, and the gene feature? Am I reading this >> correctly? >> >> I don't think this representation makes sense. The second exon is >> complementary to the first (hence the folding), but it is not encoded on or >> transcribed from the opposite strand. Unless I've misunderstood something, >> I would suggest that the correct representation would be to have all >> features on the same strand. >> >> Thanks, >> Daniel >> >> -- >> Daniel S. Standage >> Ph.D. 
Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > From carsonhh at gmail.com Tue Aug 26 10:06:26 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 Aug 2014 10:06:26 -0600 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: Thanks. --Carson From: Daniel Standage Date: Tuesday, August 26, 2014 at 9:55 AM To: Carson Holt Cc: Maker Mailing List Subject: Re: [maker-devel] tRNAscan GFF3 Sorry for the delayed response. In the mean time, I wrote a tiny script to correct the erroneous tRNA annotations. I just now took a few minutes to download 2.31.6, and can confirm that the tRNA exon strands are consistent. Best, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:49 AM, Carson Holt wrote: > I half way remember some tRNAscan bugs being fixed in several of the sub > versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I believe > and most 2.31 updates were related to tRNAscan). Current version is 2.31.6. > Could you give it a try and see if it is still giving you the issue. > > I did a quick look through the archives and I think this was found and fixed > --> > https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ > > Thanks, > Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:36 AM > To: Carson Holt > Cc: Maker Mailing List > Subject: Re: [maker-devel] tRNAscan GFF3 > > This annotation was generated using Maker 2.31.3. > > > -- > Daniel S. Standage > Ph.D.
Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: >> It should be on the same strand. Which MAKER version are you using? >> >> --Carson >> >> >> From: Daniel Standage >> Date: Thursday, August 21, 2014 at 9:33 AM >> To: Maker Mailing List >> Subject: [maker-devel] tRNAscan GFF3 >> >> Greetings! >> >> I have a quick question about Maker's handling of tRNAscan output, >> particularly tRNAs containing introns. If I haven't missed something, it >> looks like Maker reports the second exon on the opposite strand as the first >> exon, the tRNA feature, and the gene feature? Am I reading this correctly? >> >> I don't think this representation makes sense. The second exon is >> complementary to the first (hence the folding), but it is not encoded on or >> transcribed from the opposite strand. Unless I've misunderstood something, I >> would suggest that the correct representation would be to have all features >> on the same strand. >> >> Thanks, >> Daniel >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From Hossein.Borhan at AGR.GC.CA Wed Aug 27 09:52:54 2014 From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein) Date: Wed, 27 Aug 2014 15:52:54 +0000 Subject: [maker-devel] non-redundant fasta and gff Message-ID: Hi, Is there a way to produce a fasta file and gff for a set of non-redundant genes predicted by the Maker software? Fasta-merge and gff-merge generate a file that has different predictions (e.g. generated by Augustus, GeneMark, etc.) for the same gene as individual genes.
Regards Hossein From carsonhh at gmail.com Wed Aug 27 09:57:10 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 27 Aug 2014 09:57:10 -0600 Subject: [maker-devel] non-redundant fasta and gff Message-ID: The fasta files created for augustus, snap, etc. are only for reference purposes. They are the raw ab initio predictions produced by these algorithms run by themselves (they are match/match_part features in the GFF3 file). The files you want are maker.transcripts.fasta and maker.proteins.fasta. They contain the non-redundant final annotations. They are the same ones that are marked as gene/mRNA/exon/CDS features in the GFF3 file. --Carson On 8/27/14, 9:52 AM, "Borhan, Hossein" wrote: >Hi > > >Is there a way to produce a fasta file and gff for a set of non-redundant >genes predicted by the Maker software? Fasta-merge and gff-merge generate >a file that has different predictions (e.g. generated by Augustus, >GeneMark, etc.) for the same gene as individual genes. > > > >Regards > > >Hossein > > > > > > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 27 09:58:47 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 27 Aug 2014 09:58:47 -0600 Subject: [maker-devel] non-redundant fasta and gff In-Reply-To: References: Message-ID: Please see the documentation wiki for explanations of how to read and use MAKER's output. http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#MAKER.27s_Output Thanks, Carson On 8/27/14, 9:57 AM, "Carson Holt" wrote: >The fasta files created for augustus, snap, etc. are only for reference >purposes. They are the raw ab initio predictions produced by these >algorithms run by themselves (they are match/match_part features in the >GFF3 file).
The files you want are maker.transcripts.fasta and >maker.proteins.fasta. They contain the non-redundant final >annotations. They are the same ones that are marked as gene/mRNA/exon/CDS >features in the GFF3 file. > >--Carson > > >On 8/27/14, 9:52 AM, "Borhan, Hossein" wrote: > >>Hi >> >> >>Is there a way to produce a fasta file and gff for a set of non-redundant >>genes predicted by the Maker software? Fasta-merge and gff-merge generate >>a file that has different predictions (e.g. generated by Augustus, >>GeneMark, etc.) for the same gene as individual genes. >> >> >> >>Regards >> >> >>Hossein >> >> >> >> >> >> >> >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Mon Aug 4 14:27:08 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 04 Aug 2014 14:27:08 -0600 Subject: [maker-devel] Forks.pm error when running maker with dsindex In-Reply-To: References: Message-ID: Sorry for the slow reply. I was on vacation all last week. Do you have the full STDERR? Sometimes the last error is irrelevant and it's just the result of a failure further upstream. Also are you running 20 independent maker jobs simultaneously? --Carson From: Jan Philip Oeyen Date: Monday, July 28, 2014 at 6:22 AM To: Subject: [maker-devel] Forks.pm error when running maker with dsindex Hi all, we are currently having some unexpected errors when running maker on a genome which is split in several parts. Our cluster admin reported the following error message: Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodules/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm line 2188. SIGTERM received SIGTERM received SIGTERM received We were using maker with the '-g' option on a single genome which is split into 20 parts, where 19 parts are equally large and the last contains about 20 sequences more.
After that we ran Maker using dsindex to clean up the output. We are currently using maker v2.31 on 4 threads and forks v0.34. If any further info is needed to clarify the problem, please let me know and I will provide as much as possible. Thank you for your help! Best regards, Jan Philip Oeyen ZFMK // ZMB // University of Bonn _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevintsai at iis.sinica.edu.tw Tue Aug 5 04:59:45 2014 From: kevintsai at iis.sinica.edu.tw (Kevin Tsai) Date: Tue, 5 Aug 2014 18:59:45 +0800 Subject: [maker-devel] Early obstacle with SplitDB Message-ID: Hello, I'm a new user to Maker so I suspect this will be a simple question, but I am having trouble finding documentation on SplitDB. Our IT admin set up the application and I'm running into the following issue about 30 seconds after kickoff. Below is the debugged output: STATUS: Parsing control files... Calling GI::load_control_files at /usr/bin/maker line 452. Calling GI::new_instance_temp at /usr/bin/maker line 463. Calling GI::mount_check at /usr/bin/maker line 465. Calling GI::set_global_temp at /usr/bin/maker line 483. STATUS: Processing and indexing input FASTA files... Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling List::Util::shuffle at /usr/bin/maker line 529. Calling GI::split_db at /usr/bin/maker line 536. Calling File::Path::rmtree at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling Iterator::Any::nextDef at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. 
Calling mkdir at /usr/bin/maker line 537. Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. Calling system at /usr/bin/maker line 537. ERROR: SplitDB not created correctly at /usr/local/share/perl5/GI.pm line 1144. GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at /usr/bin/maker line 537 --> rank=NA, hostname=Za2.cglab Any suggestions? Thank you in advance! -- *Kevin Tsai* www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica From carsonhh at gmail.com Tue Aug 5 14:21:51 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 05 Aug 2014 14:21:51 -0600 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> Message-ID: Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)? --Carson From: Marc Höppner Date: Wednesday, July 30, 2014 at 4:44 AM To: Subject: [maker-devel] Maker GFF output with features of 0 length Hi, I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example: scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning over every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with syncing data?
Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still... I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. Regards, Marc _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Aug 5 14:26:51 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 05 Aug 2014 14:26:51 -0600 Subject: [maker-devel] Early obstacle with SplitDB In-Reply-To: References: Message-ID: Either you specified TMP= in your maker_opts.ctl file to be an NFS-mounted directory (it must be locally mounted), the drive containing the directory specified by TMP= (defaults to /tmp) is full or nearly full, your input file is not in proper fasta format, or you are using an out-of-date version of BioPerl. Try the first three in the list, then look at BioPerl. The BioPerl version should be printed as part of the debug output. --Carson From: Kevin Tsai Date: Tuesday, August 5, 2014 at 4:59 AM To: Subject: [maker-devel] Early obstacle with SplitDB Hello, I'm a new user to Maker so I suspect this will be a simple question, but I am having trouble finding documentation on SplitDB. Our IT admin set up the application and I'm running into the following issue about 30 seconds after kickoff. Below is the debugged output: STATUS: Parsing control files... Calling GI::load_control_files at /usr/bin/maker line 452. Calling GI::new_instance_temp at /usr/bin/maker line 463. Calling GI::mount_check at /usr/bin/maker line 465.
Calling GI::set_global_temp at /usr/bin/maker line 483. STATUS: Processing and indexing input FASTA files... Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling List::Util::shuffle at /usr/bin/maker line 529. Calling GI::split_db at /usr/bin/maker line 536. Calling File::Path::rmtree at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling Iterator::Any::nextDef at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling mkdir at /usr/bin/maker line 537. Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. Calling system at /usr/bin/maker line 537. ERROR: SplitDB not created correctly at /usr/local/share/perl5/GI.pm line 1144. GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at /usr/bin/maker line 537 --> rank=NA, hostname=Za2.cglab Any suggestions? Thank you in advance! -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Aug 5 14:49:33 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 05 Aug 2014 14:49:33 -0600 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> Message-ID: One more thing. 
From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this for example. Do you have non-CDS feature types where this happens, or any internal CDS's where this happens? --Carson From: Carson Holt Date: Tuesday, August 5, 2014 at 2:21 PM To: Marc Höppner , Subject: Re: [maker-devel] Maker GFF output with features of 0 length Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)? --Carson From: Marc Höppner Date: Wednesday, July 30, 2014 at 4:44 AM To: Subject: [maker-devel] Maker GFF output with features of 0 length Hi, I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example: scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning over every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with syncing data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still... I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference.
Regards, Marc _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 6 01:03:26 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 06 Aug 2014 01:03:26 -0600 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> Message-ID: If it is happening only with GFF3 pass-through, then it may be something I saw and fixed a while ago (there were some GFF3 pass-through fixes since 2.31.4). Could you check and see if it still happens in 2.31.6? Also, if it is only the first or last CDS/exon, then Augustus can do that and it's not actually a bug. Basically it is truncating the model to the start/stop codon so the first or last exon/CDS may appear short, but it's really just incomplete. If you can find any example of a non-CDS/exon feature then could you send it to me? Thanks, Carson From: Marc Höppner Date: Wednesday, July 30, 2014 at 4:44 AM To: Subject: [maker-devel] Maker GFF output with features of 0 length Hi, I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example: scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning over every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with syncing data?
Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still... I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. Regards, Marc _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carson.holt at genetics.utah.edu Wed Aug 6 01:15:04 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 6 Aug 2014 07:15:04 +0000 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com> References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com> Message-ID: Ok. I took a look and I'm relatively sure the issue you are seeing is caused by GFF3 pass-through combined with correct_est_fusion=1. This is something that only happens when both are used simultaneously and should be corrected in the current version of MAKER. Thanks, Carson From: Marc Höppner Date: Wednesday, August 6, 2014 at 12:14 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Maker GFF output with features of 0 length Hi, I suspect that Augustus plays a role, since the affected features are seeded by augustus (based on the name anyway). What I found was that this seems to only happen when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). And this seems to be quite reproducible - and doesn't only affect CDS features.
I have put the Maker output for a test scaffold here: https://dl.dropboxusercontent.com/u/1918141/maker_output.tar.bz2 The problematic lines: scaffold_563 maker five_prime_UTR 38501 38501 . - . ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1 scaffold_563 maker exon 69967 69967 . - . ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1 scaffold_563 maker CDS 69967 69967 . - 1 ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1 Strange stuff? Regards, Marc On 05 Aug 2014, at 22:49, Carson Holt wrote: One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this for example. Do you have non-CDS feature types where this happens, or any internal CDS's where this happens? --Carson From: Carson Holt Date: Tuesday, August 5, 2014 at 2:21 PM To: Marc Höppner , Subject: Re: [maker-devel] Maker GFF output with features of 0 length Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)? --Carson From: Marc Höppner Date: Wednesday, July 30, 2014 at 4:44 AM To: Subject: [maker-devel] Maker GFF output with features of 0 length Hi, I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. For example: scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies.
Before I start turning over every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with syncing data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head. Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still... I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. Regards, Marc _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From j.wilbrandt at zfmk.de Wed Aug 6 06:40:19 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 06 Aug 2014 14:40:19 +0200 Subject: [maker-devel] Further split genome questions Message-ID: Hi Carson, I ran into more conspicuous behavior running maker 2.31 on a genome which is split into 20 parts, using the -g flag and the same basename. Most of the jobs ran simultaneously on the same node; 17 seemed to finish normally, while the remaining three seemed to be stalled and produced 0B of output. Do you have any suggestion why this is happening? After I stopped these stalled jobs, I checked the index.log and found that of 38.384 mentioned scaffolds, 154 appear only once in the log. The surprise is that 2/3 of these only appear as FINISHED (the rest only started). There are no models for these 'finished' scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., each of the 20 jobs contained scaffolds that 'did not start' but 'finished'). Should this be an issue of concern?
It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so we suspect something fishy going on... Hope you can help, best wishes, Jeanne Wilbrandt zmb // ZFMK // University of Bonn From carsonhh at gmail.com Wed Aug 6 08:16:52 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 6 Aug 2014 08:16:52 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: <780B8D9B-94FB-4282-9611-632C7CB532DC@gmail.com> If you are starting and restarting, or running multiple jobs then the log can be partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 result file for the contig, then it is FINISHED. FASTA files will only exist for the contigs that have gene models. Small contigs will rarely contain models. --Carson Sent from my iPhone > On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: > > > Hi Carson, > > I ran into more conspicuous behavior running maker 2.31 on a genome which is split into > 20 parts, using the -g flag and the same basename. > Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while > the remaining three seemed to be stalled and produced 0B of output. Do you have any > suggestion why this is happening? > > After I stopped these stalled jobs, I checked the index.log and found that of 38.384 > mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these > only appear as FINISHED (the rest only started). There are no models for these 'finished' > scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., > each of the 20 jobs contained scaffolds that 'did not start' but 'finished') > Should this be an issue of concern? > It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so > we suspect something fishy going on... 
> > Hope you can help, > best wishes, > Jeanne Wilbrandt > > zmb // ZFMK // University of Bonn > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Aug 6 08:18:28 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 6 Aug 2014 14:18:28 +0000 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: <736D63C9-1393-4FFB-8553-262454C44BC1@genetics.utah.edu> Hi Jeanne, what's the average length of those 154 scaffolds that only appeared once in the log? Is the length pretty consistent among those scaffolds? ~Daniel On Aug 6, 2014, at 6:40 AM, Jeanne Wilbrandt wrote: > > Hi Carson, > > I ran into more conspicuous behavior running maker 2.31 on a genome which is split into > 20 parts, using the -g flag and the same basename. > Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while > the remaining three seemed to be stalled and produced 0B of output. Do you have any > suggestion why this is happening? > > After I stopped these stalled jobs, I checked the index.log and found that of 38.384 > mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these > only appear as FINISHED (the rest only started). There are no models for these 'finished' > scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., > each of the 20 jobs contained scaffolds that 'did not start' but 'finished') > Should this be an issue of concern? > It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so > we suspect something fishy going on...
> > Hope you can help, > best wishes, > Jeanne Wilbrandt > > zmb // ZFMK // University of Bonn > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From j.wilbrandt at zfmk.de Wed Aug 6 09:01:02 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 06 Aug 2014 17:01:02 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: aha, so this explains that. Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly half of the sequences being shorter than 3,000 bp. What do you think about this weird 'I am running but not really doing anything'-behavior? Thanks a lot! Jeanne On Wed, 6 Aug 2014 14:16:52 +0000 Carson Holt wrote: >If you are starting and restarting, or running multiple jobs then the log can be >partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 >result file for the contig, then it is FINISHED. FASTA files will only exist for the >contigs that have gene models. Small contigs will rarely contain models. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >> >> >> Hi Carson, >> >> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >into >> 20 parts, using the -g flag and the same basename. >> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >while >> the remaining three seemed to be stalled and produced 0B of output. Do you have any >> suggestion why this is happening? >> >> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >these >> only appear as FINISHED (the rest only started). 
There are no models for these >'finished' >> scaffolds stored in the .db and they are distributed over all parts of the genome >(i.e., >> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >> Should this be an issue of concern? >> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, >so >> we suspect something fishy going on... >> >> Hope you can help, >> best wishes, >> Jeanne Wilbrandt >> >> zmb // ZFMK // University of Bonn >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 6 09:12:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 6 Aug 2014 09:12:50 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: <5C8B509A-7093-4626-92CE-6D09B570887C@gmail.com> I think the freezing is because you are starting too many simultaneous jobs. You should try and use MPI to parallelize instead. The concurrent job way of doing things can start to cause problems If you are running 10 or more jobs in the same directory. You could try splitting them into different directories. --Carson Sent from my iPhone > On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: > > > aha, so this explains that. > Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly > half of the sequences being shorter than 3,000 bp. > > What do you think about this weird 'I am running but not really doing anything'-behavior? > > > Thanks a lot! > Jeanne > > > > On Wed, 6 Aug 2014 14:16:52 +0000 > Carson Holt wrote: >> If you are starting and restarting, or running multiple jobs then the log can be >> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a GFF3 >> result file for the contig, then it is FINISHED. 
FASTA files will only exist for the >> contigs that have gene models. Small contigs will rarely contain models. >> >> --Carson >> >> Sent from my iPhone >> >>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>> >>> >>> Hi Carson, >>> >>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >> into >>> 20 parts, using the -g flag and the same basename. >>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >> while >>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>> suggestion why this is happening? >>> >>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >> these >>> only appear as FINISHED (the rest only started). There are no models for these >> 'finished' >>> scaffolds stored in the .db and they are distributed over all parts of the genome >> (i.e., >>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>> Should this be an issue of concern? >>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, >> so >>> we suspect something fishy going on... 
>>> >>> Hope you can help, >>> best wishes, >>> Jeanne Wilbrandt >>> >>> zmb // ZFMK // University of Bonn >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From j.wilbrandt at zfmk.de Wed Aug 6 09:33:07 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 06 Aug 2014 17:33:07 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin reports, however, that the processes seem to assemble more threads than they are allowed. It is not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why? If I start the jobs in the same directory, how can I make sure they write to the same directory (as, I think, is required to put the pieces together in the end)? Does -basename take paths? On Wed, 6 Aug 2014 15:12:50 +0000 Carson Holt wrote: >I think the freezing is because you are starting too many simultaneous jobs. You should >try and use MPI to parallelize instead. The concurrent job way of doing things can >start to cause problems If you are running 10 or more jobs in the same directory. You >could try splitting them into different directories. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >> >> >> aha, so this explains that. >> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly >> half of the sequences being shorter than 3,000 bp. >> >> What do you think about this weird 'I am running but not really doing >anything'-behavior? >> >> >> Thanks a lot!
>> Jeanne >> >> >> >> On Wed, 6 Aug 2014 14:16:52 +0000 >> Carson Holt wrote: >>> If you are starting and restarting, or running multiple jobs then the log can be >>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a >GFF3 >>> result file for the contig, then it is FINISHED. FASTA files will only exist for the >>> contigs that have gene models. Small contigs will rarely contain models. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> Hi Carson, >>>> >>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>> into >>>> 20 parts, using the -g flag and the same basename. >>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >>> while >>>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>>> suggestion why this is happening? >>>> >>>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>> these >>>> only appear as FINISHED (the rest only started). There are no models for these >>> 'finished' >>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>> (i.e., >>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>> Should this be an issue of concern? >>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >good, >>> so >>>> we suspect something fishy going on... 
>>>> >>>> Hope you can help, >>>> best wishes, >>>> Jeanne Wilbrandt >>>> >>>> zmb // ZFMK // University of Bonn >>>> >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> From carsonhh at gmail.com Wed Aug 6 09:45:56 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 6 Aug 2014 09:45:56 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: <28DF9A41-8E59-4104-87A6-CD7CD9F436D8@gmail.com> Is your admin counting processes or CPU usage? Each system call creates a separate process, so you can expect multiple processes per instance but only a single CPU of usage. Use different directories if you are running that many jobs. You can concatenate the separate results when you're done. Use the gff3_merge script to help concatenate the separate GFF3 files generated from separate jobs. --Carson Sent from my iPhone > On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote: > > > > We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin reports > however, that the processes seem to assemble more threads than they are allowed. It is > not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why? > > If I start the jobs in the same directory, how can I make sure they write to the same > directory (as, I think is required to put the pieces together in the end?)? Does -basename > take paths? > > > On Wed, 6 Aug 2014 15:12:50 +0000 > Carson Holt wrote: >> I think the freezing is because you are starting too many simultaneous jobs. You should >> try and use MPI to parallelize instead. The concurrent job way of doing things can >> start to cause problems If you are running 10 or more jobs in the same directory.
You >> could try splitting them into different directories. >> >> --Carson >> >> Sent from my iPhone >> >>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>> >>> >>> aha, so this explains that. >>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, roughly >>> half of the sequences being shorter than 3,000 bp. >>> >>> What do you think about this weird 'I am running but not really doing >> anything'-behavior? >>> >>> >>> Thanks a lot! >>> Jeanne >>> >>> >>> >>> On Wed, 6 Aug 2014 14:16:52 +0000 >>> Carson Holt wrote: >>>> If you are starting and restarting, or running multiple jobs then the log can be >>>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a >> GFF3 >>>> result file for the contig, then it is FINISHED. FASTA files will only exist for the >>>> contigs that have gene models. Small contigs will rarely contain models. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>> >>>>> >>>>> Hi Carson, >>>>> >>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>> into >>>>> 20 parts, using the -g flag and the same basename. >>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, >>>> while >>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have any >>>>> suggestion why this is happening? >>>>> >>>>> After I stopped these stalled jobs, I checked the index.log and found that of 38.384 >>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>> these >>>>> only appear as FINISHED (the rest only started). There are no models for these >>>> 'finished' >>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>> (i.e., >>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>> Should this be an issue of concern? 
>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >> good, >>>> so >>>>> we suspect something fishy going on... >>>>> >>>>> Hope you can help, >>>>> best wishes, >>>>> Jeanne Wilbrandt >>>>> >>>>> zmb // ZFMK // University of Bonn >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carson.holt at genetics.utah.edu Wed Aug 6 11:18:22 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Wed, 6 Aug 2014 17:18:22 +0000 Subject: [maker-devel] Forks.pm error when running maker with dsindex In-Reply-To: References: Message-ID: It's better to run fewer jobs with more cpus given to MPI rather than many jobs with few cpus (i.e. mpiexec -n 4). To correct errors, you just restart MAKER. No need to set the -a flag unless you want to rerun everything, and not just the failed contigs. --Carson On 8/6/14, 3:03 AM, "Jeanne Wilbrandt" wrote: > >Hi! > >Yes, we are running 20 jobs simultaneously, almost, i.e., as much as our >cluster can >take. Do you think this is too much? > >Please find attached the output file (containing the STDERR) of the >dsindex-run, and one >example output of one of the pieces. > >Another quick question to make sure I understood the guides correctly: If >a job did not >finish properly, it should suffice to restart the same thing just with >the -a flag and it >should clean up and finish what it was supposed to, right? (i.e., it's >not necessary to >trace and delete the unfinished output manually?) > >Thank you again! >Jeanne Wilbrandt > >zmb // ZFMK // University of Bonn > > > >On 08/05/2014 08:00 PM, maker-devel-request at yandell-lab.org wrote: >> >> >> 1. 
Re: Forks.pm error when running maker with dsindex (Carson Holt) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Mon, 04 Aug 2014 14:27:08 -0600 >> From: Carson Holt >> To: Jan Philip Oeyen , >> >> Subject: Re: [maker-devel] Forks.pm error when running maker with >> dsindex >> Message-ID: >> Content-Type: text/plain; charset="utf-8" >> >> Sorry for the slow reply. I was on vacation all last week. Do you >>have the >> full STDERR? sometimes the last error is irrelevant and it's just the >>result >> of a failure further upstream. Also are you running 20 independent maker >> jobs simultaneously? >> >> --Carson >> >> >> From: Jan Philip Oeyen >> Date: Monday, July 28, 2014 at 6:22 AM >> To: >> Subject: [maker-devel] Forks.pm error when running maker with dsindex >> >> Hi all, >> we are currently having some unexpected errors when running maker on a >> genome which is split in several parts. Our cluster admin reported the >> following error message: >> >> Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodu >> les/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm >> line 2188. >> SIGTERM received >> SIGTERM received >> SIGTERM received >> >> We were using maker with the '-g' option on a single genome which is >>split >> into 20 parts, where 19 parts are equally large and the last contains >>about >> 20 sequences more. After that we ran Maker using dsindex to clean up the >> output. We are currently using maker v2.31 on 4 threads and forks v0.34. >> >> If any further info is needed to clarify the problem, please let me >>know and >> I will provide as much as possible. >> >> Thank you for your help! 
>> >> Best regards, >> Jan Philip Oeyen >> ZFMK // ZMB // University of Bonn >> From mphoeppner at gmail.com Wed Aug 6 00:14:23 2014 From: mphoeppner at gmail.com (Marc Höppner) Date: Wed, 6 Aug 2014 08:14:23 +0200 Subject: [maker-devel] Maker GFF output with features of 0 length In-Reply-To: References: <5C45F418-018B-4ACC-B682-E5659DB7F102@gmail.com> Message-ID: <7D68D5F6-718A-4B7F-8940-59DBA64FFBBD@gmail.com> Hi, I suspect that Augustus plays a role, since the affected features are seeded by augustus (based on the name anyway). What I found was that this seems to only happen when using pre-aligned (i.e. GFF3-formatted) cdna2genome and protein2genome evidence (created by Maker in a previous run). And this seems to be quite reproducible - and doesn't only affect CDS features. I have put the Maker output for a test scaffold here: https://dl.dropboxusercontent.com/u/1918141/maker_output.tar.bz2 The problematic lines:
scaffold_563 maker five_prime_UTR 38501 38501 . - . ID=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1:five_prime_utr;Parent=augustus_masked-scaffold_563-processed-gene-0.14-mRNA-1
scaffold_563 maker exon 69967 69967 . - . ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1
scaffold_563 maker CDS 69967 69967 . - 1 ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:cds;Parent=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1
Strange stuff? Regards, Marc On 05 Aug 2014, at 22:49, Carson Holt wrote: > One more thing. From the example you gave, it is important to note that the terminal CDS (first or last) can be a single base pair in length (start and end will be the same value). Augustus sometimes does this, for example. Do you have non-CDS feature types where this happens, or any internal CDS's where this happens?
> > --Carson > > > From: Carson Holt > Date: Tuesday, August 5, 2014 at 2:21 PM > To: Marc Höppner , > Subject: Re: [maker-devel] Maker GFF output with features of 0 length > > Were you using GFF3 pass-through or correct_est_fusion options? When you rerun do the same features still have lengths of zero (i.e. is it random or is it reproducible)? > > --Carson > > > From: Marc Höppner > Date: Wednesday, July 30, 2014 at 4:44 AM > To: > Subject: [maker-devel] Maker GFF output with features of 0 length > > Hi, > > I've - more by accident - found that many of the gene builds I have generated with Maker (2.31.3) contain features with identical start and stop positions. > > For example: > > scaffold_2927 maker CDS 13013 13013 . + 1 ID=maker-scaffold_2927-augustus-gene-0.8-mRNA-1:cds;Parent=maker-scaffold_2927-augustus-gene-0.8-mRNA-1 > > > This occurs seemingly randomly for all sorts of feature types and I have only seen this when running Maker on full assemblies. Before I start turning over every stone, any ideas about possible explanations for this phenomenon? Is this likely some MPI-related communication issue, or NFS problems with syncing data? Maker runs fine on our system, but that doesn't mean that there aren't any cryptic issues that only on these occasions rear their head? Regarding the frequency, out of 450.000 GFF lines, 270 were affected in the case that I looked into the most. So it is pretty rare, but still... > > I am currently using Maker with openmpi-1.7.4 and the file system is mounted over NFS4 and IPoIB. I now switched to Maker 2.31.6, but have no strong reason to suspect that this will make a difference. > > Regards, > > Marc > > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
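A quick way to see how many features in a merged GFF3 file have identical start and end coordinates, as in the examples quoted in this thread, is to scan the nine-column records directly. The following is only a minimal sketch in Python and is not part of MAKER: the function name `single_base_features` and the second sample record are illustrative assumptions, and the sample records are written with tab separators as the GFF3 format requires.

```python
def single_base_features(gff_lines):
    """Return (seqid, type, position, ID) for every feature with start == end."""
    hits = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # annotation section is over; only sequence follows
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other pragmas
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end = cols[:5]
        if int(start) == int(end):
            # Column 9 holds semicolon-separated key=value attributes.
            attrs = dict(
                field.split("=", 1) for field in cols[8].split(";") if "=" in field
            )
            hits.append((seqid, ftype, int(start), attrs.get("ID", "")))
    return hits


# First record is taken from the thread; the second is a made-up normal exon.
SAMPLE = [
    "##gff-version 3\n",
    "scaffold_563\tmaker\texon\t69967\t69967\t.\t-\t.\t"
    "ID=augustus_masked-scaffold_563-processed-gene-0.18-mRNA-1:exon:148\n",
    "scaffold_563\tmaker\texon\t70100\t70250\t.\t-\t.\tID=hypothetical:exon:1\n",
]

for hit in single_base_features(SAMPLE):
    print(*hit, sep="\t")
```

Grouping the hits by feature type would show whether only terminal CDS features are affected, which is the case Carson describes as expected, or whether exons and UTRs are involved as well.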
URL: From j.wilbrandt at zfmk.de Wed Aug 6 03:03:28 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 06 Aug 2014 11:03:28 +0200 Subject: [maker-devel] Forks.pm error when running maker with dsindex Message-ID: Hi! Yes, we are running 20 jobs simultaneously, almost, i.e., as much as our cluster can take. Do you think this is too much? Please find attached the output file (containing the STDERR) of the dsindex-run, and one example output of one of the pieces. Another quick question to make sure I understood the guides correctly: If a job did not finish properly, it should suffice to restart the same thing just with the -a flag and it should clean up and finish what it was supposed to, right? (i.e., it's not necessary to trace and delete the unfinished output manually?) Thank you again! Jeanne Wilbrandt zmb // ZFMK // University of Bonn On 08/05/2014 08:00 PM, maker-devel-request at yandell-lab.org wrote: > > > 1. Re: Forks.pm error when running maker with dsindex (Carson Holt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 04 Aug 2014 14:27:08 -0600 > From: Carson Holt > To: Jan Philip Oeyen , > > Subject: Re: [maker-devel] Forks.pm error when running maker with > dsindex > Message-ID: > Content-Type: text/plain; charset="utf-8" > > Sorry for the slow reply. I was on vacation all last week. Do you have the > full STDERR? sometimes the last error is irrelevant and it's just the result > of a failure further upstream. Also are you running 20 independent maker > jobs simultaneously? > > --Carson > > > From: Jan Philip Oeyen > Date: Monday, July 28, 2014 at 6:22 AM > To: > Subject: [maker-devel] Forks.pm error when running maker with dsindex > > Hi all, > we are currently having some unexpected errors when running maker on a > genome which is split in several parts. 
Our cluster admin reported the > following error message: > > Argument "ALRM" isn't numeric in exit at /share/scientific_bin/perlmodu > les/lib/site_perl/5.14.2/x86_64-linux-thread-multi/forks.pm > line 2188. > SIGTERM received > SIGTERM received > SIGTERM received > > We were using maker with the '-g' option on a single genome which is split > into 20 parts, where 19 parts are equally large and the last contains about > 20 sequences more. After that we ran Maker using dsindex to clean up the > output. We are currently using maker v2.31 on 4 threads and forks v0.34. > > If any further info is needed to clarify the problem, please let me know and > I will provide as much as possible. > > Thank you for your help! > > Best regards, > Jan Philip Oeyen > ZFMK // ZMB // University of Bonn > -------------- next part -------------- A non-text attachment was scrubbed... Name: split_index.o2510 Type: application/octet-stream Size: 1641 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_04.o2490 Type: application/octet-stream Size: 8883704 bytes Desc: not available URL: From dandence at gmail.com Wed Aug 6 07:50:43 2014 From: dandence at gmail.com (Daniel Ence) Date: Wed, 6 Aug 2014 07:50:43 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: Message-ID: Hi Jeanne, what's the average length of those 154 scaffolds that only appeared once in the log? Is the length pretty consistent? ~Daniel On Aug 6, 2014, at 6:40 AM, Jeanne Wilbrandt wrote: > > Hi Carson, > > I ran into more conspicuous behavior running maker 2.31 on a genome which is split into > 20 parts, using the -g flag and the same basename. > Most of the jobs ran simultaneously on the same node, 17 seemed to finish normally, while > the remaining three seemed to be stalled and produced 0B of output. Do you have any > suggestion why this is happening?
> > After I stopped these stalled jobs, I checked the index.log and found that of 38.384 > mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of these > only appear as FINISHED (the rest only started). There are no models for these 'finished' > scaffolds stored in the .db and they are distributed over all parts of the genome (i.e., > each of the 20 jobs contained scaffolds that 'did not start' but 'finished') > Should this be an issue of concern? > It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look good, so > we suspect something fishy going on... > > Hope you can help, > best wishes, > Jeanne Wilbrandt > > zmb // ZFMK // University of Bonn > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Aug 11 10:11:28 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 11 Aug 2014 10:11:28 -0600 Subject: [maker-devel] Early obstacle with SplitDB In-Reply-To: References: Message-ID: If you are updating every month to BioPerl live, don't. You should use the CPAN version of BioPerl or even the stable download. BioPerl live has broken several components MAKER uses at different times and, depending on which version you currently have, may be broken now. Could you send me the Bio::Root::Version line from the initial debug output? Also could you send me this file --> /home/keceltes/maker2/final.fasta The point of failure is actually very simple. At that point in the code, MAKER opens a file, reads it in one line at a time, writes it out to a new file, and then indexes it with BioPerl (the BioPerl index won't work on NFS drives because it uses Berkeley DB). For that reason, whenever it fails at that point, it is either a drive space issue, NFS issue, BioPerl issue, or file format issue. Also are you running via MPI?
I ask because if you are using multiple nodes you will have to check the size of /tmp independently on each node (since the values will be different). Thanks, Carson From: Kevin Tsai Date: Monday, August 11, 2014 at 5:11 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Early obstacle with SplitDB Hi Carson, Thanks for the suggestions. I left the TMP= empty, which as you mentioned defaults to /tmp. There seems to be a different error when using an NFS mounted directory (as I manually verified). My /tmp is also not full or nearly full, and I have verified proper fasta formatting as I have run the fasta file through other statistics generating tools (e.g. Quast). We also update BioPerl monthly. Do you think it could be anything else? Is there any more information I could provide that would be helpful? On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote: > Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted > directory (must be locally mounted), the drive containing the directory specified > by TMP= (defaults to /tmp) is full or nearly full, your input file is not > proper fasta format, or you are using an out of date version of BioPerl. > > Try the first three in the list then look at BioPerl. The BioPerl version > should be printed as part of the debug output. > > --Carson > > > From: Kevin Tsai > Date: Tuesday, August 5, 2014 at 4:59 AM > To: > Subject: [maker-devel] Early obstacle with SplitDB > > Hello, > I'm a new user to Maker so I suspect this will be a simple question, but I am > having trouble finding documentation on SplitDB. Our IT admin set up the > application and I'm running into the following issue about 30 seconds after > kickoff. Below is the debugged output: > > STATUS: Parsing control files... > Calling GI::load_control_files at /usr/bin/maker line 452. > Calling GI::new_instance_temp at /usr/bin/maker line 463. > Calling GI::mount_check at /usr/bin/maker line 465.
> Calling GI::set_global_temp at /usr/bin/maker line 483. > STATUS: Processing and indexing input FASTA files... > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling GI::s_abs_path at /usr/bin/maker line 519. > Calling List::Util::shuffle at /usr/bin/maker line 529. > Calling GI::split_db at /usr/bin/maker line 536. > Calling File::Path::rmtree at /usr/bin/maker line 537. > Calling Iterator::Any::new at /usr/bin/maker line 537. > Calling Iterator::Any::nextDef at /usr/bin/maker line 537. > Calling Iterator::Any::new at /usr/bin/maker line 537. > Calling mkdir at /usr/bin/maker line 537. > Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. > Calling system at /usr/bin/maker line 537. > ERROR: SplitDB not created correctly > > at /usr/local/share/perl5/GI.pm line 1144. > GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, > "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at > /usr/bin/maker line 537 > --> rank=NA, hostname=Za2.cglab > > Any suggestions? Thank you in advance! > -- > Kevin Tsai > www.linkedin.com/in/kevinjtsai/ > Ph.D. Candidate, Bioinformatics > Institute of Information Science, Academia Sinica > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica -------------- next part -------------- An HTML attachment was scrubbed... 
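Two of the causes Carson lists for the SplitDB failure, a full temporary drive and an input file that is not proper FASTA, can be roughly checked before starting MAKER. The following is a minimal sketch in Python under stated assumptions: it does not reproduce MAKER's or BioPerl's own validation, the helper names `free_gib` and `fasta_problems` are made up for illustration, and the set of allowed sequence characters (letters, `*`, `-`) is an assumption. It does not detect whether a directory is NFS mounted, and on a multi-node MPI run it would have to be executed on every node.

```python
import shutil
import string
import tempfile

# Assumed alphabet for sequence lines: letters plus stop/gap symbols.
ALLOWED = set(string.ascii_letters + "*-")


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def fasta_problems(lines):
    """Return (line_number, message) pairs for obvious FASTA format problems."""
    problems = []
    seen_header = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank lines are ignored in this sketch
        if line.startswith(">"):
            seen_header = True
            if not line[1:].strip():
                problems.append((number, "header has no identifier"))
        elif not seen_header:
            problems.append((number, "sequence data before first '>' header"))
        elif not set(line) <= ALLOWED:
            problems.append((number, "unexpected character in sequence"))
    return problems


# Usage: report free space for the default temp directory and check a tiny
# in-memory example (a real run would pass an open file handle instead).
tmp_dir = tempfile.gettempdir()
print(f"free space under {tmp_dir}: {free_gib(tmp_dir):.1f} GiB")
print(fasta_problems([">scaffold_1\n", "ACGTNNacgt\n", "AC GT\n"]))
```

A clean result here does not rule out the BioPerl version issue discussed above; it only narrows the list.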
URL: From a.priyam at qmul.ac.uk Wed Aug 13 03:30:39 2014 From: a.priyam at qmul.ac.uk (Anurag Priyam) Date: Wed, 13 Aug 2014 15:00:39 +0530 Subject: [maker-devel] does MAKER modify input FASTA Message-ID: Is it possible that the input FASTA file (containing the genome that is being annotated) and the FASTA sequences in the output GFF file (containing the resulting annotations + the genome) be different? -> It's fine if the ordering of the scaffolds, or width (for pretty formatting) are different. -> But, will MAKER add 'NNN' or change the case to indicate masking? It doesn't seem so to me, but I have only one test set, so can't be sure. -> Is it possible to get the masked genome out from MAKER? -- Priyam From j.wilbrandt at zfmk.de Wed Aug 13 03:32:38 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Wed, 13 Aug 2014 11:32:38 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: Our admin counts processes. Do I understand you right, that one CPU handles several processes? I'm still confused by the different directories (and I made a mistake when asking last time, I wanted to say 'If I do NOT start the jobs in the same directory...'). So, if I start each piece of a genome in its own directory (for example), then it gets a unique basename (because the output will be separate from all other pieces anyway) and I will not run dsindex but instead use gff3_merge for each piece's output and then once again to merge all resulting gff3-files? Hope I got you right :) Thanks for your help! Jeanne On Wed, 6 Aug 2014 15:45:56 +0000 Carson Holt wrote: >Is your admin counting processes or cpu usage?
Because each system call creates a >separate process, so you can expect multiple processes (each system call generates a new >process) but only a single cpu of usage per instance. Use different directories if you >are running that many jobs. You can concatenate the separate results when your done. > Use gff3_merge script to help concatenate the separate GFF3 files generated from >separate jobs. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote: >> >> >> >> We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin >reports >> however, that the processes seem to assemble more threads than they are allowed. It is >> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why? >> >> If I start the jobs in the same directory, how can I make sure they write to the same >> directory (as, I think is required to put the pieces together in the end?)? das >-basename >> take paths? >> >> >> On Wed, 6 Aug 2014 15:12:50 +0000 >> Carson Holt wrote: >>> I think the freezing is because you are starting too many simultaneous jobs. You >should >>> try and use MPI to parallelize instead. The concurrent job way of doing things can >>> start to cause problems If you are running 10 or more jobs in the same directory. You >>> could try splitting them into different directories. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> aha, so this explains that. >>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, >roughly >>>> half of the sequences being shorter than 3,000 bp. >>>> >>>> What do you think about this weird 'I am running but not really doing >>> anything'-behavior? >>>> >>>> >>>> Thanks a lot! >>>> Jeanne >>>> >>>> >>>> >>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>> Carson Holt wrote: >>>>> If you are starting and restarting, or running multiple jobs then the log can be >>>>> partially rebuilt. 
On rebuild only the FINISHED entries are added. If there is a >>> GFF3 >>>>> result file for the contig, then it is FINISHED. FASTA files will only exist for >the >>>>> contigs that have gene models. Small contigs will rarely contain models. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>>> >>>>>> >>>>>> Hi Carson, >>>>>> >>>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>>> into >>>>>> 20 parts, using the -g flag and the same basename. >>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish >normally, >>>>> while >>>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have >any >>>>>> suggestion why this is happening? >>>>>> >>>>>> After I stopped these stalled jobs, I checked the index.log and found that of >38.384 >>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>>> these >>>>>> only appear as FINISHED (the rest only started). There are no models for these >>>>> 'finished' >>>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>>> (i.e., >>>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>>> Should this be an issue of concern? >>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >>> good, >>>>> so >>>>>> we suspect something fishy going on... 
>>>>>> >>>>>> Hope you can help, >>>>>> best wishes, >>>>>> Jeanne Wilbrandt >>>>>> >>>>>> zmb // ZFMK // University of Bonn >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com >>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> From dence at genetics.utah.edu Wed Aug 13 09:29:41 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 13 Aug 2014 15:29:41 +0000 Subject: [maker-devel] does MAKER modify input FASTA In-Reply-To: References: Message-ID: Hi Priyam, After MAKER has completed its run and you've merged the results with gff3_merge, you can see the original fasta genome in the resulting gff3 file, below the ##FASTA pragma. For each scaffold in your genome, the masked fasta can be found in its individual directory in the master_datastore that MAKER created to keep track of results. I'm pretty sure this will only be 'soft-masked' (lower-case letters) and not hard-masked ('N' characters). Let me know whether this helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Anurag Priyam [a.priyam at qmul.ac.uk] Sent: Wednesday, August 13, 2014 3:30 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] does MAKER modify input FASTA Is it possible that the input FASTA file (containing the genome that is being annotated) and the FASTA sequences in the output GFF file (containing the resulting annotations + the genome) be different? -> It's fine if the ordering of the scaffolds, or width (for pretty formatting) are different. -> But, will MAKER add 'NNN' or change the case to indicate masking? It doesn't seem so to me, but I have only one test set, so can't be sure.
-> Is it possible to get masked genome out from MAKER? -- Priyam _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 13 09:46:27 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Aug 2014 09:46:27 -0600 Subject: [maker-devel] does MAKER modify input FASTA In-Reply-To: References: Message-ID: The output fasta will be letter for letter identical to the input fasta and will be all uppercase. Only if your input fasta contains unrecognized characters (for example 'Y' in the middle of the nucleotide sequence) and you use the --fix_nucleotides flag will those unrecognized characters be changed to 'N'. The masked fasta can be pulled out of theVoid directory if you really need it. It will be called query_masked.fasta. --Carson On 8/13/14, 3:30 AM, "Anurag Priyam" wrote: >Is it possible that the input FASTA file (containing the genome that >is being annotated) and the FASTA sequences in the output GFF file >(containing the resulting annotations + the genome) be different? > >-> It's fine if the ordering of the scaffolds, or width (for pretty >formatting) are different. >-> But, will MAKER add 'NNN' or change the case to indicate masking? >It doesn't seem so to me, but I have only one test set, so can't be >sure. >-> Is it possible to get masked genome out from MAKER? 
> >-- Priyam > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Wed Aug 13 09:46:59 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 13 Aug 2014 15:46:59 +0000 Subject: [maker-devel] Further split genome questions In-Reply-To: References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de>, Message-ID: Hi Jeanne, I believe that's right. You can pass gff3_merge either a list of gff3 files or a maker-created datastore index file. To compile the pieces for each of your different runs you would give gff3_merge the datastore index file. To put those resulting gff3 files together, you would pass gff3_merge the list of gff3 files that you want to merge. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Jeanne Wilbrandt [j.wilbrandt at zfmk.de] Sent: Wednesday, August 13, 2014 3:32 AM To: Carson Holt; Wilbrandt Jeanne Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Further split genome questions Our admin counts processes. Do I understand you right, that one CPU handles several processes? I'm still confused by the different directories (and I made a mistake when asking last time, I wanted to say 'If I do NOT start the jobs in the same directory...). So, if I start each piece of a genome in its own directory (for example), then it gets a unique basename (because the output will be separate from all other pieces anyway) and I will not run dsindex but instead use gff3_merge for each piece's output and then once again to merge all resulting gff3-files? 
Hope I got you right :) Thanks fopr your help! Jeanne On Wed, 6 Aug 2014 15:45:56 +0000 Carson Holt wrote: >Is your admin counting processes or cpu usage? Because each system call creates a >separate process, so you can expect multiple processes (each system call generates a new >process) but only a single cpu of usage per instance. Use different directories if you >are running that many jobs. You can concatenate the separate results when your done. > Use gff3_merge script to help concatenate the separate GFF3 files generated from >separate jobs. > >--Carson > >Sent from my iPhone > >> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote: >> >> >> >> We are using MPI as well, each of the 20 parts gets assigned 4 threads. Our admin >reports >> however, that the processes seem to assemble more threads than they are allowed. It is >> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a suggestion why? >> >> If I start the jobs in the same directory, how can I make sure they write to the same >> directory (as, I think is required to put the pieces together in the end?)? das >-basename >> take paths? >> >> >> On Wed, 6 Aug 2014 15:12:50 +0000 >> Carson Holt wrote: >>> I think the freezing is because you are starting too many simultaneous jobs. You >should >>> try and use MPI to parallelize instead. The concurrent job way of doing things can >>> start to cause problems If you are running 10 or more jobs in the same directory. You >>> could try splitting them into different directories. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" wrote: >>>> >>>> >>>> aha, so this explains that. >>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more than 60,000, >roughly >>>> half of the sequences being shorter than 3,000 bp. >>>> >>>> What do you think about this weird 'I am running but not really doing >>> anything'-behavior? >>>> >>>> >>>> Thanks a lot! 
>>>> Jeanne >>>> >>>> >>>> >>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>> Carson Holt wrote: >>>>> If you are starting and restarting, or running multiple jobs then the log can be >>>>> partially rebuilt. On rebuild only the FINISHED entries are added. If there is a >>> GFF3 >>>>> result file for the contig, then it is FINISHED. FASTA files will only exist for >the >>>>> contigs that have gene models. Small contigs will rarely contain models. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" wrote: >>>>>> >>>>>> >>>>>> Hi Carson, >>>>>> >>>>>> I ran into more conspicuous behavior running maker 2.31 on a genome which is split >>>>> into >>>>>> 20 parts, using the -g flag and the same basename. >>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to finish >normally, >>>>> while >>>>>> the remaining three seemed to be stalled and produced 0B of output. Do you have >any >>>>>> suggestion why this is happening? >>>>>> >>>>>> After I stopped these stalled jobs, I checked the index.log and found that of >38.384 >>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise is, that 2/3 of >>>>> these >>>>>> only appear as FINISHED (the rest only started). There are no models for these >>>>> 'finished' >>>>>> scaffolds stored in the .db and they are distributed over all parts of the genome >>>>> (i.e., >>>>>> each of the 20 jobs contained scaffolds that 'did not start' but 'finished') >>>>>> Should this be an issue of concern? >>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the NFS files look >>> good, >>>>> so >>>>>> we suspect something fishy going on... 
>>>>>> >>>>>> Hope you can help, >>>>>> best wishes, >>>>>> Jeanne Wilbrandt >>>>>> >>>>>> zmb // ZFMK // University of Bonn >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com >>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 13 09:47:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Aug 2014 09:47:15 -0600 Subject: [maker-devel] does MAKER modify input FASTA In-Reply-To: References: Message-ID: It will actually be a mixture of hard and soft masking depending on the class of repeat. --Carson On 8/13/14, 9:29 AM, "Daniel Ence" wrote: >Hi Priyam, > >After MAKER has completed it's run and you've merged the results with >gff3_merge, you can see the original fasta genome in the resulting gff3 >file, below the ##FASTA pragma. > >For each scaffold in your genome, the masked fasta can be found in it's >individual directory in the master_datastore that MAKER created to keep >track of results. I'm pretty sure this will only be 'soft-masked' >(lower-case letters) and not hard-masked ('N' characters). 
> >Let me know whether this helps, >Daniel > > >Daniel Ence >Graduate Student >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >________________________________________ >From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >Anurag Priyam [a.priyam at qmul.ac.uk] >Sent: Wednesday, August 13, 2014 3:30 AM >To: maker-devel at yandell-lab.org >Subject: [maker-devel] does MAKER modify input FASTA > >Is it possible that the input FASTA file (containing the genome that >is being annotated) and the FASTA sequences in the output GFF file >(containing the resulting annotations + the genome) be different? > >-> It's fine if the ordering of the scaffolds, or width (for pretty >formatting) are different. >-> But, will MAKER add 'NNN' or change the case to indicate masking? >It doesn't seem so to me, but I have only one test set, so can't be >sure. >-> Is it possible to get masked genome out from MAKER? > >-- Priyam > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Wed Aug 13 09:52:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 13 Aug 2014 09:52:34 -0600 Subject: [maker-devel] Further split genome questions In-Reply-To: References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: Yes. 
One cpu will have several processes, most are helper processes that
will use 0% CPU almost all of the time (for example there is a shared
variable manager process that will launch with MAKER but will also be
called 'maker' under top because it is technically its child and not a
separate script). Also system calls will launch a new process that will
use all CPU while the process calling it will drop to 0% CPU until it
finishes.

Yes. Your explanation is correct. You then use gff3_merge to merge the
GFF3 file.

--Carson

On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote:

>
>Our admin counts processes. Do I understand you right, that one CPU
>handles several processes?
>
>I'm still confused by the different directories (and I made a mistake
>when asking last time, I wanted to say 'If I do NOT start the jobs in
>the same directory...).
>So, if I start each piece of a genome in its own directory (for example),
>then it gets a unique basename (because the output will be separate from
>all other pieces anyway) and I will not run dsindex but instead use
>gff3_merge for each piece's output and then once again to merge all
>resulting gff3-files?
>
>Hope I got you right :)
>
>Thanks for your help!
>Jeanne
>
>
>
>On Wed, 6 Aug 2014 15:45:56 +0000
> Carson Holt wrote:
>>Is your admin counting processes or cpu usage? Because each system call
>>creates a separate process, so you can expect multiple processes (each
>>system call generates a new process) but only a single cpu of usage per
>>instance. Use different directories if you are running that many jobs.
>>You can concatenate the separate results when you're done.
>> Use gff3_merge script to help concatenate the separate GFF3 files
>>generated from separate jobs.
>>
>>--Carson
>>
>>Sent from my iPhone
>>
>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" wrote:
>>>
>>>
>>>
>>> We are using MPI as well, each of the 20 parts gets assigned 4 threads.
Our admin >>reports >>> however, that the processes seem to assemble more threads than they >>>are allowed. It is >>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>suggestion why? >>> >>> If I start the jobs in the same directory, how can I make sure they >>>write to the same >>> directory (as, I think is required to put the pieces together in the >>>end?)? das >>-basename >>> take paths? >>> >>> >>> On Wed, 6 Aug 2014 15:12:50 +0000 >>> Carson Holt wrote: >>>> I think the freezing is because you are starting too many >>>>simultaneous jobs. You >>should >>>> try and use MPI to parallelize instead. The concurrent job way of >>>>doing things can >>>> start to cause problems If you are running 10 or more jobs in the >>>>same directory. You >>>> could try splitting them into different directories. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>wrote: >>>>> >>>>> >>>>> aha, so this explains that. >>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>than 60,000, >>roughly >>>>> half of the sequences being shorter than 3,000 bp. >>>>> >>>>> What do you think about this weird 'I am running but not really doing >>>> anything'-behavior? >>>>> >>>>> >>>>> Thanks a lot! >>>>> Jeanne >>>>> >>>>> >>>>> >>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>> Carson Holt wrote: >>>>>> If you are starting and restarting, or running multiple jobs then >>>>>>the log can be >>>>>> partially rebuilt. On rebuild only the FINISHED entries are added. >>>>>> If there is a >>>> GFF3 >>>>>> result file for the contig, then it is FINISHED. FASTA files will >>>>>>only exist for >>the >>>>>> contigs that have gene models. Small contigs will rarely contain >>>>>>models. 
>>>>>> >>>>>> --Carson >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi Carson, >>>>>>> >>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>genome which is split >>>>>> into >>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to >>>>>>>finish >>normally, >>>>>> while >>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>output. Do you have >>any >>>>>>> suggestion why this is happening? >>>>>>> >>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>found that of >>38.384 >>>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise >>>>>>>is, that 2/3 of >>>>>> these >>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>models for these >>>>>> 'finished' >>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>parts of the genome >>>>>> (i.e., >>>>>>> each of the 20 jobs contained scaffolds that 'did not start' but >>>>>>>'finished') >>>>>>> Should this be an issue of concern? >>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the >>>>>>>NFS files look >>>> good, >>>>>> so >>>>>>> we suspect something fishy going on... >>>>>>> >>>>>>> Hope you can help, >>>>>>> best wishes, >>>>>>> Jeanne Wilbrandt >>>>>>> >>>>>>> zmb // ZFMK // University of Bonn >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> >>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab. 
>>>>>>>org
>>>
>

From cjfields at illinois.edu Wed Aug 13 11:14:56 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 13 Aug 2014 17:14:56 +0000
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To:
References:
Message-ID:

On Aug 11, 2014, at 11:11 AM, Carson Holt wrote:

If you are updating every month to BioPerl live, don't. You should use
the CPAN version of BioPerl or even the stable download. BioPerl live has
actually broken several components MAKER uses at different times and
depending on which version you currently have, may be broken now. Could
you send me the Bio::Root::Version line from the initial debug output?

Exactly. Just a note, but the CPAN releases (now at 1.6.924) merge over
all changes from the master branch on a regular basis. The key parts that
will not work when running off master (such as Bio::Root, Bio::FeatureIO,
etc) have been split out into separate repos; it's entirely possible to
add these separately to a PERL5LIB but the intent is that we will release
Bio-Root and others to CPAN separately.

Also could you send me this file --> /home/keceltes/maker2/final.fasta

The point of failure is actually very simple. At that point in the code,
MAKER opens a file, reads it in one line at a time, writes it out to a new
file, and then indexes it with BioPerl (the BioPerl won't work with NFS
drives because it uses Berkeley DB). For that reason whenever it fails at
that point, it is either a drive space issue, NFS issue, BioPerl issue, or
file format issue.

Re: Berkeley_DB, if you have a need to push this in a more NFS-portable
direction we are more than happy to let you experiment on what works best.
Mark Jensen actually started on this a while back but ran into problems. I
personally haven't had problems with Bio::DB::Fasta on our local GPFS to
be frank, but I'm sure that isn't working for everyone.

Also are you running via MPI?
I ask because if you are using multiple nodes you will have to check the
size of /tmp independently on each node (since the values will be
different).

Thanks,
Carson

chris

From: Kevin Tsai
Date: Monday, August 11, 2014 at 5:11 AM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Early obstacle with SplitDB

Hi Carson,
Thanks for the suggestions.

I left the TMP= empty, which as you mentioned defaults to /tmp. There
seems to be a different error when using an NFS mounted directory (as I
manually verified). My /tmp is also not full or nearly full, I have
verified proper fasta formatting as I have run the fasta file through
other statistics generating tools (e.g. Quast). We also update BioPerl
monthly.

Do you think it could be anything else? Do you think any more information
that I might be able to provide will be more insightful?

On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote:

Either you specified TMP= in your maker_opts.ctl file to be an NFS mounted
directory (must be locally mounted), the drive containing directory
specified by TMP= (defaults to /tmp) is full or nearly full, your input
file is not proper fasta format, or you are using an out of date version
of BioPerl.

Try the first three in the list then look at BioPerl. The BioPerl version
should be printed as part of the debug output.

--Carson

From: Kevin Tsai
Date: Tuesday, August 5, 2014 at 4:59 AM
To:
Subject: [maker-devel] Early obstacle with SplitDB

Hello,
I'm a new user to Maker so I suspect this will be a simple question, but I
am having trouble finding documentation on SplitDB. Our IT admin set up
the application and I'm running into the following issue about 30 seconds
after kickoff. Below is the debugged output:

STATUS: Parsing control files...
Calling GI::load_control_files at /usr/bin/maker line 452.
Calling GI::new_instance_temp at /usr/bin/maker line 463.
Calling GI::mount_check at /usr/bin/maker line 465.
Calling GI::set_global_temp at /usr/bin/maker line 483.
STATUS: Processing and indexing input FASTA files... Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling GI::s_abs_path at /usr/bin/maker line 519. Calling List::Util::shuffle at /usr/bin/maker line 529. Calling GI::split_db at /usr/bin/maker line 536. Calling File::Path::rmtree at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling Iterator::Any::nextDef at /usr/bin/maker line 537. Calling Iterator::Any::new at /usr/bin/maker line 537. Calling mkdir at /usr/bin/maker line 537. Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. Calling system at /usr/bin/maker line 537. ERROR: SplitDB not created correctly at /usr/local/share/perl5/GI.pm line 1144. GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at /usr/bin/maker line 537 --> rank=NA, hostname=Za2.cglab Any suggestions? Thank you in advance! -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Kevin Tsai www.linkedin.com/in/kevinjtsai/ Ph.D. Candidate, Bioinformatics Institute of Information Science, Academia Sinica _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
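
The first two causes listed in the quoted thread above (TMP= pointing at an NFS mount, or a full or nearly full drive) can be checked on each node before rerunning. What follows is only a minimal sketch, assuming Linux with GNU df (the -T option is not POSIX); the function names fs_type, is_risky_fs and check_tmp are made up for illustration and are not part of MAKER. tmpfs is included because it is also reported in this thread as giving odd indexing behavior on some systems.

```shell
#!/bin/sh
# Sketch only: report filesystem type and free space for a candidate
# MAKER TMP directory. Assumes GNU df on Linux; names are hypothetical.

# Print the filesystem type of the directory given as $1.
fs_type() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Succeed (exit 0) if the type in $1 is one the thread warns about.
is_risky_fs() {
    case "$1" in
        nfs*|tmpfs) return 0 ;;
        *)          return 1 ;;
    esac
}

# Print a one-line verdict plus available space for the directory in $1.
check_tmp() {
    dir=${1:-/tmp}
    fstype=$(fs_type "$dir")
    if is_risky_fs "$fstype"; then
        echo "$dir: $fstype (indexing may be unreliable here)"
    else
        echo "$dir: $fstype"
    fi
    df -Ph "$dir" | awk 'NR == 2 { print "available: " $4 }'
}

check_tmp "${1:-/tmp}"
```

When running under MPI across several nodes, the same check would need to be run on every node, since each node has its own /tmp.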
From carsonhh at gmail.com Wed Aug 13 12:19:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 13 Aug 2014 12:19:50 -0600
Subject: [maker-devel] Early obstacle with SplitDB
In-Reply-To:
References:
Message-ID:

The Berkeley DB/NFS issues happen more often for large index files or NFS
systems with a slow response. Such issues also happen almost exclusively
during index creation.

There is a way you can tell MAKER to have BioPerl use something other than
Berkeley DB for indexing if you suspect that's the issue. You can give it
a flag during the initial MAKER setup and installation.

#use GDBM library
cd .../maker/src
perl Build.PL --AnyDBM_ISA GDBM_File
./Build install

#use SDBM files
cd .../maker/src
perl Build.PL --AnyDBM_ISA SDBM_File
./Build install

#use Berkeley DB (default)
cd .../maker/src
perl Build.PL --AnyDBM_ISA DB_File
./Build install

However, I find that the alternatives to Berkeley DB can be more flaky.
Also make sure /tmp is not tmpfs (which it may be on some systems). I've
also seen weird behavior trying to index files on tmpfs storage on some
systems.

Thanks,
Carson

From: "Fields, Christopher J"
Date: Wednesday, August 13, 2014 at 11:14 AM
To: Carson Holt
Cc: Kevin Tsai, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Early obstacle with SplitDB

On Aug 11, 2014, at 11:11 AM, Carson Holt wrote:

> If you are updating every month to BioPerl live, don't. You should use the
> CPAN version of BioPerl or even the stable download. BioPerl live has
> actually broken several components MAKER uses at different times and depending
> on which version you currently have, may be broken now. Could you send me the
> Bio::Root::Version line from the initial debug output?

Exactly. Just a note, but the CPAN releases (now at 1.6.924) merge over
all changes from the master branch on a regular basis.
The key parts that will not work when running off master (such as
Bio::Root, Bio::FeatureIO, etc) have been split out into separate repos;
it's entirely possible to add these separately to a PERL5LIB but the
intent is that we will release Bio-Root and others to CPAN separately.

> Also could you send me this file --> /home/keceltes/maker2/final.fasta
>
> The point of failure is actually very simple. At that point in the code,
> MAKER opens a file, reads it in one line at a time, writes it out to a new
> file, and then indexes it with BioPerl (the BioPerl won't work with NFS drives
> because it uses Berkeley DB). For that reason whenever it fails at that point,
> it is either a drive space issue, NFS issue, BioPerl issue, or file format
> issue.

Re: Berkeley_DB, if you have a need to push this in a more NFS-portable
direction we are more than happy to let you experiment on what works best.
Mark Jensen actually started on this a while back but ran into problems. I
personally haven't had problems with Bio::DB::Fasta on our local GPFS to
be frank, but I'm sure that isn't working for everyone.

> Also are you running via MPI? I ask because if you are using multiple nodes
> you will have to check the size of /tmp independently on each node (since the
> values will be different).
>
> Thanks,
> Carson

chris

> From: Kevin Tsai
> Date: Monday, August 11, 2014 at 5:11 AM
> To: Carson Holt
> Cc:
> Subject: Re: [maker-devel] Early obstacle with SplitDB
>
> Hi Carson,
> Thanks for the suggestions.
>
> I left the TMP= empty, which as you mentioned defaults to /tmp. There seems
> to be a different error when using an NFS mounted directory (as I manually
> verified). My /tmp is also not full or nearly full, I have verified proper
> fasta formatting as I have run the fasta file through other statistics
> generating tools (e.g. Quast). We also update BioPerl monthly.
>
> Do you think it could be anything else?
Do you think any more information > that I might be able to provide will be more insightful? > > > On Tue, Aug 5, 2014 at 1:26 PM, Carson Holt wrote: >> Either you speciied TMP= in your maker_opts.ctl file to be an NFS mounted >> directory (must be locally mounted), the drive containing directory specified >> by TMP= (defaults to /tmp) is full or nearly full, your input file is not >> proper fasta format, or you are using an out of date version of BioPerl. >> >> Try the first three in the list then look at BioPerl. The BioPerl version >> should be printed as part of the the debug output. >> >> --Carson >> >> >> From: Kevin Tsai >> Date: Tuesday, August 5, 2014 at 4:59 AM >> To: >> Subject: [maker-devel] Early obstacle with SplitDB >> >> Hello, >> I'm a new user to Maker so I suspect this will be a simple question, but I am >> having trouble finding documentation on SplitDB. Our IT admin set up the >> application and I'm running into the following issue about 30 seconds after >> kickoff. Below is the debugged output: >> >> STATUS: Parsing control files... >> Calling GI::load_control_files at /usr/bin/maker line 452. >> Calling GI::new_instance_temp at /usr/bin/maker line 463. >> Calling GI::mount_check at /usr/bin/maker line 465. >> Calling GI::set_global_temp at /usr/bin/maker line 483. >> STATUS: Processing and indexing input FASTA files... >> Calling GI::s_abs_path at /usr/bin/maker line 519. >> Calling GI::s_abs_path at /usr/bin/maker line 519. >> Calling GI::s_abs_path at /usr/bin/maker line 519. >> Calling GI::s_abs_path at /usr/bin/maker line 519. >> Calling GI::s_abs_path at /usr/bin/maker line 519. >> Calling List::Util::shuffle at /usr/bin/maker line 529. >> Calling GI::split_db at /usr/bin/maker line 536. >> Calling File::Path::rmtree at /usr/bin/maker line 537. >> Calling Iterator::Any::new at /usr/bin/maker line 537. >> Calling Iterator::Any::nextDef at /usr/bin/maker line 537. >> Calling Iterator::Any::new at /usr/bin/maker line 537. 
>> Calling mkdir at /usr/bin/maker line 537. >> Calling Iterator::Any::nextFastaRef at /usr/bin/maker line 537. >> Calling system at /usr/bin/maker line 537. >> ERROR: SplitDB not created correctly >> >> at /usr/local/share/perl5/GI.pm line 1144. >> GI::split_db("/home/keceltes/maker2/final.fasta", "nucleotide", 1, >> "/home/keceltes/maker2/final.maker.output/mpi_blastdb", "C") called at >> /usr/bin/maker line 537 >> --> rank=NA, hostname=Za2.cglab >> >> Any suggestions? Thank you in advance! >> -- >> Kevin Tsai >> www.linkedin.com/in/kevinjtsai/ >> Ph.D. Candidate, Bioinformatics >> Institute of Information Science, Academia Sinica >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/ma >> ker-devel_yandell-lab.org > > > > -- > Kevin Tsai > www.linkedin.com/in/kevinjtsai/ > Ph.D. Candidate, Bioinformatics > Institute of Information Science, Academia Sinica > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.wilbrandt at zfmk.de Thu Aug 14 09:40:04 2014 From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt) Date: Thu, 14 Aug 2014 17:40:04 +0200 Subject: [maker-devel] Further split genome questions In-Reply-To: <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> Message-ID: Thank you so much! However, I'm still, struggling, I'm afraid: I tried this 'two-step merging' approach with a subset of scaffolds and got duplicate IDs. 
Here is what I did:

- divided input scaffolds into two files
- run maker separately on these files (-> separate output dirs)
  -- additional input: maker-generated gff3 from previous (singular) run
  -- repeatmasking, snaphmm, gmhmm, augustus_species are given
  -- map_forward=0 / 1 (I tried both, to the same effect)
- gff3_merge two times using index-log
- gff3_merge these two gff3 files

$ grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | uniq -c | sort -n | tail
      2 ID=snap_masked-scf7180005140699-processed-gene-0.19
      2 ID=snap_masked-scf7180005140699-processed-gene-0.22
      2 ID=snap_masked-scf7180005140699-processed-gene-1.36
      2 ID=snap_masked-scf7180005140713-processed-gene-0.4
      2 ID=snap_masked-scf7180005140744-processed-gene-0.4
      2 ID=snap_masked-scf7180005140744-processed-gene-0.6
      2 ID=snap_masked-scf7180005140754-processed-gene-0.14
      2 ID=snap_masked-scf7180005140754-processed-gene-0.15
      2 ID=snap_masked-scf7180005140754-processed-gene-0.19
      2 ID=snap_masked-scf7180005181475-processed-gene-0.3

$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | grep "\sgene"
scf7180005181475  maker  gene  9050  9385  .  -  .  ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
scf7180005181475  maker  gene  846  1088  .  -  .  ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3

- found duplicates! i.e. the same ID for gene annotations in different
  areas of the same scaffold (of 655 gene annotations, 51 appear twice)
  -- this happens not only with gene, but also CDS and mRNA annotations,
     as far as I can see (here, in one example, non-overlapping but close
     CDS snippets got the same ID).

I suspected this might have to do with the map_forward flag, but I get the
same problem again (with genes at the same locations).

I attached one of the ctl files for you in case you want to have a look,
the other is analogous. Do you need something else?

What did I miss?
This should not happen, right? On Wed, 13 Aug 2014 15:52:34 +0000 Carson Holt wrote: >Yes. One cpu will have several processes, most are helper processes that >will use 0% CPU almost all of the time (for example there is a shared >variable manager process that will launch with MAKER but will also be >called 'maker' under top because it is technically its child and not a >separate script). Also system calls will launch a new process that will >use all CPU while the process calling it will drop to 0% CPU until it >finishes. > >Yes. Your explanation is correct. You then use gff3_merge to merge the >GFF3 file. > >--Carson > > > >On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: > >> >>Our admin counts processes. Do I understand you right, that one CPU >>handles several >>processes? >> >>I'm still confused by the different directories (and I made a mistake >>when asking last >>time, I wanted to say 'If I do NOT start the jobs in the same >>directory...). >>So, if I start each piece of a genome in its own directory (for example), >>then it gets a >>unique basename (because the output will be separate from all other >>pieces anyway) and I >>will not run dsindex but instead use gff3_merge for each piece's output >>and then once >>again to merge all resulting gff3-files? >> >>Hope I got you right :) >> >>Thanks fopr your help! >>Jeanne >> >> >> >>On Wed, 6 Aug 2014 15:45:56 +0000 >> Carson Holt wrote: >>>Is your admin counting processes or cpu usage? Because each system call >>>creates a >>>separate process, so you can expect multiple processes (each system call >>>generates a new >>>process) but only a single cpu of usage per instance. Use different >>>directories if you >>>are running that many jobs. You can concatenate the separate results >>>when your done. >>> Use gff3_merge script to help concatenate the separate GFF3 files >>>generated from >>>separate jobs. 
>>> >>>--Carson >>> >>>Sent from my iPhone >>> >>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>wrote: >>>> >>>> >>>> >>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>threads. Our admin >>>reports >>>> however, that the processes seem to assemble more threads than they >>>>are allowed. It is >>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>suggestion why? >>>> >>>> If I start the jobs in the same directory, how can I make sure they >>>>write to the same >>>> directory (as, I think is required to put the pieces together in the >>>>end?)? das >>>-basename >>>> take paths? >>>> >>>> >>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>> Carson Holt wrote: >>>>> I think the freezing is because you are starting too many >>>>>simultaneous jobs. You >>>should >>>>> try and use MPI to parallelize instead. The concurrent job way of >>>>>doing things can >>>>> start to cause problems If you are running 10 or more jobs in the >>>>>same directory. You >>>>> could try splitting them into different directories. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>wrote: >>>>>> >>>>>> >>>>>> aha, so this explains that. >>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>than 60,000, >>>roughly >>>>>> half of the sequences being shorter than 3,000 bp. >>>>>> >>>>>> What do you think about this weird 'I am running but not really doing >>>>> anything'-behavior? >>>>>> >>>>>> >>>>>> Thanks a lot! >>>>>> Jeanne >>>>>> >>>>>> >>>>>> >>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>>> Carson Holt wrote: >>>>>>> If you are starting and restarting, or running multiple jobs then >>>>>>>the log can be >>>>>>> partially rebuilt. On rebuild only the FINISHED entries are added. >>>>>>> If there is a >>>>> GFF3 >>>>>>> result file for the contig, then it is FINISHED. FASTA files will >>>>>>>only exist for >>>the >>>>>>> contigs that have gene models. 
Small contigs will rarely contain >>>>>>>models. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi Carson, >>>>>>>> >>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>>genome which is split >>>>>>> into >>>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed to >>>>>>>>finish >>>normally, >>>>>>> while >>>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>>output. Do you have >>>any >>>>>>>> suggestion why this is happening? >>>>>>>> >>>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>>found that of >>>38.384 >>>>>>>> mentioned scaffolds, 154 appear only once in the log. The surprise >>>>>>>>is, that 2/3 of >>>>>>> these >>>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>>models for these >>>>>>> 'finished' >>>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>>parts of the genome >>>>>>> (i.e., >>>>>>>> each of the 20 jobs contained scaffolds that 'did not start' but >>>>>>>>'finished') >>>>>>>> Should this be an issue of concern? >>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the >>>>>>>>NFS files look >>>>> good, >>>>>>> so >>>>>>>> we suspect something fishy going on... >>>>>>>> >>>>>>>> Hope you can help, >>>>>>>> best wishes, >>>>>>>> Jeanne Wilbrandt >>>>>>>> >>>>>>>> zmb // ZFMK // University of Bonn >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> maker-devel mailing list >>>>>>>> maker-devel at box290.bluehost.com >>>>>>>> >>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab. >>>>>>>>org >>>> >> > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: maker_opts_Lclav_splitrun_problem_01_mapfwd.ctl
Type: application/octet-stream
Size: 5859 bytes
Desc: not available
URL:

From carsonhh at gmail.com Thu Aug 14 09:46:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:46:44 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

What version of MAKER are you using? I'd also need to see the GFF3 files
before the merge. You may also need to turn off map_forward since you are
passing in GFF3 with MAKER names, creating new models with MAKER names and
then moving names from old models forward onto new ones (which may force
names to be used twice).

--Carson

From carsonhh at gmail.com Thu Aug 14 09:55:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:55:15 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

Which 2.31? Current is 2.31.6.

--Carson

On 8/14/14, 9:53 AM, "Jeanne Wilbrandt" wrote:

>
>It is version 2.31.
>
>My first try was done with map_forward=0, and (I just noticed) the
>duplicates are present
>in the separate gff3s already also in this case (one is attached).
>
>Has this something to do with the first-run-gff3 I fed it?
>
>
>
>
>On Thu, 14 Aug 2014 15:46:44 +0000
> Carson Holt wrote:
>>What version of MAKER are you using? I'd also need to see the GFF3 files
>>before the merge. You may also need to turn off map_forward since you
>>are
>>passing in GFF3 with MAKER names, creating new models with MAKER names
>>and
>>then moving names from old models forward onto new ones (which may force
>>names to be used twice).
>>
>>--Carson
>>
>>
>>On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote:
>>
>>>
>>>Thank you so much!
>>>
>>>However, I'm still, struggling, I'm afraid: I tried this 'two-step
>>>merging' approach with
>>>a subset of scaffolds and got duplicate IDs.

From carsonhh at gmail.com Thu Aug 14 09:57:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 14 Aug 2014 09:57:39 -0600
Subject: [maker-devel] Further split genome questions
In-Reply-To:
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

For the file you just sent me, is that from the first run with
map_forward=0 or with map_forward=1?

--Carson

On 8/14/14, 9:53 AM, "Jeanne Wilbrandt" wrote:

>
>It is version 2.31.
>
>My first try was done with map_forward=0, and (I just noticed) the
>duplicates are present
>in the separate gff3s already also in this case (one is attached).
>
>Has this something to do with the first-run-gff3 I fed it?

From j.wilbrandt at zfmk.de Thu Aug 14 09:53:38 2014
From: j.wilbrandt at zfmk.de (Jeanne Wilbrandt)
Date: Thu, 14 Aug 2014 17:53:38 +0200
Subject: [maker-devel] Further split genome questions
In-Reply-To: <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
References: <0a6beb5590c54f228b7c29981728f00e@SVZFMKVM05.domzfmk.museum-koenig.de> <6e19a4cdaa4a4872827649d94a360a46@SVZFMKVM05.domzfmk.museum-koenig.de> <6ad8da6517f048b4bc92bd0cc54c3902@SVZFMKVM05.domzfmk.museum-koenig.de> <4c183411b99447cc86601276b66fce1f@SVZFMKVM05.domzfmk.museum-koenig.de>
Message-ID:

It is version 2.31.

My first try was done with map_forward=0, and (I just noticed) the duplicates are present
in the separate gff3s already also in this case (one is attached).

Has this something to do with the first-run-gff3 I fed it?


On Thu, 14 Aug 2014 15:46:44 +0000
 Carson Holt wrote:
>What version of MAKER are you using? I'd also need to see the GFF3 files
>before the merge. You may also need to turn off map_forward since you are
>passing in GFF3 with MAKER names, creating new models with MAKER names and
>then moving names from old models forward onto new ones (which may force
>names to be used twice).
>
>--Carson
>
>
>On 8/14/14, 9:40 AM, "Jeanne Wilbrandt" wrote:
>
>>
>>Thank you so much!
>>
>>However, I'm still, struggling, I'm afraid: I tried this 'two-step
>>merging' approach with
>>a subset of scaffolds and got duplicate IDs.
>>
>>Here is what I did:
>>- divided input scaffolds in two files
>>- run maker separately on these files (-> separate output dirs)
>>-- additional input: maker-generated gff3 from previous (singular) run
>>-- repeatmasking, snaphmm, gmhmm, augustus_species are given
>>-- map_forward=0 / 1 (I tried both, to the same effect)
>>- gff3_merge two times using index-log
>>- gff3_merge these two gff3 files
>>
>>$ grep -P "\tgene\t" merged_all.gff3 | cut -f9 | cut -f1 -d ";" | sort | uniq -c | sort -n | tail
>>   2 ID=snap_masked-scf7180005140699-processed-gene-0.19
>>   2 ID=snap_masked-scf7180005140699-processed-gene-0.22
>>   2 ID=snap_masked-scf7180005140699-processed-gene-1.36
>>   2 ID=snap_masked-scf7180005140713-processed-gene-0.4
>>   2 ID=snap_masked-scf7180005140744-processed-gene-0.4
>>   2 ID=snap_masked-scf7180005140744-processed-gene-0.6
>>   2 ID=snap_masked-scf7180005140754-processed-gene-0.14
>>   2 ID=snap_masked-scf7180005140754-processed-gene-0.15
>>   2 ID=snap_masked-scf7180005140754-processed-gene-0.19
>>   2 ID=snap_masked-scf7180005181475-processed-gene-0.3
>>
>>$ grep snap_masked-scf7180005181475-processed-gene-0.3 merged_all.gff3 | grep "\sgene"
>>scf7180005181475 maker gene 9050 9385 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
>>scf7180005181475 maker gene 846 1088 . - . ID=snap_masked-scf7180005181475-processed-gene-0.3;Name=snap_masked-scf7180005181475-processed-gene-0.3
>>
>>- found duplicates! i.e. the same ID for gene annotations in different areas of the same scaffold (of 655 gene annotations, 51 appear twice)
>>-- this happens not only with gene, but also CDS and mRNA annotations, as far as I can see (here, in one example, non-overlapping but close CDS snippets got the same ID).
>>
>>
>>I suspected this might have to do with the map_forward flag, but I get the same problem again (with genes at the same locations).
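The duplicate check in the message above can also be written so that it skips any embedded FASTA section and works for other feature types. The following is only a sketch, assuming a tab-separated GFF3 file with the ID attribute in column 9; dup_ids is a hypothetical helper name, not a MAKER script.

```shell
#!/bin/sh
# Sketch: print every ID that occurs more than once for one feature type.
# dup_ids is a hypothetical helper, not part of MAKER.
#   $1 = GFF3 file
#   $2 = feature type as written in column 3 (e.g. gene, mRNA)
# Output: one line per duplicated ID, "<ID><TAB><count>", sorted by ID.
dup_ids() {
    awk -F '\t' -v type="$2" '
        /^##FASTA/ { exit }      # stop reading at the sequence section (END still runs)
        /^#/       { next }      # skip comments and directives
        $3 == type {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) {
                    count[substr(attrs[i], 4)]++   # strip the leading "ID="
                    break
                }
        }
        END {
            for (id in count)
                if (count[id] > 1)
                    print id "\t" count[id]
        }
    ' "$1" | sort
}
```

For example, `dup_ids merged_all.gff3 gene` lists each gene ID that is used more than once. Note that the GFF3 specification allows the segments of one discontinuous CDS to share a single ID, so repeated CDS IDs are not necessarily an error, whereas repeated gene or mRNA IDs at different locations are.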
>>I attached one of the ctl files for you in case you want to have a look, >>the other is >>analogous. Do you need something else? >> >>What did I miss? This should not happen, right? >> >> >> >> >>On Wed, 13 Aug 2014 15:52:34 +0000 >> Carson Holt wrote: >>>Yes. One cpu will have several processes, most are helper processes that >>>will use 0% CPU almost all of the time (for example there is a shared >>>variable manager process that will launch with MAKER but will also be >>>called 'maker' under top because it is technically its child and not a >>>separate script). Also system calls will launch a new process that will >>>use all CPU while the process calling it will drop to 0% CPU until it >>>finishes. >>> >>>Yes. Your explanation is correct. You then use gff3_merge to merge the >>>GFF3 file. >>> >>>--Carson >>> >>> >>> >>>On 8/13/14, 3:32 AM, "Jeanne Wilbrandt" wrote: >>> >>>> >>>>Our admin counts processes. Do I understand you right, that one CPU >>>>handles several >>>>processes? >>>> >>>>I'm still confused by the different directories (and I made a mistake >>>>when asking last >>>>time, I wanted to say 'If I do NOT start the jobs in the same >>>>directory...). >>>>So, if I start each piece of a genome in its own directory (for >>>>example), >>>>then it gets a >>>>unique basename (because the output will be separate from all other >>>>pieces anyway) and I >>>>will not run dsindex but instead use gff3_merge for each piece's output >>>>and then once >>>>again to merge all resulting gff3-files? >>>> >>>>Hope I got you right :) >>>> >>>>Thanks fopr your help! >>>>Jeanne >>>> >>>> >>>> >>>>On Wed, 6 Aug 2014 15:45:56 +0000 >>>> Carson Holt wrote: >>>>>Is your admin counting processes or cpu usage? Because each system >>>>>call >>>>>creates a >>>>>separate process, so you can expect multiple processes (each system >>>>>call >>>>>generates a new >>>>>process) but only a single cpu of usage per instance. 
Use different >>>>>directories if you >>>>>are running that many jobs. You can concatenate the separate results >>>>>when your done. >>>>> Use gff3_merge script to help concatenate the separate GFF3 files >>>>>generated from >>>>>separate jobs. >>>>> >>>>>--Carson >>>>> >>>>>Sent from my iPhone >>>>> >>>>>> On Aug 6, 2014, at 9:33 AM, "Jeanne Wilbrandt" >>>>>>wrote: >>>>>> >>>>>> >>>>>> >>>>>> We are using MPI as well, each of the 20 parts gets assigned 4 >>>>>>threads. Our admin >>>>>reports >>>>>> however, that the processes seem to assemble more threads than they >>>>>>are allowed. It is >>>>>> not Blast (which is set to 1 cpu in the opts.ctl). Do you have a >>>>>>suggestion why? >>>>>> >>>>>> If I start the jobs in the same directory, how can I make sure they >>>>>>write to the same >>>>>> directory (as, I think is required to put the pieces together in the >>>>>>end?)? das >>>>>-basename >>>>>> take paths? >>>>>> >>>>>> >>>>>> On Wed, 6 Aug 2014 15:12:50 +0000 >>>>>> Carson Holt wrote: >>>>>>> I think the freezing is because you are starting too many >>>>>>>simultaneous jobs. You >>>>>should >>>>>>> try and use MPI to parallelize instead. The concurrent job way of >>>>>>>doing things can >>>>>>> start to cause problems If you are running 10 or more jobs in the >>>>>>>same directory. You >>>>>>> could try splitting them into different directories. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>>> On Aug 6, 2014, at 9:01 AM, "Jeanne Wilbrandt" >>>>>>>> >>>>>>>>wrote: >>>>>>>> >>>>>>>> >>>>>>>> aha, so this explains that. >>>>>>>> Daniel, the average is 5930.37 bp, but ranging from ~ 50 to more >>>>>>>>than 60,000, >>>>>roughly >>>>>>>> half of the sequences being shorter than 3,000 bp. >>>>>>>> >>>>>>>> What do you think about this weird 'I am running but not really >>>>>>>>doing >>>>>>> anything'-behavior? >>>>>>>> >>>>>>>> >>>>>>>> Thanks a lot! 
>>>>>>>> Jeanne >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, 6 Aug 2014 14:16:52 +0000 >>>>>>>> Carson Holt wrote: >>>>>>>>> If you are starting and restarting, or running multiple jobs then >>>>>>>>>the log can be >>>>>>>>> partially rebuilt. On rebuild only the FINISHED entries are >>>>>>>>>added. >>>>>>>>> If there is a >>>>>>> GFF3 >>>>>>>>> result file for the contig, then it is FINISHED. FASTA files will >>>>>>>>>only exist for >>>>>the >>>>>>>>> contigs that have gene models. Small contigs will rarely contain >>>>>>>>>models. >>>>>>>>> >>>>>>>>> --Carson >>>>>>>>> >>>>>>>>> Sent from my iPhone >>>>>>>>> >>>>>>>>>> On Aug 6, 2014, at 6:40 AM, "Jeanne Wilbrandt" >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Carson, >>>>>>>>>> >>>>>>>>>> I ran into more conspicuous behavior running maker 2.31 on a >>>>>>>>>>genome which is split >>>>>>>>> into >>>>>>>>>> 20 parts, using the -g flag and the same basename. >>>>>>>>>> Most of the jobs ran simultaneously on the same node, 17 seemed >>>>>>>>>>to >>>>>>>>>>finish >>>>>normally, >>>>>>>>> while >>>>>>>>>> the remaining three seemed to be stalled and produced 0B of >>>>>>>>>>output. Do you have >>>>>any >>>>>>>>>> suggestion why this is happening? >>>>>>>>>> >>>>>>>>>> After I stopped these stalled jobs, I checked the index.log and >>>>>>>>>>found that of >>>>>38.384 >>>>>>>>>> mentioned scaffolds, 154 appear only once in the log. The >>>>>>>>>>surprise >>>>>>>>>>is, that 2/3 of >>>>>>>>> these >>>>>>>>>> only appear as FINISHED (the rest only started). There are no >>>>>>>>>>models for these >>>>>>>>> 'finished' >>>>>>>>>> scaffolds stored in the .db and they are distributed over all >>>>>>>>>>parts of the genome >>>>>>>>> (i.e., >>>>>>>>>> each of the 20 jobs contained scaffolds that 'did not start' but >>>>>>>>>>'finished') >>>>>>>>>> Should this be an issue of concern? 
>>>>>>>>>> It might be a NFS lock problem, as NFS is heavily loaded, but the >>>>>>>>>>NFS files look >>>>>>> good, >>>>>>>>> so >>>>>>>>>> we suspect something fishy going on... >>>>>>>>>> >>>>>>>>>> Hope you can help, >>>>>>>>>> best wishes, >>>>>>>>>> Jeanne Wilbrandt >>>>>>>>>> >>>>>>>>>> zmb // ZFMK // University of Bonn >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> maker-devel mailing list >>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>> >>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-la >>>>>>>>>>b. >>>>>>>>>>org >>>>>> >>>> >>> >>> >> > > -------------- next part -------------- A non-text attachment was scrubbed... Name: splitrun_problem_01_all.gff3 Type: application/octet-stream Size: 4967463 bytes Desc: not available URL: From daniel.standage at gmail.com Thu Aug 21 09:33:33 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 21 Aug 2014 11:33:33 -0400 Subject: [maker-devel] tRNAscan GFF3 Message-ID: Greetings! I have a quick question about Maker's handling of tRNAscan output, particularly tRNAs containing introns. If I haven't missed something, it looks like Maker reports the second exon on the opposite strand as the first exon, the tRNA feature, and the gene feature? Am I reading this correctly? I don't think this representation makes sense. The second exon is complementary to the first (hence the folding), but it is not encoded on or transcribed from the opposite strand. Unless I've misunderstood something, I would suggest that the correct representation would be to have all features on the same strand. Thanks, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Thu Aug 21 09:35:16 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 21 Aug 2014 09:35:16 -0600 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: It should be on the same strand. Which MAKER version are you using? --Carson From: Daniel Standage Date: Thursday, August 21, 2014 at 9:33 AM To: Maker Mailing List Subject: [maker-devel] tRNAscan GFF3 Greetings! I have a quick question about Maker's handling of tRNAscan output, particularly tRNAs containing introns. If I haven't missed something, it looks like Maker reports the second exon on the opposite strand as the first exon, the tRNA feature, and the gene feature? Am I reading this correctly? I don't think this representation makes sense. The second exon is complementary to the first (hence the folding), but it is not encoded on or transcribed from the opposite strand. Unless I've misunderstood something, I would suggest that the correct representation would be to have all features on the same strand. Thanks, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Thu Aug 21 09:36:41 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 21 Aug 2014 11:36:41 -0400 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: This annotation was generated using Maker 2.31.3. -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: > It should be on the same strand. Which MAKER version are you using? 
> > --Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:33 AM > To: Maker Mailing List > Subject: [maker-devel] tRNAscan GFF3 > > Greetings! > > I have a quick question about Maker's handling of tRNAscan output, > particularly tRNAs containing introns. If I haven't missed something, it > looks like Maker reports the second exon on the opposite strand as the > first exon, the tRNA feature, and the gene feature? Am I reading this > correctly? > > I don't think this representation makes sense. The second exon is > complementary to the first (hence the folding), but it is not encoded on or > transcribed from the opposite strand. Unless I've misunderstood something, > I would suggest that the correct representation would be to have all > features on the same strand. > > Thanks, > Daniel > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Aug 21 09:49:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 21 Aug 2014 09:49:36 -0600 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: I half way remember some tRNAscan bugs being fixed in several of the sub versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I believe and most 2.31 updates were related to tRNAscan). Current version is 2.31.6. Could you give it a try and see if it is still giving you the issue. 
I did a quick look through the archives and I think this was found and fixed -->
https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ

Thanks,
Carson

From: Daniel Standage
Date: Thursday, August 21, 2014 at 9:36 AM
To: Carson Holt
Cc: Maker Mailing List
Subject: Re: [maker-devel] tRNAscan GFF3

This annotation was generated using Maker 2.31.3.

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote:
> It should be on the same strand. Which MAKER version are you using?
>
> --Carson
>
>
> From: Daniel Standage
> Date: Thursday, August 21, 2014 at 9:33 AM
> To: Maker Mailing List
> Subject: [maker-devel] tRNAscan GFF3
>
> Greetings!
>
> I have a quick question about Maker's handling of tRNAscan output,
> particularly tRNAs containing introns. If I haven't missed something, it looks
> like Maker reports the second exon on the opposite strand as the first exon,
> the tRNA feature, and the gene feature? Am I reading this correctly?
>
> I don't think this representation makes sense. The second exon is
> complementary to the first (hence the folding), but it is not encoded on or
> transcribed from the opposite strand. Unless I've misunderstood something, I
> would suggest that the correct representation would be to have all features on
> the same strand.
>
> Thanks,
> Daniel
>
> --
> Daniel S. Standage
> Ph.D. Candidate
> Computational Genome Science Laboratory
> Indiana University
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
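One way to test whether a given MAKER version still writes tRNA exons on the opposite strand from their parent is to compare every feature's strand with that of its Parent. This is a rough sketch, assuming a tab-separated GFF3 file with ID and Parent attributes in column 9; strand_mismatches is a hypothetical helper name, not part of MAKER or tRNAscan-SE.

```shell
#!/bin/sh
# Sketch: report features whose strand differs from the strand of their Parent.
# strand_mismatches is a hypothetical helper, not part of MAKER.
#   $1 = GFF3 file (read twice: first to collect strands, then to compare)
# Output: seqid, type, start, end, strand, and the parent ID with its strand.
strand_mismatches() {
    awk -F '\t' '
        # Return the value of attribute "key" from a column 9 string, or "".
        function attr(s, key,    n, a, i) {
            n = split(s, a, ";")
            for (i = 1; i <= n; i++)
                if (index(a[i], key "=") == 1)
                    return substr(a[i], length(key) + 2)
            return ""
        }
        FNR == 1    { fasta = 0 }                 # reset at the start of each pass
        /^##FASTA/  { fasta = 1 }                 # ignore the sequence section
        fasta || /^#/ || NF < 9 { next }
        NR == FNR {                               # pass 1: remember strand per ID
            id = attr($9, "ID")
            if (id != "") strand[id] = $7
            next
        }
        {                                         # pass 2: compare with each parent
            n = split(attr($9, "Parent"), parents, ",")
            for (i = 1; i <= n; i++) {
                p = parents[i]
                if ((p in strand) && strand[p] != $7)
                    print $1 "\t" $3 "\t" $4 "\t" $5 "\t" $7 "\tparent=" p "(" strand[p] ")"
            }
        }
    ' "$1" "$1"
}
```

Running `strand_mismatches annotation.gff3` before and after an upgrade (the file name here is only a placeholder) should print nothing once all exons, tRNA features, and genes agree on strand.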
URL: From rens.holmer at wur.nl Tue Aug 19 03:19:08 2014 From: rens.holmer at wur.nl (rens holmer) Date: Tue, 19 Aug 2014 11:19:08 +0200 Subject: [maker-devel] Maker error mpiexec Message-ID: Hi, I am trying to run maker using MPI, and I get an error I do not understand. Maker version: 2.13.6 mpiexec version: mpiexec (OpenRTE) 1.6.5 When I run ./Build status it is reported that MPI is enabled. When I run mpiexec -n 40 maker I get the following errors: [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? 
(ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) -------------------------------------------------------------------------- It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): opal_shmem_base_select failed --> Returned value -1 instead of OPAL_SUCCESS -------------------------------------------------------------------------- -------------------------------------------------------------------------- Etcetera etcetera. However: when I search for the files reported as missing I do find them, and I don't believe they are from a different version of MPI? Am I using a wrong version of MPI? Any help would be appreciated, Sincerely, Rens Holmer -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Timothy.Stitt at tgac.ac.uk Thu Aug 21 14:05:46 2014 From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC)) Date: Thu, 21 Aug 2014 20:05:46 +0000 Subject: [maker-devel] MAKER and large number of 'ps' processes Message-ID: Dear MAKER developers, One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly? Thanks in advance, Tim. ? Timothy Stitt PhD | Head of Scientific Computing +44 1603 450378 | timothy.stitt at tgac.ac.uk The Genome Analysis Centre (TGAC) Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Aug 21 14:17:22 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 21 Aug 2014 14:17:22 -0600 Subject: [maker-devel] MAKER and large number of 'ps' processes Message-ID: MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. 
Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;
        bless($self, $class);
        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Aug 21 14:21:19 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 21 Aug 2014 14:21:19 -0600
Subject: [maker-devel] Maker error mpiexec
In-Reply-To:
References:
Message-ID:

You need to make sure the same version of MPI is used to compile and run MAKER.
When installing MAKER make sure the mpi.h and mpicc indicated during configuration come from the same version of OpenMPI as the mpiexec command you are using now. Also for OpenMPI run the following command before setting up or launching MAKER --> export LD_PRELOAD=?/openmpi_location/lib/libmpi.so replace openmpi_location in the above command with the location of your OpenMPI. Setting LD_PRELOAD preload is required for OpenMPI to work correctly with shared libraries. Also you may need to add the following to your MPI command before running MAKER. --> -mca btl ^openib Example --> mpiexec -mca btl ^openib -n 40 maker Thanks, Carson From: rens holmer Date: Tuesday, August 19, 2014 at 3:19 AM To: Subject: [maker-devel] Maker error mpiexec Hi, I am trying to run maker using MPI, and I get an error I do not understand. Maker version: 2.13.6 mpiexec version: mpiexec (OpenRTE) 1.6.5 When I run ./Build status it is reported that MPI is enabled. When I run mpiexec -n 40 maker I get the following errors: [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? 
(ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25563] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [assembly:25562] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) -------------------------------------------------------------------------- It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. 
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): opal_shmem_base_select failed --> Returned value -1 instead of OPAL_SUCCESS -------------------------------------------------------------------------- -------------------------------------------------------------------------- Etcetera etcetera. However: when I search for the files reported as missing I do find them, and I don't believe they are from a different version of MPI? Am I using a wrong version of MPI? Any help would be appreciated, Sincerely, Rens Holmer _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Aug 21 14:27:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 21 Aug 2014 14:27:14 -0600 Subject: [maker-devel] MAKER and large number of 'ps' processes In-Reply-To: References: Message-ID: FYI. If you use the -nolock flag, never start MAKER more than once in the same directory. The lack of file locks means MAKER won't detect the other active process and they can end up overwriting each others output. So do any parallelization via MPI instead. Thanks, Carson From: Carson Holt Date: Thursday, August 21, 2014 at 2:17 PM To: "Timothy Stitt (TGAC)" , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. 
Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;
        bless($self, $class);
        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rens.holmer at wur.nl Fri Aug 22 04:43:20 2014
From: rens.holmer at wur.nl (rens holmer)
Date: Fri, 22 Aug 2014 12:43:20 +0200
Subject: [maker-devel] Maker error mpiexec
In-Reply-To:
References:
Message-ID:

Thank you!
export LD_PRELOAD=?/openmpi_location/lib/libmpi.so mpiexec -mca btl ^openib -n 40 maker Those two tweaks did the trick! Sincerely, Rens Holmer On Thu, Aug 21, 2014 at 10:21 PM, Carson Holt wrote: > You need to make sure the same version of MPI is used to compile and run > MAKER. When installing MAKER make sure the mpi.h and mpicc indicated > during configuration come from the same version of OpenMPI as the mpiexec > command you are using now. > > Also for OpenMPI run the following command before setting up or launching > MAKER --> > export LD_PRELOAD=?/openmpi_location/lib/libmpi.so > > replace openmpi_location in the above command with the location of your > OpenMPI. > > Setting LD_PRELOAD preload is required for OpenMPI to work correctly with > shared libraries. > > > Also you may need to add the following to your MPI command before running > MAKER. > --> -mca btl ^openib > Example --> mpiexec -mca btl ^openib -n 40 maker > > Thanks, > Carson > > > > From: rens holmer > Date: Tuesday, August 19, 2014 at 3:19 AM > To: > Subject: [maker-devel] Maker error mpiexec > > Hi, > > I am trying to run maker using MPI, and I get an error I do not understand. > > Maker version: 2.13.6 > mpiexec version: mpiexec (OpenRTE) 1.6.5 > > When I run ./Build status it is reported that MPI is enabled. > > When I run mpiexec -n 40 maker I get the following errors: > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, > or compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, > or compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing > symbol, or compiled for a different version of Open MPI? 
(ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing > symbol, or compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25563] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > [assembly:25562] mca: base: component_find: unable to open > /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or > compiled for a different version of Open MPI? (ignored) > > -------------------------------------------------------------------------- > > It looks like opal_init failed for some reason; your parallel process is > > likely to abort. 
There are many reasons that a parallel process can > > fail during opal_init; some of which are due to configuration or > > environment problems. This failure appears to be an internal failure; > > here's some additional information (which may only be relevant to an > > Open MPI developer): > > > opal_shmem_base_select failed > > --> Returned value -1 instead of OPAL_SUCCESS > > -------------------------------------------------------------------------- > > -------------------------------------------------------------------------- > > > > Etcetera etcetera. > > However: when I search for the files reported as missing I do find them, > and I don't believe they are from a different version of MPI? > > Am I using a wrong version of MPI? > > Any help would be appreciated, > > Sincerely, > > > Rens Holmer > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ranjani at uga.edu Tue Aug 26 08:53:25 2014 From: ranjani at uga.edu (Sivaranjani Namasivayam) Date: Tue, 26 Aug 2014 14:53:25 +0000 Subject: [maker-devel] MAKER run error -with blast Message-ID: <1409064805543.27602@uga.edu> Hi, I have been using MAKER for a while and its been running fine. Recently I am encountering an error (attaching the error from the error log file - error1.txt). As input I am providing the fasta file of a scaffold, a transcriptome dataset(in gff) and a protein dataset (as fasta). These kind of input files have run successfully in the past. The file that is reported as 'No such file or directory at' in the error ouptut changes in different runs. To make sure I wasn't doing something wrong, I reran a dataset that had run successfully before, but I get an error with that too. (error log attached as error2.txt). 
The only difference in this run, previously I ran it for the entire genome, and now I am testing it on just one scaffold. Would you have any idea of why this might be happening? Thanks, Ranjani -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- total clusters:6 now processing 0 prepare section files Gathering GFF3 input into hits - chunk:18 prepare section files Gathering GFF3 input into hits - chunk:19 Removing file: /lustre1/escratch1/ranjani_Jul_23/sn1_comparisons/maker_with_sn1prot/correct_uniprot_sn1prot_gff/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/scaffold00001.7.end.section.holdover ERROR: No such file or directory at /lustre1/escratch1/ranjani_Jul_23/sn1_comparisons/maker_with_sn1prot/correct_uniprot_sn1prot_gff/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/scaffold00001.7.end.section.holdover at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 4482. Process::MpiChunk::__ANON__() called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Error.pm line 408 eval {...} called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Error.pm line 407 Error::subs::try(CODE(0x12f4d0f8), HASH(0x129acca0)) called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 4491 Process::MpiChunk::retrieve("/lustre1/escratch1/ranjani_Jul_23/sn1_comparisons/maker_with_"...) 
called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 3311 Process::MpiChunk::__ANON__() called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Error.pm line 415 eval {...} called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Error.pm line 407 Error::subs::try(CODE(0x1262ed50), HASH(0x12409548)) called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 4215 Process::MpiChunk::_go(Process::MpiChunk=HASH(0x12eb0f70), "run", HASH(0x12958360), 12, 3) called at /panfs/pstor.storage/rcclocal/zcluster/maker/2.31.5-mpich2/bin/../lib/Process/MpiChunk.pm line 341 Process::MpiChunk::run(Process::MpiChunk=HASH(0x12eb0f70), 5) called at /usr/local/maker/latest/bin/maker line 979 --> rank=5, hostname=compute-6-5.local --> rank=5, hostname=compute-6-5.local ERROR: Failed while prepare section files ERROR: Chunk failed at level:12, tier_type:3 FAILED CONTIG:scaffold00001 -------------- next part -------------- STATUS: Parsing control files... STATUS: Processing and indexing input FASTA files... STATUS: Setting up database for any GFF3 input... A data structure will be created for you at: /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore To access files for individual sequences use the datastore index: /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_master_datastore_index.log STATUS: Now running MAKER... examining contents of the fasta file and run log --Next Contig-- #--------------------------------------------------------------------- Now starting the contig!! 
SeqID: scaffold00001 Length: 11360811 #--------------------------------------------------------------------- setting up GFF3 output and fasta chunks doing repeat masking doing repeat masking doing repeat masking doing repeat masking doing repeat masking doing repeat masking doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_8GDzH7; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.0.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_1sNfMC; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.6.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.1.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.2.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.5.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.3.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /lscratch/tmp/5603554.1.rcc-30d/maker_isHjoB; /panfs/pstor.storage/rcclocal/zcluster/repeatmasker/4.0.1/RepeatMasker /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0/scaffold00001.4.Alveolata.rb -species Alveolata -dir /panfs/pstor.storage/grphomes/jcklab/i_000059/Mero_strand_specific_04142014/maker_final_ssCuff_uniprotTop/retry_scf1/scaffold00001.maker.output/scaffold00001_datastore/B8/E3/scaffold00001//theVoid.scaffold00001/0 -pa 1 #-------------------------------# WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the reqWuested dUBlastSearcatabase to seahEngine:rch! :search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. 
Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! 
WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) ERROR: RepeatMasker failed --> rank=7, hostname=compute-13-7.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:scaffold00001 ERROR: RepeatMasker failed --> rank=1, hostname=compute-9-15.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. 
Retrying with larger minmatch (10) ERROR: RepeatMasker failed --> rank=3, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 ERROR: RepeatMasker failed --> rank=2, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 WUBlastSWUBlastSearchEngiearchEngine::search: FAne::searcTAL: Thh: FATere is AL: Tnothinghere i in thes noth requesing inted dat the requested database abase to seato searchrch! ! WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! WARNING: Comparison failed. Retrying with larger minmatch (10) WUBlastSearchEngine::search: FATAL: There is nothing in the requested database to search! ERROR: RepeatMasker failed --> rank=6, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 ERROR: RepeatMasker failed --> rank=5, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 ERROR: RepeatMasker failed --> rank=4, hostname=compute-8-29.local ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold00001 From carsonhh at gmail.com Tue Aug 26 09:03:28 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 Aug 2014 09:03:28 -0600 Subject: [maker-devel] MAKER run error -with blast Message-ID: Make sure you are not setting TMP= in the maker_opts.ctl file to an NFS mounted location. Also check your /tmp directory to see if it is full or nearly full (it will be mounted on a different drive than your working directory). Also if it is being caused by slow NFS response you can set clean_try=1 and it will do complete retry on the contig rather than trying to recover partial files. 
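
The two checks suggested in this reply (is the temporary directory nearly full, and is it on NFS) can be sketched in shell. This is only an illustration, not part of MAKER: the function names usage_pct and fs_type are made up here, the script assumes a POSIX shell with df, and the filesystem-type lookup assumes GNU coreutils stat as found on most Linux nodes.

```shell
#!/bin/sh
# Sketch only: inspect the temporary directory a MAKER job would use.
# Assumes a POSIX shell and df; the filesystem-type lookup additionally
# assumes GNU coreutils stat (Linux). Function names are hypothetical.

# Print the percentage of space used on the filesystem holding directory $1.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the filesystem type of directory $1 (for example nfs or tmpfs),
# or "unknown" if this stat does not support -f -c.
fs_type() {
    stat -f -c %T "$1" 2>/dev/null || echo unknown
}

dir=${TMPDIR:-/tmp}
echo "$dir: $(usage_pct "$dir")% used, filesystem type: $(fs_type "$dir")"
```

If the reported type is nfs, or the directory is close to 100% used, the settings named in the message are the ones to look at in maker_opts.ctl: keep TMP= off NFS-mounted storage, and set clean_try=1 so that a failed contig is retried from scratch.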
--Carson


From: Sivaranjani Namasivayam
Date: Tuesday, August 26, 2014 at 8:53 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER run error -with blast

Hi,

I have been using MAKER for a while and it has been running fine. Recently I have been encountering an error (attaching the error from the error log file - error1.txt). As input I am providing the FASTA file of a scaffold, a transcriptome dataset (in GFF) and a protein dataset (as FASTA). These kinds of input files have run successfully in the past. The file that is reported as 'No such file or directory at' in the error output changes between runs.

To make sure I wasn't doing something wrong, I reran a dataset that had run successfully before, but I get an error with that too (error log attached as error2.txt). The only difference in this run is that previously I ran it for the entire genome, and now I am testing it on just one scaffold.

Would you have any idea of why this might be happening?

Thanks,
Ranjani

From daniel.standage at gmail.com Tue Aug 26 09:55:40 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Tue, 26 Aug 2014 11:55:40 -0400
Subject: [maker-devel] tRNAscan GFF3
In-Reply-To:
References:
Message-ID:

Sorry for the delayed response. In the meantime, I wrote a tiny script to correct the erroneous tRNA annotations. I just now took a few minutes to download 2.31.6, and can confirm that the tRNA exon strands are consistent.

Best,
Daniel

--
Daniel S. Standage
Ph.D.
Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:49 AM, Carson Holt wrote: > I half way remember some tRNAscan bugs being fixed in several of the sub > versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I > believe and most 2.31 updates were related to tRNAscan). Current version > is 2.31.6. Could you give it a try and see if it is still giving you the > issue. > > I did a quick look through the archives and I think this was found and > fixed --> > https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-devel/Z-kvf_V2ynU/vstSNjHgyJQJ > > Thanks, > Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:36 AM > To: Carson Holt > Cc: Maker Mailing List > Subject: Re: [maker-devel] tRNAscan GFF3 > > This annotation was generated using Maker 2.31.3. > > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote: > >> It should be on the same strand. Which MAKER version are you using? >> >> --Carson >> >> >> From: Daniel Standage >> Date: Thursday, August 21, 2014 at 9:33 AM >> To: Maker Mailing List >> Subject: [maker-devel] tRNAscan GFF3 >> >> Greetings! >> >> I have a quick question about Maker's handling of tRNAscan output, >> particularly tRNAs containing introns. If I haven't missed something, it >> looks like Maker reports the second exon on the opposite strand as the >> first exon, the tRNA feature, and the gene feature? Am I reading this >> correctly? >> >> I don't think this representation makes sense. The second exon is >> complementary to the first (hence the folding), but it is not encoded on or >> transcribed from the opposite strand. Unless I've misunderstood something, >> I would suggest that the correct representation would be to have all >> features on the same strand. >> >> Thanks, >> Daniel >> >> -- >> Daniel S. Standage >> Ph.D. 
Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Aug 26 10:06:26 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 26 Aug 2014 10:06:26 -0600 Subject: [maker-devel] tRNAscan GFF3 In-Reply-To: References: Message-ID: Thanks. --Carson From: Daniel Standage Date: Tuesday, August 26, 2014 at 9:55 AM To: Carson Holt Cc: Maker Mailing List Subject: Re: [maker-devel] tRNAscan GFF3 Sorry for the delayed response. In the mean time, I wrote a tiny script to correct the erroneous tRNA annotations. I just now took a few minutes to download 2.31.6, and can confirm that the tRNA exon strands are consistent. Best, Daniel -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, Aug 21, 2014 at 11:49 AM, Carson Holt wrote: > I half way remember some tRNAscan bugs being fixed in several of the sub > versions of 2.31 (tRNAscan was only introduced as an option in 2.30 I believe > and most 2.31 updates were related to tRNAscan). Current version is 2.31.6. > Could you give it a try and see if it is still giving you the issue. > > I did a quick look through the archives and I think this was found and fixed > --> > https://groups.google.com/forum/#!searchin/maker-devel/trna$20strand/maker-dev > el/Z-kvf_V2ynU/vstSNjHgyJQJ > > Thanks, > Carson > > > From: Daniel Standage > Date: Thursday, August 21, 2014 at 9:36 AM > To: Carson Holt > Cc: Maker Mailing List > Subject: Re: [maker-devel] tRNAscan GFF3 > > This annotation was generated using Maker 2.31.3. > > > -- > Daniel S. Standage > Ph.D. 
Candidate
> Computational Genome Science Laboratory
> Indiana University
>
>
> On Thu, Aug 21, 2014 at 11:35 AM, Carson Holt wrote:
>> It should be on the same strand. Which MAKER version are you using?
>>
>> --Carson
>>
>>
>> From: Daniel Standage
>> Date: Thursday, August 21, 2014 at 9:33 AM
>> To: Maker Mailing List
>> Subject: [maker-devel] tRNAscan GFF3
>>
>> Greetings!
>>
>> I have a quick question about Maker's handling of tRNAscan output,
>> particularly tRNAs containing introns. If I haven't missed something, it
>> looks like Maker reports the second exon on the opposite strand as the first
>> exon, the tRNA feature, and the gene feature? Am I reading this correctly?
>>
>> I don't think this representation makes sense. The second exon is
>> complementary to the first (hence the folding), but it is not encoded on or
>> transcribed from the opposite strand. Unless I've misunderstood something, I
>> would suggest that the correct representation would be to have all features
>> on the same strand.
>>
>> Thanks,
>> Daniel
>>
>> --
>> Daniel S. Standage
>> Ph.D. Candidate
>> Computational Genome Science Laboratory
>> Indiana University

From Hossein.Borhan at AGR.GC.CA Wed Aug 27 09:52:54 2014
From: Hossein.Borhan at AGR.GC.CA (Borhan, Hossein)
Date: Wed, 27 Aug 2014 15:52:54 +0000
Subject: [maker-devel] non-redundant fasta and gff
Message-ID:

Hi

Is there a way to produce a fasta file and gff for a set of non-redundant genes predicted by the Maker software? Fasta-merge and gff-merge generate a file that lists the different predictions (e.g. those generated by Augustus, GeneMark, etc.) for the same gene as individual genes.

Regards

Hossein

From carsonhh at gmail.com Wed Aug 27 09:57:10 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Aug 2014 09:57:10 -0600
Subject: [maker-devel] non-redundant fasta and gff
Message-ID:

The fasta files created for augustus, snap, etc. are only for reference purposes. They are the raw ab initio predictions produced by these algorithms when run by themselves (they are match/match_part features in the GFF3 file). The files you want are the maker.transcripts.fasta and maker.proteins.fasta files. They contain the non-redundant final annotations. They are the same ones that are marked as gene/mRNA/exon/CDS features in the GFF3 file.

--Carson


On 8/27/14, 9:52 AM, "Borhan, Hossein" wrote:

>Hi
>
>Is there a way to produce a fasta file and gff for a set of non-redundant
>genes predicted by the Maker software. Fasta-merge and gff-merge generate
>a file that has different prediction (e.g generated by Augustus,
>GeneMark etc. ) for the same gene sac as as individual genes.
>
>Regards
>
>Hossein

From carsonhh at gmail.com Wed Aug 27 09:58:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 27 Aug 2014 09:58:47 -0600
Subject: [maker-devel] non-redundant fasta and gff
In-Reply-To:
References:
Message-ID:

Please see the documentation wiki for explanations of how to read and use MAKER's output.

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#MAKER.27s_Output

Thanks,
Carson


On 8/27/14, 9:57 AM, "Carson Holt" wrote:

>The fasta files created for augustus, snap, etc. are only for reference
>purposes. They are the raw ab initio prediction produced by these
>algorithms ran by themselves (they are match/match_part features in the
>GFF3 file). The file you want is the maker.transcripts.fasta and
>maker.proteins.fasta files. They contain the non-redundant final
>annotations. They are the same ones that are marked as gene/mRNA/exon/CDS
>features in the GFF3 file.
>
>--Carson
>
>
>On 8/27/14, 9:52 AM, "Borhan, Hossein" wrote:
>
>>Hi
>>
>>Is there a way to produce a fasta file and gff for a set of non-redundant
>>genes predicted by the Maker software. Fasta-merge and gff-merge generate
>>a file that has different prediction (e.g generated by Augustus,
>>GeneMark etc. ) for the same gene sac as as individual genes.
>>
>>Regards
>>
>>Hossein
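
The distinction drawn in this thread (final annotations are gene/mRNA/exon/CDS features, raw ab initio predictions are match/match_part features) can be sketched as a small shell filter over a merged GFF3 file. This is an illustration under assumptions, not a MAKER tool: the function name final_annotations and the file names in the usage note are hypothetical, and the filter looks only at the feature-type column (column 3) of tab-separated GFF3 lines.

```shell
#!/bin/sh
# Sketch only: keep the final-annotation feature types named in the thread
# (gene, mRNA, exon, CDS) and drop everything else, including the
# match/match_part lines that carry the raw ab initio predictions.
# The function name is hypothetical; input is GFF3 on standard input.
final_annotations() {
    awk -F '\t' '
        /^##FASTA/ { exit }          # stop before any embedded sequence section
        /^#/       { print; next }   # keep header and directive lines
        $3 == "gene" || $3 == "mRNA" || $3 == "exon" || $3 == "CDS"
    '
}
```

Usage would look like `final_annotations < genome.all.gff > genome.final.gff`, where both file names are placeholders. Any other feature types present in the file (UTR features, for instance) are dropped as well, so the list of types may need extending for a particular run.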