From mphoeppner at gmail.com Mon Sep 1 08:07:40 2014
From: mphoeppner at gmail.com (Marc Höppner)
Date: Mon, 1 Sep 2014 15:07:40 +0200
Subject: [maker-devel] est2genome=1 for est and altest
Message-ID: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>

Hi,

I may be wrong about this, but it seems to me that Maker will never build a gene model from EST evidence if the data set is provided as 'altest' rather than 'est'. In my case, I am annotating a plant for which there is a closely related reference genome with annotation, as well as pretty good EST data. So I supplied the EST data as 'altest', assuming that the only difference would be that the alignment parameters would be slightly more relaxed. But I found that Maker never made any gene models from that data. When I moved the EST data to 'est', it worked.

So I am not sure whether this is intended behaviour, but in my case it caught me a bit by surprise.

Regards,

Marc

From dence at genetics.utah.edu Tue Sep 2 10:32:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 2 Sep 2014 15:32:03 +0000
Subject: [maker-devel] est2genome=1 for est and altest
In-Reply-To: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
References: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>

Hi Marc,

This is a partial answer to your question. I don't know the full reason that models aren't built from altest evidence, but I do know that those sequences are aligned with tblastx (both the ESTs and the genome are translated in all six reading frames and compared at the protein level) and not with blastn with relaxed parameters. Also, the final protein and nucleotide alignments that do get made into models are made by exonerate and not by blast.

Does that help?
~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue Sep 2 11:57:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Sep 2014 10:57:56 -0600
Subject: [maker-devel] est2genome=1 for est and altest
In-Reply-To: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
References: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>

There is a reason why no altest2genome option exists in the maker_opts.ctl file. The est2genome and protein2genome options are meant only for generating rough partial models that can be used for training gene finders (they should not be used for generating final models).
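To make the options named in this thread concrete, here is a minimal shell sketch, assuming the usual `key=value #comment` layout of maker_opts.ctl. The file names (`my_ests.fasta`, `related_proteins.fasta`), the comment wording, and the helper `ctl_value` are hypothetical and made up for illustration; only the option names `est`, `altest`, `protein`, `est2genome` and `protein2genome` come from MAKER.

```shell
#!/bin/sh
# Hypothetical excerpt of a maker_opts.ctl; file names are placeholders.
ctl_file=$(mktemp)
cat > "$ctl_file" <<'EOF'
est=my_ests.fasta #ESTs from the organism being annotated
altest= #ESTs from another species (aligned with tblastx)
protein=related_proteins.fasta #protein evidence
est2genome=1 #rough models from ESTs, for training only
protein2genome=1 #rough models from proteins, for training only
EOF

# ctl_value FILE KEY: print the value of KEY, dropping the trailing comment
# and any trailing whitespace.
ctl_value() {
    sed -n "s/^$2=\([^#]*\).*/\1/p" "$1" | sed 's/[[:space:]]*$//'
}

ctl_value "$ctl_file" est2genome   # the est2genome setting
ctl_value "$ctl_file" altest       # empty line: no altest evidence supplied
```

Checking the control file this way before a run shows at a glance which evidence slot the ESTs were placed in, which is the distinction the original question turned on.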
And if you are thinking of using ESTs from another species (altest) to generate initial models for training, that is actually an analysis error. Altest alignments will be far less accurate than EST or protein alignments, so they will hurt your training. They are slower to generate than EST or protein alignments (by as much as 10-20 fold, because they are translated into all 6 reading frames). There will also be far fewer of them (6 frames of translation make the alignments more spurious, so they require higher thresholds of significance). So if the species you are using for initial training is distant enough that it must be aligned as altest via tblastx, then you should be using proteins instead, which will be widely available and more accurately aligned.

Note that both proteins and altests are aligned in amino acid space, so you can expect the alignments to span anywhere from several million to hundreds of millions of years of divergence, and the species you use is not expected to be closely related (so whole proteomes will be available from a number of sources, and they will be far more accurate than any altest alignment).

The only real benefit of altest is to provide evidence of lineage-specific genes for organisms where there are no species in the same branch or phylum to get protein evidence from. There will only be a handful of these genes, and they can be picked up in any later bootstrap training steps, which will not involve est2genome or protein2genome models. You should use protein2genome models for the initial training and only use altest for bootstrap training or for your final models.

Thanks,
Carson

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 06:38:16 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 11:38:16 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan). Running MAKER, though, I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains:
  /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
  /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
  /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib
  /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
  /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
  .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.
---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging, which would cause them to build up over time. You can try running MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;
        bless($self, $class);
        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large numbers of 'ps' processes that are overwhelming the system and making it unusable for other users.

Can you confirm that MAKER would be generating this behaviour and, if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.
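One way to confirm the build-up described in this exchange is to count how many processes named `ps` exist at a given moment. The following is a minimal sketch, assuming a `ps` that accepts the POSIX options `-e -o comm=`; the helper name `count_named` is hypothetical and not part of MAKER.

```shell
#!/bin/sh
# count_named NAME: read one command name per line on stdin and print how
# many lines are exactly NAME. grep -c exits non-zero on a zero count, so
# "|| true" keeps the helper usable under "set -e".
count_named() {
    grep -c -x -F -- "$1" || true
}

# Usage (not run here): sample the process table a few times while MAKER runs.
#   ps -e -o comm= | count_named ps
```

The `ps` doing the listing typically counts itself, so a steadily growing number across samples, rather than any single reading, is what would point to hanging calls.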
--
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Thu Sep 4 09:22:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 08:22:08 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there, then you may need to manually delete both before attempting to reinstall.

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From carsonhh at gmail.com Thu Sep 4 09:25:31 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 08:25:31 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson
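The directory check suggested in this thread (look for ./Proc/ProcessTable.pm and ./auto/Proc/ProcessTable/ under each library path) can be done in one pass. This is a minimal sketch; the helper name `check_module_dirs` is hypothetical, and the paths in the usage comment are placeholders for the entries printed in the @INC list.

```shell
#!/bin/sh
# check_module_dirs DIR...: for each library directory, report whether the
# module file and its auto/ directory are present.
check_module_dirs() {
    for dir in "$@"; do
        pm="missing"
        auto="missing"
        [ -f "$dir/Proc/ProcessTable.pm" ] && pm="found"
        [ -d "$dir/auto/Proc/ProcessTable" ] && auto="found"
        printf '%s\tProcessTable.pm=%s\tauto/=%s\n' "$dir" "$pm" "$auto"
    done
}

# Usage (placeholder paths; substitute the directories from @INC):
#   check_module_dirs /path/to/Perl/site/lib /path/to/maker/lib
```

A directory that reports both as missing everywhere would suggest the module was installed outside the paths MAKER searches; more than one directory reporting "found" would suggest competing installs.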
From bmoore at genetics.utah.edu Thu Sep 4 12:39:39 2014
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 Sep 2014 17:39:39 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
Message-ID: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>

Hi Anindyajit,

I'm forwarding your message along to the maker mailing list and devel team...

B

On Sep 4, 2014, at 8:37 AM, Anindyajit Banerjee wrote:

> Hi
>
> I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for further input into EVM. However, I am encountering an error while doing so. Could you suggest any possible way to do this? I hereby attach a test output file from fgenesh for your understanding.
>
> Please help
> --
> Regards,
>
> Anindyajit Banerjee
> Mobile: +919883333000.

From dence at genetics.utah.edu Thu Sep 4 12:44:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 4 Sep 2014 17:44:47 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
In-Reply-To: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>
References: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>

Hi Anindyajit,

It doesn't look like the error output that you sent to Barry was forwarded with your message. Can you send that again?

Thanks,
Daniel

From MEC at stowers.org Thu Sep 4 13:10:14 2014
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 4 Sep 2014 18:10:14 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
In-Reply-To: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>
References: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>

Hi,

I'm not sure what maker offers in this regard. It's been some time since I've used it. Anyway, if it helps, some time ago I wrote a quick fgenesh2gff using BioPerl. It is provided here; you need a BioPerl installation.

http://bio.perl.org/pipermail/bioperl-l/2006-July/022061.html

~Malcolm Cook

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 13:45:15 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 18:45:15 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson.

I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' output below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated.
I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics:

#142 my $obj = new Proc::ProcessTable_simple;
#143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the method that is being called in line #143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks,

Tim.
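The question of whether the object's class has the method at all can be checked without much Perl knowledge by searching the installed module source for a definition. This is a minimal sketch, assuming the method, if present, is declared as an ordinary `sub get_proc_by_id` in a `.pm` file; the helper `defines_sub` is hypothetical. As a side observation, the missing file in the error, `get_proc_by.al`, is `get_proc_by_id` cut to 11 characters, which would be consistent with Perl trying to autoload a method the class does not define, though nothing in the thread confirms this.

```shell
#!/bin/sh
# defines_sub FILE NAME: succeed if FILE has a line declaring "sub NAME",
# followed by whitespace, an opening brace, or end of line.
defines_sub() {
    grep -q -E "^[[:space:]]*sub[[:space:]]+$2([[:space:]]|\{|\$)" "$1"
}

# Usage (placeholder paths; point these at the installed files):
#   defines_sub /path/to/Proc/ProcessTable.pm get_proc_by_id && echo defined
#   defines_sub /path/to/maker/lib/Proc/ProcessTable_simple.pm get_proc_by_id && echo defined
```

A text search like this cannot see methods that are generated at run time or inherited, so a negative result is a hint rather than proof.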
--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt > Date: Thursday, 4 September 2014 15:25 To: Timothy Stitt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first. --Carson From: Carson Holt > Date: Thursday, September 4, 2014 at 8:22 AM To: "Timothy Stitt (TGAC)" >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes The error means Proc:ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall. /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib --Carson From: "Timothy Stitt (TGAC)" > Date: Thursday, September 4, 2014 at 5:38 AM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Hi Carson, I tried the ?nolock option and it didn't have much effect. I then installed Proc:ProcessTable (which built successfully via cpan). 
Running MAKER though I get the following error: Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143. I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50. Thanks in advance for any further help with this. Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt > Date: Thursday, 21 August 2014 21:17 To: Timothy Stitt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this --> sub new { if($PS){ my $self = {}; my $class = shift; bless($self, $class); return $self; } else{ eval 'require Proc::ProcessTable'; return Proc::ProcessTable->new(@_); } } to this --> sub new { eval 'require Proc::ProcessTable'; return Proc::ProcessTable->new(@_); } This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. 
Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems. --Carson From: "Timothy Stitt (TGAC)" > Date: Thursday, August 21, 2014 at 2:05 PM To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] MAKER and large number of 'ps' processes Dear MAKER developers, One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly? Thanks in advance, Tim. ? Timothy Stitt PhD | Head of Scientific Computing +44 1603 450378 | timothy.stitt at tgac.ac.uk The Genome Analysis Centre (TGAC) Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Sep 4 13:52:20 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 04 Sep 2014 12:52:20 -0600 Subject: [maker-devel] MAKER and large number of 'ps' processes In-Reply-To: References: Message-ID: Try changing --> eval 'require Proc::ProcessTable'; to --> use Proc::ProcessTable; in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces the perls import method to run incase explicitly exports something for it to function properly. --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 12:45 PM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Thanks Carson. 
I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated. I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics: #142 my $obj = new Proc::ProcessTable_simple; #143 return $obj->get_proc_by_id($id); Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line $143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok? > make test make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/process.t .. -------------------------------- uid: 10344 gid: 11995 ? cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50 t/process.t .. ok All tests successful. Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU) Result: PASS make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' No tests defined for Proc::ProcessTable::Process extension. make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' Thanks, Tim. 
---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 15:25
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson

From: Carson Holt
Date: Thursday, September 4, 2014 at 8:22 AM
To: "Timothy Stitt (TGAC)", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there, then you may need to manually delete both before attempting to reinstall.

    /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
    /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan).
Running MAKER though I get the following error:

    Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging, which would cause them to build up over time. You can try to run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;
            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing.
Also, you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,
Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From bmoore at genetics.utah.edu Thu Sep 4 14:01:57 2014
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 Sep 2014 19:01:57 +0000
Subject: [maker-devel] Fwd: Fgenesh output to gff3 conversion
Message-ID: <77D4D576-9BAC-478D-8A0F-492225D71637@genetics.utah.edu>

Attached is the document that Anindyajit sent with his original question.

B

Begin forwarded message:

From: Anindyajit Banerjee
Subject: Fgenesh output to gff3 conversion
Date: September 4, 2014 at 8:37:26 AM MDT

Hi

I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for further input into EVM. However, I am encountering an error while doing so.
Could you suggest any possible way to do so? I hereby attach a test fgenesh output file for your understanding.

Please help

--
Regards,
Anindyajit Banerjee
Mobile: +919883333000.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fgenesh_output_test
Type: application/octet-stream
Size: 199696 bytes
Desc: fgenesh_output_test

From carsonhh at gmail.com Thu Sep 4 14:06:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 13:06:28 -0600
Subject: [maker-devel] Fgenesh output to gff3 conversion

MAKER can't convert the existing output, but you could use MAKER to run FGENESH for you instead; the results would then be in GFF3.

--Carson

On 9/4/14, 11:39 AM, "Barry Moore" wrote:

>Hi Anindyajit,
>
>I'm forwarding your message along to the maker mailing list and devel team...
>
>B

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 14:24:06 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 19:24:06 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either.
I'm building afresh each time and then just running 'maker -h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers.

Cheers,
Tim.
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Thu Sep 4 14:42:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 13:42:06 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes

I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the system's 'ps'. Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one -->

    sub get_proc_by_id {
        my $id = shift;
        my $select;

        my $obj = new Proc::ProcessTable_simple;
        if(ref($obj) eq "Proc::ProcessTable"){
            my ($p) = grep {$_->pid eq $id} @{$obj->table};
            return $p;
        }
        else{
            return $obj->get_proc_by_id($id);
        }
    }

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 1:24 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker -h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers.

Cheers,
Tim.
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From nguyenan at mail.nih.gov Fri Sep 5 09:43:19 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Fri, 5 Sep 2014 14:43:19 +0000
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17

Hi,

I finished running MAKER as suggested above. Then I ran gff3_merge.pl to retrieve only the MAKER annotation using the -n -g options. I called the output file maker.gff3.

In maker.gff3 I found some invalid data (it does not conform to the GFF3 format), e.g.

    ###
    2 +
    ###

OR

    ###
    .Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
    ###

OR some gene (or mRNA) IDs are not unique. This means they can be found multiple times with different values within maker.gff3.

How could this happen? As I understand it, mRNA IDs in a .gff3 file must be unique.

Thanks
Anh-Dao

On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" wrote:

>Today's Topics:
>
>   1.
Re: Maker_opts.ctl (Carson Holt)
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 18 Jul 2014 11:04:09 -0600
>From: Carson Holt
>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]", Daniel Ence
>Cc: "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] Maker_opts.ctl
>Content-Type: text/plain; charset="UTF-8"
>
>It should just be 'fgenesh'. If it's not there you can still just give the GFF3.
>
>--Carson
>
>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>
>>I am not sure which fgenesh executable file should I use.
>>
>>fgenesh= #location of fgenesh executable
>>
>>When I run FGENESH++, I need to run the run_pipe.pl script. Sure you need to specify a list of other executable programs (such as ppd, ppdn+, etc)
>>
>>Anh-Dao
>>
>>On 7/16/14 3:32 PM, "Carson Holt" wrote:
>>
>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your previous run. Then provide the RepeatModeler file to rmlib=
>>>
>>>--Carson
>>>
>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>
>>>>By default, model_org=all. Can I use the de novo repeat library predicted by RepeatModeler for the rmlib option?
>>>>
>>>>Anh-Dao
>>>>
>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote:
>>>>
>>>>>No. You can provide both to MAKER. The options are model_org= and rmlib=. By letting MAKER handle repeat masking it will differentiate repeat types and use soft masking for some and hard masking for others. This increases sensitivity of evidence alignments while still maintaining specificity.
>>>>>
>>>>>--Carson
>>>>>
>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>
>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the parameter files for Augustus.
>>>>>>I could also run RepeatMasker inside of MAKER.
>>>>>>However, I ran RM using two options: -lib (de novo) and -species (known). I got ~ 45% repeats via de novo and ~ 4% repeats via known options. As I understood, RM inside of MAKER uses only RepBase repeat library and RepeatRunner protein database.
>>>>>>
>>>>>>Anh-Dao
>>>>>>
>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote:
>>>>>>
>>>>>>>When you ran Augustus separately, it should have created the parameters needed to run it. Now you should be able to run it inside of MAKER using the species name you just created.
>>>>>>>
>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather than giving it the results as GFF3.
>>>>>>>
>>>>>>>--Carson
>>>>>>>
>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>>>
>>>>>>>>Thanks Daniel for your quick response.
>>>>>>>>
>>>>>>>>I did not use the parameter file of other organism when running Augustus.
>>>>>>>>I created the parameter file for the genome following their instructions.
>>>>>>>>There were multiple steps to train and run Augustus (Creating gene structures for training AUGUSTUS with CEGMA => parameter file will be created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences; Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.)
>>>>>>>>As I mentioned the reason why I ran Augustus separately, because Augustus has not trained that genome (no parameter file exists). Otherwise I would run Augustus inside MAKER.
>>>>>>>>
>>>>>>>>You suggested to use rm_gff option to specify RepeatMasker output (sure I will convert them to .gff3 formatted files). Can I submit two RM .gff3 files, separated by comma?
>>>>>>>>
>>>>>>>>Anh-Dao
>>>>>>>>
>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" wrote:
>>>>>>>>
>>>>>>>>>Hi Anh-Dao,
>>>>>>>>>
>>>>>>>>>In the maker_opts.ctl file, there are options for est and protein evidence. You'll put all of your fasta est files together in a comma-separated list in the "est" option, and all of your fasta protein files in a comma-separated list for the "protein" option.
>>>>>>>>>
>>>>>>>>>You'll specify the SNAP and Genemark files in their respective options in the control file and pass the augustus and fgenesh predictions in the "pred_gff" option.
>>>>>>>>>
>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give it to maker with the "rm_gff" option.
>>>>>>>>>
>>>>>>>>>If you've converted the cufflinks output to gff3, you can give it to maker with the "est_gff" option. I'm pretty sure Trinity only gives fasta output, so you would put that in the "est" option, along with all the other est fasta files.
>>>>>>>>>
>>>>>>>>>If Augustus isn't trained for your particular organism, then you can use another organism that augustus is already trained for. The list of species that augustus has parameter files for is in the README.txt that came with Augustus. I really recommend that you run Augustus from inside maker, because then you get all the benefits of maker passing est-based hints to augustus at runtime, which can really improve Augustus' predictive ability.
>>>>>>>>>
>>>>>>>>>When you ran the augustus gene prediction separately, did you use another organism's parameter file?
>>>>>>>>>
>>>>>>>>>Thanks,
>>>>>>>>>Daniel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C]
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to conduct a genome annotation and have the
>>>>>>>>>>following
>>>>>>>>>>data:
>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species
>>>>>>>>>>options)
>>>>>>>>>> - ESTs and RACE (fasta)
>>>>>>>>>> - proteins (fasta)
>>>>>>>>>> - proteins of related organisms (fasta)
>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert
>>>>>>>>>>to
>>>>>>>>>>ZFF
>>>>>>>>>>format, etc. )
>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to
>>>>>>>>>>convert
>>>>>>>>>>the outputs to .gff3 files. The reason why I ran Augustus gene
>>>>>>>>>>prediction separately, because the genome has never been trained
>>>>>>>>>>for
>>>>>>>>>>Augustus.
>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>>>>>>
>>>>>>>>>> Could you please let me know how can I specify parameters in the
>>>>>>>>>>maker_opts.ctl file?
>>>>>>>>>> Or do you have other suggestions to re-do the data listed above?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Anh-Dao
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> maker-devel mailing list
>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>
>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>_______________________________________________
>>>>>>>>maker-devel mailing list
>>>>>>>>maker-devel at box290.bluehost.com
>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
>
>
>
>------------------------------
>
>Subject: Digest Footer
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>------------------------------
>
>End of maker-devel Digest, Vol 74, Issue 17
>*******************************************

From carsonhh at gmail.com Fri Sep 5 10:37:02 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 05 Sep 2014 09:37:02 -0600
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17
Message-ID:

The partial lines are symptoms of writing data to a slow NFS-mounted drive.
If NFS can't get a response for a write operation, it returns success (even
though the write wasn't really successful) and then continues to wait for
the operation to really complete. This is called asynchronous writing. It
improves performance by optimistically returning success on all operations
rather than waiting to see if each operation really succeeded. If you have a
slow or overloaded NFS mount, though, you can get a number of failures and
never any indication that they failed, except for the fact that some files
are missing content or lines are partial.

When this happens, you need to run MAKER with the -a flag on fewer CPUs to
rebuild the GFF3 files. Fewer CPUs reduces the IO burden. Or, if you can
find which contigs have partial GFF3 lines, you can delete just those along
with the datastore index log file and then launch MAKER without any flags to
let it recompute just those contigs.

Another possible cause is also NFS related. If you are running MAKER
multiple times in the same working directory, and a slow NFS mount doesn't
allow MAKER to properly lock files, then two MAKER jobs can try to compute
the same contig simultaneously. Simultaneous writing of files can then cause
IDs to be duplicated and some lines to be munged, as lines from one process
arrive in the file in the middle of lines from another process (creating a
jumble of characters and partial lines). If this is the case, start a single
MAKER job on fewer CPUs using the -a flag to rebuild the GFF3 files.

Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
passing in GFF3 files with already assigned IDs (that may be used
elsewhere). Are you using GFF3 pass-through?

Features that will not have unique ID= tags are CDS, three_prime_UTR, and
five_prime_UTR features (these are considered non-continuous features
because of the shared ID across lines). You can see examples here -->
http://www.sequenceontology.org/gff3.shtml

Also, Name= attributes are not required to be unique.

Thanks,
Carson

On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:

>Hi,
>
>I finished running MAKER as suggested above.
>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g
>options. I called the output file maker.gff3
>
>In the maker.gff3 I found some invalid data (does not conform .gff3
>format), e.g.
>
>###
>2 +
>###
>
>OR
>
>###
>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
>###
>
>OR some gene (or mRNA) IDs are not uniq. This means they can be found
>multiple times with different values within the maker.gff3
>
>How could it happen? As I understood, mRNA IDs in a .gff3 file must be
>uniq.
>
>Thanks
>Anh-Dao
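
The two symptoms discussed in this exchange, feature lines that are not valid GFF3 and gene/mRNA IDs that occur more than once, can be checked for mechanically before deciding which contigs to recompute. What follows is a minimal sketch in Python, not part of MAKER: the function name `check_gff3` is hypothetical, and treating any feature line without exactly nine tab-separated columns as malformed is an assumption made for illustration. IDs on CDS and UTR features are skipped because, as noted in the reply, those may legitimately repeat across lines.

```python
from collections import Counter

# Feature types whose ID= may legitimately be shared across several lines.
SHARED_ID_TYPES = {"cds", "three_prime_utr", "five_prime_utr"}


def check_gff3(lines):
    """Scan GFF3 lines and return (malformed, duplicates).

    malformed  -- list of (line_number, text) for feature lines that do not
                  have exactly nine tab-separated columns
    duplicates -- sorted list of ID= values used on more than one line,
                  ignoring CDS/UTR features
    """
    malformed = []
    id_counts = Counter()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments, and '###' separators
        columns = line.split("\t")
        if len(columns) != 9:
            malformed.append((number, line))
            continue
        if columns[2].lower() in SHARED_ID_TYPES:
            continue
        for attribute in columns[8].split(";"):
            if attribute.startswith("ID="):
                id_counts[attribute[3:]] += 1
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    return malformed, duplicates
```

Passing an open file handle for a merged file such as maker.gff3 to `check_gff3` yields the line numbers of partial lines and the list of repeated IDs. The check is deliberately shallow: it does not validate coordinates, strands, or Parent= relationships.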

From Timothy.Stitt at tgac.ac.uk Fri Sep 5 02:58:59 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Fri, 5 Sep 2014 07:58:59 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Thanks Carson. That seemed to do the trick! I'm now running our large case
again and the 'ps' processes are definitely suppressed. On a very small test
it looked like this new version completed quicker as well.

I assume you would expect better performance from avoiding use of 'ps' and
directly accessing the process table?
Are there any disadvantages to this approach that explain why it isn't the
default in the code?

Much appreciated,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/

p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 20:42
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

I think I found what to do to get around the issue, since you are trying to
force the use of 'Proc::ProcessTable' instead of using the system's 'ps'.
Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with
the following one -->

sub get_proc_by_id {
    my $id = shift;
    my $select;

    my $obj = new Proc::ProcessTable_simple;
    if(ref($obj) eq "Proc::ProcessTable"){
        my ($p) = grep {$_->pid eq $id} @{$obj->table};
        return $p;
    }
    else{
        return $obj->get_proc_by_id($id);
    }
}

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 1:24 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either. I'm building afresh each time
and then just running 'maker -h' and the error appears. I meant to say I'm
using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly?
We are using 128 MPI processes for a large MAKER run and the 'ps' processes
are overloading our servers.

Cheers,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/

p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 19:52
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Try changing -->

    eval 'require Proc::ProcessTable';

to -->

    use Proc::ProcessTable;

in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces Perl's
import method to run, in case the module explicitly exports something it
needs to function properly.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 12:45 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson.

I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and
v0.48). In each case they compiled successfully. I've copied snippets of the
'make test' below to confirm. I've scoured the source and build directories
and don't see the .al files. Nothing seems to indicate that they are
generated.

I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the
MAKER source according to the diagnostics:

    #142 my $obj = new Proc::ProcessTable_simple;
    #143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the
attribute that is being referenced in line #143? I'm not a Perl expert so
just throwing out ideas here. If not, how do I get the *.al files to be
generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/

p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 15:25
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think
that is the issue, but I'd try checking the directories and installation
locations first.

--Carson

From: Carson Holt
Date: Thursday, September 4, 2014 at 8:22 AM
To: "Timothy Stitt (TGAC)", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly.
Any *.al files should be created during installation of Proc::ProcessTable.
Go through these directories one at a time and check for the existence of
./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not
there, then you installed Proc::ProcessTable somewhere else and you need to
see what is wrong with your CPAN configuration. If they are there, then you
may need to manually delete both before attempting to reinstall.

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed
Proc::ProcessTable (which built successfully via cpan). Running MAKER though
I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains:
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm
line 143.

I looked within the directories of the ProcessTable build but I don't see
the get_proc_by.al file. Should I be using an older version of ProcessTable?
The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/

p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure
they haven't failed or become zombies. On your system these 'ps' calls may
be hanging which would cause them to build up over time. You can try and run
MAKER with the '-nolock' flag, since it is the NFS file locking that
requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and
change it as follows. Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;

        bless($self, $class);
        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but it
may experience the same hang as 'ps' is experiencing. Also you will need to
install 'Proc::ProcessTable' via CPAN for it to work, and that particular
module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000
system (with over 2000 cores) and the application appears to be generating
large amounts of 'ps' processes that are overwhelming the system and causing
the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is
there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

--
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk

The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Sep 5 10:17:45 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 05 Sep 2014 09:17:45 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

I'm glad the workaround is working for you.

Proc::ProcessTable being faster than 'ps' is actually very atypical. It is
likely there is an issue with your system, which is suggested by the fact
that 'ps' is hanging and accumulating processes; that is also very atypical
('ps' should return in a fraction of a second). We actually switched from
Proc::ProcessTable to 'ps' some time ago because 'ps' is several-fold
faster, and Proc::ProcessTable won't compile on about 10-15% of system
architectures.

Thanks,
Carson

From: "Timothy Stitt (TGAC)"
Date: Friday, September 5, 2014 at 1:58 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson. That seemed to do the trick! I'm now running our large case
again and the 'ps' processes are definitely suppressed. On a very small test
it looked like this new version completed quicker as well.

I assume you would expect better performance from avoiding use of 'ps' and
directly accessing the process table? Are there any disadvantages to this
approach which is why it isn't default in the code?

Much appreciated,

Tim.
--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 4 September 2014 20:42 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the systems 'ps'. Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one --> sub get_proc_by_id { my $id = shift; my $select; my $obj = new Proc::ProcessTable_simple; if(ref($obj) eq "Proc::ProcessTable"){ my ($p) = grep {$_->pid eq $id} @{$obj->table}; return $p; } else{ return $obj->get_proc_by_id($id); } } --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 1:24 PM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker ?h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference. Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers. Cheers, Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 4 September 2014 19:52 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Try changing --> eval 'require Proc::ProcessTable'; to --> use Proc::ProcessTable; in .../maker/lib/Proc/ProcessTable_simple.pm. 
That way it forces the perls import method to run incase explicitly exports something for it to function properly. --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 12:45 PM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Thanks Carson. I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated. I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics: #142 my $obj = new Proc::ProcessTable_simple; #143 return $obj->get_proc_by_id($id); Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line $143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok? > make test make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/process.t .. -------------------------------- uid: 10344 gid: 11995 ? cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50 t/process.t .. ok All tests successful. 
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU) Result: PASS make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' No tests defined for Proc::ProcessTable::Process extension. make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' Thanks, Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 4 September 2014 15:25 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first. --Carson From: Carson Holt Date: Thursday, September 4, 2014 at 8:22 AM To: "Timothy Stitt (TGAC)" , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes The error means Proc:ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall. 
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 5:38 AM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Hi Carson, I tried the ?nolock option and it didn't have much effect. I then installed Proc:ProcessTable (which built successfully via cpan). Running MAKER though I get the following error: Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal. pm line 143. I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50. Thanks in advance for any further help with this. Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 21 August 2014 21:17 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. 
You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows.

Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;

            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large numbers of 'ps' processes that are overwhelming the system and making it unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

--
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From nguyenan at mail.nih.gov Fri Sep 5 11:08:50 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Fri, 5 Sep 2014 16:08:50 +0000
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17
In-Reply-To:
References:
Message-ID:

Thanks Carson. I ran MAKER on 30 CPUs. I will re-run it using 10 CPUs.

>
>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
>passing in GFF3 files with already assigned IDs (that may be used
>elsewhere). Are you using GFF3 pass-through?
>

I submitted est_gff=cufflinks.gff3 and pred_gff=fgenesh.gff3 when running MAKER. However, I got 4 repeated mRNA ids as follows:

augustus_masked-c206700011.Contig3-processed-gene-0.3
augustus_masked-c206700011.Contig3-processed-gene-0.3-mRNA-1
snap_masked-c206500027.Contig3-processed-gene-0.26
snap_masked-c206500027.Contig3-processed-gene-0.26-mRNA-1

Anh-Dao

From Brian.Mack at ARS.USDA.GOV Mon Sep 8 08:47:01 2014
From: Brian.Mack at ARS.USDA.GOV (Mack, Brian)
Date: Mon, 8 Sep 2014 13:47:01 +0000
Subject: [maker-devel] non-overlapping predictions
Message-ID:

Hi,

I used IPRscan on the non-overlapping ab initio proteins and identified additional predictions that I want to include in my final gff. I was trying to follow the advice given in this thread (http://gmod.827538.n3.nabble.com/Adding-non-overlapping-models-to-final-set-td4043778.html) to pull out these predictions from the full Maker gff3 that includes all the evidence, but it seems that none of the non-overlapping predictions are in this gff3 file. Where would I find all of the predictions including the non-overlapping predictions?

Thanks,
Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties.
If you believe you have received this message in error, please notify the sender and delete the email immediately.

From ranjani at uga.edu Tue Sep 9 12:14:09 2014
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 9 Sep 2014 17:14:09 +0000
Subject: [maker-devel] Non-canonical splice junctions
Message-ID: <1410282848765.20893@uga.edu>

Hi,

Is it possible to force MAKER to predict gene models only with canonical splice sites? Or is there a quick way to identify gene models with non-canonical splice sites?

Thanks,
Ranjani

From carsonhh at gmail.com Tue Sep 9 17:09:13 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 09 Sep 2014 16:09:13 -0600
Subject: [maker-devel] non-overlapping predictions
Message-ID:

It's a naming issue. The reference match/match_part features have 'abinit' in the name while the non-overlapping fasta file has 'processed' in the name of the fasta header. The easiest way to fix it is to just replace 'processed' with 'abinit' in the terms you are searching for. This was supposed to be resolved already, but I'll see what's going on. What version of MAKER are you using?

--Carson

From: "Mack, Brian"
Date: Monday, September 8, 2014 at 7:47 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] non-overlapping predictions

Hi,

I used IPRscan on the non-overlapping ab initio proteins and identified additional predictions that I want to include in my final gff. I was trying to follow the advice given in this thread (http://gmod.827538.n3.nabble.com/Adding-non-overlapping-models-to-final-set-td4043778.html) to pull out these predictions from the full Maker gff3 that includes all the evidence, but it seems that none of the non-overlapping predictions are in this gff3 file. Where would I find all of the predictions including the non-overlapping predictions?
Thanks,
Brian

From nguyenan at mail.nih.gov Thu Sep 18 06:49:45 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Thu, 18 Sep 2014 11:49:45 +0000
Subject: [maker-devel] CPUs problems
Message-ID:

I re-ran maker on 10 CPUs. The maker job finished after 10 days. I checked the log file and found these errors:

Processing run.log file...
examining contents of the fasta file and run log
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

Can you let me know how I can fix the problem?

Thanks
Anh-Dao

On 9/5/14 11:37 AM, "Carson Holt" wrote:

>The partial lines are symptoms of writing data to a slow NFS mounted
>drive. If NFS can't get a response for a write operation, it returns
>success (even though it wasn't really successful) and then continues to
>wait for the operation to really complete. This is called asynchronous
>writing. It improves performance by optimistically returning success on
>all operations rather than waiting to see if the operation really
>succeeded. If you have a slow or overloaded NFS mount though, you can get
>a number of failures and never any indication that they failed except for
>the fact that some files are missing content or lines are partial.
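The symptoms discussed in this thread (partial lines and repeated gene/mRNA IDs in a merged GFF3 file) can be looked for with a short script. The following is only a minimal sketch and not part of MAKER: the function name `check_gff3` is hypothetical, it assumes every feature line should have the nine tab-separated columns required by GFF3, and it only counts ID= values on gene and mRNA rows, since CDS and UTR rows may legitimately share an ID.

```python
from collections import Counter


def check_gff3(lines):
    """Return (malformed_line_numbers, duplicated_gene_or_mrna_ids)."""
    malformed = []
    id_counts = Counter()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments, and '###' separators
        columns = line.split("\t")
        if len(columns) != 9:
            malformed.append(number)  # truncated or jumbled record
            continue
        if columns[2] not in ("gene", "mRNA"):
            continue  # CDS/UTR rows may legitimately share an ID
        for attribute in columns[8].split(";"):
            if attribute.startswith("ID="):
                id_counts[attribute[3:]] += 1
    duplicated = sorted(name for name, count in id_counts.items() if count > 1)
    return malformed, duplicated
```

Passing an open file handle (for example the maker.gff3 file mentioned in this thread) as `lines` returns the line numbers worth inspecting by hand and the IDs that occur more than once.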
>
>When this happens, you need to run MAKER with the -a flag on fewer CPUs to
>rebuild the GFF3 files. Fewer CPUs reduces the IO burden. Or if you can
>find which contigs have partial GFF3 lines, you can delete just those
>along with the datastore index log file and then launch maker without any
>flags to let it recompute just those contigs.
>
>Another possible cause is also NFS related. If you are running MAKER
>multiple times in the same working directory, and a slow NFS mount doesn't
>allow maker to properly lock files, then two maker jobs can try and
>compute the same contig simultaneously. Simultaneous writing of files can
>then cause IDs to be duplicated and some lines to be munged as lines from
>one process arrive to the file in the middle of lines from another process
>(creating a jumble of characters and partial lines). Start a single maker
>job on fewer cpus using the -a flag to rebuild the GFF3 files if this is
>the case.
>
>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
>passing in GFF3 files with already assigned IDs (that may be used
>elsewhere). Are you using GFF3 pass-through?
>
>Features that will not have unique ID= tags are CDS, three_prime_utr, and
>five_prime_utr features (these are considered non-continuous features
>because of the shared ID across lines).
>You can see examples here --> http://www.sequenceontology.org/gff3.shtml
>
>Also Name= attributes are not required to be unique.
>
>Thanks,
>Carson
>
>
>On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>
>>Hi,
>>
>>I finished running MAKER as suggested above.
>>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g
>>options. I called the output file maker.gff3
>>
>>In the maker.gff3 I found some invalid data (does not conform .gff3
>>format), e.g.
>>
>>###
>>2 +
>>###
>>
>>OR
>>
>>###
>>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
>>###
>>
>>OR some gene (or mRNA) IDs are not unique. This means they can be found
>>multiple times with different values within the maker.gff3
>>
>>How could it happen? As I understood, mRNA IDs in a .gff3 file must be
>>unique.
>>
>>Thanks
>>Anh-Dao
>>
>>
>>On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" wrote:
>>
>>>Send maker-devel mailing list submissions to
>>> maker-devel at yandell-lab.org
>>>
>>>To subscribe or unsubscribe via the World Wide Web, visit
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>>or, via email, send a message with subject or body 'help' to
>>> maker-devel-request at yandell-lab.org
>>>
>>>You can reach the person managing the list at
>>> maker-devel-owner at yandell-lab.org
>>>
>>>When replying, please edit your Subject line so it is more specific
>>>than "Re: Contents of maker-devel digest..."
>>>
>>>
>>>Today's Topics:
>>>
>>> 1. Re: Maker_opts.ctl (Carson Holt)
>>>
>>>
>>>----------------------------------------------------------------------
>>>
>>>Message: 1
>>>Date: Fri, 18 Jul 2014 11:04:09 -0600
>>>From: Carson Holt
>>>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]", Daniel Ence
>>>Cc: "maker-devel at yandell-lab.org"
>>>Subject: Re: [maker-devel] Maker_opts.ctl
>>>Message-ID:
>>>Content-Type: text/plain; charset="UTF-8"
>>>
>>>It should just be 'fgenesh'. If it's not there you can still just give
>>>the GFF3.
>>>
>>>--Carson
>>>
>>>
>>>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>
>>>>I am not sure which fgenesh executable file should I use.
>>>>
>>>>fgenesh= #location of fgenesh executable
>>>>
>>>>When I run FGENESH++, I need to run the run_pipe.pl script.
>>>>Sure you need to specify a list of other executable programs (such as
>>>>ppd, ppdn+, etc)
>>>>
>>>>Anh-Dao
>>>>
>>>>
>>>>On 7/16/14 3:32 PM, "Carson Holt" wrote:
>>>>
>>>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your
>>>>>previous run. Then provide the RepeatModeler file to rmlib=
>>>>>
>>>>>--Carson
>>>>>
>>>>>
>>>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>
>>>>>>By default, model_org=all. Can I use the de novo repeat library
>>>>>>predicted by RepeatModeler for the rmlib option?
>>>>>>
>>>>>>Anh-Dao
>>>>>>
>>>>>>
>>>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote:
>>>>>>
>>>>>>>No. You can provide both to MAKER. The options are model_org= and
>>>>>>>rmlib=.
>>>>>>> By letting MAKER handle repeat masking it will differentiate repeat
>>>>>>>types and use soft masking for some and hard masking for others. This
>>>>>>>increases sensitivity of evidence alignments while still maintaining
>>>>>>>specificity.
>>>>>>>
>>>>>>>--Carson
>>>>>>>
>>>>>>>
>>>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>>>
>>>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the
>>>>>>>>parameter files for Augustus.
>>>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM
>>>>>>>>using two options: -lib (de novo) and -species (known). I got ~ 45%
>>>>>>>>repeats via de novo and ~ 4% repeats via known options. As I
>>>>>>>>understood, RM inside of MAKER uses only RepBase repeat library and
>>>>>>>>RepeatRunner protein database.
>>>>>>>>
>>>>>>>>Anh-Dao
>>>>>>>>
>>>>>>>>
>>>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote:
>>>>>>>>
>>>>>>>>>When you ran Augustus separately, it should have created the
>>>>>>>>>parameters needed to run it. Now you should be able to run it
>>>>>>>>>inside of MAKER using the species name you just created.
>>>>>>>>>
>>>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather
>>>>>>>>>than giving it the results as GFF3.
>>>>>>>>>
>>>>>>>>>--Carson
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>>>>>
>>>>>>>>>>Thanks Daniel for your quick response.
>>>>>>>>>>
>>>>>>>>>>I did not use the parameter file of other organism when running
>>>>>>>>>>Augustus.
>>>>>>>>>>I created the parameter file for the genome following their
>>>>>>>>>>instructions.
>>>>>>>>>>There were multiple steps to train and run Augustus (Creating gene
>>>>>>>>>>structures for training AUGUSTUS with CEGMA => parameter file will
>>>>>>>>>>be created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences;
>>>>>>>>>>Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.)
>>>>>>>>>>As I mentioned the reason why I ran Augustus separately, because
>>>>>>>>>>Augustus has not trained that genome (no parameter file exists).
>>>>>>>>>>Otherwise I would run Augustus inside MAKER.
>>>>>>>>>>
>>>>>>>>>>You suggested to use rm_gff option to specify RepeatMasker output
>>>>>>>>>>(sure I will convert them to .gff3 formatted files). Can I submit
>>>>>>>>>>two RM .gff3 files, separated by comma?
>>>>>>>>>>
>>>>>>>>>>Anh-Dao
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" wrote:
>>>>>>>>>>
>>>>>>>>>>>Hi Anh-Dao,
>>>>>>>>>>>
>>>>>>>>>>>In the maker_opts.ctl file, there are options for est and protein
>>>>>>>>>>>evidence. You'll put all of your fasta est files together in a
>>>>>>>>>>>comma-separated list in the 'est' option, and all of your fasta
>>>>>>>>>>>protein files in a comma-separated list for the 'protein' option.
>>>>>>>>>>>
>>>>>>>>>>>You'll specify the SNAP and Genemark files in their respective
>>>>>>>>>>>options in the control file and pass the augustus and fgenesh
>>>>>>>>>>>predictions in the 'pred_gff' option.
>>>>>>>>>>>
>>>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give
>>>>>>>>>>>it to maker with the 'rm_gff' option.
>>>>>>>>>>>
>>>>>>>>>>>If you've converted the cufflinks output to gff3, you can give it
>>>>>>>>>>>to maker with the 'est_gff' option. I'm pretty sure Trinity only
>>>>>>>>>>>gives fasta output, so you would put that in the 'est' option,
>>>>>>>>>>>along with all the other est fasta files.
>>>>>>>>>>>
>>>>>>>>>>>If Augustus isn't trained for your particular organism, then you
>>>>>>>>>>>can use another organism that augustus is already trained for.
>>>>>>>>>>>The list of species that augustus has parameter files for is in
>>>>>>>>>>>the README.txt that came with Augustus. I really recommend that
>>>>>>>>>>>you run Augustus from inside maker, because then you get all the
>>>>>>>>>>>benefits of maker passing ext-based hints to augustus at runtime,
>>>>>>>>>>>which can really improve Augustus' predictive ability.
>>>>>>>>>>>
>>>>>>>>>>>When you ran the augustus gene prediction separately, did you use
>>>>>>>>>>>another organism's parameter file?
>>>>>>>>>>>
>>>>>>>>>>>Thanks,
>>>>>>>>>>>Daniel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to conduct a genome annotation and have the
>>>>>>>>>>>> following data:
>>>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species
>>>>>>>>>>>>   options)
>>>>>>>>>>>> - ESTs and RACE (fasta)
>>>>>>>>>>>> - proteins (fasta)
>>>>>>>>>>>> - proteins of related organisms (fasta)
>>>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert
>>>>>>>>>>>>   to ZFF format, etc.)
>>>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to
>>>>>>>>>>>>   convert the outputs to .gff3 files. The reason why I ran
>>>>>>>>>>>>   Augustus gene prediction separately, because the genome has
>>>>>>>>>>>>   never been trained for Augustus.
>>>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>>>>>>>>
>>>>>>>>>>>> Could you please let me know how can I specify parameters in
>>>>>>>>>>>> the maker_opts.ctl file?
>>>>>>>>>>>> Or do you have other suggestions to re-do the data listed
>>>>>>>>>>>> above?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>> Anh-Dao
>>>
>>>------------------------------
>>>
>>>End of maker-devel Digest, Vol 74, Issue 17
>>>*******************************************

From carsonhh at gmail.com Fri Sep 19 12:22:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 19 Sep 2014 11:22:50 -0600
Subject: [maker-devel] CPUs problems
In-Reply-To:
References:
Message-ID:

These are further symptoms of an IO related issue. The script cannot even query its current working directory. Check to make sure there is plenty of space in the temporary directory /tmp. If /tmp is separately mounted on each machine there may be one that is full. Also make sure you did not set TMP= in the maker_opts.ctl file to an NFS mounted location.

Do you by any chance get any warnings when you start MAKER? For example -->
WARNING: Multiple MAKER processes have been started in the same directory.

That would indicate that the MPI communication rung is down which would drastically increase IO operations. You may also have one or more nodes that are having the issue and are the source of all the errors.
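The two checks suggested above (free space in the temporary directory, and whether TMP= has been set in maker_opts.ctl) can be sketched roughly as follows. This is an illustration under stated assumptions, not a MAKER utility: the helper names `free_fraction` and `tmp_setting` are hypothetical, and the parser assumes the `key=value #comment` layout seen in the control-file lines quoted in this thread. Whether a configured path is NFS mounted still has to be checked by hand, and the script would need to be run on each node where /tmp is mounted separately.

```python
import shutil
import tempfile


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def tmp_setting(ctl_text):
    """Value of TMP= in maker_opts.ctl text ('' if unset or absent)."""
    for line in ctl_text.splitlines():
        key, separator, rest = line.partition("=")
        if separator and key.strip() == "TMP":
            # Drop the trailing '#...' comment and surrounding whitespace.
            return rest.split("#", 1)[0].strip()
    return ""


if __name__ == "__main__":
    system_tmp = tempfile.gettempdir()
    print(f"{system_tmp}: {free_fraction(system_tmp):.1%} free")
```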
If you are using OpenMPI to run MAKER, you can tag the output from each node using the --tag-output flag for mpiexec. Then if the same node is always producing the error, you can have IT look at it.

Also MAKER is set to automatically retry on errors. If all contigs are finished, check the output. Make sure there are the same number of genes in the fasta files and GFF3 files. Also look for munged content. If everything looks ok, MAKER may have gotten around the issue through sheer brute force (i.e. it retried until it succeeded).

--Carson

From carson.holt at genetics.utah.edu Mon Sep 22 15:17:19 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 22 Sep 2014 20:17:19 +0000
Subject: [maker-devel] diff. numbers of genes on contigs vs. scaffolded genome
In-Reply-To: <541BCE0A.70806@env.ethz.ch>
References: <541BCE0A.70806@env.ethz.ch>
Message-ID:

The contiged assembly is more likely to give spurious hits and alignments. They also can be harder to repeat mask. Also gene predictors can behave slightly differently on small sequences than on longer ones.

If you have fewer gene models than you expect, your first step should be to process the scaffolds with CEGMA. It will give you an estimate of the genome's "completeness". If CEGMA gives a 60% completeness value for example then you can expect to only recover 60% of the expected number of genes.
Next, you should run RepeatModeler or similar software to help generate a species-specific repeat library. Under-masked repeats can make predicting genes on longer scaffolds far more difficult for ab initio predictors.

--Carson

On 9/19/14, 12:32 AM, "Stefan Zoller" wrote:

>Hi,
>
>I am working on the annotation of a plant genome (about 600MB) and we
>have a reasonable draft assembly, a fairly good transcriptome and quite
>a few proteins from related species. We have also extensively trained
>Augustus and are also feeding GeneMark and SNAP predictions.
>
>Recently I noticed a behavior of Maker that seems fairly odd and which I
>cannot explain at all. When I take the scaffolded genome (about 23000
>scaffolds) I get roughly 9'000 maker approved gene models. Which is
>admittedly a bit on the low side and we have to work on this. However,
>when I break up the scaffolds into contigs at stretches of N longer than
>500bp (about 60'000 contigs) I get about 17'000 maker gene models. Now
>obviously 17'000 is more in the range of what I would expect, so I am
>inclined to go with these. I have looked at both annotations and the
>evidence in WebApollo and the evidence alignments are identical for both
>runs. The approved genes seem to be the same, except for the additional
>ones in the "contiged" genome version. The additional gene models are
>not necessarily at the ends of the contigs, so I think it has nothing to
>do with having the stretches of Ns nearby in the scaffolded genome. Do
>you have any idea why maker comes up with the additional numbers of gene
>models and how I could "convince" maker to give me the same gene models
>for the scaffolded assembly?
>
>Cheers,
>Stefan
>
>--
>Stefan Zoller, PhD
>Bioinformatics
>Genetic Diversity Centre
>ETH Zurich CHN E55.1
>Universitätsstrasse 16
>8092 Zurich
>Switzerland
>
>Phone: +41 44 632 66 85
>E-Mail: stefan.zoller at env.ethz.ch
>Web: www.gdc.ethz.ch

From myandell at genetics.utah.edu Mon Sep 22 19:10:38 2014
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 23 Sep 2014 00:10:38 +0000
Subject: [maker-devel] diff. numbers of genes on contigs vs. scaffolded genome
In-Reply-To:
References: <541BCE0A.70806@env.ethz.ch>,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA0537B651ADF@mxb1.hg.genetics.utah.edu>

Also, are your numbers including the ab initio predictions without evidence that have Pfam domains?

cheers,

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Co-director USTAR Center for Genetic Discovery
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carson.holt at genetics.utah.edu]
Sent: Monday, September 22, 2014 2:17 PM
To: stefan.zoller at env.ethz.ch; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] diff. numbers of genes on contigs vs. scaffolded genome

A contiged assembly is more likely to give spurious hits and alignments. Contigs can also be harder to repeat mask. Also, gene predictors can behave slightly differently on short sequences than on longer ones. If you have fewer gene models than you expect, your first step should be to process the scaffolds with CEGMA. It will give you an estimate of the genome's "completeness". If CEGMA gives a 60% completeness value, for example, then you can expect to recover only about 60% of the expected number of genes. Next, you should run RepeatModeler or similar software to help generate a species-specific repeat library. Under-masked repeats can make predicting genes on longer scaffolds far more difficult for ab initio predictors.

--Carson

On 9/19/14, 12:32 AM, "Stefan Zoller" wrote:

>Hi,
>
>I am working on the annotation of a plant genome (about 600MB) and we
>have a reasonable draft assembly, a fairly good transcriptome and quite
>a few proteins from related species. We have also extensively trained
>Augustus and are also feeding GeneMark and SNAP predictions.
>
>Recently I noticed a behavior of Maker that seems fairly odd and which I
>cannot explain at all. When I take the scaffolded genome (about 23000
>scaffolds) I get roughly 9'000 maker approved gene models. Which is
>admittedly a bit on the low side and we have to work on this. However,
>when I break up the scaffolds into contigs at stretches of N longer than
>500bp (about 60'000 contigs) I get about 17'000 maker gene models. Now
>obviously 17'000 is more in the range of what I would expect, so I am
>inclined to go with these. I have looked at both annotations and the
>evidence in WebApollo and the evidence alignments are identical for both
>runs. The approved genes seem to be the same, except for the additional
>ones in the "contiged" genome version. The additional gene models are
>not necessarily at the ends of the contigs, so I think it has nothing to
>do with having the stretches of Ns nearby in the scaffolded genome. Do
>you have any idea why maker comes up with the additional numbers of gene
>models and how I could "convince" maker to give me the same gene models
>for the scaffolded assembly?
> >Cheers, >Stefan > > > >-- >Stefan Zoller, PhD >Bioinformatics >Genetic Diversity Centre >ETH Zurich CHN E55.1 >Universit?tsstrasse 16 >8092 Zurich >Switzerland > >Phone: +41 44 632 66 85 >E-Mail: stefan.zoller at env.ethz.ch >Web: www.gdc.ethz.ch > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From aksrao at ucdavis.edu Thu Sep 25 11:18:30 2014 From: aksrao at ucdavis.edu (Anand K S Rao) Date: Thu, 25 Sep 2014 09:18:30 -0700 Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions? Message-ID: Greetings! I am exploring the use of MAKER-P. But I need your advice in determining if MAKER-P is the best choice for me. In the recent past, I've tried using the AUGUSTUS --profile option which allows for user defined protein profiles to be used as query. I am interested in predicted gene-like structures in intergenic regions (I've masked away genic regions as predicted by genome annotation pipeline) - in some orphan legume plant species - so not much in the way of extrinsic / external data in the way of EST, NGS data - let alone extrinsic data that might map to so called intergenic regions i.e. whatever little data there exists, has been already used to predict 'genes'. When I tried using --profile option of AUGUSTUS, I was not satisfied with the frequency and magnitude of fusion genes. Additionally, there was no easy way for me to consolidate gene-like structures that varied, but overlapped when using different protein profiles as queries (one profile per Pfam HMM within a 4 member clan). Additionally, training all the orphan legume species is not an exciting undertaking... because of time and computing resource requirements. All this led me to consider MAKER-P as an option. Based on what I've described above, do you think I should proceed with trying to use MAKER-P for my purposes? 
Thank you, in advance.

Sincerely,
Anand

--
Anand K.S. Rao
PhD candidate, Plant Biology with a Designated Emphasis in Biotechnology, UC-Davis, CA - 95616 USA | aksrao at ucdavis.edu | (530) 574-5134 | LinkedIn
_________________________________________________________________________
CTTATTGTTGAACTTOAATGGTGCTAATGATCCTCGTOTCTCCTGAACGT - translate THAT!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carson.holt at genetics.utah.edu Thu Sep 25 13:17:19 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 25 Sep 2014 18:17:19 +0000
Subject: [maker-devel] diff. numbers of genes on contigs vs. scaffolded genome
In-Reply-To: <5421695F.5040409@env.ethz.ch>
References: <541BCE0A.70806@env.ethz.ch> <7A60AB257EFF2B48B1F4C814817EA0537B651ADF@mxb1.hg.genetics.utah.edu> <5421695F.5040409@env.ethz.ch>
Message-ID:

Sorry for the slow reply. I was trying to locate a script that might be useful for you.

I think a species-specific repeat library will be of most benefit here (it's surprising just how influential this step is). Also make sure that you have trained SNAP and Augustus on your own species and are not just using another related species as a stand-in.

With respect to Pfam domains, on some organisms you may not get a lot of cross-species protein alignments because of divergence or assembly issues. This of course makes it harder to support these models with direct protein alignments. However, you can run InterProScan over the non-overlapping.proteins.fasta file produced by MAKER (it contains non-redundant rejected models). Because an HMM is used for domain identification, it can pick up protein domains that would not produce a significant BLAST alignment because of divergence. You can then add models with positive hits for protein domains back into your gene set. For organisms where it's required, though, this ad hoc procedure can usually only increase gene counts by about 10%.
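Collecting the IDs of models with a domain hit could be sketched roughly as follows. This is only an illustration and not the attached script: it assumes InterProScan 5 tab-separated output, in which column 1 is the protein ID and column 4 is the analysis name ("Pfam"). The function name pfam_positive_ids and the file name in the usage comment are hypothetical.

```python
import csv


def pfam_positive_ids(tsv_lines):
    """Return the sorted, unique protein IDs that have at least one Pfam hit.

    tsv_lines is any iterable of tab-separated lines (an open file works).
    Assumed layout: column 1 = protein ID, column 4 = analysis name.
    """
    ids = set()
    for row in csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Skip blank or truncated lines; keep only rows reported by Pfam.
        if len(row) >= 4 and row[3] == "Pfam":
            ids.add(row[0])
    return sorted(ids)


# Usage sketch (hypothetical file name), writing one ID per line:
#
#   with open("non_overlapping.interproscan.tsv", newline="") as handle:
#       for protein_id in pfam_positive_ids(handle):
#           print(protein_id)
```

The resulting list (one ID per line) is the kind of input the selection steps expect; whether the IDs match the FASTA and GFF3 identifiers exactly should be checked on real output.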
I've attached a script that makes generating results for these genes easier.

1. First, run InterProScan with just Pfam.
2. Then take the IDs of all models that have a domain in the report and create a list (1 ID per line).
3. Next, use the fasta_tool script that comes with MAKER together with the --select flag to separate just the positive hits (IDs in your list) from the non-overlapping.proteins.fasta and non-overlapping.transcripts.fasta files.
4. Use the attached script to separate just the positive hits (your ID list) from the GFF3. The script will upgrade match/match_part results to gene/mRNA/exon/CDS results and print them out for you.
5. Use the fasta_merge and gff3_merge scripts that come with MAKER to merge the selected/upgraded GFF3 entries and selected FASTA entries back into the original MAKER results.

--Carson

On 9/23/14, 6:36 AM, "Stefan Zoller" wrote:

>Please forgive my ignorance, I am not entirely sure if I understand your
>question correctly, but I will try to answer.
>As evidence we use:
>1) our own transcriptome (Trinity-assembled RNA-seq, filtering out the
>very low expression transcripts).
>2) all Swiss-Prot plant proteins, and several protein sets from closely
>related plant species downloaded from NCBI.
>I am not sure if the ab initio predictions without evidence have Pfam
>domains. Honestly, I would not know how to tell and how to
>include/exclude.
>I was assuming that we should not have too many Maker approved
>predictions without evidence anyway, because we use "keep_preds=0".
>The numbers of gene predictions I mentioned in my email are the
>predictions reported by the fasta_merge/gff3_merge scripts in the
>"*maker.proteins.fasta". There are of course many more predictions in
>e.g., "*maker.augustus_masked.proteins.fasta" (about 68'000 in this file).
>
>I hope I am not totally off with my answer.
>Cheers, Stefan > > > >On 23.09.14 02:10, Mark Yandell wrote: >> Also are you numbers including the ab-inito predictions without >>evidence that have pfamm domains? >> >> cheers, >> >> >> --mark >> >> >> >> Mark Yandell >> Professor of Human Genetics >> H.A. & Edna Benning Presidential Endowed Chair >> Co-director USTAR Center for Genetic Discovery >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ph:801-587-7707 >> >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>Carson Holt [carson.holt at genetics.utah.edu] >> Sent: Monday, September 22, 2014 2:17 PM >> To: stefan.zoller at env.ethz.ch; maker-devel at yandell-lab.org >> Subject: Re: [maker-devel] diff. numbers of geneson contigs vs. >>scaffolded genome >> >> The contiged assembly is more likely to give spurious hits and >>alignments. >> They also can be harder to repeat mask. Also gene predictors can >>behave >> slightly different on small sequences than on longer ones. If you have >> fewer gene models than you expect, your first step should be to process >> the scaffolds with CEGMA. It will give you an estimate of the genomes >> "completeness". If CEGMA gives a 60% completeness value for example >>then >> you can expect to only recover 60% of the expected number of genes. Next >> you should run RepeatModeler of similar software to help generate a >> species specific repeat library. Under masked repeats can make >>predicting >> genes on longer scaffolds far more difficult for ab initio predictors. >> >> --Carson >> >> >> On 9/19/14, 12:32 AM, "Stefan Zoller" wrote: >> >>> Hi, >>> >>> I am working on the annotation of a plant genome (about 600MB) and we >>> have a reasonable draft assembly, a fairly good transcriptome and quite >>> a few proteins from related species. We have also extensively trained >>> augustus and are also feeding genmark and snap predictions. 
>>> >>> Recently I noticed a behavior of Maker that seems fairly odd and which >>>I >>> cannot explain at all. When I take the scaffolded genome (about 23000 >>> scaffolds) I get roughly 9'000 maker approved gene models. Which is >>> admittedly a bit on the low side and we have to work on this. However, >>> when I break up the scaffolds into contigs at stretches of N longer >>> 500bp (about 60'000 contigs) I get about 17'000 maker gene models. Now >>> obviously 17'000 is more in the range what I would expect, so I am >>> inclined to go with these. I have looked at both annotations and the >>> evidence in WebApollo and the evidence alignments are identical for >>>both >>> runs. The approved genes seem to be the same, except for the additional >>> ones in the "contiged" genome version. The additional gene models are >>> not necessarily at the ends of the contigs, so I think it has nothing >>>to >>> do with having the stretches of Ns nearby in the scaffolded genome. Do >>> you have any idea why maker comes up with the additional numbers of >>>gene >>> models and how I could "convince" maker to give me the same gene models >>> for the scaffolded assembly? >>> >>> Cheers, >>> Stefan >>> >>> >>> >>> -- >>> Stefan Zoller, PhD >>> Bioinformatics >>> Genetic Diversity Centre >>> ETH Zurich CHN E55.1 >>> Universit?tsstrasse 16 >>> 8092 Zurich >>> Switzerland >>> >>> Phone: +41 44 632 66 85 >>> E-Mail: stefan.zoller at env.ethz.ch >>> Web: www.gdc.ethz.ch >>> >>> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gff3_preds2models
Type: application/octet-stream
Size: 5523 bytes
Desc: gff3_preds2models
URL:

From carsonhh at gmail.com Thu Sep 25 13:43:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 25 Sep 2014 12:43:35 -0600
Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions?
In-Reply-To:
References:
Message-ID:

When you say "gene-like structures", are you saying that you are looking for pseudogenes and non-coding genes? You can use the trnascan and snoscan options in the maker_opts.ctl file to find some non-coding RNAs. You may just want to leave off all ab initio gene predictors like SNAP and Augustus, as those will be looking for canonical coding genes. If you first hard mask any coding genes, and then provide ESTs or assembled mRNA-seq and proteins, you may be able to use the exonerate alignments produced to identify potential gene-like structures. It might require a little post-processing of the resulting GFF3 on your part.

Thanks,
Carson

From: Anand K S Rao
Date: Thursday, September 25, 2014 at 10:18 AM
To:
Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions?

Greetings!

I am exploring the use of MAKER-P. But I need your advice in determining if MAKER-P is the best choice for me. In the recent past, I've tried using the AUGUSTUS --profile option which allows for user defined protein profiles to be used as query. I am interested in predicted gene-like structures in intergenic regions (I've masked away genic regions as predicted by genome annotation pipeline) - in some orphan legume plant species - so not much in the way of extrinsic / external data in the way of EST, NGS data - let alone extrinsic data that might map to so called intergenic regions i.e. whatever little data there exists, has been already used to predict 'genes'. When I tried using --profile option of AUGUSTUS, I was not satisfied with the frequency and magnitude of fusion genes.
Additionally, there was no easy way for me to consolidate gene-like structures that varied, but overlapped when using different protein profiles as queries (one profile per Pfam HMM within a 4 member clan). Additionally, training all the orphan legume species is not an exciting undertaking... because of time and computing resource requirements. All this led me to consider MAKER-P as an option. Based on what I've described above, do you think I should proceed with trying to use MAKER-P for my purposes?

Thank you, in advance.

Sincerely,
Anand

--
Anand K.S. Rao
PhD candidate, Plant Biology with a Designated Emphasis in Biotechnology, UC-Davis, CA - 95616 USA | aksrao at ucdavis.edu | (530) 574-5134 | LinkedIn
_________________________________________________________________________
CTTATTGTTGAACTTOAATGGTGCTAATGATCCTCGTOTCTCCTGAACGT - translate THAT!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carson.holt at genetics.utah.edu Mon Sep 29 09:47:00 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 29 Sep 2014 14:47:00 +0000
Subject: [maker-devel] maker failure with example data
In-Reply-To:
References:
Message-ID:

The error is caused by the BioPerl indexer returning an empty length for the indexed fasta sequence (possibly because of a corrupt index file or other reasons). You may need to reinstall BioPerl (use the CPAN version, not the BioPerl-live version), or reinstall Berkeley DB (used by the BioPerl indexer), or reinstall the Perl module DB_File via CPAN (Perl's interface to Berkeley DB). After reinstalling BioPerl, delete the mpi_blastdb directory for the MAKER run before retrying. Also verify that the /tmp directory on your system, or the directory pointed to by TMP= in the maker_opts.ctl file, is not full and that TMP= is not set to an NFS-mounted location.

Thanks,
Carson

From: Goutham atla
Date: Monday, September 29, 2014 at 6:33 AM
To:
Subject: maker failure with example data

Dear All,

I am running maker with the demo file, i.e. dpp_contig.fasta, keeping all other parameters in the .ctl files as default. But it does not progress and shows the following message that the length of the sequence is 0. Can anybody help me?

--Next Contig--
MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Skipping the contig because it is too short!!
SeqID: contig-dpp-500-500
Length: 0
#---------------------------------------------------------------------

Regards,
Goutham
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From goutham.atla at gmail.com Mon Sep 29 07:33:50 2014
From: goutham.atla at gmail.com (Goutham atla)
Date: Mon, 29 Sep 2014 18:03:50 +0530
Subject: [maker-devel] maker failure with example data
Message-ID:

Dear All,

I am running maker with the demo file, i.e. dpp_contig.fasta, keeping all other parameters in the .ctl files as default. But it does not progress and shows the following message that the length of the sequence is 0. Can anybody help me?

--Next Contig--
MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Skipping the contig because it is too short!!
SeqID: contig-dpp-500-500
Length: 0
#---------------------------------------------------------------------

Regards,
Goutham
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carson.holt at genetics.utah.edu Tue Sep 30 14:33:18 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 30 Sep 2014 19:33:18 +0000
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

The message is warning that there are multiple instances of MAKER running, but no MPI communication. When you build MAKER (the perl Build.PL step when installing MAKER), you need to specify the location of 'mpicc' and 'mpi.h' to build with MPI support. Otherwise you won't be able to link against the MPICH2 shared libraries. You probably need to rerun that step.

--Carson

From: Goutham atla
Date: Tuesday, September 30, 2014 at 10:49 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: URGENT: Re: maker failure with example data

Hi Carson,

I figured out that the problem was with the RepeatMasker installation and I fixed it. I am running maker with MPICH2 and I get the following warning when I start it:

STATUS: Processing and indexing input FASTA files...
WARNING: Multiple MAKER processes have been started in the same directory.

I would like to know if this is common.

Regards,
Goutham

On Tue, Sep 30, 2014 at 12:02 PM, Goutham atla wrote:

Dear Carson,

Thank you for the reply. I reinstalled BioPerl and now I am getting the following error on the test data.

ERROR: RepeatMasker failed
--> rank=NA, hostname=motif
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500

On Mon, Sep 29, 2014 at 8:17 PM, Carson Holt wrote:

The error is caused by the BioPerl indexer returning an empty length for the indexed fasta sequence (possibly because of a corrupt index file or other reasons). You may need to reinstall BioPerl (use the CPAN version, not the BioPerl-live version), or reinstall Berkeley DB (used by the BioPerl indexer), or reinstall the Perl module DB_File via CPAN (Perl's interface to Berkeley DB). After reinstalling BioPerl, delete the mpi_blastdb directory for the MAKER run before retrying. Also verify that the /tmp directory on your system, or the directory pointed to by TMP= in the maker_opts.ctl file, is not full and that TMP= is not set to an NFS-mounted location.

Thanks,
Carson

From: Goutham atla
Date: Monday, September 29, 2014 at 6:33 AM
To:
Subject: maker failure with example data

Dear All,

I am running maker with the demo file, i.e. dpp_contig.fasta, keeping all other parameters in the .ctl files as default. But it does not progress and shows the following message that the length of the sequence is 0. Can anybody help me?

--Next Contig--
MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Skipping the contig because it is too short!!
SeqID: contig-dpp-500-500
Length: 0
#---------------------------------------------------------------------

Regards,
Goutham

--
Goutham Atla

--
Goutham Atla
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eschang1 at gmail.com Tue Sep 30 15:02:30 2014
From: eschang1 at gmail.com (Sally Chang)
Date: Tue, 30 Sep 2014 15:02:30 -0500
Subject: [maker-devel] interpreting SNAP gene-stats output
Message-ID:

Hi all,

I was wondering if someone could help me make sure I am interpreting these results from running fathom -gene-stats on an annotation correctly:

1049 sequences
0.245825 avg GC fraction (min=0.162446 max=0.431287)
5533 genes (plus=2760 minus=2773)
91 (0.016447) single-exon
5442 (0.983553) multi-exon
101.857010 mean exon (min=1 max=6534)
81.880493 mean intron (min=4 max=5486)

Are the 1049 sequences the actual number of contigs/sequences from your assembly that MAKER ended up using? And is that 5533 genes the number of genes it found on those contigs (and strand info?).

Thanks very much,
Sally Chang
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carson.holt at genetics.utah.edu Tue Sep 30 15:49:10 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 30 Sep 2014 20:49:10 +0000
Subject: [maker-devel] Maker
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA0537B66060F@mxb1.hg.genetics.utah.edu>
References: <000001cfdc80$77dc88e0$67959aa0$@uni-bayreuth.de> <7A60AB257EFF2B48B1F4C814817EA0537B66060F@mxb1.hg.genetics.utah.edu>
Message-ID:

MAKER can't annotate assembled transcripts. It can only annotate genomic sequence. Transcript annotation is a very different problem. Using a different species' genome would not produce annotation for your transcripts; rather, your transcripts would just be considered evidence for annotating that species' genome.

Your best option is probably just to use BLAST to look for homology between species. Do BLAST both ways, and if gene A in species 1 is the best hit for gene B in species 2 and vice versa (reciprocal best hits), then you can consider them to be orthologous. Also use the proteome from the related species when doing the BLAST analysis (not the nucleotide transcripts).

--Carson

On 9/30/14, 6:51 AM, "Mark Yandell" wrote:

>
>Mark Yandell
>Professor of Human Genetics
>H.A. & Edna Benning Presidential Endowed Chair
>Co-director USTAR Center for Genetic Discovery
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>ph:801-587-7707
>
>________________________________________
>From: Alfons Weig [a.weig at uni-bayreuth.de]
>Sent: Tuesday, September 30, 2014 1:30 AM
>To: Mark Yandell
>Subject: Maker
>
>Hello,
>
>I have just sent feedback via the Maker feedback form but received the
>following error message. Therefore, I am sending it via regular mail:
>
>Error executing run mode 'feedback': Can't call method "MailMsg" without
>a package or object reference at /var/www/cgi-bin/mwas/lib/MWS.pm line
>1116.
>at /var/www/cgi-bin/mwas/maker.cgi line 21.
>
>I have just tested the Maker annotation pipeline with short sequences
>from an RNAseq de novo assembly using A. mellifera as a reference genome.
>Unfortunately, honey bee is not the species we sequence but is closely
>related to it.
>I was wondering whether this was a good approach? There are no genome
>data available for our bee species. Is maker able to annotate de novo
>assembled mRNA transcripts obtained by Velvet/Oases (including partial
>sequences)?
>
>Best regards
>Alfons Weig
>
>
>Dr. Alfons Weig
>DNA-Analytik & Ökoinformatik - Univ. Bayreuth - NW1
>Universitätsstrasse 30
>95447 Bayreuth - Germany
>Tel. +49 (0)921-552457
>www.daneco.uni-bayreuth.de
>

From carsonhh at gmail.com Tue Sep 30 15:59:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 30 Sep 2014 14:59:47 -0600
Subject: [maker-devel] interpreting SNAP gene-stats output
In-Reply-To:
References:
Message-ID:

Probably. But it's really not that important a value, because during the 'fathom genome.ann genome.dna -categorize 1000' step outlined in the SNAP training literature, fathom turns each gene into its own little contig padded by 1000bp on either side. So in the end the number of starting contigs becomes irrelevant, because they all get trimmed and thrown away anyway.

--Carson

From: Sally Chang
Date: Tuesday, September 30, 2014 at 2:02 PM
To:
Subject: [maker-devel] interpreting SNAP gene-stats output

Hi all,

I was wondering if someone could help me make sure I am interpreting these results from running fathom -gene-stats on an annotation correctly:

1049 sequences
0.245825 avg GC fraction (min=0.162446 max=0.431287)
5533 genes (plus=2760 minus=2773)
91 (0.016447) single-exon
5442 (0.983553) multi-exon
101.857010 mean exon (min=1 max=6534)
81.880493 mean intron (min=4 max=5486)

Are the 1049 sequences the actual number of contigs/sequences from your assembly that MAKER ended up using? And is that 5533 genes the number of genes it found on those contigs (and strand info?).
Thanks very much, Sally Chang _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mphoeppner at gmail.com Mon Sep 1 07:07:40 2014 From: mphoeppner at gmail.com (=?windows-1252?Q?Marc_H=F6ppner?=) Date: Mon, 1 Sep 2014 15:07:40 +0200 Subject: [maker-devel] est2genome=1 for est and altest Message-ID: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com> Hi, I may be wrong about this, but it seems to me that Maker will never build a gene model from EST evidence, if the set data is provided as ?altest' rather than ?est'. In my case, I am annotating a plant for which there is a closely related reference genome + annotation, as well as pretty good EST data. So I supplied the EST data as ?altest', assuming that the only difference would be that the alignment parameters would be slightly more relaxed. But I found that Maker never made any genome models from that data. When moving the EST data to ?est?, it worked. So I am not sure whether this is an intended behaviour, but in my case it caught me a bit by surprise? Regards, Marc From dence at genetics.utah.edu Tue Sep 2 09:32:03 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 2 Sep 2014 15:32:03 +0000 Subject: [maker-devel] est2genome=1 for est and altest In-Reply-To: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com> References: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com> Message-ID: Hi Marc, This is a partial answer to your question. I don't know the full reason that models aren't built from altest evidence, but I do know that those sequences are aligned with tblastx (nucleotide translated to protein and back to nucleotide) and not with blastn with relaxed parameters. Also the final protein and nucleotide alignments that do get made into models are made by exonerate and not by blast. Does that help? 
~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Marc H?ppner [mphoeppner at gmail.com] Sent: Monday, September 01, 2014 7:07 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] est2genome=1 for est and altest Hi, I may be wrong about this, but it seems to me that Maker will never build a gene model from EST evidence, if the set data is provided as ?altest' rather than ?est'. In my case, I am annotating a plant for which there is a closely related reference genome + annotation, as well as pretty good EST data. So I supplied the EST data as ?altest', assuming that the only difference would be that the alignment parameters would be slightly more relaxed. But I found that Maker never made any genome models from that data. When moving the EST data to ?est?, it worked. So I am not sure whether this is an intended behaviour, but in my case it caught me a bit by surprise? Regards, Marc _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue Sep 2 10:57:56 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 02 Sep 2014 10:57:56 -0600 Subject: [maker-devel] est2genome=1 for est and altest In-Reply-To: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com> References: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com> Message-ID: There is a reason why no altest2genome option exists in the maker_opts.ctl file. The est2genome and protein2genome options are meant only for generating rough partial models that can be used for training gene finders (should not be used for generating final models). 
And if you are thinking of using ESTs from another species (altest) to generate initial models for training, that is actually an analysis error. This is because altest alignments will be far less accurate than EST or protein alignments (so they will hurt your training). They are also slower to generate than EST or protein alignments (by as much as 10-20 fold, because they are translated into all 6 reading frames), and there will be far fewer of them (6-frame translation makes the alignments more spurious, so they require higher thresholds of significance). So if the species you are using for initial training is distant enough that it must be aligned as altest via tblastx, then you should be using proteins instead, which will be widely available and more accurately aligned.

Note that both proteins and altests are aligned in amino acid space, so you can expect anywhere from several million to hundreds of millions of years of divergence, and the species you use is not expected to be closely related (so whole proteomes will be available from a number of sources, and they will be far more accurate than any altest alignment). The only real benefit of altest is to provide evidence of lineage-specific genes for organisms where there are no species in the same branch or phylum to get protein evidence from. There will only be a handful of these genes, and they can be picked up in any later bootstrap training steps, which will not involve est2genome or protein2genome models.

You should use protein2genome models for the initial training, and only use altest for any bootstrap training or for your final models.

Thanks,
Carson

On 9/1/14, 7:07 AM, "Marc Höppner" wrote:

> Hi,
>
> I may be wrong about this, but it seems to me that Maker will never build
> a gene model from EST evidence, if the set data is provided as 'altest'
> rather than 'est'.
> In my case, I am annotating a plant for which there is
> a closely related reference genome + annotation, as well as pretty good
> EST data. So I supplied the EST data as 'altest', assuming that the only
> difference would be that the alignment parameters would be slightly more
> relaxed. But I found that Maker never made any genome models from that
> data. When moving the EST data to 'est', it worked.
>
> So I am not sure whether this is an intended behaviour, but in my case it
> caught me a bit by surprise...
>
> Regards,
>
> Marc
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 05:38:16 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 11:38:16 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan). Running MAKER though I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.
---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;
            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.
--
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Sep 4 08:22:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 08:22:08 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall.

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan).
Running MAKER though I get the following error: Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal. pm line 143. I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50. Thanks in advance for any further help with this. Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 21 August 2014 21:17 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this --> sub new { if($PS){ my $self = {}; my $class = shift; bless($self, $class); return $self; } else{ eval 'require Proc::ProcessTable'; return Proc::ProcessTable->new(@_); } } to this --> sub new { eval 'require Proc::ProcessTable'; return Proc::ProcessTable->new(@_); } This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. 
Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems. --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, August 21, 2014 at 2:05 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] MAKER and large number of 'ps' processes Dear MAKER developers, One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly? Thanks in advance, Tim. ? Timothy Stitt PhD | Head of Scientific Computing +44 1603 450378 | timothy.stitt at tgac.ac.uk The Genome Analysis Centre (TGAC) Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Sep 4 08:25:31 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 04 Sep 2014 08:25:31 -0600 Subject: [maker-devel] MAKER and large number of 'ps' processes In-Reply-To: References: Message-ID: You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first. --Carson From: Carson Holt Date: Thursday, September 4, 2014 at 8:22 AM To: "Timothy Stitt (TGAC)" , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes The error means Proc:ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. 
Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall. /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 5:38 AM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Hi Carson, I tried the ?nolock option and it didn't have much effect. I then installed Proc:ProcessTable (which built successfully via cpan). Running MAKER though I get the following error: Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal. pm line 143. I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50. Thanks in advance for any further help with this. Tim. 
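
[Editor's sketch] The directory check Carson describes in this thread can be scripted rather than done by hand. The following is only a minimal sketch, assuming a POSIX shell; the function name check_processtable is made up for illustration and is not part of MAKER or Proc::ProcessTable. It reports, for each directory passed in, whether Proc/ProcessTable.pm and auto/Proc/ProcessTable/ are present.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): for each library directory given,
# report whether Proc/ProcessTable.pm and auto/Proc/ProcessTable/ exist there.
check_processtable() {
    for dir in "$@"; do
        if [ -f "$dir/Proc/ProcessTable.pm" ]; then pm=found; else pm=missing; fi
        if [ -d "$dir/auto/Proc/ProcessTable" ]; then auto=found; else auto=missing; fi
        printf '%s: ProcessTable.pm=%s auto/Proc/ProcessTable=%s\n' "$dir" "$pm" "$auto"
    done
}

# Example call using two of the @INC entries from the error message:
# check_processtable \
#     /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib \
#     /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
```

If every directory reports "missing", that would be consistent with the module having been installed somewhere outside @INC; if some report "found", those are the copies to delete before reinstalling.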
--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 21 August 2014 21:17 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this --> sub new { if($PS){ my $self = {}; my $class = shift; bless($self, $class); return $self; } else{ eval 'require Proc::ProcessTable'; return Proc::ProcessTable->new(@_); } } to this --> sub new { eval 'require Proc::ProcessTable'; return Proc::ProcessTable->new(@_); } This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems. --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, August 21, 2014 at 2:05 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] MAKER and large number of 'ps' processes Dear MAKER developers, One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly? Thanks in advance, Tim. ? 
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From bmoore at genetics.utah.edu Thu Sep 4 11:39:39 2014
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 Sep 2014 17:39:39 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
In-Reply-To:
References:
Message-ID: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>

Hi Anindyajit,

I'm forwarding your message along to the maker mailing list and devel team...

B

On Sep 4, 2014, at 8:37 AM, Anindyajit Banerjee wrote:

> Hi
>
> I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for the further input in EVM. However I am encountering the error while doing so. Could you suggest me any possible way to do so. I hereby attach a test output for fgenesh
> test out put file for your understanding
> Please help
> --
> Regards,
>
> Anindyajit Banerjee
> Mobile: +919883333000.

From dence at genetics.utah.edu Thu Sep 4 11:44:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 4 Sep 2014 17:44:47 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
In-Reply-To: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>
References: , <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>
Message-ID:

Hi Anindyajit,

It doesn't look like the error output that you sent to Barry was forwarded with your message. Can you send that again?
Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Barry Moore [bmoore at genetics.utah.edu] Sent: Thursday, September 04, 2014 11:39 AM To: Anindyajit Banerjee Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Fgenesh output to gff3 conversion Hi Anindyajit, I?m forwarding you message along to the maker mailing list and devel team? B On Sep 4, 2014, at 8:37 AM, Anindyajit Banerjee wrote: > > Hi > > I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for the further input in EVM. However I am encountering the error while doing so. Could you suggest me any possible way to do so. I hereby attach a test output for fgenesh > test out put file for your understanding > Please help > -- > Regards, > > Anindyajit Banerjee > Mobile: +919883333000. > > > > > > > > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From MEC at stowers.org Thu Sep 4 12:10:14 2014 From: MEC at stowers.org (Cook, Malcolm) Date: Thu, 4 Sep 2014 18:10:14 +0000 Subject: [maker-devel] Fgenesh output to gff3 conversion In-Reply-To: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu> References: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu> Message-ID: Hi, I'm not sure what maker offers in this regard. It's been some time since I've used it now. Anyway, if it helps, some time ago I wrote a quick fgenesh2gff using BioPerl. It is provided here. You need a bioperl installation. 
http://bio.perl.org/pipermail/bioperl-l/2006-July/022061.html ~Malcolm Cook >-----Original Message----- >From: maker-devel [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of Barry Moore >Sent: Thursday, September 04, 2014 12:40 PM >To: Anindyajit Banerjee >Cc: maker-devel at yandell-lab.org >Subject: Re: [maker-devel] Fgenesh output to gff3 conversion > >Hi Anindyajit, > >I'm forwarding you message along to the maker mailing list and devel team... > >B > >On Sep 4, 2014, at 8:37 AM, Anindyajit Banerjee wrote: > >> >> Hi >> >> I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for the >further input in EVM. However I am encountering the error while doing so. Could you suggest me any possible way to do so. I hereby >attach a test output for fgenesh >> test out put file for your understanding >> Please help >> -- >> Regards, >> >> Anindyajit Banerjee >> Mobile: +919883333000. >> >> >> >> >> >> >> >> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Timothy.Stitt at tgac.ac.uk Thu Sep 4 12:45:15 2014 From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC)) Date: Thu, 4 Sep 2014 18:45:15 +0000 Subject: [maker-devel] MAKER and large number of 'ps' processes In-Reply-To: References: Message-ID: Thanks Carson. I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated. 
I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics:

    #142 my $obj = new Proc::ProcessTable_simple;
    #143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line #143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks,

Tim.
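
[Editor's sketch] Tim mentions searching the source and build directories for .al files. One way to make that search repeatable is sketched below, assuming a POSIX shell and the standard find utility; the function name find_al_files is hypothetical. It only lists AutoLoader stub files that exist under the given directories and says nothing about whether a particular Proc::ProcessTable release is expected to ship any.

```shell
#!/bin/sh
# Hypothetical helper: list any AutoLoader stub files (*.al) found below the
# given directories, one path per line. Prints nothing if there are none.
find_al_files() {
    for dir in "$@"; do
        # Skip directories that do not exist instead of letting find complain.
        [ -d "$dir" ] || continue
        find "$dir" -type f -name '*.al'
    done
}

# Example call on the build directory named in the 'make test' output:
# find_al_files /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
```

Running the same function over the @INC directories from the error message shows whether any stubs were installed where MAKER's Perl actually looks.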
--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt > Date: Thursday, 4 September 2014 15:25 To: Timothy Stitt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first. --Carson From: Carson Holt > Date: Thursday, September 4, 2014 at 8:22 AM To: "Timothy Stitt (TGAC)" >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes The error means Proc:ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall. /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib --Carson From: "Timothy Stitt (TGAC)" > Date: Thursday, September 4, 2014 at 5:38 AM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Hi Carson, I tried the ?nolock option and it didn't have much effect. I then installed Proc:ProcessTable (which built successfully via cpan). 
Running MAKER though I get the following error: Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143. I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50. Thanks in advance for any further help with this. Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt > Date: Thursday, 21 August 2014 21:17 To: Timothy Stitt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this --> sub new { if($PS){ my $self = {}; my $class = shift; bless($self, $class); return $self; } else{ eval 'require Proc::ProcessTable'; return Proc::ProcessTable->new(@_); } } to this --> sub new { eval 'require Proc::ProcessTable'; return Proc::ProcessTable->new(@_); } This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. 
Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

--
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Sep 4 12:52:20 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 12:52:20 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Try changing -->

    eval 'require Proc::ProcessTable';

to -->

    use Proc::ProcessTable;

in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces Perl's import method to run, in case the module explicitly exports something it needs to function properly.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 12:45 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson.
I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated.

I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics:

    #142 my $obj = new Proc::ProcessTable_simple;
    #143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line #143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks,

Tim.
---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 15:25
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson

From: Carson Holt
Date: Thursday, September 4, 2014 at 8:22 AM
To: "Timothy Stitt (TGAC)", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall.

    /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
    /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan).
Running MAKER though I get the following error:

    Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;
            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing.
Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From bmoore at genetics.utah.edu Thu Sep 4 13:01:57 2014
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 Sep 2014 19:01:57 +0000
Subject: [maker-devel] Fwd: Fgenesh output to gff3 conversion
References:
Message-ID: <77D4D576-9BAC-478D-8A0F-492225D71637@genetics.utah.edu>

Attached is the document that Anindyajit sent with his original question.

B

Begin forwarded message:

From: Anindyajit Banerjee
Subject: Fgenesh output to gff3 conversion
Date: September 4, 2014 at 8:37:26 AM MDT
To:

Hi

I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for further input into EVM. However, I am encountering an error while doing so.
Could you suggest any possible way to do so? I hereby attach a test fgenesh output file for your understanding.

Please help

--
Regards,
Anindyajit Banerjee
Mobile: +919883333000.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fgenesh_output_test
Type: application/octet-stream
Size: 199696 bytes
Desc: fgenesh_output_test
URL:

From carsonhh at gmail.com Thu Sep 4 13:06:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 13:06:28 -0600
Subject: [maker-devel] Fgenesh output to gff3 conversion
Message-ID:

MAKER can't convert the existing output, but you could use MAKER to run FGENESH for you instead. The results would then be in GFF3.

--Carson

On 9/4/14, 11:39 AM, "Barry Moore" wrote:

>Hi Anindyajit,
>
>I'm forwarding your message along to the maker mailing list and devel team...
>
>B

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 13:24:06 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 19:24:06 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Sorry Carson. Not much luck with that either.
I'm building afresh each time and then just running 'maker -h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers.

Cheers,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk
From carsonhh at gmail.com Thu Sep 4 13:42:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 13:42:06 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the system's 'ps'. Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one -->

    sub get_proc_by_id {
        my $id = shift;
        my $select;

        my $obj = new Proc::ProcessTable_simple;
        if(ref($obj) eq "Proc::ProcessTable"){
            # real Proc::ProcessTable object: search its table for the PID
            my ($p) = grep {$_->pid eq $id} @{$obj->table};
            return $p;
        }
        else{
            # 'ps'-based object: use its own lookup method
            return $obj->get_proc_by_id($id);
        }
    }

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 1:24 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker -h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers.

Cheers,

Tim.
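Editorial sketch: the idea discussed in this thread, inspecting a process by PID directly from the process table rather than launching 'ps', can be illustrated in plain shell. This assumes a Linux system with /proc mounted; the function names proc_state and is_zombie are hypothetical, and this is not how MAKER or Proc::ProcessTable is implemented.

```sh
# Print the one-letter state (R, S, D, Z, ...) of process $1 by reading
# /proc/<pid>/status with shell built-ins only, so no 'ps' is spawned.
# Returns non-zero if the process does not exist or has no State line.
proc_state() {
    status_file="/proc/$1/status"
    [ -r "$status_file" ] || return 1
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$status_file"
    return 1
}

# Succeeds only if process $1 exists and is currently a zombie.
is_zombie() {
    [ "$(proc_state "$1")" = "Z" ]
}
```

For example, is_zombie 12345 && echo "12345 is a zombie" prints a message only when that PID is in state Z.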
From nguyenan at mail.nih.gov Fri Sep 5 08:43:19 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Fri, 5 Sep 2014 14:43:19 +0000
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17
In-Reply-To:
References:
Message-ID:

Hi,

I finished running MAKER as suggested above. Then I ran gff3_merge.pl with the -n -g options to retrieve only the MAKER annotations. I called the output file maker.gff3.

In maker.gff3 I found some invalid data (lines that do not conform to the GFF3 format), e.g.

    ### 2 + ###

OR

    ### .Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 + ###

OR some gene (or mRNA) IDs are not unique. This means the same ID can be found multiple times with different values within maker.gff3.

How could this happen? As I understand it, mRNA IDs in a .gff3 file must be unique.

Thanks
Anh-Dao

On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" wrote:

>Send maker-devel mailing list submissions to
> maker-devel at yandell-lab.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>or, via email, send a message with subject or body 'help' to
> maker-devel-request at yandell-lab.org
>
>You can reach the person managing the list at
> maker-devel-owner at yandell-lab.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of maker-devel digest..."
>
>
>Today's Topics:
>
> 1.
Re: Maker_opts.ctl (Carson Holt)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 18 Jul 2014 11:04:09 -0600
>From: Carson Holt
>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]", Daniel Ence
>Cc: "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] Maker_opts.ctl
>Message-ID:
>Content-Type: text/plain; charset="UTF-8"
>
>It should just be 'fgenesh'. If it's not there you can still just give the GFF3.
>
>--Carson
>
>
>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>
>>I am not sure which fgenesh executable file should I use.
>>
>>fgenesh= #location of fgenesh executable
>>
>>When I run FGENESH++, I need to run the run_pipe.pl script. Sure you need to specify a list of other executable programs (such as ppd, ppdn+, etc)
>>
>>Anh-Dao
>>
>>
>>On 7/16/14 3:32 PM, "Carson Holt" wrote:
>>
>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your previous run. Then provide the RepeatModeler file to rmlib=
>>>
>>>--Carson
>>>
>>>
>>>
>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>
>>>>By default, model_org=all. Can I use the de novo repeat library predicted by RepeatModeler for the rmlib option?
>>>>
>>>>Anh-Dao
>>>>
>>>>
>>>>
>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote:
>>>>
>>>>>No. You can provide both to MAKER. The options are model_org= and rmlib=. By letting MAKER handle repeat masking it will differentiate repeat types and use soft masking for some and hard masking for others. This increases sensitivity of evidence alignments while still maintaining specificity.
>>>>>
>>>>>--Carson
>>>>>
>>>>>
>>>>>
>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>
>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the parameter files for Augustus.
>>>>>>I could also run RepeatMasker inside of MAKER.
>>>>>>However, I ran RM using two options: -lib (de novo) and -species (known). I got ~ 45% repeats via de novo and ~ 4% repeats via known options. As I understood, RM inside of MAKER uses only RepBase repeat library and RepeatRunner protein database.
>>>>>>
>>>>>>Anh-Dao
>>>>>>
>>>>>>
>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote:
>>>>>>
>>>>>>>When you ran Augustus separately, it should have created the parameters needed to run it. Now you should be able to run it inside of MAKER using the species name you just created.
>>>>>>>
>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather than giving it the results as GFF3.
>>>>>>>
>>>>>>>--Carson
>>>>>>>
>>>>>>>
>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>>>
>>>>>>>>Thanks Daniel for your quick response.
>>>>>>>>
>>>>>>>>I did not use the parameter file of other organism when running Augustus. I created the parameter file for the genome following their instructions. There were multiple steps to train and run Augustus (Creating gene structures for training AUGUSTUS with CEGMA => parameter file will be created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences; Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.)
>>>>>>>>As I mentioned the reason why I ran Augustus separately, because Augustus has not trained that genome (no parameter file exists). Otherwise I would run Augustus inside MAKER.
>>>>>>>>
>>>>>>>>You suggested to use rm_gff option to specify RepeatMasker output (sure I will convert them to .gff3 formatted files). Can I submit two RM .gff3 files, separated by comma?
>>>>>>>>
>>>>>>>>Anh-Dao
>>>>>>>>
>>>>>>>>
>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" wrote:
>>>>>>>>
>>>>>>>>>Hi Anh-Dao,
>>>>>>>>>
>>>>>>>>>In the maker_opts.ctl file, there are options for est and protein evidence. You'll put all of your fasta est files together in a comma separated list in the "est" option, and all of your fasta protein files in a comma separated list for the "protein" option.
>>>>>>>>>
>>>>>>>>>You'll specify the SNAP and Genemark files in their respective options in the control file and pass the augustus and fgenesh predictions in the "pred_gff" option.
>>>>>>>>>
>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give it to maker with the "rm_gff" option.
>>>>>>>>>
>>>>>>>>>If you've converted the cufflinks output to gff3, you can give it to maker with the "est_gff" option. I'm pretty sure Trinity only gives fasta output, so you would put that in the "est" option, along with all the other est fasta files.
>>>>>>>>>
>>>>>>>>>If Augustus isn't trained for your particular organism, then you can use another organism that augustus is already trained for. The list of species that augustus has parameter files for is in the README.txt that came with Augustus. I really recommend that you run Augustus from inside maker, because then you get all the benefits of maker passing ext-based hints to augustus at runtime, which can really improve Augustus' predictive ability.
>>>>>>>>>
>>>>>>>>>When you ran the augustus gene prediction separately, did you use another organism's parameter file?
>>>>>>>>>
>>>>>>>>>Thanks,
>>>>>>>>>Daniel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to conduct a genome annotation and have the following
>>>>>>>>>> data:
>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species
>>>>>>>>>>   options)
>>>>>>>>>> - ESTs and RACE (fasta)
>>>>>>>>>> - proteins (fasta)
>>>>>>>>>> - proteins of related organisms (fasta)
>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to
>>>>>>>>>>   ZFF format, etc.)
>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to
>>>>>>>>>>   convert the outputs to .gff3 files. The reason why I ran Augustus
>>>>>>>>>>   gene prediction separately, because the genome has never been
>>>>>>>>>>   trained for Augustus.
>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>>>>>>
>>>>>>>>>> Could you please let me know how can I specify parameters in the
>>>>>>>>>> maker_opts.ctl file?
>>>>>>>>>> Or do you have other suggestions to re-do the data listed above?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Anh-Dao
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> maker-devel mailing list
>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>------------------------------
>
>End of maker-devel Digest, Vol 74, Issue 17
>*******************************************

From carsonhh at gmail.com Fri Sep 5 09:37:02 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 05 Sep 2014 09:37:02 -0600
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17
Message-ID:

The partial lines are symptoms of writing data to a slow NFS mounted drive.
If NFS can't get a response for a write operation, it returns success (even
though it wasn't really successful) and then continues to wait for the
operation to really complete. This is called asynchronous writing. It
improves performance by optimistically returning success on all operations
rather than waiting to see if the operation really succeeded. If you have a
slow or overloaded NFS mount though, you can get a number of failures and
never any indication that they failed except for the fact that some files
are missing content or lines are partial. When this happens, you need to run
MAKER with the -a flag on fewer CPUs to rebuild the GFF3 files. Fewer CPUs
reduces the IO burden. Or if you can find which contigs have partial GFF3
lines, you can delete just those along with the datastore index log file and
then launch maker without any flags to let it recompute just those contigs.

Another possible cause is also NFS related. If you are running MAKER
multiple times in the same working directory, and a slow NFS mount doesn't
allow maker to properly lock files, then two maker jobs can try and compute
the same contig simultaneously.
Simultaneous writing of files can then cause IDs to be duplicated and some
lines to be munged as lines from one process arrive to the file in the
middle of lines from another process (creating a jumble of characters and
partial lines). Start a single maker job on fewer CPUs using the -a flag to
rebuild the GFF3 files if this is the case.

Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
passing in GFF3 files with already assigned IDs (that may be used
elsewhere). Are you using GFF3 pass-through?

Features that will not have unique ID= tags are CDS, three_prime_utr, and
five_prime_utr features (these are considered non-continuous features
because of the shared ID across lines). You can see examples here -->
http://www.sequenceontology.org/gff3.shtml

Also, Name= attributes are not required to be unique.

Thanks,
Carson


On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:

>Hi,
>
>I finished running MAKER as suggested above.
>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g
>options. I called the output file maker.gff3
>
>In the maker.gff3 I found some invalid data (does not conform .gff3
>format), e.g.
>
>###
>2 +
>###
>
>OR
>
>###
>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
>###
>
>OR some gene (or mRNA) IDs are not unique. This means they can be found
>multiple times with different values within the maker.gff3
>
>How could it happen? As I understood, mRNA IDs in a .gff3 file must be
>unique.
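
One way to locate the two symptoms described above (truncated feature lines
and repeated gene/mRNA IDs) is a small scan of the merged file. The
following is only a sketch and is not part of MAKER: the names check_gff3
and SAMPLE are made up for illustration, and it assumes tab-separated GFF3
with the ID stored in the ninth column. CDS and UTR features are skipped
when counting IDs because, as noted in the reply, they may legitimately
share an ID across lines.

```python
import re
from collections import Counter

# Feature types whose ID= may repeat across lines (per the reply above).
SHARED_ID_TYPES = {"cds", "three_prime_utr", "five_prime_utr"}
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]+)")


def check_gff3(lines):
    """Scan GFF3 lines; return (malformed, repeated).

    malformed: list of (line_number, start_of_line) for feature lines that
               lack 9 tab-separated columns or have non-integer coordinates.
    repeated:  dict of ID -> count for IDs seen more than once on features
               that are expected to carry a unique ID.
    """
    malformed = []
    id_counts = Counter()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, directives and '###' separators
        cols = line.split("\t")
        if len(cols) != 9 or not (cols[3].isdigit() and cols[4].isdigit()):
            malformed.append((number, line[:60]))
            continue
        if cols[2].lower() in SHARED_ID_TYPES:
            continue
        match = ID_PATTERN.search(cols[8])
        if match:
            id_counts[match.group(1)] += 1
    repeated = {fid: n for fid, n in id_counts.items() if n > 1}
    return malformed, repeated


# Hypothetical records: one partial line and one repeated mRNA ID.
SAMPLE = [
    "##gff-version 3",
    "Contig1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene-0.3;Name=gene-0.3",
    "Contig1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene-0.3-mRNA-1;Parent=gene-0.3",
    "Contig1\tmaker\tCDS\t100\t400\t.\t+\t0\tID=gene-0.3-mRNA-1:cds;Parent=gene-0.3-mRNA-1",
    "Contig1\tmaker\tCDS\t500\t900\t.\t+\t0\tID=gene-0.3-mRNA-1:cds;Parent=gene-0.3-mRNA-1",
    "2 +",
    "Contig1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene-0.3-mRNA-1;Parent=gene-0.3",
]

if __name__ == "__main__":
    bad_lines, repeated_ids = check_gff3(SAMPLE)
    print("malformed lines:", bad_lines)
    print("repeated IDs:", repeated_ids)
```

For a real file the same function can be given an open file handle (for
example the maker.gff3 mentioned above); the reported line numbers then
point to the contigs whose output would have to be recomputed.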
>
>Thanks
>Anh-Dao

From Timothy.Stitt at tgac.ac.uk Fri Sep 5 01:58:59 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Fri, 5 Sep 2014 07:58:59 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Thanks Carson. That seemed to do the trick! I'm now running our large case
again and the 'ps' processes are definitely suppressed. On a very small
test it looked like this new version completed quicker as well. I assume
you would expect better performance from avoiding use of 'ps' and directly
accessing the process table?
Are there any disadvantages to this approach, which is why it isn't the
default in the code?

Much appreciated,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 20:42
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

I think I found what to do to get around the issue, since you are trying to
force the use of 'Proc::ProcessTable' instead of using the system's 'ps'.
Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with
the following one -->

sub get_proc_by_id {
    my $id = shift;
    my $select;

    my $obj = new Proc::ProcessTable_simple;
    if(ref($obj) eq "Proc::ProcessTable"){
        my ($p) = grep {$_->pid eq $id} @{$obj->table};
        return $p;
    }
    else{
        return $obj->get_proc_by_id($id);
    }
}

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 1:24 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either. I'm building afresh each time
and then just running 'maker -h' and the error appears. I meant to say I'm
using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly?
We are using 128 MPI processes for a large MAKER run and the 'ps' processes
are overloading our servers.

Cheers,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 19:52
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Try changing -->

eval 'require Proc::ProcessTable';

to -->

use Proc::ProcessTable;

in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces Perl's
import method to run in case it explicitly exports something for it to
function properly.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 12:45 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson. I downloaded a couple of different versions of
Proc::ProcessTable (v0.50 and v0.48). In each case they compiled
successfully. I've copied snippets of the 'make test' below to confirm.

I've scoured the source and build directories and don't see the .al files.
Nothing seems to indicate that they are generated. I notice that the error
occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according
to the diagnostics:

#142 my $obj = new Proc::ProcessTable_simple;
#143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the
attribute that is being referenced in line #143? I'm not a Perl expert so
just throwing out ideas here. If not, how do I get the *.al files to be
generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 15:25
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think
that is the issue, but I'd try checking the directories and installation
locations first.

--Carson

From: Carson Holt
Date: Thursday, September 4, 2014 at 8:22 AM
To: "Timothy Stitt (TGAC)", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly.
Any *.al files should be created during installation of Proc::ProcessTable.
Go through these directories one at a time and check for the existence of
./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not
there, then you installed Proc::ProcessTable somewhere else and you need to
see what is wrong with your CPAN configuration. If they are there then you
may need to manually delete both before attempting to reinstall.

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed
Proc::ProcessTable (which built successfully via cpan). Running MAKER though
I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains:
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
.) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see
the get_proc_by.al file. Should I be using an older version of
ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure
they haven't failed or become zombies. On your system these 'ps' calls may
be hanging which would cause them to build up over time. You can try and
run MAKER with the '-nolock' flag, since it is the NFS file locking that
requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and
change it as follows. Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;

        bless($self, $class);

        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but
it may experience the same hang as 'ps' is experiencing. Also you will need
to install 'Proc::ProcessTable' via CPAN for it to work, and that
particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000
system (with over 2000 cores) and the application appears to be generating
large amounts of 'ps' processes that are overwhelming the system and
causing the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is
there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

--
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk

The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Fri Sep 5 09:17:45 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 05 Sep 2014 09:17:45 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

I'm glad the workaround is working for you. Proc::ProcessTable being faster
than 'ps' is actually very atypical. It is likely there is an issue with
your system, which is suggested by the fact that 'ps' is hanging and
accumulating processes, which is also very atypical (ps should return in a
fraction of a second). We actually switched from Proc::ProcessTable to 'ps'
some time ago because 'ps' is several fold faster, and Proc::ProcessTable
won't compile on about 10-15% of system architectures.

Thanks,
Carson

From: "Timothy Stitt (TGAC)"
Date: Friday, September 5, 2014 at 1:58 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson. That seemed to do the trick! I'm now running our large case
again and the 'ps' processes are definitely suppressed. On a very small
test it looked like this new version completed quicker as well. I assume
you would expect better performance from avoiding use of 'ps' and directly
accessing the process table? Are there any disadvantages to this approach
which is why it isn't default in the code?

Much appreciated,

Tim.
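
As a rough illustration of the distinction made in this thread, a process
can be checked directly through the operating system instead of launching
an external 'ps' for every check. The sketch below is written in Python
rather than Perl and is not how MAKER or Proc::ProcessTable implement it:
process_is_alive is a hypothetical helper, it assumes a POSIX system, and
it cannot tell a zombie from a running process (both still have a process
table entry).

```python
import os


def process_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists, without spawning 'ps'."""
    try:
        # Signal 0 delivers nothing; the kernel only performs the
        # existence and permission checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


if __name__ == "__main__":
    print(process_is_alive(os.getpid()))
```

The point of the design is that no child process is created per check, so
nothing can pile up if the check itself is slow; whether that is faster
than 'ps' on a given system is a separate question, as the reply above
notes.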
--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 4 September 2014 20:42 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the systems 'ps'. Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one --> sub get_proc_by_id { my $id = shift; my $select; my $obj = new Proc::ProcessTable_simple; if(ref($obj) eq "Proc::ProcessTable"){ my ($p) = grep {$_->pid eq $id} @{$obj->table}; return $p; } else{ return $obj->get_proc_by_id($id); } } --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 1:24 PM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker ?h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference. Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers. Cheers, Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 4 September 2014 19:52 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Try changing --> eval 'require Proc::ProcessTable'; to --> use Proc::ProcessTable; in .../maker/lib/Proc/ProcessTable_simple.pm. 
That way it forces the perls import method to run incase explicitly exports something for it to function properly. --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 12:45 PM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Thanks Carson. I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated. I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics: #142 my $obj = new Proc::ProcessTable_simple; #143 return $obj->get_proc_by_id($id); Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line $143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok? > make test make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/process.t .. -------------------------------- uid: 10344 gid: 11995 ? cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50 t/process.t .. ok All tests successful. 
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU) Result: PASS make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' No tests defined for Proc::ProcessTable::Process extension. make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Proce ss' Thanks, Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 4 September 2014 15:25 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first. --Carson From: Carson Holt Date: Thursday, September 4, 2014 at 8:22 AM To: "Timothy Stitt (TGAC)" , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes The error means Proc:ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall. 
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 5:38 AM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Hi Carson, I tried the ?nolock option and it didn't have much effect. I then installed Proc:ProcessTable (which built successfully via cpan). Running MAKER though I get the following error: Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal. pm line 143. I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50. Thanks in advance for any further help with this. Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 21 August 2014 21:17 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. 
You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;
            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems. --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, August 21, 2014 at 2:05 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] MAKER and large number of 'ps' processes Dear MAKER developers, One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly? Thanks in advance, Tim. Timothy Stitt PhD | Head of Scientific Computing +44 1603 450378 | timothy.stitt at tgac.ac.uk The Genome Analysis Centre (TGAC) Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
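The directory check Carson describes in this thread (looking for ./Proc/ProcessTable.pm and ./auto/Proc/ProcessTable/ under each @INC entry) could be scripted roughly as follows. This is only a sketch: the function name `find_processtable` is made up for illustration, and it assumes the module sits directly under each listed directory rather than in an architecture-specific subdirectory.

```python
import os


def find_processtable(inc_dirs):
    """For each directory, report whether Proc/ProcessTable.pm and
    auto/Proc/ProcessTable/ exist directly beneath it.

    Returns a list of (directory, has_module_file, has_auto_dir) tuples.
    """
    report = []
    for d in inc_dirs:
        module_file = os.path.join(d, "Proc", "ProcessTable.pm")
        auto_dir = os.path.join(d, "auto", "Proc", "ProcessTable")
        report.append((d, os.path.isfile(module_file), os.path.isdir(auto_dir)))
    return report
```

Calling it with the @INC directories from the error message would show whether the module was installed into any of them, or whether both pieces are present and may need to be deleted before reinstalling, as suggested above.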
From nguyenan at mail.nih.gov Fri Sep 5 10:08:50 2014 From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C]) Date: Fri, 5 Sep 2014 16:08:50 +0000 Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17 In-Reply-To: References: Message-ID: Thanks Carson. I ran MAKER on 30 CPUs. I will re-run it using 10 CPUs. > >Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are >passing in GFF3 files with already assigned IDs (that may be used >elsewhere). Are you using GFF3 pass-through? > I submitted est_gff=cufflinks.gff3 and pred_gff=fgenesh.gff3 when running MAKER. However, I got 4 repeated gene/mRNA IDs as follows: augustus_masked-c206700011.Contig3-processed-gene-0.3 augustus_masked-c206700011.Contig3-processed-gene-0.3-mRNA-1 snap_masked-c206500027.Contig3-processed-gene-0.26 snap_masked-c206500027.Contig3-processed-gene-0.26-mRNA-1 Anh-Dao From Brian.Mack at ARS.USDA.GOV Mon Sep 8 07:47:01 2014 From: Brian.Mack at ARS.USDA.GOV (Mack, Brian) Date: Mon, 8 Sep 2014 13:47:01 +0000 Subject: [maker-devel] non-overlapping predictions Message-ID: Hi, I used IPRscan on the non-overlapping ab initio proteins and identified additional predictions that I want to include in my final gff. I was trying to follow the advice given in this thread (http://gmod.827538.n3.nabble.com/Adding-non-overlapping-models-to-final-set-td4043778.html) to pull out these predictions from the full Maker gff3 that includes all the evidence, but it seems that none of the non-overlapping predictions are in this gff3 file. Where would I find all of the predictions including the non-overlapping predictions? Thanks, Brian This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties.
If you believe you have received this message in error, please notify the sender and delete the email immediately. From ranjani at uga.edu Tue Sep 9 11:14:09 2014 From: ranjani at uga.edu (Sivaranjani Namasivayam) Date: Tue, 9 Sep 2014 17:14:09 +0000 Subject: [maker-devel] Non-canonical splice junctions Message-ID: <1410282848765.20893@uga.edu> Hi, Is it possible to force MAKER to predict gene models only with canonical splice sites? Or is there a quick way to identify gene models with non-canonical splice sites? Thanks, Ranjani From carsonhh at gmail.com Tue Sep 9 16:09:13 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 09 Sep 2014 16:09:13 -0600 Subject: [maker-devel] non-overlapping predictions Message-ID: It's a naming issue. The reference match/match_part features have 'abinit' in the name while the non-overlapping fasta file has 'processed' in the name of the fasta header. The easiest way to fix it is to just replace 'processed' with 'abinit' in the terms you are searching for. This was supposed to be resolved already, but I'll see what's going on. What version of MAKER are you using? --Carson From nguyenan at mail.nih.gov Thu Sep 18 05:49:45 2014 From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C]) Date: Thu, 18 Sep 2014 11:49:45 +0000 Subject: [maker-devel] CPUs problems Message-ID: I re-ran maker on 10 CPUs. The maker job finished after 10 days. I checked the log file and got these errors: Processing run.log file... examining contents of the fasta file and run log shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory Can you let me know how I can fix the problem? Thanks Anh-Dao On 9/5/14 11:37 AM, "Carson Holt" wrote: >The partial lines are symptoms of writing data to a slow NFS mounted drive. If NFS can't get a response for a write operation, it returns success (even though it wasn't really successful) and then continues to wait for the operation to really complete. This is called asynchronous writing. It improves performance by optimistically returning success on all operations rather than waiting to see if the operation really succeeded. If you have a slow or overloaded NFS mount though, you can get a number of failures and never any indication that they failed except for the fact that some files are missing content or lines are partial.
>When this happens, you need to run MAKER with the -a flag on fewer CPUs to rebuild the GFF3 files. Fewer CPUs reduce the IO burden. Or if you can find which contigs have partial GFF3 lines, you can delete just those along with the datastore index log file and then launch maker without any flags to let it recompute just those contigs. >Another possible cause is also NFS related. If you are running MAKER multiple times in the same working directory, and a slow NFS mount doesn't allow maker to properly lock files, then two maker jobs can try and compute the same contig simultaneously. Simultaneous writing of files can then cause IDs to be duplicated and some lines to be munged as lines from one process arrive to the file in the middle of lines from another process (creating a jumble of characters and partial lines). Start a single maker job on fewer CPUs using the -a flag to rebuild the GFF3 files if this is the case. >Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are passing in GFF3 files with already assigned IDs (that may be used elsewhere). Are you using GFF3 pass-through? >Features that will not have unique ID= tags are CDS, three_prime_utr, and five_prime_utr features (these are considered non-continuous features because of the shared ID across lines). >You can see examples here --> http://www.sequenceontology.org/gff3.shtml >Also Name= attributes are not required to be unique. >Thanks, >Carson >On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote: >>Hi, >>I finished running MAKER as suggested above. Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g options. I called the output file maker.gff3 >>In the maker.gff3 I found some invalid data (does not conform to .gff3 format), e.g.
>>### >>2 + >>### >>OR >>### >>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 + >>### >>OR some gene (or mRNA) IDs are not unique. This means they can be found multiple times with different values within the maker.gff3 >>How could it happen? As I understood, mRNA IDs in a .gff3 file must be unique. >>Thanks >>Anh-Dao >>On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" wrote: >>>Today's Topics: >>> 1. Re: Maker_opts.ctl (Carson Holt) >>>Message: 1 >>>Date: Fri, 18 Jul 2014 11:04:09 -0600 >>>From: Carson Holt >>>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]" , Daniel Ence >>>Cc: "maker-devel at yandell-lab.org" >>>Subject: Re: [maker-devel] Maker_opts.ctl >>>Message-ID: >>>Content-Type: text/plain; charset="UTF-8" >>>It should just be 'fgenesh'. If it's not there you can still just give the GFF3. >>>--Carson >>>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote: >>>>I am not sure which fgenesh executable file I should use. >>>>fgenesh= #location of fgenesh executable >>>>When I run FGENESH++, I need to run the run_pipe.pl script.
Sure you >>>>need >>>>to specify a list of other executable programs (such as ppd, ppdn+, >>>>etc) >>>> >>>>Anh-Dao >>>> >>>> >>>>On 7/16/14 3:32 PM, "Carson Holt" wrote: >>>> >>>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your >>>>>previous run. Then provide the RepeatModeler file to rmlib= >>>>> >>>>>--Carson >>>>> >>>>> >>>>> >>>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>> wrote: >>>>> >>>>>>By default, model_org=all. Can I use the de novo repeat library >>>>>>predicted >>>>>>by RepeatModeler for the rmlib option? >>>>>> >>>>>>Anh-Dao >>>>>> >>>>>> >>>>>> >>>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote: >>>>>> >>>>>>>No. You can provide both to MAKER. The options are model_org= and >>>>>>>rmlib=. >>>>>>> By letting MAKER handle repeat masking it will differentiate repeat >>>>>>>types >>>>>>>and use soft masking for some and hard masking for others. This >>>>>>>increases >>>>>>>sensitivity of evidence alignments while still maintaining >>>>>>>specificity. >>>>>>> >>>>>>>--Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>>>> wrote: >>>>>>> >>>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the >>>>>>>>parameter >>>>>>>>files for Augustus. >>>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM >>>>>>>>using >>>>>>>>two >>>>>>>>options: -lib (de novo) and -species (known). I got ~ 45% repeats >>>>>>>>via >>>>>>>>de >>>>>>>>novo and ~ 4% repeats via known options. As I understood, RM inside >>>>>>>>of >>>>>>>>MAKER uses only RepBase repeat library and RepeatRunner protein >>>>>>>>database. >>>>>>>> >>>>>>>>Anh-Dao >>>>>>>> >>>>>>>> >>>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote: >>>>>>>> >>>>>>>>>When you ran Augustus separately, it should have created the >>>>>>>>>parameters >>>>>>>>>needed to run it. Now you should be able to run it inside of >>>>>>>>>MAKER >>>>>>>>>using >>>>>>>>>the species name you just created. 
>>>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather than giving it the results as GFF3. >>>>>>>>>--Carson >>>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote: >>>>>>>>>>Thanks Daniel for your quick response. >>>>>>>>>>I did not use the parameter file of another organism when running Augustus. I created the parameter file for the genome following their instructions. There were multiple steps to train and run Augustus (Creating gene structures for training AUGUSTUS with CEGMA => parameter file will be created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences; Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.) As I mentioned, I ran Augustus separately because Augustus has not been trained for that genome (no parameter file exists). Otherwise I would run Augustus inside MAKER. >>>>>>>>>>You suggested using the rm_gff option to specify RepeatMasker output (sure, I will convert them to .gff3 formatted files). Can I submit two RM .gff3 files, separated by a comma? >>>>>>>>>>Anh-Dao >>>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" wrote: >>>>>>>>>>>Hi Anh-Dao, >>>>>>>>>>>In the maker_opts.ctl file, there are options for est and protein evidence. You'll put all of your fasta est files together in a comma separated list in the "est" option, and all of your fasta protein files in a comma separated list for the "protein" option.
>>>>>>>>>>>You'll specify the SNAP and Genemark files in their respective options in the control file and pass the augustus and fgenesh predictions in the "pred_gff" option. >>>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give it to maker with the "rm_gff" option. >>>>>>>>>>>If you've converted the cufflinks output to gff3, you can give it to maker with the "est_gff" option. I'm pretty sure Trinity only gives fasta output, so you would put that in the "est" option, along with all the other est fasta files. >>>>>>>>>>>If Augustus isn't trained for your particular organism, then you can use another organism that augustus is already trained for. The list of species that augustus has parameter files for is in the README.txt that came with Augustus. I really recommend that you run Augustus from inside maker, because then you get all the benefits of maker passing ext-based hints to augustus at runtime, which can really improve Augustus' predictive ability. >>>>>>>>>>>When you ran the augustus gene prediction separately, did you use another organism's parameter file?
>>>>>>>>>>>Thanks, >>>>>>>>>>>Daniel >>>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote: >>>>>>>>>>>> Hi, >>>>>>>>>>>> I would like to conduct a genome annotation and have the following data: >>>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species options) >>>>>>>>>>>> - ESTs and RACE (fasta) >>>>>>>>>>>> - proteins (fasta) >>>>>>>>>>>> - proteins of related organisms (fasta) >>>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to ZFF format, etc.) >>>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl) >>>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert the outputs to .gff3 files. I ran the Augustus gene prediction separately because the genome has never been trained for Augustus. >>>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq >>>>>>>>>>>> Could you please let me know how I can specify parameters in the maker_opts.ctl file? >>>>>>>>>>>> Or do you have other suggestions to re-do the data listed above? >>>>>>>>>>>> Thanks. >>>>>>>>>>>> Anh-Dao >>>End of maker-devel Digest, Vol 74, Issue 17 From carsonhh at gmail.com Fri Sep 19 11:22:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 19 Sep 2014 11:22:50 -0600 Subject: [maker-devel] CPUs problems In-Reply-To: References: Message-ID: These are further symptoms of an IO related issue. The script cannot even query its current working directory. Check to make sure there is plenty of space in the temporary directory /tmp. If /tmp is separately mounted on each machine there may be one that is full. Also make sure you did not set TMP= in the maker_opts.ctl file to an NFS mounted location. Do you by any chance get any warnings when you start MAKER? For example --> WARNING: Multiple MAKER processes have been started in the same directory. That would indicate that the MPI communication rung is down which would drastically increase IO operations. You may also have one or more nodes that are having the issue and are the source of all the errors.
If you are using OpenMPI to run MAKER, you can tag the output from each node using the --tag-output flag for mpiexec. Then if the same node is always producing the error, you can have IT look at it. Also MAKER is set to automatically retry on errors. If all contigs are finished, check the output. Make sure there are the same number of genes in the fasta files and GFF3 files. Also look for munged content. If everything looks ok, MAKER may have gotten around the issue through sheer brute force (i.e. it retried until it succeeded). --Carson From carson.holt at genetics.utah.edu Mon Sep 22 14:17:19 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 22 Sep 2014 20:17:19 +0000 Subject: [maker-devel] diff. numbers of genes on contigs vs. scaffolded genome In-Reply-To: <541BCE0A.70806@env.ethz.ch> References: <541BCE0A.70806@env.ethz.ch> Message-ID: The contiged assembly is more likely to give spurious hits and alignments. They also can be harder to repeat mask. Also gene predictors can behave slightly differently on small sequences than on longer ones. If you have fewer gene models than you expect, your first step should be to process the scaffolds with CEGMA. It will give you an estimate of the genome's "completeness". If CEGMA gives a 60% completeness value for example then you can expect to only recover 60% of the expected number of genes.
Next you should run RepeatModeler or similar software to help generate a species-specific repeat library. Under-masked repeats can make predicting genes on longer scaffolds far more difficult for ab initio predictors.

--Carson

On 9/19/14, 12:32 AM, "Stefan Zoller" wrote:

>Hi,
>
>I am working on the annotation of a plant genome (about 600MB) and we have a reasonable draft assembly, a fairly good transcriptome and quite a few proteins from related species. We have also extensively trained augustus and are also feeding genmark and snap predictions.
>
>Recently I noticed a behavior of Maker that seems fairly odd and which I cannot explain at all. When I take the scaffolded genome (about 23000 scaffolds) I get roughly 9'000 maker approved gene models. Which is admittedly a bit on the low side and we have to work on this. However, when I break up the scaffolds into contigs at stretches of N longer 500bp (about 60'000 contigs) I get about 17'000 maker gene models. Now obviously 17'000 is more in the range what I would expect, so I am inclined to go with these. I have looked at both annotations and the evidence in WebApollo and the evidence alignments are identical for both runs. The approved genes seem to be the same, except for the additional ones in the "contiged" genome version. The additional gene models are not necessarily at the ends of the contigs, so I think it has nothing to do with having the stretches of Ns nearby in the scaffolded genome. Do you have any idea why maker comes up with the additional numbers of gene models and how I could "convince" maker to give me the same gene models for the scaffolded assembly?
>
>Cheers,
>Stefan
>
>--
>Stefan Zoller, PhD
>Bioinformatics
>Genetic Diversity Centre
>ETH Zurich CHN E55.1
>Universitätsstrasse 16
>8092 Zurich
>Switzerland
>
>Phone: +41 44 632 66 85
>E-Mail: stefan.zoller at env.ethz.ch
>Web: www.gdc.ethz.ch

From myandell at genetics.utah.edu Mon Sep 22 18:10:38 2014
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 23 Sep 2014 00:10:38 +0000
Subject: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome
In-Reply-To:
References: <541BCE0A.70806@env.ethz.ch>,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA0537B651ADF@mxb1.hg.genetics.utah.edu>

Also, are your numbers including the ab initio predictions without evidence that have Pfam domains?

cheers,

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Co-director USTAR Center for Genetic Discovery
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

From aksrao at ucdavis.edu Thu Sep 25 10:18:30 2014
From: aksrao at ucdavis.edu (Anand K S Rao)
Date: Thu, 25 Sep 2014 09:18:30 -0700
Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions?
Message-ID:

Greetings!

I am exploring the use of MAKER-P, but I need your advice in determining whether MAKER-P is the best choice for me.

In the recent past, I've tried using the AUGUSTUS --profile option, which allows user-defined protein profiles to be used as queries. I am interested in predicting gene-like structures in intergenic regions (I've masked away genic regions as predicted by the genome annotation pipeline) in some orphan legume plant species. There is not much extrinsic data in the way of ESTs or NGS data, let alone extrinsic data that might map to so-called intergenic regions; i.e. whatever little data exists has already been used to predict 'genes'.

When I tried using the --profile option of AUGUSTUS, I was not satisfied with the frequency and magnitude of fusion genes. Additionally, there was no easy way for me to consolidate gene-like structures that varied, but overlapped, when using different protein profiles as queries (one profile per Pfam HMM within a 4 member clan). Additionally, training all the orphan legume species is not an exciting undertaking because of time and computing resource requirements.

All this led me to consider MAKER-P as an option. Based on what I've described above, do you think I should proceed with trying to use MAKER-P for my purposes?

Thank you, in advance.

Sincerely,
Anand

--
Anand K.S. Rao
PhD candidate, Plant Biology with a Designated Emphasis in Biotechnology, UC Davis, CA - 95616 USA | aksrao at ucdavis.edu | (530) 574-5134 | LinkedIn
_________________________________________________________________________
CTTATTGTTGAACTTOAATGGTGCTAATGATCCTCGTOTCTCCTGAACGT - translate THAT!

From carson.holt at genetics.utah.edu Thu Sep 25 12:17:19 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 25 Sep 2014 18:17:19 +0000
Subject: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome
In-Reply-To: <5421695F.5040409@env.ethz.ch>
References: <541BCE0A.70806@env.ethz.ch> <7A60AB257EFF2B48B1F4C814817EA0537B651ADF@mxb1.hg.genetics.utah.edu> <5421695F.5040409@env.ethz.ch>
Message-ID:

Sorry for the slow reply. I was trying to locate a script that might be useful for you.

I think a species-specific repeat library will be of most benefit here (it's surprising just how influential this step is). Also note that you should train SNAP and Augustus on your species and not just use another related species as a stand-in.

With respect to PFAM domains, on some organisms you may not get a lot of cross-species protein alignments because of divergence or assembly issues. This of course makes it harder to support these models with direct protein alignments. However, you can run InterProScan over the non-overlapping.proteins.fasta file produced by MAKER (it contains non-redundant rejected models). Because an HMM is used for domain identification, it can pick up protein domains that would not produce a significant BLAST alignment because of divergence. You can then add models with positive hits for protein domains back into your gene set. For organisms where it's required, though, this ad hoc procedure can usually only increase gene counts by about 10%.
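
A minimal sketch of the ID-collection part of this approach, assuming InterProScan was run with tab-separated (TSV) output in which column 1 is the sequence ID and column 4 is the analysis name (here `Pfam`). The function name `ids_with_pfam_hits` is hypothetical and is not part of MAKER or InterProScan; it only illustrates turning a report into a list with one ID per line.

```python
def ids_with_pfam_hits(tsv_text):
    """Return the sorted, de-duplicated sequence IDs that have a Pfam hit.

    Assumes InterProScan-style TSV rows: column 1 is the sequence ID and
    column 4 is the analysis name. Rows with fewer columns are skipped.
    """
    ids = set()
    for line in tsv_text.splitlines():
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[3] == "Pfam":
            ids.add(fields[0])
    return sorted(ids)


def write_id_list(ids, path):
    """Write one ID per line, the list format described in the message."""
    with open(path, "w") as handle:
        for seq_id in ids:
            handle.write(seq_id + "\n")
```

The IDs in such a list would need to match the FASTA and GFF3 identifiers exactly, so it is worth checking a few by hand before selecting on them.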

I've attached a script that makes generating results for these genes easier.

1. First you run InterProScan with just PFAM.
2. Then you take the IDs of all models that have a domain in the report and create a list (1 ID per line).
3. Next use the fasta_tool script that comes with MAKER together with the --select flag to separate just the positive hits (IDs in your list) from the non-overlapping.proteins.fasta and non-overlapping.transcripts.fasta files.
4. Use the attached script to separate just the positive hits (your ID list) from the GFF3. The script will upgrade match/match_part results to gene/mRNA/exon/CDS results and print them out for you.
5. Use the fasta_merge and gff3_merge scripts that come with MAKER to merge the selected/upgraded GFF3 entries and selected FASTA entries back into the original MAKER results.

--Carson

On 9/23/14, 6:36 AM, "Stefan Zoller" wrote:

>Please forgive my ignorance, I am not entirely sure if I understand your question correctly, but I will try to answer.
>As evidence we use:
>1) our own transcriptome (trinity assembled RNAseq, filtering out the very low expression transcripts).
>2) all swissprot plant proteins, and several protein sets from closely related plant species downloaded from NCBI.
>I am not sure if the ab-initio predictions without evidence have Pfam domains. Honestly, I would not know how to tell and how to include/exclude.
>I was assuming that we should not have too many Maker approved predictions without evidence anyway, because we use "keep_preds=0".
>The numbers of gene predictions I mentioned in my email are the predictions reported by the fasta_merge/gff3_merge scripts in the "*maker.proteins.fasta". There are of course many more predictions in e.g., "*maker.augustus_masked.proteins.fasta" (about 68'000 in this file).
>
>I hope I am not totally off with my answer.
>Cheers, Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_preds2models
Type: application/octet-stream
Size: 5523 bytes
Desc: gff3_preds2models
URL:

From carsonhh at gmail.com Thu Sep 25 12:43:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 25 Sep 2014 12:43:35 -0600
Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions?
In-Reply-To:
References:
Message-ID:

When you say "gene-like structures", are you saying that you are looking for pseudogenes and non-coding genes?

You can use the trnascan and snoscan options in the maker_opts.ctl file to find some non-coding RNAs. You may just want to leave off all ab initio gene predictors like SNAP and Augustus, as those will be looking for canonical coding genes. If you first hard mask any coding genes, and then provide ESTs or assembled mRNA-seq and proteins, you may be able to use the exonerate alignments produced to identify potential gene-like structures. It might require a little post-processing of the resulting GFF3 by you.

Thanks,
Carson

From carson.holt at genetics.utah.edu Mon Sep 29 08:47:00 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 29 Sep 2014 14:47:00 +0000
Subject: [maker-devel] maker failure with example data
In-Reply-To:
References:
Message-ID:

The error is caused by the BioPerl indexer returning an empty length for the indexed fasta sequence (possibly because of a corrupt index file or other reasons). You may need to reinstall BioPerl (use the CPAN version, not the BioPerl-live version), or reinstall Berkeley DB (used by the BioPerl indexer), or reinstall the Perl module DB_File via CPAN (Perl's interface to Berkeley DB). After reinstalling BioPerl, delete the mpi_blastdb directory for the MAKER run before retrying.
Also verify that the /tmp directory on your system, or the directory pointed to by TMP= in the maker_opts.ctl file, is not full, and that TMP= is not set to an NFS-mounted location.

Thanks,
Carson

From goutham.atla at gmail.com Mon Sep 29 06:33:50 2014
From: goutham.atla at gmail.com (Goutham atla)
Date: Mon, 29 Sep 2014 18:03:50 +0530
Subject: [maker-devel] maker failure with example data
Message-ID:

Dear All,

I am running maker with the demo file, i.e. dpp_contig.fasta, keeping all other parameters in the .ctl files at their defaults. But it does not progress and shows the following message saying that the length of the sequence is 0. Can anybody help me?

--Next Contig--

MAKER WARNING: All old files will be erased before continuing

#---------------------------------------------------------------------
Skipping the contig because it is too short!!
SeqID: contig-dpp-500-500
Length: 0
#---------------------------------------------------------------------

Regards,
Goutham
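
For the zero-length report above, one quick sanity check that does not depend on the BioPerl index is to read the FASTA file directly and print each sequence's length. The sketch below is only an illustration under that assumption; `fasta_lengths` is a hypothetical helper, not a MAKER or BioPerl function.

```python
def fasta_lengths(text):
    """Map each FASTA record name to its sequence length.

    The record name is taken as the first whitespace-delimited token
    after '>'. Blank lines are ignored; sequence lines before the first
    header are skipped.
    """
    lengths = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths
```

If a record shows a non-zero length here while MAKER still reports `Length: 0`, that would be consistent with the input file being intact and the problem sitting in the index, though this check alone cannot confirm it.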

From carson.holt at genetics.utah.edu Tue Sep 30 13:33:18 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 30 Sep 2014 19:33:18 +0000
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

The message is warning that there are multiple instances of MAKER running, but no MPI communication. When you build MAKER (the perl Build.PL step when installing MAKER), you need to specify the location of 'mpicc' and 'mpi.h' to build with MPI support. Otherwise you won't be able to link against the MPICH2 shared libraries. You probably need to rerun that step.

--Carson

From: Goutham atla
Date: Tuesday, September 30, 2014 at 10:49 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: URGENT: Re: maker failure with example data

Hi Carson,

I figured out the problem is with the RepeatMasker installation and I fixed it.

I am running maker with MPICH2 and I get the following warning when I start it:

STATUS: Processing and indexing input FASTA files...
WARNING: Multiple MAKER processes have been started in the same directory.

I would like to know if this is common.

Regards,
Goutham

On Tue, Sep 30, 2014 at 12:02 PM, Goutham atla wrote:

Dear Carson,

Thank you for the reply. I reinstalled BioPerl and now I am getting the following error on the test data.

ERROR: RepeatMasker failed
--> rank=NA, hostname=motif
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500

--
Goutham Atla

From eschang1 at gmail.com Tue Sep 30 14:02:30 2014
From: eschang1 at gmail.com (Sally Chang)
Date: Tue, 30 Sep 2014 15:02:30 -0500
Subject: [maker-devel] interpreting SNAP gene-stats output
Message-ID:

Hi all,

I was wondering if someone could help me make sure I am reading these results from running fathom -gene-stats on an annotation correctly:

1049 sequences
0.245825 avg GC fraction (min=0.162446 max=0.431287)
5533 genes (plus=2760 minus=2773)
91 (0.016447) single-exon
5442 (0.983553) multi-exon
101.857010 mean exon (min=1 max=6534)
81.880493 mean intron (min=4 max=5486)

Are the 1049 sequences the actual number of contigs/sequences from your assembly that MAKER ended up using? And is that 5533 genes the number of genes it found on those contigs (and strand info?).

Thanks very much,
Sally Chang
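
As a small arithmetic check on the gene-stats report above, the sketch below recomputes the totals from the printed counts. It assumes, as the layout suggests but the report itself does not state, that plus and minus are strand counts and that the bracketed values are fractions of the gene total; the variable names are illustrative only.

```python
# Counts copied from the fathom -gene-stats report quoted above.
genes = 5533
plus, minus = 2760, 2773
single_exon, multi_exon = 91, 5442


def fraction(part, whole):
    """Share of `whole` made up by `part`, rounded to six decimals."""
    return round(part / whole, 6)


# Strand counts and exon-structure counts both partition the gene total.
strand_total = plus + minus
structure_total = single_exon + multi_exon

single_fraction = fraction(single_exon, genes)
multi_fraction = fraction(multi_exon, genes)
```

Both sums come to 5533 and the two fractions reproduce the bracketed values in the report, so the numbers are at least internally consistent with that reading.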
URL: From carson.holt at genetics.utah.edu Tue Sep 30 14:49:10 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Tue, 30 Sep 2014 20:49:10 +0000 Subject: [maker-devel] Maker In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA0537B66060F@mxb1.hg.genetics.utah.edu> References: <000001cfdc80$77dc88e0$67959aa0$@uni-bayreuth.de> <7A60AB257EFF2B48B1F4C814817EA0537B66060F@mxb1.hg.genetics.utah.edu> Message-ID: MAKER can't annotate assembled transcripts. It can only annotate genomic sequence. Transcript annotation is a very different problem. Using a different species' genome would not produce annotation for your transcripts, rather your transcripts would just be considered evidence for annotating that species genome. Your best option is probably just to use BLAST to look for homology between species. Do BLAST both ways and if gene A in species 1 is the best hit for gene B in species 2 and vice versa (reciprocal best hits), then you can consider them as being paralogous. Also use the proteome from the related species when doing the BLAST analysis (not the nucleotide transcripts). --Carson On 9/30/14, 6:51 AM, "Mark Yandell" wrote: > > >Mark Yandell >Professor of Human Genetics >H.A. & Edna Benning Presidential Endowed Chair >Co-director USTAR Center for Genetic Discovery >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >ph:801-587-7707 > >________________________________________ >From: Alfons Weig [a.weig at uni-bayreuth.de] >Sent: Tuesday, September 30, 2014 1:30 AM >To: Mark Yandell >Subject: Maker > >Hello, > >I have just sent a feedback via the Maker feedback form but received the >following error message: Therefore, I send it vir regular mail: > >Error executing run mode 'feedback': Can't call method "MailMsg" without >a package or object reference at /var/www/cgi-bin/mwas/lib/MWS.pm line >1116. >at /var/www/cgi-bin/mwas/maker.cgi line 21. 
> >I have just tested the Maker annotation pipeline with short sequences >from an RNAseq de-novo assembly using A. mellifera as areference genome. >Unfortunately, honey bee is not the species we sequence but is closely >related to it. >I was wondering whether this was a good approach? There are no genome >data availabe for our bee species. Is maker able to annotate de.novo >assemble mRNA transcripts obtained by Velvet/Oases (including partial >sequences)? > >Best regards >Alfons Weig > > >Dr. Alfons Weig >DNA-Analytik & ?koinformatik - Univ. Bayreuth - NW1 >Universit?tsstrasse 30 >95447 Bayreuth - Germany >Tel. +49 (0)921-552457 >www.daneco.uni-bayreuth.de > From carsonhh at gmail.com Tue Sep 30 14:59:47 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 30 Sep 2014 14:59:47 -0600 Subject: [maker-devel] interpreting SNAP gene-stats output In-Reply-To: References: Message-ID: Probably. But it's really not that important of a value because during the 'fathom -genome.ann genome.dna -categorize 1000' step outlined in the SNAP training literature, fathom turns each gene into it's own little contig padded by 1000bp on either size. So in the end the number of starting contigs becomes irrelevant, because they all get trimmed and thrown away anyways. --Carson From: Sally Chang Date: Tuesday, September 30, 2014 at 2:02 PM To: Subject: [maker-devel] interpreting SNAP gene-stats output Hi all, I was wondering if someone could help me make sure I am looking at these results from running fathom -gene-stats on an annotation: 1049 sequences 0.245825 avg GC fraction (min=0.162446 max=0.431287) 5533 genes (plus=2760 minus=2773) 91 (0.016447) single-exon 5442 (0.983553) multi-exon 101.857010 mean exon (min=1 max=6534) 81.880493 mean intron (min=4 max=5486) Are the 1049 sequences the actual number of contigs/sequences from your assembly that MAKER ended up using? And is that 5533 genes the number of genes it found on those contigs (and strand info?). 
Thanks very much,
Sally Chang

From marc.hoeppner at imbim.uu.se Tue Sep 30 23:39:21 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 1 Oct 2014 05:39:21 +0000
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

Another possibility could be that MPICH2 wasn't built properly, no? I remember something about enabling shared libraries during the compilation of mpich, without which the error below would appear.

/Marc

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se

On 30 Sep 2014, at 21:33, Carson Holt wrote:

The message is warning that there are multiple instances of MAKER running, but no MPI communication. When you build MAKER (perl Build.PL step when installing MAKER), you need to specify the location of 'mpicc' and 'mpi.h' to build with MPI support. Otherwise you won't be able to link against MPICH2 shared libraries. You probably need to rerun that step.

--Carson

From: Goutham atla
Date: Tuesday, September 30, 2014 at 10:49 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: URGENT: Re: maker failure with example data

Hi Carson,

I figured out the problem is with the RepeatMasker installation and I fixed it.

I am running maker with MPICH2 and I get the following warning when I start it:

STATUS: Processing and indexing input FASTA files...
WARNING: Multiple MAKER processes have been started in the same directory.

I would like to know if this is common.

Regards,
Goutham

On Tue, Sep 30, 2014 at 12:02 PM, Goutham atla wrote:

Dear Carson,

Thank you for the reply. I reinstalled BioPerl and now I am getting the following error on the test data.

ERROR: RepeatMasker failed
--> rank=NA, hostname=motif
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500

On Mon, Sep 29, 2014 at 8:17 PM, Carson Holt wrote:

The error is caused by the BioPerl indexer returning an empty length for the indexed fasta sequence (possibly because of a corrupt index file or other reasons). You may need to reinstall BioPerl (use the CPAN version not the BioPerl-live version), or reinstall Berkeley DB (used by the BioPerl indexer), or reinstall the Perl module DB_File via CPAN (Perl's interface to Berkeley DB). After reinstalling BioPerl, delete the mpi_blastdb directory for the MAKER run before retrying.

Also verify that the /tmp directory on your system or the directory pointed to by TMP= in the maker_opts.ctl file is not full and that TMP= is not set to an NFS mounted location.

Thanks,
Carson

From: Goutham atla
Date: Monday, September 29, 2014 at 6:33 AM
To:
Subject: maker failure with example data

Dear All,

I am running maker with the demo file, i.e. dpp_contig.fasta, keeping all other parameters in the .ctl files as default. But it does not progress and shows the following message that the length of the sequence is 0. Can anybody help me?

--Next Contig--

MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Skipping the contig because it is too short!!
SeqID: contig-dpp-500-500
Length: 0
#---------------------------------------------------------------------

Regards,
Goutham

--
Goutham Atla

--
Goutham Atla
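The check on the temporary directory suggested in the quoted advice above (not full, not NFS mounted) can be sketched roughly as follows. The function name check_tmp_dir and the 1 GB default threshold are illustrative assumptions, and the filesystem-type lookup relies on GNU stat, so it may report "unknown" on other systems.

```shell
# check_tmp_dir DIR [MIN_KB]: print free space and filesystem type for DIR.
# Returns 2 if DIR looks NFS-mounted, 1 if fewer than MIN_KB kilobytes are
# free, 0 otherwise. MIN_KB defaults to 1048576 (1 GB), an arbitrary choice.
check_tmp_dir() {
    dir=${1:-/tmp}
    min_kb=${2:-1048576}
    # df -P prints one POSIX-format line per file system; column 4 is free KB
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    # GNU stat: human-readable file system type, e.g. "tmpfs" or "nfs"
    fstype=$(stat -f -c %T "$dir" 2>/dev/null || echo unknown)
    echo "dir=$dir avail_kb=$avail_kb fstype=$fstype"
    case $fstype in
        nfs*) return 2 ;;
    esac
    [ "$avail_kb" -ge "$min_kb" ] || return 1
}
```

For example, `check_tmp_dir /tmp` would be run against whatever directory TMP= in maker_opts.ctl points to (or /tmp when it is left empty).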
From goutham.atla at gmail.com Tue Sep 30 23:58:16 2014
From: goutham.atla at gmail.com (Goutham atla)
Date: Wed, 1 Oct 2014 11:28:16 +0530
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

Dear All,

Thank you. I figured out the problem is with mpich2. I kept trying with mpich2 but was unsuccessful. I installed mpich v3 and it's working fine now. Thank you all. The old GMOD tutorials are a bit misleading as the new versions have come up.

--
Goutham Atla

From mphoeppner at gmail.com Mon Sep 1 07:07:40 2014
From: mphoeppner at gmail.com (Marc Höppner)
Date: Mon, 1 Sep 2014 15:07:40 +0200
Subject: [maker-devel] est2genome=1 for est and altest
Message-ID: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>

Hi,

I may be wrong about this, but it seems to me that Maker will never build a gene model from EST evidence if the data set is provided as 'altest' rather than 'est'. In my case, I am annotating a plant for which there is a closely related reference genome + annotation, as well as pretty good EST data. So I supplied the EST data as 'altest', assuming that the only difference would be that the alignment parameters would be slightly more relaxed. But I found that Maker never made any gene models from that data. When moving the EST data to 'est', it worked.

So I am not sure whether this is an intended behaviour, but in my case it caught me a bit by surprise...

Regards,

Marc

From dence at genetics.utah.edu Tue Sep 2 09:32:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 2 Sep 2014 15:32:03 +0000
Subject: [maker-devel] est2genome=1 for est and altest
In-Reply-To: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
References: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
Message-ID:

Hi Marc,

This is a partial answer to your question.
I don't know the full reason that models aren't built from altest evidence, but I do know that those sequences are aligned with tblastx (nucleotide translated to protein and back to nucleotide) and not with blastn with relaxed parameters. Also the final protein and nucleotide alignments that do get made into models are made by exonerate and not by blast. Does that help?

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Tue Sep 2 10:57:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Sep 2014 10:57:56 -0600
Subject: [maker-devel] est2genome=1 for est and altest
In-Reply-To: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
References: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
Message-ID:

There is a reason why no altest2genome option exists in the maker_opts.ctl file. The est2genome and protein2genome options are meant only for generating rough partial models that can be used for training gene finders (should not be used for generating final models). And if you are thinking of using ESTs from another species (altest) to generate initial models for training it's actually an analysis error. This is because altest alignments will be far less accurate than EST or protein alignments (so they will hurt your training). They are slower to generate than EST or protein alignments (by as much as 10-20 fold because they are translated into all 6 reading frames). Also there will be far fewer of them (6 frames of translation make the alignments more spurious; thus they require higher thresholds of significance).

So if you are using a species for initial training that is distant enough that it must be aligned as altest via tblastx, then you should have been using proteins instead which will be widely available and more accurately aligned. Note that both proteins and altests are aligned in amino acid space, so you can expect anywhere from several million to hundreds of millions of years of divergence, and the species you use is not expected to be closely related (so whole proteomes will be available from a number of sources that will be far more accurate than any altest alignment).
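As a rough illustration of switching an initial training run over to protein evidence as recommended above, the following sketch rewrites options in a control file in place. The helper name set_ctl_opt and the FASTA file name in the usage line are hypothetical; the sketch only assumes that each option sits on its own line as key=value, optionally followed by a "#" comment.

```shell
# set_ctl_opt FILE KEY VALUE: rewrite the "KEY=..." line in a MAKER-style
# control file, keeping any trailing "#comment" on that line.
# Lines for other keys are copied through unchanged.
set_ctl_opt() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 {
            c = index($0, "#")
            comment = (c > 0) ? " " substr($0, c) : ""
            print k "=" v comment
            next
        }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For instance, `set_ctl_opt maker_opts.ctl protein related_proteomes.fasta` followed by `set_ctl_opt maker_opts.ctl protein2genome 1` and `set_ctl_opt maker_opts.ctl est2genome 0` would point an initial training run at protein evidence only. Because the match requires "=" directly after the key, setting protein does not touch the protein2genome line.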
The only real benefit of altest is to provide evidence of lineage specific genes for organisms where there are no species in the same branch or phylum to get protein evidence from. Since there will only be a handful of these genes, they can be obtained in any later bootstrap training steps, which will not involve est2genome or protein2genome models. You should use protein2genome models instead for the initial training and only use altest for any bootstrap training or for your final models.

Thanks,
Carson

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 05:38:16 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 11:38:16 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan).
Running MAKER though I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows.

Find the 'new' subroutine and change it from this -->

sub new {
    if($PS){
        my $self = {};
        my $class = shift;

        bless($self, $class);
        return $self;
    }
    else{
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }
}

to this -->

sub new {
    eval 'require Proc::ProcessTable';
    return Proc::ProcessTable->new(@_);
}

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

--
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Thu Sep 4 08:22:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 08:22:08 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall.

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson
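The directory-by-directory check described in this message could be scripted along these lines. The function name find_perl_module is invented for the example; it does nothing beyond testing whether a relative path exists under each directory it is given.

```shell
# find_perl_module REL_PATH DIR...: report, for each DIR, whether
# DIR/REL_PATH exists. Returns 0 if it was found in at least one DIR,
# 1 if it was found in none of them.
find_perl_module() {
    rel=$1
    shift
    status=1
    for dir in "$@"; do
        if [ -e "$dir/$rel" ]; then
            echo "found: $dir/$rel"
            status=0
        else
            echo "missing: $dir/$rel"
        fi
    done
    return "$status"
}
```

Calling it once with Proc/ProcessTable.pm and once with auto/Proc/ProcessTable, each time followed by the five directories listed in the message, would show where (if anywhere) the module and its auto directory were installed.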
From carsonhh at gmail.com Thu Sep 4 08:25:31 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 08:25:31 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson

From bmoore at genetics.utah.edu Thu Sep 4 11:39:39 2014
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 Sep 2014 17:39:39 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
In-Reply-To:
References:
Message-ID: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>

Hi Anindyajit,

I'm forwarding your message along to the maker mailing list and devel team...

B

On Sep 4, 2014, at 8:37 AM, Anindyajit Banerjee wrote:

>
> Hi
>
> I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for further input in EVM. However I am encountering an error while doing so. Could you suggest any possible way to do so?
> I hereby attach a test output for fgenesh
> test out put file for your understanding
> Please help
> --
> Regards,
>
> Anindyajit Banerjee
> Mobile: +919883333000.
>

From dence at genetics.utah.edu Thu Sep 4 11:44:47 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 4 Sep 2014 17:44:47 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
In-Reply-To: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>
References: , <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>
Message-ID:

Hi Anindyajit,

It doesn't look like the error output that you sent to Barry was forwarded with your message. Can you send that again?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From MEC at stowers.org Thu Sep 4 12:10:14 2014
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 4 Sep 2014 18:10:14 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
In-Reply-To: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>
References: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>
Message-ID:

Hi,

I'm not sure what maker offers in this regard. It's been some time since I've used it now.

Anyway, if it helps, some time ago I wrote a quick fgenesh2gff using BioPerl. It is provided here. You need a bioperl installation.

http://bio.perl.org/pipermail/bioperl-l/2006-July/022061.html

~Malcolm Cook

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 12:45:15 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 18:45:15 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Thanks Carson.

I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm.

I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated.

I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics:

#142 my $obj = new Proc::ProcessTable_simple;
#143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line #143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50 t/process.t .. ok All tests successful. Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU) Result: PASS make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process' No tests defined for Proc::ProcessTable::Process extension. make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process' Thanks, Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt > Date: Thursday, 4 September 2014 15:25 To: Timothy Stitt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first. --Carson From: Carson Holt > Date: Thursday, September 4, 2014 at 8:22 AM To: "Timothy Stitt (TGAC)" >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes The error means Proc:ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall. 

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan). Running MAKER, though, I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging, which would cause them to build up over time.
You can try running MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;

            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang that 'ps' is experiencing. Also, you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating a large number of 'ps' processes that are overwhelming the system and making it unusable for other users.

Can you confirm that MAKER would be generating this behaviour and, if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
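
The directory check Carson describes in this thread (look in each @INC directory for ./Proc/ProcessTable.pm and ./auto/Proc/ProcessTable/) could be scripted roughly as follows. This is only a sketch: check_processtable is a made-up helper name, not part of MAKER or CPAN, and the directories to pass in are the ones listed in the error message.

```shell
# Hypothetical helper (not part of MAKER): for each directory given, report
# whether Proc/ProcessTable.pm and auto/Proc/ProcessTable/ exist beneath it.
# Usage: check_processtable DIR [DIR ...]
check_processtable() {
    for dir in "$@"; do
        if [ -f "$dir/Proc/ProcessTable.pm" ]; then pm=yes; else pm=no; fi
        if [ -d "$dir/auto/Proc/ProcessTable" ]; then auto=yes; else auto=no; fi
        # One tab-separated line per directory.
        printf '%s\tProcessTable.pm=%s\tauto-dir=%s\n' "$dir" "$pm" "$auto"
    done
}
```

For example, check_processtable /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib would print one line per directory. A directory showing "no" for both is one where the module was not installed.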

From carsonhh at gmail.com Thu Sep 4 12:52:20 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 12:52:20 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes

Try changing -->

    eval 'require Proc::ProcessTable';

to -->

    use Proc::ProcessTable;

in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces Perl's import method to run, in case the module explicitly exports something it needs to function properly.

--Carson
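
Before editing MAKER any further, it may help to confirm outside of MAKER whether perl can load Proc::ProcessTable at all, and from which file. The sketch below assumes perl is on the PATH. perl_module_path is a hypothetical helper name, and the check only sees perl's default @INC, so MAKER's own lib directories would have to be added via PERL5LIB. The 'make test' output in this thread shows a 5.16.3 perl binary while the @INC in the error lists 5.18 directories, so it is probably worth running this with the same perl that runs MAKER.

```shell
# Hypothetical helper: print the file a Perl module is loaded from,
# or return non-zero if perl cannot load it.
# Usage: perl_module_path Module::Name
perl_module_path() {
    perl -e '
        my $mod = shift;
        (my $file = "$mod.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
        eval { require $file; 1 } or exit 1;    # non-zero exit if the load fails
        print "$INC{$file}\n";                  # path perl actually used
    ' "$1" 2>/dev/null
}
```

Running perl_module_path Proc::ProcessTable then either prints the path of the ProcessTable.pm that was picked up or fails silently with a non-zero exit status.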

From bmoore at genetics.utah.edu Thu Sep 4 13:01:57 2014
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 Sep 2014 19:01:57 +0000
Subject: [maker-devel] Fwd: Fgenesh output to gff3 conversion
Message-ID: <77D4D576-9BAC-478D-8A0F-492225D71637@genetics.utah.edu>

Attached is the document that Anindyajit sent with his original question.

B

Begin forwarded message:

From: Anindyajit Banerjee
Subject: Fgenesh output to gff3 conversion
Date: September 4, 2014 at 8:37:26 AM MDT

Hi

I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert FGENESH output to GFF3 format for use as input to EVM. However, I am encountering an error while doing so. Could you suggest a possible way to do this? I have attached a test FGENESH output file for reference.

Please help

--
Regards,

Anindyajit Banerjee
Mobile: +919883333000.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fgenesh_output_test
Type: application/octet-stream
Size: 199696 bytes
Desc: fgenesh_output_test

From carsonhh at gmail.com Thu Sep 4 13:06:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 13:06:28 -0600
Subject: [maker-devel] Fgenesh output to gff3 conversion

MAKER can't convert the existing output, but you could use MAKER to run FGENESH for you instead. The results of that run would be in GFF3.

--Carson

On 9/4/14, 11:39 AM, "Barry Moore" wrote:

>Hi Anindyajit,
>
>I'm forwarding your message along to the maker mailing list and devel team.
>
>B

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 13:24:06 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 19:24:06 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker -h' and the error appears.

I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers.

Cheers,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From carsonhh at gmail.com Thu Sep 4 13:42:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 13:42:06 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes

I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the system's 'ps'.
Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one -->

    sub get_proc_by_id {
        my $id = shift;
        my $select;

        my $obj = new Proc::ProcessTable_simple;
        if(ref($obj) eq "Proc::ProcessTable"){
            my ($p) = grep {$_->pid eq $id} @{$obj->table};
            return $p;
        }
        else{
            return $obj->get_proc_by_id($id);
        }
    }

--Carson
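
As a rough illustration of checking a single process by PID without starting 'ps': on Linux the kernel exposes per-process state under /proc, which a shell function can read directly. This is a Linux-only sketch, not what MAKER itself does. proc_state and is_zombie are hypothetical helper names, and nothing here shows whether reading /proc would avoid the hang seen on the UV2000.

```shell
# Hypothetical helpers, Linux only (they rely on /proc/PID/stat).

# Print the one-letter state (R, S, D, Z, ...) of the process with the given
# PID. Returns 1 if the PID does not exist or /proc is unavailable.
proc_state() {
    pid=$1
    [ -r "/proc/$pid/stat" ] || return 1
    # The file looks like "1234 (command name) S 1 ...". The command name may
    # itself contain spaces or parentheses, so cut at the LAST ") ".
    IFS= read -r stat_line < "/proc/$pid/stat" || [ -n "$stat_line" ] || return 1
    rest=${stat_line##*") "}
    printf '%s\n' "${rest%% *}"
}

# Succeeds only if the process exists and is a zombie (state Z).
is_zombie() {
    [ "$(proc_state "$1")" = "Z" ]
}
```

For example, proc_state $$ prints the state of the current shell, and is_zombie 12345 can be used in an if statement to test one specific PID.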
Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems. --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, August 21, 2014 at 2:05 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] MAKER and large number of 'ps' processes Dear MAKER developers, One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly? Thanks in advance, Tim. ? Timothy Stitt PhD | Head of Scientific Computing +44 1603 450378 | timothy.stitt at tgac.ac.uk The Genome Analysis Centre (TGAC) Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From nguyenan at mail.nih.gov Fri Sep 5 08:43:19 2014 From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C]) Date: Fri, 5 Sep 2014 14:43:19 +0000 Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17 In-Reply-To: References: Message-ID: Hi, I finished running MAKER as suggested above. Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g options. I called the output file maker.gff3 In the maker.gff3 I found some invalid data (does not conform .gff3 format), e.g. ### 2 + ### OR ### .Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Targ et=species:tRNA-Asn-AAC|genus:tRNA 1 75 + ### OR some gene (or mRNA) IDs are not uniq. 
This means they can be found multiple times with different values within maker.gff3.

How could this happen? As I understand it, mRNA IDs in a .gff3 file must be unique.

Thanks
Anh-Dao

On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" wrote:

>Today's Topics:
>
>   1. Re: Maker_opts.ctl (Carson Holt)
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 18 Jul 2014 11:04:09 -0600
>From: Carson Holt
>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]", Daniel Ence
>Cc: "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] Maker_opts.ctl
>
>It should just be 'fgenesh'. If it's not there you can still just give the GFF3.
>
>--Carson
>
>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>
>>I am not sure which fgenesh executable file I should use.
>>
>>fgenesh= #location of fgenesh executable
>>
>>When I run FGENESH++, I need to run the run_pipe.pl script. Sure you need to specify a list of other executable programs (such as ppd, ppdn+, etc)
>>
>>Anh-Dao
>>
>>On 7/16/14 3:32 PM, "Carson Holt" wrote:
>>
>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your previous run. Then provide the RepeatModeler file to rmlib=
>>>
>>>--Carson
>>>
>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>
>>>>By default, model_org=all.
>>>>Can I use the de novo repeat library predicted by RepeatModeler for the rmlib option?
>>>>
>>>>Anh-Dao
>>>>
>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote:
>>>>
>>>>>No. You can provide both to MAKER. The options are model_org= and rmlib=. By letting MAKER handle repeat masking it will differentiate repeat types and use soft masking for some and hard masking for others. This increases sensitivity of evidence alignments while still maintaining specificity.
>>>>>
>>>>>--Carson
>>>>>
>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>
>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the parameter files for Augustus.
>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM using two options: -lib (de novo) and -species (known). I got ~45% repeats via de novo and ~4% repeats via known options. As I understood, RM inside of MAKER uses only the RepBase repeat library and the RepeatRunner protein database.
>>>>>>
>>>>>>Anh-Dao
>>>>>>
>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote:
>>>>>>
>>>>>>>When you ran Augustus separately, it should have created the parameters needed to run it. Now you should be able to run it inside of MAKER using the species name you just created.
>>>>>>>
>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather than giving it the results as GFF3.
>>>>>>>
>>>>>>>--Carson
>>>>>>>
>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>>>
>>>>>>>>Thanks Daniel for your quick response.
>>>>>>>>
>>>>>>>>I did not use the parameter file of another organism when running Augustus.
>>>>>>>>I created the parameter file for the genome following their instructions.
>>>>>>>>There were multiple steps to train and run Augustus (Creating gene structures for training AUGUSTUS with CEGMA => parameter file will be created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences; Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.)
>>>>>>>>As I mentioned, I ran Augustus separately because Augustus has not been trained on that genome (no parameter file exists). Otherwise I would run Augustus inside MAKER.
>>>>>>>>
>>>>>>>>You suggested using the rm_gff option to specify RepeatMasker output (sure, I will convert them to .gff3 formatted files). Can I submit two RM .gff3 files, separated by a comma?
>>>>>>>>
>>>>>>>>Anh-Dao
>>>>>>>>
>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" wrote:
>>>>>>>>
>>>>>>>>>Hi Anh-Dao,
>>>>>>>>>
>>>>>>>>>In the maker_opts.ctl file, there are options for est and protein evidence. You'll put all of your fasta est files together in a comma-separated list in the "est" option, and all of your fasta protein files in a comma-separated list for the "protein" option.
>>>>>>>>>
>>>>>>>>>You'll specify the SNAP and Genemark files in their respective options in the control file and pass the augustus and fgenesh predictions in the "pred_gff" option.
>>>>>>>>>
>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give it to maker with the "rm_gff" option.
>>>>>>>>>
>>>>>>>>>If you've converted the cufflinks output to gff3, you can give it to maker with the "est_gff" option. I'm pretty sure Trinity only gives fasta output, so you would put that in the "est" option, along with all the other est fasta files.
>>>>>>>>>
>>>>>>>>>If Augustus isn't trained for your particular organism, then you can use another organism that augustus is already trained for. The list of species that augustus has parameter files for is in the README.txt that came with Augustus. I really recommend that you run Augustus from inside maker, because then you get all the benefits of maker passing ext-based hints to augustus at runtime, which can really improve Augustus' predictive ability.
>>>>>>>>>
>>>>>>>>>When you ran the augustus gene prediction separately, did you use another organism's parameter file?
>>>>>>>>>
>>>>>>>>>Thanks,
>>>>>>>>>Daniel
>>>>>>>>>
>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to conduct a genome annotation and have the following data:
>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species options)
>>>>>>>>>> - ESTs and RACE (fasta)
>>>>>>>>>> - proteins (fasta)
>>>>>>>>>> - proteins of related organisms (fasta)
>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to ZFF format, etc.)
>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert the outputs to .gff3 files. I ran the Augustus gene prediction separately because the genome has never been trained for Augustus.
>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>>>>>>
>>>>>>>>>> Could you please let me know how I can specify parameters in the maker_opts.ctl file?
>>>>>>>>>> Or do you have other suggestions to re-do the data listed above?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Anh-Dao

From carsonhh at gmail.com Fri Sep 5 09:37:02 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 05 Sep 2014 09:37:02 -0600
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17
Message-ID:

The partial lines are symptoms of writing data to a slow NFS mounted drive. If NFS can't get a response for a write operation, it returns success (even though it wasn't really successful) and then continues to wait for the operation to really complete. This is called asynchronous writing. It improves performance by optimistically returning success on all operations rather than waiting to see if the operation really succeeded. If you have a slow or overloaded NFS mount though, you can get a number of failures and never any indication that they failed except for the fact that some files are missing content or lines are partial. When this happens, you need to run MAKER with the -a flag on fewer CPUs to rebuild the GFF3 files.
Fewer CPUs reduce the IO burden. Or, if you can find which contigs have partial GFF3 lines, you can delete just those along with the datastore index log file and then launch MAKER without any flags to let it recompute just those contigs.

Another possible cause is also NFS related. If you are running MAKER multiple times in the same working directory, and a slow NFS mount doesn't allow MAKER to properly lock files, then two MAKER jobs can try to compute the same contig simultaneously. Simultaneous writing of files can then cause IDs to be duplicated and some lines to be munged, as lines from one process arrive in the file in the middle of lines from another process (creating a jumble of characters and partial lines). If this is the case, start a single MAKER job on fewer CPUs using the -a flag to rebuild the GFF3 files.

Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are passing in GFF3 files with already assigned IDs (that may be used elsewhere). Are you using GFF3 pass-through?

Features that will not have unique ID= tags are CDS, three_prime_utr, and five_prime_utr features (these are considered non-continuous features because of the shared ID across lines). You can see examples here --> http://www.sequenceontology.org/gff3.shtml

Also Name= attributes are not required to be unique.

Thanks,
Carson
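[Editorial sketch] One rough way to locate the damaged records discussed in this thread is to scan the merged GFF3 file for feature lines that do not have nine tab-separated columns, and for gene/mRNA IDs that occur more than once. The following is a minimal Python sketch, not part of MAKER: the function name scan_gff3 and the choice of feature types to check are assumptions made for illustration. CDS and UTR features are deliberately skipped because they may legitimately share an ID across lines.

```python
import re

# Feature types whose ID= attribute is expected to be unique (an assumption
# for this sketch; CDS/UTR features may legitimately share an ID across lines).
UNIQUE_ID_TYPES = {"gene", "mRNA"}

# Matches the ID attribute at the start of column 9 or after a ';'.
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]+)")


def scan_gff3(lines):
    """Return (malformed, duplicates) for an iterable of GFF3 lines.

    malformed  -- list of (line_number, text) for feature lines that do not
                  have exactly nine tab-separated columns
    duplicates -- dict mapping each repeated gene/mRNA ID to the line
                  numbers on which it appears
    """
    malformed = []
    seen = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and '###' separators
        columns = line.split("\t")
        if len(columns) != 9:
            malformed.append((number, line))
            continue
        if columns[2] in UNIQUE_ID_TYPES:
            match = ID_PATTERN.search(columns[8])
            if match:
                seen.setdefault(match.group(1), []).append(number)
    duplicates = {key: nums for key, nums in seen.items() if len(nums) > 1}
    return malformed, duplicates
```

Passing an open file handle for maker.gff3 to scan_gff3 would return the truncated lines together with their line numbers, which can then be traced back to the contigs that need to be recomputed.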
From Timothy.Stitt at tgac.ac.uk Fri Sep 5 01:58:59 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Fri, 5 Sep 2014 07:58:59 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Thanks Carson. That seemed to do the trick! I'm now running our large case again and the 'ps' processes are definitely suppressed. On a very small test it looked like this new version completed quicker as well. I assume you would expect better performance from avoiding use of 'ps' and directly accessing the process table?
Are there any disadvantages to this approach that explain why it isn't the default in the code?

Much appreciated,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 20:42
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the system's 'ps'. Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one -->

    sub get_proc_by_id {
        my $id = shift;
        my $select;

        my $obj = new Proc::ProcessTable_simple;
        if(ref($obj) eq "Proc::ProcessTable"){
            my ($p) = grep {$_->pid eq $id} @{$obj->table};
            return $p;
        }
        else{
            return $obj->get_proc_by_id($id);
        }
    }

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 1:24 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker -h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers.

Cheers,

Tim.
---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 19:52
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Try changing -->

    eval 'require Proc::ProcessTable';

to -->

    use Proc::ProcessTable;

in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces Perl's import method to run, in case the module explicitly exports something it needs to function properly.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 12:45 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson.

I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' output below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated.

I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics:

    #142 my $obj = new Proc::ProcessTable_simple;
    #143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line #143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 15:25
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson

From: Carson Holt
Date: Thursday, September 4, 2014 at 8:22 AM
To: "Timothy Stitt (TGAC)", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly.
Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall.

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect.

I then installed Proc::ProcessTable (which built successfully via cpan). Running MAKER though I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains:
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.
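[Editorial sketch] The directory-by-directory check suggested in this thread can be scripted. The following is a minimal Python sketch (Python is used only for illustration; the helper name find_module_files is hypothetical and not part of MAKER or CPAN). For each library directory it reports whether Proc/ProcessTable.pm and the auto/Proc/ProcessTable/ directory exist. The list of directories is assumed to be copied from the @INC paths printed in the error message.

```python
import os


def find_module_files(lib_dirs, module="Proc::ProcessTable"):
    """Report where a Perl module's .pm file and auto/ directory exist.

    Returns a list of (directory, has_pm, has_auto) tuples, one per entry
    in lib_dirs, in the order given.
    """
    parts = module.split("::")
    pm_rel = os.path.join(*parts) + ".pm"    # e.g. Proc/ProcessTable.pm
    auto_rel = os.path.join("auto", *parts)  # e.g. auto/Proc/ProcessTable
    report = []
    for directory in lib_dirs:
        has_pm = os.path.isfile(os.path.join(directory, pm_rel))
        has_auto = os.path.isdir(os.path.join(directory, auto_rel))
        report.append((directory, has_pm, has_auto))
    return report
```

A directory where only one of the two entries is present, or several directories that each hold a copy, would be the first places to look for a stale or partial installation.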
--
Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Fri Sep 5 09:17:45 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 05 Sep 2014 09:17:45 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

I'm glad the workaround is working for you. Proc::ProcessTable being faster than 'ps' is actually very, very atypical. It is likely there is an issue with your system, which is suggested by the fact that 'ps' is hanging and accumulating processes, which is also very atypical ('ps' should return in a fraction of a second). We actually switched from Proc::ProcessTable to 'ps' some time ago because 'ps' is several fold faster, and Proc::ProcessTable won't compile on about 10-15% of system architectures.

Thanks,
Carson

From: "Timothy Stitt (TGAC)"
Date: Friday, September 5, 2014 at 1:58 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson. That seemed to do the trick! I'm now running our large case again and the 'ps' processes are definitely suppressed. On a very small test it looked like this new version completed quicker as well. I assume you would expect better performance from avoiding use of 'ps' and directly accessing the process table? Are there any disadvantages to this approach that explain why it isn't the default in the code?

Much appreciated,

Tim.
--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 20:42
To: Timothy Stitt , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the system's 'ps'. Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one -->

    sub get_proc_by_id {
        my $id = shift;
        my $select;
        my $obj = new Proc::ProcessTable_simple;
        # 'new' may now hand back a real Proc::ProcessTable object,
        # which has no get_proc_by_id method, so search its table directly.
        if(ref($obj) eq "Proc::ProcessTable"){
            my ($p) = grep {$_->pid eq $id} @{$obj->table};
            return $p;
        }
        else{
            return $obj->get_proc_by_id($id);
        }
    }

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 1:24 PM
To: Carson Holt , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker -h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference. Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers. Cheers, Tim.

--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 19:52
To: Timothy Stitt , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Try changing --> eval 'require Proc::ProcessTable'; to --> use Proc::ProcessTable; in .../maker/lib/Proc/ProcessTable_simple.pm.
That way it forces Perl's import method to run, in case the module explicitly exports something it needs to function properly.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 12:45 PM
To: Carson Holt , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson. I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated. I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics:

    #142 my $obj = new Proc::ProcessTable_simple;
    #143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line #143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks, Tim.

--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 15:25
To: Timothy Stitt , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson

From: Carson Holt
Date: Thursday, September 4, 2014 at 8:22 AM
To: "Timothy Stitt (TGAC)" , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall.
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson, I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan). Running MAKER though I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50. Thanks in advance for any further help with this. Tim.

--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time.
You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks. Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;
            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers, One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large amounts of 'ps' processes that are overwhelming the system and causing the system to be unusable for other users. Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly? Thanks in advance, Tim.

Timothy Stitt PhD | Head of Scientific Computing +44 1603 450378 | timothy.stitt at tgac.ac.uk The Genome Analysis Centre (TGAC) Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk
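
Editorial illustration of the trade-off discussed in this thread (spawning 'ps' for every process check versus asking the kernel directly). This is a minimal standalone sketch in Python, not MAKER's Perl code and not how Proc::ProcessTable is implemented. The function names alive_via_ps and alive_via_signal and the 5-second timeout are assumptions chosen for the example. The timeout is there because the thread describes 'ps' calls hanging and piling up; a bounded wait is one way to keep stuck children from accumulating.

```python
import os
import shutil
import subprocess


def alive_via_ps(pid, timeout=5.0):
    """Check a PID by spawning 'ps' (one child process per check).

    Returns True or False, or None when 'ps' is not installed or does
    not answer within `timeout` seconds.
    """
    ps = shutil.which("ps")
    if ps is None:
        return None
    try:
        # 'ps -p PID -o pid=' prints the PID with no header if it exists.
        result = subprocess.run(
            [ps, "-p", str(pid), "-o", "pid="],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # on expiry the child is killed, not left behind
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode == 0 and result.stdout.strip() != b""


def alive_via_signal(pid):
    """Check a PID without spawning anything.

    Signal 0 performs the existence and permission checks only; nothing
    is delivered to the target process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    me = os.getpid()
    print("ps check:    ", alive_via_ps(me))
    print("signal check:", alive_via_signal(me))
```

With many MPI ranks each polling several PIDs, the first function creates one short-lived child per check, which is the load pattern reported above when those children stop returning promptly; the second creates none.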
From nguyenan at mail.nih.gov Fri Sep 5 10:08:50 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Fri, 5 Sep 2014 16:08:50 +0000
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17
In-Reply-To: References: Message-ID:

Thanks Carson. I ran MAKER on 30 CPUs. I will re-run it using 10 CPUs.

>
>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
>passing in GFF3 files with already assigned IDs (that may be used
>elsewhere). Are you using GFF3 pass-through?
>

I submitted est_gff=cufflinks.gff3 and pred_gff=fgenesh.gff3 when running MAKER. However, I got 4 repeated mRNA ids as follows:

augustus_masked-c206700011.Contig3-processed-gene-0.3
augustus_masked-c206700011.Contig3-processed-gene-0.3-mRNA-1
snap_masked-c206500027.Contig3-processed-gene-0.26
snap_masked-c206500027.Contig3-processed-gene-0.26-mRNA-1

Anh-Dao

From Brian.Mack at ARS.USDA.GOV Mon Sep 8 07:47:01 2014
From: Brian.Mack at ARS.USDA.GOV (Mack, Brian)
Date: Mon, 8 Sep 2014 13:47:01 +0000
Subject: [maker-devel] non-overlapping predictions
Message-ID:

Hi, I used IPRscan on the non-overlapping ab initio proteins and identified additional predictions that I want to include in my final gff. I was trying to follow the advice given in this thread (http://gmod.827538.n3.nabble.com/Adding-non-overlapping-models-to-final-set-td4043778.html) to pull out these predictions from the full Maker gff3 that includes all the evidence, but it seems that none of the non-overlapping predictions are in this gff3 file. Where would I find all of the predictions including the non-overlapping predictions? Thanks, Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties.
If you believe you have received this message in error, please notify the sender and delete the email immediately.

From ranjani at uga.edu Tue Sep 9 11:14:09 2014
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 9 Sep 2014 17:14:09 +0000
Subject: [maker-devel] Non-canonical splice junctions
Message-ID: <1410282848765.20893@uga.edu>

Hi, Is it possible to force MAKER to predict gene models only with canonical splice sites? Or is there a quick way to identify gene models with non-canonical splice sites? Thanks, Ranjani

From carsonhh at gmail.com Tue Sep 9 16:09:13 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 09 Sep 2014 16:09:13 -0600
Subject: [maker-devel] non-overlapping predictions
Message-ID:

It's a naming issue. The reference match/match_part features have 'abinit' in the name, while the non-overlapping fasta file has 'processed' in the fasta header. The easiest fix is to replace 'processed' with 'abinit' in the terms you are searching for. This was supposed to be resolved already, but I'll see what's going on. What version of MAKER are you using?

--Carson

From: "Mack, Brian"
Date: Monday, September 8, 2014 at 7:47 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] non-overlapping predictions

Hi, I used IPRscan on the non-overlapping ab initio proteins and identified additional predictions that I want to include in my final gff. I was trying to follow the advice given in this thread (http://gmod.827538.n3.nabble.com/Adding-non-overlapping-models-to-final-set-td4043778.html) to pull out these predictions from the full Maker gff3 that includes all the evidence, but it seems that none of the non-overlapping predictions are in this gff3 file. Where would I find all of the predictions including the non-overlapping predictions?
Thanks, Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

From nguyenan at mail.nih.gov Thu Sep 18 05:49:45 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Thu, 18 Sep 2014 11:49:45 +0000
Subject: [maker-devel] CPUs problems
Message-ID:

I re-ran maker on 10 CPUs. The maker job finished after 10 days. I checked the log file and got these errors:

Processing run.log file...
examining contents of the fasta file and run log
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

Can you let me know how I can fix the problem?

Thanks
Anh-Dao

On 9/5/14 11:37 AM, "Carson Holt" wrote:

>The partial lines are symptoms of writing data to a slow NFS mounted
>drive. If NFS can't get a response for a write operation, it returns
>success (even though it wasn't really successful) and then continues to
>wait for the operation to really complete. This is called asynchronous
>writing. It improves performance by optimistically returning success on
>all operations rather than waiting to see if the operation really
>succeeded. If you have a slow or overloaded NFS mount though, you can get
>a number of failures and never any indication that they failed except for
>the fact that some files are missing content or lines are partial.
> >When this happens, you need to run MAKER with the -a flag on fewer CPUs to >rebuild the GFF3 files. Fewer CPUs reduces the IO burden. Or if you can >find which contigs have partial GFF3 lines, you can delete just those >along with the datastore index log file and then launch maker without any >flags to let it recompute just those contigs. > >Another possible cause is also NFS related. If you are running MAKER >multiple times in the same working directory, and a slow NFS mount doesn't >allow maker to properly lock files, then two maker jobs can try and >compute the same contig simultaneously. Simultaneous writing of files can >then cause IDs to be duplicated and some lines to be munged as lines from >one process arrive to the file in the middle of lines from another process >(creating a jumble of characters and partial lines). Start a singe maker >job on fewer cpus using the -a flag to rebuild the GFF3 files if this is >the case. > >Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are >passing in GFF3 files with already assigned IDS (that may be used >elsewhere). Are you using GFF3 pass-trough? > >Features that will not have unique ID= tags are CDS, three_prime_utr, and >five_prime_utr features (these are considered non-continuous features >because of the shared ID across lines). >You can see examples here --> http://www.sequenceontology.org/gff3.shtml > >Also Name= attributes are not required to be unique. > >Thanks, >Carson > > > > > > >On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" > wrote: > >>Hi, >> >>I finished running MAKER as suggested above. >>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g >>options. I called the output file maker.gff3 >> >>In the maker.gff3 I found some invalid data (does not conform .gff3 >>format), e.g. 
>> >>### >>2 + >>### >> >>OR >> >>### >>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Ta >>r >>g >>et=species:tRNA-Asn-AAC|genus:tRNA 1 75 + >>### >> >>OR some gene (or mRNA) IDs are not uniq. This means they can be found >>multiple times with different values within the maker.gff3 >> >>How could it happen? As I understood, mRNA IDs in a .gff3 file must be >>uniq. >> >>Thanks >>Anh-Dao >> >> >> >> >> >>On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" >> wrote: >> >>>Send maker-devel mailing list submissions to >>> maker-devel at yandell-lab.org >>> >>>To subscribe or unsubscribe via the World Wide Web, visit >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>>or, via email, send a message with subject or body 'help' to >>> maker-devel-request at yandell-lab.org >>> >>>You can reach the person managing the list at >>> maker-devel-owner at yandell-lab.org >>> >>>When replying, please edit your Subject line so it is more specific >>>than "Re: Contents of maker-devel digest..." >>> >>> >>>Today's Topics: >>> >>> 1. Re: Maker_opts.ctl (Carson Holt) >>> >>> >>>---------------------------------------------------------------------- >>> >>>Message: 1 >>>Date: Fri, 18 Jul 2014 11:04:09 -0600 >>>From: Carson Holt >>>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]" , Daniel >>> Ence >>>Cc: "maker-devel at yandell-lab.org" >>>Subject: Re: [maker-devel] Maker_opts.ctl >>>Message-ID: >>>Content-Type: text/plain; charset="UTF-8" >>> >>>It should just be 'fgenesh'. If it's not there you can still just give >>>the GFF3. >>> >>>--Carson >>> >>> >>>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>> wrote: >>> >>>>I am not sure which fgenesh executable file should I use. >>>> >>>>fgenesh= #location of fgenesh executable >>>> >>>>When I run FGENESH++, I need to run the run_pipe.pl script. 
Sure you >>>>need >>>>to specify a list of other executable programs (such as ppd, ppdn+, >>>>etc) >>>> >>>>Anh-Dao >>>> >>>> >>>>On 7/16/14 3:32 PM, "Carson Holt" wrote: >>>> >>>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your >>>>>previous run. Then provide the RepeatModeler file to rmlib= >>>>> >>>>>--Carson >>>>> >>>>> >>>>> >>>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>> wrote: >>>>> >>>>>>By default, model_org=all. Can I use the de novo repeat library >>>>>>predicted >>>>>>by RepeatModeler for the rmlib option? >>>>>> >>>>>>Anh-Dao >>>>>> >>>>>> >>>>>> >>>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote: >>>>>> >>>>>>>No. You can provide both to MAKER. The options are model_org= and >>>>>>>rmlib=. >>>>>>> By letting MAKER handle repeat masking it will differentiate repeat >>>>>>>types >>>>>>>and use soft masking for some and hard masking for others. This >>>>>>>increases >>>>>>>sensitivity of evidence alignments while still maintaining >>>>>>>specificity. >>>>>>> >>>>>>>--Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>>>> wrote: >>>>>>> >>>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the >>>>>>>>parameter >>>>>>>>files for Augustus. >>>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM >>>>>>>>using >>>>>>>>two >>>>>>>>options: -lib (de novo) and -species (known). I got ~ 45% repeats >>>>>>>>via >>>>>>>>de >>>>>>>>novo and ~ 4% repeats via known options. As I understood, RM inside >>>>>>>>of >>>>>>>>MAKER uses only RepBase repeat library and RepeatRunner protein >>>>>>>>database. >>>>>>>> >>>>>>>>Anh-Dao >>>>>>>> >>>>>>>> >>>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote: >>>>>>>> >>>>>>>>>When you ran Augustus separately, it should have created the >>>>>>>>>parameters >>>>>>>>>needed to run it. Now you should be able to run it inside of >>>>>>>>>MAKER >>>>>>>>>using >>>>>>>>>the species name you just created. 
>>>>>>>>> >>>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather >>>>>>>>>than >>>>>>>>>giving it the results as GFF3. >>>>>>>>> >>>>>>>>>--Carson >>>>>>>>> >>>>>>>>> >>>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>>Thanks Daniel for your quick response. >>>>>>>>>> >>>>>>>>>>I did not use the parameter file of other organism when running >>>>>>>>>>Augustus. >>>>>>>>>>I created the parameter file for the genome following their >>>>>>>>>>instructions. >>>>>>>>>>There were multiple steps to train and run Augustus (Creating >>>>>>>>>>gene >>>>>>>>>>structures for training AUGUSTUS with CEGMA => parameter file >>>>>>>>>>will >>>>>>>>>>be >>>>>>>>>>created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences; >>>>>>>>>>Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.) >>>>>>>>>>As I mentioned the reason why I ran Augustus separately, because >>>>>>>>>>Augustus >>>>>>>>>>has not trained that genome (no parameter file exists). Otherwise >>>>>>>>>>I >>>>>>>>>>would >>>>>>>>>>run Augustus inside MAKER. >>>>>>>>>> >>>>>>>>>>You suggested to use rm_gff option to specify RepeatMasker output >>>>>>>>>>(sure >>>>>>>>>>I >>>>>>>>>>will convert them to .gff3 formatted files). Can I submit two RM >>>>>>>>>>.gff3 >>>>>>>>>>files, separated by comma? >>>>>>>>>> >>>>>>>>>>Anh-Dao >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" >>>>>>>>>>wrote: >>>>>>>>>> >>>>>>>>>>>Hi Anh-Dao, >>>>>>>>>>> >>>>>>>>>>>In the maker_opts.ctl file, there are options for est and >>>>>>>>>>>protein >>>>>>>>>>>evidence. You?ll put all of your fasta est files together in a >>>>>>>>>>>command >>>>>>>>>>>separated list in the ?est" option, and all of your fasta >>>>>>>>>>>protein >>>>>>>>>>>files >>>>>>>>>>>in a command separated list for the ?protein? option. 
>>>>>>>>>>> >>>>>>>>>>>You?ll specify the SNAP and Genemark files in their respective >>>>>>>>>>>options >>>>>>>>>>>in >>>>>>>>>>>the control file and pass the augustus and fgenesh predictions >>>>>>>>>>>in >>>>>>>>>>>the >>>>>>>>>>>?pred_gff? option. >>>>>>>>>>> >>>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give >>>>>>>>>>>it >>>>>>>>>>>to >>>>>>>>>>>maker with the ?rm_gff? option. >>>>>>>>>>> >>>>>>>>>>>If you?ve converted the cufflinks output to gff3, you can give >>>>>>>>>>>it >>>>>>>>>>>to >>>>>>>>>>>maker with the ?est_gff? option. I?m pretty sure Trinity only >>>>>>>>>>>gives >>>>>>>>>>>fasta >>>>>>>>>>>output, so you would put that in the ?est? option, along with >>>>>>>>>>>all >>>>>>>>>>>the >>>>>>>>>>>other est fasta files. >>>>>>>>>>> >>>>>>>>>>>If Augustus isn?t trained for your particular organism, then you >>>>>>>>>>>can >>>>>>>>>>>use >>>>>>>>>>>another organism that augustus is already trained for. The list >>>>>>>>>>>of >>>>>>>>>>>species that augustus has parameter files for is in the >>>>>>>>>>>README.txt >>>>>>>>>>>that >>>>>>>>>>>came with Augustus. I really recommend that you run Augustus >>>>>>>>>>>from >>>>>>>>>>>inside >>>>>>>>>>>maker, because then you get all the benefits of maker passing >>>>>>>>>>>ext-based >>>>>>>>>>>hints to augustus at runtime, which can really improve Augustus? >>>>>>>>>>>predictive ability. >>>>>>>>>>> >>>>>>>>>>>When you ran the augustus gene prediction separately, did you >>>>>>>>>>>use >>>>>>>>>>>another >>>>>>>>>>>organism?s parameter file? 
>>>>>>>>>>> >>>>>>>>>>>Thanks, >>>>>>>>>>>Daniel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C] >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I would like to conduct a genome annotation and have the >>>>>>>>>>>>following >>>>>>>>>>>>data: >>>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species >>>>>>>>>>>>options) >>>>>>>>>>>> - ESTs and RACE (fasta) >>>>>>>>>>>> - proteins (fasta) >>>>>>>>>>>> - proteins of related organisms (fasta) >>>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to >>>>>>>>>>>>convert >>>>>>>>>>>>to >>>>>>>>>>>>ZFF >>>>>>>>>>>>format, etc. ) >>>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl) >>>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to >>>>>>>>>>>>convert >>>>>>>>>>>>the outputs to .gff3 files. The reason why I ran Augustus gene >>>>>>>>>>>>prediction separately, because the genome has never been >>>>>>>>>>>>trained >>>>>>>>>>>>for >>>>>>>>>>>>Augustus. >>>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq >>>>>>>>>>>> >>>>>>>>>>>> Could you please let me know how can I specify parameters in >>>>>>>>>>>>the >>>>>>>>>>>>maker_opts.ctl file? >>>>>>>>>>>> Or do you have other suggestions to re-do the data listed >>>>>>>>>>>>above? >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> Anh-Dao >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>> >>>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell >>>>>>>>>>>>- >>>>>>>>>>>>l >>>>>>>>>>>>a >>>>>>>>>>>>b >>>>>>>>>>>>. 
From carsonhh at gmail.com Fri Sep 19 11:22:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 19 Sep 2014 11:22:50 -0600
Subject: [maker-devel] CPUs problems
In-Reply-To: References: Message-ID:

These are further symptoms of an IO-related issue. The script cannot even query its current working directory. Check to make sure there is plenty of space in the temporary directory /tmp. If /tmp is separately mounted on each machine there may be one that is full. Also make sure you did not set TMP= in the maker_opts.ctl file to an NFS-mounted location. Do you by any chance get any warnings when you start MAKER? For example --> WARNING: Multiple MAKER processes have been started in the same directory. That would indicate that the MPI communication ring is down, which would drastically increase IO operations. You may also have one or more nodes that are having the issue and are the source of all the errors.
If you are using OpenMPI to run MAKER, you can tag the output from each node using the --tag-output flag for mpiexec. Then if the same node is always producing the error, you can have IT look at it. Also MAKER is set to automatically retry on errors. If all contigs are finished, check the output. Make sure there are the same number of genes in the fasta files and GFF3 files. Also look for munged content. If everything looks ok, MAKER may have gotten around the issue through sheer brute force (i.e. it retried until it succeeded).

--Carson

On 9/18/14, 5:49 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:

>I re-ran maker on 10 CPUs. The maker job was finished after 10 days. I
>checked the log file and got these errors:
>
>Processing run.log file...
>examining contents of the fasta file and run log
>shell-init: error retrieving current directory: getcwd: cannot access
>parent directories: No such file or directory
>
>
>Can you let me know how can I fix the problem?
>
>Thanks
>Anh-Dao
>
>
>On 9/5/14 11:37 AM, "Carson Holt" wrote:
>
>>The partial lines are symptoms of writing data to a slow NFS mounted
>>drive. If NFS can't get a response for a write operation, it returns
>>success (even though it wasn't really successful) and then continues to
>>wait for the operation to really complete. This is called asynchronous
>>writing. It improves performance by optimistically returning success on
>>all operations rather than waiting to see if the operation really
>>succeeded. If you have a slow or overloaded NFS mount though, you can get
>>a number of failures and never any indication that they failed except for
>>the fact that some files are missing content or lines are partial.
>>
>>When this happens, you need to run MAKER with the -a flag on fewer CPUs
>>to
>>rebuild the GFF3 files. Fewer CPUs reduces the IO burden.
Or if you can >>find which contigs have partial GFF3 lines, you can delete just those >>along with the datastore index log file and then launch maker without any >>flags to let it recompute just those contigs. >> >>Another possible cause is also NFS related. If you are running MAKER >>multiple times in the same working directory, and a slow NFS mount >>doesn't >>allow maker to properly lock files, then two maker jobs can try and >>compute the same contig simultaneously. Simultaneous writing of files >>can >>then cause IDs to be duplicated and some lines to be munged as lines from >>one process arrive to the file in the middle of lines from another >>process >>(creating a jumble of characters and partial lines). Start a singe maker >>job on fewer cpus using the -a flag to rebuild the GFF3 files if this is >>the case. >> >>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you >>are >>passing in GFF3 files with already assigned IDS (that may be used >>elsewhere). Are you using GFF3 pass-trough? >> >>Features that will not have unique ID= tags are CDS, three_prime_utr, and >>five_prime_utr features (these are considered non-continuous features >>because of the shared ID across lines). >>You can see examples here --> http://www.sequenceontology.org/gff3.shtml >> >>Also Name= attributes are not required to be unique. >> >>Thanks, >>Carson >> >> >> >> >> >> >>On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >> wrote: >> >>>Hi, >>> >>>I finished running MAKER as suggested above. >>>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g >>>options. I called the output file maker.gff3 >>> >>>In the maker.gff3 I found some invalid data (does not conform .gff3 >>>format), e.g. >>> >>>### >>>2 + >>>### >>> >>>OR >>> >>>### >>>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;T >>>a >>>r >>>g >>>et=species:tRNA-Asn-AAC|genus:tRNA 1 75 + >>>### >>> >>>OR some gene (or mRNA) IDs are not uniq. 
This means they can be found >>>multiple times with different values within the maker.gff3 >>> >>>How could it happen? As I understood, mRNA IDs in a .gff3 file must be >>>uniq. >>> >>>Thanks >>>Anh-Dao >>> >>> >>> >>> >>> >>>On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" >>> wrote: >>> >>>>Send maker-devel mailing list submissions to >>>> maker-devel at yandell-lab.org >>>> >>>>To subscribe or unsubscribe via the World Wide Web, visit >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>g >>>> >>>>or, via email, send a message with subject or body 'help' to >>>> maker-devel-request at yandell-lab.org >>>> >>>>You can reach the person managing the list at >>>> maker-devel-owner at yandell-lab.org >>>> >>>>When replying, please edit your Subject line so it is more specific >>>>than "Re: Contents of maker-devel digest..." >>>> >>>> >>>>Today's Topics: >>>> >>>> 1. Re: Maker_opts.ctl (Carson Holt) >>>> >>>> >>>>---------------------------------------------------------------------- >>>> >>>>Message: 1 >>>>Date: Fri, 18 Jul 2014 11:04:09 -0600 >>>>From: Carson Holt >>>>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]" , Daniel >>>> Ence >>>>Cc: "maker-devel at yandell-lab.org" >>>>Subject: Re: [maker-devel] Maker_opts.ctl >>>>Message-ID: >>>>Content-Type: text/plain; charset="UTF-8" >>>> >>>>It should just be 'fgenesh'. If it's not there you can still just give >>>>the GFF3. >>>> >>>>--Carson >>>> >>>> >>>>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>> wrote: >>>> >>>>>I am not sure which fgenesh executable file should I use. >>>>> >>>>>fgenesh= #location of fgenesh executable >>>>> >>>>>When I run FGENESH++, I need to run the run_pipe.pl script. Sure you >>>>>need >>>>>to specify a list of other executable programs (such as ppd, ppdn+, >>>>>etc) >>>>> >>>>>Anh-Dao >>>>> >>>>> >>>>>On 7/16/14 3:32 PM, "Carson Holt" wrote: >>>>> >>>>>>'all' will use the whole of RepBase, or you can do 'metazoa' like >>>>>>your >>>>>>previous run. 
Then provide the RepeatModeler file to rmlib= >>>>>> >>>>>>--Carson >>>>>> >>>>>> >>>>>> >>>>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>>> wrote: >>>>>> >>>>>>>By default, model_org=all. Can I use the de novo repeat library >>>>>>>predicted >>>>>>>by RepeatModeler for the rmlib option? >>>>>>> >>>>>>>Anh-Dao >>>>>>> >>>>>>> >>>>>>> >>>>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote: >>>>>>> >>>>>>>>No. You can provide both to MAKER. The options are model_org= and >>>>>>>>rmlib=. >>>>>>>> By letting MAKER handle repeat masking it will differentiate >>>>>>>>repeat >>>>>>>>types >>>>>>>>and use soft masking for some and hard masking for others. This >>>>>>>>increases >>>>>>>>sensitivity of evidence alignments while still maintaining >>>>>>>>specificity. >>>>>>>> >>>>>>>>--Carson >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>>>>> wrote: >>>>>>>> >>>>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the >>>>>>>>>parameter >>>>>>>>>files for Augustus. >>>>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM >>>>>>>>>using >>>>>>>>>two >>>>>>>>>options: -lib (de novo) and -species (known). I got ~ 45% repeats >>>>>>>>>via >>>>>>>>>de >>>>>>>>>novo and ~ 4% repeats via known options. As I understood, RM >>>>>>>>>inside >>>>>>>>>of >>>>>>>>>MAKER uses only RepBase repeat library and RepeatRunner protein >>>>>>>>>database. >>>>>>>>> >>>>>>>>>Anh-Dao >>>>>>>>> >>>>>>>>> >>>>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote: >>>>>>>>> >>>>>>>>>>When you ran Augustus separately, it should have created the >>>>>>>>>>parameters >>>>>>>>>>needed to run it. Now you should be able to run it inside of >>>>>>>>>>MAKER >>>>>>>>>>using >>>>>>>>>>the species name you just created. >>>>>>>>>> >>>>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather >>>>>>>>>>than >>>>>>>>>>giving it the results as GFF3. 
>>>>>>>>>> >>>>>>>>>>--Carson >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>>Thanks Daniel for your quick response. >>>>>>>>>>> >>>>>>>>>>>I did not use the parameter file of other organism when running >>>>>>>>>>>Augustus. >>>>>>>>>>>I created the parameter file for the genome following their >>>>>>>>>>>instructions. >>>>>>>>>>>There were multiple steps to train and run Augustus (Creating >>>>>>>>>>>gene >>>>>>>>>>>structures for training AUGUSTUS with CEGMA => parameter file >>>>>>>>>>>will >>>>>>>>>>>be >>>>>>>>>>>created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences; >>>>>>>>>>>Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.) >>>>>>>>>>>As I mentioned the reason why I ran Augustus separately, because >>>>>>>>>>>Augustus >>>>>>>>>>>has not trained that genome (no parameter file exists). >>>>>>>>>>>Otherwise >>>>>>>>>>>I >>>>>>>>>>>would >>>>>>>>>>>run Augustus inside MAKER. >>>>>>>>>>> >>>>>>>>>>>You suggested to use rm_gff option to specify RepeatMasker >>>>>>>>>>>output >>>>>>>>>>>(sure >>>>>>>>>>>I >>>>>>>>>>>will convert them to .gff3 formatted files). Can I submit two RM >>>>>>>>>>>.gff3 >>>>>>>>>>>files, separated by comma? >>>>>>>>>>> >>>>>>>>>>>Anh-Dao >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" >>>>>>>>>>>wrote: >>>>>>>>>>> >>>>>>>>>>>>Hi Anh-Dao, >>>>>>>>>>>> >>>>>>>>>>>>In the maker_opts.ctl file, there are options for est and >>>>>>>>>>>>protein >>>>>>>>>>>>evidence. You?ll put all of your fasta est files together in a >>>>>>>>>>>>command >>>>>>>>>>>>separated list in the ?est" option, and all of your fasta >>>>>>>>>>>>protein >>>>>>>>>>>>files >>>>>>>>>>>>in a command separated list for the ?protein? option. 
>>>>>>>>>>>> >>>>>>>>>>>>You?ll specify the SNAP and Genemark files in their respective >>>>>>>>>>>>options >>>>>>>>>>>>in >>>>>>>>>>>>the control file and pass the augustus and fgenesh predictions >>>>>>>>>>>>in >>>>>>>>>>>>the >>>>>>>>>>>>?pred_gff? option. >>>>>>>>>>>> >>>>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give >>>>>>>>>>>>it >>>>>>>>>>>>to >>>>>>>>>>>>maker with the ?rm_gff? option. >>>>>>>>>>>> >>>>>>>>>>>>If you?ve converted the cufflinks output to gff3, you can give >>>>>>>>>>>>it >>>>>>>>>>>>to >>>>>>>>>>>>maker with the ?est_gff? option. I?m pretty sure Trinity only >>>>>>>>>>>>gives >>>>>>>>>>>>fasta >>>>>>>>>>>>output, so you would put that in the ?est? option, along with >>>>>>>>>>>>all >>>>>>>>>>>>the >>>>>>>>>>>>other est fasta files. >>>>>>>>>>>> >>>>>>>>>>>>If Augustus isn?t trained for your particular organism, then >>>>>>>>>>>>you >>>>>>>>>>>>can >>>>>>>>>>>>use >>>>>>>>>>>>another organism that augustus is already trained for. The list >>>>>>>>>>>>of >>>>>>>>>>>>species that augustus has parameter files for is in the >>>>>>>>>>>>README.txt >>>>>>>>>>>>that >>>>>>>>>>>>came with Augustus. I really recommend that you run Augustus >>>>>>>>>>>>from >>>>>>>>>>>>inside >>>>>>>>>>>>maker, because then you get all the benefits of maker passing >>>>>>>>>>>>ext-based >>>>>>>>>>>>hints to augustus at runtime, which can really improve >>>>>>>>>>>>Augustus? >>>>>>>>>>>>predictive ability. >>>>>>>>>>>> >>>>>>>>>>>>When you ran the augustus gene prediction separately, did you >>>>>>>>>>>>use >>>>>>>>>>>>another >>>>>>>>>>>>organism?s parameter file? 
>>>>>>>>>>>> >>>>>>>>>>>>Thanks, >>>>>>>>>>>>Daniel >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C] >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to conduct a genome annotation and have the >>>>>>>>>>>>>following >>>>>>>>>>>>>data: >>>>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species >>>>>>>>>>>>>options) >>>>>>>>>>>>> - ESTs and RACE (fasta) >>>>>>>>>>>>> - proteins (fasta) >>>>>>>>>>>>> - proteins of related organisms (fasta) >>>>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to >>>>>>>>>>>>>convert >>>>>>>>>>>>>to >>>>>>>>>>>>>ZFF >>>>>>>>>>>>>format, etc. ) >>>>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl) >>>>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to >>>>>>>>>>>>>convert >>>>>>>>>>>>>the outputs to .gff3 files. The reason why I ran Augustus gene >>>>>>>>>>>>>prediction separately, because the genome has never been >>>>>>>>>>>>>trained >>>>>>>>>>>>>for >>>>>>>>>>>>>Augustus. >>>>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq >>>>>>>>>>>>> >>>>>>>>>>>>> Could you please let me know how can I specify parameters in >>>>>>>>>>>>>the >>>>>>>>>>>>>maker_opts.ctl file? >>>>>>>>>>>>> Or do you have other suggestions to re-do the data listed >>>>>>>>>>>>>above? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> Anh-Dao >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>>> >>>>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandel >>>>>>>>>>>>>l >>>>>>>>>>>>>- >>>>>>>>>>>>>l >>>>>>>>>>>>>a >>>>>>>>>>>>>b >>>>>>>>>>>>>. 
>>>>>>>>>>>>>o >>>>>>>>>>>>>r >>>>>>>>>>>>>g >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>_______________________________________________ >>>>>>>>>>>maker-devel mailing list >>>>>>>>>>>maker-devel at box290.bluehost.com >>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell- >>>>>>>>>>>l >>>>>>>>>>>a >>>>>>>>>>>b >>>>>>>>>>>. >>>>>>>>>>>o >>>>>>>>>>>r >>>>>>>>>>>g >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>>>------------------------------ >>>> >>>>Subject: Digest Footer >>>> >>>>_______________________________________________ >>>>maker-devel mailing list >>>>maker-devel at box290.bluehost.com >>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>>>------------------------------ >>>> >>>>End of maker-devel Digest, Vol 74, Issue 17 >>>>******************************************* >>> >>> >>>_______________________________________________ >>>maker-devel mailing list >>>maker-devel at box290.bluehost.com >>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > From carson.holt at genetics.utah.edu Mon Sep 22 14:17:19 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Mon, 22 Sep 2014 20:17:19 +0000 Subject: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome In-Reply-To: <541BCE0A.70806@env.ethz.ch> References: <541BCE0A.70806@env.ethz.ch> Message-ID: The contiged assembly is more likely to give spurious hits and alignments. They also can be harder to repeat mask. Also gene predictors can behave slightly different on small sequences than on longer ones. If you have fewer gene models than you expect, your first step should be to process the scaffolds with CEGMA. It will give you an estimate of the genomes "completeness". If CEGMA gives a 60% completeness value for example then you can expect to only recover 60% of the expected number of genes. 
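As a rough illustration of the completeness rule of thumb above, here is a minimal sketch. The function name is hypothetical, and the 17,000 figure is only the gene count mentioned as plausible elsewhere in this thread; neither is part of CEGMA or MAKER.

```python
def expected_recoverable_genes(expected_total, completeness):
    """Scale an expected gene count by an assembly completeness fraction.

    `completeness` is a fraction between 0 and 1 (e.g. 0.60 for a 60%
    CEGMA completeness value). This is only the rule of thumb from the
    message above, not a calculation performed by CEGMA or MAKER.
    """
    if not 0.0 <= completeness <= 1.0:
        raise ValueError("completeness must be a fraction between 0 and 1")
    return round(expected_total * completeness)


# Illustrative numbers only: ~17,000 expected genes, 60% completeness.
print(expected_recoverable_genes(17000, 0.60))
```

If the recovered gene count is far below this estimate, the shortfall is more likely to come from masking or training than from the assembly itself.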
Next you should run RepeatModeler or similar software to help generate a species-specific repeat library. Under-masked repeats can make predicting genes on longer scaffolds far more difficult for ab initio predictors.

--Carson

On 9/19/14, 12:32 AM, "Stefan Zoller" wrote:

>Hi,
>
>I am working on the annotation of a plant genome (about 600MB) and we
>have a reasonable draft assembly, a fairly good transcriptome and quite
>a few proteins from related species. We have also extensively trained
>augustus and are also feeding genmark and snap predictions.
>
>Recently I noticed a behavior of Maker that seems fairly odd and which I
>cannot explain at all. When I take the scaffolded genome (about 23000
>scaffolds) I get roughly 9'000 maker approved gene models. Which is
>admittedly a bit on the low side and we have to work on this. However,
>when I break up the scaffolds into contigs at stretches of N longer
>500bp (about 60'000 contigs) I get about 17'000 maker gene models. Now
>obviously 17'000 is more in the range what I would expect, so I am
>inclined to go with these. I have looked at both annotations and the
>evidence in WebApollo and the evidence alignments are identical for both
>runs. The approved genes seem to be the same, except for the additional
>ones in the "contiged" genome version. The additional gene models are
>not necessarily at the ends of the contigs, so I think it has nothing to
>do with having the stretches of Ns nearby in the scaffolded genome. Do
>you have any idea why maker comes up with the additional numbers of gene
>models and how I could "convince" maker to give me the same gene models
>for the scaffolded assembly?
>
>Cheers,
>Stefan
>
>
>
>--
>Stefan Zoller, PhD
>Bioinformatics
>Genetic Diversity Centre
>ETH Zurich CHN E55.1
>Universitätsstrasse 16
>8092 Zurich
>Switzerland
>
>Phone: +41 44 632 66 85
>E-Mail: stefan.zoller at env.ethz.ch
>Web: www.gdc.ethz.ch
>
>

From myandell at genetics.utah.edu Mon Sep 22 18:10:38 2014
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 23 Sep 2014 00:10:38 +0000
Subject: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome
In-Reply-To:
References: <541BCE0A.70806@env.ethz.ch>,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA0537B651ADF@mxb1.hg.genetics.utah.edu>

Also, are your numbers including the ab initio predictions without evidence that have Pfam domains?

cheers,

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Co-director USTAR Center for Genetic Discovery
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carson.holt at genetics.utah.edu]
Sent: Monday, September 22, 2014 2:17 PM
To: stefan.zoller at env.ethz.ch; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome

The contiged assembly is more likely to give spurious hits and alignments. They also can be harder to repeat mask. Also gene predictors can behave slightly differently on small sequences than on longer ones. If you have fewer gene models than you expect, your first step should be to process the scaffolds with CEGMA. It will give you an estimate of the genome's "completeness". If CEGMA gives a 60% completeness value, for example, then you can expect to recover only 60% of the expected number of genes. Next you should run RepeatModeler or similar software to help generate a species-specific repeat library.
Under masked repeats can make predicting genes on longer scaffolds far more difficult for ab initio predictors. --Carson On 9/19/14, 12:32 AM, "Stefan Zoller" wrote: >Hi, > >I am working on the annotation of a plant genome (about 600MB) and we >have a reasonable draft assembly, a fairly good transcriptome and quite >a few proteins from related species. We have also extensively trained >augustus and are also feeding genmark and snap predictions. > >Recently I noticed a behavior of Maker that seems fairly odd and which I >cannot explain at all. When I take the scaffolded genome (about 23000 >scaffolds) I get roughly 9'000 maker approved gene models. Which is >admittedly a bit on the low side and we have to work on this. However, >when I break up the scaffolds into contigs at stretches of N longer >500bp (about 60'000 contigs) I get about 17'000 maker gene models. Now >obviously 17'000 is more in the range what I would expect, so I am >inclined to go with these. I have looked at both annotations and the >evidence in WebApollo and the evidence alignments are identical for both >runs. The approved genes seem to be the same, except for the additional >ones in the "contiged" genome version. The additional gene models are >not necessarily at the ends of the contigs, so I think it has nothing to >do with having the stretches of Ns nearby in the scaffolded genome. Do >you have any idea why maker comes up with the additional numbers of gene >models and how I could "convince" maker to give me the same gene models >for the scaffolded assembly? 
> >Cheers, >Stefan > > > >-- >Stefan Zoller, PhD >Bioinformatics >Genetic Diversity Centre >ETH Zurich CHN E55.1 >Universit?tsstrasse 16 >8092 Zurich >Switzerland > >Phone: +41 44 632 66 85 >E-Mail: stefan.zoller at env.ethz.ch >Web: www.gdc.ethz.ch > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From aksrao at ucdavis.edu Thu Sep 25 10:18:30 2014 From: aksrao at ucdavis.edu (Anand K S Rao) Date: Thu, 25 Sep 2014 09:18:30 -0700 Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions? Message-ID: Greetings! I am exploring the use of MAKER-P. But I need your advice in determining if MAKER-P is the best choice for me. In the recent past, I've tried using the AUGUSTUS --profile option which allows for user defined protein profiles to be used as query. I am interested in predicted gene-like structures in intergenic regions (I've masked away genic regions as predicted by genome annotation pipeline) - in some orphan legume plant species - so not much in the way of extrinsic / external data in the way of EST, NGS data - let alone extrinsic data that might map to so called intergenic regions i.e. whatever little data there exists, has been already used to predict 'genes'. When I tried using --profile option of AUGUSTUS, I was not satisfied with the frequency and magnitude of fusion genes. Additionally, there was no easy way for me to consolidate gene-like structures that varied, but overlapped when using different protein profiles as queries (one profile per Pfam HMM within a 4 member clan). Additionally, training all the orphan legume species is not an exciting undertaking... because of time and computing resource requirements. All this led me to consider MAKER-P as an option. Based on what I've described above, do you think I should proceed with trying to use MAKER-P for my purposes? 
Thank you, in advance.

Sincerely,
Anand

--
Anand K.S. Rao
PhD candidate, Plant Biology with a Designated Emphasis in Biotechnology, UC-Davis, CA - 95616 USA | aksrao at ucdavis.edu | (530) 574-5134
_________________________________________________________________________
CTTATTGTTGAACTTOAATGGTGCTAATGATCCTCGTOTCTCCTGAACGT - translate THAT!

From carson.holt at genetics.utah.edu Thu Sep 25 12:17:19 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 25 Sep 2014 18:17:19 +0000
Subject: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome
In-Reply-To: <5421695F.5040409@env.ethz.ch>
References: <541BCE0A.70806@env.ethz.ch> <7A60AB257EFF2B48B1F4C814817EA0537B651ADF@mxb1.hg.genetics.utah.edu> <5421695F.5040409@env.ethz.ch>
Message-ID:

Sorry for the slow reply. I was trying to locate a script that might be useful for you.

I think a species-specific repeat library will be of most benefit here (it's surprising just how influential this step is). Also make sure that you train SNAP and Augustus on your own species and are not just using another related species as a stand-in.

With respect to Pfam domains, on some organisms you may not get a lot of cross-species protein alignments because of divergence or assembly issues. This of course makes it harder to support these models with direct protein alignments. However, you can run InterProScan over the non-overlapping.proteins.fasta file produced by MAKER (it contains non-redundant rejected models). Because an HMM is used for domain identification, it can pick up protein domains that would not produce a significant BLAST alignment because of divergence. You can then add models with positive hits for protein domains back into your gene set. This ad hoc procedure can usually only increase gene counts by about 10%, though, for organisms where it's required.
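Collecting the IDs of models with a domain hit can be sketched in a few lines. This is a minimal sketch, not the attached script: it assumes the report is InterProScan 5 tab-separated output, where column 1 is the protein ID and column 4 the analysis name ("Pfam"). The function name and the example rows are hypothetical.

```python
def pfam_positive_ids(lines):
    """Return the sorted, de-duplicated IDs of proteins with a Pfam hit.

    Assumes InterProScan-style TSV rows: column 1 is the protein ID and
    column 4 is the analysis name. Rows from other analyses are ignored.
    """
    ids = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[3] == "Pfam":
            ids.add(fields[0])
    return sorted(ids)


# Made-up example rows (not real MAKER or InterProScan output).
example_report = (
    "model-1\tmd5a\t250\tPfam\tPF00069\tProtein kinase domain\t10\t240\t1.2E-50\tT\t25-09-2014\n"
    "model-2\tmd5b\t180\tGene3D\tG3DSA:1.10.510.10\t-\t5\t170\t3.0E-10\tT\t25-09-2014\n"
    "model-1\tmd5a\t250\tPfam\tPF07714\tProtein tyrosine kinase\t12\t238\t4.0E-20\tT\t25-09-2014\n"
)

# One ID per line, ready to be saved as the list of positive hits.
print("\n".join(pfam_positive_ids(example_report.splitlines())))
```

A set is used so that a model with several domains appears only once in the list.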
I've attached a script that makes generating results for these genes easier.

1. First, run InterProScan with just Pfam.
2. Then take the IDs of all models that have a domain in the report and create a list (one ID per line).
3. Next, use the fasta_tool script that comes with MAKER together with the --select flag to separate just the positive hits (the IDs in your list) from the non-overlapping.proteins.fasta and non-overlapping.transcripts.fasta files.
4. Use the attached script to separate just the positive hits (your ID list) from the GFF3. The script will upgrade match/match_part results to gene/mRNA/exon/CDS results and print them out for you.
5. Use the fasta_merge and gff3_merge scripts that come with MAKER to merge the selected/upgraded GFF3 entries and selected FASTA entries back into the original MAKER results.

--Carson

On 9/23/14, 6:36 AM, "Stefan Zoller" wrote:

>Please forgive my ignorance, I am not entirely sure if I understand your
>question correctly, but I will try to answer.
>As evidence we use:
>1) our own transcriptome (trinity assembled RNAseq, filtering out the
>very low expression transcripts).
>2) all swissprot plant proteins, and several protein sets from closely
>related plant species downloaded from NCBI.
>I am not sure if the ab-initio predictions without evidence have pfamm
>domains. Honestly, I would not know how to tell and how to
>include/exclude.
>I was assuming that we should not have too many Maker approved
>predictions without evidence anyway, because we use "keeps_preds=0".
>The numbers of gene predictions I mentioned in my email are the
>predictions reported by the fasta_merge/gff3_merge scripts in the
>"*maker.proteins.fasta". There are of course many more predictions in
>e.g., "*maker.augustus_masked.proteins.fasta" (about 68'000 in this file).
>
>I hope I am not totally off with my answer.
>Cheers, Stefan > > > >On 23.09.14 02:10, Mark Yandell wrote: >> Also are you numbers including the ab-inito predictions without >>evidence that have pfamm domains? >> >> cheers, >> >> >> --mark >> >> >> >> Mark Yandell >> Professor of Human Genetics >> H.A. & Edna Benning Presidential Endowed Chair >> Co-director USTAR Center for Genetic Discovery >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ph:801-587-7707 >> >> ________________________________________ >> From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of >>Carson Holt [carson.holt at genetics.utah.edu] >> Sent: Monday, September 22, 2014 2:17 PM >> To: stefan.zoller at env.ethz.ch; maker-devel at yandell-lab.org >> Subject: Re: [maker-devel] diff. numbers of geneson contigs vs. >>scaffolded genome >> >> The contiged assembly is more likely to give spurious hits and >>alignments. >> They also can be harder to repeat mask. Also gene predictors can >>behave >> slightly different on small sequences than on longer ones. If you have >> fewer gene models than you expect, your first step should be to process >> the scaffolds with CEGMA. It will give you an estimate of the genomes >> "completeness". If CEGMA gives a 60% completeness value for example >>then >> you can expect to only recover 60% of the expected number of genes. Next >> you should run RepeatModeler of similar software to help generate a >> species specific repeat library. Under masked repeats can make >>predicting >> genes on longer scaffolds far more difficult for ab initio predictors. >> >> --Carson >> >> >> On 9/19/14, 12:32 AM, "Stefan Zoller" wrote: >> >>> Hi, >>> >>> I am working on the annotation of a plant genome (about 600MB) and we >>> have a reasonable draft assembly, a fairly good transcriptome and quite >>> a few proteins from related species. We have also extensively trained >>> augustus and are also feeding genmark and snap predictions. 
>>> >>> Recently I noticed a behavior of Maker that seems fairly odd and which >>>I >>> cannot explain at all. When I take the scaffolded genome (about 23000 >>> scaffolds) I get roughly 9'000 maker approved gene models. Which is >>> admittedly a bit on the low side and we have to work on this. However, >>> when I break up the scaffolds into contigs at stretches of N longer >>> 500bp (about 60'000 contigs) I get about 17'000 maker gene models. Now >>> obviously 17'000 is more in the range what I would expect, so I am >>> inclined to go with these. I have looked at both annotations and the >>> evidence in WebApollo and the evidence alignments are identical for >>>both >>> runs. The approved genes seem to be the same, except for the additional >>> ones in the "contiged" genome version. The additional gene models are >>> not necessarily at the ends of the contigs, so I think it has nothing >>>to >>> do with having the stretches of Ns nearby in the scaffolded genome. Do >>> you have any idea why maker comes up with the additional numbers of >>>gene >>> models and how I could "convince" maker to give me the same gene models >>> for the scaffolded assembly? >>> >>> Cheers, >>> Stefan >>> >>> >>> >>> -- >>> Stefan Zoller, PhD >>> Bioinformatics >>> Genetic Diversity Centre >>> ETH Zurich CHN E55.1 >>> Universit?tsstrasse 16 >>> 8092 Zurich >>> Switzerland >>> >>> Phone: +41 44 632 66 85 >>> E-Mail: stefan.zoller at env.ethz.ch >>> Web: www.gdc.ethz.ch >>> >>> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gff3_preds2models Type: application/octet-stream Size: 5523 bytes Desc: gff3_preds2models URL: From carsonhh at gmail.com Thu Sep 25 12:43:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 25 Sep 2014 12:43:35 -0600 Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions? In-Reply-To: References: Message-ID: When you say "gene-like structures:, are you saying that you are looking for pseudogenes and non-coding genes? You can use the trnascan and snoscan options in the maker_opts.ctl file to find some non-coding RNAS. You may just want to leave off all ab initio gene predictors like SNAP and Augustus as those will be looking for canonical coding genes. If you first hard mask any coding genes, and then provide ESTs or assembled mRNA-seq and proteins, you may be able to use the exonerate alignments produced to identify potential gene like structures. It might require a little post processing of the resulting GFF3 by you. Thanks, Carson From: Anand K S Rao Date: Thursday, September 25, 2014 at 10:18 AM To: Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions? Greetings! I am exploring the use of MAKER-P. But I need your advice in determining if MAKER-P is the best choice for me. In the recent past, I've tried using the AUGUSTUS --profile option which allows for user defined protein profiles to be used as query. I am interested in predicted gene-like structures in intergenic regions (I've masked away genic regions as predicted by genome annotation pipeline) - in some orphan legume plant species - so not much in the way of extrinsic / external data in the way of EST, NGS data - let alone extrinsic data that might map to so called intergenic regions i.e. whatever little data there exists, has been already used to predict 'genes'. When I tried using --profile option of AUGUSTUS, I was not satisfied with the frequency and magnitude of fusion genes. 
Additionally, there was no easy way for me to consolidate gene-like structures that varied, but overlapped when using different protein profiles as queries (one profile per Pfam HMM within a 4 member clan). Additionally, training all the orphan legume species is not an exciting undertaking... because of time and computing resource requirements. All this led me to consider MAKER-P as an option.

Based on what I've described above, do you think I should proceed with trying to use MAKER-P for my purposes?

Thank you, in advance.

Sincerely,
Anand

--
Anand K.S. Rao
PhD candidate, Plant Biology with a Designated Emphasis in Biotechnology, UC-Davis, CA - 95616 USA | aksrao at ucdavis.edu | (530) 574-5134
_________________________________________________________________________
CTTATTGTTGAACTTOAATGGTGCTAATGATCCTCGTOTCTCCTGAACGT - translate THAT!

From carson.holt at genetics.utah.edu Mon Sep 29 08:47:00 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 29 Sep 2014 14:47:00 +0000
Subject: [maker-devel] maker failure with example data
In-Reply-To:
References:
Message-ID:

The error is caused by the BioPerl indexer returning an empty length for the indexed FASTA sequence (possibly because of a corrupt index file, among other reasons). You may need to reinstall BioPerl (use the CPAN version, not the BioPerl-live version), or reinstall Berkeley DB (used by the BioPerl indexer), or reinstall the Perl module DB_File via CPAN (Perl's interface to Berkeley DB). After reinstalling BioPerl, delete the mpi_blastdb directory for the MAKER run before retrying.
Also verify that the /tmp directory on your system, or the directory pointed to by TMP= in the maker_opts.ctl file, is not full, and that TMP= is not set to an NFS-mounted location.

Thanks,
Carson

From: Goutham atla
Date: Monday, September 29, 2014 at 6:33 AM
To:
Subject: maker failure with example data

Dear All,

I am running maker with the demo file, i.e dip_contig.fasta by keeping all other parameters in .ctl files as default. But it do not progress and shows the following message that the length of the sequence is 0. Can anybody help me ?

--Next Contig--

MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Skipping the contig because it is too short!!
SeqID: contig-dpp-500-500
Length: 0
#---------------------------------------------------------------------

Regards,
Goutham

From goutham.atla at gmail.com Mon Sep 29 06:33:50 2014
From: goutham.atla at gmail.com (Goutham atla)
Date: Mon, 29 Sep 2014 18:03:50 +0530
Subject: [maker-devel] maker failure with example data
Message-ID:

Dear All,

I am running maker with the demo file, i.e dip_contig.fasta by keeping all other parameters in .ctl files as default. But it do not progress and shows the following message that the length of the sequence is 0. Can anybody help me ?

--Next Contig--

MAKER WARNING: All old files will be erased before continuing
#---------------------------------------------------------------------
Skipping the contig because it is too short!!
SeqID: contig-dpp-500-500
Length: 0
#---------------------------------------------------------------------

Regards,
Goutham

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From carson.holt at genetics.utah.edu Tue Sep 30 13:33:18 2014 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Tue, 30 Sep 2014 19:33:18 +0000 Subject: [maker-devel] URGENT: Re: maker failure with example data In-Reply-To: References: Message-ID: The message is warning that there are multiple instances of MAKER running, but no MPI communication. When you build MAKER (perl Build.PL step when installing MAKER), you need to specify the location of 'mpicc' and 'mpi.h' to build with MPI support. Otherwise you won't be able to link against MPICH2 shared libraries. You probably need to rerun that step. --Carson From: Goutham atla > Date: Tuesday, September 30, 2014 at 10:49 AM To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: URGENT: Re: maker failure with example data Hi Carson, I figured out the problem is with RepeatMasker installation and I fixed it. I am running maker with MPICH2 and I get the following warning when I start it: STATUS: Processing and indexing input FASTA files... WARNING: Multiple MAKER processes have been started in the same directory. I would like to if this is common. Regards, Goutham On Tue, Sep 30, 2014 at 12:02 PM, Goutham atla > wrote: Dear Carson, Thank you for the reply. I reinstalled the BioPerl and now I am getting the following error on test data. ERROR: RepeatMasker failed --> rank=NA, hostname=motif ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:contig-dpp-500-500 On Mon, Sep 29, 2014 at 8:17 PM, Carson Holt > wrote: The error is caused by the BioPerl indexer returning an empty length for the indexed fasta sequence (possibly because of a corrupt index file or other reasons). You may need to reinstall BioPerl (use the CPAN version not the BioPerl-live version), or reinstall Berkley DB (used by the BioPerl indexer), or reinstall the Perl module DB_File via CPAN (Perl's interface to Berkley DB). 
After reinstalling BioPerl, delete the mpi_blastdb directory for the MAKER run before retrying. Also verify that the /tmp directory on your system or the directory pointed to by TMP= in the maker_opts,ctl file is not full and that TMP= is not set to an NFS mounted location. Thanks, Carson From: Goutham atla > Date: Monday, September 29, 2014 at 6:33 AM To: > Subject: maker failure with example data Dear All, I am running maker with the demo file, i.e dip_contig.fasta by keeping all other parameters in .ctl files as default. But it do not progress and shows the following message that the length of the sequence is 0. Can anybody help me ? --Next Contig-- MAKER WARNING: All old files will be erased before continuing #--------------------------------------------------------------------- Skipping the contig because it is too short!! SeqID: contig-dpp-500-500 Length: 0 #--------------------------------------------------------------------- Regards, Goutham -- Goutham Atla -- Goutham Atla -------------- next part -------------- An HTML attachment was scrubbed... URL: From eschang1 at gmail.com Tue Sep 30 14:02:30 2014 From: eschang1 at gmail.com (Sally Chang) Date: Tue, 30 Sep 2014 15:02:30 -0500 Subject: [maker-devel] interpreting SNAP gene-stats output Message-ID: Hi all, I was wondering if someone could help me make sure I am looking at these results from running fathom -gene-stats on an annotation: 1049 sequences 0.245825 avg GC fraction (min=0.162446 max=0.431287) 5533 genes (plus=2760 minus=2773) 91 (0.016447) single-exon 5442 (0.983553) multi-exon 101.857010 mean exon (min=1 max=6534) 81.880493 mean intron (min=4 max=5486) Are the 1049 sequences the actual number of contigs/sequences from your assembly that MAKER ended up using? And is that 5533 genes the number of genes it found on those contigs (and strand info?). Thanks very much, Sally Chang -------------- next part -------------- An HTML attachment was scrubbed... 
From carson.holt at genetics.utah.edu Tue Sep 30 14:49:10 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 30 Sep 2014 20:49:10 +0000
Subject: [maker-devel] Maker
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA0537B66060F@mxb1.hg.genetics.utah.edu>
References: <000001cfdc80$77dc88e0$67959aa0$@uni-bayreuth.de> <7A60AB257EFF2B48B1F4C814817EA0537B66060F@mxb1.hg.genetics.utah.edu>
Message-ID:

MAKER can't annotate assembled transcripts. It can only annotate genomic sequence. Transcript annotation is a very different problem. Using a different species' genome would not produce annotation for your transcripts; rather, your transcripts would just be considered evidence for annotating that species' genome.

Your best option is probably just to use BLAST to look for homology between species. Do BLAST both ways, and if gene A in species 1 is the best hit for gene B in species 2 and vice versa (reciprocal best hits), then you can consider them as being orthologous. Also use the proteome from the related species when doing the BLAST analysis (not the nucleotide transcripts).

--Carson

On 9/30/14, 6:51 AM, "Mark Yandell" wrote:

>
>
>Mark Yandell
>Professor of Human Genetics
>H.A. & Edna Benning Presidential Endowed Chair
>Co-director USTAR Center for Genetic Discovery
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>ph:801-587-7707
>
>________________________________________
>From: Alfons Weig [a.weig at uni-bayreuth.de]
>Sent: Tuesday, September 30, 2014 1:30 AM
>To: Mark Yandell
>Subject: Maker
>
>Hello,
>
>I have just sent feedback via the Maker feedback form but received the
>following error message. Therefore, I am sending it via regular mail:
>
>Error executing run mode 'feedback': Can't call method "MailMsg" without
>a package or object reference at /var/www/cgi-bin/mwas/lib/MWS.pm line
>1116.
>at /var/www/cgi-bin/mwas/maker.cgi line 21.
>
>I have just tested the Maker annotation pipeline with short sequences
>from an RNAseq de-novo assembly using A. mellifera as a reference genome.
>Unfortunately, honey bee is not the species we sequence but is closely
>related to it.
>I was wondering whether this was a good approach? There are no genome
>data available for our bee species. Is Maker able to annotate de-novo
>assembled mRNA transcripts obtained by Velvet/Oases (including partial
>sequences)?
>
>Best regards
>Alfons Weig
>
>
>Dr. Alfons Weig
>DNA-Analytik & Ökoinformatik - Univ. Bayreuth - NW1
>Universitätsstrasse 30
>95447 Bayreuth - Germany
>Tel. +49 (0)921-552457
>www.daneco.uni-bayreuth.de
>

From carsonhh at gmail.com Tue Sep 30 14:59:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 30 Sep 2014 14:59:47 -0600
Subject: [maker-devel] interpreting SNAP gene-stats output
In-Reply-To:
References:
Message-ID:

Probably. But it's really not that important a value, because during the 'fathom genome.ann genome.dna -categorize 1000' step outlined in the SNAP training literature, fathom turns each gene into its own little contig padded by 1000 bp on either side. So in the end the number of starting contigs becomes irrelevant, because they all get trimmed and thrown away anyway.

--Carson

From marc.hoeppner at imbim.uu.se Tue Sep 30 23:39:21 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 1 Oct 2014 05:39:21 +0000
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

Another possibility could be that MPICH2 wasn't built properly, no? I remember something about enabling shared libraries during the compilation of MPICH, without which the error below would appear.

/Marc

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se

On 30 Sep 2014, at 21:33, Carson Holt wrote:

> The message is warning that there are multiple instances of MAKER running, but no MPI communication. When you build MAKER (perl Build.PL step when installing MAKER), you need to specify the location of 'mpicc' and 'mpi.h' to build with MPI support. Otherwise you won't be able to link against MPICH2 shared libraries. You probably need to rerun that step.
>
> --Carson
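
Before rerunning 'perl Build.PL' as suggested in this thread, it may help to confirm that 'mpicc' is on the PATH and to locate 'mpi.h'. The following is only a sketch: the function name find_mpi_header is hypothetical, and the candidate include directories are assumptions that will differ between systems and MPI installations.

```shell
# Hypothetical helper: print the first directory among the arguments that
# contains mpi.h, so that its path can be supplied when Build.PL asks for it.
# Returns non-zero if none of the directories has the header.
find_mpi_header() {
  for dir in "$@"; do
    if [ -f "$dir/mpi.h" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Is an MPI compiler wrapper visible at all?
if command -v mpicc >/dev/null 2>&1; then
  echo "mpicc found at: $(command -v mpicc)"
else
  echo "mpicc not found on PATH"
fi

# The directories below are guesses; replace them with your MPI install prefix.
find_mpi_header /usr/include /usr/local/include /usr/include/mpich \
  || echo "mpi.h not found in the directories checked"
```

If both are found and MAKER still reports multiple independent processes, Marc's point about how the MPI library itself was compiled (shared libraries enabled) is the next thing to check.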
From goutham.atla at gmail.com Tue Sep 30 23:58:16 2014
From: goutham.atla at gmail.com (Goutham atla)
Date: Wed, 1 Oct 2014 11:28:16 +0530
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

Dear All,

Thank you. I figured out that the problem was with MPICH2. I had been trying to get MPICH2 working but was unsuccessful. I installed MPICH v3 and it's working fine now. Thank you all.

The old GMOD tutorials are a bit misleading now that newer versions have come out.

--
Goutham Atla

From mphoeppner at gmail.com Mon Sep 1 07:07:40 2014
From: mphoeppner at gmail.com (Marc Höppner)
Date: Mon, 1 Sep 2014 15:07:40 +0200
Subject: [maker-devel] est2genome=1 for est and altest
Message-ID: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>

Hi,

I may be wrong about this, but it seems to me that Maker will never build a gene model from EST evidence if the data set is provided as 'altest' rather than 'est'. In my case, I am annotating a plant for which there is a closely related reference genome + annotation, as well as pretty good EST data. So I supplied the EST data as 'altest', assuming that the only difference would be that the alignment parameters would be slightly more relaxed. But I found that Maker never made any gene models from that data. When moving the EST data to 'est', it worked.

So I am not sure whether this is intended behaviour, but in my case it caught me a bit by surprise...

Regards,

Marc

From dence at genetics.utah.edu Tue Sep 2 09:32:03 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 2 Sep 2014 15:32:03 +0000
Subject: [maker-devel] est2genome=1 for est and altest
In-Reply-To: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
References: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
Message-ID:

Hi Marc,

This is a partial answer to your question.
I don't know the full reason that models aren't built from altest evidence, but I do know that those sequences are aligned with tblastx (nucleotide translated to protein and back to nucleotide) and not with blastn with relaxed parameters. Also, the final protein and nucleotide alignments that do get made into models are made by exonerate and not by blast.

Does that help?

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Tue Sep 2 10:57:56 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Sep 2014 10:57:56 -0600
Subject: [maker-devel] est2genome=1 for est and altest
In-Reply-To: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
References: <21FB59E4-45D3-4667-9B1A-7EB5BA1E98CC@gmail.com>
Message-ID:

There is a reason why no altest2genome option exists in the maker_opts.ctl file. The est2genome and protein2genome options are meant only for generating rough partial models that can be used for training gene finders (they should not be used for generating final models). And if you are thinking of using ESTs from another species (altest) to generate initial models for training, it's actually an analysis error.

This is because altest alignments will be far less accurate than EST or protein alignments (so they will hurt your training). They are slower to generate than EST or protein alignments (by as much as 10-20 fold, because they are translated into all 6 reading frames). Also, there will be far fewer of them (6 frames of translation make the alignments more spurious; thus they require higher thresholds of significance).

So if you are using a species for initial training that is distant enough that it must be aligned as altest via tblastx, then you should be using proteins instead, which will be widely available and more accurately aligned. Note that both proteins and altests are aligned in amino acid space, so you can expect anywhere from several million to hundreds of millions of years of divergence, and the species you use is not expected to be closely related (so whole proteomes will be available from a number of sources that will be far more accurate than any altest alignment).
The only real benefit of altest is to provide evidence of lineage-specific genes for organisms where there are no species in the same branch or phylum to get protein evidence from. There will only be a handful of these genes, and they can be obtained in any later bootstrap training steps, which will not involve est2genome or protein2genome models. You should use protein2genome models instead for the initial training and only use altest for any bootstrap training or for your final models.

Thanks,
Carson

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 05:38:16 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 11:38:16 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan).
Running MAKER, though, I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

> MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging, which would cause them to build up over time. You can try running MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.
>
> Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->
>
> sub new {
>     if($PS){
>         my $self = {};
>         my $class = shift;
>
>         bless($self, $class);
>         return $self;
>     }
>     else{
>         eval 'require Proc::ProcessTable';
>         return Proc::ProcessTable->new(@_);
>     }
> }
>
> to this -->
>
> sub new {
>     eval 'require Proc::ProcessTable';
>     return Proc::ProcessTable->new(@_);
> }
>
> This will access the process table directly rather than through 'ps', but it may experience the same hang that 'ps' is experiencing. Also, you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.
>
> --Carson
>
> From: "Timothy Stitt (TGAC)"
> Date: Thursday, August 21, 2014 at 2:05 PM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] MAKER and large number of 'ps' processes
>
>> Dear MAKER developers,
>>
>> One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores), and the application appears to be generating a large number of 'ps' processes that are overwhelming the system and making it unusable for other users.
>>
>> Can you confirm that MAKER would be generating this behaviour and, if so, is there a way to prevent the application from running 'ps' repeatedly?
>>
>> Thanks in advance,
>>
>> Tim.
>>
>> --
>> Timothy Stitt PhD | Head of Scientific Computing
>> +44 1603 450378 | timothy.stitt at tgac.ac.uk
>> The Genome Analysis Centre (TGAC)
>> Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Thu Sep 4 08:22:08 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 08:22:08 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there, then you may need to manually delete both before attempting to reinstall.
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson
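
The directory-by-directory check described in the 08:22 reply can be scripted. This is a minimal sketch under stated assumptions: the function name check_processtable is hypothetical, and the loop at the end only covers the @INC of whichever perl is first on the PATH, which does not include the extra MAKER library directories from the error message; those can be passed to the function by hand.

```shell
# Hypothetical helper: for each directory given, report whether the two parts
# of a Proc::ProcessTable installation named in the reply are present:
# Proc/ProcessTable.pm and the auto/Proc/ProcessTable/ directory.
check_processtable() {
  for dir in "$@"; do
    pm=missing
    auto=missing
    if [ -f "$dir/Proc/ProcessTable.pm" ]; then pm=found; fi
    if [ -d "$dir/auto/Proc/ProcessTable" ]; then auto=found; fi
    printf '%s: ProcessTable.pm=%s auto/Proc/ProcessTable=%s\n' "$dir" "$pm" "$auto"
  done
}

# Check every entry of @INC for the perl on PATH (skipped if perl is absent).
if command -v perl >/dev/null 2>&1; then
  perl -e 'print "$_\n" for @INC' | while IFS= read -r dir; do
    check_processtable "$dir"
  done
fi
```

A directory reporting one part as found and the other as missing would point to the partial installation the reply suggests deleting before reinstalling.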
From carsonhh at gmail.com Thu Sep 4 08:25:31 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 08:25:31 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes
In-Reply-To:
References:
Message-ID:

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson

From bmoore at genetics.utah.edu Thu Sep 4 11:39:39 2014
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 Sep 2014 17:39:39 +0000
Subject: [maker-devel] Fgenesh output to gff3 conversion
In-Reply-To:
References:
Message-ID: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu>

Hi Anindyajit,

I'm forwarding your message along to the maker mailing list and devel team...

B

On Sep 4, 2014, at 8:37 AM, Anindyajit Banerjee wrote:

>
> Hi
>
> I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for further input into EVM. However, I am encountering an error while doing so. Could you suggest a possible way to do so?
I hereby attach a test output for fgenesh > test out put file for your understanding > Please help > -- > Regards, > > Anindyajit Banerjee > Mobile: +919883333000. > > > > > > > > > From dence at genetics.utah.edu Thu Sep 4 11:44:47 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 4 Sep 2014 17:44:47 +0000 Subject: [maker-devel] Fgenesh output to gff3 conversion In-Reply-To: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu> References: , <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu> Message-ID: Hi Anindyajit, It doesn't look like the error output that you sent to Barry was forwarded with your message. Can you send that again? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Barry Moore [bmoore at genetics.utah.edu] Sent: Thursday, September 04, 2014 11:39 AM To: Anindyajit Banerjee Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Fgenesh output to gff3 conversion Hi Anindyajit, I?m forwarding you message along to the maker mailing list and devel team? B On Sep 4, 2014, at 8:37 AM, Anindyajit Banerjee wrote: > > Hi > > I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for the further input in EVM. However I am encountering the error while doing so. Could you suggest me any possible way to do so. I hereby attach a test output for fgenesh > test out put file for your understanding > Please help > -- > Regards, > > Anindyajit Banerjee > Mobile: +919883333000. 
> > > > > > > > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From MEC at stowers.org Thu Sep 4 12:10:14 2014 From: MEC at stowers.org (Cook, Malcolm) Date: Thu, 4 Sep 2014 18:10:14 +0000 Subject: [maker-devel] Fgenesh output to gff3 conversion In-Reply-To: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu> References: <20794049-4C66-48FC-9003-847D3CC3F8C4@genetics.utah.edu> Message-ID: Hi, I'm not sure what maker offers in this regard. It's been some time since I've used it now. Anyway, if it helps, some time ago I wrote a quick fgenesh2gff using BioPerl. It is provided here. You need a bioperl installation. http://bio.perl.org/pipermail/bioperl-l/2006-July/022061.html ~Malcolm Cook >-----Original Message----- >From: maker-devel [mailto:maker-devel-bounces at yandell-lab.org] On Behalf Of Barry Moore >Sent: Thursday, September 04, 2014 12:40 PM >To: Anindyajit Banerjee >Cc: maker-devel at yandell-lab.org >Subject: Re: [maker-devel] Fgenesh output to gff3 conversion > >Hi Anindyajit, > >I'm forwarding you message along to the maker mailing list and devel team... > >B > >On Sep 4, 2014, at 8:37 AM, Anindyajit Banerjee wrote: > >> >> Hi >> >> I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for the >further input in EVM. However I am encountering the error while doing so. Could you suggest me any possible way to do so. I hereby >attach a test output for fgenesh >> test out put file for your understanding >> Please help >> -- >> Regards, >> >> Anindyajit Banerjee >> Mobile: +919883333000. 
>> >> >> >> >> >> >> >> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Timothy.Stitt at tgac.ac.uk Thu Sep 4 12:45:15 2014 From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC)) Date: Thu, 4 Sep 2014 18:45:15 +0000 Subject: [maker-devel] MAKER and large number of 'ps' processes In-Reply-To: References: Message-ID: Thanks Carson. I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated. I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics: #142 my $obj = new Proc::ProcessTable_simple; #143 return $obj->get_proc_by_id($id); Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line $143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok? > make test make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process' make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process' PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/process.t .. -------------------------------- uid: 10344 gid: 11995 ? 
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 15:25
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson

From: Carson Holt
Date: Thursday, September 4, 2014 at 8:22 AM
To: "Timothy Stitt (TGAC)", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall.

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson
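
The directory check described in Carson's 8:22 AM message (look for ./Proc/ProcessTable.pm and ./auto/Proc/ProcessTable/ under each library directory) could be scripted roughly as below. This is only a sketch: the function name check_processtable_dirs is made up for illustration, and the directories to inspect are whatever you pass as arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): for each directory given as an
# argument, report whether it contains Proc/ProcessTable.pm and the
# auto/Proc/ProcessTable/ directory.
check_processtable_dirs() {
    for dir in "$@"; do
        # The module file itself
        if [ -f "$dir/Proc/ProcessTable.pm" ]; then pm=yes; else pm=no; fi
        # The directory holding the module's compiled/autoloaded parts
        if [ -d "$dir/auto/Proc/ProcessTable" ]; then auto=yes; else auto=no; fi
        printf '%s\tProcessTable.pm=%s\tauto=%s\n' "$dir" "$pm" "$auto"
    done
}
```

One possible way to feed it the library path of the Perl that MAKER actually runs under, assuming none of the paths contain spaces, is: check_processtable_dirs $(perl -e 'print "$_\n" for @INC')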
From carsonhh at gmail.com Thu Sep 4 12:52:20 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 12:52:20 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes

Try changing -->

    eval 'require Proc::ProcessTable';

to -->

    use Proc::ProcessTable;

in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces Perl's import method to run, in case the module explicitly exports something it needs to function properly.

--Carson
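
Carson's suggested edit to ProcessTable_simple.pm (swapping the eval 'require ...' line for a use line) can be applied by hand; a scripted version might look like the sketch below. The function name switch_require_to_use is hypothetical, the file path is whatever you pass in, and the original file is kept alongside as FILE.bak. Note that later in this thread Tim reports that the change did not clear the error, so treat it as a diagnostic step rather than a fix.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): rewrite
#   eval 'require Proc::ProcessTable';
# as
#   use Proc::ProcessTable;
# in the given file, keeping the untouched original as FILE.bak.
switch_require_to_use() {
    file=$1
    # Keep a backup so the edit can be reverted
    cp "$file" "$file.bak" || return 1
    # Read from the backup and write the edited text over the original
    sed "s/eval 'require Proc::ProcessTable';/use Proc::ProcessTable;/" \
        "$file.bak" > "$file"
}
```

To revert, copy FILE.bak back over the edited file.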
From bmoore at genetics.utah.edu Thu Sep 4 13:01:57 2014
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Thu, 4 Sep 2014 19:01:57 +0000
Subject: [maker-devel] Fwd: Fgenesh output to gff3 conversion
Message-ID: <77D4D576-9BAC-478D-8A0F-492225D71637@genetics.utah.edu>

Attached is the document that Anindyajit sent with his original question.

B

Begin forwarded message:

From: Anindyajit Banerjee
Subject: Fgenesh output to gff3 conversion
Date: September 4, 2014 at 8:37:26 AM MDT

Hi

I am Anindyajit Banerjee, a research scholar from CSIR-IICB, India. I am trying to convert the fgenesh output to gff3 format for further input into EVM. However, I am encountering an error while doing so. Could you suggest any possible way to do so? I hereby attach a test fgenesh output file for your understanding.
Please help
--
Regards,

Anindyajit Banerjee
Mobile: +919883333000.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fgenesh_output_test
Type: application/octet-stream
Size: 199696 bytes
Desc: fgenesh_output_test

From carsonhh at gmail.com Thu Sep 4 13:06:28 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 13:06:28 -0600
Subject: [maker-devel] Fgenesh output to gff3 conversion

MAKER can't convert the existing output, but you could use MAKER to run FGENESH for you instead, the results of which would be in GFF3.

--Carson

From Timothy.Stitt at tgac.ac.uk Thu Sep 4 13:24:06 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Thu, 4 Sep 2014 19:24:06 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes

Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker -h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference.

Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers.

Cheers,

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From carsonhh at gmail.com Thu Sep 4 13:42:06 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 04 Sep 2014 13:42:06 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes

I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the system's 'ps'.
Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one --> sub get_proc_by_id { my $id = shift; my $select; my $obj = new Proc::ProcessTable_simple; if(ref($obj) eq "Proc::ProcessTable"){ my ($p) = grep {$_->pid eq $id} @{$obj->table}; return $p; } else{ return $obj->get_proc_by_id($id); } } --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 1:24 PM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker ?h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference. Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers. Cheers, Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt Date: Thursday, 4 September 2014 19:52 To: Timothy Stitt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Try changing --> eval 'require Proc::ProcessTable'; to --> use Proc::ProcessTable; in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces the perls import method to run incase explicitly exports something for it to function properly. --Carson From: "Timothy Stitt (TGAC)" Date: Thursday, September 4, 2014 at 12:45 PM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Thanks Carson. I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. 
I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated.

I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics:

    #142 my $obj = new Proc::ProcessTable_simple;
    #143 return $obj->get_proc_by_id($id);

Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line #143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok?

> make test
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/process.t ..
--------------------------------
uid: 10344
gid: 11995
...
cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t
exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static
cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50
t/process.t .. ok
All tests successful.
Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU)
Result: PASS
make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'
No tests defined for Proc::ProcessTable::Process extension.
make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process'

Thanks,

Tim.
---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 4 September 2014 15:25
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first.

--Carson

From: Carson Holt
Date: Thursday, September 4, 2014 at 8:22 AM
To: "Timothy Stitt (TGAC)", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

The error means Proc::ProcessTable didn't install and compile correctly. Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall.

/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
/usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan).
Running MAKER though I get the following error:

Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .) at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging which would cause them to build up over time. You can try and run MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;
            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang as 'ps' is experiencing.
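To illustrate the difference between launching 'ps' and reading the process table directly, here is a minimal Python sketch. It is not MAKER code (MAKER is written in Perl and relies on Proc::ProcessTable for this); it assumes a Linux-style /proc filesystem, and the function names proc_state and is_zombie are made up for the example.

```python
def proc_state(pid):
    """Return the one-letter state of a process, or None if it is gone.

    Reads /proc/<pid>/stat directly, so no 'ps' child process is started.
    """
    try:
        with open(f"/proc/{pid}/stat") as handle:
            stat = handle.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' before reading the state field.
    return stat.rsplit(")", 1)[1].split()[0]


def is_zombie(pid):
    """True if the process has exited but has not been reaped yet."""
    return proc_state(pid) == "Z"
```

Each call is a single file read, so many workers polling at once do not pile up extra processes the way repeated 'ps' invocations can. If the /proc filesystem itself is slow to respond, though, a direct read could stall in much the same way.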
Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating a large number of 'ps' processes that are overwhelming the system and making it unusable for other users.

Can you confirm that MAKER would be generating this behaviour and, if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From nguyenan at mail.nih.gov Fri Sep 5 08:43:19 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Fri, 5 Sep 2014 14:43:19 +0000
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17

Hi,

I finished running MAKER as suggested above.
Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g options. I called the output file maker.gff3

In the maker.gff3 I found some invalid data (does not conform to the .gff3 format), e.g.

###
2 +
###

OR

###
.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
###

OR some gene (or mRNA) IDs are not unique.
This means they can be found multiple times with different values within the maker.gff3.

How could it happen? As I understood, mRNA IDs in a .gff3 file must be unique.

Thanks
Anh-Dao

On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" wrote:

>Message: 1
>Date: Fri, 18 Jul 2014 11:04:09 -0600
>From: Carson Holt
>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]", Daniel Ence
>Cc: "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] Maker_opts.ctl
>
>It should just be 'fgenesh'. If it's not there you can still just give the GFF3.
>
>--Carson
>
>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>
>>I am not sure which fgenesh executable file I should use.
>>
>>fgenesh= #location of fgenesh executable
>>
>>When I run FGENESH++, I need to run the run_pipe.pl script. Sure you need to specify a list of other executable programs (such as ppd, ppdn+, etc)
>>
>>Anh-Dao
>>
>>On 7/16/14 3:32 PM, "Carson Holt" wrote:
>>
>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your previous run. Then provide the RepeatModeler file to rmlib=
>>>
>>>--Carson
>>>
>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>
>>>>By default, model_org=all. Can I use the de novo repeat library predicted by RepeatModeler for the rmlib option?
>>>>
>>>>Anh-Dao
>>>>
>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote:
>>>>
>>>>>No. You can provide both to MAKER. The options are model_org= and rmlib=.
>>>>>By letting MAKER handle repeat masking it will differentiate repeat types and use soft masking for some and hard masking for others. This increases sensitivity of evidence alignments while still maintaining specificity.
>>>>>
>>>>>--Carson
>>>>>
>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>
>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the parameter files for Augustus.
>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM using two options: -lib (de novo) and -species (known). I got ~ 45% repeats via de novo and ~ 4% repeats via known options. As I understood, RM inside of MAKER uses only RepBase repeat library and RepeatRunner protein database.
>>>>>>
>>>>>>Anh-Dao
>>>>>>
>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote:
>>>>>>
>>>>>>>When you ran Augustus separately, it should have created the parameters needed to run it. Now you should be able to run it inside of MAKER using the species name you just created.
>>>>>>>
>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather than giving it the results as GFF3.
>>>>>>>
>>>>>>>--Carson
>>>>>>>
>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>>>>>
>>>>>>>>Thanks Daniel for your quick response.
>>>>>>>>
>>>>>>>>I did not use the parameter file of other organism when running Augustus.
>>>>>>>>I created the parameter file for the genome following their instructions.
>>>>>>>>There were multiple steps to train and run Augustus (Creating gene structures for training AUGUSTUS with CEGMA => parameter file will be created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences; Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.)
>>>>>>>>As I mentioned the reason why I ran Augustus separately, because Augustus has not trained that genome (no parameter file exists). Otherwise I would run Augustus inside MAKER.
>>>>>>>>
>>>>>>>>You suggested to use rm_gff option to specify RepeatMasker output (sure I will convert them to .gff3 formatted files). Can I submit two RM .gff3 files, separated by comma?
>>>>>>>>
>>>>>>>>Anh-Dao
>>>>>>>>
>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" wrote:
>>>>>>>>
>>>>>>>>>Hi Anh-Dao,
>>>>>>>>>
>>>>>>>>>In the maker_opts.ctl file, there are options for est and protein evidence. You'll put all of your fasta est files together in a comma separated list in the 'est' option, and all of your fasta protein files in a comma separated list for the 'protein' option.
>>>>>>>>>
>>>>>>>>>You'll specify the SNAP and Genemark files in their respective options in the control file and pass the augustus and fgenesh predictions in the 'pred_gff' option.
>>>>>>>>>
>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give it to maker with the 'rm_gff' option.
>>>>>>>>>
>>>>>>>>>If you've converted the cufflinks output to gff3, you can give it to maker with the 'est_gff' option. I'm pretty sure Trinity only gives fasta output, so you would put that in the 'est' option, along with all the other est fasta files.
>>>>>>>>>
>>>>>>>>>If Augustus isn't trained for your particular organism, then you can use another organism that augustus is already trained for. The list of species that augustus has parameter files for is in the README.txt that came with Augustus. I really recommend that you run Augustus from inside maker, because then you get all the benefits of maker passing ext-based hints to augustus at runtime, which can really improve Augustus' predictive ability.
>>>>>>>>>
>>>>>>>>>When you ran the augustus gene prediction separately, did you use another organism's parameter file?
>>>>>>>>>
>>>>>>>>>Thanks,
>>>>>>>>>Daniel
>>>>>>>>>
>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C] wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to conduct a genome annotation and have the following data:
>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species options)
>>>>>>>>>> - ESTs and RACE (fasta)
>>>>>>>>>> - proteins (fasta)
>>>>>>>>>> - proteins of related organisms (fasta)
>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to convert to ZFF format, etc.)
>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl)
>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to convert the outputs to .gff3 files. The reason why I ran Augustus gene prediction separately, because the genome has never been trained for Augustus.
>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq
>>>>>>>>>>
>>>>>>>>>> Could you please let me know how can I specify parameters in the maker_opts.ctl file?
>>>>>>>>>> Or do you have other suggestions to re-do the data listed above?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> Anh-Dao

From carsonhh at gmail.com Fri Sep 5 09:37:02 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 05 Sep 2014 09:37:02 -0600
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17

The partial lines are symptoms of writing data to a slow NFS mounted drive. If NFS can't get a response for a write operation, it returns success (even though it wasn't really successful) and then continues to wait for the operation to really complete. This is called asynchronous writing. It improves performance by optimistically returning success on all operations rather than waiting to see if the operation really succeeded. If you have a slow or overloaded NFS mount though, you can get a number of failures and never any indication that they failed except for the fact that some files are missing content or lines are partial. When this happens, you need to run MAKER with the -a flag on fewer CPUs to rebuild the GFF3 files.
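Before rebuilding, it can help to see which lines are actually damaged. The following is a minimal sketch, not part of MAKER, that scans GFF3 text for feature lines without the nine tab-separated columns GFF3 requires and for ID= values repeated on more than one line. CDS and UTR features are skipped in the ID check because GFF3 lets them share one ID across lines. The function name check_gff3 and the returned structures are made up for illustration.

```python
import collections

# Feature types that may legitimately repeat one ID across several lines
# (compared in lower case, since spelling of the UTR types varies).
SHARED_ID_TYPES = {"cds", "five_prime_utr", "three_prime_utr"}


def check_gff3(lines):
    """Return (malformed, duplicates) for an iterable of GFF3 lines.

    malformed  -- list of (line_number, text) for feature lines that do
                  not have exactly nine tab-separated columns
    duplicates -- dict mapping an ID to the line numbers where it occurs,
                  for IDs found on more than one non-CDS/UTR line
    """
    malformed = []
    seen = collections.defaultdict(list)
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and ### separators
        columns = line.split("\t")
        if len(columns) != 9:
            malformed.append((number, line))
            continue
        if columns[2].lower() in SHARED_ID_TYPES:
            continue
        for field in columns[8].split(";"):
            if field.startswith("ID="):
                seen[field[3:]].append(number)
    duplicates = {key: hits for key, hits in seen.items() if len(hits) > 1}
    return malformed, duplicates
```

It could be run as `check_gff3(open("maker.gff3"))`; the first column of the lines around each reported problem then shows which contigs are affected.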
Using fewer CPUs reduces the I/O burden. Or if you can find which contigs have partial GFF3 lines, you can delete just those along with the datastore index log file and then launch maker without any flags to let it recompute just those contigs.

Another possible cause is also NFS related. If you are running MAKER multiple times in the same working directory, and a slow NFS mount doesn't allow maker to properly lock files, then two maker jobs can try to compute the same contig simultaneously. Simultaneous writing of files can then cause IDs to be duplicated and some lines to be munged as lines from one process arrive in the file in the middle of lines from another process (creating a jumble of characters and partial lines). Start a single maker job on fewer CPUs using the -a flag to rebuild the GFF3 files if this is the case.

Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are passing in GFF3 files with already assigned IDs (that may be used elsewhere). Are you using GFF3 pass-through?

Features that will not have unique ID= tags are CDS, three_prime_utr, and five_prime_utr features (these are considered non-continuous features because of the shared ID across lines). You can see examples here --> http://www.sequenceontology.org/gff3.shtml

Also Name= attributes are not required to be unique.

Thanks,
Carson

On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:

>Hi,
>
>I finished running MAKER as suggested above.
>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g options. I called the output file maker.gff3
>
>In the maker.gff3 I found some invalid data (does not conform to the .gff3 format), e.g.
>
>###
>2 +
>###
>
>OR
>
>###
>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
>###
>
>OR some gene (or mRNA) IDs are not unique. This means they can be found multiple times with different values within the maker.gff3
>
>How could it happen?
>As I understood, mRNA IDs in a .gff3 file must be unique.
>
>Thanks
>Anh-Dao

From Timothy.Stitt at tgac.ac.uk Fri Sep 5 01:58:59 2014
From: Timothy.Stitt at tgac.ac.uk (Timothy Stitt (TGAC))
Date: Fri, 5 Sep 2014 07:58:59 +0000
Subject: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson. That seemed to do the trick! I'm now running our large case again and the 'ps' processes are definitely suppressed. On a very small test it looked like this new version completed quicker as well. I assume you would expect better performance from avoiding use of 'ps' and directly accessing the process table?
Are there any disadvantages to this approach which is why it isn't default in the code? Much appreciated, Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt > Date: Thursday, 4 September 2014 20:42 To: Timothy Stitt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes I think I found what to do to get around the issue, since you are trying to force the use of 'Proc::ProcessTable' instead of using the systems 'ps'. Replace the get_proc_by_id subroutine in .../maker/lib/Proc/Signal.pm with the following one --> sub get_proc_by_id { my $id = shift; my $select; my $obj = new Proc::ProcessTable_simple; if(ref($obj) eq "Proc::ProcessTable"){ my ($p) = grep {$_->pid eq $id} @{$obj->table}; return $p; } else{ return $obj->get_proc_by_id($id); } } --Carson From: "Timothy Stitt (TGAC)" > Date: Thursday, September 4, 2014 at 1:24 PM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Sorry Carson. Not much luck with that either. I'm building afresh each time and then just running 'maker ?h' and the error appears. I meant to say I'm using ActivePerl v5.18.2. I'm assuming that shouldn't make any difference. Do you have any other suggestions to get the ProcessTable working directly? We are using 128 MPI processes for a large MAKER run and the 'ps' processes are overloading our servers. Cheers, Tim. 
--- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt > Date: Thursday, 4 September 2014 19:52 To: Timothy Stitt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Try changing --> eval 'require Proc::ProcessTable'; to --> use Proc::ProcessTable; in .../maker/lib/Proc/ProcessTable_simple.pm. That way it forces the perls import method to run incase explicitly exports something for it to function properly. --Carson From: "Timothy Stitt (TGAC)" > Date: Thursday, September 4, 2014 at 12:45 PM To: Carson Holt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes Thanks Carson. I downloaded a couple of different versions of Proc::ProcessTable (v0.50 and v0.48). In each case they compiled successfully. I've copied snippets of the 'make test' below to confirm. I've scoured the source and build directories and don't see the .al files. Nothing seems to indicate that they are generated. I notice that the error occurs at line #143 in ../lib/Proc/Signal.pm of the MAKER source according to the diagnostics: #142 my $obj = new Proc::ProcessTable_simple; #143 return $obj->get_proc_by_id($id); Is there a possibility that the issue is caused by $obj not having the attribute that is being referenced in line $143? I'm not a Perl expert so just throwing out ideas here. If not, how do I get the *.al files to be generated if the build says everything built and tested ok? 
> make test make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process' make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process' PERL_DL_NONLAZY=1 /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/process.t .. -------------------------------- uid: 10344 gid: 11995 ? cmndline: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static t/process.t exec: /tgac/software/testing/perl_activeperl/5.16.3.1603/x86_64/bin/perl-static cwd: /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50 t/process.t .. ok All tests successful. Files=1, Tests=3, 0 wallclock secs ( 0.04 usr 0.02 sys + 0.08 cusr 0.07 csys = 0.21 CPU) Result: PASS make[1]: Entering directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process' No tests defined for Proc::ProcessTable::Process extension. make[1]: Leaving directory `/usr/users/TGAC_ga007/stittt/Software/MAKER/UV/Proc-ProcessTable-0.50/Process' Thanks, Tim. --- Timothy Stitt PhD / Head of Scientific Computing The Genome Analysis Centre (TGAC) http://www.tgac.ac.uk/ p: +44 1603 450378 e: timothy.stitt at tgac.ac.uk From: Carson Holt > Date: Thursday, 4 September 2014 15:25 To: Timothy Stitt >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes You can also try an older version from http://search.cpan.org if you think that is the issue, but I'd try checking the directories and installation locations first. --Carson From: Carson Holt > Date: Thursday, September 4, 2014 at 8:22 AM To: "Timothy Stitt (TGAC)" >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MAKER and large number of 'ps' processes The error means Proc:ProcessTable didn't install and compile correctly. 
Any *.al files should be created during installation of Proc::ProcessTable. Go through these directories one at a time and check for the existence of ./Proc/ProcessTable.pm and then ./auto/Proc/ProcessTable/. If they are not there, then you installed Proc::ProcessTable somewhere else and you need to see what is wrong with your CPAN configuration. If they are there then you may need to manually delete both before attempting to reinstall.

    /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
    /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, September 4, 2014 at 5:38 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Hi Carson,

I tried the -nolock option and it didn't have much effect. I then installed Proc::ProcessTable (which built successfully via cpan). Running MAKER though I get the following error:

    Can't locate auto/Proc/ProcessTable/get_proc_by.al in @INC (@INC contains:
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../perl/lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib
    /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../src/inc/perl/lib
    /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/site/lib
    /usr/users/TGAC_ga007/stittt/Software/Perl/5.18/lib .)
    at /usr/users/TGAC_ga007/stittt/Software/MAKER/UV/maker/bin/../lib/Proc/Signal.pm line 143.

I looked within the directories of the ProcessTable build but I don't see the get_proc_by.al file. Should I be using an older version of ProcessTable? The one that was installed is v0.50.

Thanks in advance for any further help with this.

Tim.

---
Timothy Stitt PhD / Head of Scientific Computing
The Genome Analysis Centre (TGAC)
http://www.tgac.ac.uk/
p: +44 1603 450378
e: timothy.stitt at tgac.ac.uk

From: Carson Holt
Date: Thursday, 21 August 2014 21:17
To: Timothy Stitt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

MAKER uses 'ps' every so often to check on certain processes to make sure they haven't failed or become zombies. On your system these 'ps' calls may be hanging, which would cause them to build up over time. You can try running MAKER with the '-nolock' flag, since it is the NFS file locking that requires these process checks.

Alternatively you can edit .../maker/lib/Proc/ProcessTable_simple.pm and change it as follows. Find the 'new' subroutine and change it from this -->

    sub new {
        if($PS){
            my $self = {};
            my $class = shift;
            bless($self, $class);
            return $self;
        }
        else{
            eval 'require Proc::ProcessTable';
            return Proc::ProcessTable->new(@_);
        }
    }

to this -->

    sub new {
        eval 'require Proc::ProcessTable';
        return Proc::ProcessTable->new(@_);
    }

This will access the process table directly rather than through 'ps', but it may experience the same hang that 'ps' is experiencing. Also you will need to install 'Proc::ProcessTable' via CPAN for it to work, and that particular module may not install on some Linux systems.

--Carson

From: "Timothy Stitt (TGAC)"
Date: Thursday, August 21, 2014 at 2:05 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] MAKER and large number of 'ps' processes

Dear MAKER developers,

One of my users is running MAKER on our large shared-memory SGI UV2000 system (with over 2000 cores) and the application appears to be generating large numbers of 'ps' processes that are overwhelming the system and making it unusable for other users.

Can you confirm that MAKER would be generating this behaviour and if so, is there a way to prevent the application from running 'ps' repeatedly?

Thanks in advance,

Tim.

Timothy Stitt PhD | Head of Scientific Computing
+44 1603 450378 | timothy.stitt at tgac.ac.uk
The Genome Analysis Centre (TGAC)
Norwich Research Park, Norwich, NR4 7UH, UK | http://www.tgac.ac.uk

From carsonhh at gmail.com Fri Sep 5 09:17:45 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 05 Sep 2014 09:17:45 -0600
Subject: [maker-devel] MAKER and large number of 'ps' processes

I'm glad the workaround is working for you.

Proc::ProcessTable being faster than 'ps' is actually very atypical. There is likely an issue with your system, as suggested by the fact that 'ps' is hanging and accumulating processes, which is also very atypical ('ps' should return in a fraction of a second). We actually switched from Proc::ProcessTable to 'ps' some time ago because 'ps' is several fold faster, and Proc::ProcessTable won't compile on about 10-15% of system architectures.

Thanks,
Carson

From: "Timothy Stitt (TGAC)"
Date: Friday, September 5, 2014 at 1:58 AM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MAKER and large number of 'ps' processes

Thanks Carson. That seemed to do the trick! I'm now running our large case again and the 'ps' processes are definitely suppressed. On a very small test it looked like this new version completed quicker as well. I assume you would expect better performance from avoiding use of 'ps' and directly accessing the process table?

Are there any disadvantages to this approach, which would explain why it isn't the default in the code?

Much appreciated,

Tim.
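
For readers following the thread: the kind of check discussed above (is a given process still there, or has it become a zombie?) can also be made without spawning 'ps' at all. The following is only a minimal sketch in Python, assuming a POSIX system with a Linux-style /proc for the zombie test. The function names are hypothetical, and this is not MAKER's implementation, which is written in Perl and uses either 'ps' or Proc::ProcessTable.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only).

    Signal 0 performs the existence/permission check without
    actually delivering a signal to the process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # exists, but belongs to another user
    return True


def is_zombie(pid: int) -> bool:
    """Return True if /proc reports the process state as 'Z'.

    Assumes a Linux-style /proc; returns False if the entry
    cannot be found (including on systems without /proc).
    """
    try:
        with open(f"/proc/{pid}/stat") as fh:
            stat = fh.read()
    except FileNotFoundError:
        return False
    # The state is the first field after the parenthesised command name.
    state = stat.rsplit(")", 1)[1].split()[0]
    return state == "Z"
```

Both functions read kernel state directly, so no child process is created per check. Whether that would avoid the hang seen on a particular system is not something this sketch can promise.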

From nguyenan at mail.nih.gov Fri Sep 5 10:08:50 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Fri, 5 Sep 2014 16:08:50 +0000
Subject: [maker-devel] maker-devel Digest, Vol 74, Issue 17

Thanks Carson. I ran MAKER on 30 CPUs. I will re-run it using 10 CPUs.

>
>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
>passing in GFF3 files with already assigned IDs (that may be used
>elsewhere). Are you using GFF3 pass-through?
>

I submitted est_gff=cufflinks.gff3 and pred_gff=fgenesh.gff3 when running MAKER. However, I got 4 repeated mRNA IDs as follows:

augustus_masked-c206700011.Contig3-processed-gene-0.3
augustus_masked-c206700011.Contig3-processed-gene-0.3-mRNA-1
snap_masked-c206500027.Contig3-processed-gene-0.26
snap_masked-c206500027.Contig3-processed-gene-0.26-mRNA-1

Anh-Dao

From Brian.Mack at ARS.USDA.GOV Mon Sep 8 07:47:01 2014
From: Brian.Mack at ARS.USDA.GOV (Mack, Brian)
Date: Mon, 8 Sep 2014 13:47:01 +0000
Subject: [maker-devel] non-overlapping predictions

Hi,

I used IPRscan on the non-overlapping ab initio proteins and identified additional predictions that I want to include in my final gff. I was trying to follow the advice given in this thread (http://gmod.827538.n3.nabble.com/Adding-non-overlapping-models-to-final-set-td4043778.html) to pull out these predictions from the full Maker gff3 that includes all the evidence, but it seems that none of the non-overlapping predictions are in this gff3 file. Where would I find all of the predictions including the non-overlapping predictions?

Thanks,
Brian

This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties.
If you believe you have received this message in error, please notify the sender and delete the email immediately.

From ranjani at uga.edu Tue Sep 9 11:14:09 2014
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Tue, 9 Sep 2014 17:14:09 +0000
Subject: [maker-devel] Non-canonical splice junctions
Message-ID: <1410282848765.20893@uga.edu>

Hi,

Is it possible to force MAKER to predict gene models only with canonical splice sites? Or is there a quick way to identify gene models with non-canonical splice sites?

Thanks,
Ranjani

From carsonhh at gmail.com Tue Sep 9 16:09:13 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 09 Sep 2014 16:09:13 -0600
Subject: [maker-devel] non-overlapping predictions

It's a naming issue. The reference match/match_part features have 'abinit' in the name, while the non-overlapping fasta file has 'processed' in the name of the fasta header. The easiest way to fix it is to just replace 'processed' with 'abinit' in the terms you are searching for.

This was supposed to be resolved already, but I'll see what's going on. What version of MAKER are you using?

--Carson

From: "Mack, Brian"
Date: Monday, September 8, 2014 at 7:47 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] non-overlapping predictions

Hi,

I used IPRscan on the non-overlapping ab initio proteins and identified additional predictions that I want to include in my final gff. I was trying to follow the advice given in this thread (http://gmod.827538.n3.nabble.com/Adding-non-overlapping-models-to-final-set-td4043778.html) to pull out these predictions from the full Maker gff3 that includes all the evidence, but it seems that none of the non-overlapping predictions are in this gff3 file. Where would I find all of the predictions including the non-overlapping predictions?

Thanks,
Brian

From nguyenan at mail.nih.gov Thu Sep 18 05:49:45 2014
From: nguyenan at mail.nih.gov (Nguyen, Anh-Dao (NIH/NHGRI) [C])
Date: Thu, 18 Sep 2014 11:49:45 +0000
Subject: [maker-devel] CPUs problems

I re-ran maker on 10 CPUs. The maker job finished after 10 days. I checked the log file and got these errors:

Processing run.log file...
examining contents of the fasta file and run log
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

Can you let me know how I can fix the problem?

Thanks
Anh-Dao

On 9/5/14 11:37 AM, "Carson Holt" wrote:

>The partial lines are symptoms of writing data to a slow NFS mounted
>drive. If NFS can't get a response for a write operation, it returns
>success (even though it wasn't really successful) and then continues to
>wait for the operation to really complete. This is called asynchronous
>writing. It improves performance by optimistically returning success on
>all operations rather than waiting to see if the operation really
>succeeded. If you have a slow or overloaded NFS mount though, you can get
>a number of failures and never any indication that they failed except for
>the fact that some files are missing content or lines are partial.
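
The symptoms described in this thread (partial lines, and gene/mRNA IDs that appear more than once) can be looked for mechanically before deciding which contigs to recompute. What follows is a minimal sketch in Python, not a MAKER tool: the function name check_gff3 is hypothetical, and it assumes standard nine-column, tab-separated GFF3. It only counts ID= values on gene and mRNA features, since the reply notes that CDS and UTR features may legitimately share an ID across lines.

```python
from collections import Counter


def check_gff3(lines):
    """Scan GFF3 text lines for truncated records and repeated IDs.

    Returns (malformed, duplicated): the 1-based line numbers of
    feature lines that do not have exactly nine tab-separated
    columns, and the sorted gene/mRNA IDs seen more than once.
    """
    malformed = []
    id_counts = Counter()
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break                      # sequence section: no more features
        if not line or line.startswith("#"):
            continue                   # blank lines, comments, '###' directives
        cols = line.split("\t")
        if len(cols) != 9:
            malformed.append(lineno)   # e.g. a partial line such as "2 +"
            continue
        if cols[2] in ("gene", "mRNA"):
            for attr in cols[8].split(";"):
                if attr.startswith("ID="):
                    id_counts[attr[3:]] += 1
    duplicated = sorted(i for i, n in id_counts.items() if n > 1)
    return malformed, duplicated
```

A file object can be passed directly, for example `with open("maker.gff3") as fh: malformed, duplicated = check_gff3(fh)`. The line numbers then point at the records (and hence the contigs) worth inspecting.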
>
>When this happens, you need to run MAKER with the -a flag on fewer CPUs to
>rebuild the GFF3 files. Fewer CPUs reduces the IO burden. Or if you can
>find which contigs have partial GFF3 lines, you can delete just those
>along with the datastore index log file and then launch maker without any
>flags to let it recompute just those contigs.
>
>Another possible cause is also NFS related. If you are running MAKER
>multiple times in the same working directory, and a slow NFS mount doesn't
>allow maker to properly lock files, then two maker jobs can try and
>compute the same contig simultaneously. Simultaneous writing of files can
>then cause IDs to be duplicated and some lines to be munged as lines from
>one process arrive to the file in the middle of lines from another process
>(creating a jumble of characters and partial lines). Start a single maker
>job on fewer CPUs using the -a flag to rebuild the GFF3 files if this is
>the case.
>
>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you are
>passing in GFF3 files with already assigned IDs (that may be used
>elsewhere). Are you using GFF3 pass-through?
>
>Features that will not have unique ID= tags are CDS, three_prime_utr, and
>five_prime_utr features (these are considered non-continuous features
>because of the shared ID across lines).
>You can see examples here --> http://www.sequenceontology.org/gff3.shtml
>
>Also Name= attributes are not required to be unique.
>
>Thanks,
>Carson
>
>
>On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>
>>Hi,
>>
>>I finished running MAKER as suggested above.
>>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g
>>options. I called the output file maker.gff3
>>
>>In the maker.gff3 I found some invalid data (does not conform to .gff3
>>format), e.g.
>>
>>###
>>2 +
>>###
>>
>>OR
>>
>>###
>>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
>>###
>>
>>OR some gene (or mRNA) IDs are not unique. This means they can be found
>>multiple times with different values within the maker.gff3
>>
>>How could it happen? As I understood, mRNA IDs in a .gff3 file must be
>>unique.
>>
>>Thanks
>>Anh-Dao
>>
>>
>>On 7/18/14 2:00 PM, "maker-devel-request at yandell-lab.org" wrote:
>>
>>>Today's Topics:
>>>
>>>   1. Re: Maker_opts.ctl (Carson Holt)
>>>
>>>----------------------------------------------------------------------
>>>
>>>Message: 1
>>>Date: Fri, 18 Jul 2014 11:04:09 -0600
>>>From: Carson Holt
>>>To: "Nguyen, Anh-Dao (NIH/NHGRI) [C]", Daniel Ence
>>>Cc: "maker-devel at yandell-lab.org"
>>>Subject: Re: [maker-devel] Maker_opts.ctl
>>>
>>>It should just be 'fgenesh'. If it's not there you can still just give
>>>the GFF3.
>>>
>>>--Carson
>>>
>>>
>>>On 7/17/14, 8:19 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>>
>>>>I am not sure which fgenesh executable file I should use.
>>>>
>>>>fgenesh= #location of fgenesh executable
>>>>
>>>>When I run FGENESH++, I need to run the run_pipe.pl script.
Sure you >>>>need >>>>to specify a list of other executable programs (such as ppd, ppdn+, >>>>etc) >>>> >>>>Anh-Dao >>>> >>>> >>>>On 7/16/14 3:32 PM, "Carson Holt" wrote: >>>> >>>>>'all' will use the whole of RepBase, or you can do 'metazoa' like your >>>>>previous run. Then provide the RepeatModeler file to rmlib= >>>>> >>>>>--Carson >>>>> >>>>> >>>>> >>>>>On 7/16/14, 1:28 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>> wrote: >>>>> >>>>>>By default, model_org=all. Can I use the de novo repeat library >>>>>>predicted >>>>>>by RepeatModeler for the rmlib option? >>>>>> >>>>>>Anh-Dao >>>>>> >>>>>> >>>>>> >>>>>>On 7/16/14 3:17 PM, "Carson Holt" wrote: >>>>>> >>>>>>>No. You can provide both to MAKER. The options are model_org= and >>>>>>>rmlib=. >>>>>>> By letting MAKER handle repeat masking it will differentiate repeat >>>>>>>types >>>>>>>and use soft masking for some and hard masking for others. This >>>>>>>increases >>>>>>>sensitivity of evidence alignments while still maintaining >>>>>>>specificity. >>>>>>> >>>>>>>--Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>>On 7/16/14, 1:07 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>>>> wrote: >>>>>>> >>>>>>>>I will run Augustus and FGENESH++ inside of MAKER using the >>>>>>>>parameter >>>>>>>>files for Augustus. >>>>>>>>I could also run RepeatMasker inside of MAKER. However, I ran RM >>>>>>>>using >>>>>>>>two >>>>>>>>options: -lib (de novo) and -species (known). I got ~ 45% repeats >>>>>>>>via >>>>>>>>de >>>>>>>>novo and ~ 4% repeats via known options. As I understood, RM inside >>>>>>>>of >>>>>>>>MAKER uses only RepBase repeat library and RepeatRunner protein >>>>>>>>database. >>>>>>>> >>>>>>>>Anh-Dao >>>>>>>> >>>>>>>> >>>>>>>>On 7/16/14 2:36 PM, "Carson Holt" wrote: >>>>>>>> >>>>>>>>>When you ran Augustus separately, it should have created the >>>>>>>>>parameters >>>>>>>>>needed to run it. Now you should be able to run it inside of >>>>>>>>>MAKER >>>>>>>>>using >>>>>>>>>the species name you just created. 
>>>>>>>>> >>>>>>>>>I'd also recommend letting MAKER run RepeatMasker for you rather >>>>>>>>>than >>>>>>>>>giving it the results as GFF3. >>>>>>>>> >>>>>>>>>--Carson >>>>>>>>> >>>>>>>>> >>>>>>>>>On 7/16/14, 12:30 PM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>>Thanks Daniel for your quick response. >>>>>>>>>> >>>>>>>>>>I did not use the parameter file of other organism when running >>>>>>>>>>Augustus. >>>>>>>>>>I created the parameter file for the genome following their >>>>>>>>>>instructions. >>>>>>>>>>There were multiple steps to train and run Augustus (Creating >>>>>>>>>>gene >>>>>>>>>>structures for training AUGUSTUS with CEGMA => parameter file >>>>>>>>>>will >>>>>>>>>>be >>>>>>>>>>created; Creating Hints for AUGUSTUS from ESTs/cDNA sequences; >>>>>>>>>>Incorporating Illumina RNAseq into AUGUSTUS with GSNAP, etc.) >>>>>>>>>>As I mentioned the reason why I ran Augustus separately, because >>>>>>>>>>Augustus >>>>>>>>>>has not trained that genome (no parameter file exists). Otherwise >>>>>>>>>>I >>>>>>>>>>would >>>>>>>>>>run Augustus inside MAKER. >>>>>>>>>> >>>>>>>>>>You suggested to use rm_gff option to specify RepeatMasker output >>>>>>>>>>(sure >>>>>>>>>>I >>>>>>>>>>will convert them to .gff3 formatted files). Can I submit two RM >>>>>>>>>>.gff3 >>>>>>>>>>files, separated by comma? >>>>>>>>>> >>>>>>>>>>Anh-Dao >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>On 7/16/14 2:13 PM, "Daniel Ence" >>>>>>>>>>wrote: >>>>>>>>>> >>>>>>>>>>>Hi Anh-Dao, >>>>>>>>>>> >>>>>>>>>>>In the maker_opts.ctl file, there are options for est and >>>>>>>>>>>protein >>>>>>>>>>>evidence. You?ll put all of your fasta est files together in a >>>>>>>>>>>command >>>>>>>>>>>separated list in the ?est" option, and all of your fasta >>>>>>>>>>>protein >>>>>>>>>>>files >>>>>>>>>>>in a command separated list for the ?protein? option. 
>>>>>>>>>>> >>>>>>>>>>>You?ll specify the SNAP and Genemark files in their respective >>>>>>>>>>>options >>>>>>>>>>>in >>>>>>>>>>>the control file and pass the augustus and fgenesh predictions >>>>>>>>>>>in >>>>>>>>>>>the >>>>>>>>>>>?pred_gff? option. >>>>>>>>>>> >>>>>>>>>>>If you have the RepeatMasker output in gff3 format you can give >>>>>>>>>>>it >>>>>>>>>>>to >>>>>>>>>>>maker with the ?rm_gff? option. >>>>>>>>>>> >>>>>>>>>>>If you?ve converted the cufflinks output to gff3, you can give >>>>>>>>>>>it >>>>>>>>>>>to >>>>>>>>>>>maker with the ?est_gff? option. I?m pretty sure Trinity only >>>>>>>>>>>gives >>>>>>>>>>>fasta >>>>>>>>>>>output, so you would put that in the ?est? option, along with >>>>>>>>>>>all >>>>>>>>>>>the >>>>>>>>>>>other est fasta files. >>>>>>>>>>> >>>>>>>>>>>If Augustus isn?t trained for your particular organism, then you >>>>>>>>>>>can >>>>>>>>>>>use >>>>>>>>>>>another organism that augustus is already trained for. The list >>>>>>>>>>>of >>>>>>>>>>>species that augustus has parameter files for is in the >>>>>>>>>>>README.txt >>>>>>>>>>>that >>>>>>>>>>>came with Augustus. I really recommend that you run Augustus >>>>>>>>>>>from >>>>>>>>>>>inside >>>>>>>>>>>maker, because then you get all the benefits of maker passing >>>>>>>>>>>ext-based >>>>>>>>>>>hints to augustus at runtime, which can really improve Augustus? >>>>>>>>>>>predictive ability. >>>>>>>>>>> >>>>>>>>>>>When you ran the augustus gene prediction separately, did you >>>>>>>>>>>use >>>>>>>>>>>another >>>>>>>>>>>organism?s parameter file? 
>>>>>>>>>>> >>>>>>>>>>>Thanks, >>>>>>>>>>>Daniel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>On Jul 16, 2014, at 11:15 AM, Nguyen, Anh-Dao (NIH/NHGRI) [C] >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I would like to conduct a genome annotation and have the >>>>>>>>>>>>following >>>>>>>>>>>>data: >>>>>>>>>>>> - Two separate RepeatMasker outputs (using -lib and -species >>>>>>>>>>>>options) >>>>>>>>>>>> - ESTs and RACE (fasta) >>>>>>>>>>>> - proteins (fasta) >>>>>>>>>>>> - proteins of related organisms (fasta) >>>>>>>>>>>> - SNAP's .hmm file (ran CEGMA, then used cegma2zff.pl to >>>>>>>>>>>>convert >>>>>>>>>>>>to >>>>>>>>>>>>ZFF >>>>>>>>>>>>format, etc. ) >>>>>>>>>>>> - GeneMark's .hmm file (es.mod file from running gm_es.pl) >>>>>>>>>>>> - FGENESH++ and Augustus gene predictions. I wrote scripts to >>>>>>>>>>>>convert >>>>>>>>>>>>the outputs to .gff3 files. The reason why I ran Augustus gene >>>>>>>>>>>>prediction separately, because the genome has never been >>>>>>>>>>>>trained >>>>>>>>>>>>for >>>>>>>>>>>>Augustus. >>>>>>>>>>>> - Cufflinks and Trinity from RNA-Seq >>>>>>>>>>>> >>>>>>>>>>>> Could you please let me know how can I specify parameters in >>>>>>>>>>>>the >>>>>>>>>>>>maker_opts.ctl file? >>>>>>>>>>>> Or do you have other suggestions to re-do the data listed >>>>>>>>>>>>above? >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> Anh-Dao >>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> maker-devel mailing list >>>>>>>>>>>> maker-devel at box290.bluehost.com >>>>>>>>>>>> >>>>>>>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell >>>>>>>>>>>>- >>>>>>>>>>>>l >>>>>>>>>>>>a >>>>>>>>>>>>b >>>>>>>>>>>>. 
From carsonhh at gmail.com Fri Sep 19 11:22:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 19 Sep 2014 11:22:50 -0600
Subject: [maker-devel] CPUs problems
In-Reply-To:
References:
Message-ID:

These are further symptoms of an IO-related issue. The script cannot even query its current working directory. Check that there is plenty of space in the temporary directory /tmp. If /tmp is mounted separately on each machine, one of them may be full. Also make sure you did not set TMP= in the maker_opts.ctl file to an NFS-mounted location.

Do you by any chance get any warnings when you start MAKER? For example --> WARNING: Multiple MAKER processes have been started in the same directory. That would indicate that the MPI communication ring is down, which would drastically increase IO operations.

You may also have one or more nodes that are having the issue and are the source of all the errors.
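A quick way to rule out a full temporary directory on a given node is to ask the filesystem directly. This is only a rough sketch: the helper name `tmp_has_space` and the 10 GiB threshold are assumptions made for illustration, not MAKER settings.

```python
import shutil
import tempfile


def tmp_has_space(path, min_free_bytes):
    """Return (ok, free_bytes) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free >= min_free_bytes, usage.free


if __name__ == "__main__":
    # gettempdir() is usually /tmp unless TMPDIR (or similar) is set.
    tmp_dir = tempfile.gettempdir()
    ok, free = tmp_has_space(tmp_dir, 10 * 1024**3)  # hypothetical 10 GiB floor
    status = "OK" if ok else "LOW"
    print(f"{tmp_dir}: {free / 1024**3:.1f} GiB free ({status})")
```

Running it on every node of the job (not just the head node) is what matters here, since a single full /tmp can be the source of all the errors.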
If you are using OpenMPI to run MAKER, you can tag the output from each node using the --tag-output flag for mpiexec. Then if the same node is always producing the error, you can have IT look at it.

Also, MAKER is set to automatically retry on errors. If all contigs are finished, check the output. Make sure there are the same number of genes in the FASTA files and GFF3 files. Also look for munged content. If everything looks OK, MAKER may have gotten around the issue through sheer brute force (i.e. it retried until it succeeded).

--Carson

On 9/18/14, 5:49 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:

>I re-ran maker on 10 CPUs. The maker job was finished after 10 days. I
>checked the log file and got these errors:
>
>Processing run.log file...
>examining contents of the fasta file and run log
>shell-init: error retrieving current directory: getcwd: cannot access
>parent directories: No such file or directory
>
>
>Can you let me know how can I fix the problem?
>
>Thanks
>Anh-Dao
>
>
>On 9/5/14 11:37 AM, "Carson Holt" wrote:
>
>>The partial lines are symptoms of writing data to a slow NFS mounted
>>drive. If NFS can't get a response for a write operation, it returns
>>success (even though it wasn't really successful) and then continues to
>>wait for the operation to really complete. This is called asynchronous
>>writing. It improves performance by optimistically returning success on
>>all operations rather than waiting to see if the operation really
>>succeeded. If you have a slow or overloaded NFS mount though, you can get
>>a number of failures and never any indication that they failed except for
>>the fact that some files are missing content or lines are partial.
>>
>>When this happens, you need to run MAKER with the -a flag on fewer CPUs
>>to rebuild the GFF3 files. Fewer CPUs reduces the IO burden.
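Checking a merged GFF3 for the symptoms discussed in this thread (partial lines and repeated gene/mRNA IDs) can be sketched roughly as below. The function name `check_gff3` and the sample records are made up for illustration; the check assumes tab-separated, nine-column feature lines and only counts IDs on gene and mRNA features, since CDS and UTR features may share an ID across lines.

```python
import collections


def check_gff3(lines):
    """Return (bad_line_numbers, duplicate_ids) for an iterable of GFF3 lines."""
    bad_lines = []
    id_counts = collections.Counter()
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # blank lines, directives and ### separators
        fields = line.split("\t")
        if len(fields) != 9:
            bad_lines.append(number)  # truncated or munged feature line
            continue
        if fields[2] not in ("gene", "mRNA"):
            continue  # multi-line features may legitimately repeat an ID
        for attribute in fields[8].split(";"):
            if attribute.startswith("ID="):
                id_counts[attribute[3:]] += 1
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    return bad_lines, duplicates


# Made-up example: line 4 is a partial line, and gene ID g1 occurs twice.
sample = [
    "##gff-version 3\n",
    "c1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1;Name=g1\n",
    "c1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n",
    "2\t+\n",
    "c1\tmaker\tgene\t2000\t2900\t.\t+\t.\tID=g1;Name=g1\n",
]
print(check_gff3(sample))  # prints ([4], ['g1'])
```

The reported line numbers can then be traced back to their contigs so that only those contigs need to be recomputed.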
>>Or if you can find which contigs have partial GFF3 lines, you can delete
>>just those along with the datastore index log file and then launch maker
>>without any flags to let it recompute just those contigs.
>>
>>Another possible cause is also NFS related. If you are running MAKER
>>multiple times in the same working directory, and a slow NFS mount
>>doesn't allow maker to properly lock files, then two maker jobs can try
>>and compute the same contig simultaneously. Simultaneous writing of files
>>can then cause IDs to be duplicated and some lines to be munged as lines
>>from one process arrive to the file in the middle of lines from another
>>process (creating a jumble of characters and partial lines). Start a
>>single maker job on fewer cpus using the -a flag to rebuild the GFF3
>>files if this is the case.
>>
>>Repeated gene/mRNA IDs can also be caused by gff3_passthrough when you
>>are passing in GFF3 files with already assigned IDs (that may be used
>>elsewhere). Are you using GFF3 pass-through?
>>
>>Features that will not have unique ID= tags are CDS, three_prime_utr, and
>>five_prime_utr features (these are considered non-continuous features
>>because of the shared ID across lines).
>>You can see examples here --> http://www.sequenceontology.org/gff3.shtml
>>
>>Also Name= attributes are not required to be unique.
>>
>>Thanks,
>>Carson
>>
>>
>>On 9/5/14, 8:43 AM, "Nguyen, Anh-Dao (NIH/NHGRI) [C]" wrote:
>>
>>>Hi,
>>>
>>>I finished running MAKER as suggested above.
>>>Then I ran gff3_merge.pl to retrieve only MAKER annotation using -n -g
>>>options. I called the output file maker.gff3
>>>
>>>In the maker.gff3 I found some invalid data (does not conform .gff3
>>>format), e.g.
>>>
>>>###
>>>2 +
>>>###
>>>
>>>OR
>>>
>>>###
>>>.Contig1:hsp:72378:1.3.0.0;Parent=c209800247.Contig1:hit:30214:1.3.0.0;Target=species:tRNA-Asn-AAC|genus:tRNA 1 75 +
>>>###
>>>
>>>OR some gene (or mRNA) IDs are not unique. This means they can be found
>>>multiple times with different values within the maker.gff3
>>>
>>>How could it happen? As I understood, mRNA IDs in a .gff3 file must be
>>>unique.
>>>
>>>Thanks
>>>Anh-Dao
From carson.holt at genetics.utah.edu Mon Sep 22 14:17:19 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 22 Sep 2014 20:17:19 +0000
Subject: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome
In-Reply-To: <541BCE0A.70806@env.ethz.ch>
References: <541BCE0A.70806@env.ethz.ch>
Message-ID:

The contig-level assembly is more likely to give spurious hits and alignments. Contigs can also be harder to repeat mask, and gene predictors can behave slightly differently on short sequences than on longer ones. If you have fewer gene models than you expect, your first step should be to process the scaffolds with CEGMA. It will give you an estimate of the genome's "completeness". If CEGMA gives a 60% completeness value, for example, then you can expect to recover only 60% of the expected number of genes.
Next you should run RepeatModeler or similar software to help generate a species-specific repeat library. Under-masked repeats can make predicting genes on longer scaffolds far more difficult for ab initio predictors.

--Carson

On 9/19/14, 12:32 AM, "Stefan Zoller" wrote:

>Hi,
>
>I am working on the annotation of a plant genome (about 600MB) and we
>have a reasonable draft assembly, a fairly good transcriptome and quite
>a few proteins from related species. We have also extensively trained
>augustus and are also feeding genmark and snap predictions.
>
>Recently I noticed a behavior of Maker that seems fairly odd and which I
>cannot explain at all. When I take the scaffolded genome (about 23000
>scaffolds) I get roughly 9'000 maker approved gene models. Which is
>admittedly a bit on the low side and we have to work on this. However,
>when I break up the scaffolds into contigs at stretches of N longer
>500bp (about 60'000 contigs) I get about 17'000 maker gene models. Now
>obviously 17'000 is more in the range what I would expect, so I am
>inclined to go with these. I have looked at both annotations and the
>evidence in WebApollo and the evidence alignments are identical for both
>runs. The approved genes seem to be the same, except for the additional
>ones in the "contiged" genome version. The additional gene models are
>not necessarily at the ends of the contigs, so I think it has nothing to
>do with having the stretches of Ns nearby in the scaffolded genome. Do
>you have any idea why maker comes up with the additional numbers of gene
>models and how I could "convince" maker to give me the same gene models
>for the scaffolded assembly?
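For anyone wanting to reproduce the comparison, breaking scaffolds into contigs at runs of N longer than 500 bp could look roughly like the sketch below. This is not the script used in the thread; the function name `split_at_gaps` and the contig naming scheme are assumptions. The scaffold offset is kept in each contig name so that gene models from the two runs can be mapped onto the same coordinates.

```python
import re


def split_at_gaps(name, sequence, min_gap=501):
    """Split a scaffold into contigs at runs of at least `min_gap` N/n characters.

    Returns a list of (contig_name, contig_sequence) tuples; the 0-based
    offset of each contig within the scaffold is encoded in its name.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces = []
    start = 0
    for match in gap.finditer(sequence):
        if match.start() > start:
            pieces.append((start, sequence[start:match.start()]))
        start = match.end()  # resume after the whole run of Ns
    if start < len(sequence):
        pieces.append((start, sequence[start:]))
    return [
        (f"{name}_contig{index}_offset{offset}", piece)
        for index, (offset, piece) in enumerate(pieces, start=1)
    ]


# Made-up scaffold: a 600 bp gap is split, a 10 bp gap is left in place.
demo = "ACGT" + "N" * 600 + "GGCC" + "N" * 10 + "TT"
for contig_name, contig_seq in split_at_gaps("scaffold1", demo):
    print(contig_name, len(contig_seq))
```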
>
>Cheers,
>Stefan
>
>--
>Stefan Zoller, PhD
>Bioinformatics
>Genetic Diversity Centre
>ETH Zurich CHN E55.1
>Universitätsstrasse 16
>8092 Zurich
>Switzerland
>
>Phone: +41 44 632 66 85
>E-Mail: stefan.zoller at env.ethz.ch
>Web: www.gdc.ethz.ch

From myandell at genetics.utah.edu Mon Sep 22 18:10:38 2014
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 23 Sep 2014 00:10:38 +0000
Subject: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome
In-Reply-To:
References: <541BCE0A.70806@env.ethz.ch>,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA0537B651ADF@mxb1.hg.genetics.utah.edu>

Also, are your numbers including the ab initio predictions without evidence that have Pfam domains?

cheers,

--mark

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Co-director USTAR Center for Genetic Discovery
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

From aksrao at ucdavis.edu Thu Sep 25 10:18:30 2014
From: aksrao at ucdavis.edu (Anand K S Rao)
Date: Thu, 25 Sep 2014 09:18:30 -0700
Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions?
Message-ID:

Greetings!

I am exploring the use of MAKER-P, but I need your advice in determining if MAKER-P is the best choice for me. In the recent past, I've tried using the AUGUSTUS --profile option, which allows user-defined protein profiles to be used as queries.

I am interested in predicting gene-like structures in intergenic regions (I've masked away genic regions as predicted by the genome annotation pipeline) in some orphan legume plant species. There is not much extrinsic data in the way of ESTs or NGS data, let alone extrinsic data that might map to so-called intergenic regions, i.e. whatever little data exists has already been used to predict 'genes'.

When I tried using the --profile option of AUGUSTUS, I was not satisfied with the frequency and magnitude of fusion genes. Additionally, there was no easy way for me to consolidate gene-like structures that varied, but overlapped, when using different protein profiles as queries (one profile per Pfam HMM within a 4 member clan). Additionally, training all the orphan legume species is not an exciting undertaking... because of time and computing resource requirements.

All this led me to consider MAKER-P as an option. Based on what I've described above, do you think I should proceed with trying to use MAKER-P for my purposes?
Thank you, in advance.

Sincerely,
Anand

--
Anand K.S. Rao
PhD candidate, Plant Biology with a Designated Emphasis in Biotechnology, UC Davis, CA 95616, USA | aksrao at ucdavis.edu | (530) 574-5134 | LinkedIn
_________________________________________________________________________
CTTATTGTTGAACTTOAATGGTGCTAATGATCCTCGTOTCTCCTGAACGT - translate THAT!

From carson.holt at genetics.utah.edu Thu Sep 25 12:17:19 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 25 Sep 2014 18:17:19 +0000
Subject: [maker-devel] diff. numbers of geneson contigs vs. scaffolded genome
In-Reply-To: <5421695F.5040409@env.ethz.ch>
References: <541BCE0A.70806@env.ethz.ch> <7A60AB257EFF2B48B1F4C814817EA0537B651ADF@mxb1.hg.genetics.utah.edu> <5421695F.5040409@env.ethz.ch>
Message-ID:

Sorry for the slow reply. I was trying to locate a script that might be useful for you.

I think a species-specific repeat library will be of most benefit here (it's surprising just how influential this step is). Also make sure that you train SNAP and Augustus on your own species rather than just using a related species as a stand-in.

With respect to Pfam domains, on some organisms you may not get a lot of cross-species protein alignments because of divergence or assembly issues. This of course makes it harder to support these models with direct protein alignments. However, you can run InterProScan over the non-overlapping.proteins.fasta file produced by MAKER (it contains non-redundant rejected models). Because an HMM is used for domain identification, it can pick up protein domains that would not produce a significant BLAST alignment because of divergence. You can then add models with positive hits for protein domains back into your gene set. This ad hoc procedure can usually only increase gene counts by about 10%, though, for organisms where it's required.
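Collecting the IDs of rejected models that carry a protein domain could be sketched along these lines. The sketch assumes InterProScan 5 tab-separated output, where column 1 is the protein ID and column 4 is the analysis name; the function name `pfam_positive_ids` and the sample rows are invented for illustration.

```python
import io


def pfam_positive_ids(tsv_handle):
    """Return sorted protein IDs with at least one Pfam match in an InterProScan TSV report."""
    ids = set()
    for line in tsv_handle:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: fields[0] = protein ID, fields[3] = analysis (e.g. "Pfam").
        if len(fields) >= 4 and fields[3] == "Pfam":
            ids.add(fields[0])
    return sorted(ids)


# Made-up report rows, trimmed to the first six columns.
report = io.StringIO(
    "prot1\tmd5a\t250\tPfam\tPF00069\tProtein kinase domain\n"
    "prot2\tmd5b\t120\tGene3D\tG3DSA:1.10.10.10\tExample\n"
    "prot3\tmd5c\t310\tPfam\tPF00400\tWD domain\n"
    "prot1\tmd5a\t250\tPfam\tPF07714\tTyrosine kinase\n"
)
for protein_id in pfam_positive_ids(report):
    print(protein_id)  # one ID per line
```

Writing that output to a file gives a one-ID-per-line list of models with a domain hit.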
I've attached a script that makes generating results for these genes easier.

1. First, run InterProScan with just Pfam.
2. Then take the IDs of all models that have a domain in the report and create a list (one ID per line).
3. Next, use the fasta_tool script that comes with MAKER together with the --select flag to separate just the positive hits (IDs in your list) from the non-overlapping.proteins.fasta and non-overlapping.transcripts.fasta files.
4. Use the attached script to separate just the positive hits (your ID list) from the GFF3. The script will upgrade match/match_part results to gene/mRNA/exon/CDS results and print them out for you.
5. Use the fasta_merge and gff3_merge scripts that come with MAKER to merge the selected/upgraded GFF3 entries and selected FASTA entries back into the original MAKER results.

--Carson

On 9/23/14, 6:36 AM, "Stefan Zoller" wrote:

>Please forgive my ignorance, I am not entirely sure if I understand your
>question correctly, but I will try to answer.
>As evidence we use:
>1) our own transcriptome (trinity assembled RNAseq, filtering out the
>very low expression transcripts).
>2) all swissprot plant proteins, and several protein sets from closely
>related plant species downloaded from NCBI.
>I am not sure if the ab-initio predictions without evidence have pfamm
>domains. Honestly, I would not know how to tell and how to
>include/exclude.
>I was assuming that we should not have too many Maker approved
>predictions without evidence anyway, because we use "keeps_preds=0".
>The numbers of gene predictions I mentioned in my email are the
>predictions reported by the fasta_merge/gff3_merge scripts in the
>"*maker.proteins.fasta". There are of course many more predictions in
>e.g., "*maker.augustus_masked.proteins.fasta" (about 68'000 in this file).
>
>I hope I am not totally off with my answer.
>Cheers, Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_preds2models
Type: application/octet-stream
Size: 5523 bytes
Desc: gff3_preds2models
URL:

From carsonhh at gmail.com Thu Sep 25 12:43:35 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 25 Sep 2014 12:43:35 -0600
Subject: [maker-devel] Using multiple protein profiles as queries for prediction in intergenic regions?
In-Reply-To:
References:
Message-ID:

When you say "gene-like structures", are you saying that you are looking for pseudogenes and non-coding genes? You can use the trnascan and snoscan options in the maker_opts.ctl file to find some non-coding RNAs. You may just want to leave off all ab initio gene predictors like SNAP and Augustus, as those will be looking for canonical coding genes.

If you first hard mask any coding genes, and then provide ESTs or assembled mRNA-seq and proteins, you may be able to use the exonerate alignments produced to identify potential gene-like structures. It might require a little post-processing of the resulting GFF3 on your part.

Thanks,
Carson

From carson.holt at genetics.utah.edu Mon Sep 29 08:47:00 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Mon, 29 Sep 2014 14:47:00 +0000
Subject: [maker-devel] maker failure with example data
In-Reply-To:
References:
Message-ID:

The error is caused by the BioPerl indexer returning an empty length for the indexed FASTA sequence (possibly because of a corrupt index file or other reasons). You may need to reinstall BioPerl (use the CPAN version, not the BioPerl-live version), or reinstall Berkeley DB (used by the BioPerl indexer), or reinstall the Perl module DB_File via CPAN (Perl's interface to Berkeley DB). After reinstalling BioPerl, delete the mpi_blastdb directory for the MAKER run before retrying.
Also verify that the /tmp directory on your system or the directory pointed to by TMP= in the maker_opts,ctl file is not full and that TMP= is not set to an NFS mounted location. Thanks, Carson From: Goutham atla > Date: Monday, September 29, 2014 at 6:33 AM To: > Subject: maker failure with example data Dear All, I am running maker with the demo file, i.e dip_contig.fasta by keeping all other parameters in .ctl files as default. But it do not progress and shows the following message that the length of the sequence is 0. Can anybody help me ? --Next Contig-- MAKER WARNING: All old files will be erased before continuing #--------------------------------------------------------------------- Skipping the contig because it is too short!! SeqID: contig-dpp-500-500 Length: 0 #--------------------------------------------------------------------- Regards, Goutham -------------- next part -------------- An HTML attachment was scrubbed... URL: From goutham.atla at gmail.com Mon Sep 29 06:33:50 2014 From: goutham.atla at gmail.com (Goutham atla) Date: Mon, 29 Sep 2014 18:03:50 +0530 Subject: [maker-devel] maker failure with example data Message-ID: Dear All, I am running maker with the demo file, i.e dip_contig.fasta by keeping all other parameters in .ctl files as default. But it do not progress and shows the following message that the length of the sequence is 0. Can anybody help me ? --Next Contig-- MAKER WARNING: All old files will be erased before continuing #--------------------------------------------------------------------- Skipping the contig because it is too short!! SeqID: contig-dpp-500-500 Length: 0 #--------------------------------------------------------------------- Regards, Goutham -------------- next part -------------- An HTML attachment was scrubbed... 
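Because the warning above reports Length: 0 for the contig, one quick way to tell a bad index apart from a bad input file is to compute the sequence lengths directly from the FASTA file, without going through the BioPerl indexer. The following is only a minimal sketch in plain Python; the function name fasta_lengths is an illustrative assumption and is not part of MAKER or BioPerl.

```python
def fasta_lengths(lines):
    """Return {sequence_id: length} for FASTA text given as an iterable of lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence data may be wrapped over several lines; sum them up.
            lengths[current] += len(line)
    return lengths


# Small in-memory example; a real check would pass an open file handle instead.
example = [">contig-dpp-500-500 demo", "ACGTACGT", "ACGT"]
print(fasta_lengths(example))
```

If the lengths computed this way are non-zero while MAKER still reports 0, the input file is probably fine and the index (or the temporary directory it lives in) is the more likely culprit, which is consistent with the reinstall-and-delete-index advice above.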
From carson.holt at genetics.utah.edu Tue Sep 30 13:33:18 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 30 Sep 2014 19:33:18 +0000
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

The message is warning that there are multiple instances of MAKER
running, but no MPI communication. When you build MAKER (the perl
Build.PL step when installing MAKER), you need to specify the location
of 'mpicc' and 'mpi.h' to build with MPI support. Otherwise you won't be
able to link against the MPICH2 shared libraries. You probably need to
rerun that step.

--Carson

From: Goutham atla
Date: Tuesday, September 30, 2014 at 10:49 AM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: URGENT: Re: maker failure with example data

Hi Carson,

I figured out that the problem was with the RepeatMasker installation
and I fixed it.

I am running maker with MPICH2 and I get the following warning when I
start it:

STATUS: Processing and indexing input FASTA files...
WARNING: Multiple MAKER processes have been started in the same directory.

I would like to know if this is common.

Regards,
Goutham

On Tue, Sep 30, 2014 at 12:02 PM, Goutham atla wrote:

Dear Carson,

Thank you for the reply. I reinstalled BioPerl and now I am getting the
following error on the test data.

ERROR: RepeatMasker failed
--> rank=NA, hostname=motif
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig-dpp-500-500

From eschang1 at gmail.com Tue Sep 30 14:02:30 2014
From: eschang1 at gmail.com (Sally Chang)
Date: Tue, 30 Sep 2014 15:02:30 -0500
Subject: [maker-devel] interpreting SNAP gene-stats output
Message-ID:

Hi all,

I was wondering if someone could help me make sure I am reading these
results from running fathom -gene-stats on an annotation correctly:

1049 sequences
0.245825 avg GC fraction (min=0.162446 max=0.431287)
5533 genes (plus=2760 minus=2773)
91 (0.016447) single-exon
5442 (0.983553) multi-exon
101.857010 mean exon (min=1 max=6534)
81.880493 mean intron (min=4 max=5486)

Are the 1049 sequences the actual number of contigs/sequences from your
assembly that MAKER ended up using? And is 5533 the number of genes it
found on those contigs (with the strand breakdown in parentheses)?

Thanks very much,
Sally Chang

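As a sanity check on the numbers in the report above, the strand counts and the single-exon/multi-exon counts should each add up to the gene total. A minimal sketch in Python, assuming the report arrives one statistic per line; the parser and its field names are illustrative and not part of SNAP.

```python
import re

# The report text as quoted in the message, one statistic per line.
REPORT = """\
1049 sequences
0.245825 avg GC fraction (min=0.162446 max=0.431287)
5533 genes (plus=2760 minus=2773)
91 (0.016447) single-exon
5442 (0.983553) multi-exon
101.857010 mean exon (min=1 max=6534)
81.880493 mean intron (min=4 max=5486)
"""


def parse_gene_stats(text):
    """Pull the integer counts out of a gene-stats style report."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"(\d+) sequences", line)
        if m:
            stats["sequences"] = int(m.group(1))
        m = re.match(r"(\d+) genes \(plus=(\d+) minus=(\d+)\)", line)
        if m:
            stats["genes"], stats["plus"], stats["minus"] = map(int, m.groups())
        m = re.match(r"(\d+) \(([\d.]+)\) (single|multi)-exon", line)
        if m:
            stats[m.group(3) + "_exon"] = int(m.group(1))
    return stats


stats = parse_gene_stats(REPORT)
# Both partitions of the gene set should sum to the total gene count.
strands_ok = stats["plus"] + stats["minus"] == stats["genes"]
classes_ok = stats["single_exon"] + stats["multi_exon"] == stats["genes"]
print(stats, strands_ok, classes_ok)
```

Here 2760 + 2773 and 91 + 5442 both equal 5533, so the report is internally consistent: the "genes" line is the total, split by strand, and the two exon-class lines partition that same total.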
From carson.holt at genetics.utah.edu Tue Sep 30 14:49:10 2014
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Tue, 30 Sep 2014 20:49:10 +0000
Subject: [maker-devel] Maker
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA0537B66060F@mxb1.hg.genetics.utah.edu>
References: <000001cfdc80$77dc88e0$67959aa0$@uni-bayreuth.de> <7A60AB257EFF2B48B1F4C814817EA0537B66060F@mxb1.hg.genetics.utah.edu>
Message-ID:

MAKER can't annotate assembled transcripts. It can only annotate genomic
sequence. Transcript annotation is a very different problem. Using a
different species' genome would not produce annotation for your
transcripts; rather, your transcripts would just be considered evidence
for annotating that species' genome.

Your best option is probably just to use BLAST to look for homology
between species. Do BLAST both ways, and if gene A in species 1 is the
best hit for gene B in species 2 and vice versa (reciprocal best hits),
then you can consider them as being orthologous. Also use the proteome
from the related species when doing the BLAST analysis (not the
nucleotide transcripts).

--Carson

On 9/30/14, 6:51 AM, "Mark Yandell" wrote:

>Mark Yandell
>Professor of Human Genetics
>H.A. & Edna Benning Presidential Endowed Chair
>Co-director USTAR Center for Genetic Discovery
>Eccles Institute of Human Genetics
>University of Utah
>15 North 2030 East, Room 2100
>Salt Lake City, UT 84112-5330
>ph:801-587-7707
>
>________________________________________
>From: Alfons Weig [a.weig at uni-bayreuth.de]
>Sent: Tuesday, September 30, 2014 1:30 AM
>To: Mark Yandell
>Subject: Maker
>
>Hello,
>
>I have just sent feedback via the Maker feedback form but received the
>following error message, so I am sending it via regular mail:
>
>Error executing run mode 'feedback': Can't call method "MailMsg" without
>a package or object reference at /var/www/cgi-bin/mwas/lib/MWS.pm line
>1116.
>at /var/www/cgi-bin/mwas/maker.cgi line 21.
>
>I have just tested the Maker annotation pipeline with short sequences
>from an RNAseq de novo assembly using A. mellifera as a reference genome.
>Unfortunately, honey bee is not the species we sequence but is closely
>related to it.
>I was wondering whether this was a good approach? There are no genome
>data available for our bee species. Is Maker able to annotate de novo
>assembled mRNA transcripts obtained by Velvet/Oases (including partial
>sequences)?
>
>Best regards
>Alfons Weig
>
>
>Dr. Alfons Weig
>DNA-Analytik & Ökoinformatik - Univ. Bayreuth - NW1
>Universitätsstrasse 30
>95447 Bayreuth - Germany
>Tel. +49 (0)921-552457
>www.daneco.uni-bayreuth.de

From carsonhh at gmail.com Tue Sep 30 14:59:47 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 30 Sep 2014 14:59:47 -0600
Subject: [maker-devel] interpreting SNAP gene-stats output
In-Reply-To:
References:
Message-ID:

Probably. But it's really not that important of a value, because during
the 'fathom genome.ann genome.dna -categorize 1000' step outlined in the
SNAP training literature, fathom turns each gene into its own little
contig padded by 1000bp on either side. So in the end the number of
starting contigs becomes irrelevant, because they all get trimmed and
thrown away anyways.

--Carson

From marc.hoeppner at imbim.uu.se Tue Sep 30 23:39:21 2014
From: marc.hoeppner at imbim.uu.se (Marc Höppner)
Date: Wed, 1 Oct 2014 05:39:21 +0000
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

Another possibility could be that MPICH2 wasn't built properly, no? I
remember something about enabling shared libraries during the
compilation of mpich, without which the "Multiple MAKER processes"
warning would appear.

/Marc

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at imbim.uu.se

From goutham.atla at gmail.com Tue Sep 30 23:58:16 2014
From: goutham.atla at gmail.com (Goutham atla)
Date: Wed, 1 Oct 2014 11:28:16 +0530
Subject: [maker-devel] URGENT: Re: maker failure with example data
In-Reply-To:
References:
Message-ID:

Dear All,

Thank you. I figured out that the problem was with mpich2. I kept trying
to get mpich2 to work but was unsuccessful. I installed mpich v3 and
it's working fine now. Thank you all. The old GMOD tutorials are a bit
misleading now that newer versions have come out.

--
Goutham Atla
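Since the fix in this thread came down to which MPI installation MAKER was built against, it can help to check what is visible on the system before rerunning the perl Build.PL step. A minimal sketch in Python, assuming mpicc and mpiexec are expected on PATH and that mpi.h lives under one of a few candidate include directories; the directory list and the function name are illustrative guesses and will differ from system to system.

```python
import os
import shutil

# Assumed candidate locations for mpi.h; adjust for the local MPI install.
CANDIDATE_INCLUDE_DIRS = ["/usr/include", "/usr/local/include", "/usr/include/mpich"]


def find_mpi_toolchain(include_dirs=CANDIDATE_INCLUDE_DIRS, which=shutil.which):
    """Report where mpicc, mpiexec and mpi.h were found (None if not found)."""
    header = None
    for directory in include_dirs:
        candidate = os.path.join(directory, "mpi.h")
        if os.path.isfile(candidate):
            header = candidate
            break
    return {
        "mpicc": which("mpicc"),
        "mpiexec": which("mpiexec"),
        "mpi.h": header,
    }


for name, location in find_mpi_toolchain().items():
    print(f"{name}: {location or 'not found'}")
```

If mpicc and mpiexec resolve to different MPI installations (for example an old MPICH2 and a newer MPICH), that mismatch alone could plausibly explain MAKER processes starting without MPI communication, so it is worth confirming that both paths belong to the same install.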