From scott at scottcain.net Tue Mar 1 09:37:34 2016
From: scott at scottcain.net (Scott Cain)
Date: Tue, 1 Mar 2016 10:37:34 -0500
Subject: [maker-devel] GMOD in Google Summer of Code 2016

Hello,

Very good news! GMOD (as part of the Open Genome Informatics group along with Reactome) has been accepted into Google Summer of Code this year. If you are or know of a student that might like to participate, please take a look at

http://gmod.org/wiki/GSOC_Project_Ideas_2016

where there are several really interesting project ideas. It is also possible for students to suggest their own ideas and we will try hard to find them a mentor.

Please let me know if you have any questions about GSoC.

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                        scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)       216-392-3087
Ontario Institute for Cancer Research

From cjfields at illinois.edu Tue Mar 1 10:19:28 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 1 Mar 2016 16:19:28 +0000
Subject: [maker-devel] [apollo] GMOD in Google Summer of Code 2016

Woohoo! Congratulations, that's awesome news!

chris

From scott at scottcain.net Wed Mar 2 10:32:04 2016
From: scott at scottcain.net (Scott Cain)
Date: Wed, 2 Mar 2016 11:32:04 -0500
Subject: [maker-devel] Call for Abstracts for BOSC

Hi All,

I'm forwarding this call for abstracts for BOSC (Bioinformatics Open Source Conference) this year in Orlando, Florida:

From Peter Cock (p.j.a.cock at googlemail.com):

As BOSC co-chair I would like to encourage you all to think about attending BOSC 2016, and if you are working on your own open source software for bioinformatics please consider submitting an abstract. See the email below and:

http://news.open-bio.org/2016/03/01/bosc-2016-call-for-abstracts/

Also, as a member of the Open Bioinformatics Foundation (OBF) Board of Directors, I am delighted to let you know about the new OBF Travel Fellowship which could be used to attend BOSC:

http://news.open-bio.org/2016/03/01/obf-travel-fellowship-program/

In case you missed the earlier announcement last year, we finally got rid of the paper forms for OBF membership, see:

http://news.open-bio.org/2015/12/10/online-membership-form/

Thank you,

Peter
[Biopython developer, BOSC co-chair, OBF Secretary, etc.]

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                        scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)       216-392-3087
Ontario Institute for Cancer Research
From chankl at mpob.gov.my Tue Mar 1 01:45:46 2016
From: chankl at mpob.gov.my (Chan Kuang Lim)
Date: Tue, 1 Mar 2016 15:45:46 +0800 (MYT)
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <1064605078.11733402.1456818000393.JavaMail.root@mpob.gov.my>
Message-ID: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>

Dear MAKER developers,

I am using MAKER 2.31.8 with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences with many different parameters. The MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message, and the sequences are reported as FINISHED successfully. May I know what mistake I might have made?

Thank you.

Regards,
Chan KL

Come and join us on:
Journal of Oil Palm Research is now available free online at http://jopr.mpob.gov.my
22nd MPOB Transfer of Technology Seminar 2016 (2 June 2016)
Persidangan Pekebun Kecil Sawit Kebangsaan 2016 (11 - 12 October 2016)
Malaysian Palm Oil Board - http://www.mpob.gov.my
This email was sent using MPOB Webmail System.

From dence at genetics.utah.edu Wed Mar 2 11:13:30 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 2 Mar 2016 17:13:30 +0000
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>
References: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>
Message-ID: <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>

Hi Chan,

Fgenesh is a gene predictor that requires users to purchase parameter files from their company: http://www.softberry.com/. If you didn't give MAKER a Fgenesh parameter file, then you won't get any Fgenesh predictions.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
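
A quick way to check Daniel's point before rerunning is to look at whether the Fgenesh entries in the control files are actually filled in. The following is only a minimal sketch in Python, not part of MAKER. It assumes the control files use the usual "key=value #comment" layout, and that the relevant option names are fgenesh_par_file (in maker_opts.ctl) and fgenesh (in maker_exe.ctl); please verify those names against your own control files. The example values in the demo are made up.

```python
def parse_ctl(text):
    """Parse 'key=value #comment' lines from a MAKER-style control file."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_fgenesh_settings(opts_text, exe_text):
    """Return the names of Fgenesh-related options that are left empty.

    The option names are assumptions taken from typical MAKER 2.x
    control files; adjust them if yours differ.
    """
    opts = parse_ctl(opts_text)
    exe = parse_ctl(exe_text)
    missing = []
    if not opts.get("fgenesh_par_file"):
        missing.append("fgenesh_par_file")
    if not exe.get("fgenesh"):
        missing.append("fgenesh")
    return missing


if __name__ == "__main__":
    # Hypothetical control-file fragments, for illustration only.
    opts_example = "fgenesh_par_file= #FGENESH parameter file\n"
    exe_example = "fgenesh=/opt/fgenesh/fgenesh #location of fgenesh executable\n"
    print(missing_fgenesh_settings(opts_example, exe_example))
```

In real use you would read the text of maker_opts.ctl and maker_exe.ctl from the run directory and pass it in. An empty result only means both entries are set; it does not show that the parameter file is valid or that Fgenesh ran.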
From carsonhh at gmail.com Wed Mar 2 11:36:04 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 2 Mar 2016 10:36:04 -0700
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>
References: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my> <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>
Message-ID: <333D3A3A-49BC-42ED-87F7-053AA46CC1F3@gmail.com>

Also there is the chance that FgenesH has changed formats slightly for their output (it's happened a couple of times before), so if you are already running with a parameter file you purchased, that could be the issue. Look at the STDERR report MAKER produces to see if FgenesH even ran and with what command.

--Carson

From fdolze at students.uni-mainz.de Thu Mar 3 05:01:06 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Thu, 3 Mar 2016 12:01:06 +0100
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>
References: <56D05E2A.1040201@students.uni-mainz.de> <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>
Message-ID: <56D81972.7000002@students.uni-mainz.de>

Hello Carson,

May I ask on what kind of hardware setup you guys are running MAKER?

I can't seem to get this running efficiently on our cluster. There are usually only 2-3 cores running at 100% and the rest are idle, waiting (I THINK due to I/O blocking, but I'm not sure). Any ideas how I could find the cause of this problem?

I attached a screenshot of the node status for the first hour of the last MAKER run, in case this is any help.

On 29.02.2016 20:09, Carson Holt wrote:
> You can try setting TMP= in the control files to a RAM disk location (you will need a lot of RAM though, perhaps 500 GB). Even then some components used by MAKER may not function properly with tmpfs, but you can try.
> If it doesn't work you'll get an error. The main output directory on the other hand must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exist on multiple nodes, they will actually be separate and distinct locations, i.e. /dev/shm).
>
> --Carson
>
>> On Feb 26, 2016, at 7:16 AM, Florian wrote:
>>
>> Hi all,
>>
>> I am trying to run maker on a cluster (2 nodes with 64 cores each). To speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd.
>>
>> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (Are they actually used for the current run, or are only files in the temp files location used?)
>>
>> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this?
>>
>> thanks,
>> Florian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot from 2016-03-03 11:35:41.png
Type: image/png
Size: 149996 bytes
Desc: not available

From jacqueline.atkins at nih.gov Thu Mar 3 12:54:19 2016
From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C])
Date: Thu, 3 Mar 2016 18:54:19 +0000
Subject: [maker-devel] Maker Installation Questions

Good Afternoon,

I am a Systems Engineer who is attempting to install and configure maker for a user. From what I can tell, database support is optional and maker can be used without a backend database. Please confirm that this is the case.

Also, could you provide any examples of how I might be able to test the functionality of the maker installation?

Thank you in advance.

Jackie Atkins

From jacqueline.atkins at nih.gov Thu Mar 3 15:37:30 2016
From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C])
Date: Thu, 3 Mar 2016 21:37:30 +0000
Subject: [maker-devel] Maker Install Issue

Good Afternoon,

I have installed Maker v 2.31.8 on RHEL 6, perl 5.16.

When I attempt to execute mpi_iprscan, I get the following error:

Can't locate Parallel/MPIcar.pm

If you could advise how I might be able to resolve this issue, it would be greatly appreciated.

Thank you.

Jacqueline Atkins, Contractor
Sr. HPC Engineer
National Institute of Allergy and Infectious Diseases
SRA International Inc., A CSRA Company
office 301-451-9644, mobile 301-767-7110
5601 Fishers Lane, 6A60, Bethesda, MD 20852

Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives.

From carsonhh at gmail.com Thu Mar 3 15:54:54 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 3 Mar 2016 14:54:54 -0700
Subject: [maker-devel] Maker Install Issue

Hi Jacqueline,

mpi_iprscan and mpi_evaluator are accessory scripts made for a very specific system and purpose (development related). They are not a core part of the MAKER pipeline, are undocumented, and should be ignored.
The script you use to run MAKER is .../maker/bin/maker

It is MPI enabled, and you can call it directly or via mpiexec.

Thanks,
Carson

From carsonhh at gmail.com Thu Mar 3 23:42:07 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 3 Mar 2016 22:42:07 -0700
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <56D81972.7000002@students.uni-mainz.de>
References: <56D05E2A.1040201@students.uni-mainz.de> <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com> <56D81972.7000002@students.uni-mainz.de>

We run on a standard cluster. We have traditional NFS as well as more advanced Lustre options for shared storage. Each node has both locally mounted disk and in-memory storage available (I never use the in-memory storage though, because MAKER requires a lot of temporary storage). I run using OpenMPI (it scales better than MPICH2; also, MAKER is incompatible with MVAPICH2 because of a known registered memory defect in that MPI flavor). We use the SLURM scheduler, although previously we had PBS. I usually run job sizes of between 100 and 200 CPU cores (10 to 20 nodes). We have mixed node types of 12, 16, 20, and 24 core nodes. I always set TMP= to a locally mounted disk (never NFS or RAM disk). The working directory is always NFS or Lustre.

I've also run under a similar configuration on the TACC and XSEDE clusters (https://www.xsede.org). They use SLURM and previously SGE for their scheduler. I've been able to run on 600 plus CPU cores per job there, but I get better efficiency with multiple jobs at ~200 CPU cores (communication overhead gets too high for a single root process to handle effectively above 200 cores).

MAKER will need ~2 GB of RAM for every core you give it with MPI.

--Carson

From chenwenbo1020 at gmail.com Sat Mar 5 20:10:24 2016
From: chenwenbo1020 at gmail.com (陈文博)
Date: Sat, 5 Mar 2016 21:10:24 -0500
Subject: [maker-devel] ERROR: RepeatMasker failed

Hi All,

I have run Maker (v2.31.8) successfully. Now I have updated RepeatMasker to v4.0.6, and I get this error:

RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib.
ERROR: RepeatMasker failed
--> rank=4, hostname=hostname
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold149

RepeatMasker was installed correctly. Should I update Maker to V3.0?

Thank you!

Best regards,
Wenbo

From mcsimenc at gmail.com Sun Mar 6 10:48:36 2016
From: mcsimenc at gmail.com (Matt Simenc)
Date: Sun, 6 Mar 2016 08:48:36 -0800
Subject: [maker-devel] Custom Repeat Library: ProtExcluder.pl help

I am working on creating a custom repeat library. I want to use the ProtExcluder.pl script, found on the maker wiki at

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Basic

to trim out possible gene sequences from the default RepeatModeler output when run on my genome. I'm getting some errors, along with output in which no sequences are removed from my RepeatModeler library, and am wondering if anyone has experience with this script and can help me understand the errors.

I am feeding ProtExcluder.pl a FASTA file from RepeatModeler and blastx output (default output, blast 2.2.31+) like:

ProtExcluder.pl blast_output repeat_fasta 1>stdout 2>stderr

- I get an output file repeat_fastanoProtFinal that contains exactly the same sequences as the input repeat_fasta.

- stderr has these errors:

Can't exec "binaries/esl-sfetch": No such file or directory at /share/apps/genomics/ProtExcluder1.1/mspesl-sfetch.pl line 17.

Can not open the seqfile /home/joshd/data/azolla/blasts/repeats/RepeatModeler.celera_blastx_PT-1.1-orthofinder/AzlRptMdlrLib.celera_blastx_PT-1.1-orthofinder_1e-5.fnolowm50seq

mergeunmatchedregion.pl seqfile

Illegal division by zero at /share/apps/genomics/ProtExcluder1.1/GCcontent.pl line 122.

ProtExcluder.pl created a bunch of files in the directory where it is trying, unsuccessfully, to access the fnolow50seq file, which does not exist, though there are files whose names have the suffix fnolow50seqm, fnolow50seqmGC, and fnolow50seqmns.

Any help would be appreciated! I could write a script to do this but would rather use an already debugged one to save time. Thanks!

Matt Simenc
Der Evolutionary Genomics Lab
California State University, Fullerton

From carsonhh at gmail.com Sun Mar 6 14:13:24 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 6 Mar 2016 13:13:24 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
Message-ID: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>

Hi Wenbo,

The error is from RepeatMasker and not MAKER. It means that RepeatMasker is not installed and configured correctly. You will have to fix whatever is wrong with your installation, and then make sure you can get RepeatMasker to run correctly by itself before running it inside of MAKER (i.e. run RepeatMasker directly on some test data).

Thanks,
Carson

From jason.stajich at gmail.com Sun Mar 6 16:04:14 2016
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sun, 06 Mar 2016 22:04:14 +0000
Subject: [maker-devel] Custom Repeat Library: ProtExcluder.pl help

Did you install hmmer3? You need that to get esl-sfetch. I'm not sure how you configured the paths when you ran this.

Jason

From chenwenbo1020 at gmail.com Mon Mar 7 14:26:19 2016
From: chenwenbo1020 at gmail.com (陈文博)
Date: Mon, 7 Mar 2016 15:26:19 -0500
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>

Hi Carson,

Thank you for your reply. I installed RepeatMasker following the installation instructions on their website, and got the information below.

=============================
Congratulations! RepeatMasker is now ready to use.
The program is installed with a full version of the repeat library:
DFAM Library Version = Dfam_2.0
RMLibrary Version = 20150807
Repbase Version = 20150807
=============================

I ran RepeatMasker directly on one scaffold, and got no error. So I am still confused by the error given by MAKER.

Thank you!

Best,
Wenbo

From carsonhh at gmail.com Mon Mar 7 15:01:38 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 7 Mar 2016 14:01:38 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>
Message-ID: <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com>

Make sure you use the same library you are giving it with MAKER. You can also look at MAKER's STDERR to see exactly what command MAKER was using to run RepeatMasker.

This error --> "RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file"

It's not from MAKER. RepeatMasker is printing that error and then failing.

--Carson
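
Carson's suggestion to read MAKER's STDERR can be made a little less tedious with a small filter. The sketch below is not a MAKER tool and makes no assumption about MAKER's log format; it only assumes that you captured STDERR as text (for example by redirecting it to a file when launching the job) and it lists every line that mentions a given program name, ignoring case. The sample text in the demo is the error output quoted earlier in this thread.

```python
def lines_mentioning(log_text, needle):
    """Return (line_number, line) pairs whose text contains `needle`, ignoring case."""
    needle = needle.lower()
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if needle in line.lower()
    ]


if __name__ == "__main__":
    # Error lines as posted in this thread, used here as sample input.
    sample = (
        "ERROR: RepeatMasker failed\n"
        "--> rank=4, hostname=hostname\n"
        "ERROR: Failed while doing repeat masking\n"
        "FAILED CONTIG:scaffold149\n"
    )
    for number, line in lines_mentioning(sample, "RepeatMasker"):
        print(f"{number}: {line}")
```

With a real run you would read the captured STDERR file into a string and search for "RepeatMasker" (or "fgenesh", for the earlier thread) to find the command line MAKER reports, then rerun that command by hand outside MAKER.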
From carsonhh at gmail.com Mon Mar 7 15:54:10 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 7 Mar 2016 14:54:10 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com>
Message-ID: <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>

RepeatMasker doesn't actually finish installing until after you run it at least once with the RepBase libraries (i.e. first job with RepBase). During its very first run it builds a bunch of needed library files under .../RepeatMasker/Libraries/ or sometimes under ~/.RepeatMaskerCache/. The failure message you get is that it can't build those files (which is a RepeatMasker error, not a MAKER error). So RepeatMasker is either installed or configured incorrectly.

--Carson

From chenwenbo1020 at gmail.com Tue Mar 8 14:22:14 2016
From: chenwenbo1020 at gmail.com (陈文博)
Date: Tue, 8 Mar 2016 15:22:14 -0500
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>

Hi Carson,

Thank you! I re-installed RepeatMasker and ran it with "-species all" outside of MAKER.
It finished successfully. Then I ran MAKER, and there was no error. I am curious why RepeatMasker could not build these library files when it was run inside MAKER.

Thanks!

Best,
Wenbo

From carsonhh at gmail.com Tue Mar 8 14:25:37 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 Mar 2016 13:25:37 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To:
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>
Message-ID: <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>

The issue is unrelated to MAKER. Likely something happened during your initial configuration that resulted in a partial file, perhaps when you unpacked RepBase. Whether you ran it inside or outside of MAKER was not the issue.

--Carson
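
The standalone check discussed in this thread can be sketched roughly as follows. This is only an illustration, not part of MAKER or RepeatMasker: the helper names (writable_dirs, repeatmasker_command, run_standalone), the install path, and the test FASTA name are all hypothetical. It assumes the library files are built under the RepeatMasker Libraries directory or ~/.RepeatMaskerCache/, as described above, and that RepeatMasker is on PATH. It first reports whether those directories are writable, then builds the same kind of command Wenbo ran by hand ("-species all").

```python
import os
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List


def writable_dirs(candidates: List[Path]) -> Dict[str, bool]:
    """Report, for each candidate directory, whether it exists and is writable."""
    return {str(p): p.is_dir() and os.access(p, os.W_OK) for p in candidates}


def repeatmasker_command(fasta: str, species: str = "all",
                         out_dir: str = "rm_test") -> List[str]:
    """Build a standalone RepeatMasker command line (nothing is executed here)."""
    return ["RepeatMasker", "-species", species, "-dir", out_dir, fasta]


def run_standalone(fasta: str) -> int:
    """Run RepeatMasker once outside of MAKER and return its exit code."""
    if shutil.which("RepeatMasker") is None:
        raise FileNotFoundError("RepeatMasker is not on PATH")
    return subprocess.run(repeatmasker_command(fasta), check=False).returncode


if __name__ == "__main__":
    # Hypothetical install location; adjust to your own setup.
    install = Path.home() / "programs" / "RepeatMasker" / "Libraries"
    cache = Path.home() / ".RepeatMaskerCache"
    for path, ok in writable_dirs([install, cache]).items():
        print(f"{path}: {'writable' if ok else 'missing or not writable'}")
    # Only prints the command; call run_standalone() to actually execute it.
    print(" ".join(repeatmasker_command("test_scaffold.fa")))
```

If the directories are reported as not writable, or the standalone run fails with the same createLib() error, the problem lies in the RepeatMasker installation rather than in MAKER.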
From jason.stajich at gmail.com Tue Mar 8 14:39:59 2016
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 08 Mar 2016 20:39:59 +0000
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com> <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>
Message-ID:

I think that may be about permissions for creating the "all" file in your RepeatMasker library folder. You may want to look at the write permissions there and see.

From meesters at uni-mainz.de Thu Mar 10 08:53:43 2016
From: meesters at uni-mainz.de (Christian Meesters)
Date: Thu, 10 Mar 2016 15:53:43 +0100
Subject: [maker-devel] maker low cpu utilization
Message-ID: <56E18A77.5030509@uni-mainz.de>

Dear maker-developers,

As a computational scientist on our local HPC team, I recently installed maker and its tools.

We encountered a most peculiar problem: Distributed over 2 nodes, 64 cores each (AMD OPT6272 "bulldozer"), all started processes take up ~20 % of the possible CPU whilst the nodes show a full load of processes. Amongst this 20 % there is some system overhead (~4%).

We then wrote a little wrapper / submission script, such that the ctl files were altered and all reference input is copied onto ramdisks (each node provides the same path, so there are 2 copies of each reference file prior to starting maker). Still no change: IO is not the bottleneck here.

I then wanted to trace individual PIDs, but they are frequently changing. However, I saw > 170 instances of ps concurrently running and the same number of 'sh'.
Only augustus showed about 100% CPU usage; all others (except maker itself) showed lower usage.

Have you ever experienced something similar and could perhaps provide a pointer to the cause? Could this perhaps be related to the nature of the input data (can some input data cause frequent switches of processes and therefore OS scheduler overhead)?

Thanks a lot in advance,
Best regards,
Christian Meesters

--
****************************************
Dr. Christian Meesters
Johannes Gutenberg-Universität Mainz
Zentrum für Datenverarbeitung
Anselm-Franz-von-Bentzelweg 12
55099 Mainz
tel. +49 (0)6131 39 26397
****************************************

From dence at genetics.utah.edu Thu Mar 10 12:22:54 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 10 Mar 2016 18:22:54 +0000
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <56E18A77.5030509@uni-mainz.de>
References: <56E18A77.5030509@uni-mainz.de>
Message-ID: <6683A317-2DB7-4CE0-86A1-A8C7CB0931CC@genetics.utah.edu>

Hi Christian,

I think what you have described is normal behavior for MAKER. It spawns many child processes, most of which complete very quickly. What dataset were you running with MAKER? Did it complete successfully?

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Thu Mar 10 12:34:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Mar 2016 11:34:59 -0700
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <56E18A77.5030509@uni-mainz.de>
References: <56E18A77.5030509@uni-mainz.de>
Message-ID: <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>

The "ps" calls should run at startup (they check the MPI configuration before MAKER connects to the communication ring and will generate somewhat informative errors for common misconfigurations when users run MAKER with MPI). Because it is one per process (MAKER is not yet connected to MPI at this point) and you have so many CPUs on a single node, it may delay startup by a few seconds, but that's it. Once MAKER gets into the actual run, you won't see those processes again.

If it bothers you, there is an alternative: have MAKER query the process table programmatically rather than via "ps" (it's not the default because it works on fewer architectures, but it should work on AMD). To do the workaround, you will need to install Proc::ProcessTable from CPAN, then replace …/maker/lib/Proc/ProcessTable_simple.pm and …/maker/lib/Proc/Signal.pm with the attached alternate files.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ProcessTable_simple.pm_alt
Type: application/octet-stream
Size: 2864 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Signal.pm_alt
Type: application/octet-stream
Size: 6703 bytes
Desc: not available
From carsonhh at gmail.com Thu Mar 10 14:56:57 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Mar 2016 13:56:57 -0700
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>
References: <56E18A77.5030509@uni-mainz.de> <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>
Message-ID: <9E6397A7-FD1F-44ED-9230-479B89DC1092@gmail.com>

Also, the "maker" processes should rarely use very much CPU. All they do is shepherd data between processes like augustus, snap, blast, and exonerate (there are some short intermediate processing steps, but the external tools are the workhorses). So each "maker" process is usually just waiting for external tools to complete. What maker does is divide the input data into reasonable chunks, so that there will always be a blast, snap, or augustus process running somewhere to keep all CPUs busy. If the structure of the actual input data is odd compared to the typical genome project input, then there could hypothetically be a situation where not enough reasonable task chunks can be made to keep all CPUs busy. I'd really have to see your data if you think that is the issue.

MAKER has the following points of parallelization.

1. Every contig goes to a separate thread.
2. Large contigs are split into overlapping pieces that go into separate threads (determined using the max_dna_len= parameter, with the default being 100,000 bp).
3. BLAST databases for input evidence are split into 10 pieces (so BLAST analyses are split by 10).
4. Ab initio gene prediction on large contigs is split into overlapping sections of 10 megabases each.

So unless you have a small dataset that can't be split by any of the above parameters, it should be able to parallelize. Also, if your assembly contains primarily short contigs and you set min_contig such that the root process spends most of its time skipping contigs and less time distributing them for other processes to analyze, then that could create an apparent slowdown. I have had that happen on a couple of assemblies that had > 2 million contigs, but only ~10,000 were usable. By filtering small contigs out of the assembly, you can get around that last issue.

--Carson
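
As a rough illustration of the last point above (filtering small contigs out of the assembly before running MAKER), here is a minimal sketch. It is not a MAKER utility: the function names and the 10,000 bp cutoff are hypothetical, and estimate_chunks() is only a crude lower bound on the number of pieces a contig would yield for a given max_dna_len, since it ignores the overlap between pieces.

```python
import io
import math
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None and line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def filter_contigs(records: Iterable[Record], min_len: int) -> List[Record]:
    """Keep only contigs that are at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def estimate_chunks(seq_len: int, max_dna_len: int = 100_000) -> int:
    """Crude lower bound on pieces per contig (overlap is ignored)."""
    return max(1, math.ceil(seq_len / max_dna_len))


if __name__ == "__main__":
    demo = io.StringIO(">big\n" + "A" * 25_000 + "\n>tiny\nACGT\n")
    kept = filter_contigs(read_fasta(demo), min_len=10_000)  # hypothetical cutoff
    for name, seq in kept:
        print(name, len(seq), estimate_chunks(len(seq)))
```

Summing estimate_chunks() over the kept contigs gives a quick feel for whether there are enough independent pieces to occupy the number of CPUs requested.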
From chenwenbo1020 at gmail.com Sun Mar 13 21:22:53 2016
From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=)
Date: Sun, 13 Mar 2016 22:22:53 -0400
Subject: [maker-devel] How to evaluate the results of gene prediction
Message-ID:

Hi All,

I am using MAKER to annotate an insect genome. First, I trained Augustus and GeneMark-ET outside of MAKER using aligned RNA-seq data. Then I gave them to MAKER. The evidence included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects, and Swiss-Prot. Finally, I used the gene models generated by MAKER with AED < 0.01 to train SNAP for two rounds. So my questions are:

1. How do I evaluate the results of ab initio training? How can I know these gene finders were well trained?

2. Should I add EST evidence? How does MAKER work on a locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial?

3. Is there some gold standard for evaluating the results of gene prediction? How can I improve it?

Thank you!

Best regards,
Wenbo

From dence at genetics.utah.edu Mon Mar 14 11:17:31 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 14 Mar 2016 16:17:31 +0000
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID:

Hi Wenbo,

MAKER has been evaluated against gold standards in the MAKER, MAKER2, and MAKER-P publications. The difficulty when working with relatively unstudied organisms is that there might not be a gold standard for any given genome.

I think that the process you describe (using RNA-seq data, protein sequences, proteome sequences of related insects, and Swiss-Prot) would result in gene models that are probably ready for manual curation and not just for training another ab initio predictor (SNAP).

To answer your specific questions:

1) Evaluation of ab initio training is in terms of accuracy, sensitivity, and specificity. This is described in more detail in this review that Mark and I wrote several years ago:
http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html
Augustus provides measures of accuracy, sensitivity, and specificity during its training procedures, although I can't recall exactly where it provides those. I believe that GeneMark provides similar reports during its own training process. I'm not certain about SNAP. In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distribution of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs. I'd be surprised if two rounds of training improved the AED scores much, though.

2) If you have EST evidence that complements the RNA-seq data that you already used, then feel free to include it. MAKER treats loci that are partially supported by EST sequences the same as it does all other loci. MAKER evaluates the alignment evidence and chooses the ab initio prediction that is best supported by it. Partial models result from loci where no complete ab initio prediction was produced by any of the predictors that you used.

3) See above.

Let me know if that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
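
For comparing AED distributions between two MAKER runs, something along these lines may be enough. This is a minimal sketch, not a MAKER script: it assumes the AED of each transcript is stored as an "_AED=" attribute in column 9 of the mRNA lines of the MAKER GFF3 output, and the function names and thresholds are hypothetical.

```python
import re
from typing import Iterable, List, Sequence

# Matches "_AED=<number>" as its own attribute, so "_eAED=" is not picked up.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")


def aed_values(gff_lines: Iterable[str]) -> List[float]:
    """Collect AED values from the mRNA features of a GFF3 file."""
    values = []
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        match = AED_RE.search(cols[8])
        if match:
            values.append(float(match.group(1)))
    return values


def cumulative_fraction(values: Sequence[float],
                        thresholds: Sequence[float]) -> List[float]:
    """Fraction of transcripts with AED <= each threshold."""
    if not values:
        return [0.0] * len(thresholds)
    return [sum(v <= t for v in values) / len(values) for t in thresholds]


if __name__ == "__main__":
    demo = [
        "##gff-version 3\n",
        "c1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1;_AED=0.10;_eAED=0.20\n",
        "c1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=m2;Parent=g2;_AED=0.50;_eAED=0.50\n",
    ]
    print(cumulative_fraction(aed_values(demo), [0.25, 0.5, 1.0]))
```

Running this on the GFF3 from each run and comparing the cumulative fractions at the same thresholds shows whether one run shifts transcripts toward lower AED.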
Then, I gave them to Maker. The evidences included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects and Swiss-Prot. At last, I used the gene models generated by Maker with AED < 0.01 to train SNAP for two rounds. So my questions are: > > 1. how to evaluate the results of ab initio training. How can I know these gene finders were well trained? > > 2. Should I add EST evidences? How does Maker work on the locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial? > > 3. Is there some gold-criteria to evaluate the results of gene prediction? How to improve it? > > Thank you! > > Best regards, > Wenbo > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From chenwenbo1020 at gmail.com Tue Mar 15 15:07:28 2016 From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=) Date: Tue, 15 Mar 2016 16:07:28 -0400 Subject: [maker-devel] How to evaluate the results of gene prediction In-Reply-To: References: Message-ID: Hi Daniel, Thanks for your help. "In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs" ----if I run SNAP in MAKER without any evidence, the AED would be 1 for each gene models. so I can't compare it with prior run regarding the distribution of AED. When I examine the gene models in Apollo, I noticed that the intron given by SNAP is longer than other predictors. Is there any parameter controlling this? When I using the maker2zff script to filter the input models for training SNAP, any suggestion on the "-c -e -o" parameter? 
Here are my parameters in the CTL file:

alt_splice=0
always_complete=1
split_hit=257022
max_dna_len=1700000

Thanks a lot!

Best,
Wenbo
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Tue Mar 15 15:19:32 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 15 Mar 2016 20:19:32 +0000
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>

Hi Wenbo, sorry for giving you a bogus suggestion. I should have realized that wouldn't work.
The defaults for the parameters you're asking about are all "0.5", so half of the exons, splice sites, etc. must be supported by EST alignment. I think it's your judgment call whether those are acceptable cutoffs for training your next set of genes. We use those settings for all our training sessions, and they generally give good results.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Mar 15 17:16:22 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 15 Mar 2016 16:16:22 -0600
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>
References: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>
Message-ID:

In general, if you want to know whether the ab initio algorithms are trained well, look at them in something like Apollo. If SNAP and Augustus look like each other, and both look like the final hint-based models, then they are trained well. AED is more of a correlative than an absolute measurement: the lower the value, in general the better the model. If you have gold standard models, you can get sensitivity and specificity metrics from programs like EVAL from WashU, but that's not really an option for newly sequenced organisms.

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mdubarry at genoscope.cns.fr Wed Mar 16 10:09:28 2016
From: mdubarry at genoscope.cns.fr (Marion Dubarry)
Date: Wed, 16 Mar 2016 16:09:28 +0100
Subject: [maker-devel] understanding maker output
Message-ID: <56E97728.6010103@genoscope.cns.fr>

Dear Maker,

I have some trouble understanding the output of MAKER. I ran MAKER on a chromosome for which I already know the expected number of genes (1332).

1) I ran MAKER with mrna.gff and prot.gff files and SNAP (est2genome=1 protein2genome=1), and I also tried with just SNAP, and I obtained the same files. Why? I expected the results to differ between using only ab initio prediction and using experimental data!
In the folder /chr3.maker.output/chr3_datastore/50/43/chr3 I have these files:

chr3.gff
chr3.maker.non_overlapping_ab_initio.transcripts.fasta
chr3.maker.snap_masked.transcripts.fasta
theVoid.chr3/
chr3.maker.non_overlapping_ab_initio.proteins.fasta
chr3.maker.snap_masked.proteins.fasta
run.log

2) All of the FASTA files contain 1263 sequences, while the GFF file contains 87178 matches. Why is there such a big difference between my files?
In my GFF file, the lines with column 2 = "snap_masked" and column 3 = "match" correspond to the 1263 models in the FASTA files. What do the "repeatmasker" and "repeatrunner" matches correspond to?

Thanks in advance,
Marion

From carsonhh at gmail.com Wed Mar 16 14:42:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Mar 2016 13:42:59 -0600
Subject: [maker-devel] understanding maker output
In-Reply-To: <56E97728.6010103@genoscope.cns.fr>
References: <56E97728.6010103@genoscope.cns.fr>
Message-ID: <8F95F7E3-A955-484C-B046-0E0BC188DC49@gmail.com>

Hi Marion,

None of your evidence supported any of the SNAP models, so you got no results. You did have reference SNAP models in both FASTA and GFF3 format (match/match_part features), but those are just for reference. You probably have issues with either your mrna.gff or prot.gff files. You may want to familiarize yourself with how MAKER works and its expected output using an online tutorial like the following -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From maker-devel at yandell-lab.org Tue Mar 22 06:38:57 2016
From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org)
Date: Tue, 22 Mar 2016 17:08:57 +0530
Subject: [maker-devel] Document 2
Message-ID:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Document 2.zip
Type: application/zip
Size: 3095 bytes
Desc: not available
URL:

From mmacd at udel.edu Tue Mar 22 11:33:42 2016
From: mmacd at udel.edu (Madolyn Macdonald)
Date: Tue, 22 Mar 2016 12:33:42 -0400
Subject: [maker-devel] Question about Maker output
Message-ID:

Hello,

My apologies if this has been described elsewhere, but I have not been able to find the answer to this question.

After running fasta_merge on the Maker results, I get the fasta files which include all the gene annotations from all the different contigs in the assembly.
In the transcript file, I get headers such as the two below:

maker-Contig206-snap-gene-3.11-mRNA-1
maker-Contig206-snap-gene-3.12-mRNA-1

I was wondering what the gene-X.XX portion of the header means. For instance, are 3.11 and 3.12 exons of the same gene, or are they two completely separate genes? If they are separate genes, why are they both "gene 3"?

Thanks in advance!

--
Madolyn Stinner (formerly Madolyn MacDonald)
UDel Bioinformatics and Systems Biology, PhD student
RIT Alumnus 13'
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Mar 22 15:31:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Mar 2016 14:31:08 -0600
Subject: [maker-devel] maker-devel post from mmacd@udel.edu requires approval
In-Reply-To:
References:
Message-ID:

Hi Madolyn,

They are different genes because their IDs are different. The numbers are meaningless; they are just iterators to make sure the IDs are unique.

Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carson.holt at genetics.utah.edu Thu Mar 24 15:56:11 2016
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 24 Mar 2016 20:56:11 +0000
Subject: [maker-devel] question about Maker2
In-Reply-To:
References: <56F4066F.4000803@fgcz.ethz.ch>
Message-ID:

Hi Giancarlo,

Anything listed as something like maker-*-augustus was a result of MAKER sending hints to augustus, and anything like augustus-*-abinit was the result of augustus run directly from the HMM without hints.

Here is more detail on the format ->

<top level>-<contig>-<internal source>-gene-<chunk>.<iterator>

Top level possibilities:
  maker            # maker generated model
  snap_masked      # snap run on masked sequence
  augustus_masked  # augustus run on masked sequence
  etc.

Internal source:
  abinit    # ab initio model direct from HMM
  snap      # hints provided to SNAP (alters scoring)
  augustus  # hints provided to augustus (alters scoring)

Then chunk and iterator are just there to generate a unique ID.

Example:
  augustus_masked-scaffold11899-abinit-gene-0.6  # Produced by Augustus on masked sequence using raw HMM (no MAKER intervention).
  maker-scaffold11899-augustus-gene-0.6          # Produced by maker sending hints to augustus to modify scoring against the HMM

--Carson

> On 3/24/16, 9:23 AM, "giancarlo.russo" wrote:
>
>> Dear Mike,
>>
>> first of all thanks for taking care and sharing Maker, as part of the
>> community I appreciate it.
>>
>> I have a question about the nomenclature of the annotation in the output
>> file:
>> what is the difference between genes named
>>
>> maker-Contig-XXX
>> and those named
>> augustus-Contig-XXX-processed genes
>> ?
>>
>> Please find attached the maker_opts file I have used for my annotation.
>> I was under the impression that the ab-initio related prefixes would be
>> present only in the genes which are not marked as "maker" in column 3 of
>> the gff file (i.e., those with both ab-initio and EST evidence)
>>
>> Is there something I am missing?
>>
>> Thanks a lot in advance,
>> Giancarlo
>>
>> --
>> Giancarlo Russo, Ph.D.
>> Functional Genomics Center Zurich
>> Y32 H66
>> Winterthurerstr. 190
>> 8057 Zurich
>> SWITZERLAND
>> Phone: +41 44 635 39 64
>> Fax: +41 44 635 39 22
>> E-Mail: giancarlo.russo at fgcz.ethz.ch
>>

From carsonhh at gmail.com Mon Mar 28 10:10:06 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Mar 2016 09:10:06 -0600
Subject: [maker-devel] Maker Execution Error
In-Reply-To:
References:
Message-ID: <007B0008-6BFD-4121-9F0D-56EA9B3A2B5A@gmail.com>

Hi Jackie,

The INSTALL file included with MAKER has the following ->

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.
Example: mpiexec -mca btl ^openib -n 20 maker

Also the following ->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

The first one is the most likely.

Thanks,
Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jacqueline.atkins at nih.gov Mon Mar 28 09:38:39 2016
From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C])
Date: Mon, 28 Mar 2016 14:38:39 +0000
Subject: [maker-devel] Maker Execution Error
Message-ID:

Hello,

I have recently installed Maker on RHEL 7 / Perl-5.16.3. When I attempt to execute it, I get the following error:

$ mpiexec -n 4 maker -help

An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: submit (PID 316)
MPI_COMM_WORLD rank: 2

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[submit:122878] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[submit:122878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[submit bin]$ mpiexec --version
mpiexec (OpenRTE) 1.8.4

I have a previous version of Maker installed that is using OpenMPI 1.3.3 and it is working fine. I was wondering if you think this might be related to the version of OpenMPI?

Thank you in advance.
Jackie
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From maker-devel at yandell-lab.org Tue Mar 29 07:46:22 2016
From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org)
Date: Tue, 29 Mar 2016 18:16:22 +0530
Subject: [maker-devel] CCE29032016_00053.tiff
Message-ID:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CCE29032016_00053.tiff
Type: application/zip
Size: 2665 bytes
Desc: not available
URL:
-------------- next part --------------
Sent from my iPhone

From dence at genetics.utah.edu Wed Mar 30 16:17:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 30 Mar 2016 21:17:38 +0000
Subject: [maker-devel] Maker example data for 2013 GMOD summer school
In-Reply-To:
References:
Message-ID: <1772AAA1-C6ED-4FCA-B4C9-39F522D3D076@genetics.utah.edu>

Hi Qihua, I believe that most of the data we used in the tutorials are available in the maker/data directory, which is included in all MAKER distributions. Please let me know if that isn't the case.
~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Mar 30, 2016, at 3:10 PM, Qihua Liang wrote:
>
> Hi Michael and Daniel,
>
> I am a graduate student in UC Riverside, and recently I am learning to use Maker for genome annotation. I was trying to find some tutorials to follow and practice on example data, and I found out that you were giving a talk on Maker during 2013 GMOD summer school and the tutorial of that is very detailed. Nice job!
>
> But example data under the folder you mentioned as ./maker/maker_course is not provided on the website and I am wondering if they are available to the public or not. If yes, could you send me those materials so that I could follow your tutorial to practice using Maker?
>
> Thank you
> Best
> Qihua

From ereboperezsilva at gmail.com Thu Mar 31 07:57:47 2016
From: ereboperezsilva at gmail.com (José Mª G. Perez-Silva)
Date: Thu, 31 Mar 2016 14:57:47 +0200
Subject: [maker-devel] Question about Maker2
Message-ID:

Hello,

We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole-genome (2.2 Gb) ab-initio annotation.

In a month we have annotated nearly half of the genome (let's say around 40% of it).
I'd like to know how much time, and under which technical specifications (processors, memory, ...), it takes to annotate a complete genome for the first time.
Is the second round of annotation (in which we use the results from the first round as extra data) faster?

Thank you in advance.

---

Jose Maria G. Perez-Silva.
Departamento de Biologia Molecular y Bioquimica.
Universidad de Oviedo.
Spain.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From dence at genetics.utah.edu Thu Mar 31 12:35:36 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 31 Mar 2016 17:35:36 +0000 Subject: [maker-devel] Question about Maker2 In-Reply-To: References: Message-ID: Hi Jose, the time it takes maker to annotate a genome depends greatly on the hardware setup (as you pointed out, processors, memory, etc) as well as the size of the genome and the size and type of the datasets you use to annotate the genome (numerous RNAseq datasets for example will take longer than a project without any RNAseq data). However, the MPI parallelization implemented in MAKER guarantees that the runtime should scale linearly with the number of processors allotted to the MAKER run. This is explained in the MAKER2 paper (Holt and Yandell), which I?m going to quote: MAKER2 was used to annotate a 10 megabase section of the C. elegans genome (NGASP dataset). The algorithm was parallelized using MPI on an increasing number of CPU cores. The results demonstrate how MAKER2 scales almost linearly with CPU number (with a slope of near 1). If we project our results forward to the entire C. elegans genome (~100 megabases), MAKER2 should take under 10 hours on 32 CPUs to complete; similarly, the human genome (~3 gigabases) would require fewer than 24 hours on 400 CPUs I?m also not sure what you mean by the first run taking less time than the second run. By the first run do you mean running with est2genome turned on to create models for training ab-initio predictors? In that case, I would guess that the second run would take longer, but it should be too big of a difference. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Mar 31, 2016, at 6:57 AM, Jos? M? G. Perez-Silva > wrote: ?? 
Hello,

We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole-genome (2.2 Gb) ab-initio annotation.

In a month we have annotated nearly half of the genome (let's say around 40% of it).
I'd like to know how much time it takes, and under which technical specifications (processors, memory, ...), to annotate a complete genome for the first time.
Is the second round of annotation (in which we use the results from the first round as extra data) faster?

Thank you in advance.

---

Jose Maria G. Perez-Silva.
Departamento de Biologia Molecular y Bioquimica.
Universidad de Oviedo.
Spain.
_______________________________________________
maker-devel mailing list
maker-devel at yandell-lab.org
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Mar 31 12:38:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 31 Mar 2016 11:38:14 -0600
Subject: [maker-devel] Question about Maker2
Message-ID: <7980702B-AE01-40A8-A903-B1DE8EE3CCC4@gmail.com>

If you provide all evidence on the first run, the second run will be faster because MAKER will be able to reuse alignments from the previous run. Since 90% of runtime is BLAST, being able to just reuse the BLAST reports really improves runtime.

--Carson

> On Mar 31, 2016, at 11:35 AM, Daniel Ence wrote:
>
> Hi Jose,
>
> The time it takes MAKER to annotate a genome depends greatly on the hardware setup (as you pointed out: processors, memory, etc.) as well as on the size of the genome and the size and type of the datasets you use to annotate it (numerous RNAseq datasets, for example, will take longer than a project without any RNAseq data).
>
> However, with the MPI parallelization implemented in MAKER, the runtime should scale almost linearly with the number of processors allotted to the MAKER run.
> This is explained in the MAKER2 paper (Holt and Yandell), which I'm going to quote:
> MAKER2 was used to annotate a 10 megabase section of the C. elegans genome
> (NGASP dataset). The algorithm was parallelized using MPI on an increasing number
> of CPU cores. The results demonstrate how MAKER2 scales almost linearly with
> CPU number (with a slope of near 1). If we project our results forward to the entire C.
> elegans genome (~100 megabases), MAKER2 should take under 10 hours on 32
> CPUs to complete; similarly, the human genome (~3 gigabases) would require fewer
> than 24 hours on 400 CPUs
>
> I'm also not sure what you mean by the first run taking less time than the second run. By the first run, do you mean running with est2genome turned on to create models for training ab-initio predictors? In that case, I would guess that the second run would take longer, but it shouldn't be too big of a difference.
>
> ~Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Mar 31, 2016, at 6:57 AM, José Mª G. Perez-Silva wrote:
>>
>> Hello,
>>
>> We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole-genome (2.2 Gb) ab-initio annotation.
>>
>> In a month we have annotated nearly half of the genome (let's say around 40% of it).
>> I'd like to know how much time it takes, and under which technical specifications (processors, memory, ...), to annotate a complete genome for the first time.
>> Is the second round of annotation (in which we use the results from the first round as extra data) faster?
>>
>> Thank you in advance.
>>
>> ---
>>
>> Jose Maria G. Perez-Silva.
>> Departamento de Biologia Molecular y Bioquimica.
>> Universidad de Oviedo.
>> Spain.
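
As a rough worked example of the scaling figures quoted above: both projections from the MAKER2 paper come to about 3.2 CPU-hours per megabase, so a back-of-the-envelope estimate for another genome can be sketched as below. This assumes ideal linear scaling and a comparable evidence set, which a real run may not match; the function names and the 64-core case are illustrative and are not taken from MAKER or from this thread.

```python
# Back-of-the-envelope runtime projection from the two figures quoted
# from the MAKER2 paper. Assumes ideal linear scaling; real runs depend
# on hardware, evidence datasets, and I/O, so treat the result as a
# rough guide, not a prediction.

def cpu_hours_per_mb(genome_mb, wall_hours, cpus):
    """CPU-hours spent per megabase for a reference run."""
    return wall_hours * cpus / genome_mb


def projected_wall_hours(genome_mb, cpus, rate):
    """Wall-clock hours for genome_mb on `cpus` cores at `rate` CPU-h/Mb."""
    return genome_mb * rate / cpus


# Reference points from the quote (both are stated there as upper bounds).
c_elegans_rate = cpu_hours_per_mb(genome_mb=100, wall_hours=10, cpus=32)
human_rate = cpu_hours_per_mb(genome_mb=3000, wall_hours=24, cpus=400)

# Hypothetical case: a 2.2 Gb genome on 64 cores.
hours_on_64 = projected_wall_hours(genome_mb=2200, cpus=64, rate=c_elegans_rate)

if __name__ == "__main__":
    print(f"C. elegans rate: {c_elegans_rate:.1f} CPU-hours/Mb")
    print(f"Human rate:      {human_rate:.1f} CPU-hours/Mb")
    print(f"2.2 Gb on 64 cores: about {hours_on_64:.0f} hours")
```

Under these assumptions a 2.2 Gb genome on 64 cores comes to roughly 110 hours. A run that is far slower than this kind of estimate may point to a bottleneck other than raw CPU count.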
From chankl at mpob.gov.my Tue Mar 1 00:45:46 2016
From: chankl at mpob.gov.my (Chan Kuang Lim)
Date: Tue, 1 Mar 2016 15:45:46 +0800 (MYT)
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <1064605078.11733402.1456818000393.JavaMail.root@mpob.gov.my>
Message-ID: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>

Dear MAKER developers,

I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?

Thank you.

Regards,
Chan KL

Come and join us on:
Journal of Oil Palm Research is now available free online at http://jopr.mpob.gov.my
22nd MPOB Transfer of Technology Seminar 2016 (2 June 2016)
Persidangan Pekebun Kecil Sawit Kebangsaan 2016 (11 - 12 Oktober 2016)
Malaysian Palm Oil Board - http://www.mpob.gov.my
This email was sent using MPOB Webmail System.
From dence at genetics.utah.edu Wed Mar 2 10:13:30 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 2 Mar 2016 17:13:30 +0000
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>
References: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>
Message-ID: <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>

Hi Chan,

Fgenesh is a gene predictor that requires users to purchase parameter files from their company: http://www.softberry.com/. If you didn't give MAKER an Fgenesh parameter file, then you won't get any predictions.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Mar 1, 2016, at 12:45 AM, Chan Kuang Lim wrote:

Dear MAKER developers,

I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?

Thank you.

Regards,
Chan KL
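
One quick way to act on the advice above is to check whether the Fgenesh option in the control file is actually filled in. The sketch below parses "key=value #comment" lines of the kind found in MAKER control files. The option name fgenesh_par_file and the example fragment are assumptions for illustration, so check the spelling against your own maker_opts.ctl.

```python
# Minimal sketch: read MAKER-style control file text ("key=value #comment")
# and report whether a given option has a non-empty value.
# The option name fgenesh_par_file is an assumption; confirm it against
# the maker_opts.ctl generated by your MAKER version.

def parse_ctl(text):
    """Return {key: value} for every 'key=value' line, dropping # comments."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip trailing comment
        if "=" not in line:
            continue  # blank or comment-only line
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def is_set(options, key):
    """True if the option exists and has a non-empty value."""
    return bool(options.get(key))


# Illustrative control file fragment, not copied from a real run.
EXAMPLE = """\
#-----Gene Prediction
snaphmm=my_species.hmm #SNAP HMM file
augustus_species=my_species #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
"""

opts = parse_ctl(EXAMPLE)

if __name__ == "__main__":
    for key in ("snaphmm", "augustus_species", "fgenesh_par_file"):
        print(f"{key}: {'set' if is_set(opts, key) else 'EMPTY'}")
```

To check a real file, pass its contents instead of EXAMPLE, for instance parse_ctl(open("maker_opts.ctl").read()). An empty value here would be consistent with a run that finishes cleanly but reports no Fgenesh predictions.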
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed Mar 2 10:36:04 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 2 Mar 2016 10:36:04 -0700
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>
References: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my> <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>
Message-ID: <333D3A3A-49BC-42ED-87F7-053AA46CC1F3@gmail.com>

Also, there is the chance that FgenesH has changed formats slightly for their output (it's happened a couple of times before), so if you are already running with a parameter file you purchased, that could be the issue. Look at the STDERR report MAKER produces to see if FgenesH even ran and with what command.

--Carson

> On Mar 2, 2016, at 10:13 AM, Daniel Ence wrote:
>
> Hi Chan,
>
> Fgenesh is a gene predictor that requires users to purchase parameter files from their company: http://www.softberry.com/. If you didn't give MAKER an Fgenesh parameter file, then you won't get any predictions.
>
> ~Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Mar 1, 2016, at 12:45 AM, Chan Kuang Lim wrote:
>>
>> Dear MAKER developers,
>>
>> I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?
>>
>> Thank you.
>>
>> Regards,
>> Chan KL

From fdolze at students.uni-mainz.de Thu Mar 3 04:01:06 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Thu, 3 Mar 2016 12:01:06 +0100
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>
References: <56D05E2A.1040201@students.uni-mainz.de> <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>
Message-ID: <56D81972.7000002@students.uni-mainz.de>

Hello Carson,

May I ask on what kind of hardware setup you guys are running MAKER?

I can't seem to get this running performantly on our cluster. There are usually only 2-3 cores running at 100% and the rest are idle, waiting (I THINK due to I/O blockage, but I'm not sure). Any ideas how I could find the cause of this problem?

I attached a screenshot of the node status for the first hour of the last MAKER run, if this is any help.

On 29.02.2016 20:09, Carson Holt wrote:
> You can try setting TMP= in the control files to a RAM disk location (you will need a lot of RAM though, perhaps 500 GB). Even then some components used by MAKER may not function properly with tmpfs, but you can try.
> If it doesn't work you'll get an error. The main output directory, on the other hand, must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exist on multiple nodes, they will actually be separate and distinct locations, e.g. /dev/shm).
>
> --Carson
>
>> On Feb 26, 2016, at 7:16 AM, Florian wrote:
>>
>> Hi all,
>>
>> I am trying to run maker on a cluster (2 nodes with 64 cores each). To speed things up, I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd.
>>
>> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (Are they actually used for the current run, or are only files in the temp files location used?)
>>
>> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this?
>>
>> thanks,
>> Florian

[Attachment scrubbed: Screenshot from 2016-03-03 11:35:41.png, image/png, 149996 bytes]

From jacqueline.atkins at nih.gov Thu Mar 3 11:54:19 2016
From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C])
Date: Thu, 3 Mar 2016 18:54:19 +0000
Subject: [maker-devel] Maker Installation Questions

Good Afternoon,

I am a Systems Engineer who is attempting to install and configure maker for a user. From what I can tell, database support is optional and maker can be used without a backend database. Please confirm that this is the case.

Also, could you provide any examples of how I might be able to test the functionality of the maker installation?

Thank you in advance.

Jackie Atkins

From jacqueline.atkins at nih.gov Thu Mar 3 14:37:30 2016
From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C])
Date: Thu, 3 Mar 2016 21:37:30 +0000
Subject: [maker-devel] Maker Install Issue

Good Afternoon,

I have installed Maker v 2.31.8 on RHEL 6, perl 5.16

When I attempt to execute mpi_iprscan, I get the following error:
Can't locate Parallel/MPIcar.pm

If you could advise how I might be able to resolve this issue, it would be greatly appreciated.

Thank you.

Jacqueline Atkins, Contractor
Sr. HPC Engineer
National Institute of Allergy and Infectious Diseases
SRA International Inc., A CSRA Company
office 301-451-9644, mobile 301-767-7110
5601 Fishers Lane, 6A60, Bethesda, MD 20852
Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives.

From carsonhh at gmail.com Thu Mar 3 14:54:54 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 3 Mar 2016 14:54:54 -0700
Subject: [maker-devel] Maker Install Issue

Hi Jacqueline,

mpi_iprscan and mpi_evaluator are accessory scripts made for a very specific system and purpose (development related). They are not a core part of the MAKER pipeline, are undocumented, and should be ignored.
The script you use to run MAKER is .../maker/bin/maker
It is MPI enabled, and you can call it directly or via mpiexec.

Thanks,
Carson

> On Mar 3, 2016, at 2:37 PM, Atkins, Jacqueline (NIH/NIAID) [C] wrote:
>
> Good Afternoon,
>
> I have installed Maker v 2.31.8 on RHEL 6, perl 5.16
>
> When I attempt to execute mpi_iprscan, I get the following error:
> Can't locate Parallel/MPIcar.pm
>
> If you could advise how I might be able to resolve this issue, it would be greatly appreciated.
>
> Thank you.
>
> Jacqueline Atkins, Contractor
> Sr. HPC Engineer
> National Institute of Allergy and Infectious Diseases
> SRA International Inc., A CSRA Company
> office 301-451-9644, mobile 301-767-7110
> 5601 Fishers Lane, 6A60, Bethesda, MD 20852

From carsonhh at gmail.com Thu Mar 3 22:42:07 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 3 Mar 2016 22:42:07 -0700
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <56D81972.7000002@students.uni-mainz.de>
References: <56D05E2A.1040201@students.uni-mainz.de> <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com> <56D81972.7000002@students.uni-mainz.de>

We run on a standard cluster.
We have traditional NFS as well as more advanced Lustre options for shared storage. Each node has both locally mounted disk and in-memory storage available (I never use the in-memory storage though, because MAKER requires a lot of temporary storage). I run using OpenMPI (it scales better than MPICH2 - also MAKER is incompatible with MVAPICH2 because of a known registered memory defect in that MPI flavor). We use the SLURM scheduler, although previously we had PBS. I usually run job sizes of between 100 and 200 CPU cores (10 to 20 nodes). We have mixed node types of 12, 16, 20, and 24 core nodes. I always set TMP= to a locally mounted disk (never NFS or RAM disk). The working directory is always NFS or Lustre.

I've also run under a similar configuration on the TACC and XSEDE clusters (https://www.xsede.org). They use SLURM and previously SGE for their scheduler. I've been able to run on 600-plus CPU cores per job there, but I get better efficiency with multiple jobs at ~200 CPU cores (communication overhead gets too high for a single root process to handle effectively above 200 cores). MAKER will need ~2 GB of RAM for every core you give it with MPI.

--Carson

> On Mar 3, 2016, at 4:01 AM, Florian wrote:
>
> Hello Carson,
>
> May I ask on what kind of hardware setup you guys are running MAKER?
>
> I can't seem to get this running performantly on our cluster. There are usually only 2-3 cores running at 100% and the rest are idle, waiting (I THINK due to I/O blockage, but I'm not sure). Any ideas how I could find the cause of this problem?
>
> I attached a screenshot of the node status for the first hour of the last MAKER run, if this is any help.
>
> On 29.02.2016 20:09, Carson Holt wrote:
>> You can try setting TMP= in the control files to a RAM disk location (you will need a lot of RAM though, perhaps 500 GB). Even then some components used by MAKER may not function properly with tmpfs, but you can try. If it doesn't work you'll get an error.
>> The main output directory, on the other hand, must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exist on multiple nodes, they will actually be separate and distinct locations, e.g. /dev/shm).
>>
>> --Carson
>>
>>> On Feb 26, 2016, at 7:16 AM, Florian wrote:
>>>
>>> Hi all,
>>>
>>> I am trying to run maker on a cluster (2 nodes with 64 cores each). To speed things up, I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd.
>>>
>>> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (Are they actually used for the current run, or are only files in the temp files location used?)
>>>
>>> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this?
>>>
>>> thanks,
>>> Florian

From chenwenbo1020 at gmail.com Sat Mar 5 19:10:24 2016
From: chenwenbo1020 at gmail.com (陈文博)
Date: Sat, 5 Mar 2016 21:10:24 -0500
Subject: [maker-devel] ERROR: RepeatMasker failed

Hi All,

I have run Maker (v2.31.8) successfully. I have since updated RepeatMasker to v4.0.6, and now I get this error:

RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib.
ERROR: RepeatMasker failed
--> rank=4, hostname=hostname
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold149

RepeatMasker was installed correctly. Should I update Maker to V3.0?

Thank you!

Best regards,
Wenbo

From mcsimenc at gmail.com Sun Mar 6 09:48:36 2016
From: mcsimenc at gmail.com (Matt Simenc)
Date: Sun, 6 Mar 2016 08:48:36 -0800
Subject: [maker-devel] Custom Repeat Library: ProtExcluder.pl help

I am working on creating a custom repeat library. I want to use the ProtExcluder.pl script, found on the maker wiki at

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Basic

to trim out possible gene sequences from the default RepeatModeler output when run on my genome. I'm getting some errors, and output in which no sequences are removed from my RepeatModeler library, and am wondering if anyone has experience with this script and can help me understand the errors.

I am feeding ProtExcluder.pl a FASTA file from RepeatModeler and blastx output (default output, blast 2.2.31+) like:

ProtExcluder.pl blast_output repeat_fasta 1>stdout 2>stderr

- I get an output file repeat_fastanoProtFinal that contains exactly the same sequences as the input repeat_fasta.

- stderr has these errors:

Can't exec "binaries/esl-sfetch": No such file or directory at /share/apps/genomics/ProtExcluder1.1/mspesl-sfetch.pl line 17.

Can not open the seqfile /home/joshd/data/azolla/blasts/repeats/RepeatModeler.celera_blastx_PT-1.1-orthofinder/AzlRptMdlrLib.celera_blastx_PT-1.1-orthofinder_1e-5.fnolowm50seq

mergeunmatchedregion.pl seqfile

Illegal division by zero at /share/apps/genomics/ProtExcluder1.1/GCcontent.pl line 122.

ProtExcluder.pl created a bunch of files in the directory where it is trying unsuccessfully to access the fnolow50seq file, which does not exist, though there are files whose names have the suffix fnolow50seqm, fnolow50seqmGC, and fnolow50seqmns.

Any help would be appreciated! I could write a script to do this but would rather use an already debugged one to save time. Thanks!

Matt Simenc
Der Evolutionary Genomics Lab
California State University, Fullerton

From carsonhh at gmail.com Sun Mar 6 13:13:24 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 6 Mar 2016 13:13:24 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
Message-ID: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>

Hi Wenbo,

The error is from RepeatMasker and not MAKER. It means that RepeatMasker is not installed and configured correctly. You will have to fix whatever is wrong with your installation, and then make sure you can get RepeatMasker to run correctly by itself before running it inside of MAKER (i.e. run RepeatMasker directly on some test data).

Thanks,
Carson

> On Mar 5, 2016, at 7:10 PM, 陈文博 wrote:
>
> Hi All,
>
> I have run Maker (v2.31.8) successfully. I have since updated RepeatMasker to v4.0.6, and now I get this error:
>
> RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib.
> ERROR: RepeatMasker failed
> --> rank=4, hostname=hostname
> ERROR: Failed while doing repeat masking
> ERROR: Chunk failed at level:0, tier_type:1
> FAILED CONTIG:scaffold149
>
> RepeatMasker was installed correctly. Should I update Maker to V3.0?
>
> Thank you!
>
> Best regards,
> Wenbo

From jason.stajich at gmail.com Sun Mar 6 15:04:14 2016
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sun, 06 Mar 2016 22:04:14 +0000
Subject: [maker-devel] Custom Repeat Library: ProtExcluder.pl help

Did you install HMMER3? You need that to get esl-sfetch. I'm not sure how you configured the paths when you ran this.

Jason

On Sun, Mar 6, 2016 at 8:48 AM Matt Simenc wrote:

> I am working on creating a custom repeat library. I want to use the
> ProtExcluder.pl script, found on the maker wiki at
>
> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Basic
>
> to trim out possible gene sequences from the default RepeatModeler output
> when run on my genome. I'm getting some errors, and output in which no
> sequences are removed from my RepeatModeler library, and am wondering if
> anyone has experience with this script and can help me understand the
> errors.
>
> I am feeding ProtExcluder.pl a FASTA file from RepeatModeler and blastx
> output (default output, blast 2.2.31+) like:
>
> ProtExcluder.pl blast_output repeat_fasta 1>stdout 2>stderr
>
> - I get an output file repeat_fastanoProtFinal that contains exactly the
> same sequences as the input repeat_fasta.
>
> - stderr has these errors:
>
> Can't exec "binaries/esl-sfetch": No such file or directory at
> /share/apps/genomics/ProtExcluder1.1/mspesl-sfetch.pl line 17.
>
> Can not open the seqfile
> /home/joshd/data/azolla/blasts/repeats/RepeatModeler.celera_blastx_PT-1.1-orthofinder/AzlRptMdlrLib.celera_blastx_PT-1.1-orthofinder_1e-5.fnolowm50seq
>
> mergeunmatchedregion.pl seqfile
>
> Illegal division by zero at
> /share/apps/genomics/ProtExcluder1.1/GCcontent.pl line 122.
> > ProtExcluder.pl created a bunch of files in the directory where it is > trying to unsuccessfully access the fnolow50seq file, which does not exist, > though there are files whose names have the suffix fnolow50seqm, > fnolow50seqmGC, and fnolow50seqmns. > > Any help would be appreciated! I could write a script to do this but would > rather use an already debugged one to save time. Thanks! > > Matt Simenc > Der Evolutionary Genomics Lab > California State University, Fullerton > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenwenbo1020 at gmail.com Mon Mar 7 13:26:19 2016 From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=) Date: Mon, 7 Mar 2016 15:26:19 -0500 Subject: [maker-devel] ERROR: RepeatMasker failed In-Reply-To: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> Message-ID: Hi Carson, Thank you for your reply. I installed RepeatMasker following the Installation in their website, and got these information below. ============================= Congratulations! RepeatMasker is now ready to use. The program is installed with a full version of the repeat library: DFAM Library Version = Dfam_2.0 RMLibrary Version = 20150807 Repbase Version = 20150807 ============================= I run RepeatMasker directly on one scaffold, and got no error. So I am still confused by the error given by MAKER. Thank you! Best, Wenbo 2016-03-06 15:13 GMT-05:00 Carson Holt : > Hi Wenbo, > > The error is from RepeatMasker and not MAKER. It means that RepeatMasker > is not installed and configured correctly. You will have to fix whatever > is wrong with your installation, and then make sure you can get > RepeatMasker to run correctly by itself before running it inside of MAKER > (i.e. 
run RepeatMasker directly on some test data). > > Thanks, > Carson > > > > On Mar 5, 2016, at 7:10 PM, 陈文博 wrote: > > > > Hi All, > > > > I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to > v4.0.6. then I came with this error: > > > > RepeatMasker::createLib(): Error invoking > /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file > /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib. > > ERROR: RepeatMasker failed > > --> rank=4, hostname=hostname > > ERROR: Failed while doing repeat masking > > ERROR: Chunk failed at level:0, tier_type:1 > > FAILED CONTIG:scaffold149 > > > > > > The RepeatMasker was corrected installed. Should I update Maker to V3.0? > > > > Thank you! > > > > Best regards, > > Wenbo > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Mar 7 14:01:38 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 7 Mar 2016 14:01:38 -0700 Subject: [maker-devel] ERROR: RepeatMasker failed In-Reply-To: References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> Message-ID: <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> When you test RepeatMasker by itself, make sure you use the same library you are giving it with MAKER. You can also look at MAKER's STDERR to see exactly what command MAKER was using to run RepeatMasker. This error --> "RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file" It's not from MAKER. RepeatMasker is printing that error and then failing. --Carson > On Mar 7, 2016, at 1:26 PM, 陈文博 wrote: > > Hi Carson, > > Thank you for your reply. I installed RepeatMasker following the Installation in their website, and got these information below. > > ============================= > Congratulations!
RepeatMasker is now ready to use. > The program is installed with a full version of the repeat library: > DFAM Library Version = Dfam_2.0 > RMLibrary Version = 20150807 > Repbase Version = 20150807 > ============================= > > I run RepeatMasker directly on one scaffold, and got no error. So I am still confused by the error given by MAKER. > > Thank you! > > Best, > Wenbo > > 2016-03-06 15:13 GMT-05:00 Carson Holt: > Hi Wenbo, > > The error is from RepeatMasker and not MAKER. It means that RepeatMasker is not installed and configured correctly. You will have to fix whatever is wrong with your installation, and then make sure you can get RepeatMasker to run correctly by itself before running it inside of MAKER (i.e. run RepeatMasker directly on some test data). > > Thanks, > Carson > > > > On Mar 5, 2016, at 7:10 PM, 陈文博 wrote: > > > > Hi All, > > > > I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to v4.0.6. then I came with this error: > > > > RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib. > > ERROR: RepeatMasker failed > > --> rank=4, hostname=hostname > > ERROR: Failed while doing repeat masking > > ERROR: Chunk failed at level:0, tier_type:1 > > FAILED CONTIG:scaffold149 > > > > > > The RepeatMasker was corrected installed. Should I update Maker to V3.0? > > > > Thank you! > > > > Best regards, > > Wenbo > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Mon Mar 7 14:54:10 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 7 Mar 2016 14:54:10 -0700 Subject: [maker-devel] ERROR: RepeatMasker failed In-Reply-To: <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> Message-ID: <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com> RepeatMasker doesn't actually finish installing until after you run it at least once with the RepBase libraries (i.e. the first job with RepBase). During its very first run it builds a bunch of needed library files under .../RepeatMasker/Libraries/ or sometimes under ~/.RepeatMaskerCache/. The failure message you get says that it can't build those files (which is a RepeatMasker error, not a MAKER error). So RepeatMasker is either installed or configured incorrectly. --Carson > On Mar 7, 2016, at 2:01 PM, Carson Holt wrote: > > Make sure you use the same library you are giving it with MAKER. You can also look at MAKER's STDERR to see exactly what command MAKER was using to run RepeatMasker. > > This error --> "RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file" > > > It's not from MAKER. RepeatMasker is printing that error and then failing. > > --Carson > > > >> On Mar 7, 2016, at 1:26 PM, 陈文博 wrote: >> >> Hi Carson, >> >> Thank you for your reply. I installed RepeatMasker following the Installation in their website, and got these information below. >> >> ============================= >> Congratulations! RepeatMasker is now ready to use. >> The program is installed with a full version of the repeat library: >> DFAM Library Version = Dfam_2.0 >> RMLibrary Version = 20150807 >> Repbase Version = 20150807 >> ============================= >> >> I run RepeatMasker directly on one scaffold, and got no error. So I am still confused by the error given by MAKER. >> >> Thank you!
>> >> Best, >> Wenbo >> >> 2016-03-06 15:13 GMT-05:00 Carson Holt: >> Hi Wenbo, >> >> The error is from RepeatMasker and not MAKER. It means that RepeatMasker is not installed and configured correctly. You will have to fix whatever is wrong with your installation, and then make sure you can get RepeatMasker to run correctly by itself before running it inside of MAKER (i.e. run RepeatMasker directly on some test data). >> >> Thanks, >> Carson >> >> >> > On Mar 5, 2016, at 7:10 PM, 陈文博 wrote: >> > >> > Hi All, >> > >> > I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to v4.0.6. then I came with this error: >> > >> > RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib. >> > ERROR: RepeatMasker failed >> > --> rank=4, hostname=hostname >> > ERROR: Failed while doing repeat masking >> > ERROR: Chunk failed at level:0, tier_type:1 >> > FAILED CONTIG:scaffold149 >> > >> > >> > The RepeatMasker was corrected installed. Should I update Maker to V3.0? >> > >> > Thank you! >> > >> > Best regards, >> > Wenbo >> > _______________________________________________ >> > maker-devel mailing list >> > maker-devel at box290.bluehost.com >> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenwenbo1020 at gmail.com Tue Mar 8 13:22:14 2016 From: chenwenbo1020 at gmail.com (陈文博) Date: Tue, 8 Mar 2016 15:22:14 -0500 Subject: [maker-devel] ERROR: RepeatMasker failed In-Reply-To: <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com> References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com> Message-ID: Hi Carson, Thank you! I re-installed RepeatMasker and ran it with "-species all" outside of MAKER.
It finished successfully. Then I ran MAKER, and there was no error. I am curious why RepeatMasker could not build these library files when it was run inside MAKER. Thanks! Best, Wenbo 2016-03-07 16:54 GMT-05:00 Carson Holt: > RepeatMasker doesn't actually finish installing until after you run it at > least once with the RepBase Libraries (i.e. first job with RepBase). During > it's very first run it builds a bunch of needed library files under > .../RepeatMasker/Libraries/ or sometimes under ~/.RepeatMaskerCache/. The > failure message you get is that it can't build those files (which is a > RepeatMasker error not a MAKER error). So RepeatMasker is either installed > or configured incorrectly. > > --Carson > > > > On Mar 7, 2016, at 2:01 PM, Carson Holt wrote: > > Make sure you use the same library you are giving it with MAKER. You can > also look at MAKER's STDERR to see exactly what command MAKER was using to > run RepeatMasker. > > This error --> "RepeatMasker::createLib(): Error invoking > /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file" > > > It's not from MAKER. RepeatMasker is printing that error and then failing. > > --Carson > > > > On Mar 7, 2016, at 1:26 PM, 陈文博 wrote: > > Hi Carson, > > Thank you for your reply. I installed RepeatMasker following > the Installation in their website, and got these information below. > > ============================= > Congratulations! RepeatMasker is now ready to use. > The program is installed with a full version of the repeat library: > DFAM Library Version = Dfam_2.0 > RMLibrary Version = 20150807 > Repbase Version = 20150807 > ============================= > > I run RepeatMasker directly on one scaffold, and got no error. So I am > still confused by the error given by MAKER. > > Thank you! > > Best, > Wenbo > > 2016-03-06 15:13 GMT-05:00 Carson Holt: > >> Hi Wenbo, >> >> The error is from RepeatMasker and not MAKER. It means that RepeatMasker >> is not installed and configured correctly.
You will have to fix whatever >> is wrong with your installation, and then make sure you can get >> RepeatMasker to run correctly by itself before running it inside of MAKER >> (i.e. run RepeatMasker directly on some test data). >> >> Thanks, >> Carson >> >> >> > On Mar 5, 2016, at 7:10 PM, 陈文博 wrote: >> > >> > Hi All, >> > >> > I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to >> v4.0.6. then I came with this error: >> > >> > RepeatMasker::createLib(): Error invoking >> /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file >> /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib. >> > ERROR: RepeatMasker failed >> > --> rank=4, hostname=hostname >> > ERROR: Failed while doing repeat masking >> > ERROR: Chunk failed at level:0, tier_type:1 >> > FAILED CONTIG:scaffold149 >> > >> > >> > The RepeatMasker was corrected installed. Should I update Maker to V3.0? >> > >> > Thank you! >> > >> > Best regards, >> > Wenbo >> > _______________________________________________ >> > maker-devel mailing list >> > maker-devel at box290.bluehost.com >> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 8 13:25:37 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 8 Mar 2016 13:25:37 -0700 Subject: [maker-devel] ERROR: RepeatMasker failed In-Reply-To: References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com> Message-ID: <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com> The issue is unrelated to MAKER. Likely something happened during your initial configuration that resulted in a partial file, perhaps when you unpacked RepBase. Whether you ran it inside or outside of MAKER was not the issue. --Carson > On Mar 8, 2016, at 1:22 PM, 陈文博 wrote: > > Hi Carson, > > Thank you!
I re-install the RepeatMasker, and run it with "-species all" outside of MAKER. It was successfully finished. Then I run Maker, and there is no error. I am curious why RepeatMasker could not build these library files when it was run in the MAKER. > > Thanks! > > Best, > Wenbo > > 2016-03-07 16:54 GMT-05:00 Carson Holt: > RepeatMasker doesn't actually finish installing until after you run it at least once with the RepBase Libraries (i.e. first job with RepBase). During it's very first run it builds a bunch of needed library files under .../RepeatMasker/Libraries/ or sometimes under ~/.RepeatMaskerCache/. The failure message you get is that it can't build those files (which is a RepeatMasker error not a MAKER error). So RepeatMasker is either installed or configured incorrectly. > > --Carson > > > >> On Mar 7, 2016, at 2:01 PM, Carson Holt wrote: >> >> Make sure you use the same library you are giving it with MAKER. You can also look at MAKER's STDERR to see exactly what command MAKER was using to run RepeatMasker. >> >> This error --> "RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file" >> >> >> It's not from MAKER. RepeatMasker is printing that error and then failing. >> >> --Carson >> >> >> >>> On Mar 7, 2016, at 1:26 PM, 陈文博 wrote: >>> >>> Hi Carson, >>> >>> Thank you for your reply. I installed RepeatMasker following the Installation in their website, and got these information below. >>> >>> ============================= >>> Congratulations! RepeatMasker is now ready to use. >>> The program is installed with a full version of the repeat library: >>> DFAM Library Version = Dfam_2.0 >>> RMLibrary Version = 20150807 >>> Repbase Version = 20150807 >>> ============================= >>> >>> I run RepeatMasker directly on one scaffold, and got no error. So I am still confused by the error given by MAKER. >>> >>> Thank you!
>>> >>> Best, >>> Wenbo >>> >>> 2016-03-06 15:13 GMT-05:00 Carson Holt: >>> Hi Wenbo, >>> >>> The error is from RepeatMasker and not MAKER. It means that RepeatMasker is not installed and configured correctly. You will have to fix whatever is wrong with your installation, and then make sure you can get RepeatMasker to run correctly by itself before running it inside of MAKER (i.e. run RepeatMasker directly on some test data). >>> >>> Thanks, >>> Carson >>> >>> >>> > On Mar 5, 2016, at 7:10 PM, 陈文博 wrote: >>> > >>> > Hi All, >>> > >>> > I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to v4.0.6. then I came with this error: >>> > >>> > RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib. >>> > ERROR: RepeatMasker failed >>> > --> rank=4, hostname=hostname >>> > ERROR: Failed while doing repeat masking >>> > ERROR: Chunk failed at level:0, tier_type:1 >>> > FAILED CONTIG:scaffold149 >>> > >>> > >>> > The RepeatMasker was corrected installed. Should I update Maker to V3.0? >>> > >>> > Thank you! >>> > >>> > Best regards, >>> > Wenbo >>> > _______________________________________________ >>> > maker-devel mailing list >>> > maker-devel at box290.bluehost.com >>> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jason.stajich at gmail.com Tue Mar 8 13:39:59 2016 From: jason.stajich at gmail.com (Jason Stajich) Date: Tue, 08 Mar 2016 20:39:59 +0000 Subject: [maker-devel] ERROR: RepeatMasker failed In-Reply-To: <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com> References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com> <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com> Message-ID: I think that may be about the permissions needed to create the "all" file in your RepeatMasker library folder. You might look at the write permissions there and see. On Tue, Mar 8, 2016 at 12:25 PM Carson Holt wrote: > The issue is unrelated to MAKER. Likely something happened during your > initial configuration that resulted in a partial file. Perhaps when you > unpackaged RepBase. Whether you ran it inside or outside of MAKER was not > the issue. > > --Carson > > > On Mar 8, 2016, at 1:22 PM, 陈文博 wrote: > > Hi Carson, > > Thank you! I re-install the RepeatMasker, and run it with "-species all" > outside of MAKER. It was successfully finished. Then I run Maker, and there > is no error. I am curious why RepeatMasker could not build these library > files when it was run in the MAKER. > > Thanks! > > Best, > Wenbo > > 2016-03-07 16:54 GMT-05:00 Carson Holt: > >> RepeatMasker doesn't actually finish installing until after you run it at >> least once with the RepBase Libraries (i.e. first job with RepBase). During >> it's very first run it builds a bunch of needed library files under >> .../RepeatMasker/Libraries/ or sometimes under ~/.RepeatMaskerCache/. The >> failure message you get is that it can't build those files (which is a >> RepeatMasker error not a MAKER error). So RepeatMasker is either installed >> or configured incorrectly. >> >> --Carson >> >> >> >> On Mar 7, 2016, at 2:01 PM, Carson Holt wrote: >> >> Make sure you use the same library you are giving it with MAKER.
You can >> also look at MAKER's STDERR to see exactly what command MAKER was using to >> run RepeatMasker. >> >> This error --> "RepeatMasker::createLib(): Error invoking >> /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file" >> >> >> It's not from MAKER. RepeatMasker is printing that error and then failing. >> >> --Carson >> >> >> >> On Mar 7, 2016, at 1:26 PM, 陈文博 wrote: >> >> Hi Carson, >> >> Thank you for your reply. I installed RepeatMasker following >> the Installation in their website, and got these information below. >> >> ============================= >> Congratulations! RepeatMasker is now ready to use. >> The program is installed with a full version of the repeat library: >> DFAM Library Version = Dfam_2.0 >> RMLibrary Version = 20150807 >> Repbase Version = 20150807 >> ============================= >> >> I run RepeatMasker directly on one scaffold, and got no error. So I am >> still confused by the error given by MAKER. >> >> Thank you! >> >> Best, >> Wenbo >> >> 2016-03-06 15:13 GMT-05:00 Carson Holt: >> >>> Hi Wenbo, >>> >>> The error is from RepeatMasker and not MAKER. It means that RepeatMasker >>> is not installed and configured correctly. You will have to fix whatever >>> is wrong with your installation, and then make sure you can get >>> RepeatMasker to run correctly by itself before running it inside of MAKER >>> (i.e. run RepeatMasker directly on some test data). >>> >>> Thanks, >>> Carson >>> >>> >>> > On Mar 5, 2016, at 7:10 PM, 陈文博 wrote: >>> > >>> > Hi All, >>> > >>> > I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to >>> v4.0.6. then I came with this error: >>> > >>> > RepeatMasker::createLib(): Error invoking >>> /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file >>> /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib.
>>> > ERROR: RepeatMasker failed >>> > --> rank=4, hostname=hostname >>> > ERROR: Failed while doing repeat masking >>> > ERROR: Chunk failed at level:0, tier_type:1 >>> > FAILED CONTIG:scaffold149 >>> > >>> > >>> > The RepeatMasker was corrected installed. Should I update Maker to >>> V3.0? >>> > >>> > Thank you! >>> > >>> > Best regards, >>> > Wenbo >>> > _______________________________________________ >>> > maker-devel mailing list >>> > maker-devel at box290.bluehost.com >>> > >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> >> >> > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From meesters at uni-mainz.de Thu Mar 10 07:53:43 2016 From: meesters at uni-mainz.de (Christian Meesters) Date: Thu, 10 Mar 2016 15:53:43 +0100 Subject: [maker-devel] maker low cpu utilization Message-ID: <56E18A77.5030509@uni-mainz.de> Dear maker-developers, As a computational scientist of our local HPC-Team, I recently installed maker and its tools. We encountered a most peculiar problem: Distributed over 2 nodes, 64 cores each (AMD OPT6272 "bulldozer"), all started processes take up ~20 % of the possible CPU whilst the nodes show a full load of processes. Amongst this 20 % there is some system overhead (~4%). We then wrote a little wrapper / submission script, such that the ctl-Files were altered and all reference input is copied onto ramdisks (each node provides the same path, there are then 2 copies of each reference file, prior to starting maker). Still no change - IO is not a bottleneck, here. I then wanted to trace individual PIDs, but they are frequently changing. However, I saw > 170 instances of ps concurrently running and the same amount of 'sh'.
Only augustus showed about 100% CPU usage; all others (except maker itself) showed lower usage. Have you ever experienced something similar and could perhaps provide a pointer to the cause? Could this perhaps be related to the nature of the input data (can some input data cause frequent switches of processes and therefore OS scheduler overhead)? Thanks a lot in advance, Best regards, Christian Meesters -- **************************************** Dr. Christian Meesters Johannes Gutenberg-Universität Mainz Zentrum für Datenverarbeitung Anselm-Franz-von-Bentzelweg 12 55099 Mainz tel. +49 (0)6131 39 26397 **************************************** From dence at genetics.utah.edu Thu Mar 10 11:22:54 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 10 Mar 2016 18:22:54 +0000 Subject: [maker-devel] maker low cpu utilization In-Reply-To: <56E18A77.5030509@uni-mainz.de> References: <56E18A77.5030509@uni-mainz.de> Message-ID: <6683A317-2DB7-4CE0-86A1-A8C7CB0931CC@genetics.utah.edu> Hi Christian, I think what you have described is normal behavior for MAKER. It spawns many child processes, most of which complete very quickly. What dataset were you running with MAKER? Did it complete successfully? ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 > On Mar 10, 2016, at 7:53 AM, Christian Meesters wrote: > > Dear maker-developers, > > As a computational scientist of our local HPC-Team, I recently installed maker and its tools. > > We encountered a most peculiar problem: Distributed over 2 nodes, 64 cores each (AMD OPT6272 "bulldozer"), all started processes take up ~20 % of the possible CPU whilst the node show a full load of processes. Amongst this 20 % there is some system overhead (~4%).
> > We then wrote a little wrapper / submission script, such that the ctl-Files were altered and all reference input is copied unto ramdisks (each node provides the same path, there are then 2 copies of each reference file, prior to starting maker). Still no change - IO is not a bottleneck, here. > > I then wanted to trace individual PIDs, but they are frequently changing. However, I saw > 170 instances ps concurrently running and the same amount of 'sh'. > > Only augustus showed about 100% CPU usage, all other (except maker itself) showed lower usage. > > Have you ever experienced something similar and could perhaps provide a pointer to the cause? Could this perhaps be related to the nature of the input data (can some input data cause frequent switches of processes and therefore OS scheduler overhead)? > > Thanks a lot in advance, > Best regards, > Christian Meesters > > -- > **************************************** > > Dr. Christian Meesters > Johannes Gutenberg-Universität Mainz > Zentrum für Datenverarbeitung > Anselm-Franz-von-Bentzelweg 12 > 55099 Mainz > > tel. +49 (0)6131 39 26397 > > **************************************** > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Mar 10 11:34:59 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 10 Mar 2016 11:34:59 -0700 Subject: [maker-devel] maker low cpu utilization In-Reply-To: <56E18A77.5030509@uni-mainz.de> References: <56E18A77.5030509@uni-mainz.de> Message-ID: <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com> The 'ps' calls should run only at startup (they are checking the MPI configuration before MAKER connects to the communication ring and will generate somewhat informative errors for common mis-configurations when users run MAKER with MPI).
Because there is one call per process (MAKER is not yet connected to MPI at this point) and you have so many CPUs on a single node, it may delay startup by a few seconds, but that's it. Once MAKER gets into the actual run, you won't see those processes again. If it bothers you, there is an alternative to have MAKER query the process table programmatically rather than via 'ps' (it's not the default because it works on fewer architectures but should work on AMD). To apply the workaround, you will need to install Proc::ProcessTable from CPAN, then replace .../maker/lib/Proc/ProcessTable_simple.pm and .../maker/lib/Proc/Signal.pm with the attached alternate files. --Carson -------------- next part -------------- A non-text attachment was scrubbed... Name: ProcessTable_simple.pm_alt Type: application/octet-stream Size: 2864 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Signal.pm_alt Type: application/octet-stream Size: 6703 bytes Desc: not available URL: -------------- next part -------------- > On Mar 10, 2016, at 7:53 AM, Christian Meesters wrote: > > Dear maker-developers, > > As a computational scientist of our local HPC-Team, I recently installed maker and its tools. > > We encountered a most peculiar problem: Distributed over 2 nodes, 64 cores each (AMD OPT6272 "bulldozer"), all started processes take up ~20 % of the possible CPU whilst the node show a full load of processes. Amongst this 20 % there is some system overhead (~4%). > > We then wrote a little wrapper / submission script, such that the ctl-Files were altered and all reference input is copied unto ramdisks (each node provides the same path, there are then 2 copies of each reference file, prior to starting maker). Still no change - IO is not a bottleneck, here. > > I then wanted to trace individual PIDs, but they are frequently changing. However, I saw > 170 instances ps concurrently running and the same amount of 'sh'.
> > Only augustus showed about 100% CPU usage, all other (except maker itself) showed lower usage. > > Have you ever experienced something similar and could perhaps provide a pointer to the cause? Could this perhaps be related to the nature of the input data (can some input data cause frequent switches of processes and therefore OS scheduler overhead)? > > Thanks a lot in advance, > Best regards, > Christian Meesters > > -- > **************************************** > > Dr. Christian Meesters > Johannes Gutenberg-Universität Mainz > Zentrum für Datenverarbeitung > Anselm-Franz-von-Bentzelweg 12 > 55099 Mainz > > tel. +49 (0)6131 39 26397 > > **************************************** > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Mar 10 13:56:57 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 10 Mar 2016 13:56:57 -0700 Subject: [maker-devel] maker low cpu utilization In-Reply-To: <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com> References: <56E18A77.5030509@uni-mainz.de> <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com> Message-ID: <9E6397A7-FD1F-44ED-9230-479B89DC1092@gmail.com> Also the 'maker' processes should rarely use very much CPU. All they do is shepherd data between processes like augustus, snap, blast, and exonerate (there are some short intermediate processing steps, but the external tools are the workhorses). So each 'maker' process is usually just waiting for external tools to complete. What maker does is divide the input data into reasonable chunks, so that there will always be a blast, snap, or augustus process running somewhere to keep all CPUs busy. If the structure of the actual input data is odd compared to the typical genome project input then there could hypothetically be a situation where not enough reasonable task chunks can be made to keep all CPUs busy.
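As a rough illustration only (this is not MAKER's actual code), the idea of dividing a contig into overlapping chunks can be sketched as below. The 100,000 bp chunk length mirrors the max_dna_len default; the overlap value and the function name are assumptions made up for the example.

```python
def split_contig(length, chunk_len=100_000, overlap=1_000):
    """Return half-open (start, end) intervals that cover [0, length).

    chunk_len mirrors the max_dna_len default of 100,000 bp.
    overlap is a hypothetical value chosen only for illustration.
    """
    if length <= 0:
        return []
    if chunk_len <= overlap:
        raise ValueError("chunk_len must be larger than overlap")
    step = chunk_len - overlap  # distance between consecutive chunk starts
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_len, length)
        chunks.append((start, end))
        if end == length:  # the last chunk reaches the contig end
            break
        start += step
    return chunks


if __name__ == "__main__":
    # A 250 kb contig yields three overlapping pieces under these assumptions.
    for piece in split_contig(250_000):
        print(piece)
```

If a calculation like this produces only a handful of chunks for an assembly, there may simply not be enough independent work to keep every CPU busy.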
I'd really have to see your data if you think that is the issue. MAKER has the following points of parallelization. 1. Every contig goes to a separate thread. 2. Large contigs are split into overlapping pieces that go into separate threads (determined using the max_dna_len= parameter with the default being 100,000 bp) 3. BLAST databases for input evidence are split into 10 pieces (so BLAST analyses are split by 10) 4. Ab initio gene predictions on large contigs are split into overlapping sections of 10 megabases each. So unless you have a small dataset that can't be split by any of the above parameters it should be able to parallelize. Also if your assembly contains primarily short contigs and you set min_contig such that the root process spends most of its time skipping contigs and less time distributing them for other processes to analyze, then that could create an apparent slowdown. I have had that happen on a couple of assemblies that had > 2 million contigs, but only ~10,000 were usable. By filtering small contigs out of the assembly, you can get around that last issue. --Carson > On Mar 10, 2016, at 11:34 AM, Carson Holt wrote: > > The 'ps' calls should run at startup (they are checking the MPI configuration before MAKER connects to the communication ring and will generate somewhat informative errors for common mis-configurations when users run MAKER with MPI). Because it is one per process (MAKER is not yet connected to MPI at this point) and you have so many CPUs on a single node, it may delay startup by a few seconds, but that's it. Once MAKER gets into the actual run, you won't see those processes again. > > If it bothers you there is an alternative to have MAKER query the process table programmatically rather than via 'ps' (it's not the default because it works on fewer architectures but should work on AMD).
To do the work around, you will need to install Proc::ProcessTable from CPAN, then replace .../maker/lib/Proc/ProcessTable_simple.pm and .../maker/lib/Proc/Signal.pm with the attached alternate files. > > --Carson > > > > >> On Mar 10, 2016, at 7:53 AM, Christian Meesters wrote: >> >> Dear maker-developers, >> >> As a computational scientist of our local HPC-Team, I recently installed maker and its tools. >> >> We encountered a most peculiar problem: Distributed over 2 nodes, 64 cores each (AMD OPT6272 "bulldozer"), all started processes take up ~20 % of the possible CPU whilst the node show a full load of processes. Amongst this 20 % there is some system overhead (~4%). >> >> We then wrote a little wrapper / submission script, such that the ctl-Files were altered and all reference input is copied unto ramdisks (each node provides the same path, there are then 2 copies of each reference file, prior to starting maker). Still no change - IO is not a bottleneck, here. >> >> I then wanted to trace individual PIDs, but they are frequently changing. However, I saw > 170 instances ps concurrently running and the same amount of 'sh'. >> >> Only augustus showed about 100% CPU usage, all other (except maker itself) showed lower usage. >> >> Have you ever experienced something similar and could perhaps provide a pointer to the cause? Could this perhaps be related to the nature of the input data (can some input data cause frequent switches of processes and therefore OS scheduler overhead)? >> >> Thanks a lot in advance, >> Best regards, >> Christian Meesters >> >> -- >> **************************************** >> >> Dr. Christian Meesters >> Johannes Gutenberg-Universität Mainz >> Zentrum für Datenverarbeitung >> Anselm-Franz-von-Bentzelweg 12 >> 55099 Mainz >> >> tel.
+49 (0)6131 39 26397
>>
>> ****************************************
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From chenwenbo1020 at gmail.com Sun Mar 13 20:22:53 2016
From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=)
Date: Sun, 13 Mar 2016 22:22:53 -0400
Subject: [maker-devel] How to evaluate the results of gene prediction
Message-ID:

Hi All,

I am using MAKER to annotate an insect genome. First, I trained Augustus and GeneMark-ET outside of MAKER using aligned RNA-seq data. Then I gave them to MAKER. The evidence included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects, and Swiss-Prot. Finally, I used the gene models generated by MAKER with AED < 0.01 to train SNAP for two rounds. So my questions are:

1. How do I evaluate the results of ab initio training? How can I know these gene finders were well trained?

2. Should I add EST evidence? How does MAKER work on a locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial?

3. Is there a gold standard for evaluating the results of gene prediction? How can I improve them?

Thank you!

Best regards,
Wenbo

From dence at genetics.utah.edu Mon Mar 14 10:17:31 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 14 Mar 2016 16:17:31 +0000
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID:

Hi Wenbo,

MAKER has been evaluated against gold standards in the MAKER, MAKER2, and MAKER-P publications. The difficulty when working with relatively unstudied organisms is that there might not be a gold standard for any given genome.
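As an aside on the selection step Wenbo describes (keeping only MAKER models below an AED cutoff before retraining SNAP): a minimal Python sketch of that filter might look like the following. It assumes the AED is stored as an _AED attribute on mRNA lines of the MAKER GFF3; the function names and the toy records are made up for illustration and are not part of MAKER.

```python
def mrna_aed(gff_lines):
    """Yield (mRNA ID, AED) pairs from GFF3 lines that carry an _AED attribute."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        # Column 9 is a ';'-separated list of key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "_AED" in attrs:
            yield attrs.get("ID", ""), float(attrs["_AED"])


def select_low_aed(gff_lines, cutoff):
    """Return IDs of mRNAs whose AED is strictly below the cutoff."""
    return [mrna_id for mrna_id, aed in mrna_aed(gff_lines) if aed < cutoff]


def row(*cols):
    """Build one tab-separated GFF3 line (helper for the toy data only)."""
    return "\t".join(cols)


# Hypothetical records, not taken from any real annotation.
TOY_GFF = [
    "##gff-version 3",
    row("ctg1", "maker", "gene", "1", "900", ".", "+", ".", "ID=g1"),
    row("ctg1", "maker", "mRNA", "1", "900", ".", "+", ".",
        "ID=g1-mRNA-1;Parent=g1;_AED=0.00;_eAED=0.00"),
    row("ctg1", "maker", "mRNA", "2000", "2900", ".", "-", ".",
        "ID=g2-mRNA-1;Parent=g2;_AED=0.35;_eAED=0.40"),
]

print(select_low_aed(TOY_GFF, 0.01))
```

A very strict cutoff such as 0.01 keeps only models that agree almost perfectly with the evidence, so it may be worth checking how many models survive before training on them.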
I think that the process you describe (using RNA-seq data, protein sequences, proteome sequences of related insects, and Swiss-Prot) would result in gene models that are probably ready for manual curation, rather than just training data for another ab initio predictor (SNAP).

To answer your specific questions:

1) Evaluation of ab initio training is in terms of accuracy, sensitivity, and specificity. This is described in more detail in this review that Mark and I wrote several years ago: http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html
Augustus provides measures of accuracy, sensitivity, and specificity during its training procedures, although I can't recall exactly where it provides those. I believe that GeneMark provides similar reports during its own training process. I'm not certain about SNAP. In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distribution of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs. I'd be surprised if two rounds of training improved the AED scores much, though.

2) If you have EST evidence that complements the RNA-seq data that you already used, then feel free to include it. MAKER treats loci that are partially supported by EST sequences the same as it does all other loci. MAKER evaluates the alignment evidence and chooses the ab initio prediction that is best supported by it. Partial models result from loci where no complete ab initio prediction was produced by any of the predictors that you used.

3) See above.

Let me know if that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Mar 13, 2016, at 8:22 PM, 陈文博 wrote:
>
> Hi All,
>
> I am using MAKER to annotate a insect genome. Firstly, I trained Augustus and GeneMark-ET outside of Maker using aligned RNA-seq data.
Then, I gave them to Maker. The evidences included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects and Swiss-Prot. At last, I used the gene models generated by Maker with AED < 0.01 to train SNAP for two rounds. So my questions are: > > 1. how to evaluate the results of ab initio training. How can I know these gene finders were well trained? > > 2. Should I add EST evidences? How does Maker work on the locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial? > > 3. Is there some gold-criteria to evaluate the results of gene prediction? How to improve it? > > Thank you! > > Best regards, > Wenbo > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From chenwenbo1020 at gmail.com Tue Mar 15 14:07:28 2016 From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=) Date: Tue, 15 Mar 2016 16:07:28 -0400 Subject: [maker-devel] How to evaluate the results of gene prediction In-Reply-To: References: Message-ID: Hi Daniel, Thanks for your help. "In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs" ----if I run SNAP in MAKER without any evidence, the AED would be 1 for each gene models. so I can't compare it with prior run regarding the distribution of AED. When I examine the gene models in Apollo, I noticed that the intron given by SNAP is longer than other predictors. Is there any parameter controlling this? When I using the maker2zff script to filter the input models for training SNAP, any suggestion on the "-c -e -o" parameter? 
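A note on why AED comes out as 1 when there is no evidence: AED is usually defined as 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the model and SP is the fraction of model bases covered by evidence. The Python sketch below works through that arithmetic on toy intervals. It is only an illustration of the definition under those assumptions, not MAKER's own implementation, and the function names are invented here.

```python
def bases(intervals):
    """Expand inclusive (start, end) intervals into a set of base positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def aed(prediction, evidence):
    """Annotation edit distance between a model and its evidence.

    AED = 1 - (SN + SP) / 2, with
      SN = shared bases / evidence bases
      SP = shared bases / prediction bases
    With no evidence (or no prediction) nothing is shared, so AED is 1.
    """
    pred, evid = bases(prediction), bases(evidence)
    if not pred or not evid:
        return 1.0
    shared = len(pred & evid)
    sensitivity = shared / len(evid)
    specificity = shared / len(pred)
    return 1.0 - (sensitivity + specificity) / 2.0


print(aed([(1, 100)], [(1, 100)]))    # model and evidence coincide
print(aed([(1, 100)], [(51, 150)]))   # half of each overlaps the other
print(aed([(1, 100)], []))            # no evidence at all
```

This is why a run without evidence cannot give an informative AED distribution: every model falls into the no-overlap case.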
here is my parameter in the CTL file: alt_splice=0 always_complete=1 split_hit=257022 max_dna_len=1700000 Thanks a lot! Best, Wenbo 2016-03-14 12:17 GMT-04:00 Daniel Ence : > Hi Wenbo, MAKER has been evaluated against gold-criteria in the MAKER, > MAKER2, and MAKER-P publications. The difficulty when working with > relatively unstudied organisms is that might not be gold-criteria for any > given genome. > > I think that the process you describe (using RNA-seq data, protein > sequences, proteome sequence of related insects, and swiss-prot) would > result in gene models that are probably ready for manual curation and not > just as training for another ab-initio predictor (SNAP). > > To answer your specific questions: > > 1) Evaluation of ab-initio training is in terms of accuracy, sensitivity > and specificity. This si described in more detail in this review that Mark > and I wrote several years ago: > http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html > Augustus provides measures of accuracy, sensitivity, and specificity > during it?s training procedures, although I can?t recall exactly where it > provides those. I believe that Genemark provides similar reports during > it?s own training process. I?m not certain about SNAP. In order to evaluate > your final SNAP training files, you might try running SNAP with MAKER > without any evidence and compare the distributions of AED (annotation edit > distance) values with the distribution of AED values from your prior MAKER > runs. I?d be surprised if two rounds of training improved the AED scores > much though. > > 2) If you have EST evidence that complements the RNAseq data that you > already used, then feel free to include it. MAKER treats loci that are > partially supported by EST sequences the same as it does all other loci. > MAKER evaluates the alignment evidences and chooses the ab-initio > prediction that is best supported by the alignment evidence. 
Partial models > result from loci where no complete ab-initio prediction was produced by any > of the predictors that you used. > > 3) see above. > > Let me know if that helps, > Daniel > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > > On Mar 13, 2016, at 8:22 PM, ??? wrote: > > > > Hi All, > > > > I am using MAKER to annotate a insect genome. Firstly, I trained > Augustus and GeneMark-ET outside of Maker using aligned RNA-seq data. Then, > I gave them to Maker. The evidences included assembled RNA-seq data, > protein sequences of my insect, proteome sequences of three related insects > and Swiss-Prot. At last, I used the gene models generated by Maker with AED > < 0.01 to train SNAP for two rounds. So my questions are: > > > > 1. how to evaluate the results of ab initio training. How can I know > these gene finders were well trained? > > > > 2. Should I add EST evidences? How does Maker work on the locus where > there is only partial EST evidence? Will the partial EST sequences cause > gene models to be partial? > > > > 3. Is there some gold-criteria to evaluate the results of gene > prediction? How to improve it? > > > > Thank you! > > > > Best regards, > > Wenbo > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Mar 15 14:19:32 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 15 Mar 2016 20:19:32 +0000 Subject: [maker-devel] How to evaluate the results of gene prediction In-Reply-To: References: Message-ID: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu> Hi Wenbo, sorry for giving you a bogus suggestion. I should have realized that wouldn?t work. 
The defaults for the parameters you're asking about are all '0.5', so half of the exons, splice sites, etc. must be supported by EST alignment. I think it's up to your judgment whether those are acceptable cutoffs for training your next set of genes. We use those settings for all our training sessions, which generally gives good results.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Mar 15, 2016, at 2:07 PM, 陈文博 wrote:

Hi Daniel,

Thanks for your help.

"In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs"

----if I run SNAP in MAKER without any evidence, the AED would be 1 for each gene models. so I can't compare it with prior run regarding the distribution of AED.

When I examine the gene models in Apollo, I noticed that the intron given by SNAP is longer than other predictors. Is there any parameter controlling this? When I using the maker2zff script to filter the input models for training SNAP, any suggestion on the "-c -e -o" parameter?

here is my parameter in the CTL file:

alt_splice=0
always_complete=1
split_hit=257022
max_dna_len=1700000

Thanks a lot!

Best,
Wenbo

2016-03-14 12:17 GMT-04:00 Daniel Ence:

Hi Wenbo, MAKER has been evaluated against gold-criteria in the MAKER, MAKER2, and MAKER-P publications. The difficulty when working with relatively unstudied organisms is that might not be gold-criteria for any given genome.

I think that the process you describe (using RNA-seq data, protein sequences, proteome sequence of related insects, and swiss-prot) would result in gene models that are probably ready for manual curation and not just as training for another ab-initio predictor (SNAP).
To answer your specific questions: 1) Evaluation of ab-initio training is in terms of accuracy, sensitivity and specificity. This si described in more detail in this review that Mark and I wrote several years ago: http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html Augustus provides measures of accuracy, sensitivity, and specificity during it?s training procedures, although I can?t recall exactly where it provides those. I believe that Genemark provides similar reports during it?s own training process. I?m not certain about SNAP. In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs. I?d be surprised if two rounds of training improved the AED scores much though. 2) If you have EST evidence that complements the RNAseq data that you already used, then feel free to include it. MAKER treats loci that are partially supported by EST sequences the same as it does all other loci. MAKER evaluates the alignment evidences and chooses the ab-initio prediction that is best supported by the alignment evidence. Partial models result from loci where no complete ab-initio prediction was produced by any of the predictors that you used. 3) see above. Let me know if that helps, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 > On Mar 13, 2016, at 8:22 PM, ??? > wrote: > > Hi All, > > I am using MAKER to annotate a insect genome. Firstly, I trained Augustus and GeneMark-ET outside of Maker using aligned RNA-seq data. Then, I gave them to Maker. The evidences included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects and Swiss-Prot. At last, I used the gene models generated by Maker with AED < 0.01 to train SNAP for two rounds. 
So my questions are:
>
> 1. how to evaluate the results of ab initio training. How can I know these gene finders were well trained?
>
> 2. Should I add EST evidences? How does Maker work on the locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial?
>
> 3. Is there some gold-criteria to evaluate the results of gene prediction? How to improve it?
>
> Thank you!
>
> Best regards,
> Wenbo
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue Mar 15 16:16:22 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 15 Mar 2016 16:16:22 -0600
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>
References: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>
Message-ID:

In general, if you want to know whether the ab initio algorithms are trained well, look at their predictions in something like Apollo. If SNAP and Augustus look like each other, and both look like the final hint-based models, then they are trained well. AED is more of a correlative than an absolute measurement: the lower the value, the better the model in general. If you have gold-standard models, you can get sensitivity and specificity metrics from programs like EVAL from WashU. But that's not really an option for newly sequenced organisms.

--Carson

> On Mar 15, 2016, at 2:19 PM, Daniel Ence wrote:
>
> Hi Wenbo, sorry for giving you a bogus suggestion. I should have realized that wouldn't work. The defaults for the parameters you're asking about are all '0.5', so half of the exons, splice sites, etc. supported by EST alignment.
I think that?s your judgment as to whether those are acceptable cutoffs for training your next set of genes. We use those settings for all our training sessions, which generally give good results. > > ~Daniel > > > > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > >> On Mar 15, 2016, at 2:07 PM, ??? > wrote: >> >> Hi Daniel, >> >> Thanks for your help. >> >> "In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs" >> >> ----if I run SNAP in MAKER without any evidence, the AED would be 1 for each gene models. so I can't compare it with prior run regarding the distribution of AED. >> >> When I examine the gene models in Apollo, I noticed that the intron given by SNAP is longer than other predictors. Is there any parameter controlling this? When I using the maker2zff script to filter the input models for training SNAP, any suggestion on the "-c -e -o" parameter? >> >> here is my parameter in the CTL file: >> >> alt_splice=0 >> always_complete=1 >> split_hit=257022 >> max_dna_len=1700000 >> >> Thanks a lot! >> >> Best, >> Wenbo >> >> >> 2016-03-14 12:17 GMT-04:00 Daniel Ence >: >> Hi Wenbo, MAKER has been evaluated against gold-criteria in the MAKER, MAKER2, and MAKER-P publications. The difficulty when working with relatively unstudied organisms is that might not be gold-criteria for any given genome. >> >> I think that the process you describe (using RNA-seq data, protein sequences, proteome sequence of related insects, and swiss-prot) would result in gene models that are probably ready for manual curation and not just as training for another ab-initio predictor (SNAP). 
>> >> To answer your specific questions: >> >> 1) Evaluation of ab-initio training is in terms of accuracy, sensitivity and specificity. This si described in more detail in this review that Mark and I wrote several years ago: http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html >> Augustus provides measures of accuracy, sensitivity, and specificity during it?s training procedures, although I can?t recall exactly where it provides those. I believe that Genemark provides similar reports during it?s own training process. I?m not certain about SNAP. In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs. I?d be surprised if two rounds of training improved the AED scores much though. >> >> 2) If you have EST evidence that complements the RNAseq data that you already used, then feel free to include it. MAKER treats loci that are partially supported by EST sequences the same as it does all other loci. MAKER evaluates the alignment evidences and chooses the ab-initio prediction that is best supported by the alignment evidence. Partial models result from loci where no complete ab-initio prediction was produced by any of the predictors that you used. >> >> 3) see above. >> >> Let me know if that helps, >> Daniel >> >> >> Daniel Ence >> Graduate Student >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> > On Mar 13, 2016, at 8:22 PM, ??? > wrote: >> > >> > Hi All, >> > >> > I am using MAKER to annotate a insect genome. Firstly, I trained Augustus and GeneMark-ET outside of Maker using aligned RNA-seq data. Then, I gave them to Maker. The evidences included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects and Swiss-Prot. 
At last, I used the gene models generated by Maker with AED < 0.01 to train SNAP for two rounds. So my questions are: >> > >> > 1. how to evaluate the results of ab initio training. How can I know these gene finders were well trained? >> > >> > 2. Should I add EST evidences? How does Maker work on the locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial? >> > >> > 3. Is there some gold-criteria to evaluate the results of gene prediction? How to improve it? >> > >> > Thank you! >> > >> > Best regards, >> > Wenbo >> > _______________________________________________ >> > maker-devel mailing list >> > maker-devel at box290.bluehost.com >> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdubarry at genoscope.cns.fr Wed Mar 16 09:09:28 2016 From: mdubarry at genoscope.cns.fr (Marion Dubarry) Date: Wed, 16 Mar 2016 16:09:28 +0100 Subject: [maker-devel] understanding maker output Message-ID: <56E97728.6010103@genoscope.cns.fr> Dear Maker, I have some issue understanding the output of maker. I ran Maker on a chromosome where I already know the number of expected genes (1332) . 1) I ran Maker with mrna.gff and prot.gff files and Snap (est2genome=1 protein2genome=1) and I try also with just Snap, and I obtain the same files, why ? I was expected that with just ab initio or experimental data, the results would have been different ! 
In the folder /chr3.maker.output/chr3_datastore/50/43/chr3 I have these files:

chr3.gff
chr3.maker.non_overlapping_ab_initio.transcripts.fasta
chr3.maker.snap_masked.transcripts.fasta
theVoid.chr3/
chr3.maker.non_overlapping_ab_initio.proteins.fasta
chr3.maker.snap_masked.proteins.fasta
run.log

2) All of the FASTA files contain 1263 sequences, while the GFF file contains 87178 matches. Why is there such a big difference between my files?
In my GFF file, the lines with column 2 = "snap_masked" and column 3 = "match" correspond to the 1263 models in the FASTA files. What do the "repeatmasker" and "repeatrunner" matches correspond to?

Thanks in advance,
Marion

From carsonhh at gmail.com Wed Mar 16 13:42:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Mar 2016 13:42:59 -0600
Subject: [maker-devel] understanding maker output
In-Reply-To: <56E97728.6010103@genoscope.cns.fr>
References: <56E97728.6010103@genoscope.cns.fr>
Message-ID: <8F95F7E3-A955-484C-B046-0E0BC188DC49@gmail.com>

Hi Marion,

None of your evidence supported any of the SNAP models, so you got no results. You did have reference SNAP models in both FASTA and GFF3 format (match/match_part features), but those are just for reference. You probably have issues with either your mrna.gff or prot.gff files. You may want to familiarize yourself with how MAKER works and its expected output using an online tutorial like the following --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

--Carson

> On Mar 16, 2016, at 9:09 AM, Marion Dubarry wrote:
>
> Dear Maker,
>
> I have some issue understanding the output of maker. I ran Maker on a chromosome where I already know the number of expected genes (1332) .
>
> 1) I ran Maker with mrna.gff and prot.gff files and Snap (est2genome=1 protein2genome=1) and I try also with just Snap, and I obtain the same files, why ? I was expected that with just ab initio or experimental data, the results would have been different !
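One way to see where the many GFF lines in a file like Marion's come from is to tally features by source (column 2) and type (column 3). The Python sketch below does that on a few invented lines; the function name and the toy data are illustrative only, and in practice the lines would be read from a file such as chr3.gff.

```python
from collections import Counter


def tally_features(gff_lines):
    """Count GFF3 feature lines per (source, type) pair."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[(cols[1], cols[2])] += 1
    return counts


# Hypothetical lines; only the first three columns matter for the tally.
TOY_LINES = [
    "##gff-version 3",
    "chr3\tsnap_masked\tmatch\t100\t900\t.\t+\t.\tID=m1",
    "chr3\tsnap_masked\tmatch_part\t100\t400\t.\t+\t.\tID=m1.1;Parent=m1",
    "chr3\tsnap_masked\tmatch_part\t600\t900\t.\t+\t.\tID=m1.2;Parent=m1",
    "chr3\trepeatmasker\tmatch\t1500\t1700\t.\t+\t.\tID=r1",
]

for (source, ftype), n in sorted(tally_features(TOY_LINES).items()):
    print(source, ftype, n)
```

A table like this makes it easy to separate the reference ab initio models (snap_masked) from repeat annotations and evidence alignments, which together can far outnumber the sequences in the FASTA files.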
> > In the folder /chr3.maker.output/chr3_datastore/50/43/chr3 I have different files : > chr3.gff > chr3.maker.non_overlapping_ab_initio.transcripts.fasta > chr3.maker.snap_masked.transcripts.fasta > theVoid.chr3/ > chr3.maker.non_overlapping_ab_initio.proteins.fasta > chr3.maker.snap_masked.proteins.fasta > run.log > > 2) All of fasta files contains 1263 sequences, while the gff file contains 87178 matches. Why there is a so big differences between my files ? > In my gff file, line with column 2 = "snap_masked" and column 3 = "match" correspond to the 1263 models in fasta files. To what correspond the "repeatmasker" and "repeatrunner" matches ? > > > Thanks in advance, > Marion > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From maker-devel at yandell-lab.org Tue Mar 22 05:38:57 2016 From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org) Date: Tue, 22 Mar 2016 17:08:57 +0530 Subject: [maker-devel] Document 2 Message-ID: -------------- next part -------------- A non-text attachment was scrubbed... Name: Document 2.zip Type: application/zip Size: 3095 bytes Desc: not available URL: From mmacd at udel.edu Tue Mar 22 10:33:42 2016 From: mmacd at udel.edu (Madolyn Macdonald) Date: Tue, 22 Mar 2016 12:33:42 -0400 Subject: [maker-devel] Question about Maker output Message-ID: Hello, My apologies if this has been described elsewhere, but I have not been able to find the answer to this question. After running fasta_merge on the Maker results, I get the fasta files which include all the gene annotations from all the different contigs in the assembly. 
In the transcript file, I get headers such as the two below:

maker-Contig206-snap-gene-3.11-mRNA-1
maker-Contig206-snap-gene-3.12-mRNA-1

I was wondering what the gene-X.XX portion of the header means. For instance, are 3.11 and 3.12 exons on the same gene, or are they two completely separate genes? If they are separate genes, why are they both still "gene 3"?

Thanks in advance!

--
Madolyn Stinner (formerly Madolyn MacDonald)
UDel Bioinformatics and Systems Biology, PhD student
RIT Alumnus 13'

From carsonhh at gmail.com Tue Mar 22 14:31:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Mar 2016 14:31:08 -0600
Subject: [maker-devel] maker-devel post from mmacd@udel.edu requires approval
In-Reply-To:
References:
Message-ID:

Hi Madolyn,

They are different genes because their IDs are different. The numbers are meaningless; they are just iterators to make sure the IDs are unique.

Thanks,
Carson

> From: Madolyn Macdonald
> Subject: Question about Maker output
> Date: March 22, 2016 at 10:33:42 AM MDT
> To: maker-devel at yandell-lab.org
>
>
> Hello,
>
> My apologies if this has been described elsewhere, but I have not been able to find the answer to this question.
>
> After running fasta_merge on the Maker results, I get the fasta files which include all the gene annotations from all the different contigs in the assembly. In the transcript file, I get headers such as the two below:
>
> maker-Contig206-snap-gene-3.11-mRNA-1
>
> maker-Contig206-snap-gene-3.12-mRNA-1
>
> I was wondering what the gene-X.XX portion of the header means, for instance are 3.11 and 3.12 exons on the same gene or are they two completely separate genes? If they are separate genes, what makes them still be both "gene 3"?
>
> Thanks in advance!
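For anyone who wants to pull such headers apart programmatically, here is a small Python sketch. It assumes the layout source-contig-predictor-gene-chunk.iterator, optionally followed by -mRNA-n, which matches the two headers above; the group names are my own labels, and the pattern is an illustration rather than anything MAKER ships.

```python
import re

# Assumed layout: <top>-<contig>-<internal>-gene-<chunk>.<iterator>[-mRNA-<n>]
# The contig part is matched greedily so contig names containing '-' still work.
ID_PATTERN = re.compile(
    r"^(?P<top>[^-]+)-(?P<contig>.+)-(?P<internal>[^-]+)-gene-"
    r"(?P<chunk>\d+)\.(?P<iterator>\d+)(?:-mRNA-(?P<mrna>\d+))?$"
)


def parse_maker_id(name):
    """Split a MAKER-style ID into its parts, or return None if it does not fit."""
    match = ID_PATTERN.match(name)
    return match.groupdict() if match else None


first = parse_maker_id("maker-Contig206-snap-gene-3.11-mRNA-1")
second = parse_maker_id("maker-Contig206-snap-gene-3.12-mRNA-1")
print(first)
print(second)
# The two IDs differ only in the iterator, so they name two different genes.
print(first["iterator"] != second["iterator"])
```

Because the chunk and iterator values exist only to keep IDs unique, they should be treated as opaque labels rather than as gene or exon numbers.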
>
>
> --
> Madolyn Stinner (formerly Madolyn MacDonald)
> UDel Bioinformatics and Systems Biology, PhD student
> RIT Alumnus 13'

From carson.holt at genetics.utah.edu Thu Mar 24 14:56:11 2016
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 24 Mar 2016 20:56:11 +0000
Subject: [maker-devel] question about Maker2
In-Reply-To:
References: <56F4066F.4000803@fgcz.ethz.ch>
Message-ID:

Hi Giancarlo,

Anything listed as something like maker-*-augustus was a result of MAKER sending hints to Augustus, and anything like augustus-*-abinit was the result of Augustus run directly from the HMM without hints.

Here is more detail on the format -->

<top level>-<contig>-<internal source>-gene-<chunk>.<iterator>

Top level possibilities:
maker            #maker generated model
snap_masked      #snap run on masked sequence
augustus_masked  #augustus run on masked sequence
etc.

Internal source:
abinit    #ab initio model direct from HMM
snap      #hints provided to SNAP (alters scoring)
augustus  #hints provided to augustus (alters scoring)

Then chunk and iterator are just there to generate a unique ID.

Example:
augustus_masked-scaffold11899-abinit-gene-0.6  #Produced by Augustus on masked sequence using raw HMM (no MAKER intervention).
maker-scaffold11899-augustus-gene-0.6  #Produced by MAKER sending hints to Augustus to modify scoring against the HMM

--Carson

> On 3/24/16, 9:23 AM, "giancarlo.russo" wrote:
>
>> Dear Mike,
>>
>> first of all thanks for taking care and sharing Maker, as part of the
>> community I appreciate it.
>>
>> I have a question about the nomenclature of the annotation in the output
>> file:
>> what is the difference between genes named
>>
>> maker-Contig-XXX
>> and those named
>> augustus-Contig-XXX-processed genes
>> ?
>>
>> Please find attached the maker_opts file I have used for my annotation.
>> I was under the impression that the ab-initio related prefixes would be
>> present only in the genes which are not marked as "maker" in column 3 of
>> the gff file (i.e., those
>> with both ab-initio and EST evidence)
>>
>> Is there something I am missing?
>>
>> Thanks a lot in advance,
>> Giancarlo
>>
>> --
>> Giancarlo Russo, Ph.D.
>> Functional Genomics Center Zurich
>> Y32 H66
>> Winterthurerstr. 190
>> 8057 Zurich
>> SWITZERLAND
>> Phone: +41 44 635 39 64
>> Fax: +41 44 635 39 22
>> E-Mail: giancarlo.russo at fgcz.ethz.ch
>>
>
>

From carsonhh at gmail.com Mon Mar 28 09:10:06 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Mar 2016 09:10:06 -0600
Subject: [maker-devel] Maker Execution Error
In-Reply-To:
References:
Message-ID: <007B0008-6BFD-4121-9F0D-56EA9B3A2B5A@gmail.com>

Hi Jackie,

From the INSTALL file included with MAKER -->

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER.
Example: mpiexec -mca btl ^openib -n 20 maker

Also the following -->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

The first one is the most likely.

Thanks,
Carson

> On Mar 28, 2016, at 8:38 AM, Atkins, Jacqueline (NIH/NIAID) [C] wrote:
>
> Hello,
>
> I have recently installed Maker on RHEL 7/ Perl-5.16.3. When I attempt to execute, I get the following error
>
> $ mpiexec -n 4 maker -help
>
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process.
Open MPI is currently > operating in a condition that could result in memory corruption or > other system errors; your MPI job may hang, crash, or produce silent > data corruption. The use of fork() (or system() or other calls that > create child processes) is strongly discouraged. > > The process that invoked fork was: > > Local host: submit (PID 316) > MPI_COMM_WORLD rank: 2 > > If you are *absolutely sure* that your application will successfully > and correctly survive a call to fork(), you may disable this warning > by setting the mpi_warn_on_fork MCA parameter to 0. > -------------------------------------------------------------------------- > [submit:122878] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork > [submit:122878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages > [submit bin]$ mpiexec --version > mpiexec (OpenRTE) 1.8.4 > > > I have a previous version of Maker installed that is using OpenMPI 1.3.3 and it is working fine. I was wondering if you think this might be related to the version of OpenMPI? > > Thank you in advance. > Jackie > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacqueline.atkins at nih.gov Mon Mar 28 08:38:39 2016 From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C]) Date: Mon, 28 Mar 2016 14:38:39 +0000 Subject: [maker-devel] Maker Execution Error Message-ID: Hello, I have recently installed Maker on RHEL 7/ Perl-5.16.3. When I attempt to execute, I get the following error $ mpiexec -n 4 maker -help An MPI process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged. 
The process that invoked fork was:

  Local host: submit (PID 316)
  MPI_COMM_WORLD rank: 2

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[submit:122878] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[submit:122878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[submit bin]$ mpiexec --version
mpiexec (OpenRTE) 1.8.4

I have a previous version of Maker installed that is using OpenMPI 1.3.3 and it is working fine. I was wondering if you think this might be related to the version of OpenMPI?

Thank you in advance.
Jackie
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From maker-devel at yandell-lab.org Tue Mar 29 06:46:22 2016
From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org)
Date: Tue, 29 Mar 2016 18:16:22 +0530
Subject: [maker-devel] CCE29032016_00053.tiff
Message-ID:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CCE29032016_00053.tiff
Type: application/zip
Size: 2665 bytes
Desc: not available
URL:
-------------- next part --------------
Sent from my iPhone

From dence at genetics.utah.edu Wed Mar 30 15:17:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 30 Mar 2016 21:17:38 +0000
Subject: [maker-devel] Maker example data for 2013 GMOD summer school
In-Reply-To:
References:
Message-ID: <1772AAA1-C6ED-4FCA-B4C9-39F522D3D076@genetics.utah.edu>

Hi Qihua,

I believe that most of the data we used in the tutorials are available in the maker/data directory, which is included in all MAKER distributions. Please let me know if that isn't the case.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Mar 30, 2016, at 3:10 PM, Qihua Liang wrote:
>
> Hi Michael and Daniel,
>
> I am a graduate student in UC Riverside, and recently I am learning to use Maker for genome annotation. I was trying to find some tutorials to follow and practice on example data, and I found out that you were giving a talk on Maker during 2013 GMOD summer school and the tutorial of that is very detailed. Nice job!
>
> But example data under the folder you mentioned as ./maker/maker_course is not provided on the website and I am wondering if they are available to the public or not. If yes, could you send me those materials so that I could follow your tutorial to practice using Maker?
>
> Thank you
> Best
> Qihua

From ereboperezsilva at gmail.com Thu Mar 31 06:57:47 2016
From: ereboperezsilva at gmail.com (=?UTF-8?B?Sm9zw6kgTcKqIEcuIFBlcmV6LVNpbHZh?=)
Date: Thu, 31 Mar 2016 14:57:47 +0200
Subject: [maker-devel] Question about Maker2
Message-ID:

Hello,

We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole genome (2.2Gb) ab-initio annotation.

In a month we have annotated nearly half of the genome (let's say around 40% of it).
I'd like to know how much time it takes to annotate a complete genome for the first time, and under which technical specifications (processors, memory, ...).
Is the second round of annotations (in which we use the results from the first round as extra data) faster?

Thank you in advance.

---

Jose Maria G. Perez-Silva.
Departamento de Biologia Molecular y Bioquimica.
Universidad de Oviedo.
Spain.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Thu Mar 31 11:35:36 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 31 Mar 2016 17:35:36 +0000
Subject: [maker-devel] Question about Maker2
In-Reply-To:
References:
Message-ID:

Hi Jose,

The time it takes MAKER to annotate a genome depends greatly on the hardware setup (as you pointed out: processors, memory, etc.) as well as the size of the genome and the size and type of the datasets you use to annotate the genome (numerous RNAseq datasets, for example, will take longer than a project without any RNAseq data).

However, the MPI parallelization implemented in MAKER guarantees that the runtime should scale linearly with the number of processors allotted to the MAKER run. This is explained in the MAKER2 paper (Holt and Yandell), which I'm going to quote:

    MAKER2 was used to annotate a 10 megabase section of the C. elegans genome
    (NGASP dataset). The algorithm was parallelized using MPI on an increasing number
    of CPU cores. The results demonstrate how MAKER2 scales almost linearly with
    CPU number (with a slope of near 1). If we project our results forward to the entire C.
    elegans genome (~100 megabases), MAKER2 should take under 10 hours on 32
    CPUs to complete; similarly, the human genome (~3 gigabases) would require fewer
    than 24 hours on 400 CPUs

I'm also not sure what you mean by the first run taking less time than the second run. By the first run do you mean running with est2genome turned on to create models for training ab-initio predictors? In that case, I would guess that the second run would take longer, but it shouldn't be too big of a difference.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Mar 31, 2016, at 6:57 AM, José Mª G. Perez-Silva wrote:

Hello,

We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole genome (2.2Gb) ab-initio annotation.

In a month we have annotated nearly half of the genome (let's say around 40% of it).
I'd like to know how much time it takes to annotate a complete genome for the first time, and under which technical specifications (processors, memory, ...).
Is the second round of annotations (in which we use the results from the first round as extra data) faster?

Thank you in advance.

---

Jose Maria G. Perez-Silva.
Departamento de Biologia Molecular y Bioquimica.
Universidad de Oviedo.
Spain.
_______________________________________________
maker-devel mailing list
maker-devel at yandell-lab.org
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Mar 31 11:38:14 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 31 Mar 2016 11:38:14 -0600
Subject: [maker-devel] Question about Maker2
In-Reply-To:
References:
Message-ID: <7980702B-AE01-40A8-A903-B1DE8EE3CCC4@gmail.com>

If you provide all evidence on the first run, the second run will be faster because MAKER will be able to reuse alignments from the previous run. Since 90% of runtime is BLAST, being able to just reuse the BLAST reports really improves runtime.

--Carson

> On Mar 31, 2016, at 11:35 AM, Daniel Ence wrote:
>
> Hi Jose, the time it takes maker to annotate a genome depends greatly on the hardware setup (as you pointed out, processors, memory, etc) as well as the size of the genome and the size and type of the datasets you use to annotate the genome (numerous RNAseq datasets for example will take longer than a project without any RNAseq data).
>
> However, the MPI parallelization implemented in MAKER guarantees that the runtime should scale linearly with the number of processors allotted to the MAKER run.
> This is explained in the MAKER2 paper (Holt and Yandell), which I'm going to quote:
> MAKER2 was used to annotate a 10 megabase section of the C. elegans genome
> (NGASP dataset). The algorithm was parallelized using MPI on an increasing number
> of CPU cores. The results demonstrate how MAKER2 scales almost linearly with
> CPU number (with a slope of near 1). If we project our results forward to the entire C.
> elegans genome (~100 megabases), MAKER2 should take under 10 hours on 32
> CPUs to complete; similarly, the human genome (~3 gigabases) would require fewer
> than 24 hours on 400 CPUs
>
> I'm also not sure what you mean by the first run taking less time than the second run. By the first run do you mean running with est2genome turned on to create models for training ab-initio predictors? In that case, I would guess that the second run would take longer, but it shouldn't be too big of a difference.
>
> ~Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Mar 31, 2016, at 6:57 AM, José Mª G. Perez-Silva wrote:
>>
>> Hello,
>>
>> We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole genome (2.2Gb) ab-initio annotation.
>>
>> In a month we have annotated nearly half of the genome (let's say around 40% of it).
>> I'd like to know how much time it takes to annotate a complete genome for the first time, and under which technical specifications (processors, memory, ...).
>> Is the second round of annotations (in which we use the results from the first round as extra data) faster?
>>
>> Thank you in advance.
>>
>> ---
>>
>> Jose Maria G. Perez-Silva.
>> Departamento de Biologia Molecular y Bioquimica.
>> Universidad de Oviedo.
>> Spain.
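
The projection quoted above from the MAKER2 paper can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: it assumes the near-linear scaling described in the quote holds, treats the quoted figures (which are upper bounds) as exact, and uses a hypothetical 64-core allocation for the 2.2 Gb genome from the original question. The function names are made up for this example and are not part of MAKER.

```python
def cpu_hours_per_megabase(hours, cpus, genome_mb):
    """CPU-hours needed per megabase, derived from one quoted run."""
    return hours * cpus / genome_mb


def projected_hours(genome_mb, cpus, rate):
    """Wall-clock hours for a genome, assuming runtime scales linearly with CPUs."""
    return genome_mb * rate / cpus


# Figures quoted from the MAKER2 paper (both are upper bounds).
worm_rate = cpu_hours_per_megabase(hours=10, cpus=32, genome_mb=100)
human_rate = cpu_hours_per_megabase(hours=24, cpus=400, genome_mb=3000)

# Hypothetical case: the 2.2 Gb genome from the question, run on 64 cores.
estimate = projected_hours(genome_mb=2200, cpus=64, rate=worm_rate)

print(f"C. elegans rate: {worm_rate:.1f} CPU-hours/Mb")
print(f"Human rate:      {human_rate:.1f} CPU-hours/Mb")
print(f"2.2 Gb genome on 64 cores: about {estimate:.0f} hours")
```

Both quoted projections work out to about 3.2 CPU-hours per megabase, which suggests on the order of 110 hours for 2.2 Gb on 64 cores. Real runs will differ with hardware and with the size and type of the evidence datasets, as noted in the thread.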
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at yandell-lab.org
>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chankl at mpob.gov.my Tue Mar 1 00:45:46 2016
From: chankl at mpob.gov.my (Chan Kuang Lim)
Date: Tue, 1 Mar 2016 15:45:46 +0800 (MYT)
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <1064605078.11733402.1456818000393.JavaMail.root@mpob.gov.my>
Message-ID: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>

Dear MAKER developers,

I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?

Thank you.

Regards,
Chan KL

Come and join us on:
Journal of Oil Palm Research is now available free online at http://jopr.mpob.gov.my
22nd MPOB Transfer of Technology Seminar 2016 (2 June 2016)
Persidangan Pekebun Kecil Sawit Kebangsaan 2016 (11 - 12 Oktober 2016)
Malaysian Palm Oil Board - http://www.mpob.gov.my
This email was sent using MPOB Webmail System.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Wed Mar 2 10:13:30 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 2 Mar 2016 17:13:30 +0000
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>
References: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>
Message-ID: <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>

Hi Chan,

Fgenesh is a gene predictor that requires users to purchase parameter files from their company: http://www.softberry.com/. If you didn't give a Fgenesh file, then you won't get any predictions.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Mar 1, 2016, at 12:45 AM, Chan Kuang Lim wrote:

Dear MAKER developers,

I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?

Thank you.

Regards,
Chan KL
________________________________
Come and join us on:
[http://webmail.mpob.gov.my:8080/image-footer/pipoc17.jpg]
1. Journal of Oil Palm Research is now available free online at http://jopr.mpob.gov.my
2. 22nd MPOB Transfer of Technology Seminar 2016 (2 June 2016)
3. Persidangan Pekebun Kecil Sawit Kebangsaan 2016 (11 - 12 Oktober 2016)
[http://webmail.mpob.gov.my:8080/image-footer/facebook-logo.jpg]
Malaysian Palm Oil Board - http://www.mpob.gov.my
This email was sent using MPOB Webmail System.
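
Daniel's point above, that no Fgenesh predictions appear unless a parameter file was supplied, can be checked before starting a run. The following is a minimal sketch, not part of MAKER: it assumes the control file uses `key=value #comment` lines and that the relevant option is named `fgenesh_par_file`, as in the maker_opts.ctl generated by MAKER 2.31. The helper names and the example file contents (`my_species.hmm`, `my_species`) are hypothetical.

```python
def parse_ctl(text):
    """Parse 'key=value #comment' lines into a dict (assumed MAKER ctl layout)."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def fgenesh_configured(options):
    """True only if a non-empty Fgenesh parameter file was given."""
    return bool(options.get("fgenesh_par_file"))


# Hypothetical excerpt of a control file with Fgenesh left unset.
example = """\
#-----Gene Prediction
snaphmm=my_species.hmm #SNAP HMM file
augustus_species=my_species #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
"""

opts = parse_ctl(example)
print("Fgenesh parameter file set:", fgenesh_configured(opts))
```

In practice one would read the text from the actual control file, e.g. `parse_ctl(open("maker_opts.ctl").read())`. An empty value here would be consistent with a run that finishes cleanly but reports no Fgenesh genes.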
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Mar 2 10:36:04 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 2 Mar 2016 10:36:04 -0700
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>
References: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my> <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>
Message-ID: <333D3A3A-49BC-42ED-87F7-053AA46CC1F3@gmail.com>

Also there is the chance that FgenesH has changed formats slightly for their output (it's happened a couple of times before), so if you are already running with a parameter file you purchased, that could be the issue. Look at the STDERR report MAKER produces to see if FgenesH even ran and with what command.

--Carson

> On Mar 2, 2016, at 10:13 AM, Daniel Ence wrote:
>
> Hi Chan, Fgenesh is a gene predictor that requires users to purchase parameter files from their company: http://www.softberry.com/ . If you didn't give a Fgenesh file, then you won't get any predictions.
>
> ~Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Mar 1, 2016, at 12:45 AM, Chan Kuang Lim wrote:
>>
>> Dear MAKER developers,
>>
>> I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?
>>
>> Thank you.
>>
>> Regards,
>> Chan KL
>>
>> Come and join us on:
>>
>> Journal of Oil Palm Research is now available free online at http://jopr.mpob.gov.my
>> 22nd MPOB Transfer of Technology Seminar 2016 (2 June 2016)
>> Persidangan Pekebun Kecil Sawit Kebangsaan 2016 (11 - 12 Oktober 2016)
>>
>> Malaysian Palm Oil Board - http://www.mpob.gov.my
>> This email was sent using MPOB Webmail System.
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From fdolze at students.uni-mainz.de Thu Mar 3 04:01:06 2016
From: fdolze at students.uni-mainz.de (Florian)
Date: Thu, 3 Mar 2016 12:01:06 +0100
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>
References: <56D05E2A.1040201@students.uni-mainz.de> <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com>
Message-ID: <56D81972.7000002@students.uni-mainz.de>

Hello Carson,

May I ask on what kind of hardware setup you guys are running MAKER?

I can't seem to get this to run efficiently on our cluster. There are usually only 2-3 cores running at 100% and the rest are idle, waiting (I THINK due to I/O blockage but I'm not sure). Any ideas how I could find the cause of this problem?

I attached a screenshot of the node status for the first hour of the last MAKER run if this is any help.

On 29.02.2016 20:09, Carson Holt wrote:
> You can try setting TMP= in the control files to a RAM disk location (You will need a lot of RAM though, perhaps 500 GB). Even then some components used by MAKER may not function properly with tmpfs, but you can try.
> If it doesn't work you'll get an error. The main output directory on the other hand must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exist on multiple nodes, they will actually be separate and distinct locations, i.e. /dev/shm).
>
> --Carson
>
>
>> On Feb 26, 2016, at 7:16 AM, Florian wrote:
>>
>> Hi all,
>>
>> I am trying to run maker on a cluster (2 nodes with 64 cores each), to speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd.
>>
>> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (are they actually used for the current run or are only files in the temp files location used?)
>>
>> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this?
>>
>>
>> thanks,
>> Florian
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot from 2016-03-03 11:35:41.png
Type: image/png
Size: 149996 bytes
Desc: not available
URL:

From jacqueline.atkins at nih.gov Thu Mar 3 11:54:19 2016
From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C])
Date: Thu, 3 Mar 2016 18:54:19 +0000
Subject: [maker-devel] Maker Installation Questions
Message-ID:

Good Afternoon,

I am a Systems Engineer who is attempting to install and configure maker for a user. From what I can tell, database support is optional and maker can be used without a backend database. Please confirm that this is the case.

Also, could you provide any examples of how I might be able to test the functionality of the maker installation?

Thank you in advance.

Jackie Atkins
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jacqueline.atkins at nih.gov Thu Mar 3 14:37:30 2016
From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C])
Date: Thu, 3 Mar 2016 21:37:30 +0000
Subject: [maker-devel] Maker Install Issue
Message-ID:

Good Afternoon,

I have installed Maker v 2.31.8 on RHEL 6, perl 5.16

When I attempt to execute mpi_iprscan, I get the following error:
Can't locate Parallel/MPIcar.pm

If you could advise how I might be able to resolve this issue, it would be greatly appreciated.

Thank you.

Jacqueline Atkins, Contractor
Sr. HPC Engineer
National Institute of Allergy and Infectious Diseases
SRA International Inc., A CSRA Company
office 301-451-9644, mobile 301-767- 7110
5601 Fishers Lane, 6A60, Bethesda, MD 20852
Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Mar 3 14:54:54 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 3 Mar 2016 14:54:54 -0700
Subject: [maker-devel] Maker Install Issue
In-Reply-To:
References:
Message-ID:

Hi Jacqueline,

mpi_iprscan and mpi_evaluator are accessory scripts made for a very specific system and purpose (development related). They are not a core part of the MAKER pipeline, are undocumented, and should be ignored.
The script you use to run MAKER is .../maker/bin/maker

It is MPI enabled, and you can call it directly or via mpiexec.

Thanks,
Carson

> On Mar 3, 2016, at 2:37 PM, Atkins, Jacqueline (NIH/NIAID) [C] wrote:
>
> Good Afternoon,
>
> I have installed Maker v 2.31.8 on RHEL 6, perl 5.16
>
> When I attempt to execute mpi_iprscan, I get the following error:
> Can't locate Parallel/MPIcar.pm
>
> If you could advise how I might be able to resolve this issue, it would be greatly appreciated.
>
> Thank you.
>
> Jacqueline Atkins, Contractor
> Sr. HPC Engineer
> National Institute of Allergy and Infectious Diseases
> SRA International Inc., A CSRA Company
> office 301-451-9644, mobile 301-767- 7110
> 5601 Fishers Lane, 6A60, Bethesda, MD 20852
> Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives.
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Mar 3 22:42:07 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 3 Mar 2016 22:42:07 -0700
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <56D81972.7000002@students.uni-mainz.de>
References: <56D05E2A.1040201@students.uni-mainz.de> <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com> <56D81972.7000002@students.uni-mainz.de>
Message-ID:

We run on a standard cluster.
We have traditional NFS as well as more advanced Lustre options for shared storage. Each node has both locally mounted disk and in-memory storage available (I never use the in-memory storage though because MAKER requires a lot of temporary storage). I run using OpenMPI (it scales better than MPICH2 - also MAKER is incompatible with MVAPICH2 because of a known registered memory defect in that MPI flavor). We use the SLURM scheduler although previously we had PBS. I usually run job sizes of between 100 and 200 CPU cores (10 to 20 nodes). We have mixed node types of 12, 16, 20, and 24 core nodes. I always set TMP= to a locally mounted disk (never NFS or RAM disk). The working directory is always NFS or Lustre.

I've also run under a similar configuration on the TACC and XSEDE clusters (https://www.xsede.org). They use SLURM and previously SGE for their scheduler. I've been able to run on 600 plus CPU cores per job there, but I get better efficiency with multiple jobs at ~200 CPU cores (communication overhead gets too high for a single root process to handle effectively above 200 cores).

MAKER will need ~2 GB of RAM for every core you give it with MPI.

--Carson

> On Mar 3, 2016, at 4:01 AM, Florian wrote:
>
> Hello Carson,
>
> May I ask on what kind of hardware setup you guys are running MAKER?
>
> I can't seem to get this to run efficiently on our cluster. There are usually only 2-3 cores running at 100% and the rest are idle, waiting (I THINK due to I/O blockage but I'm not sure). Any ideas how I could find the cause of this problem?
>
> I attached a screenshot of the node status for the first hour of the last MAKER run if this is any help.
>
> On 29.02.2016 20:09, Carson Holt wrote:
>> You can try setting TMP= in the control files to a RAM disk location (You will need a lot of RAM though, perhaps 500 GB). Even then some components used by MAKER may not function properly with tmpfs, but you can try. If it doesn't work you'll get an error.
>> The main output directory on the other hand must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exist on multiple nodes, they will actually be separate and distinct locations, i.e. /dev/shm).
>>
>> --Carson
>>
>>
>>> On Feb 26, 2016, at 7:16 AM, Florian wrote:
>>>
>>> Hi all,
>>>
>>> I am trying to run maker on a cluster (2 nodes with 64 cores each), to speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd.
>>>
>>> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (are they actually used for the current run or are only files in the temp files location used?)
>>>
>>> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this?
>>>
>>>
>>> thanks,
>>> Florian
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chenwenbo1020 at gmail.com Sat Mar 5 19:10:24 2016
From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=)
Date: Sat, 5 Mar 2016 21:10:24 -0500
Subject: [maker-devel] ERROR: RepeatMasker failed
Message-ID:

Hi All,

I have run Maker (v2.31.8) successfully. I have now updated RepeatMasker to v4.0.6, and I get this error:

RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib.
ERROR: RepeatMasker failed
--> rank=4, hostname=hostname
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold149

RepeatMasker was correctly installed. Should I update Maker to V3.0?

Thank you!

Best regards,
Wenbo
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mcsimenc at gmail.com Sun Mar 6 09:48:36 2016
From: mcsimenc at gmail.com (Matt Simenc)
Date: Sun, 6 Mar 2016 08:48:36 -0800
Subject: [maker-devel] Custom Repeat Library: ProtExcluder.pl help
Message-ID:

I am working on creating a custom repeat library. I want to use the ProtExcluder.pl script, found on the maker wiki at

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Basic

to trim out possible gene sequences from the default RepeatModeler output when run on my genome. I'm getting some errors and output in which no sequences are removed from my RepeatModeler library and am wondering if anyone has experience with this script and can help me understand the errors.

I am feeding ProtExcluder.pl a FASTA file from RepeatModeler and blastx output (default output, blast 2.2.31+) like:

ProtExcluder.pl blast_output repeat_fasta 1>stdout 2>stderr

- I get an output file repeat_fastanoProtFinal that contains exactly the same sequences as the input repeat_fasta.

- stderr has these errors:

Can't exec "binaries/esl-sfetch": No such file or directory at /share/apps/genomics/ProtExcluder1.1/mspesl-sfetch.pl line 17.

Can not open the seqfile /home/joshd/data/azolla/blasts/repeats/RepeatModeler.celera_blastx_PT-1.1-orthofinder/AzlRptMdlrLib.celera_blastx_PT-1.1-orthofinder_1e-5.fnolowm50seq

mergeunmatchedregion.pl seqfile

Illegal division by zero at /share/apps/genomics/ProtExcluder1.1/GCcontent.pl line 122.

ProtExcluder.pl created a bunch of files in the directory where it unsuccessfully tries to access the fnolow50seq file, which does not exist, though there are files whose names have the suffix fnolow50seqm, fnolow50seqmGC, and fnolow50seqmns.

Any help would be appreciated! I could write a script to do this but would rather use an already debugged one to save time. Thanks!

Matt Simenc
Der Evolutionary Genomics Lab
California State University, Fullerton
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Sun Mar 6 13:13:24 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 6 Mar 2016 13:13:24 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To:
References:
Message-ID: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>

Hi Wenbo,

The error is from RepeatMasker and not MAKER. It means that RepeatMasker is not installed and configured correctly. You will have to fix whatever is wrong with your installation, and then make sure you can get RepeatMasker to run correctly by itself before running it inside of MAKER (i.e. run RepeatMasker directly on some test data).

Thanks,
Carson

> On Mar 5, 2016, at 7:10 PM, 陈文博 wrote:
>
> Hi All,
>
> I have run Maker (v2.31.8) successfully. I have now updated RepeatMasker to v4.0.6, and I get this error:
>
> RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib.
> ERROR: RepeatMasker failed
> --> rank=4, hostname=hostname
> ERROR: Failed while doing repeat masking
> ERROR: Chunk failed at level:0, tier_type:1
> FAILED CONTIG:scaffold149
>
>
> RepeatMasker was correctly installed. Should I update Maker to V3.0?
>
> Thank you!
> > Best regards, > Wenbo > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From jason.stajich at gmail.com Sun Mar 6 15:04:14 2016 From: jason.stajich at gmail.com (Jason Stajich) Date: Sun, 06 Mar 2016 22:04:14 +0000 Subject: [maker-devel] Custom Repeat Library: ProtExcluder.pl help In-Reply-To: References: Message-ID: Did you install hmmer3 ? need that to get esl-sfetch not sure how you configured the paths when you run this. Jason On Sun, Mar 6, 2016 at 8:48 AM Matt Simenc wrote: > I am working on creating a custom repeat library. I want to use the > ProtExcluder.pl script, found on the maker wiki at > > > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Basic > > to trim out possible gene sequences from the default RepeatModeler output > when run on my genome. I'm getting some errors and output in which no > sequences are removed from my RepeatModeler library and am wondering if you > anyone has experience with this script and can help me understand the > errors. > > I am feeding ProtExcluder.pl a FASTA file from RepeatModeler and blastx > output (default output,blast 2.2.31+) like: > > ProtExcluder.pl blast_output repeat_fasta 1>stdout 2>stderr > > - I get an output file repeat_fastanoProtFinal that contains exactly the > same sequences as the input repeat_fasta. > > - stderr has these errors: > > Can't exec "binaries/esl-sfetch": No such file or directory at > /share/apps/genomics/ProtExcluder1.1/mspesl-sfetch.pl line 17. > > Can not open the seqfile > /home/joshd/data/azolla/blasts/repeats/RepeatModeler.celera_blastx_PT-1.1-orthofinder/AzlRptMdlrLib.celera_blastx_PT-1.1-orthofinder_1e-5.fnolowm50seq > > mergeunmatchedregion.pl seqfile > > Illegal division by zero at > /share/apps/genomics/ProtExcluder1.1/GCcontent.pl line 122. 
> > ProtExcluder.pl created a bunch of files in the directory where it is > trying to unsuccessfully access the fnolow50seq file, which does not exist, > though there are files whose names have the suffix fnolow50seqm, > fnolow50seqmGC, and fnolow50seqmns. > > Any help would be appreciated! I could write a script to do this but would > rather use an already debugged one to save time. Thanks! > > Matt Simenc > Der Evolutionary Genomics Lab > California State University, Fullerton > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenwenbo1020 at gmail.com Mon Mar 7 13:26:19 2016 From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=) Date: Mon, 7 Mar 2016 15:26:19 -0500 Subject: [maker-devel] ERROR: RepeatMasker failed In-Reply-To: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> Message-ID: Hi Carson, Thank you for your reply. I installed RepeatMasker following the Installation in their website, and got these information below. ============================= Congratulations! RepeatMasker is now ready to use. The program is installed with a full version of the repeat library: DFAM Library Version = Dfam_2.0 RMLibrary Version = 20150807 Repbase Version = 20150807 ============================= I run RepeatMasker directly on one scaffold, and got no error. So I am still confused by the error given by MAKER. Thank you! Best, Wenbo 2016-03-06 15:13 GMT-05:00 Carson Holt : > Hi Wenbo, > > The error is from RepeatMasker and not MAKER. It means that RepeatMasker > is not installed and configured correctly. You will have to fix whatever > is wrong with your installation, and then make sure you can get > RepeatMasker to run correctly by itself before running it inside of MAKER > (i.e. 
run RepeatMasker directly on some test data). > > Thanks, > Carson > > > > On Mar 5, 2016, at 7:10 PM, ??? wrote: > > > > Hi All, > > > > I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to > v4.0.6. then I came with this error: > > > > RepeatMasker::createLib(): Error invoking > /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file > /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib. > > ERROR: RepeatMasker failed > > --> rank=4, hostname=hostname > > ERROR: Failed while doing repeat masking > > ERROR: Chunk failed at level:0, tier_type:1 > > FAILED CONTIG:scaffold149 > > > > > > The RepeatMasker was corrected installed. Should I update Maker to V3.0? > > > > Thank you! > > > > Best regards, > > Wenbo > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Mar 7 14:01:38 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 7 Mar 2016 14:01:38 -0700 Subject: [maker-devel] ERROR: RepeatMasker failed In-Reply-To: References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> Message-ID: <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> Make sure you use the same library you are giving it with MAKER. You can also look at MAKER?s STDERR to see exactly what command MAKER was using to run RepeatMasker. This error ?> "RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file" It?s not from MAKER. RepeatMasker is printing that error and then failing. ?Carson > On Mar 7, 2016, at 1:26 PM, ??? wrote: > > Hi Carson, > > Thank you for your reply. I installed RepeatMasker following the Installation in their website, and got these information below. > > ============================= > Congratulations! 
RepeatMasker is now ready to use. > The program is installed with a full version of the repeat library: > DFAM Library Version = Dfam_2.0 > RMLibrary Version = 20150807 > Repbase Version = 20150807 > ============================= > > I run RepeatMasker directly on one scaffold, and got no error. So I am still confused by the error given by MAKER. > > Thank you! > > Best, > Wenbo > > 2016-03-06 15:13 GMT-05:00 Carson Holt >: > Hi Wenbo, > > The error is from RepeatMasker and not MAKER. It means that RepeatMasker is not installed and configured correctly. You will have to fix whatever is wrong with your installation, and then make sure you can get RepeatMasker to run correctly by itself before running it inside of MAKER (i.e. run RepeatMasker directly on some test data). > > Thanks, > Carson > > > > On Mar 5, 2016, at 7:10 PM, ??? > wrote: > > > > Hi All, > > > > I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to v4.0.6. then I came with this error: > > > > RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib. > > ERROR: RepeatMasker failed > > --> rank=4, hostname=hostname > > ERROR: Failed while doing repeat masking > > ERROR: Chunk failed at level:0, tier_type:1 > > FAILED CONTIG:scaffold149 > > > > > > The RepeatMasker was corrected installed. Should I update Maker to V3.0? > > > > Thank you! > > > > Best regards, > > Wenbo > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... 
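For anyone hitting the same "RepeatMasker::createLib(): Error invoking .../makeblastdb on file ..." message, a rough pre-flight check along the following lines may help narrow things down before rerunning MAKER. This is only a sketch under stated assumptions: the function name diagnose_createlib is hypothetical (it is not part of RepeatMasker or MAKER), the two paths are the ones quoted in the error message in this thread, and the directory check assumes that makeblastdb writes its index files next to the input file.

```python
import os


def diagnose_createlib(makeblastdb_path, library_file):
    """Return a list of readable problems; an empty list means nothing obvious."""
    problems = []

    # The helper binary named in the error must exist and be executable.
    if not os.path.isfile(makeblastdb_path):
        problems.append(f"makeblastdb not found: {makeblastdb_path}")
    elif not os.access(makeblastdb_path, os.X_OK):
        problems.append(f"makeblastdb not executable: {makeblastdb_path}")

    # The library file it is invoked on must exist, be readable and be non-empty.
    if not os.path.isfile(library_file):
        problems.append(f"library file not found: {library_file}")
    elif not os.access(library_file, os.R_OK):
        problems.append(f"library file not readable: {library_file}")
    elif os.path.getsize(library_file) == 0:
        problems.append(f"library file is empty: {library_file}")

    # Assumption: index files are written beside the input file, so the
    # containing directory has to be writable by the user running the job.
    lib_dir = os.path.dirname(os.path.abspath(library_file))
    if os.path.isdir(lib_dir) and not os.access(lib_dir, os.W_OK):
        problems.append(f"library directory not writable: {lib_dir}")

    return problems


if __name__ == "__main__":
    # Paths copied from the error message quoted in this thread.
    for problem in diagnose_createlib(
        "/home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb",
        "/home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib",
    ):
        print(problem)
```

An empty result does not prove the installation is fine; it only rules out the most basic causes. Running it as the same user, and on the same node, that runs the MAKER job matters, since permissions can differ between accounts and hosts.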
From carsonhh at gmail.com Mon Mar 7 14:54:10 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 7 Mar 2016 14:54:10 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com>
Message-ID: <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>

RepeatMasker doesn't actually finish installing until after you run it at least once with the RepBase libraries (i.e. the first job with RepBase). During its very first run it builds a bunch of needed library files under .../RepeatMasker/Libraries/ or sometimes under ~/.RepeatMaskerCache/. The failure message you get says that it can't build those files (which is a RepeatMasker error, not a MAKER error). So RepeatMasker is either installed or configured incorrectly.

--Carson

From chenwenbo1020 at gmail.com Tue Mar 8 13:22:14 2016
From: chenwenbo1020 at gmail.com (陈文博)
Date: Tue, 8 Mar 2016 15:22:14 -0500
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>
Message-ID:

Hi Carson,

Thank you! I re-installed RepeatMasker and ran it with "-species all" outside of MAKER. It finished successfully. Then I ran MAKER, and there was no error. I am curious why RepeatMasker could not build these library files when it was run inside MAKER.

Thanks!

Best,
Wenbo

From carsonhh at gmail.com Tue Mar 8 13:25:37 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 Mar 2016 13:25:37 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To:
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>
Message-ID: <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>

The issue is unrelated to MAKER. Likely something happened during your initial configuration that resulted in a partial file, perhaps when you unpackaged RepBase. Whether you ran it inside or outside of MAKER was not the issue.

--Carson
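The "partial file" explanation given in this thread can be checked roughly with a scan like the one below. It is a sketch only: scan_library_dir is a hypothetical helper, not part of RepeatMasker, and it assumes that a partial file shows up as a zero-byte file, which is just one way an interrupted unpack could look. A truncated but non-empty file would not be caught.

```python
import os


def scan_library_dir(lib_dir):
    """Report whether lib_dir is writable and list any zero-byte files under it."""
    report = {"writable": os.access(lib_dir, os.W_OK), "empty_files": []}
    for root, _dirs, files in os.walk(lib_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getsize(path) == 0:
                    report["empty_files"].append(path)
            except OSError:
                # Broken symlinks or files removed mid-scan are skipped.
                continue
    report["empty_files"].sort()
    return report


if __name__ == "__main__":
    # The directory below is taken from the error message in this thread;
    # point it at your own RepeatMasker Libraries directory instead.
    print(scan_library_dir("/home/chenwb/programs/RepeatMasker/Libraries"))
```

If "writable" comes back False for the account that runs the jobs, RepeatMasker would presumably be unable to build its library files on first use, which matches the symptom described above.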
From jason.stajich at gmail.com Tue Mar 8 13:39:59 2016
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 08 Mar 2016 20:39:59 +0000
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com> <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>
Message-ID:

I think that may be about permissions for creating the "all" file in your RepeatMasker library folder. You may want to look at the write permissions there.

From meesters at uni-mainz.de Thu Mar 10 07:53:43 2016
From: meesters at uni-mainz.de (Christian Meesters)
Date: Thu, 10 Mar 2016 15:53:43 +0100
Subject: [maker-devel] maker low cpu utilization
Message-ID: <56E18A77.5030509@uni-mainz.de>

Dear maker-developers,

As a computational scientist on our local HPC team, I recently installed maker and its tools.

We encountered a most peculiar problem: distributed over 2 nodes, 64 cores each (AMD OPT6272 "bulldozer"), all started processes take up ~20 % of the possible CPU whilst the nodes show a full load of processes. Amongst this 20 % there is some system overhead (~4%).

We then wrote a little wrapper / submission script, such that the ctl files were altered and all reference input is copied onto ramdisks (each node provides the same path, so there are 2 copies of each reference file, prior to starting maker). Still no change: IO is not a bottleneck here.

I then wanted to trace individual PIDs, but they are frequently changing. However, I saw > 170 instances of ps running concurrently and the same number of 'sh'.

Only augustus showed about 100% CPU usage; all others (except maker itself) showed lower usage.

Have you ever experienced something similar and could perhaps provide a pointer to the cause? Could this perhaps be related to the nature of the input data (can some input data cause frequent switches of processes and therefore OS scheduler overhead)?

Thanks a lot in advance,
Best regards,
Christian Meesters

--
****************************************

Dr. Christian Meesters
Johannes Gutenberg-Universität Mainz
Zentrum für Datenverarbeitung
Anselm-Franz-von-Bentzelweg 12
55099 Mainz

tel. +49 (0)6131 39 26397

****************************************

From dence at genetics.utah.edu Thu Mar 10 11:22:54 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 10 Mar 2016 18:22:54 +0000
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <56E18A77.5030509@uni-mainz.de>
References: <56E18A77.5030509@uni-mainz.de>
Message-ID: <6683A317-2DB7-4CE0-86A1-A8C7CB0931CC@genetics.utah.edu>

Hi Christian,

I think what you have described is normal behavior for MAKER. It spawns many child processes, most of which complete very quickly. What dataset were you running with MAKER? Did it complete successfully?

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Thu Mar 10 11:34:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Mar 2016 11:34:59 -0700
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <56E18A77.5030509@uni-mainz.de>
References: <56E18A77.5030509@uni-mainz.de>
Message-ID: <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>

The 'ps' calls should run at startup (they are checking the MPI configuration before MAKER connects to the communication ring and will generate somewhat informative errors for common misconfigurations when users run MAKER with MPI). Because it is one per process (MAKER is not yet connected to MPI at this point) and you have so many CPUs on a single node, it may delay startup by a few seconds, but that's it. Once MAKER gets into the actual run, you won't see those processes again.

If it bothers you there is an alternative to have MAKER query the process table programmatically rather than via 'ps' (it's not the default because it works on fewer architectures but should work on AMD). To do the workaround, you will need to install Proc::ProcessTable from CPAN, then replace .../maker/lib/Proc/ProcessTable_simple.pm and .../maker/lib/Proc/Signal.pm with the attached alternate files.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ProcessTable_simple.pm_alt
Type: application/octet-stream
Size: 2864 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Signal.pm_alt
Type: application/octet-stream
Size: 6703 bytes
Desc: not available
URL:

From carsonhh at gmail.com Thu Mar 10 13:56:57 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Mar 2016 13:56:57 -0700
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>
References: <56E18A77.5030509@uni-mainz.de> <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>
Message-ID: <9E6397A7-FD1F-44ED-9230-479B89DC1092@gmail.com>

Also the 'maker' processes should rarely use very much CPU. All they do is shepherd data between processes like augustus, snap, blast, and exonerate (there are some short intermediate processing steps, but the external tools are the workhorses). So each 'maker' process is usually just waiting for external tools to complete. What MAKER does is divide the input data into reasonable chunks, so that there will always be a blast, snap, or augustus process running somewhere to keep all CPUs busy. If the structure of the actual input data is odd compared to the typical genome project input then there could hypothetically be a situation where not enough reasonable task chunks can be made to keep all CPUs busy.
I'd really have to see your data if you think that is the issue.

MAKER has the following points of parallelization:

1. Every contig goes to a separate thread.
2. Large contigs are split into overlapping pieces that go into separate threads (determined using the max_dna_len= parameter, with the default being 100,000 bp).
3. BLAST databases for input evidence are split into 10 pieces (so BLAST analyses are split by 10).
4. Ab initio gene prediction on large contigs is split into overlapping sections of 10 megabases each.

So unless you have a small dataset that can't be split by any of the above parameters, it should be able to parallelize.

Also, if your assembly contains primarily short contigs and you set min_contig such that the root process spends most of its time skipping contigs and less time distributing them for other processes to analyze, then that could create an apparent slowdown. I have had that happen on a couple of assemblies that had > 2 million contigs, but only ~10,000 were usable. By filtering small contigs out of the assembly, you can get around that last issue.

--Carson

From chenwenbo1020 at gmail.com Sun Mar 13 20:22:53 2016
From: chenwenbo1020 at gmail.com (陈文博)
Date: Sun, 13 Mar 2016 22:22:53 -0400
Subject: [maker-devel] How to evaluate the results of gene prediction
Message-ID:

Hi All,

I am using MAKER to annotate an insect genome. First, I trained Augustus and GeneMark-ET outside of MAKER using aligned RNA-seq data. Then I gave them to MAKER. The evidence included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects, and Swiss-Prot. Finally, I used the gene models generated by MAKER with AED < 0.01 to train SNAP for two rounds. So my questions are:

1. How can I evaluate the results of ab initio training? How can I know these gene finders were well trained?

2. Should I add EST evidence? How does MAKER work on a locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial?

3. Are there some gold criteria to evaluate the results of gene prediction? How can I improve it?

Thank you!

Best regards,
Wenbo

From dence at genetics.utah.edu Mon Mar 14 10:17:31 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 14 Mar 2016 16:17:31 +0000
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID:

Hi Wenbo,

MAKER has been evaluated against gold criteria in the MAKER, MAKER2, and MAKER-P publications. The difficulty when working with relatively unstudied organisms is that there might not be gold criteria for any given genome.

I think that the process you describe (using RNA-seq data, protein sequences, proteome sequences of related insects, and Swiss-Prot) would result in gene models that are probably ready for manual curation and not just for training another ab initio predictor (SNAP).

To answer your specific questions:

1) Evaluation of ab initio training is in terms of accuracy, sensitivity, and specificity. This is described in more detail in this review that Mark and I wrote several years ago: http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html
Augustus provides measures of accuracy, sensitivity, and specificity during its training procedures, although I can't recall exactly where it provides those. I believe that GeneMark provides similar reports during its own training process. I'm not certain about SNAP. In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distribution of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs. I'd be surprised if two rounds of training improved the AED scores much though.

2) If you have EST evidence that complements the RNA-seq data that you already used, then feel free to include it. MAKER treats loci that are partially supported by EST sequences the same as it does all other loci. MAKER evaluates the alignment evidence and chooses the ab initio prediction that is best supported by it. Partial models result from loci where no complete ab initio prediction was produced by any of the predictors that you used.

3) See above.

Let me know if that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From chenwenbo1020 at gmail.com Tue Mar 15 14:07:28 2016
From: chenwenbo1020 at gmail.com (陈文博)
Date: Tue, 15 Mar 2016 16:07:28 -0400
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID:

Hi Daniel,

Thanks for your help.

"In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs"

If I run SNAP in MAKER without any evidence, the AED would be 1 for every gene model, so I can't compare it with the prior run regarding the distribution of AED.

When I examined the gene models in Apollo, I noticed that the introns given by SNAP are longer than those from the other predictors. Is there any parameter controlling this? When using the maker2zff script to filter the input models for training SNAP, do you have any suggestion on the "-c -e -o" parameters?

Here are my parameters in the CTL file:

alt_splice=0
always_complete=1
split_hit=257022
max_dna_len=1700000

Thanks a lot!

Best,
Wenbo

From dence at genetics.utah.edu Tue Mar 15 14:19:32 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 15 Mar 2016 20:19:32 +0000
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>

Hi Wenbo,

Sorry for giving you a bogus suggestion. I should have realized that wouldn't work.
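
For context on why an evidence-free run cannot be compared this way: as published for MAKER2, AED is one minus the average of the sensitivity and specificity of an annotation against its overlapping evidence, so a model with no aligned evidence scores 1. The sketch below assumes a simple base-level comparison with hypothetical exon coordinates, and treating an empty evidence set as AED 1 is a convention assumed here; it is not MAKER's own code.

```python
def bases(exons):
    """Expand 1-based inclusive (start, end) exons into a set of positions."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


def aed(annotation, evidence):
    """Annotation edit distance between two sets of base positions.

    AED = 1 - (sensitivity + specificity) / 2, where
    sensitivity = |overlap| / |evidence| and specificity = |overlap| / |annotation|.
    An empty annotation or evidence set is treated as AED 1 (assumed convention).
    """
    if not annotation or not evidence:
        return 1.0
    overlap = len(annotation & evidence)
    sensitivity = overlap / len(evidence)
    specificity = overlap / len(annotation)
    return 1.0 - (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    model = bases([(100, 199), (300, 399)])    # hypothetical 200 bp gene model
    support = bases([(100, 199), (300, 349)])  # hypothetical 150 bp of aligned evidence
    print(aed(model, support))  # partially supported model
    print(aed(model, set()))    # no evidence at all
```

With no evidence, every model lands on the same value, so the resulting distribution carries no information about training quality.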
The defaults for the parameters you're asking about are all "0.5", so half of the exons, splice sites, etc. supported by EST alignment. I think it's your judgment call as to whether those are acceptable cutoffs for training your next set of genes. We use those settings for all our training sessions, which generally give good results.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Tue Mar 15 16:16:22 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 15 Mar 2016 16:16:22 -0600
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>
References: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>
Message-ID:

In general, if you want to know if the ab initio algorithms are trained well, look at them in something like Apollo. If SNAP and Augustus look like each other, and both look like the final hint-based models, then they are trained well.

With AED it's more of a correlative rather than an absolute measurement. The lower the value, in general the better the model. If you have gold standard models, you can get sensitivity and specificity metrics from programs like EVAL from WashU. But that's not really an option for newly sequenced organisms.

--Carson

From mdubarry at genoscope.cns.fr Wed Mar 16 09:09:28 2016
From: mdubarry at genoscope.cns.fr (Marion Dubarry)
Date: Wed, 16 Mar 2016 16:09:28 +0100
Subject: [maker-devel] understanding maker output
Message-ID: <56E97728.6010103@genoscope.cns.fr>

Dear Maker,

I have some issues understanding the output of MAKER. I ran MAKER on a chromosome where I already know the number of expected genes (1332).

1) I ran MAKER with mrna.gff and prot.gff files and SNAP (est2genome=1 protein2genome=1), and I also tried with just SNAP, and I obtained the same files. Why? I expected that with just ab initio or experimental data, the results would have been different!

In the folder /chr3.maker.output/chr3_datastore/50/43/chr3 I have different files:

chr3.gff
chr3.maker.non_overlapping_ab_initio.transcripts.fasta
chr3.maker.snap_masked.transcripts.fasta
theVoid.chr3/
chr3.maker.non_overlapping_ab_initio.proteins.fasta
chr3.maker.snap_masked.proteins.fasta
run.log

2) All of the FASTA files contain 1263 sequences, while the GFF file contains 87178 matches. Why is there such a big difference between my files?
In my GFF file, lines with column 2 = "snap_masked" and column 3 = "match" correspond to the 1263 models in the FASTA files. What do the "repeatmasker" and "repeatrunner" matches correspond to?

Thanks in advance,
Marion

From carsonhh at gmail.com Wed Mar 16 13:42:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Mar 2016 13:42:59 -0600
Subject: [maker-devel] understanding maker output
In-Reply-To: <56E97728.6010103@genoscope.cns.fr>
References: <56E97728.6010103@genoscope.cns.fr>
Message-ID: <8F95F7E3-A955-484C-B046-0E0BC188DC49@gmail.com>

Hi Marion,

None of your evidence supported any of the SNAP models, so you got no results. You did have reference SNAP models in both FASTA and GFF3 format (match/match_part features), but those are just for reference. You probably have issues with either your mrna.gff or prot.gff files.

You may want to familiarize yourself with how MAKER works and the expected output using an online tutorial like the following -> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

--Carson

From maker-devel at yandell-lab.org Tue Mar 22 05:38:57 2016
From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org)
Date: Tue, 22 Mar 2016 17:08:57 +0530
Subject: [maker-devel] Document 2
Message-ID:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Document 2.zip
Type: application/zip
Size: 3095 bytes
Desc: not available
URL:

From mmacd at udel.edu Tue Mar 22 10:33:42 2016
From: mmacd at udel.edu (Madolyn Macdonald)
Date: Tue, 22 Mar 2016 12:33:42 -0400
Subject: [maker-devel] Question about Maker output
Message-ID:

Hello,

My apologies if this has been described elsewhere, but I have not been able to find the answer to this question.

After running fasta_merge on the MAKER results, I get the FASTA files, which include all the gene annotations from all the different contigs in the assembly.
In the transcript file, I get headers such as the two below:

maker-Contig206-snap-gene-3.11-mRNA-1

maker-Contig206-snap-gene-3.12-mRNA-1

I was wondering what the gene-X.XX portion of the header means. For instance, are 3.11 and 3.12 exons on the same gene, or are they two completely separate genes? If they are separate genes, what makes them both still "gene 3"?

Thanks in advance!

--
Madolyn Stinner (formerly Madolyn MacDonald)
UDel Bioinformatics and Systems Biology, PhD student
RIT Alumnus 13'

From carsonhh at gmail.com Tue Mar 22 14:31:08 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 22 Mar 2016 14:31:08 -0600
Subject: [maker-devel] maker-devel post from mmacd@udel.edu requires approval
In-Reply-To:
References:
Message-ID:

Hi Madolyn,

They are different genes because their IDs are different. The numbers are meaningless; they are just iterators to make sure the IDs are unique.

Thanks,
Carson

From carson.holt at genetics.utah.edu Thu Mar 24 14:56:11 2016
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Thu, 24 Mar 2016 20:56:11 +0000
Subject: [maker-devel] question about Maker2
In-Reply-To:
References: <56F4066F.4000803@fgcz.ethz.ch>
Message-ID:

Hi Giancarlo,

Anything listed as something like maker-*-augustus was a result of MAKER sending hints to augustus, and anything like augustus-*-abinit was the result of augustus run directly from the HMM without hints.

Here is more detail on the format ->

- - -gene- -

Top level possibilities:
maker            #maker generated model
snap_masked      #snap run on masked sequence
augustus_masked  #augustus run on masked sequence
etc.

Internal source:
abinit    #ab initio model direct from HMM
snap      #hints provided to SNAP (alters scoring)
augustus  #hints provided to augustus (alters scoring)

Then chunk and iterator are just to generate a unique ID.

Example:
augustus_masked-scaffold11899-abinit-gene-0.6  #Produced by Augustus on masked sequence using raw HMM (no MAKER intervention).
maker-scaffold11899-augustus-gene-0.6  #Produced by maker sending hints to augustus to modify scoring against the HMM

--Carson

> On 3/24/16, 9:23 AM, "giancarlo.russo" wrote:
>
>> Dear Mike,
>>
>> first of all thanks for taking care and sharing Maker, as part of the
>> community I appreciate it.
>>
>> I have a question about the nomenclature of the annotation in the output
>> file:
>> what is the difference between genes named
>>
>> maker-Contig-XXX
>> and those named
>> augustus-Contig-XXX-processed genes
>> ?
>>
>> Please find attached the maker_opts file I have used for my annotation.
>> I was under the impression that the ab-initio related prefixes would be
>> present only in the genes which are not marked as "maker" in column 3 of
>> the gff file (i.e., those
>> with both ab-initio and EST evidence)
>>
>> Is there something I am missing?
>>
>> Thanks a lot in advance,
>> Giancarlo
>>
>> --
>> Giancarlo Russo, Ph.D.
>> Functional Genomics Center Zurich
>> Y32 H66
>> Winterthurerstr. 190
>> 8057 Zurich
>> SWITZERLAND
>> Phone: +41 44 635 39 64
>> Fax: +41 44 635 39 22
>> E-Mail: giancarlo.russo at fgcz.ethz.ch
>>
>
>

From carsonhh at gmail.com Mon Mar 28 09:10:06 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Mar 2016 09:10:06 -0600
Subject: [maker-devel] Maker Execution Error
In-Reply-To:
References:
Message-ID: <007B0008-6BFD-4121-9F0D-56EA9B3A2B5A@gmail.com>

Hi Jackie,

From the INSTALL file included with MAKER -->

Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in
your ~/.bash_profile to turn off certain nonfatal warnings.

Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the
'-mca btl ^openib' flag to mpiexec command when running MAKER.
Example: mpiexec -mca btl ^openib -n 20 maker

Also the following -->

If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so
before even trying to install MAKER. It must also be set before running MAKER
(or any program that uses OpenMPI's shared libraries), so it's best just to add
it to your ~/.bash_profile.
(i.e. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

The first one is the most likely.

Thanks,
Carson

> On Mar 28, 2016, at 8:38 AM, Atkins, Jacqueline (NIH/NIAID) [C] wrote:
>
> Hello,
>
> I have recently installed Maker on RHEL 7/ Perl-5.16.3. When I attempt to execute, I get the following error
>
> $ mpiexec -n 4 maker -help
>
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process.
Open MPI is currently > operating in a condition that could result in memory corruption or > other system errors; your MPI job may hang, crash, or produce silent > data corruption. The use of fork() (or system() or other calls that > create child processes) is strongly discouraged. > > The process that invoked fork was: > > Local host: submit (PID 316) > MPI_COMM_WORLD rank: 2 > > If you are *absolutely sure* that your application will successfully > and correctly survive a call to fork(), you may disable this warning > by setting the mpi_warn_on_fork MCA parameter to 0. > -------------------------------------------------------------------------- > [submit:122878] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork > [submit:122878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages > [submit bin]$ mpiexec --version > mpiexec (OpenRTE) 1.8.4 > > > I have a previous version of Maker installed that is using OpenMPI 1.3.3 and it is working fine. I was wondering if you think this might be related to the version of OpenMPI? > > Thank you in advance. > Jackie > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacqueline.atkins at nih.gov Mon Mar 28 08:38:39 2016 From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C]) Date: Mon, 28 Mar 2016 14:38:39 +0000 Subject: [maker-devel] Maker Execution Error Message-ID: Hello, I have recently installed Maker on RHEL 7/ Perl-5.16.3. When I attempt to execute, I get the following error $ mpiexec -n 4 maker -help An MPI process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged. 
The process that invoked fork was:

  Local host:          submit (PID 316)
  MPI_COMM_WORLD rank: 2

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[submit:122878] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
[submit:122878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[submit bin]$ mpiexec --version
mpiexec (OpenRTE) 1.8.4

I have a previous version of Maker installed that is using OpenMPI 1.3.3 and it is working fine. I was wondering if you think this might be related to the version of OpenMPI?

Thank you in advance.
Jackie
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From maker-devel at yandell-lab.org Tue Mar 29 06:46:22 2016
From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org)
Date: Tue, 29 Mar 2016 18:16:22 +0530
Subject: [maker-devel] CCE29032016_00053.tiff
Message-ID:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: CCE29032016_00053.tiff
Type: application/zip
Size: 2665 bytes
Desc: not available
URL:
-------------- next part --------------
Sent from my iPhone

From dence at genetics.utah.edu Wed Mar 30 15:17:38 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 30 Mar 2016 21:17:38 +0000
Subject: [maker-devel] Maker example data for 2013 GMOD summer school
In-Reply-To:
References:
Message-ID: <1772AAA1-C6ED-4FCA-B4C9-39F522D3D076@genetics.utah.edu>

Hi Qihua,

I believe that most of the data we used in the tutorials are available in the maker/data directory, which is included in all maker distributions. Please let me know if that isn't the case.
~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Mar 30, 2016, at 3:10 PM, Qihua Liang wrote:
>
> Hi Michael and Daniel,
>
> I am a graduate student in UC Riverside, and recently I am learning to use Maker for genome annotation. I was trying to find some tutorials to follow and practice on example data, and I found out that you were giving a talk on Maker during 2013 GMOD summer school and the tutorial of that is very detailed. Nice job!
>
> But example data under the folder you mentioned as ./maker/maker_course is not provided on the website and I am wondering if they are available to the public or not. If yes, could you send me those materials so that I could follow your tutorial to practice using Maker?
>
> Thank you
> Best
> Qihua

From ereboperezsilva at gmail.com Thu Mar 31 06:57:47 2016
From: ereboperezsilva at gmail.com (José Mª G. Perez-Silva)
Date: Thu, 31 Mar 2016 14:57:47 +0200
Subject: [maker-devel] Question about Maker2
Message-ID:

Hello,

We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole-genome (2.2 Gb) ab-initio annotation.

In a month we have annotated nearly half of the genome (let's say around 40% of it).
I'd like to know how long it takes to annotate a complete genome for the first time, and under which technical specifications (processors, memory, ...).
Is the second round of annotation (in which we use the results from the first round as extra data) faster?

Thank you in advance.

---

Jose Maria G. Perez-Silva.
Departamento de Biologia Molecular y Bioquimica.
Universidad de Oviedo.
Spain.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Thu Mar 31 11:35:36 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 31 Mar 2016 17:35:36 +0000
Subject: [maker-devel] Question about Maker2
In-Reply-To:
References:
Message-ID:

Hi Jose,

The time it takes maker to annotate a genome depends greatly on the hardware setup (as you pointed out, processors, memory, etc.) as well as the size of the genome and the size and type of the datasets you use to annotate the genome (numerous RNAseq datasets, for example, will take longer than a project without any RNAseq data).

However, with the MPI parallelization implemented in MAKER, speed should scale almost linearly with the number of processors allotted to the MAKER run. This is explained in the MAKER2 paper (Holt and Yandell), which I'm going to quote:

MAKER2 was used to annotate a 10 megabase section of the C. elegans genome
(NGASP dataset). The algorithm was parallelized using MPI on an increasing number
of CPU cores. The results demonstrate how MAKER2 scales almost linearly with
CPU number (with a slope of near 1). If we project our results forward to the entire C.
elegans genome (~100 megabases), MAKER2 should take under 10 hours on 32
CPUs to complete; similarly, the human genome (~3 gigabases) would require fewer
than 24 hours on 400 CPUs

I'm also not sure what you mean by the first run taking less time than the second run. By the first run do you mean running with est2genome turned on to create models for training ab-initio predictors? In that case, I would guess that the second run would take longer, but it shouldn't be too big of a difference.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Mar 31, 2016, at 6:57 AM, José Mª G. Perez-Silva wrote:
Hello, We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole genome (2.2Gb) ab-initio annotation. In a month we have nearly annotate a half of the genome (let's say around 40% of it). I'd like to know how much time and under which technical specifications (processors, memory, ...) does it takes to annotate a complete genome for the first time. The second round of annotations (in which we use the results from the first round as extra data) is faster? Thank you in advance. --- Jose Maria G. Perez-Silva. Departamento de Biologia Molecular y Bioquimica. Universidad de Oviedo. Spain. _______________________________________________ maker-devel mailing list maker-devel at yandell-lab.org http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 31 11:38:14 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 31 Mar 2016 11:38:14 -0600 Subject: [maker-devel] Question about Maker2 In-Reply-To: References: Message-ID: <7980702B-AE01-40A8-A903-B1DE8EE3CCC4@gmail.com> If you provide all evidence on the first run, the second run will be faster because MAKER will be able to reuse alignments from the previous run. Since 90% of runtime is BLAST, being able to just reuse the BLAST reports really improves runtime. ?Carson > On Mar 31, 2016, at 11:35 AM, Daniel Ence wrote: > > Hi Jose, the time it takes maker to annotate a genome depends greatly on the hardware setup (as you pointed out, processors, memory, etc) as well as the size of the genome and the size and type of the datasets you use to annotate the genome (numerous RNAseq datasets for example will take longer than a project without any RNAseq data). > > However, the MPI parallelization implemented in MAKER guarantees that the runtime should scale linearly with the number of processors allotted to the MAKER run. 
This is explained in the MAKER2 paper (Holt and Yandell), which I?m going to quote: > MAKER2 was used to annotate a 10 megabase section of the C. elegans genome > (NGASP dataset). The algorithm was parallelized using MPI on an increasing number > of CPU cores. The results demonstrate how MAKER2 scales almost linearly with > CPU number (with a slope of near 1). If we project our results forward to the entire C. > elegans genome (~100 megabases), MAKER2 should take under 10 hours on 32 > CPUs to complete; similarly, the human genome (~3 gigabases) would require fewer > than 24 hours on 400 CPUs > > I?m also not sure what you mean by the first run taking less time than the second run. By the first run do you mean running with est2genome turned on to create models for training ab-initio predictors? In that case, I would guess that the second run would take longer, but it should be too big of a difference. > > ~Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > >> On Mar 31, 2016, at 6:57 AM, Jos? M? G. Perez-Silva > wrote: >> >> ?? >> Hello, >> >> We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole genome (2.2Gb) ab-initio annotation. >> >> In a month we have nearly annotate a half of the genome (let's say around 40% of it). >> I'd like to know how much time and under which technical specifications (processors, memory, ...) does it takes to annotate a complete genome for the first time. >> The second round of annotations (in which we use the results from the first round as extra data) is faster? >> >> Thank you in advance. >> >> --- >> >> Jose Maria G. Perez-Silva. >> Departamento de Biologia Molecular y Bioquimica. >> Universidad de Oviedo. >> Spain. 
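
As a rough worked example of the figures quoted above from the MAKER2 paper: both projections (about 100 Mb in under 10 hours on 32 CPUs, and about 3 Gb in under 24 hours on 400 CPUs) come out to the same upper bound of roughly 3.2 CPU-hours per megabase. The sketch below applies that bound to the 2.2 Gb genome from the question. It assumes ideal linear scaling and treats the quoted figures as upper bounds; real runtime depends heavily on the evidence datasets and hardware, as noted in the replies. The function names are made up for this illustration.

```python
def cpu_hours_per_mb(genome_mb, hours, cpus):
    """CPU-hours spent per megabase, given a wall-clock time and CPU count."""
    return hours * cpus / genome_mb


def projected_hours(genome_mb, cpus, rate):
    """Wall-clock hours for a genome, assuming ideal linear scaling."""
    return genome_mb * rate / cpus


if __name__ == "__main__":
    # Upper bounds implied by the two projections quoted from the paper.
    celegans_rate = cpu_hours_per_mb(100, 10, 32)
    human_rate = cpu_hours_per_mb(3000, 24, 400)
    print("C. elegans bound:", round(celegans_rate, 2), "CPU-hours/Mb")
    print("Human bound:     ", round(human_rate, 2), "CPU-hours/Mb")

    # Apply the same bound to a 2.2 Gb (2200 Mb) genome on a few core counts.
    for cpus in (32, 100, 200):
        hours = projected_hours(2200, cpus, celegans_rate)
        print(cpus, "CPUs:", round(hours, 1), "hours")
```

Under these assumptions a 2.2 Gb genome needs on the order of 7,000 CPU-hours, so about 220 hours on 32 CPUs or about 35 hours on 200 CPUs. A run that has covered only 40% of the genome after a month is therefore worth checking for idle cores or an I/O bottleneck.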
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at yandell-lab.org
>> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chankl at mpob.gov.my Tue Mar 1 00:45:46 2016
From: chankl at mpob.gov.my (Chan Kuang Lim)
Date: Tue, 1 Mar 2016 15:45:46 +0800 (MYT)
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <1064605078.11733402.1456818000393.JavaMail.root@mpob.gov.my>
Message-ID: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>

Dear MAKER developers,

I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?

Thank you.

Regards,
Chan KL

Come and join us on:
Journal of Oil Palm Research is now available free online at http://jopr.mpob.gov.my
22nd MPOB Transfer of Technology Seminar 2016 (2 June 2016)
Persidangan Pekebun Kecil Sawit Kebangsaan 2016 (11 - 12 Oktober 2016)
Malaysian Palm Oil Board - http://www.mpob.gov.my
This email was sent using MPOB Webmail System.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Wed Mar 2 10:13:30 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 2 Mar 2016 17:13:30 +0000
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>
References: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my>
Message-ID: <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>

Hi Chan,

Fgenesh is a gene predictor that requires users to purchase parameter files from the company that makes it: http://www.softberry.com/. If you didn't give MAKER a Fgenesh parameter file, then you won't get any Fgenesh predictions.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On Mar 1, 2016, at 12:45 AM, Chan Kuang Lim wrote:

Dear MAKER developers,

I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?

Thank you.

Regards,
Chan KL
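
One quick way to check the point in Daniel's reply is to look at whether a Fgenesh parameter file is actually set in the MAKER options file. The sketch below is a minimal illustration, assuming the control file uses "key=value #comment" lines and that the relevant option is named fgenesh_par_file, as in the maker_opts.ctl files shipped with MAKER 2.31.x; check the name against your own file. The helper names and the sample text are invented for this example.

```python
def read_ctl_options(text):
    """Parse 'key=value #comment' lines from MAKER-style control file text."""
    options = {}
    for line in text.splitlines():
        # Drop trailing comments, then skip blank or non-option lines.
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def fgenesh_configured(options):
    """True if a Fgenesh parameter file is set (option name is an assumption)."""
    return bool(options.get("fgenesh_par_file"))


# Hypothetical excerpt of an options file, for illustration only.
SAMPLE_CTL = """\
#-----Gene Prediction
snaphmm=my_species.hmm #SNAP HMM file
augustus_species=my_species #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
"""

if __name__ == "__main__":
    opts = read_ctl_options(SAMPLE_CTL)
    if not fgenesh_configured(opts):
        print("No Fgenesh parameter file set, so no Fgenesh predictions are expected.")
```

To check a real run, read the text of your own maker_opts.ctl and pass it to read_ctl_options in place of SAMPLE_CTL.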
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Mar 2 10:36:04 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 2 Mar 2016 10:36:04 -0700
Subject: [maker-devel] No genes predicted by Fgenesh in MAKER
In-Reply-To: <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>
References: <416056681.11736428.1456818346146.JavaMail.root@mpob.gov.my> <84E44B4B-BCCE-4EB8-8A94-0333EB285101@genetics.utah.edu>
Message-ID: <333D3A3A-49BC-42ED-87F7-053AA46CC1F3@gmail.com>

Also, there is a chance that FgenesH has changed its output format slightly (it's happened a couple of times before), so if you are already running with a parameter file you purchased, that could be the issue. Look at the STDERR report MAKER produces to see if FgenesH even ran and with what command.

--Carson

> On Mar 2, 2016, at 10:13 AM, Daniel Ence wrote:
>
> Hi Chan, Fgenesh is a gene predictor that requires users to purchase parameter files from their company: http://www.softberry.com/ . If you didn't give a Fgenesh file, then you won't get any predictions.
>
> ~Daniel
>
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Mar 1, 2016, at 12:45 AM, Chan Kuang Lim wrote:
>>
>> Dear MAKER developers,
>>
>> I am using MAKER 2.31.8, with SNAP, AUGUSTUS and Fgenesh. I have tested my sequences, with many different parameters. MAKER output gives genes predicted by SNAP and AUGUSTUS, but no genes predicted by Fgenesh. I do not get any error message. The sequences FINISHED successful. May I know what are the possible mistake I have done?
>>
>> Thank you.
>> >> Regards, >> Chan KL >> >> Come and join us on: >> >> >> >> Journal of Oil Palm Research is now available free online at http://jopr.mpob.gov.my >> 22nd MPOB Transfer of Technology Seminar 2016 (2 June 2016) >> Persidangan Pekebun Kecil Sawit Kebangsaan 2016 (11 - 12 Oktober 2016) >> >> Malaysian Palm Oil Board - http://www.mpob.gov.my >> This email was sent using MPOB Webmail System. >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdolze at students.uni-mainz.de Thu Mar 3 04:01:06 2016 From: fdolze at students.uni-mainz.de (Florian) Date: Thu, 3 Mar 2016 12:01:06 +0100 Subject: [maker-devel] Possible to redirect maker output? In-Reply-To: <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com> References: <56D05E2A.1040201@students.uni-mainz.de> <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com> Message-ID: <56D81972.7000002@students.uni-mainz.de> Hello Carson, May I ask on what kind of hardware setup you guys are running MAKER? I cant seem to get this running performantly on our cluster. There are usually only 2-3 cores running on 100% and the rest is idle waiting (I THINK due to I/O blockage but I'm not sure). Any ideas how I could find the cause for this problem? I attached a screenshot of the node status for the first hour of the last MAKER run if this is any help. On 29.02.2016 20:09, Carson Holt wrote: > You can try setting TMP= in the control files to a RAM disk location (You will need a lot of RAM though, perhaps 500Gb). Even then some components used by MAKER may not function properly with tmpfs, but you can try. 
If it doesn?t work you?ll get an error. The main output directory on the other hand must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exists on multiple nodes, they will actually be separate and distinct locations, i.e. /dev/shm). > > ?Carson > > >> On Feb 26, 2016, at 7:16 AM, Florian wrote: >> >> Hi all, >> >> I am trying to run maker on a cluster (2 nodes with 64 cores each), to speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd. >> >> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (are they actually used for the current run or are only files in the temp files location used?) >> >> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this? >> >> >> thanks, >> Florian >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2016-03-03 11:35:41.png Type: image/png Size: 149996 bytes Desc: not available URL: From jacqueline.atkins at nih.gov Thu Mar 3 11:54:19 2016 From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C]) Date: Thu, 3 Mar 2016 18:54:19 +0000 Subject: [maker-devel] Maker Installation Questions Message-ID: Good Afternoon, I am a Systems Engineer who is attempting to install and configure maker for a user. From what I can tell, database support is optional and maker can be used without a backend database. Please confirm that this is the case. Also, could you provide any examples of how I might be able to test the functionality of the maker installation? 
Thank you in advance. Jackie Atkins -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacqueline.atkins at nih.gov Thu Mar 3 14:37:30 2016 From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C]) Date: Thu, 3 Mar 2016 21:37:30 +0000 Subject: [maker-devel] Maker Install Issue Message-ID: Good Afternoon, I have installed Maker v 2.31.8 on RHEL 6, perl 5.16 When I attempt to execute mpi_iprscan, I get the following error: Can't locate Parallel/MPIcar.pm If you could advise how I might be able to resolve this issue, it would be greatly appreciated. Thank you. Jacqueline Atkins, Contractor Sr. HPC Engineer National Institute of Allergy and Infectious Diseases SRA International Inc., A CSRA Company office 301-451-9644, mobile 301-767- 7110 5601 Fishers Lane, 6A60, Bethesda, MD 20852 Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 3 14:54:54 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 3 Mar 2016 14:54:54 -0700 Subject: [maker-devel] Maker Install Issue In-Reply-To: References: Message-ID: Hi Jacqueline, mpi_iprscan and mpi_evaluator are accessory scripts made for a very specific system and purpose (development related). They are not a core part of the MAKER pipeline, are undocumented, and should be ignored. 
The script you use to run MAKER is .../maker/bin/maker

It is MPI enabled, and you can call it directly or via mpiexec.

Thanks,
Carson

> On Mar 3, 2016, at 2:37 PM, Atkins, Jacqueline (NIH/NIAID) [C] wrote:
>
> Good Afternoon,
>
> I have installed Maker v 2.31.8 on RHEL 6, perl 5.16
>
> When I attempt to execute mpi_iprscan, I get the following error:
> Can't locate Parallel/MPIcar.pm
>
> If you could advise how I might be able to resolve this issue, it would be greatly appreciated.
>
> Thank you.
>
> Jacqueline Atkins, Contractor
> Sr. HPC Engineer
> National Institute of Allergy and Infectious Diseases
> SRA International Inc., A CSRA Company
> office 301-451-9644, mobile 301-767- 7110
> 5601 Fishers Lane, 6A60, Bethesda, MD 20852
> Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives.
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Mar 3 22:42:07 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 3 Mar 2016 22:42:07 -0700
Subject: [maker-devel] Possible to redirect maker output?
In-Reply-To: <56D81972.7000002@students.uni-mainz.de>
References: <56D05E2A.1040201@students.uni-mainz.de> <75FD2CDE-AD66-416A-9A3E-6AF49B3FB13F@gmail.com> <56D81972.7000002@students.uni-mainz.de>
Message-ID:

We run on a standard cluster.
We have traditional NFS as well as more advanced Lustre options for shared storage. Each node has both locally mounted disk and in-memory storage available (I never use the in-memory storage though, because MAKER requires a lot of temporary storage). I run using OpenMPI (it scales better than MPICH2; also, MAKER is incompatible with MVAPICH2 because of a known registered memory defect in that MPI flavor). We use the SLURM scheduler, although previously we had PBS. I usually run job sizes of between 100 and 200 CPU cores (10 to 20 nodes). We have mixed node types of 12, 16, 20, and 24 cores. I always set TMP= to a locally mounted disk (never NFS or RAM disk). The working directory is always NFS or Lustre.

I've also run under a similar configuration on the TACC and XSEDE clusters (https://www.xsede.org). They use SLURM, and previously SGE, for their scheduler. I've been able to run on 600-plus CPU cores per job there, but I get better efficiency with multiple jobs at ~200 CPU cores (communication overhead gets too high for a single root process to handle effectively above 200 cores).

MAKER will need ~2 GB of RAM for every core you give it with MPI.

--Carson

> On Mar 3, 2016, at 4:01 AM, Florian wrote:
>
> Hello Carson,
>
> May I ask on what kind of hardware setup you guys are running MAKER?
>
> I cant seem to get this running performantly on our cluster. There are usually only 2-3 cores running on 100% and the rest is idle waiting (I THINK due to I/O blockage but I'm not sure). Any ideas how I could find the cause for this problem?
>
> I attached a screenshot of the node status for the first hour of the last MAKER run if this is any help.
>
> On 29.02.2016 20:09, Carson Holt wrote:
>> You can try setting TMP= in the control files to a RAM disk location (You will need a lot of RAM though, perhaps 500Gb). Even then some components used by MAKER may not function properly with tmpfs, but you can try. If it doesn't work you'll get an error.
The main output directory on the other hand must be globally accessible to all nodes if working with MPI, and a RAM disk will only exist and be accessible on a single node (even though a directory with the same name may exists on multiple nodes, they will actually be separate and distinct locations, i.e. /dev/shm). >> >> ?Carson >> >> >>> On Feb 26, 2016, at 7:16 AM, Florian wrote: >>> >>> Hi all, >>> >>> I am trying to run maker on a cluster (2 nodes with 64 cores each), to speed things up I copied all input files to a ramdisk to reduce I/O time, but all subsequent results are still written to hdd. >>> >>> Is there a way I can tell maker to write the maker.results files to ramdisk (or generally any other directory than the current working dir) too? (are they actually used for the current run or are only files in the temp files location used?) >>> >>> Is anybody experienced with running maker on a similar setup and could tell me how you are handling this? >>> >>> >>> thanks, >>> Florian >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenwenbo1020 at gmail.com Sat Mar 5 19:10:24 2016 From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=) Date: Sat, 5 Mar 2016 21:10:24 -0500 Subject: [maker-devel] ERROR: RepeatMasker failed Message-ID: Hi All, I have run Maker (v2.31.8) successfully. Now I update RepeatMasker to v4.0.6. then I came with this error: RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file /home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib. 
ERROR: RepeatMasker failed
--> rank=4, hostname=hostname
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold149

RepeatMasker was installed correctly. Should I update MAKER to V3.0?

Thank you!

Best regards,
Wenbo

From mcsimenc at gmail.com Sun Mar 6 09:48:36 2016
From: mcsimenc at gmail.com (Matt Simenc)
Date: Sun, 6 Mar 2016 08:48:36 -0800
Subject: [maker-devel] Custom Repeat Library: ProtExcluder.pl help

I am working on creating a custom repeat library. I want to use the ProtExcluder.pl script, found on the MAKER wiki at

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Basic

to trim out possible gene sequences from the default RepeatModeler output when run on my genome. I'm getting some errors, and in the output no sequences are removed from my RepeatModeler library. I am wondering if anyone has experience with this script and can help me understand the errors.

I am feeding ProtExcluder.pl a FASTA file from RepeatModeler and blastx output (default output, blast 2.2.31+) like:

ProtExcluder.pl blast_output repeat_fasta 1>stdout 2>stderr

- I get an output file repeat_fastanoProtFinal that contains exactly the same sequences as the input repeat_fasta.

- stderr has these errors:

Can't exec "binaries/esl-sfetch": No such file or directory at /share/apps/genomics/ProtExcluder1.1/mspesl-sfetch.pl line 17.

Can not open the seqfile /home/joshd/data/azolla/blasts/repeats/RepeatModeler.celera_blastx_PT-1.1-orthofinder/AzlRptMdlrLib.celera_blastx_PT-1.1-orthofinder_1e-5.fnolowm50seq

mergeunmatchedregion.pl seqfile

Illegal division by zero at /share/apps/genomics/ProtExcluder1.1/GCcontent.pl line 122.
ProtExcluder.pl created a bunch of files in the directory where it is unsuccessfully trying to access the fnolow50seq file, which does not exist, though there are files whose names have the suffix fnolow50seqm, fnolow50seqmGC, and fnolow50seqmns.

Any help would be appreciated! I could write a script to do this but would rather use an already debugged one to save time. Thanks!

Matt Simenc
Der Evolutionary Genomics Lab
California State University, Fullerton

From carsonhh at gmail.com Sun Mar 6 13:13:24 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Sun, 6 Mar 2016 13:13:24 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
Message-ID: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>

Hi Wenbo,

The error is from RepeatMasker and not MAKER. It means that RepeatMasker is not installed and configured correctly. You will have to fix whatever is wrong with your installation, and then make sure you can get RepeatMasker to run correctly by itself before running it inside of MAKER (i.e. run RepeatMasker directly on some test data).

Thanks,
Carson

From jason.stajich at gmail.com Sun Mar 6 15:04:14 2016
From: jason.stajich at gmail.com (Jason Stajich)
Date: Sun, 06 Mar 2016 22:04:14 +0000
Subject: [maker-devel] Custom Repeat Library: ProtExcluder.pl help

Did you install hmmer3? You need that to get esl-sfetch. I'm not sure how you configured the paths when you run this.

Jason

From chenwenbo1020 at gmail.com Mon Mar 7 13:26:19 2016
From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=)
Date: Mon, 7 Mar 2016 15:26:19 -0500
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>

Hi Carson,

Thank you for your reply. I installed RepeatMasker following the installation instructions on their website, and got the information below.

=============================
Congratulations! RepeatMasker is now ready to use.
The program is installed with a full version of the repeat library:
DFAM Library Version = Dfam_2.0
RMLibrary Version = 20150807
Repbase Version = 20150807
=============================

I ran RepeatMasker directly on one scaffold and got no error, so I am still confused by the error given by MAKER.

Thank you!

Best,
Wenbo

From carsonhh at gmail.com Mon Mar 7 14:01:38 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 7 Mar 2016 14:01:38 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com>
Message-ID: <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com>

Make sure you use the same library you are giving it with MAKER. You can also look at MAKER's STDERR to see exactly what command MAKER was using to run RepeatMasker.

This error --> "RepeatMasker::createLib(): Error invoking /home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file"

It's not from MAKER. RepeatMasker is printing that error and then failing.

--Carson
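The RepeatMasker::createLib() message reported in this thread names both the program that failed and the file it was working on. As a minimal sketch (the function name parse_createlib_error and the regular expression are illustrative assumptions, not part of MAKER or RepeatMasker), the following Python pulls those two paths out of saved log text so that each can be checked by hand:

```python
import re

# Matches the "Error invoking <program> on file <file>" wording seen in this thread.
ERROR_RE = re.compile(r"Error invoking (\S+) on file (\S+)")


def parse_createlib_error(log_text):
    """Return (program, library_file) pairs found in the log text."""
    pairs = []
    for match in ERROR_RE.finditer(log_text):
        program = match.group(1)
        library_file = match.group(2).rstrip(".")  # drop the sentence-ending period
        pairs.append((program, library_file))
    return pairs


# Error text copied from the report in this thread.
log_text = (
    "RepeatMasker::createLib(): Error invoking "
    "/home/chenwb/programs/ncbi-blast-2.2.28+/bin/makeblastdb on file "
    "/home/chenwb/programs/RepeatMasker/Libraries/20150807/general/simple.lib.\n"
    "ERROR: RepeatMasker failed\n"
)

for program, library_file in parse_createlib_error(log_text):
    print("program:", program)
    print("library file:", library_file)
```

In practice the log text would come from wherever MAKER's STDERR was redirected; the sketch only locates the paths and does not say why the call failed.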
From carsonhh at gmail.com Mon Mar 7 14:54:10 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 7 Mar 2016 14:54:10 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com>
Message-ID: <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>

RepeatMasker doesn't actually finish installing until after you run it at least once with the RepBase libraries (i.e. first job with RepBase). During its very first run it builds a bunch of needed library files under .../RepeatMasker/Libraries/ or sometimes under ~/.RepeatMaskerCache/. The failure message you get is that it can't build those files (which is a RepeatMasker error, not a MAKER error). So RepeatMasker is either installed or configured incorrectly.

--Carson

From chenwenbo1020 at gmail.com Tue Mar 8 13:22:14 2016
From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=)
Date: Tue, 8 Mar 2016 15:22:14 -0500
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>

Hi Carson,

Thank you! I reinstalled RepeatMasker and ran it with "-species all" outside of MAKER.
It finished successfully. Then I ran MAKER, and there was no error. I am curious why RepeatMasker could not build these library files when it was run inside MAKER.

Thanks!

Best,
Wenbo

From carsonhh at gmail.com Tue Mar 8 13:25:37 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 Mar 2016 13:25:37 -0700
Subject: [maker-devel] ERROR: RepeatMasker failed
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com>
Message-ID: <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>

The issue is unrelated to MAKER. Likely something happened during your initial configuration that resulted in a partial file, perhaps when you unpacked RepBase. Whether you ran it inside or outside of MAKER was not the issue.

--Carson
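Earlier in this thread Carson notes that RepeatMasker builds library files on its first run, under its Libraries directory or sometimes under ~/.RepeatMaskerCache/. One thing that can be checked before re-running is whether the configured makeblastdb is executable and whether those directories can be written to. The Python below is a sketch of such a check, not a confirmed diagnosis of the failure reported here; the function name check_repeatmasker_setup and the placeholder paths are assumptions for illustration.

```python
import os


def check_repeatmasker_setup(makeblastdb, library_dirs):
    """Return a list of problems found; an empty list means none were detected.

    makeblastdb  -- path to the makeblastdb program RepeatMasker is configured to call
    library_dirs -- directories where RepeatMasker may need to write library files
    """
    problems = []
    if not os.path.isfile(makeblastdb):
        problems.append("makeblastdb not found: " + makeblastdb)
    elif not os.access(makeblastdb, os.X_OK):
        problems.append("makeblastdb is not executable: " + makeblastdb)
    for directory in library_dirs:
        directory = os.path.expanduser(directory)
        if not os.path.isdir(directory):
            # A missing cache directory may be normal before the first run.
            problems.append("directory does not exist: " + directory)
        elif not os.access(directory, os.W_OK):
            problems.append("directory is not writable: " + directory)
    return problems


if __name__ == "__main__":
    # Placeholder paths (assumptions); substitute the ones from your installation.
    for problem in check_repeatmasker_setup(
        "/path/to/ncbi-blast/bin/makeblastdb",
        ["/path/to/RepeatMasker/Libraries", "~/.RepeatMaskerCache"],
    ):
        print(problem)
```

The check should be run as the same user, and on the same node, that runs MAKER, since permissions and mounted paths can differ between login and compute nodes.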
From jason.stajich at gmail.com Tue Mar 8 13:39:59 2016
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 08 Mar 2016 20:39:59 +0000
Subject: [maker-devel] ERROR: RepeatMasker failed
In-Reply-To: <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>
References: <1E28A618-B9E8-4467-93F0-8E2AF6695626@gmail.com> <17D740BD-C02E-4C91-97E3-0677001B51B2@gmail.com> <5452E83A-98E5-4C96-8BC7-F4792CE2CA50@gmail.com> <3C305A9D-D2B2-4858-8F3F-B1B50F82C845@gmail.com>

I think that may be about permissions for creating the all file in your RepeatMasker library folder. You may want to look at the write permissions there and see.

From meesters at uni-mainz.de Thu Mar 10 07:53:43 2016
From: meesters at uni-mainz.de (Christian Meesters)
Date: Thu, 10 Mar 2016 15:53:43 +0100
Subject: [maker-devel] maker low cpu utilization
Message-ID: <56E18A77.5030509@uni-mainz.de>

Dear maker-developers,

As a computational scientist of our local HPC-Team, I recently installed maker and its tools.

We encountered a most peculiar problem: Distributed over 2 nodes, 64 cores each (AMD OPT6272 "bulldozer"), all started processes take up ~20 % of the possible CPU whilst the nodes show a full load of processes. Amongst this 20 % there is some system overhead (~4%).

We then wrote a little wrapper / submission script, such that the ctl files were altered and all reference input is copied onto ramdisks (each node provides the same path, there are then 2 copies of each reference file, prior to starting maker). Still no change - IO is not a bottleneck here.

I then wanted to trace individual PIDs, but they are frequently changing. However, I saw > 170 instances of ps concurrently running and the same number of 'sh'.
Only augustus showed about 100% CPU usage; all others (except maker itself) showed lower usage.

Have you ever experienced something similar and could perhaps provide a pointer to the cause? Could this perhaps be related to the nature of the input data (can some input data cause frequent switches of processes and therefore OS scheduler overhead)?

Thanks a lot in advance,
Best regards,
Christian Meesters

--
****************************************

Dr. Christian Meesters
Johannes Gutenberg-Universität Mainz
Zentrum für Datenverarbeitung
Anselm-Franz-von-Bentzelweg 12
55099 Mainz

tel. +49 (0)6131 39 26397

****************************************

From dence at genetics.utah.edu Thu Mar 10 11:22:54 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 10 Mar 2016 18:22:54 +0000
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <56E18A77.5030509@uni-mainz.de>
References: <56E18A77.5030509@uni-mainz.de>
Message-ID: <6683A317-2DB7-4CE0-86A1-A8C7CB0931CC@genetics.utah.edu>

Hi Christian,

I think what you have described is normal behavior for MAKER. It spawns many child processes, most of which complete very quickly. What dataset were you running with MAKER? Did it complete successfully?

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Thu Mar 10 11:34:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Mar 2016 11:34:59 -0700
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <56E18A77.5030509@uni-mainz.de>
References: <56E18A77.5030509@uni-mainz.de>
Message-ID: <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>

The "ps" calls should run at startup (they are checking the MPI configuration before MAKER connects to the communication ring and will generate somewhat informative errors for common mis-configurations when users run MAKER with MPI).
Because it is one per process (MAKER is not yet connected to MPI at this point) and you have so many CPUs on a single node, it may delay startup by a few seconds, but that's it. Once MAKER gets into the actual run, you won't see those processes again.

If it bothers you, there is an alternative to have MAKER query the process table programmatically rather than via 'ps' (it's not the default because it works on fewer architectures, but it should work on AMD). To do the workaround, you will need to install Proc::ProcessTable from CPAN, then replace .../maker/lib/Proc/ProcessTable_simple.pm and .../maker/lib/Proc/Signal.pm with the attached alternate files.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ProcessTable_simple.pm_alt
Type: application/octet-stream
Size: 2864 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Signal.pm_alt
Type: application/octet-stream
Size: 6703 bytes
Desc: not available
URL:

From carsonhh at gmail.com Thu Mar 10 13:56:57 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 10 Mar 2016 13:56:57 -0700
Subject: [maker-devel] maker low cpu utilization
In-Reply-To: <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>
References: <56E18A77.5030509@uni-mainz.de> <6BB86BB2-62DB-4D95-A4F0-3D0B55975CC1@gmail.com>
Message-ID: <9E6397A7-FD1F-44ED-9230-479B89DC1092@gmail.com>

Also the 'maker' processes should rarely use very much CPU. All they do is shepherd data between processes like augustus, snap, blast, and exonerate (there are some short intermediate processing steps, but the external tools are the workhorses). So each 'maker' process is usually just waiting for external tools to complete. What maker does is divide the input data into reasonable chunks, so that there will always be a blast, snap, or augustus process running somewhere to keep all CPUs busy. If the structure of the actual input data is odd compared to the typical genome project input, then there could hypothetically be a situation where not enough reasonable task chunks can be made to keep all CPUs busy.
I'd really have to see your data if you think that is the issue.

MAKER has the following points of parallelization.

1. Every contig goes to a separate thread.
2. Large contigs are split into overlapping pieces that go into separate threads (determined using the max_dna_len= parameter, with the default being 100,000 bp).
3. BLAST databases for input evidence are split into 10 pieces (so BLAST analyses are split by 10).
4. Ab initio gene prediction on large contigs is split into overlapping sections of 10 megabases each.

So unless you have a small dataset that can't be split by any of the above parameters, it should be able to parallelize. Also, if your assembly contains primarily short contigs and you set min_contig such that the root process spends most of its time skipping contigs and less time distributing them for other processes to analyze, then that could create an apparent slowdown. I have had that happen on a couple of assemblies that had > 2 million contigs, but only ~10,000 were usable. By filtering small contigs out of the assembly, you can get around that last issue.

--Carson

From chenwenbo1020 at gmail.com Sun Mar 13 20:22:53 2016
From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=)
Date: Sun, 13 Mar 2016 22:22:53 -0400
Subject: [maker-devel] How to evaluate the results of gene prediction
Message-ID:

Hi All,

I am using MAKER to annotate an insect genome. First, I trained Augustus and GeneMark-ET outside of Maker using aligned RNA-seq data. Then, I gave them to Maker. The evidence included assembled RNA-seq data, protein sequences of my insect, proteome sequences of three related insects and Swiss-Prot. Finally, I used the gene models generated by Maker with AED < 0.01 to train SNAP for two rounds. So my questions are:

1. How do I evaluate the results of ab initio training? How can I know these gene finders were well trained?

2. Should I add EST evidence? How does Maker work on a locus where there is only partial EST evidence? Will the partial EST sequences cause gene models to be partial?

3. Is there some gold-criteria to evaluate the results of gene prediction? How can I improve it?

Thank you!

Best regards,
Wenbo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Mon Mar 14 10:17:31 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 14 Mar 2016 16:17:31 +0000
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID:

Hi Wenbo,

MAKER has been evaluated against gold-criteria in the MAKER, MAKER2, and MAKER-P publications. The difficulty when working with relatively unstudied organisms is that there might not be gold-criteria for any given genome.
I think that the process you describe (using RNA-seq data, protein sequences, proteome sequences of related insects, and Swiss-Prot) would result in gene models that are probably ready for manual curation and not just for training another ab-initio predictor (SNAP).

To answer your specific questions:

1) Evaluation of ab-initio training is in terms of accuracy, sensitivity and specificity. This is described in more detail in this review that Mark and I wrote several years ago: http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html

Augustus provides measures of accuracy, sensitivity, and specificity during its training procedures, although I can't recall exactly where it provides those. I believe that Genemark provides similar reports during its own training process. I'm not certain about SNAP. In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs. I'd be surprised if two rounds of training improved the AED scores much though.

2) If you have EST evidence that complements the RNAseq data that you already used, then feel free to include it. MAKER treats loci that are partially supported by EST sequences the same as it does all other loci. MAKER evaluates the alignment evidence and chooses the ab-initio prediction that is best supported by it. Partial models result from loci where no complete ab-initio prediction was produced by any of the predictors that you used.

3) See above.

Let me know if that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From chenwenbo1020 at gmail.com Tue Mar 15 14:07:28 2016
From: chenwenbo1020 at gmail.com (=?UTF-8?B?6ZmI5paH5Y2a?=)
Date: Tue, 15 Mar 2016 16:07:28 -0400
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID:

Hi Daniel,

Thanks for your help.

"In order to evaluate your final SNAP training files, you might try running SNAP with MAKER without any evidence and compare the distributions of AED (annotation edit distance) values with the distribution of AED values from your prior MAKER runs"

----If I run SNAP in MAKER without any evidence, the AED would be 1 for each gene model, so I can't compare it with the prior run regarding the distribution of AED.

When I examine the gene models in Apollo, I noticed that the introns given by SNAP are longer than those from the other predictors. Is there any parameter controlling this? When using the maker2zff script to filter the input models for training SNAP, do you have any suggestion on the "-c -e -o" parameters?
Here are my parameters in the CTL file:

alt_splice=0
always_complete=1
split_hit=257022
max_dna_len=1700000

Thanks a lot!

Best,
Wenbo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Tue Mar 15 14:19:32 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 15 Mar 2016 20:19:32 +0000
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To:
References:
Message-ID: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>

Hi Wenbo, sorry for giving you a bogus suggestion. I should have realized that wouldn't work.
The defaults for the parameters you're asking about are all '0.5', so half of the exons, splice sites, etc. supported by EST alignment. I think it's your judgment as to whether those are acceptable cutoffs for training your next set of genes. We use those settings for all our training sessions, which generally give good results.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Mar 15 16:16:22 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 15 Mar 2016 16:16:22 -0600
Subject: [maker-devel] How to evaluate the results of gene prediction
In-Reply-To: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>
References: <7DB56840-202F-486E-82BC-F75B7810979F@genetics.utah.edu>
Message-ID:

In general, if you want to know if the ab initio algorithms are trained well, look at them in something like Apollo. If SNAP and Augustus look like each other, and both look like the final hint-based models, then they are trained well. With AED it's more of a correlative rather than an absolute measurement. The lower the value, in general the better the model. If you have gold standard models you can get sensitivity and specificity metrics from programs like EVAL from WashU. But that's not really an option for newly sequenced organisms.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mdubarry at genoscope.cns.fr Wed Mar 16 09:09:28 2016
From: mdubarry at genoscope.cns.fr (Marion Dubarry)
Date: Wed, 16 Mar 2016 16:09:28 +0100
Subject: [maker-devel] understanding maker output
Message-ID: <56E97728.6010103@genoscope.cns.fr>

Dear Maker,

I have some issues understanding the output of maker. I ran Maker on a chromosome where I already know the number of expected genes (1332).

1) I ran Maker with mrna.gff and prot.gff files and Snap (est2genome=1 protein2genome=1), and I also tried with just Snap, and I obtained the same files. Why? I expected that with just ab initio or experimental data, the results would have been different!
In the folder /chr3.maker.output/chr3_datastore/50/43/chr3 I have different files:

chr3.gff
chr3.maker.non_overlapping_ab_initio.transcripts.fasta
chr3.maker.snap_masked.transcripts.fasta
theVoid.chr3/
chr3.maker.non_overlapping_ab_initio.proteins.fasta
chr3.maker.snap_masked.proteins.fasta
run.log

2) All of the fasta files contain 1263 sequences, while the gff file contains 87178 matches. Why is there such a big difference between my files? In my gff file, lines with column 2 = "snap_masked" and column 3 = "match" correspond to the 1263 models in the fasta files. What do the "repeatmasker" and "repeatrunner" matches correspond to?

Thanks in advance,
Marion

From carsonhh at gmail.com Wed Mar 16 13:42:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Mar 2016 13:42:59 -0600
Subject: [maker-devel] understanding maker output
In-Reply-To: <56E97728.6010103@genoscope.cns.fr>
References: <56E97728.6010103@genoscope.cns.fr>
Message-ID: <8F95F7E3-A955-484C-B046-0E0BC188DC49@gmail.com>

Hi Marion,

None of your evidence supported any of the SNAP models, so you got no results. You did have reference SNAP models in both fasta and GFF3 format (match/match_part features), but those are just for reference. You probably have issues with either your mrna.gff or prot.gff files. You may want to familiarize yourself with how MAKER works and expected output using an online tutorial like the following --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From maker-devel at yandell-lab.org Tue Mar 22 05:38:57 2016
From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org)
Date: Tue, 22 Mar 2016 17:08:57 +0530
Subject: [maker-devel] Document 2
Message-ID:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Document 2.zip
Type: application/zip
Size: 3095 bytes
Desc: not available
URL:

From mmacd at udel.edu Tue Mar 22 10:33:42 2016
From: mmacd at udel.edu (Madolyn Macdonald)
Date: Tue, 22 Mar 2016 12:33:42 -0400
Subject: [maker-devel] Question about Maker output
Message-ID:

Hello,

My apologies if this has been described elsewhere, but I have not been able to find the answer to this question.

After running fasta_merge on the Maker results, I get the fasta files which include all the gene annotations from all the different contigs in the assembly.
In the transcript file, I get headers such as the two below: maker-Contig206-snap-gene-3.11-mRNA-1 maker-Contig206-snap-gene-3.12-mRNA-1 I was wondering what the gene-X.XX portion of the header means. For instance, are 3.11 and 3.12 exons on the same gene or are they two completely separate genes? If they are separate genes, why are they both still "gene 3"? Thanks in advance! -- Madolyn Stinner (formerly Madolyn MacDonald) UDel Bioinformatics and Systems Biology, PhD student RIT Alumnus 13' -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Mar 22 14:31:08 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 22 Mar 2016 14:31:08 -0600 Subject: [maker-devel] maker-devel post from mmacd@udel.edu requires approval In-Reply-To: References: Message-ID: Hi Madolyn, They are different genes because their IDs are different. The numbers are meaningless; they are just iterators to make sure the IDs are unique. Thanks, Carson > From: Madolyn Macdonald > Subject: Question about Maker output > Date: March 22, 2016 at 10:33:42 AM MDT > To: maker-devel at yandell-lab.org > > > Hello, > > My apologies if this has been described elsewhere, but I have not been able to find the answer to this question. > > After running fasta_merge on the Maker results, I get the fasta files which include all the gene annotations from all the different contigs in the assembly. In the transcript file, I get headers such as the two below: > > maker-Contig206-snap-gene-3.11-mRNA-1 > > maker-Contig206-snap-gene-3.12-mRNA-1 > > I was wondering what the gene-X.XX portion of the header means. For instance, are 3.11 and 3.12 exons on the same gene or are they two completely separate genes? If they are separate genes, why are they both still "gene 3"? > > Thanks in advance!
> > > -- > Madolyn Stinner (formerly Madolyn MacDonald) > UDel Bioinformatics and Systems Biology, PhD student > RIT Alumnus 13' -------------- next part -------------- An HTML attachment was scrubbed... URL: From carson.holt at genetics.utah.edu Thu Mar 24 14:56:11 2016 From: carson.holt at genetics.utah.edu (Carson Holt) Date: Thu, 24 Mar 2016 20:56:11 +0000 Subject: [maker-devel] question about Maker2 In-Reply-To: References: <56F4066F.4000803@fgcz.ethz.ch> Message-ID: Hi Giancarlo, Anything listed as something like maker-*-augustus was a result of MAKER sending hints to augustus, and anything like augustus-*-abinit was the result of augustus run directly from the HMM without hints. Here is more detail on the format -> - - -gene- - Top level possibilities: maker #maker generated model snap_masked #snap run on masked sequence augustus_masked #augustus run on masked sequence etc. Internal source: abinit #ab initio model direct from HMM snap #hints provided to SNAP (alters scoring) augustus #hints provided to augustus (alters scoring) Then chunk and iterator are just there to generate a unique ID. Example: augustus_masked-scaffold11899-abinit-gene-0.6 #Produced by Augustus on masked sequence using raw HMM (no MAKER intervention). maker-scaffold11899-augustus-gene-0.6 #Produced by maker sending hints to augustus to modify scoring against the HMM --Carson > On 3/24/16, 9:23 AM, "giancarlo.russo" wrote: > >> Dear Mike, >> >> first of all thanks for taking care and sharing Maker, as part of the >> community I appreciate it. >> >> I have a question about the nomenclature of the annotation in the output >> file: >> what is the difference between genes named >> >> maker-Contig-XXX >> and those named >> augustus-Contig-XXX-processed genes >> ? >> >> Please find attached the maker_opts file I have used for my annotation.
>> I was under the impression that the ab-initio related prefixes would be >> present only in the genes which are not marked as "maker" in column 3 of >> the gff file (i.e., those >> with both ab-initio and EST evidence) >> >> Is there something I am missing? >> >> Thanks a lot in advance, >> Giancarlo >> >> -- >> Giancarlo Russo, Ph.D. >> Functional Genomics Center Zurich >> Y32 H66 >> Winterthurerstr. 190 >> 8057 Zurich >> SWITZERLAND >> Phone: +41 44 635 39 64 >> Fax: +41 44 635 39 22 >> E-Mail: giancarlo.russo at fgcz.ethz.ch >> > > From carsonhh at gmail.com Mon Mar 28 09:10:06 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 28 Mar 2016 09:10:06 -0600 Subject: [maker-devel] Maker Execution Error In-Reply-To: References: Message-ID: <007B0008-6BFD-4121-9F0D-56EA9B3A2B5A@gmail.com> Hi Jackie, From the INSTALL file included with MAKER -> Note: For OpenMPI you may also want to set OMPI_MCA_mpi_warn_on_fork=0 in your ~/.bash_profile to turn off certain nonfatal warnings. Note: If jobs hang or freeze when using mpiexec under OpenMPI try adding the '-mca btl ^openib' flag to mpiexec command when running MAKER. Example: mpiexec -mca btl ^openib -n 20 maker Also the following -> If using OpenMPI, make sure to set LD_PRELOAD to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so). The first one is the most likely. Thanks, Carson > On Mar 28, 2016, at 8:38 AM, Atkins, Jacqueline (NIH/NIAID) [C] wrote: > > Hello, > > I have recently installed Maker on RHEL 7/ Perl-5.16.3. When I attempt to execute, I get the following error > > $ mpiexec -n 4 maker -help > > An MPI process has executed an operation involving a call to the > "fork()" system call to create a child process.
Open MPI is currently > operating in a condition that could result in memory corruption or > other system errors; your MPI job may hang, crash, or produce silent > data corruption. The use of fork() (or system() or other calls that > create child processes) is strongly discouraged. > > The process that invoked fork was: > > Local host: submit (PID 316) > MPI_COMM_WORLD rank: 2 > > If you are *absolutely sure* that your application will successfully > and correctly survive a call to fork(), you may disable this warning > by setting the mpi_warn_on_fork MCA parameter to 0. > -------------------------------------------------------------------------- > [submit:122878] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork > [submit:122878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages > [submit bin]$ mpiexec --version > mpiexec (OpenRTE) 1.8.4 > > > I have a previous version of Maker installed that is using OpenMPI 1.3.3 and it is working fine. I was wondering if you think this might be related to the version of OpenMPI? > > Thank you in advance. > Jackie > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacqueline.atkins at nih.gov Mon Mar 28 08:38:39 2016 From: jacqueline.atkins at nih.gov (Atkins, Jacqueline (NIH/NIAID) [C]) Date: Mon, 28 Mar 2016 14:38:39 +0000 Subject: [maker-devel] Maker Execution Error Message-ID: Hello, I have recently installed Maker on RHEL 7/ Perl-5.16.3. When I attempt to execute, I get the following error $ mpiexec -n 4 maker -help An MPI process has executed an operation involving a call to the "fork()" system call to create a child process. Open MPI is currently operating in a condition that could result in memory corruption or other system errors; your MPI job may hang, crash, or produce silent data corruption. The use of fork() (or system() or other calls that create child processes) is strongly discouraged. 
The process that invoked fork was: Local host: submit (PID 316) MPI_COMM_WORLD rank: 2 If you are *absolutely sure* that your application will successfully and correctly survive a call to fork(), you may disable this warning by setting the mpi_warn_on_fork MCA parameter to 0. -------------------------------------------------------------------------- [submit:122878] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork [submit:122878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages [submit bin]$ mpiexec --version mpiexec (OpenRTE) 1.8.4 I have a previous version of Maker installed that is using OpenMPI 1.3.3 and it is working fine. I was wondering if you think this might be related to the version of OpenMPI? Thank you in advance. Jackie -------------- next part -------------- An HTML attachment was scrubbed... URL: From maker-devel at yandell-lab.org Tue Mar 29 06:46:22 2016 From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org) Date: Tue, 29 Mar 2016 18:16:22 +0530 Subject: [maker-devel] CCE29032016_00053.tiff Message-ID: -------------- next part -------------- A non-text attachment was scrubbed... Name: CCE29032016_00053.tiff Type: application/zip Size: 2665 bytes Desc: not available URL: -------------- next part -------------- Sent from my iPhone From dence at genetics.utah.edu Wed Mar 30 15:17:38 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 30 Mar 2016 21:17:38 +0000 Subject: [maker-devel] Maker example data for 2013 GMOD summer school In-Reply-To: References: Message-ID: <1772AAA1-C6ED-4FCA-B4C9-39F522D3D076@genetics.utah.edu> Hi Qihua, I believe that most of the data we used in the tutorials are available in the maker/data directory, which is included in all maker distributions. Please let me know if that isn't the case.
~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 > On Mar 30, 2016, at 3:10 PM, Qihua Liang wrote: > > Hi Michael and Daniel, > > I am a graduate student at UC Riverside, and recently I have been learning to use Maker for genome annotation. I was trying to find some tutorials to follow and practice on example data, and I found out that you gave a talk on Maker during the 2013 GMOD summer school and that the tutorial for it is very detailed. Nice job! > > But the example data under the folder you mentioned as ./maker/maker_course is not provided on the website and I am wondering if it is available to the public or not. If yes, could you send me those materials so that I could follow your tutorial to practice using Maker? > > Thank you > Best > Qihua From ereboperezsilva at gmail.com Thu Mar 31 06:57:47 2016 From: ereboperezsilva at gmail.com (José Mª G. Perez-Silva) Date: Thu, 31 Mar 2016 14:57:47 +0200 Subject: [maker-devel] Question about Maker2 Message-ID: Hello, We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole genome (2.2Gb) ab-initio annotation. In a month we have annotated nearly half of the genome (let's say around 40% of it). I'd like to know how much time it takes, and under which technical specifications (processors, memory, ...), to annotate a complete genome for the first time. Is the second round of annotations (in which we use the results from the first round as extra data) faster? Thank you in advance. --- Jose Maria G. Perez-Silva. Departamento de Biologia Molecular y Bioquimica. Universidad de Oviedo. Spain. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dence at genetics.utah.edu Thu Mar 31 11:35:36 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 31 Mar 2016 17:35:36 +0000 Subject: [maker-devel] Question about Maker2 In-Reply-To: References: Message-ID: Hi Jose, the time it takes maker to annotate a genome depends greatly on the hardware setup (as you pointed out, processors, memory, etc) as well as the size of the genome and the size and type of the datasets you use to annotate the genome (numerous RNAseq datasets for example will take longer than a project without any RNAseq data). However, the MPI parallelization implemented in MAKER means that the runtime should scale roughly linearly with the number of processors allotted to the MAKER run. This is explained in the MAKER2 paper (Holt and Yandell), which I'm going to quote: MAKER2 was used to annotate a 10 megabase section of the C. elegans genome (NGASP dataset). The algorithm was parallelized using MPI on an increasing number of CPU cores. The results demonstrate how MAKER2 scales almost linearly with CPU number (with a slope of near 1). If we project our results forward to the entire C. elegans genome (~100 megabases), MAKER2 should take under 10 hours on 32 CPUs to complete; similarly, the human genome (~3 gigabases) would require fewer than 24 hours on 400 CPUs. I'm also not sure what you mean by the first run taking less time than the second run. By the first run do you mean running with est2genome turned on to create models for training ab-initio predictors? In that case, I would guess that the second run would take longer, but it shouldn't be too big of a difference. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On Mar 31, 2016, at 6:57 AM, José Mª G. Perez-Silva wrote:
Hello, We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole genome (2.2Gb) ab-initio annotation. In a month we have annotated nearly half of the genome (let's say around 40% of it). I'd like to know how much time it takes, and under which technical specifications (processors, memory, ...), to annotate a complete genome for the first time. Is the second round of annotations (in which we use the results from the first round as extra data) faster? Thank you in advance. --- Jose Maria G. Perez-Silva. Departamento de Biologia Molecular y Bioquimica. Universidad de Oviedo. Spain. _______________________________________________ maker-devel mailing list maker-devel at yandell-lab.org http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Mar 31 11:38:14 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 31 Mar 2016 11:38:14 -0600 Subject: [maker-devel] Question about Maker2 In-Reply-To: References: Message-ID: <7980702B-AE01-40A8-A903-B1DE8EE3CCC4@gmail.com> If you provide all evidence on the first run, the second run will be faster because MAKER will be able to reuse alignments from the previous run. Since 90% of runtime is BLAST, being able to just reuse the BLAST reports really improves runtime. --Carson > On Mar 31, 2016, at 11:35 AM, Daniel Ence wrote: > > Hi Jose, the time it takes maker to annotate a genome depends greatly on the hardware setup (as you pointed out, processors, memory, etc) as well as the size of the genome and the size and type of the datasets you use to annotate the genome (numerous RNAseq datasets for example will take longer than a project without any RNAseq data). > > However, the MPI parallelization implemented in MAKER means that the runtime should scale roughly linearly with the number of processors allotted to the MAKER run.
This is explained in the MAKER2 paper (Holt and Yandell), which I'm going to quote: > MAKER2 was used to annotate a 10 megabase section of the C. elegans genome > (NGASP dataset). The algorithm was parallelized using MPI on an increasing number > of CPU cores. The results demonstrate how MAKER2 scales almost linearly with > CPU number (with a slope of near 1). If we project our results forward to the entire C. > elegans genome (~100 megabases), MAKER2 should take under 10 hours on 32 > CPUs to complete; similarly, the human genome (~3 gigabases) would require fewer > than 24 hours on 400 CPUs. > > I'm also not sure what you mean by the first run taking less time than the second run. By the first run do you mean running with est2genome turned on to create models for training ab-initio predictors? In that case, I would guess that the second run would take longer, but it shouldn't be too big of a difference. > > ~Daniel > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > >> On Mar 31, 2016, at 6:57 AM, José Mª G. Perez-Silva wrote: >> >> Hello, >> >> We are using Maker for the first time, and we are a little concerned about the time it takes the program to finish a whole genome (2.2Gb) ab-initio annotation. >> >> In a month we have annotated nearly half of the genome (let's say around 40% of it). >> I'd like to know how much time it takes, and under which technical specifications (processors, memory, ...), to annotate a complete genome for the first time. >> Is the second round of annotations (in which we use the results from the first round as extra data) faster? >> >> Thank you in advance. >> >> --- >> >> Jose Maria G. Perez-Silva. >> Departamento de Biologia Molecular y Bioquimica. >> Universidad de Oviedo. >> Spain.
>> _______________________________________________ >> maker-devel mailing list >> maker-devel at yandell-lab.org >> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL:
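The scaling figures quoted from the MAKER2 paper in this thread can be turned into a back-of-the-envelope estimate. The sketch below is only an illustration, not something taken from MAKER itself: it assumes the quoted upper bound (under 10 hours on 32 CPUs for ~100 megabases) and perfectly linear scaling with both genome size and CPU count. The function name `estimate_hours` and the 64-CPU figure are hypothetical, and real runtimes depend heavily on the evidence datasets and the hardware.

```python
# Rough runtime projection from the MAKER2 paper figures quoted in this thread.
# Assumption: runtime scales linearly with genome size and inversely with
# CPU count. The quoted figures are upper bounds, so treat this as a guide only.

REF_MEGABASES = 100.0   # C. elegans genome, ~100 megabases (quoted)
REF_HOURS = 10.0        # "under 10 hours" (quoted upper bound)
REF_CPUS = 32           # CPUs assumed for that projection (quoted)


def estimate_hours(genome_megabases, cpus):
    """Hypothetical helper: project wall-clock hours for a MAKER run."""
    if genome_megabases <= 0 or cpus <= 0:
        raise ValueError("genome size and CPU count must be positive")
    cpu_hours_per_megabase = REF_HOURS * REF_CPUS / REF_MEGABASES
    return genome_megabases * cpu_hours_per_megabase / cpus


if __name__ == "__main__":
    # Cross-check against the other quoted figure: ~3 gigabases on 400 CPUs.
    print(f"human, 400 CPUs: {estimate_hours(3000, 400):.1f} h")
    # The 2.2 Gb genome from the question, on a hypothetical 64 CPUs.
    print(f"2.2 Gb, 64 CPUs: {estimate_hours(2200, 64):.1f} h")
```

Under these assumptions the two quoted projections agree with each other (about 3.2 CPU-hours per megabase), and a 2.2 Gb genome on 64 CPUs works out to roughly 110 hours. A run that has covered only around 40% of such a genome in a month is therefore probably using few CPUs or spending most of its time on large evidence alignments.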