From carsonhh at gmail.com Mon Apr 1 09:50:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 10:50:58 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364760124.37890.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

This appears to be a permissions issue, either for the /u1/local/bin/ directory or for the RepeatMasker setup. Did you set maker up yourself as your own user, or did someone else do it for you, perhaps as root? Is /u1/local/bin/ on an NFS mount? If it's a mounting issue, I found this via Google for the exact same issue -->

>> I needed to add the 'exec' option to the /etc/fstab file when mounting that
>> partition.
>> If it says 'defaults' on the line in /etc/fstab, then it also means you don't
>> have exec rights on it.

Are you using the same perl to run maker as you are using for RepeatMasker? For example, are you calling perl directly and giving it the path to maker, or are you calling maker directly and letting it use the version of perl it was installed with? Try this to see which perl maker was installed with -->

cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!"

You may have to reinstall RepeatMasker and possibly maker.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Sunday, 31 March, 2013 4:02 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Help on error-Repeat masker

Hello,

I have a problem when running maker. I got this error; what could be going wrong here? Thanks so much.

setting up GFF3 output and fasta chunks
doing repeat masking
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_WOVHsi; /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172/contig172.0.simple.rb -dir /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172 -pa 1 -lib /tmp/maker_WOVHsi/b1piBcWHlH
#-------------------------------#
sh: /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker: /u1/local/bin/perl: bad interpreter: Permission denied
ERROR: RepeatMasker failed
--> rank=NA, hostname=Homis
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig172

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:172

examining contents of the fasta file and run log

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Mon Apr 1 11:27:23 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 1 Apr 2013 16:27:23 +0000
Subject: [maker-devel] Why are some start positions minus in the gff result?
In-Reply-To:
References:
Message-ID:

Hi,

I seem to remember some discussion of the possibility of negative coordinates in a GFF3 file when the genomic material is circular. Since you're annotating viral genomes, could this be what's happening here? Like Carson said, I've never seen this before, but that's just an idea I had.
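
For anyone who wants to check their own output for this symptom, here is a minimal sketch. It assumes a tab-delimited GFF3 file. The function name `nonpositive_starts` and the file name in the usage comment are made up for illustration and are not part of MAKER. Since GFF3 coordinates are 1-based, any start below 1 is worth a look.

```shell
#!/bin/sh
# List GFF3 feature lines whose start coordinate (column 4) is below 1.
# Prints: <line number>: <seqid> <type> <start> <end>
nonpositive_starts() {
    awk -F'\t' '
        /^#/ { next }                 # skip directives and comment lines
        NF >= 9 && ($4 + 0) < 1 {     # 9-column feature line, start below 1
            print NR ": " $1 "\t" $3 "\t" $4 "\t" $5
        }
    ' "$1"
}

# Usage (placeholder file name):
#   nonpositive_starts my_genome.all.gff
```

Lines with fewer than nine tab-separated fields are skipped, which also ignores any sequence appended after a ##FASTA directive.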

Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Hung-Wei Hsu [ares711122 at gmail.com]
Sent: Monday, March 25, 2013 8:50 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] Why are some start positions minus in the gff result?

Hi MAKER developers,

I could successfully run MAKER and get the final GFF, but I found that some start positions in the GFF were negative. That led to an error in the GFF reader. Is this a bug? Could you please help resolve this problem? Thanks a lot in advance.

Best regards,
Hung-Wei

From carsonhh at gmail.com Mon Apr 1 11:38:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 12:38:18 -0400
Subject: [maker-devel] Why are some start positions minus in the gff result?
In-Reply-To:
Message-ID:

I'm thinking the same thing. Reviewing how I parse GeneMark's output, I just use its start and end coordinates (no changes). Over the weekend I altered the GeneMark parser to walk the gene start away from the supposed origin so as not to let this happen. In your E. coli test case, since you have multiple contigs for what is likely a single circular genome, this would be the correct behavior, as you don't want to treat each contig as an independent circular chromosome. I should probably add an is_circular option to the control files so users can select for this.

I've updated the maker subversion repository, so you can do an 'svn update' (I believe you are using the devel version of MAKER, correct?).

Thanks,
Carson

From carsonhh at gmail.com Mon Apr 1 14:59:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 15:59:18 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364846015.96057.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

What kind of system (OS) are you running on? 'perl.exe' seems odd. It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line? I think you need to reinstall RepeatMasker at a minimum.
To do that -->

> cd /home/maker-2.27-beta/maker/src
> ./Build repeatmasker

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 3:53 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Thanks for the reply.

1. Yes, I set up maker myself as my own user, but I don't know how to check the mounting.
2. I'm calling maker directly. I tried this:

cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!"

and it gave me this:

#!/usr/bin/perl.exe

From carsonhh at gmail.com Mon Apr 1 15:29:40 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 16:29:40 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364847674.21064.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

I found it odd because perl.exe is a Windows extension not used on Linux, but it confirmed my suspicions. You can't use maker with Cygwin. There are several things that will break because it's not really Linux. You can use VirtualBox instead to install a virtual Linux machine --> https://www.virtualbox.org/. Alternatively, you can try to dual boot your system with a Linux partition. VirtualBox will allow you to run maker on small datasets; depending on the size of the genome you want to run maker on, it may be fine.
But I would not recommend running anything over 10 megabases (it won't fail, it will just take a very long time).

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:21 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

1. Oh, it's odd? I'm using Windows 8, but for maker I'm using Cygwin.
2. When I type 'which perl' I get this:

/usr/bin/perl

3. When I type ./Build repeatmasker I get this:

cygwin warning:
  MS-DOS style path detected: \Users\Dora
  Preferred POSIX equivalent is: /cygdrive/c/Users/Dora
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
WARNING: RepeatMasker was already found on this system.
Do you still want MAKER to install RepeatMasker for you?

Is there any problem with this, or can I just proceed with the installation?

From carsonhh at gmail.com Mon Apr 1 16:47:38 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 17:47:38 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To:
Message-ID:

That's not too bad. It's best to choose a few large contigs (1-2 Mb total) to run with at first and then use those results to help configure the rest of the run. For the final run you may want to consider splitting onto multiple machines if your machine has limited CPU power. It will take you ~150 hours on 1 CPU core, depending on the size of the alignment datasets (ESTs and proteins). More CPU cores will allow it to run faster (see graph below from the MAKER2 paper). I imagine that your machine probably has at least 4 CPU cores. Most bioinformatics labs have multi-CPU Linux boxes (i.e.
24-32 CPU cores), some have clusters available to them (100s to 1000s of CPU cores), and a few just launch maker on multiple lab desktop machines all writing to the same network-mounted output directory.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:48 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

It's about 50 Mb.

From: Carson Holt
To: Hud Hud
Sent: Tuesday, April 2, 2013 4:44 AM
Subject: Re: [maker-devel] Help on error-Repeat masker

How big is the genome?

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:37 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Oh, thanks so much, now I know what's going wrong: it's Cygwin. I'll try dual boot then, as my genome is over 10 Mb. Thanks for your time.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 08EB6777-DA72-45CA-8E05-07928457B9BE.png
Type: image/png
Size: 61806 bytes
Desc: not available
URL:

From mnuhn at ebi.ac.uk Tue Apr 2 08:09:18 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Tue, 02 Apr 2013 14:09:18 +0100
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
Message-ID: <515AD87E.1010800@ebi.ac.uk>

Hello Carson!

(MPI) MAKER (2.27) is failing when it runs blast searches. It prints out the command it is trying to run. When I run this command manually on the command line, blast terminates with an error because it can't find either the input file or a file ending in .pin, which I think is a protein index file it expects to be there.

I've looked at a few contigs on which maker fails, and they were all rather short contigs.

Maker works fine if I

- run it without MPI or
- run it with MPI, but with a maximum of 4 processors.

(MPI) MAKER used to run fine with 128 processors before this.

The contigs are sorted in descending order by size in the genome file. I think maker has processed the large ones, and the problems it is having now might have something to do with it running on smaller contigs.

Looking at the error messages, I thought at first the index file of the genome might be corrupted, so I deleted it and let maker rebuild it. This didn't fix the issue, though.

I have also set the path for temporary files manually to make sure maker is not running out of temporary space.

Any idea how to overcome this?

Cheers,
Michael.

P.S.: A typical error message I'm getting is this:

--Next Contig--
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
running blast search.
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSalAtl2s8087.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LSalAtl2s8087//theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:LSalAtl2s8083

doing blastx repeats
setting up GFF3 output and fasta chunks
doing blastx repeats
re reading repeat masker report.
/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/LSalAtl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
running blast search.
running blast search.
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSalAtl2s8135.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/LSalAtl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSalAtl2s8119.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/LSalAtl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: LSalAtl2s8449
Length: 2187
Tries: 18!!
#---------------------------------------------------------------------

From carsonhh at gmail.com Tue Apr 2 08:15:28 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 09:15:28 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364865389.66083.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

The best evidence is from mRNAseq or ESTs of the same species. ESTs and mRNAseq from related species can be used, but if protein annotations are available, use those instead. This is because cross-species nucleotide alignments must be translated in all 6 reading frames (3 for the query and 3 for the subject), which would basically make run times increase 6-fold. You can try giving the cross-species sequences to maker as if they were from the same species instead (est= option); fewer will align, but run times will not be overwhelming. Then provide the protein annotations from the related species combined with UniProt (maker can take comma-separated lists for the input files).

You can use either the program CEGMA from Ian Korf's lab or, alternatively, maker's protein2genome option to build an initial annotation set to use for training. Then train SNAP, Augustus, and GeneMark (GeneMark self-trains). For the last run, let all 3 predictors run together with protein2genome now turned off. Given that the genome is only 50 Mb and you lack alignment evidence, you can probably safely set keep_preds=1 on the second run, as the false positive rate is usually quite low for gene-dense organisms and you won't get many annotations from maker otherwise without more evidence alignments. Perform your first and second runs in the same location so maker can automatically reuse the same alignments (the second run is always very fast this way, as maker won't have to rerun blast and exonerate).
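
As a rough sketch only, the two runs described above might translate into maker_opts.ctl entries along these lines. The option names are the ones I understand the MAKER 2.x control file to use; every file name and the species label are placeholders, not values from this thread. Only the lines that change between runs are shown, not a complete control file.

```
# --- first run: initial gene set from protein alignments, used for training ---
genome=my_contigs.fasta                                 # placeholder file name
est=related_species_ests.fasta                          # cross-species sequences given via est=
protein=related_species_proteins.fasta,uniprot.fasta    # comma-separated list
protein2genome=1

# --- second run, same directory: trained predictors on, protein2genome off ---
snaphmm=my_species.hmm                                  # placeholder: SNAP parameters from training
gmhmm=my_species_genemark.mod                           # placeholder: GeneMark model
augustus_species=my_species                             # placeholder: Augustus species label
protein2genome=0
keep_preds=1
```

Keeping both runs in the same directory is what lets the second run reuse the alignments from the first.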
If your organism is a fungi (I'm just guessing because of the small genome size) you can also use this gene prediction parameter resource from Jason Stajich --> https://github.com/hyphaltip/fungi-gene-prediction-params Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 9:16 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker Thanks, i now have better insight regard to the cpu cores.. i have other questions...i dont have other info or evidences of my own genome, i only have assembled contigs....recently JGI sequenced a species that closely related to my genome (at genus level), and i have access to the data (protein, est, rna-seq reads,transcript, gene models,gff) 1.I have run maker (MWAS) using diff set of evidences, such as protein and est(JGI) and est(JGI) and uniprot database ..but both run produced diferent no of predicted genes....so my question, what is the best evidences to be used to support my annotation..is it more preferred to use larger dataset such as uniprot rather than using the data from JGI (even it closely related) 2. can i use rna-seq data (from JGI) to be used in maker...ive denovo assembled the rnaseq using clc genomics. Thanks From: Carson Holt To: Hud Hud Sent: Tuesday, April 2, 2013 5:01 AM Subject: Re: [maker-devel] Help on error-Repeat masker That's not too bad It's best to choose a few large contigs (1-2Mb total) to run with at first and then use those results to help configure the rest of the run. For the final run you may want to consider splitting onto multiple machines if your machine has limited cpu power. It will take you ~150 hours on 1 cpu core depending on the size of alignment datasets - ESTs and proteins. More cpu cores will allow it to run faster (see graph below from the MAKER2 paper). I imagine that your machine probably has at least 4 cpu cores. Most bioinformatics labs have multi cpu Linux boxes (I.e. 
24-32 cpu cores), some have clusters available to them (100's to 1000's of cpu cores), and a few just launch maker on multiple lab desktop machines all writing to the same network mounted output directory. Thanks, Carson Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 4:48 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker Its about 50mb From: Carson Holt To: Hud Hud Sent: Tuesday, April 2, 2013 4:44 AM Subject: Re: [maker-devel] Help on error-Repeat masker How big is the genome? --Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 4:37 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker owh thanks so much,now i know whats going wrong, its the cygwin... i'll try dual boot then as my genome over 10 mb..thanks for your time From: Carson Holt To: Hud Hud Cc: "maker-devel at yandell-lab.org" Sent: Tuesday, April 2, 2013 4:29 AM Subject: Re: [maker-devel] Help on error-Repeat masker I found it odd because perl.exe is a windows extension not used in Linux, but it confirmed my suspicions. You can't use maker with cygwin. There are several things that will break because it's not really Linux. You can use Virtual Box instead to install a virtual Linux machine --> https://www.virtualbox.org/. Alternatively you can try and dual boot your system with a Linux partition. Virtual Box will allow you to run maker on small datasets, depending on the size of the genome you want to run maker with it may be fine. But I would not recommend running anything over 10 megabases (it won't fail, it will just take a very long time). Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 4:21 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker 1. owh its odd?im using windows8 but for maker im using cygwin 2. when i type which perl i get this /usr/bin/perl 3. 
When I type ./Build repeatmasker I get this:

cygwin warning:
  MS-DOS style path detected: \Users\Dora
  Preferred POSIX equivalent is: /cygdrive/c/Users/Dora
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
WARNING: RepeatMasker was already found on this system.
Do you still want MAKER to install RepeatMasker for you?

Is there any problem with this, or can I just proceed with the installation?

From: Carson Holt
To: Hud Hud
Cc: "maker-devel at yandell-lab.org"
Sent: Tuesday, April 2, 2013 3:59 AM
Subject: Re: [maker-devel] Help on error-Repeat masker

What kind of system (OS) are you running on? 'perl.exe' seems odd. It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line? I think you need to reinstall RepeatMasker at a minimum. To do that -->

> cd /home/maker-2.27-beta/maker/src
> ./Build repeatmasker

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 3:53 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Thanks for the reply.
1. Yes, I set up maker myself as my own user, but I don't know how to check the mounting.
2. I'm calling maker directly, and I've tried this:
cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!"
and it gave me this:
#!/usr/bin/perl.exe

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 08EB6777-DA72-45CA-8E05-07928457B9BE.png
Type: image/png
Size: 61806 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue Apr 2 08:57:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 09:57:08 -0400
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <515AD87E.1010800@ebi.ac.uk>
Message-ID:

Could you set the TMP= option to a non-NFS mounted location (the default /tmp should work) and let me know if it still fails? You can also try completely deleting the LSalAtl2s.maker.output/mpi_blastdb directory before restarting.

Thanks,
Carson

On 13-04-02 9:09 AM, "Michael Nuhn" wrote:

>Hello Carson!
>
>(Mpi) Maker (2.27) is failing when it runs blast searches.
>
>It prints out the command it is trying to run. When I try to run this
>command manually on the command line, blast terminates with an error,
>because it either can't find the input file or it can't find a file
>ending in .pin, which I think is a protein index file it expects to be
>there.
>
>I've looked at a few contigs on which maker fails and they were all
>rather short contigs.
>
>Maker works fine, if I
>
>- run it without mpi or
>- run it with mpi, but a maximum of 4 processors.
>
>(Mpi) Maker used to run fine with 128 processors before this.
>
>The contigs are sorted descending by size in the genome file. I think
>maker has processed the large ones and the problems it is having now
>might have something to do with it running on smaller contigs.
>
> From looking at the error messages I thought at first the index file of
>the genome might be corrupted, so I deleted it and let maker rebuild it.
>This didn't fix the issue though. I have also set the path for temporary
>files manually to make sure maker is not running out of temporary space.
>
>Any idea how to overcome this?
>
>Cheers,
>Michael.
> >P.S.: A typical error message I'm getting is this: > >--Next Contig-- > >[blastall] FATAL ERROR: search cannot proceed due to errors in all >contexts/frames of query sequences >running blast search. >#--------- command -------------# >Widget::blastx: >/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2 >Efasta.mpi.10.0 > >-i >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSalAt >l2s8087.0 >-b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /n >fs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_ >final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LSalAt >l2s8087// >theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.tem >p_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >#-------------------------------# >ERROR: Chunk failed at level:2, tier_type:0 >FAILED CONTIG:LSalAtl2s8083 > >doing blastx repeats >setting up GFF3 output and fasta chunks >doing blastx repeats >re reading repeat masker report. >/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/make >r_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/LSal >Atl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out >[blastall] FATAL ERROR: search cannot proceed due to errors in all >contexts/frames of query sequences >[blastall] FATAL ERROR: search cannot proceed due to errors in all >contexts/frames of query sequences >running blast search. >running blast search. 
>#--------- command -------------# >Widget::blastx: >/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2 >Efasta.mpi.10.0 >-i >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSalAt >l2s8135.0 >-b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o >/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/make >r_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/LSal >Atl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta.repea >trunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >#-------------------------------# >#--------- command -------------# >Widget::blastx: >/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2 >Efasta.mpi.10.0 >-i >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSalAt >l2s8119.0 >-b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o >/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/make >r_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/LSal >Atl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta.repea >trunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >#-------------------------------# >[blastall] FATAL ERROR: search cannot proceed due to errors in all >contexts/frames of query sequences >#--------------------------------------------------------------------- >Now retrying the contig!! >SeqID: LSalAtl2s8449 >Length: 2187 >Tries: 18!! 
>#---------------------------------------------------------------------

From mnuhn at ebi.ac.uk Tue Apr 2 09:38:31 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Tue, 02 Apr 2013 15:38:31 +0100
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <00E9A24F-728F-496D-A30C-6EA83676FF64@sanger.ac.uk>
References: <515AD87E.1010800@ebi.ac.uk> <00E9A24F-728F-496D-A30C-6EA83676FF64@sanger.ac.uk>
Message-ID: <515AED67.5060906@ebi.ac.uk>

On 04/02/2013 02:01 PM, Eleanor Stanley wrote:
> what version of Blast are you using?
> I was getting similar errors with NCBI BLAST+ 2.2.23 that were resolved using BLAST+ 2.2.27 instead

I was using blast version 2.2.14. I've now swapped it out for ncbi+ 2.2.9.

I am running it on one mpi instance with 128 processors and it seems to be working now.

Thanks!
Michael.

> Ele

From carsonhh at gmail.com Tue Apr 2 09:16:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 10:16:44 -0400
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <515AED67.5060906@ebi.ac.uk>
Message-ID:

Good to know.

Thanks,
Carson

From Carson.Holt at oicr.on.ca Thu Apr 4 12:29:24 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Thu, 4 Apr 2013 17:29:24 +0000
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
Message-ID:

Since you are using 12 core nodes (hyperthreaded cores are virtual; you still only have 12 cores of power, not 24) and your performance curve drops off at 12, I'm thinking there is a possibility that the other processes did not start on a separate node. Try launching the Linux command 'hostname' the same way you are launching maker. If all 24 lines of output from hostname have the same host, then maker is only getting launched on a single node. Then since there are really only 12 cores (not 24) you would not see any significant performance improvement above 12. So each process above 12 will reduce the power allocated to remaining processes. So the difference from 12 to 24 (~25% performance gain) is just what can be gained from process saturation (not all maker processes are always at 100% cpu usage because of calls to IO so adding a few more processes than you have cpu cores sometimes runs a little faster).
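
The hostname check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the launcher name `mpiexec`, its `-n` flag, and the node names `node01`/`node02` are placeholders, since the thread does not say which MPI launcher or Gridengine options were actually used. The helper functions `summarise_hosts` and `count_hosts` are hypothetical names introduced here, not part of maker.

```shell
#!/bin/sh
# Summarise which hosts a set of parallel processes ran on.
# Reads one hostname per line on stdin and prints "<count> <host>"
# for each distinct host.
summarise_hosts() {
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Print the number of distinct hosts seen on stdin.
# A result of 1 means every process landed on the same node.
count_hosts() {
    sort -u | wc -l | tr -d ' '
}

# Hypothetical real-world use (launcher and flags are assumptions;
# launch 'hostname' exactly the way maker is launched):
#   mpiexec -n 24 hostname | summarise_hosts

# Stand-in input so the sketch runs without MPI installed:
printf 'node01\nnode01\nnode02\n' | summarise_hosts
```

If the summary shows a single host holding every process, the job was not spread across nodes, which would match the single-node explanation given above.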
Thanks,
Carson

From Carson.Holt at oicr.on.ca Thu Apr 4 12:40:48 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Thu, 4 Apr 2013 17:40:48 +0000
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
Message-ID:

One more thought. If 26,28, and 30 process jobs are failing this could also be because they are not starting across nodes correctly (all end up on the same node). You would then start to run into memory problems and the job would freeze. So validating the proper cross node launch of MPI using the 'hostname' command is still probably the first thing to do.

--Carson

From ramonfallon at gmail.com Thu Apr 4 12:03:43 2013
From: ramonfallon at gmail.com (Ramón Fallon)
Date: Thu, 4 Apr 2013 19:03:43 +0200
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
Message-ID:

Hi

I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes, this time I aimed for 48. In contrast to the last 12 core speed check, the target hardware was a computer cluster, with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence in one file with different names).

The nodes in the cluster are (again) HP Proliant SL390 with two Intel X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running Centos 6.2 with (as before) 2.6.32 linux kernel.
A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via infiniband quadspeed, and because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed. Results were: #processes time(secs) Megabases/hr 1 6585.00 2.20 2 7137.00 2.03 4 2479.00 5.84 8 1088.00 13.30 10 866.00 16.71 12 715.00 20.24 14 666.00 21.72 16 651.00 22.22 18 613.00 23.60 24 559.00 25.88 Graph is attached to this mail. Some notes: * A free queue on the gridengine were used so there was no load on these nodes when run. Two nodes are available on this queue, giving a max of 48 simualtaneous processes. * Some processor number (6,20, etc) were deleted because I couldn't guarantee "No load" conditions during those runs, and I had one or two anomalies so I'd rather not include them right now. However, I expect them to be in line with the other results. * In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24. Because this is also the max number of processes per node, it's possible that interconnects between the nodes had something to do with runs > 24 processes being inconsistent, however, it's not usually an issue in other programs because quadspeed (40Gbit/s) is already a fairly fast interconnect). * Process runs 26,28, and 30 would almost - but not quite - finish (just a few sequences unfinished), But after this number, the analysis would hardly get off the ground, seeming to get stuck at Repeatmasker phase. I suppose this is our main concern at the moment, that we can't speed up beyond 24 processes. Cheers / Ram?n. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 48proc.png
Type: image/png
Size: 24644 bytes
Desc: not available
URL:

From ramonfallon at gmail.com Fri Apr 5 11:00:53 2013
From: ramonfallon at gmail.com (Ramón Fallon)
Date: Fri, 5 Apr 2013 18:00:53 +0200
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
References:
Message-ID:

Thanks for the replies Carson,

Our cluster has got busy all of a sudden, so I have to wait a bit to do the hostname test. However, I'm fairly sure (not 100%, mind you) that when the process number is over 24 it will definitely run the extra processes on a separate node, and so do a proper cross-node launch.

On Thu, Apr 4, 2013 at 7:40 PM, Carson Holt wrote:

> One more thought. If 26, 28, and 30 process jobs are failing, this could
> also be because they are not starting across nodes correctly (all end up on
> the same node). You would then start to run into memory problems and the
> job would freeze. So validating the proper cross-node launch of MPI using
> the 'hostname' command is still probably the first thing to do.
>
> --Carson
>
> From: Carson Holt
> Date: Thursday, 4 April, 2013 1:29 PM
> To: Ramón Fallon, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: second maker2 benchmark, this time, on a cluster
>
> Since you are using 12 core nodes (hyperthreaded cores are virtual - you
> still only have 12 cores of power, not 24) and your performance curve
> drops off at 12, I'm thinking there is a possibility that the other
> processes did not start on a separate node. Try launching the Linux
> command 'hostname' the same way you are launching maker. If all 24 lines
> of output from hostname have the same host, then maker is only getting
> launched on a single node. Then, since there are really only 12 cores (not
> 24), you would not see any significant performance improvement above 12. So
> each process above 12 will reduce the power allocated to the remaining
> processes. So the difference from 12 to 24 (~25% performance gain) is just
> what can be gained from process saturation (not all maker processes are
> always at 100% cpu usage because of calls to IO, so adding a few more
> processes than you have cpu cores sometimes runs a little faster).
>
> Thanks,
> Carson

From kangyangjae at gmail.com Sat Apr 6 02:25:40 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Sat, 6 Apr 2013 16:25:40 +0900
Subject: [maker-devel] CDS retrieve from augustus_masked
Message-ID: <145c01ce3297$f318eab0$d94ac010$@gmail.com>

Dear everyone!

I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or exon as there is for maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information.

Thank you!

Kang, Yang Jae
Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea
From mike.thon at gmail.com Sat Apr 6 06:20:16 2013
From: mike.thon at gmail.com (Michael Thon)
Date: Sat, 6 Apr 2013 13:20:16 +0200
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <145c01ce3297$f318eab0$d94ac010$@gmail.com>
References: <145c01ce3297$f318eab0$d94ac010$@gmail.com>
Message-ID:

Hi Kang - After running fasta_merge there should be a file:

[prefix].all.maker.augustus_masked.transcripts.fasta

in the output directory. Is that what you need?

Mike

From kangyangjae at gmail.com Sat Apr 6 06:24:31 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Sat, 6 Apr 2013 20:24:31 +0900
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To:
References: <145c01ce3297$f318eab0$d94ac010$@gmail.com>
Message-ID: <148d01ce32b9$51407380$f3c15a80$@gmail.com>

Thanks for your quick response, Mike.

I looked at the file named transcripts, but I suspect it might include UTRs. What I want to do is calculate Ka/Ks values, so I need coding sequences. Is there any indication of where the exact START and STOP are in the transcript file?
Thank you

From carsonhh at gmail.com Sat Apr 6 08:54:15 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 09:54:15 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <148d01ce32b9$51407380$f3c15a80$@gmail.com>
Message-ID:

It's all CDS, from start to finish. There is never any UTR in the ab initio reference match/match_part alignments. There are two reasons for this. First, most ab initio predictors don't produce UTR. Second, GFF3 has no is_analysis flag, so it is impossible to separate final gene models from predicted gene models if they are both in the form gene/mRNA/exon/CDS. Augustus can predict UTR, but given the limitation just mentioned, if I reject the model, I have to trim it before adding it to the reference information. We've actually been in discussion with the apollo development group over this limitation.
Original apollo found the same limitation, so they make the same assumption for loading data into the browsing window (gene/mRNA/exon/CDS features always go in the middle annotation track and everything else goes in the reference evidence track). With the new web apollo, we're working on getting the default behavior to allow UTR in the gene predictions by using the SO predicted gene term in the GFF3 (which previously wasn't available for use in apollo and maker).

So in summary: nothing but CDS for now, but UTR will be included when available in the near future.

Thanks,
Carson

From mike.thon at gmail.com Sat Apr 6 09:37:28 2013
From: mike.thon at gmail.com (Michael Thon)
Date: Sat, 6 Apr 2013 16:37:28 +0200
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <148d01ce32b9$51407380$f3c15a80$@gmail.com>
References: <145c01ce3297$f318eab0$d94ac010$@gmail.com> <148d01ce32b9$51407380$f3c15a80$@gmail.com>
Message-ID: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>

That's a good point, because 'transcripts' implies that it would have the UTRs. Does augustus predict the UTRs? I manually checked the translations of the .transcript. file and I only found valid translations, but that does not mean that UTRs could not be present...

From carsonhh at gmail.com Sat Apr 6 10:13:16 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 11:13:16 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>
Message-ID:

Augustus only predicts UTR for a handful of organisms. I trim them off the rejected models before outputting to the GFF3 as match/match_part features (per my previous e-mail concerning the limitations of GFF3).

--Carson

From kangyangjae at gmail.com Sat Apr 6 13:45:02 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Sun, 7 Apr 2013 03:45:02 +0900
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To:
References: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>
Message-ID: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>

Thank you for the quick response again!

I found sequences that do not start with ATG in the transcript file. I thought these would be UTR traces, and I additionally found an offset value some position after the '>' character. Does that indicate the starting ATG?
Secondly, there are several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What are the criteria for splitting those files? The reason I'm asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

Thank you

From barry.utah at gmail.com Sat Apr 6 15:50:29 2013
From: barry.utah at gmail.com (Barry Moore)
Date: Sat, 6 Apr 2013 14:50:29 -0600
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
References: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com> <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
Message-ID: <3B421D2C-590D-4593-8FA5-3CAA10A19FD4@genetics.utah.edu>

On Apr 6, 2013, at 12:45 PM, Kang, Yang Jae wrote:

> Thank you for quick response again!
>
> I found the non-ATG starting sequences in transcript file. I thought this would be the UTR traces, and

The gene predictors will occasionally produce a transcript with no start/stop codon. Set always_complete=1 in maker_opts.ctl to get MAKER to try hard to force a start/stop codon.

> I additionally found the offset value some position after '>' letter. Is that indicate the starting ATG?

I didn't really understand that question...

> Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files? The reason why I'm asking is that some genes were

augustus_masked is a file that contains proteins of all predictions made by Augustus when working on masked sequence. Setting unmask=1 in maker_opts.ctl would instruct MAKER to also run the gene predictors on unmasked sequence, and then you'd have an augustus_unmasked file for those predictions.

The non_overlapping_ab_initio files contain proteins predicted by all gene predictors for which MAKER could not find protein/RNA evidence, so they are unsupported by physical evidence. These unsupported predictions are not promoted by MAKER into annotations in its final output, but they are included in these files in case you want to work with them. The non_overlapping part of the name means that if multiple gene predictors produce overlapping unsupported ab initio predictions, then MAKER will only output one of them.

> redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

Yes, the proteins for genes for which MAKER creates annotations will be in both files.

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Sat Apr 6 16:00:19 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 17:00:19 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
Message-ID:

I additionally found the offset value some position after '>' letter. Is that indicate the starting ATG?

> Only the maker.transcripts.fasta file will have offsets other than 0; you can
> use these to get the transcription offset. All other *.transcript.fasta files
> will always have an offset of 0 for the reason previously mentioned. Some
> genes will not start with ATG or have stop codons. These are partial models.
> Set always_complete=1 to reduce these.

Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files?

> Final selected annotations go in the maker.proteins.fasta and
> maker.transcripts.fasta files. Raw unfiltered ab initio predictions from
> augustus go in the augustus_masked.proteins.fasta and
> augustus_masked.transcripts.fasta files (these are for reference purposes). A
> set of non-redundant rejected models goes in the
> non-overlapping.transcripts.fasta and non-overlapping.proteins.fasta files
> (if you are missing a gene you expected to find, look in this file first - you
> can add them back if you find protein domains in them, for example).

The reason why I'm asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

> This is because some of the augustus generated models made it into the final
> annotation set.
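
As a rough illustration of how the offset field discussed above could be used to cut a transcript down to its coding sequence, here is a minimal Python sketch. It is not part of MAKER. It assumes that the FASTA header carries a token of the form offset:N, that N counts the bases before the first coding base, and that the CDS ends at the first in-frame stop codon; the function names read_fasta and extract_cds are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_cds(header, seq):
    """Return the coding part of a transcript.

    Assumption: the header contains 'offset:N' and N is the number of
    bases preceding the first coding base (0 when the token is absent).
    """
    match = re.search(r"offset:(\d+)", header)
    offset = int(match.group(1)) if match else 0
    seq = seq.upper()
    # Walk codon by codon and stop at the first in-frame stop codon.
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[offset:i + 3]
    # No stop codon found: treat as a partial model and return the rest.
    return seq[offset:]
```

Calling extract_cds on each record from read_fasta(open("some.transcripts.fasta")) would give one candidate CDS per transcript. Partial models (no ATG or no stop codon) pass through untrimmed at the 3' end, so they should be checked before a Ka/Ks calculation.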
>
> Thanks,
> Carson

From xzhang at genome.wustl.edu Wed Apr 10 11:30:38 2013
From: xzhang at genome.wustl.edu (xu zhang)
Date: Wed, 10 Apr 2013 11:30:38 -0500
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To:
References:
Message-ID: <516593AE.8000909@genome.wustl.edu>

Hi All,

Does anybody have a genemark .mod file for yeast? I tried to create my own model file using the command "gm_es.pl S288C_reference_sequence_R64-1-1_20110203.fsa", where the sequence was downloaded from NCBI. It failed with this error:

warning, error in input file format:
-3
error reading parameter BRANCH_MAT
error in model file /gscmnt/gc2124/info/annotation/personal_dir/xzhang/yeast/s_cerevisiae/genemark/training2/mod/es.mod
Error on system: prediction step

and "Error: unknown line format".

I also tried the sample file (pythium_ultimum_scaffolds.fasta) from Carson. A mod file was created, although it also gave some error information:

warning, error in input file format:
-13
5654 dna.fa.good.gb.acc.ph2
first order for ACC 2
Error: unknown line format
GC% ntron

Any suggestions and comments are appreciated.

Thanks,
Xu

From xzhang at genome.wustl.edu Fri Apr 12 07:47:08 2013
From: xzhang at genome.wustl.edu (xu zhang)
Date: Fri, 12 Apr 2013 07:47:08 -0500
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To: <516593AE.8000909@genome.wustl.edu>
References: <516593AE.8000909@genome.wustl.edu>
Message-ID: <5168024C.9040808@genome.wustl.edu>

I know how to do that.
I tried a different initial mod file, and it worked on my sequences with the org_S1_55.0mtx initial mod. I don't know why. If somebody knows, please let me know.

Thanks,
Xu

From jason.stajich at gmail.com Fri Apr 12 10:48:53 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 12 Apr 2013 08:48:53 -0700
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To: <5168024C.9040808@genome.wustl.edu>
References: <516593AE.8000909@genome.wustl.edu> <5168024C.9040808@genome.wustl.edu>
Message-ID: <256F7975-9744-4A53-974F-B92B0179A5B2@gmail.com>

Did you email the genemark authors? They would be a better source for help. I experienced the same problems with the yeast data to train from and didn't use genemark for those species - it may be that it is expecting more introns and the files for training are empty on some rounds.

Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
URL: From jteckert at gmail.com Sun Apr 14 16:07:33 2013 From: jteckert at gmail.com (James Eckert) Date: Sun, 14 Apr 2013 17:07:33 -0400 Subject: [maker-devel] Annotation quality and converting gff3 to gtf Message-ID: Hello, I'm currently trying to figure out ways to evaluate the quality of annotations that MAKER produces. I'm working on a novel species, so there isn't a reference genome to compare the annotation quality to. After doing a bit of searching on the web, I came across the EVAL tool, which I thought may be useful for checking the output quality. EVAL takes in gtf files, not gff3, however MAKER seems to have addressed this problem through its accessory scripts. I first used the script "gff3_merge" to have my whole annotation under one gff3 file. Next I used "add_utr_start_stop_gff". This would explicitly add the UTRs, which would be needed for converting the gff3 file to gtf. The problem arose when trying to run "gff3_to_eval_gtf". I was expecting MAKER to process the whole gff3 file, but it seems to have only processed 2 nodes. The same thing happens when running the "gff3_2_gtf" script. Here is the command I'm running, along with the output: gff3_to_eval_gtf assem_kmer_57_utr.gff3 NODE_20666_length_66353_cov_18.405483 maker CDS 8801 8984 . - 0 gene_id "1"; transcript_id "2"; NODE_20666_length_66353_cov_18.405483 maker CDS 8113 8717 . - 2 gene_id "1"; transcript_id "2"; My question is whether the "gff3_to_eval_gtf" and "gff3_2_gtf" scripts have a bug in them, or whether I'm just doing the process wrong? Perhaps if the conversion doesn't work, there exists an alternative to EVAL that works with native MAKER annotations? Attached is my whole genome gff3 file, along with the file I ran "gff3_to_eval_gtf" on. assem_kmer-57_exp-44_covcutoff-auto_contigs.all.gff3 assem_kmer_57_utr.gff3 Thank you in advance for your help, James -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From liuhuiquan at nwsuaf.edu.cn Tue Apr 16 03:16:34 2013 From: liuhuiquan at nwsuaf.edu.cn (刘慧泉) Date: Tue, 16 Apr 2013 16:16:34 +0800 Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files Message-ID: An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 16 09:20:01 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 16 Apr 2013 10:20:01 -0400 Subject: [maker-devel] Annotation quality and converting gff3 to gtf In-Reply-To: Message-ID:

The input GFF3 file you linked to only contains one gene. Is that correct? If so, then you should only get one gene in the output. The resulting GTF should only have the genes (ignoring all the evidence). To convert for EVAL, use these command lines (note the flags, such as -g for gff3_merge so that you are only looking at genes; the fasta must be included in the file, so no -n flag):

gff3_merge -d maker_datastore_index.log -g -o some_file.gff
add_utr_start_stop_gff some_file.gff > some_file2.gff
maker2eval some_file2.gff

Note that all versions of MAKER after 2.09 no longer have add_utr_start_stop_gff; the UTR is now always there explicitly, so you go straight from gff3_merge to maker2eval_gtf.

However, with that explanation, I have to wonder if EVAL is appropriate for you. EVAL requires a reference annotation set (that is assumed to be 100% perfect) for comparison, and you get a perfect score whenever you call the genes exactly identical to the reference set (which in itself has obvious bias, but we won't get into that). Given that you have no reference set, it will not give you anything other than statistics for the distribution of intron and exon sizes.
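As a rough illustration of the exon and intron size statistics mentioned above, here is a minimal Python sketch. It is not part of MAKER or EVAL. It assumes a GFF3 file in which exon features carry a Parent attribute naming their transcript; the function names exon_intron_lengths and summarize are made up for this example.

```python
from collections import defaultdict
from statistics import mean, median


def exon_intron_lengths(gff3_lines):
    """Collect exon and intron lengths per transcript from GFF3 exon rows."""
    exons = defaultdict(list)  # transcript ID -> list of (start, end)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # Assumption: attributes are simple key=value pairs separated by ';'
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))

    exon_lens, intron_lens = [], []
    for spans in exons.values():
        spans.sort()
        exon_lens.extend(end - start + 1 for start, end in spans)
        # An intron is the gap between consecutive exons of one transcript.
        intron_lens.extend(nxt[0] - cur[1] - 1 for cur, nxt in zip(spans, spans[1:]))
    return exon_lens, intron_lens


def summarize(lengths):
    """Return count, mean and median for a list of lengths."""
    if not lengths:
        return {"n": 0}
    return {"n": len(lengths), "mean": mean(lengths), "median": median(lengths)}
```

Passing an open file handle for the merged GFF3 to exon_intron_lengths and then calling summarize on each returned list gives size summaries without any reference annotation. These numbers describe the gene models; they do not measure their accuracy.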
Alternate means for quality given no reference genome are AED (computed for each gene as part of the MAKER run), this is basically a variation of EVAL like statistics run against evidence clusters rather than a reference genome, or you can just use % domain content. See these links for examples of the statistics --> http://www.biomedcentral.com/1471-2105/12/491 http://www.biomedcentral.com/1471-2105/10/67 Also a figure is attached with an example of quality analysis using combined AED, domain content, and comparative orthologs. --Carson From: James Eckert Date: Sunday, 14 April, 2013 5:07 PM To: Subject: [maker-devel] Annotation quality and converting gff3 to gtf Hello, I'm currently trying to figure out ways to evaluate the quality of annotations that MAKER produces. I'm working on a novel species, so there isn't a reference genome to compare the annotation quality to. After doing a bit of searching on the web, I came across the EVAL tool, which I thought may be useful for checking the output quality. EVAL takes in gtf files, not gff3, however MAKER seems to have addressed this problem through its accessory scripts. I first used the script "gff3_merge" to have my whole annotation under one gff3 file. Next I used "add_utr_start_stop_gff". This would explicitly add the UTRs, which would be needed for converting the gff3 file to gtf. The problem arose when trying to run "gff3_to_eval_gtf". I was expecting MAKER to process the whole gff3 file, but it seems to have only processed 2 nodes. The same thing happens when running the "gff3_2_gtf" script. Here is the command I'm running, along with the output: gff3_to_eval_gtf assem_kmer_57_utr.gff3 NODE_20666_length_66353_cov_18.405483 maker CDS 8801 8984 . - 0 gene_id "1"; transcript_id "2"; NODE_20666_length_66353_cov_18.405483 maker CDS 8113 8717 . - 2 gene_id "1"; transcript_id "2"; My question is whether the "gff3_to_eval_gtf" and "gff3_2_gtf" scripts have a bug in them, or whether I'm just doing the process wrong? 
Perhaps if the conversion doesn't work, there exists an alternative to EVAL that works with native MAKER annotations? Attached is my whole genome gff3 file, along with the file I ran "gff3_to_eval_gtf" on. assem_kmer-57_exp-44_covcutoff-auto_contigs.all.gff3 assem_kmer_57_utr.gff3 Thank you in advance for your help, James _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: B563F1FF-1E85-42E3-B79D-F7F6449F1AE9.png Type: image/png Size: 227568 bytes Desc: not available URL: From carsonhh at gmail.com Tue Apr 16 10:34:44 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 16 Apr 2013 11:34:44 -0400 Subject: [maker-devel] maker output In-Reply-To: <1366084495.59030.YahooMailNeo@web164906.mail.bf1.yahoo.com> Message-ID: For AED in general, lower is better, but you have to understand the caveats. With mRNA-seq, not all genes may be expressed, not all exons may be captured (mRNA can fold, blocking some sequencing reactions), and sometimes the alignment may extend improperly into the intron or even merge into the neighboring gene. Also, mRNA-seq captures a lot of things that aren't coding genes. But in general for mRNA-seq, as coverage increases the AED values trend toward 0, and mRNA-seq is the single most informative piece of evidence you can get for annotation (I've seen several very poor genome assemblies with horrible annotations that were saved by mRNA-seq). For mRNA-seq, give MAKER the assembled reads (Trinity works well). Also, for fungi the UTRs tend to overlap between genes. This can create false merging in the mRNA-seq assemblies (their AED is lower, but it's a false merge). Use the correct_est_fusion option in the control files to help handle that.
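To make the AED values discussed here more concrete, the following is a simplified Python sketch of a nucleotide-level Annotation Edit Distance: one minus the average of sensitivity and specificity between an annotation and its overlapping evidence. It is only an illustration under stated assumptions. The function names are hypothetical, coordinates are 1-based inclusive, and MAKER's own calculation may differ in detail.

```python
def covered_bases(intervals):
    """Return the set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_spans):
    """Simplified AED: 0 means perfect agreement, 1 means no shared bases."""
    annot = covered_bases(annotation_exons)
    evid = covered_bases(evidence_spans)
    if not annot or not evid:
        return 1.0  # nothing to compare: treated here as fully unsupported
    shared = len(annot & evid)
    sensitivity = shared / len(evid)    # fraction of evidence bases annotated
    specificity = shared / len(annot)   # fraction of annotated bases supported
    return 1.0 - (sensitivity + specificity) / 2.0
```

Under this definition a falsely merged transcript that happens to cover the annotation well still yields a low value, which is the caveat raised above: a low AED reflects agreement with the evidence, not the correctness of the evidence itself.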
I know there are also several members of the MAKER mailing list who have extensive experience using mRNA-seq to annotate fungi who may want to add their two cents. Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 15 April, 2013 11:54 PM To: Carson Holt Subject: maker output Hi Carson, have a nice day.. I have a question about the output file from maker, recently i run my longest contigs (100kb) on Maker using rna-seq data from other related species of my genome (same genus)..and ive noticed that i managed to get expressed sequences match annotation compared using just EST and cDNA. Is this due to different size of dataset? as im using larger dataset when im incorporating rna-seq data (assembled transcript combined with cDNA and est) . The value of AED for both prdicted mRNA 0.15 (with rna-seq data) and 0.06(w/o rna-seq data). My question is which one is the most accurate prediction, can i just depends on the value of AED ( the lower the better)? How about the incorporation of rna-seq data in this case,can i conclude that rna-seq improves the annotation (based on the image i attached). Thanks for your time, really appreciate it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue Apr 16 10:52:07 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 16 Apr 2013 15:52:07 +0000 Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files In-Reply-To: References: Message-ID: Hi Huiquan, 1)The default behavior for Maker is that it will only annotate gene models when there is support from both the evidence (est and protein alignments) and from the ab-initio predictors. How many transcripts did you get from PASA? I expect there are about 254 sequences, which is about how many genes you annotated. If you want to get more gene models, then you need to supply more evidence. 
For our annotation projects, we often use some derivation of Swiss-Prot, which is a hand-curated database of proteins across all kingdoms. 2) The non-overlapping ab-initio file includes ab-initio predictions that didn't overlap any gene models. If augustus and genemark predictions overlap, I think it should include both, but if one prediction completely covers the other, I think the longer of the two would be included. Does that answer your questions? Thanks, Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of 刘慧泉 [liuhuiquan at nwsuaf.edu.cn] Sent: Tuesday, April 16, 2013 2:16 AM To: maker-devel at yandell-lab.org Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files Hello maker users and developers, I'm trying to annotate a small fungal genome using Maker-2.27-beta. For testing purposes, I used only augustus and genemark for de novo gene prediction and supplied the PASA assembled transcripts to the est option. When maker2 finished, I used the gff3_merge and fasta_merge scripts to extract the results. There were 5608, 6255, 5084, and 254 sequences in the resulting protein files: augustus_masked, genemark, non-overlapping ab initio, and maker, respectively. My questions are: 1. Viewing the gff file produced by maker2, I found that most of the predicted gene loci have est matches. Why, then, did maker2 produce only 254 gene annotations? 2. In the "non-overlapping ab initio" file, I found that the sequences are all from the augustus_masked prediction. Does the non-overlapping file only include the best gene models among those predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes? Thanks in advance for any advice. I appreciate your help!
best, Huiquan the maker_opts.ctl file: #-----Genome (these are always required) genome=my_gnm.fa #genome sequence (fasta file or fasta embeded in GFF3 file) organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic #-----EST Evidence (for best results provide a file for at least one) est=my_est.fa #set of ESTs or assembled mRNA-seq in fasta format altest= #EST/cDNA sequence file in fasta format from an alternate organism est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file altest_gff= #aligned ESTs from a closly relate species in GFF3 format #-----Protein Homology Evidence (for best results provide a file for at least one) protein= #protein sequence file in fasta format (i.e. from mutiple oransisms) protein_gff= #aligned protein homology evidence from an external GFF3 file #-----Repeat Masking (leave values blank to skip repeat masking) model_org=fungi #select a model organism for RepBase masking in RepeatMasker rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein=RepeatPeps.lib #provide a fasta file of transposable element proteins for RepeatRunner rm_gff= #pre-identified repeat elements from an external GFF3 file prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. 
seg and dust filtering) #-----Gene Prediction snaphmm= #SNAP HMM file gmhmm=my_ges.mod #GeneMark HMM file augustus_species=my2 #Augustus gene prediction species model fgenesh_par_file= #FGENESH parameter file pred_gff= #ab-initio predictions from an external GFF3 file model_gff= #annotated gene models from an external GFF3 file (annotation pass-through) est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no #-----Other Annotation Feature Types (features MAKER doesn't recognize) other_gff= #extra features to pass-through to final MAKER generated GFF3 file #-----External Application Behavior Options alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases cpus=14 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage) min_contig=1 #skip genome contigs below this length (under 10kb are often useless) pred_flank=200 #flank for extending evidence clusters sent to gene predictors pred_stats=0 #report AED and QI statistics for all predictions as well as models AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1) min_protein=20 #require at least this many amino acids in predicted proteins alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1) split_hit=1500 #length for the splitting of hits (expected max intron size for evidence alignments) single_exon=1 #consider single exon EST evidence when generating 
annotations, 1 = yes, 0 = no single_length=200 #min length required for single exon ESTs if 'single_exon is enabled' correct_est_fusion=1 #limits use of ESTs in annotation to avoid fusion genes tries=2 #number of times to try a contig if there is a failure for some reason clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no TMP= #specify a directory other than the system default temporary directory for temporary files -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Apr 16 11:01:27 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 16 Apr 2013 12:01:27 -0400 Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files In-Reply-To: Message-ID: 1. by view the gff file produced by maker2, I have found most of the predicted gene loci have est matches. but why only 254 gene annotations got by maker2 ? >> I'd really have to see the results to tell you why. 2. in the ?non-overlapping ab initio?file, I found sequences are all from augustus_masked prediction. Does the non-overlapping file only include the best gene modes from predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes ? >> The "non-overlapping" file should have the one with best consensus if there >> are 3 or more predictors, and the longest one otherwise. It should be able >> to have augustus and genemark genes. Try it with only genemark and let me >> know if the file is empty. Thanks, Carson -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From liuhuiquan at nwsuaf.edu.cn Tue Apr 16 20:49:04 2013 From: liuhuiquan at nwsuaf.edu.cn (刘慧泉) Date: Wed, 17 Apr 2013 09:49:04 +0800 Subject: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files Message-ID: An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu Apr 18 09:23:54 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 18 Apr 2013 10:23:54 -0400 Subject: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files In-Reply-To: Message-ID: correct_est_fusion is not guaranteed to never merge a gene. If you are giving maker imperfect evidence, there is only so much it can do. Also, you should be using protein evidence in combination with EST evidence, especially when using the correct_est_fusion option, or you are limiting its effectiveness. MAKER does not work as well on ESTs alone, especially for organisms with few introns, as the internal logic relies on the combination of evidence support. --Carson From: 刘慧泉 Date: Tuesday, 16 April, 2013 9:49 PM To: Subject: Re: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files Hi Carson and Daniel, Thank you very much for your quick responses! After multiple tries, I have found the reason why only a few genes were annotated by maker: it is due to turning on the "correct_est_fusion" option. I got about 8000 transcripts from the PASA assembly. Because the gene density of my fungus is very high, many of the assembled transcripts merged adjacent genes even though Trinity and PASA were used with the relevant parameters. Maker may not use the merged transcripts as evidence if the "correct_est_fusion" option is turned on. However, even with the "correct_est_fusion" option on, I also found that many genes produced by maker have merged more than one gene. I'm now using the ORFs (trainingSetCandidates.cds) extracted from the transcripts by PASA as the EST evidence supplied to maker.
I found that most of the extracted ORFs accurately match the gene models predicted by augustus and genemark. This better resolves the "merged gene" issue for fungi with high gene density. For the 'non-overlapping' file: when only genemark is used, its predictions can be found in the 'non-overlapping' file. Was the previous issue due to the gene models generated by augustus being better than those from genemark, so that only augustus genes were put into the 'non-overlapping' file? Will genes predicted by only one program not be found in the 'non-overlapping' file? How can I get these genes? Thank you, Huiquan From: Carson Holt Sent: 2013-04-16 24:01 To: 刘慧泉; maker-devel at yandell-lab.org Subject: Re:Re: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files 1. Viewing the gff file produced by maker2, I found that most of the predicted gene loci have est matches. Why, then, did maker2 produce only 254 gene annotations? >> I'd really have to see the results to tell you why. 2. In the "non-overlapping ab initio" file, I found that the sequences are all from the augustus_masked prediction. Does the non-overlapping file only include the best gene models among those predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes? >> The 'non-overlapping' file should have the one with best consensus if there >> are 3 or more predictors, and the longest one otherwise. It should be able >> to have augustus and genemark genes. Try it with only genemark and let me >> know if the file is empty. Thanks, Carson _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Thu Apr 18 09:16:18 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 18 Apr 2013 10:16:18 -0400 Subject: [maker-devel] some strange examples of maker annotation In-Reply-To: Message-ID: maker seems to prefer to select the snap gene mode, but not genemark > Genemark generally scores lower, and has a very large tendancy to overlap > transposons (it can't handled masked fragments, so has to be run on the > unmasked genome). Looking through the code base, I now see a section where > the non-overlapping model is set to always exclude genemark from the > non-overlapping consensus set if there are masked gene predictors such as snap > or augustus, and to only accept it's models when the evidence supports it. > I'd need to filter genemark candidates for transposon overlap before I could > lift this limitation. > Fig 1. the snap gene mode of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation. > The non-overlapping is stranded. These are on different strands. This really > does happens in eukaryotes, so if the evidence supports it we have to allow > it, and if you set keep_preds=1 you can get it just because the gene predictor > supports it reguardless of physical evidence support. > Fig 2. the snap gene mode of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation. > > On different strands. Fig 3. there is gene redundancy even within the maker gene annotation > They are on opposite strands. > Fig.4 no evidence support the snap gene mode. augustus and genemark have similar results but different from snap. But the snap gene was selected as non_overlapping_ab_initio > Try using Apollo rather than IGV, it becomes so much more obvious because > apollo separates the strands into separate panels. Thanks, Carson From: ??? 
Date: Thursday, 18 April, 2013 9:37 AM To: Carson Holt , Subject: some strange examples of maker annotation Hi Carson, I run maker on my genome with ?keep_preds=1? or ?keep_preds=0? respectively. When I manually check the results of maker in Integrative Genomics Viewer (IGV), I found most of the genes annotated by maker were good. But I also view some strange examples for the results. I don?t know how to inteprete these. hope you can give me some suggestions. please see the attached file. thank you very much. best regards, Huiquan -------------- next part -------------- An HTML attachment was scrubbed... URL: From liuhuiquan at nwsuaf.edu.cn Thu Apr 18 08:37:07 2013 From: liuhuiquan at nwsuaf.edu.cn (=?UTF-8?B?5YiY5oWn5rOJ?=) Date: Thu, 18 Apr 2013 21:37:07 +0800 Subject: [maker-devel] =?utf-8?q?some_strange_examples_of_maker_annotation?= Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: examples of maker annotation.docx Type: application/octet-stream Size: 1037235 bytes Desc: not available URL: From carsonhh at gmail.com Fri Apr 19 09:55:58 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 19 Apr 2013 10:55:58 -0400 Subject: [maker-devel] FW: some strange examples of maker annotation In-Reply-To: Message-ID: Just forwarding this to the devel list, so it is archived. --Carson From: Carson Holt Date: Thursday, 18 April, 2013 10:16 AM To: ??? , Subject: Re: some strange examples of maker annotation maker seems to prefer to select the snap gene mode, but not genemark > Genemark generally scores lower, and has a very large tendancy to overlap > transposons (it can't handled masked fragments, so has to be run on the > unmasked genome). 
Looking through the code base, I now see a section where > the non-overlapping model is set to always exclude genemark from the > non-overlapping consensus set if there are masked gene predictors such as snap > or augustus, and to only accept it's models when the evidence supports it. > I'd need to filter genemark candidates for transposon overlap before I could > lift this limitation. > Fig 1. the snap gene mode of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation. > The non-overlapping is stranded. These are on different strands. This really > does happens in eukaryotes, so if the evidence supports it we have to allow > it, and if you set keep_preds=1 you can get it just because the gene predictor > supports it reguardless of physical evidence support. > Fig 2. the snap gene mode of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation. > > On different strands. Fig 3. there is gene redundancy even within the maker gene annotation > They are on opposite strands. > Fig.4 no evidence support the snap gene mode. augustus and genemark have similar results but different from snap. But the snap gene was selected as non_overlapping_ab_initio > Try using Apollo rather than IGV, it becomes so much more obvious because > apollo separates the strands into separate panels. Thanks, Carson From: ??? Date: Thursday, 18 April, 2013 9:37 AM To: Carson Holt , Subject: some strange examples of maker annotation Hi Carson, I run maker on my genome with ?keep_preds=1? or ?keep_preds=0? respectively. When I manually check the results of maker in Integrative Genomics Viewer (IGV), I found most of the genes annotated by maker were good. But I also view some strange examples for the results. I don?t know how to inteprete these. hope you can give me some suggestions. please see the attached file. thank you very much. best regards, Huiquan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Bob_Freeman at hms.harvard.edu Mon Apr 22 09:09:34 2013 From: Bob_Freeman at hms.harvard.edu (Freeman, Robert M.) Date: Mon, 22 Apr 2013 10:09:34 -0400 Subject: [maker-devel] Repeatmasker error? Message-ID: <7EAEB66D-346C-4E9A-B487-B7D5BB352328@hms.harvard.edu> Greetings, Am using MAKER 2.27b to annotate a ciliate genome and am finding that my log files are growing to GB sizes. When looking more carefully, an error seems to be occurring around the Repeatmasker stage: .... Now starting the contig!! -- setting up GFF3 output and fasta chunks doing repeat masking doing blastx repeats doing blastx of proteins doing blastx of proteins doing blastx of proteins doing blastx repeats collecting blastx repeatmasking processing all repeats ERROR: Can't open seq file: /files/.retain-snapshots.d14d-w60d/SysBio/klab_genome/maker/stentor/run_current_r3/soapPrice1.cycle7.maker.output/soapPrice1.cycle7_datastore/03/EF/contig_157//theVoid.contig_157/query.masked.gff.seq No such file or directory at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Dumper/GFF/GFFV3.pm line 182 Dumper::GFF::GFFV3::finalize('Dumper::GFF::GFFV3=HASH(0x50547f8)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Process/MpiChunk.pm line 691 Process::MpiChunk::__ANON__() called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 415 eval {...} called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x5b859c0)', 'HASH(0x11ea63f0)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib ... I don't seem to have this problem when I fall back to the 2.25b version (though I start having major DBD:SQLite issues). I'm doing this on a cluster, running this under MPI with 50 cores. Any help/suggestions would be appreciated! -Bob ----------------------------------------------------- Bob Freeman, Ph.D. 
Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Apr 22 15:25:06 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 22 Apr 2013 16:25:06 -0400 Subject: [maker-devel] Repeatmasker error? In-Reply-To: Message-ID: Just forwarding this e-mail chain to the devel list for archiving. --Carson From: "Freeman, Robert M." Date: Monday, 22 April, 2013 4:16 PM To: Carson Holt Subject: Re: [maker-devel] Repeatmasker error? Already looks better ... been checking stderr and it looks error-free so far (knock on wood). Thanks for the help, and sorry for the bother! Oh, should I fall back to the 2.27beta release that you announced on the list?? -b On Apr 22, 2013, at 4:09 PM, Carson Holt wrote: > Let me know you still get problems. Redirecting TMP back locally will also > give a big performance boost. > > Thanks, > Carson > > > From: "Freeman, Robert M." > Date: Monday, 22 April, 2013 4:01 PM > To: Carson Holt > Subject: Re: [maker-devel] Repeatmasker error? > > (chuckle) wow, always something new to learn -- been working with IT systems > for > 20 years, and HPC > 8, and no one has ever explained this to me. > > Have directed TMP to /scratch, which also turns out to be an Isilon-related > mount. Will re-direct all this to /tmp to see if this eliminates the problems. > > -b > > On Apr 22, 2013, at 3:43 PM, Carson Holt wrote: > >> The missing file is part of the GFF3 output, the fasta sequence to be >> specific. Sometimes on NFS (network mounted file systems), they can return >> status 'success' even though the IO event really has not succeeded yet (this >> is called asynchronous IO). 
The result is a certain speed gain but it also >> means that you can write a file, then immediately try and open it, and the >> system will say that it doesn't exist. On some systems you get weird files >> starting with the name '.nfs000' when these types of errors occur. NFS type >> errors are more common when you use many cpus or other jobs on the cluster >> (not just maker) are using a large amount of IO. To avoid this, MAKER tries >> to do as much work as possible in the directory specified by TMP in the >> control files. By default this is /tmp, and if you set it to something else, >> make sure that the location is locally mounted and not NFS mounted (otherwise >> it can't perform it's purpose of bypassing NFS for certain quick read/write >> operations). The newest version of MAKER unloads exonerate and even most >> gene prediction operations into TMP in addition to other steps that were >> already unloaded there in other versions of the pipeline, and I've been able >> to scale up to > 1500 cpus. >> >> Thanks, >> Carson >> >> >> >> From: "Freeman, Robert M." >> Date: Monday, 22 April, 2013 3:24 PM >> To: Carson Holt >> Subject: Re: [maker-devel] Repeatmasker error? >> >> Thanks, Carson. I'll give this a try. >> >> Randomly? Not sure ... I'll have to go back thru the logs to see if this is >> happening consistently or not. Right now, this log is close to 1 GB in size. >> When I saw it getting this large, I stopped the run as I knew errors were >> getting spewed into the log file. >> >> Thought it might be filesystem as well, but unlikely -- the location for the >> MAKER runs is on our Isilon, and these problems appear only with MAKER. >> >> Other files seem to be present... >> >>> % ls -alt >>> drwxrwx--- 3 rmf1 SYSTEMBIO_klab_genome 236 Apr 21 14:50 .. >>> drwxrwx--- 2 rmf1 SYSTEMBIO_klab_genome 48225 Apr 21 12:38 . 
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 0 Apr 21 12:34 run.log.child.0 >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 1055922 Apr 21 12:34 >>> contig_157.0.final.section >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 1055922 Apr 21 12:34 >>> contig_157.0.raw.section >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 388512 Apr 21 12:34 evidence_0.gff >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7269 Apr 21 12:34 >>> contig_157.102049-103030.gi%7C145478069%7Cref%7CXP_001425057% >>> 2E1%7C.p_exonerate >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 4561 Apr 21 12:34 >>> contig_157.101950-103090.gi%7C145514179%7Cref%7CXP_001443000% >>> 2E1%7C.p_exonerate >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7088 Apr 21 12:34 >>> contig_157.101956-103435.gi%7C145505343%7Cref%7CXP_001438638% >>> 2E1%7C.p_exonerate >>> .... >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7184469 Apr 21 12:33 >>> contig_157.0.sequences_r5%2Efasta.blastx >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 0 Apr 21 12:25 >>> query.masked.gff.def >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 9885 Apr 21 12:25 >>> query.masked.gff.ann >>> -rw-r--r-- 1 rmf1 SYSTEMBIO_klab_genome 49152 Apr 21 12:25 >>> query.masked.fasta.index >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 106522 Apr 21 12:25 >>> query.masked.fasta >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 9991 Apr 21 12:25 >>> query.masked.gff >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 28002 Apr 21 12:25 >>> contig_157.0.te_proteins%2Efasta.repeatrunner >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 3589 Apr 21 12:24 >>> contig_157.0.all.rb.out >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 106522 Apr 21 12:23 query.fasta >>> >> It's just that the one output file MAKER is looking for isn't there. >> >> I guess the other question I should ask: as there are exonerate sequences >> there, does it appear that the pipeline is running OK, and just ignore these >> errors (somehow)? 
>> >> -b >> >> On Apr 22, 2013, at 12:39 PM, Carson Holt wrote: >> >>> Could you give the devel version a try to see if it experiences the same >>> failure, as it's easier to debug off of the most current code. >>> >>> Type this on the command line to download--> >>> ************************** >>> >>> user: ******* >>> password: ******* >>> >>> The error appears to be filesystem related though. Does it appear to happen >>> randomly? >>> >>> Thanks, >>> Carson >>> >>> From: "Freeman, Robert M." >>> Date: Monday, 22 April, 2013 10:09 AM >>> To: "maker-devel at yandell-lab.org" >>> Subject: [maker-devel] Repeatmasker error? >>> >>> Greetings, >>> >>> Am using MAKER 2.27b to annotate a ciliate genome and am finding that my log >>> files are growing to GB sizes. When looking more carefully, an error seems >>> to be occurring around the Repeatmasker stage: >>> >>>> .... >>>> Now starting the contig!! >>>> -- >>>> >>>> setting up GFF3 output and fasta chunks >>>> doing repeat masking >>>> doing blastx repeats >>>> doing blastx of proteins >>>> doing blastx of proteins >>>> doing blastx of proteins >>>> doing blastx repeats >>>> collecting blastx repeatmasking >>>> processing all repeats >>>> ERROR: Can't open seq file: >>>> /files/.retain-snapshots.d14d-w60d/SysBio/klab_genome/maker/stentor/run_cur >>>> rent_r3/soapPrice1.cycle7.maker.output/soapPrice1.cycle7_datastore/03/EF/co >>>> ntig_157//theVoid.contig_157/query.masked.gff.seq >>>> No such file or directory >>>> >>>> at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Dumper/GFF/GFFV3.pm >>>> line 182 >>>> Dumper::GFF::GFFV3::finalize('Dumper::GFF::GFFV3=HASH(0x50547f8)') >>>> called at >>>> /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Process/MpiChunk.pm line >>>> 691 >>>> Process::MpiChunk::__ANON__() called at >>>> /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 415 >>>> eval {...} called at >>>> /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 407 >>>> 
Error::subs::try('CODE(0x5b859c0)', 'HASH(0x11ea63f0)') called at >>>> /groups/acornworm/opt/maker-2.27-beta/bin/../lib >>>> ... >>> >>> I don't seem to have this problem when I fall back to the 2.25b version >>> (though I start having major DBD::SQLite issues). >>> >>> I'm doing this on a cluster, running this under MPI with 50 cores. >>> >>> Any help/suggestions would be appreciated! >>> >>> -Bob >> >> ----------------------------------------------------- Bob Freeman, Ph.D. Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace From ejr at stowers.org Mon Apr 29 10:58:09 2013 From: ejr at stowers.org (Ross, Eric) Date: Mon, 29 Apr 2013 15:58:09 +0000 Subject: [maker-devel] repeat statistics Message-ID: Does anyone have a good tool for yanking repeat statistics out of MAKER gff files? SOBA can give some basic stats, but it doesn't play well with my giant files and I haven't figured out a way to run it locally. For that matter does anyone have a script that will calculate SOBA like stats locally? I'd rather avoid writing one myself if something else is out there.
Thanks, Eric -- Eric Ross Bioinformatic Specialist I Alejandro Sánchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org From barry.moore at genetics.utah.edu Mon Apr 29 12:59:14 2013 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Mon, 29 Apr 2013 11:59:14 -0600 Subject: [maker-devel] repeat statistics In-Reply-To: References: Message-ID: Hi Eric, There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details: http://www.sequenceontology.org/resources/sobacl.html Ultimately you'll get it like this: svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA Then run: SOBA/bin/SOBAcl --help For a lot of command line examples have a look in: SOBA/t/sobacl_test.sh B On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: > Does anyone have a good tool for yanking repeat statistics out of MAKER > gff files? > > SOBA can give some basic stats, but it doesn't play well with my giant > files and I haven't figured out a way to run it locally. > > For that matter does anyone have a script that will calculate SOBA like > stats locally? I'd rather avoid writing one myself if something else is > out there. > > Thanks, > > Eric > > -- > Eric Ross > Bioinformatic Specialist I > Alejandro Sánchez Alvarado Laboratory > Stowers Institute for Medical Research > Howard Hughes Medical Institute > ejr at stowers.org > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543
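For anyone who only needs a rough local tally rather than the full SOBA output, the following is a minimal sketch and not a replacement for SOBAcl. It assumes a standard nine-column, tab-separated GFF3 file and counts features and sums their lengths for every (source, type) pair. The function name tally_gff3 is made up for this example, and which source labels correspond to repeats depends on the file, so check column 2 of your own MAKER output before filtering.

```python
import sys
from collections import defaultdict


def tally_gff3(lines):
    """Count features and sum their lengths per (source, type) pair."""
    stats = defaultdict(lambda: [0, 0])  # (source, type) -> [count, total bp]
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        source, ftype = cols[1], cols[2]
        start, end = int(cols[3]), int(cols[4])
        entry = stats[(source, ftype)]
        entry[0] += 1
        entry[1] += end - start + 1  # GFF3 coordinates are 1-based, inclusive
    return {key: tuple(value) for key, value in stats.items()}


# Only runs when a GFF3 path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for (source, ftype), (count, bases) in sorted(tally_gff3(handle).items()):
            print(f"{source}\t{ftype}\t{count}\t{bases}")
```

Saved under a name of your choosing (say gff_tally.py, a hypothetical name) it would be run as `python gff_tally.py your_output.gff`. The file is streamed line by line, so size should not be a problem, but overlapping features are counted more than once in the length totals, and parent and child features (for example match and match_part) are tallied separately.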
From jason.stajich at gmail.com Mon Apr 29 17:49:12 2013 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 29 Apr 2013 15:49:12 -0700 Subject: [maker-devel] repeat statistics In-Reply-To: References: Message-ID: Barry - I think you mean topaz instead of malachite? svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA Jason Stajich jason at bioperl.org jason.stajich at gmail.com http://bioperl.org/wiki/User:Jason http://twitter.com/hyphaltip On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote: > Hi Eric, > > There is a command line version of SOBA. It does the same things as the > web version and much more. This page has some basic details: > > http://www.sequenceontology.org/resources/sobacl.html > > Ultimately you'll get it like this: > > svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA > > Then run: > > SOBA/bin/SOBAcl --help > > For a lot of command line examples have a look in: > > SOBA/t/sobacl_test.sh > > B > > On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: > > Does anyone have a good tool for yanking repeat statistics out of MAKER > gff files? > > SOBA can give some basic stats, but it doesn't play well with my giant > files and I haven't figured out a way to run it locally. > > For that matter does anyone have a script that will calculate SOBA like > stats locally? I'd rather avoid writing one myself if something else is > out there. > > Thanks, > > Eric > > -- > Eric Ross > Bioinformatic Specialist I > Alejandro Sánchez Alvarado Laboratory > Stowers Institute for Medical Research > Howard Hughes Medical Institute > ejr at stowers.org > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept.
of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.moore at genetics.utah.edu Tue Apr 30 01:14:44 2013 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Tue, 30 Apr 2013 00:14:44 -0600 Subject: [maker-devel] repeat statistics In-Reply-To: References: Message-ID: Correct. And web page is now updated as well. B On Apr 29, 2013, at 4:49 PM, Jason Stajich wrote: > Barry - I think you mean topaz instead of malachite? > > svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA > > > Jason Stajich > jason at bioperl.org > jason.stajich at gmail.com > http://bioperl.org/wiki/User:Jason > http://twitter.com/hyphaltip > > > On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote: > Hi Eric, > > There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details: > > http://www.sequenceontology.org/resources/sobacl.html > > Ultimately you'll get it like this: > > svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA > > Then run: > > SOBA/bin/SOBAcl --help > > For a lot of command line examples have a look in: > > SOBA/t/sobacl_test.sh > > B > > On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: > >> Does anyone have a good tool for yanking repeat statistics out of MAKER >> gff files? >> >> SOBA can give some basic stats, but it doesn't play well with my giant >> files and I haven't figured out a way to run it locally. >> >> For that matter does anyone have a script that will calculate SOBA like >> stats locally? I'd rather avoid writing one myself if something else is >> out there. 
>> >> Thanks, >> >> Eric >> >> -- >> Eric Ross >> Bioinformatic Specialist I >> Alejandro Sánchez Alvarado Laboratory >> Stowers Institute for Medical Research >> Howard Hughes Medical Institute >> ejr at stowers.org >> >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 From carsonhh at gmail.com Mon Apr 1 08:50:58 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 01 Apr 2013 10:50:58 -0400 Subject: [maker-devel] Help on error-Repeat masker In-Reply-To: <1364760124.37890.YahooMailNeo@web164901.mail.bf1.yahoo.com> Message-ID: This appears to be a permissions issue either for the /u1/local/bin/ directory or RepeatMasker setup. Did you set maker up yourself as your own user or did someone else do it for you, perhaps as root? Is /u1/local/bin/ on an NFS mount? If it's a mounting issue, I found this description of the exact same issue via Google --> >> I needed to add the 'exec' option to the /etc/fstab file when mounting that >> partition. >> If it says 'defaults' on the line in /etc/fstab, then it also means you don't >> have exec rights on it. Are you using the same perl to run maker as you are using for RepeatMasker?
For example, are you calling perl directly and giving the path to maker, or are you calling maker directly and letting it use the version of perl it was installed with? Try this to see which perl maker was installed with --> cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" You may have to reinstall RepeatMasker and possibly maker. Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Sunday, 31 March, 2013 4:02 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Help on error-Repeat masker Hello, I have a problem when running maker. I got this error; what could be going wrong here? Thanks so much. setting up GFF3 output and fasta chunks doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_WOVHsi; /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172/contig172.0.simple.rb -dir /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172 -pa 1 -lib /tmp/maker_WOVHsi/b1piBcWHlH #-------------------------------# sh: /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker: /u1/local/bin/perl: bad interpreter: Permission denied ERROR: RepeatMasker failed --> rank=NA, hostname=Homis ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:contig172 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:172 examining contents of the fasta file and run log
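The "bad interpreter: Permission denied" message above comes from the #! line of the RepeatMasker script, so a quick way to compare what maker and RepeatMasker each ask for is to read that first line from both. The sketch below is only an illustration under that assumption; the function names are invented for this example, and a noexec mount may or may not show up through os.access depending on the system. As an aside on the quoted fstab advice: on Linux the 'defaults' option set normally includes exec, and it is options such as 'noexec' or 'user' that switch it off, so it is worth checking the actual mount options rather than relying on that rule of thumb.

```python
import os
import sys


def parse_shebang(first_line):
    """Return the interpreter path from a '#!' line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def interpreter_status(interpreter):
    """Classify an interpreter path by whether this user can reach and run it."""
    if not os.path.exists(interpreter):
        # Also the result when a parent directory cannot be searched.
        return "missing or unreachable"
    if not os.access(interpreter, os.X_OK):
        return "not executable"
    return "ok"


def report(script_path):
    """Print the interpreter a script asks for and whether it looks usable."""
    with open(script_path, "r", errors="replace") as handle:
        interpreter = parse_shebang(handle.readline())
    if interpreter is None:
        print(f"{script_path}: no #! line")
    else:
        print(f"{script_path}: {interpreter} ({interpreter_status(interpreter)})")


# Only runs when script paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        report(path)
```

With the paths from this thread it would be called with /home/maker-2.27-beta/maker/bin/maker and /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker as arguments. If the two scripts name different interpreters, or one of them is reported as unreachable, that points at the install rather than at the data. A line such as "#!/usr/bin/env perl" is reported as /usr/bin/env, in which case 'which perl' decides what actually runs.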
From dence at genetics.utah.edu Mon Apr 1 10:27:23 2013 From: dence at genetics.utah.edu (Daniel Ence) Date: Mon, 1 Apr 2013 16:27:23 +0000 Subject: [maker-devel] Why are some start positions minus in the gff result? In-Reply-To: References: Message-ID: Hi, I seem to remember some discussion of the possibility of negative coordinates in a gff3 file when the genomic material is circular. Since you're annotating viral genomes, could this be what's happening here? Like Carson said, I've never seen this before, but that's just an idea I had. Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ________________________________ From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Hung-Wei Hsu [ares711122 at gmail.com] Sent: Monday, March 25, 2013 8:50 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] Why are some start positions minus in the gff result? Hi MAKER developers, I could successfully run MAKER and get the final gff. But I found some start positions in the gff were negative. That led to an error in the gff reader. Is this a bug? Could you please help to resolve this problem? Thanks a lot in advance. Best regards, Hung-Wei From carsonhh at gmail.com Mon Apr 1 10:38:18 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 01 Apr 2013 12:38:18 -0400 Subject: [maker-devel] Why are some start positions minus in the gff result? In-Reply-To: Message-ID: I'm thinking the same thing. Reviewing how I parse GeneMark's output, I just use their start and end coordinates (no changes). Over the weekend I altered the GeneMark parser to walk the gene start away from the supposed origin so as not to let this happen. In your E.
coli test case, since you have multiple contigs for what is likely a single circular genome, this would be the correct behavior as you don't want to treat each contig as an independent circular chromosome. I should probably add an is_circular option to the control files so users can select for this. I've updated the maker subversion repository so you can do an 'svn update' (I believe you are using the devel version of MAKER, correct?) Thanks, Carson From: Daniel Ence Date: Monday, 1 April, 2013 12:27 PM To: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Why are some start positions minus in the gff result? Hi, I seem to remember some discussion of the possibility of negative coordinates in a gff3 file when the genomic material is circular. Since you're annotating viral genomes, could this be what's happening here? Like Carson said, I've never seen this before, but that's just an idea I had. Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Hung-Wei Hsu [ares711122 at gmail.com] Sent: Monday, March 25, 2013 8:50 PM To: maker-devel at yandell-lab.org Subject: [maker-devel] Why are some start positions minus in the gff result? Hi MAKER developers, I could successfully run MAKER and get the final gff. But I found some start positions in the gff were negative. That led to an error in the gff reader. Is this a bug? Could you please help to resolve this problem? Thanks a lot in advance. Best regards, Hung-Wei
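To see how many features are affected before and after updating, a small scan of the GFF3 output can list the suspicious rows. This is a minimal sketch, assuming a standard nine-column, tab-separated GFF3 file with integer coordinates; the function name find_bad_coordinates is invented for this example. It relies on the GFF3 convention that start and end are positive, 1-based and start <= end, and that a feature crossing the origin of a circular landmark is written with an end beyond the landmark length rather than with a negative start.

```python
import sys


def find_bad_coordinates(lines):
    """Yield (line_number, seqid, start, end, reason) for suspicious features."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        start, end = int(cols[3]), int(cols[4])
        if start < 1:
            yield (number, cols[0], start, end, "start < 1")
        elif start > end:
            yield (number, cols[0], start, end, "start > end")


# Only runs when a GFF3 path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for hit in find_bad_coordinates(handle):
            print("\t".join(str(field) for field in hit))
```

If every reported row comes from the same ab initio source and sits near the start of a contig, that would be consistent with the circular-genome explanation discussed above, though the scan itself cannot confirm the cause.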
From carsonhh at gmail.com Mon Apr 1 13:59:18 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 01 Apr 2013 15:59:18 -0400 Subject: [maker-devel] Help on error-Repeat masker In-Reply-To: <1364846015.96057.YahooMailNeo@web164901.mail.bf1.yahoo.com> Message-ID: What kind of system (OS) are you running on? 'perl.exe' seems odd. It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line? I think you need to reinstall RepeatMasker at a minimum. To do that --> > cd /home/maker-2.27-beta/maker/src > ./Build repeatmasker --Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 3:53 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker Thanks for the reply. 1. Yes, I set up maker myself as my own user, but I don't know how to check the mounting. 2. I'm calling maker directly, and I've tried this: cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" and it gave me this: #!/usr/bin/perl.exe From: Carson Holt To: Hud Hud ; "maker-devel at yandell-lab.org" Sent: Monday, April 1, 2013 10:50 PM Subject: Re: [maker-devel] Help on error-Repeat masker This appears to be a permissions issue either for the /u1/local/bin/ directory or RepeatMasker setup. Did you set maker up yourself as your own user or did someone else do it for you, perhaps as root? Is /u1/local/bin/ on an NFS mount? If it's a mounting issue, I found this description of the exact same issue via Google --> >> I needed to add the 'exec' option to the /etc/fstab file when mounting that >> partition. >> If it says 'defaults' on the line in /etc/fstab, then it also means you don't >> have exec rights on it. Are you using the same perl to run maker as you are using for RepeatMasker? For example, are you calling perl directly and giving the path to maker, or are you calling maker directly and letting it use the version of perl it was installed with?
Try this to see which perl maker was installed with --> cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" You may have to have to reinstall RepeatMasker and possibly maker. Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Sunday, 31 March, 2013 4:02 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Help on error-Repeat masker Hello, i have some problem when runnning maker, i've got this kind of error, what could possibly go wrong here? Thnks so much setting up GFF3 output and fasta chunks doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_WOVHsi; /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/ contig172//theVoid .contig172/contig172.0.simple.rb -dir /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/ contig172//theVoid.contig172 -pa 1 - lib /tmp/maker_WOVHsi/b1piBcWHlH #-------------------------------# sh: /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker: /u1/local/bin/perl: bad interpreter: Permission denied ERROR: RepeatMasker failed --> rank=NA, hostname=Homis ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:contig172 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:172 examining contents of the fasta file and run log _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
From carsonhh at gmail.com Mon Apr 1 14:29:40 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 01 Apr 2013 16:29:40 -0400 Subject: [maker-devel] Help on error-Repeat masker In-Reply-To: <1364847674.21064.YahooMailNeo@web164901.mail.bf1.yahoo.com> Message-ID: I found it odd because perl.exe is a Windows extension not used in Linux, but it confirmed my suspicions. You can't use maker with cygwin. There are several things that will break because it's not really Linux. You can use VirtualBox instead to install a virtual Linux machine --> https://www.virtualbox.org/. Alternatively, you can try to dual boot your system with a Linux partition. VirtualBox will allow you to run maker on small datasets; depending on the size of the genome you want to run maker with, it may be fine. But I would not recommend running anything over 10 megabases (it won't fail, it will just take a very long time). Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 4:21 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker 1. Oh, it's odd? I'm using Windows 8, but for maker I'm using cygwin. 2. When I type 'which perl' I get this: /usr/bin/perl 3. When I type ./Build repeatmasker I get this: cygwin warning: MS-DOS style path detected: \Users\Dora Preferred POSIX equivalent is: /cygdrive/c/Users/Dora CYGWIN environment variable option "nodosfilewarning" turns off this warning. Consult the user's guide for more details about POSIX paths: http://cygwin.com/cygwin-ug-net/using.html#using-pathnames WARNING: RepeatMasker was already found on this system. Do you still want MAKER to install RepeatMasker for you? Is there any problem with this, or can I just proceed with the installation? From: Carson Holt To: Hud Hud Cc: "maker-devel at yandell-lab.org" Sent: Tuesday, April 2, 2013 3:59 AM Subject: Re: [maker-devel] Help on error-Repeat masker What kind of system (OS) are you running on? 'perl.exe' seems odd.
It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line? I think you need to reinstall RepeatMasker at a minimum. To do that --> > cd /home/maker-2.27-beta/maker/src > ./Build repeatmasker --Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 3:53 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker Thanks for the reply 1. Yes i set up the maker myself as own user but i dont know how to check for the mounting things 2. Im calling maker directly and i've tried this cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" and it gaves me this #!/usr/bin/perl.exe From: Carson Holt To: Hud Hud ; "maker-devel at yandell-lab.org" Sent: Monday, April 1, 2013 10:50 PM Subject: Re: [maker-devel] Help on error-Repeat masker This appears to be a permissions issue either for the /u1/local/bin/ directory or RepeatMasker setup. Did you set maker up yourself as your own user or did someone else do it for you, perhaps as root? Is /u1/local/bin/ on an NFS mount. If it's a mounting issue I found this via google the exact same issue--> >> I needed to add the 'exec' option to the /etc/fstab file when mounting that >> partition. >> If it says 'defaults' on the line in /etc/fstab, then it also means you don't >> have exec rights on it. Are you using the same perl to run maker as you are using for RepeatMasker? For example, are you calling perl directly and giving the path to maker or are you calling maker directly and letting it use the version of perl it was installed with. Try this to see which perl maker was installed with --> cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" You may have to have to reinstall RepeatMasker and possibly maker. 
Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Sunday, 31 March, 2013 4:02 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Help on error-Repeat masker Hello, I have a problem when running maker. I got this error; what could be going wrong here? Thanks so much. setting up GFF3 output and fasta chunks doing repeat masking running repeat masker. #--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_WOVHsi; /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172/contig172.0.simple.rb -dir /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172 -pa 1 -lib /tmp/maker_WOVHsi/b1piBcWHlH #-------------------------------# sh: /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker: /u1/local/bin/perl: bad interpreter: Permission denied ERROR: RepeatMasker failed --> rank=NA, hostname=Homis ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:contig172 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:172 examining contents of the fasta file and run log From carsonhh at gmail.com Mon Apr 1 15:47:38 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 01 Apr 2013 17:47:38 -0400 Subject: [maker-devel] Help on error-Repeat masker In-Reply-To: Message-ID: That's not too bad. It's best to choose a few large contigs (1-2 Mb total) to run with at first and then use those results to help configure the rest of the run. For the final run you may want to consider splitting onto multiple machines if your machine has limited cpu power.
It will take you ~150 hours on 1 cpu core depending on the size of alignment datasets - ESTs and proteins. More cpu cores will allow it to run faster (see graph below from the MAKER2 paper). I imagine that your machine probably has at least 4 cpu cores. Most bioinformatics labs have multi-cpu Linux boxes (i.e. 24-32 cpu cores), some have clusters available to them (100's to 1000's of cpu cores), and a few just launch maker on multiple lab desktop machines all writing to the same network mounted output directory. Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 4:48 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker It's about 50 Mb From: Carson Holt To: Hud Hud Sent: Tuesday, April 2, 2013 4:44 AM Subject: Re: [maker-devel] Help on error-Repeat masker How big is the genome? --Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 4:37 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker Oh, thanks so much, now I know what's going wrong: it's the cygwin... I'll try dual boot then, as my genome is over 10 Mb. Thanks for your time. From: Carson Holt To: Hud Hud Cc: "maker-devel at yandell-lab.org" Sent: Tuesday, April 2, 2013 4:29 AM Subject: Re: [maker-devel] Help on error-Repeat masker I found it odd because perl.exe is a Windows extension not used in Linux, but it confirmed my suspicions. You can't use maker with cygwin. There are several things that will break because it's not really Linux. You can use VirtualBox instead to install a virtual Linux machine --> https://www.virtualbox.org/. Alternatively, you can try to dual boot your system with a Linux partition. VirtualBox will allow you to run maker on small datasets; depending on the size of the genome you want to run maker with, it may be fine. But I would not recommend running anything over 10 megabases (it won't fail, it will just take a very long time).
Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 4:21 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker 1. owh its odd?im using windows8 but for maker im using cygwin 2. when i type which perl i get this /usr/bin/perl 3. when i type ./Build repeatmasker i got this cygwin warning: MS-DOS style path detected: \Users\Dora Preferred POSIX equivalent is: /cygdrive/c/Users/Dora CYGWIN environment variable option "nodosfilewarning" turns off this warning. Consult the user's guide for more details about POSIX paths: http://cygwin.com/cygwin-ug-net/using.html#using-pathnames WARNING: RepeatMasker was already found on this system. Do you still want MAKER to install RepeatMasker for you? is there any prob with this, or can i just proceed with the installation? From: Carson Holt To: Hud Hud Cc: "maker-devel at yandell-lab.org" Sent: Tuesday, April 2, 2013 3:59 AM Subject: Re: [maker-devel] Help on error-Repeat masker What kind of system (OS) are you running on? 'perl.exe' seems odd. It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line? I think you need to reinstall RepeatMasker at a minimum. To do that --> > cd /home/maker-2.27-beta/maker/src > ./Build repeatmasker --Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 3:53 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker Thanks for the reply 1. Yes i set up the maker myself as own user but i dont know how to check for the mounting things 2. Im calling maker directly and i've tried this cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" and it gaves me this #!/usr/bin/perl.exe From: Carson Holt To: Hud Hud ; "maker-devel at yandell-lab.org" Sent: Monday, April 1, 2013 10:50 PM Subject: Re: [maker-devel] Help on error-Repeat masker This appears to be a permissions issue either for the /u1/local/bin/ directory or RepeatMasker setup. 
Did you set maker up yourself as your own user or did someone else do it for you, perhaps as root? Is /u1/local/bin/ on an NFS mount. If it's a mounting issue I found this via google the exact same issue--> >> I needed to add the 'exec' option to the /etc/fstab file when mounting that >> partition. >> If it says 'defaults' on the line in /etc/fstab, then it also means you don't >> have exec rights on it. Are you using the same perl to run maker as you are using for RepeatMasker? For example, are you calling perl directly and giving the path to maker or are you calling maker directly and letting it use the version of perl it was installed with. Try this to see which perl maker was installed with --> cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" You may have to have to reinstall RepeatMasker and possibly maker. Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Sunday, 31 March, 2013 4:02 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Help on error-Repeat masker Hello, i have some problem when runnning maker, i've got this kind of error, what could possibly go wrong here? Thnks so much setting up GFF3 output and fasta chunks doing repeat masking running repeat masker. 
#--------- command -------------# Widget::RepeatMasker: cd /tmp/maker_WOVHsi; /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172/contig172.0.simple.rb -dir /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172 -pa 1 -lib /tmp/maker_WOVHsi/b1piBcWHlH #-------------------------------# sh: /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker: /u1/local/bin/perl: bad interpreter: Permission denied ERROR: RepeatMasker failed --> rank=NA, hostname=Homis ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:contig172 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:172 examining contents of the fasta file and run log -------------- next part -------------- A non-text attachment was scrubbed... Name: 08EB6777-DA72-45CA-8E05-07928457B9BE.png Type: image/png Size: 61806 bytes Desc: not available URL: From mnuhn at ebi.ac.uk Tue Apr 2 07:09:18 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Tue, 02 Apr 2013 14:09:18 +0100 Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs Message-ID: <515AD87E.1010800@ebi.ac.uk> Hello Carson! (Mpi) Maker (2.27) is failing when it runs blast searches. It prints out the command it is trying to run. When I try to run this command manually on the command line, blast terminates with an error, because it either can't find the input file or it can't find a file ending in .pin, which I think is a protein index file it expects to be there. I've looked at a few contigs on which maker fails and they were all rather short contigs.
Maker works fine, if I

- run it without mpi or
- run it with mpi, but a maximum of 4 processors.

(Mpi) Maker used to run fine with 128 processors before this.

The contigs are sorted descending by size in the genome file. I think maker has processed the large ones and the problems it is having now might have something to do with it running on smaller contigs.

From looking at the error messages I thought at first the index file of the genome might be corrupted, so I deleted it and let maker rebuild it. This didn't fix the issue though. I have also set the path for temporary files manually to make sure maker is not running out of temporary space.

Any idea how to overcome this?

Cheers,
Michael.

P.S.: A typical error message I'm getting is this:

--Next Contig--

[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
running blast search.
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSalAtl2s8087.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LSalAtl2s8087//theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:LSalAtl2s8083

doing blastx repeats
setting up GFF3 output and fasta chunks
doing blastx repeats
re reading repeat masker report.
/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/LSalAtl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
running blast search.
running blast search.
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSalAtl2s8135.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/LSalAtl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSalAtl2s8119.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/LSalAtl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: LSalAtl2s8449
Length: 2187
Tries: 18!!
#---------------------------------------------------------------------

From carsonhh at gmail.com Tue Apr 2 07:15:28 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 09:15:28 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364865389.66083.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

The best evidence is from mRNAseq or ESTs of the same species. ESTs and mRNAseq from related species can be used, but if protein annotations are available use those instead. This is because cross species nucleotide alignments must be translated in all 6 reading frames (3 for the query and 3 for the subject), which would basically make run times increase by 6 fold. You can try giving the cross species alignments to maker as if they were from the same species instead (est= option); fewer will align, but run times will not be overwhelming. Then provide the protein annotations from the related species combined with uniprot (maker can take comma separated lists for the input files).

You can use either the program CEGMA from Ian Korf's lab or alternatively maker's protein2genome option to build an initial annotation set to use for training. Then train SNAP, Augustus, and GeneMark (GeneMark self-trains). For the last run let all 3 predictors run together with protein2genome now turned off. Given that the genome is only 50Mb and you have a lack of alignment evidence, you can probably safely set keep_preds=1 on the second run, as the false positive rate is usually quite low for gene dense organisms and you won't get many annotations from maker otherwise without more evidence alignments. Perform your first and second runs in the same location, so maker can automatically reuse the same alignments (the second run is always very fast this way as maker won't have to rerun blast and exonerate).
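The two-pass setup described above comes down to changing a handful of options between runs while leaving everything else in place. Here is a rough Python sketch of that bookkeeping, assuming the control file consists of `key=value #comment` lines. The option names est, protein, protein2genome and keep_preds are the ones mentioned in this thread; the helper function and all file names are hypothetical, and the predictor settings for the trained SNAP, Augustus and GeneMark models are left out.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with each matching `key=value` line updated.

    Any trailing `#comment` on a line is preserved. Keys that do not
    appear in the text are ignored rather than appended.
    """
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={updates[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file names; only the option names come from the thread.
FIRST_RUN = {
    "est": "related_species_ests.fasta",
    "protein": "related_species_proteins.fasta,uniprot_sprot.fasta",
    "protein2genome": "1",
}
# Second pass: protein2genome off, keep unsupported predictions.
SECOND_RUN = {"protein2genome": "0", "keep_preds": "1"}
```

Applying FIRST_RUN and later SECOND_RUN to the same control file, in the same working directory, matches the advice to keep both runs in one location so the alignments are reused.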
If your organism is a fungus (I'm just guessing because of the small genome size) you can also use this gene prediction parameter resource from Jason Stajich --> https://github.com/hyphaltip/fungi-gene-prediction-params

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 9:16 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Thanks, I now have better insight regarding the cpu cores. I have other questions. I don't have other info or evidence for my own genome; I only have assembled contigs. Recently JGI sequenced a species that is closely related to my genome (at genus level), and I have access to the data (protein, EST, RNA-seq reads, transcripts, gene models, GFF).

1. I have run maker (MWAS) using different sets of evidence, such as protein and EST (JGI), and EST (JGI) and the UniProt database, but the two runs produced different numbers of predicted genes. So my question: what is the best evidence to use to support my annotation? Is it preferable to use a larger dataset such as UniProt rather than the data from JGI (even though it is closely related)?

2. Can I use RNA-seq data (from JGI) in maker? I've de novo assembled the RNA-seq using CLC Genomics.

Thanks

From: Carson Holt
To: Hud Hud
Sent: Tuesday, April 2, 2013 5:01 AM
Subject: Re: [maker-devel] Help on error-Repeat masker

That's not too bad. It's best to choose a few large contigs (1-2Mb total) to run with at first and then use those results to help configure the rest of the run. For the final run you may want to consider splitting onto multiple machines if your machine has limited cpu power. It will take you ~150 hours on 1 cpu core depending on the size of alignment datasets - ESTs and proteins. More cpu cores will allow it to run faster (see graph below from the MAKER2 paper). I imagine that your machine probably has at least 4 cpu cores. Most bioinformatics labs have multi cpu Linux boxes (i.e. 24-32 cpu cores), some have clusters available to them (100's to 1000's of cpu cores), and a few just launch maker on multiple lab desktop machines all writing to the same network mounted output directory.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:48 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

It's about 50 Mb.

From: Carson Holt
To: Hud Hud
Sent: Tuesday, April 2, 2013 4:44 AM
Subject: Re: [maker-devel] Help on error-Repeat masker

How big is the genome?

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:37 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Oh, thanks so much, now I know what's going wrong: it's the cygwin. I'll try dual boot then, as my genome is over 10 Mb. Thanks for your time.

From: Carson Holt
To: Hud Hud
Cc: "maker-devel at yandell-lab.org"
Sent: Tuesday, April 2, 2013 4:29 AM
Subject: Re: [maker-devel] Help on error-Repeat masker

I found it odd because perl.exe is a Windows extension not used in Linux, but it confirmed my suspicions. You can't use maker with cygwin. There are several things that will break because it's not really Linux. You can use Virtual Box instead to install a virtual Linux machine --> https://www.virtualbox.org/. Alternatively you can try to dual boot your system with a Linux partition. Virtual Box will allow you to run maker on small datasets; depending on the size of the genome you want to run maker with, it may be fine. But I would not recommend running anything over 10 megabases (it won't fail, it will just take a very long time).

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:21 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

1. Oh, it's odd? I'm using Windows 8, but for maker I'm using cygwin.
2. When I type 'which perl' I get this: /usr/bin/perl
3. When I type ./Build repeatmasker I get this:

cygwin warning:
  MS-DOS style path detected: \Users\Dora
  Preferred POSIX equivalent is: /cygdrive/c/Users/Dora
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
WARNING: RepeatMasker was already found on this system.
Do you still want MAKER to install RepeatMasker for you?

Is there any problem with this, or can I just proceed with the installation?

From: Carson Holt
To: Hud Hud
Cc: "maker-devel at yandell-lab.org"
Sent: Tuesday, April 2, 2013 3:59 AM
Subject: Re: [maker-devel] Help on error-Repeat masker

What kind of system (OS) are you running on? 'perl.exe' seems odd. It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line? I think you need to reinstall RepeatMasker at a minimum. To do that -->

> cd /home/maker-2.27-beta/maker/src
> ./Build repeatmasker

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 3:53 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Thanks for the reply.

1. Yes, I set up maker myself as my own user, but I don't know how to check the mounting.
2. I'm calling maker directly, and I've tried this:

cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!"

and it gave me this:

#!/usr/bin/perl.exe

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 08EB6777-DA72-45CA-8E05-07928457B9BE.png
Type: image/png
Size: 61806 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue Apr 2 07:57:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 09:57:08 -0400
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <515AD87E.1010800@ebi.ac.uk>
Message-ID:

Could you set the TMP= option to a non-NFS mounted location (the default /tmp should work) and let me know if it still fails? You can also try completely deleting the LSalAtl2s.maker.output/mpi_blastdb directory before restarting.

Thanks,
Carson

From mnuhn at ebi.ac.uk Tue Apr 2 08:38:31 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Tue, 02 Apr 2013 15:38:31 +0100
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <00E9A24F-728F-496D-A30C-6EA83676FF64@sanger.ac.uk>
References: <515AD87E.1010800@ebi.ac.uk> <00E9A24F-728F-496D-A30C-6EA83676FF64@sanger.ac.uk>
Message-ID: <515AED67.5060906@ebi.ac.uk>

On 04/02/2013 02:01 PM, Eleanor Stanley wrote:
> what version of Blast are you using?
> I was getting similar errors with NCBI BLAST+ 2.2.23 that were resolved using BLAST+ 2.2.27 instead

I was using blast version 2.2.14. I've now swapped it out for ncbi+ 2.2.9.

I am running it on one mpi instance with 128 processors and it seems to be working now.

Thanks!
Michael.

> Ele

From carsonhh at gmail.com Tue Apr 2 08:16:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 10:16:44 -0400
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <515AED67.5060906@ebi.ac.uk>
Message-ID:

Good to know.

Thanks,
Carson

From Carson.Holt at oicr.on.ca Thu Apr 4 11:29:24 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Thu, 4 Apr 2013 17:29:24 +0000
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
Message-ID:

Since you are using 12 core nodes (hyperthreaded cores are virtual; you still only have 12 cores of power, not 24) and your performance curve drops off at 12, I'm thinking there is a possibility that the other processes did not start on a separate node. Try launching the Linux command 'hostname' the same way you are launching maker. If all 24 lines of output from hostname have the same host, then maker is only getting launched on a single node. Then, since there are really only 12 cores (not 24), you would not see any significant performance improvement above 12. So each process above 12 will reduce the power allocated to the remaining processes. So the difference from 12 to 24 (~25% performance gain) is just what can be gained from process saturation (not all maker processes are always at 100% cpu usage because of calls to IO, so adding a few more processes than you have cpu cores sometimes runs a little faster).
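The 'hostname' check suggested above is easy to read by eye for 24 lines, but a small tally helps with larger launches. The following is a minimal Python sketch that assumes the output of the launched hostname commands has been captured as text, one host name per line. How it gets captured depends on the local MPI and queue setup and is not shown. The function names and the node names in the usage example are hypothetical.

```python
from collections import Counter


def ranks_per_host(hostname_output):
    """Count how many launched processes reported each host name."""
    hosts = [line.strip() for line in hostname_output.splitlines() if line.strip()]
    return Counter(hosts)


def launched_on_single_node(hostname_output):
    """Return True when every process reported the same host."""
    return len(ranks_per_host(hostname_output)) == 1


if __name__ == "__main__":
    # Hypothetical capture from a 24-process launch that stayed on one node.
    captured = "node01\n" * 24
    print(ranks_per_host(captured))
    print(launched_on_single_node(captured))
```

If the tally shows a single host holding all of the processes, the launch is not spreading across nodes, which is the situation described above.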
Thanks,
Carson

From: Ramón Fallon
Date: Thursday, 4 April, 2013 1:03 PM
To: "maker-devel at yandell-lab.org"
Subject: second maker2 benchmark, this time, on a cluster

Hi

I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes, this time I aimed for 48. In contrast to the last 12 core speed check, the target hardware was a computer cluster, with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence in one file with different names).

The nodes in the cluster are (again) HP Proliant SL390 with two Intel X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running Centos 6.2 with (as before) 2.6.32 linux kernel. A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via infiniband quadspeed, and because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed.

Results were:

#processes  time(secs)  Megabases/hr
1           6585.00     2.20
2           7137.00     2.03
4           2479.00     5.84
8           1088.00     13.30
10          866.00      16.71
12          715.00      20.24
14          666.00      21.72
16          651.00      22.22
18          613.00      23.60
24          559.00      25.88

Graph is attached to this mail.

Some notes:

* A free queue on the gridengine was used so there was no load on these nodes when run. Two nodes are available on this queue, giving a max of 48 simultaneous processes.
* Some processor numbers (6, 20, etc.) were deleted because I couldn't guarantee "No load" conditions during those runs, and I had one or two anomalies so I'd rather not include them right now. However, I expect them to be in line with the other results.
* In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24.
Because this is also the max number of processes per node, it's possible that interconnects between the nodes had something to do with runs > 24 processes being inconsistent, however, it's not usually an issue in other programs because quadspeed (40Gbit/s) is already a fairly fast interconnect). * Process runs 26,28, and 30 would almost - but not quite - finish (just a few sequences unfinished), But after this number, the analysis would hardly get off the ground, seeming to get stuck at Repeatmasker phase. I suppose this is our main concern at the moment, that we can't speed up beyond 24 processes. Cheers / Ram?n. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Carson.Holt at oicr.on.ca Thu Apr 4 11:40:48 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Thu, 4 Apr 2013 17:40:48 +0000 Subject: [maker-devel] second maker2 benchmark, this time, on a cluster In-Reply-To: Message-ID: One more thought. If 26,28, and 30 process jobs are failing this could also be because they are not starting across nodes correctly (all end up on the same node). You would then start to run into memory problems and the job would freeze. So validating the proper cross node launch of MPI using the 'hostname' command is still probably the first thing to do. --Carson From: Carson Holt > Date: Thursday, 4 April, 2013 1:29 PM To: Ram?n Fallon >, "maker-devel at yandell-lab.org" > Subject: Re: second maker2 benchmark, this time, on a cluster Since you are using 12 core nodes (hyperthreaded cores are virtual ? you still only have 12 cores of power not 24) and your performance curve drops off at 12, I'm thinking there is a possibility that the other processes did not start on a separate node. Try launching the Linux command 'hostname' the same way you are launching maker. If all 24 lines of output from hostname have the same host, then maker is only getting launched on a single node. 
Then since there are really only 12 cores (not 24) you would not see any significant performance improvement above 12. So each process above 12 will reduce the power allocated to remaining processes. So the difference from 12 to 24 (~25% performance gain) is just what can be gained from process saturation (not all maker processes are always at 100% cpu usage because of calls to IO so adding a few more processes than you have cpu cores sometimes runs a little faster). Thanks, Carson From: Ram?n Fallon > Date: Thursday, 4 April, 2013 1:03 PM To: "maker-devel at yandell-lab.org" > Subject: second maker2 benchmark, this time, on a cluster Hi I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes, this time I aimed for 48. In contrast to the last 12 core speed check, the target hardware was a computer cluster, with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence in one file with different names). The nodes in the cluster are (again) HP Proliant SL390 with two Intel X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running Centos 6.2 with (as before) 2.6.32 linux kernel. A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via infiniband quadspeed, and because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed. Results were: #processes time(secs) Megabases/hr 1 6585.00 2.20 2 7137.00 2.03 4 2479.00 5.84 8 1088.00 13.30 10 866.00 16.71 12 715.00 20.24 14 666.00 21.72 16 651.00 22.22 18 613.00 23.60 24 559.00 25.88 Graph is attached to this mail. Some notes: * A free queue on the gridengine were used so there was no load on these nodes when run. Two nodes are available on this queue, giving a max of 48 simualtaneous processes. 
* Some processor number (6,20, etc) were deleted because I couldn't guarantee "No load" conditions during those runs, and I had one or two anomalies so I'd rather not include them right now. However, I expect them to be in line with the other results. * In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24. Because this is also the max number of processes per node, it's possible that interconnects between the nodes had something to do with runs > 24 processes being inconsistent, however, it's not usually an issue in other programs because quadspeed (40Gbit/s) is already a fairly fast interconnect). * Process runs 26,28, and 30 would almost - but not quite - finish (just a few sequences unfinished), But after this number, the analysis would hardly get off the ground, seeming to get stuck at Repeatmasker phase. I suppose this is our main concern at the moment, that we can't speed up beyond 24 processes. Cheers / Ram?n. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramonfallon at gmail.com Thu Apr 4 11:03:43 2013 From: ramonfallon at gmail.com (=?ISO-8859-1?Q?Ram=F3n_Fallon?=) Date: Thu, 4 Apr 2013 19:03:43 +0200 Subject: [maker-devel] second maker2 benchmark, this time, on a cluster Message-ID: Hi I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes, this time I aimed for 48. In contrast to the last 12 core speed check, the target hardware was a computer cluster, with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence in one file with different names). The nodes in the cluster are (again) HP Proliant SL390 with two Intel X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running Centos 6.2 with (as before) 2.6.32 linux kernel. 
A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via infiniband quadspeed and, because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed.

Results were:

#processes  time(secs)  Megabases/hr
 1          6585.00      2.20
 2          7137.00      2.03
 4          2479.00      5.84
 8          1088.00     13.30
10           866.00     16.71
12           715.00     20.24
14           666.00     21.72
16           651.00     22.22
18           613.00     23.60
24           559.00     25.88

Graph is attached to this mail. Some notes:
* A free queue on the gridengine was used, so there was no load on these nodes during the runs. Two nodes are available on this queue, giving a max of 48 simultaneous processes.
* Some process counts (6, 20, etc.) were deleted because I couldn't guarantee "No load" conditions during those runs, and I had one or two anomalies, so I'd rather not include them right now. However, I expect them to be in line with the other results.
* In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24. Because this is also the max number of processes per node, it's possible that the interconnects between the nodes had something to do with runs > 24 processes being inconsistent; however, that is not usually an issue in other programs because quadspeed (40Gbit/s) is already a fairly fast interconnect.
* Process runs 26, 28, and 30 would almost, but not quite, finish (just a few sequences unfinished). Beyond that number, the analysis would hardly get off the ground, seeming to get stuck at the RepeatMasker phase. I suppose this is our main concern at the moment: that we can't speed up beyond 24 processes.

Cheers / Ramón.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 48proc.png
Type: image/png
Size: 24644 bytes
Desc: not available
URL:

From ramonfallon at gmail.com Fri Apr 5 10:00:53 2013
From: ramonfallon at gmail.com (Ramón Fallon)
Date: Fri, 5 Apr 2013 18:00:53 +0200
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
References:
Message-ID:

Thanks for the replies Carson,

Our cluster has got busy all of a sudden, so I have to wait a bit to do the hostname test. However, I'm fairly sure (not 100%, mind you) that when the process number is over 24 it will definitely run the extra processes on a separate node, and so do a proper cross-node launch.

From kangyangjae at gmail.com Sat Apr 6 01:25:40 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Sat, 6 Apr 2013 16:25:40 +0900
Subject: [maker-devel] CDS retrieve from augustus_masked
Message-ID: <145c01ce3297$f318eab0$d94ac010$@gmail.com>

Dear everyone!

I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information.

Thank you!

Kang, Yang Jae
Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mike.thon at gmail.com Sat Apr 6 05:20:16 2013
From: mike.thon at gmail.com (Michael Thon)
Date: Sat, 6 Apr 2013 13:20:16 +0200
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <145c01ce3297$f318eab0$d94ac010$@gmail.com>
References: <145c01ce3297$f318eab0$d94ac010$@gmail.com>
Message-ID:

Hi Kang - After running fasta_merge there should be a file:

[prefix].all.maker.augustus_masked.transcripts.fasta

in the output directory. Is that what you need?

Mike

From kangyangjae at gmail.com Sat Apr 6 05:24:31 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Sat, 6 Apr 2013 20:24:31 +0900
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To:
References: <145c01ce3297$f318eab0$d94ac010$@gmail.com>
Message-ID: <148d01ce32b9$51407380$f3c15a80$@gmail.com>

Thanks for your quick response, Mike.

I looked at the file named transcripts, but I suspect it might include UTRs. What I want to do is calculate Ka and Ks values, so I need coding sequences. Is there any indication of where the exact START and STOP are in the transcript file?
Thank you

From carsonhh at gmail.com Sat Apr 6 07:54:15 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 09:54:15 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <148d01ce32b9$51407380$f3c15a80$@gmail.com>
Message-ID:

It's all CDS, from start to finish. There is never any UTR in the ab initio reference match/match_part alignments. There are two reasons for this. First, most ab initio predictors don't produce UTR. Second, GFF3 has no is_analysis flag, so it is impossible to separate final gene models from predicted gene models if they are both in the form gene/mRNA/exon/CDS. Augustus can predict UTR, but given the limitation just mentioned, if I reject the model, I have to trim it before adding it to the reference information. We've actually been in discussion with the apollo development group over this limitation.
Original apollo found the same limitation, so they make the same assumption for loading data into the browsing window (gene/mRNA/exon/CDS features always go in the middle annotation track and everything else goes in the reference evidence track). With the new web apollo, we're working on getting the default behavior to allow UTR in the gene predictions by using the SO predicted gene term in the GFF3 (which previously wasn't available for use in apollo and maker).

So in summary: nothing but CDS for now, but it will include UTR when available in the sequence in the near future.

Thanks,
Carson

From mike.thon at gmail.com Sat Apr 6 08:37:28 2013
From: mike.thon at gmail.com (Michael Thon)
Date: Sat, 6 Apr 2013 16:37:28 +0200
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <148d01ce32b9$51407380$f3c15a80$@gmail.com>
References: <145c01ce3297$f318eab0$d94ac010$@gmail.com> <148d01ce32b9$51407380$f3c15a80$@gmail.com>
Message-ID: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>

That's a good point, because 'transcripts' implies that it would have the UTRs. Does augustus predict the UTRs? I manually checked the translations of the .transcript. file and I only found valid translations, but that does not mean that UTRs could not be present...

From carsonhh at gmail.com Sat Apr 6 09:13:16 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 11:13:16 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>
Message-ID:

Augustus only predicts UTR for a handful of organisms. I trim them off the rejected models before outputting to the GFF3 as match/match_part features (per my previous e-mail concerning the limitations of GFF3).

--Carson

From kangyangjae at gmail.com Sat Apr 6 12:45:02 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Sun, 7 Apr 2013 03:45:02 +0900
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To:
References: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>
Message-ID: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>

Thank you for the quick response again!

I found non-ATG starting sequences in the transcript file. I thought these would be UTR traces. I additionally found an offset value some positions after the '>' character. Does that indicate the starting ATG?
Secondly, there are several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What are the criteria for splitting those files? The reason I'm asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

Thank you

From barry.utah at gmail.com Sat Apr 6 14:50:29 2013
From: barry.utah at gmail.com (Barry Moore)
Date: Sat, 6 Apr 2013 14:50:29 -0600
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
References: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com> <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
Message-ID: <3B421D2C-590D-4593-8FA5-3CAA10A19FD4@genetics.utah.edu>

On Apr 6, 2013, at 12:45 PM, Kang, Yang Jae wrote:

> Thank you for quick response again!
>
> I found the non-ATG starting sequences in transcript file. I thought this would be the UTR traces, and

The gene predictors will occasionally produce a transcript with no start/stop codon. Set always_complete=1 in maker_opts.ctl to get MAKER to try hard to force a start/stop codon.

> I additionally found the offset value some position after '>' letter. Is that indicate the starting ATG?

I didn't really understand that question...

> Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files?
> The reason why I'm asking is that some genes were

augustus_masked is a file that contains the proteins of all predictions made by Augustus when working on masked sequence. Setting unmask=1 in maker_opts.ctl would instruct MAKER to also run the gene predictors on unmasked sequence, and then you'd have an augustus_unmasked file for those predictions. The non_overlapping_ab_initio files contain proteins predicted by all gene predictors for which MAKER could not find protein/RNA evidence, so they are unsupported by physical evidence. These unsupported predictions are not promoted by MAKER into annotations in its final output, but they are included in these files in case you want to work with them. The non_overlapping part of the name means that if multiple gene predictors produce overlapping unsupported ab initio predictions, then MAKER will only output one of them.

> redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

Yes, the proteins for genes for which MAKER creates annotations will be in both files.

>
> Thank you

Barry Moore
Research Scientist
Dept.
of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Sat Apr 6 15:00:19 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 17:00:19 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
Message-ID:

I additionally found the offset value at some position after the '>' character. Does that indicate the starting ATG?

> Only the maker.transcripts.fasta file will have offsets other than 0; you can
> use these to get the transcription offset. All other *.transcript.fasta files
> will always have an offset of 0 for the reason previously mentioned. Some
> genes will not start with ATG or have stop codons. These are partial models.
> Set always_complete=1 to reduce these.

Secondly, there are several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What are the criteria for splitting those files?

> Final selected annotations go in the maker.proteins.fasta and
> maker.transcripts.fasta files. Raw unfiltered ab initio predictions from
> augustus go in the augustus_masked.proteins.fasta and
> augustus_masked.transcripts.fasta files (these are for reference purposes). A
> set of non-redundant rejected models goes in the
> non-overlapping.transcripts.fasta and non-overlapping.proteins.fasta files
> (if you are missing a gene you expected to find, look in this file first; you
> can add them back if you find protein domains in them, for example).

The reason why I'm asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

> This is because some of the augustus generated models made it into the final
> annotation set.
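
For readers who want coding sequences for Ka/Ks work, the offset described above can be applied roughly as follows. This is only a minimal sketch, not part of MAKER: it assumes the transcript FASTA headers carry a field of the form "offset:N" (the helper names read_fasta and cds_from_transcript and the example records are made up for illustration), and it assumes the CDS runs from the offset to the first in-frame stop codon. Partial models without a stop codon are simply truncated to whole codons.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cds_from_transcript(header, seq):
    """Drop the 5' UTR using the header offset, then cut at the first in-frame stop."""
    match = re.search(r"offset:(\d+)", header)  # assumed header field, see lead-in
    offset = int(match.group(1)) if match else 0
    coding = seq[offset:].upper()
    for i in range(0, len(coding) - 2, 3):
        if coding[i:i + 3] in STOP_CODONS:
            return coding[:i + 3]  # keep the stop codon itself
    # Partial model with no in-frame stop: keep whole codons only
    return coding[:len(coding) - len(coding) % 3]


# Hypothetical records, invented for this example
example = """>gene1-mRNA-1 transcript offset:4 AED:0.10
GGCCATGAAATTTTAGCCGG
>gene2-mRNA-1 transcript offset:0 AED:0.30
ATGCCCGGGAA
"""

cds = {h.split()[0]: cds_from_transcript(h, s) for h, s in read_fasta(example)}
for name, seq in cds.items():
    print(name, seq)
```

For the other *.transcripts.fasta files the offset is always 0 according to the reply above, so the same function would just trim at the first in-frame stop.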

Thanks,
Carson

From xzhang at genome.wustl.edu Wed Apr 10 10:30:38 2013
From: xzhang at genome.wustl.edu (xu zhang)
Date: Wed, 10 Apr 2013 11:30:38 -0500
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To:
References:
Message-ID: <516593AE.8000909@genome.wustl.edu>

Hi All,

Does anybody have a genemark .mod file for yeast? I tried to create my own model file using this command:

gm_es.pl S288C_reference_sequence_R64-1-1_20110203.fsa

where the sequence was downloaded from NCBI. It failed with this error:

warning, error in input file format:
-3
error reading parameter BRANCH_MAT
error in model file /gscmnt/gc2124/info/annotation/personal_dir/xzhang/yeast/s_cerevisiae/genemark/training2/mod/es.mod
Error on system: prediction step

and "Error: unknown line format".

I also tried the sample file (pythium_ultimum_scaffolds.fasta) from Carson. A mod file was created, although it also produced some error output:

warning, error in input file format:
-13
5654 dna.fa.good.gb.acc.ph2
first order for ACC 2
Error: unknown line format
GC% ntron

Any suggestions and comments are appreciated.

Thanks,
Xu

From xzhang at genome.wustl.edu Fri Apr 12 06:47:08 2013
From: xzhang at genome.wustl.edu (xu zhang)
Date: Fri, 12 Apr 2013 07:47:08 -0500
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To: <516593AE.8000909@genome.wustl.edu>
References: <516593AE.8000909@genome.wustl.edu>
Message-ID: <5168024C.9040808@genome.wustl.edu>

I know how to do that.
I tried a different initial mod file, and it worked on my sequences with the org_S1_55.0mtx initial mod. I don't know why. If somebody knows, please let me know.

Thanks,
Xu

From jason.stajich at gmail.com Fri Apr 12 09:48:53 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 12 Apr 2013 08:48:53 -0700
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To: <5168024C.9040808@genome.wustl.edu>
References: <516593AE.8000909@genome.wustl.edu> <5168024C.9040808@genome.wustl.edu>
Message-ID: <256F7975-9744-4A53-974F-B92B0179A5B2@gmail.com>

Did you email the genemark authors? They would be a better source for help. I experienced the same problems with the yeast data to train from and didn't use genemark for those species. It may be that it is expecting more introns and the files for training are empty on some rounds.

Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From jteckert at gmail.com Sun Apr 14 15:07:33 2013
From: jteckert at gmail.com (James Eckert)
Date: Sun, 14 Apr 2013 17:07:33 -0400
Subject: [maker-devel] Annotation quality and converting gff3 to gtf
Message-ID:

Hello,

I'm currently trying to figure out ways to evaluate the quality of annotations that MAKER produces. I'm working on a novel species, so there isn't a reference genome to compare the annotation quality to. After doing a bit of searching on the web, I came across the EVAL tool, which I thought may be useful for checking the output quality. EVAL takes in gtf files, not gff3, however MAKER seems to have addressed this problem through its accessory scripts.

I first used the script "gff3_merge" to have my whole annotation under one gff3 file. Next I used "add_utr_start_stop_gff". This would explicitly add the UTRs, which would be needed for converting the gff3 file to gtf. The problem arose when trying to run "gff3_to_eval_gtf". I was expecting MAKER to process the whole gff3 file, but it seems to have only processed 2 nodes. The same thing happens when running the "gff3_2_gtf" script. Here is the command I'm running, along with the output:

gff3_to_eval_gtf assem_kmer_57_utr.gff3

NODE_20666_length_66353_cov_18.405483 maker CDS 8801 8984 . - 0 gene_id "1"; transcript_id "2";
NODE_20666_length_66353_cov_18.405483 maker CDS 8113 8717 . - 2 gene_id "1"; transcript_id "2";

My question is whether the "gff3_to_eval_gtf" and "gff3_2_gtf" scripts have a bug in them, or whether I'm just doing the process wrong? Perhaps if the conversion doesn't work, there exists an alternative to EVAL that works with native MAKER annotations?

Attached is my whole genome gff3 file, along with the file I ran "gff3_to_eval_gtf" on.

assem_kmer-57_exp-44_covcutoff-auto_contigs.all.gff3
assem_kmer_57_utr.gff3

Thank you in advance for your help,
James

From liuhuiquan at nwsuaf.edu.cn Tue Apr 16 02:16:34 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Tue, 16 Apr 2013 16:16:34 +0800
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
Message-ID:

An HTML attachment was scrubbed...

From carsonhh at gmail.com Tue Apr 16 08:20:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 10:20:01 -0400
Subject: [maker-devel] Annotation quality and converting gff3 to gtf
In-Reply-To:
Message-ID:

The input GFF3 file you linked to only contains one gene. Is that correct? If so, then you should only get one gene in the output. The resulting GTF should only have the genes (ignoring all the evidence).

To convert for EVAL, use these command lines (note the flags, such as -g for gff3_merge so you are only looking at genes; the fasta must be included in the file, so no -n flag):

gff3_merge -d maker_datastore_index.log -g -o some_file.gff
add_utr_start_stop_gff some_file.gff > some_file2.gff
maker2eval_gtf some_file2.gff

Note that all versions of MAKER after 2.09 no longer have add_utr_start_stop_gff; the UTR is now always there explicitly, so you go straight from gff3_merge to maker2eval_gtf.

However, with that explanation, I have to wonder if EVAL is appropriate for you. EVAL requires a reference annotation set (that is assumed to be 100% perfect) for comparison, and you get a perfect score whenever you call the genes exactly identical to the reference set (which in itself has obvious bias, but we won't get into that). Given that you have no reference set, it will not give you anything other than statistics for the distribution of intron and exon sizes.
Alternative means of assessing quality when there is no reference genome are AED (computed for each gene as part of the MAKER run), which is basically a variation of EVAL-like statistics run against evidence clusters rather than a reference genome, or you can just use % domain content. See these links for examples of the statistics:

http://www.biomedcentral.com/1471-2105/12/491
http://www.biomedcentral.com/1471-2105/10/67

Also, a figure is attached with an example of quality analysis using combined AED, domain content, and comparative orthologs.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: B563F1FF-1E85-42E3-B79D-F7F6449F1AE9.png
Type: image/png
Size: 227568 bytes
Desc: not available

From carsonhh at gmail.com Tue Apr 16 09:34:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 11:34:44 -0400
Subject: [maker-devel] maker output
In-Reply-To: <1366084495.59030.YahooMailNeo@web164906.mail.bf1.yahoo.com>
Message-ID:

For AED in general, lower is better, but you have to understand the caveats. With mRNA-seq, not all genes may be expressed, not all exons may be captured (mRNA can fold, blocking some sequencing reactions), and sometimes the alignment may extend improperly into the intron or even merge into the neighboring gene. Also, mRNA-seq captures a lot of things that aren't coding genes. But in general for mRNA-seq, as coverage increases the AED values trend toward 0, and mRNA-seq is the single most informative piece of evidence you can get for annotation (I've seen several very poor genome assemblies with horrible annotations that were saved by mRNA-seq).

For mRNA-seq, give MAKER the assembled reads (Trinity works well). Also, for fungi, the UTRs tend to overlap between genes. This can create false merging in the mRNA-seq assemblies (their AED is lower, but it's a false merge). Use the correct_est_fusion option in the control files to help handle that.
I know there are also several members of the MAKER mailing list who have extensive experience using mRNA-seq to annotate fungi who may want to add their two cents.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 15 April, 2013 11:54 PM
To: Carson Holt
Subject: maker output

Hi Carson, have a nice day.

I have a question about the output file from maker. Recently I ran my longest contigs (100kb) on Maker using RNA-seq data from another related species of my genome (same genus), and I've noticed that I managed to get expressed sequence match annotation compared with using just EST and cDNA. Is this due to the different size of the dataset? I'm using a larger dataset when I incorporate the RNA-seq data (assembled transcripts combined with cDNA and EST). The AED values for the predicted mRNA are 0.15 (with RNA-seq data) and 0.06 (without RNA-seq data). My question is which one is the most accurate prediction; can I just depend on the value of AED (the lower the better)? How about the incorporation of RNA-seq data in this case, can I conclude that RNA-seq improves the annotation (based on the image I attached)?

Thanks for your time, really appreciate it.

From dence at genetics.utah.edu Tue Apr 16 09:52:07 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 16 Apr 2013 15:52:07 +0000
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
In-Reply-To:
References:
Message-ID:

Hi Huiquan,

1) The default behavior for Maker is that it will only annotate gene models when there is support from both the evidence (EST and protein alignments) and from the ab-initio predictors. How many transcripts did you get from PASA? I expect there are about 254 sequences, which is about how many genes you annotated. If you want to get more gene models, then you need to supply more evidence.
For our annotation projects, we often use some derivation of Swiss-Prot, which is a hand-curated database of proteins across all kingdoms.

2) The non-overlapping ab-initio file includes ab-initio predictions that didn't overlap any gene models. If augustus and genemark predictions overlap, I think it should include both, but if one prediction completely covers the other, I think the longer of the two would be included.

Does that answer your questions?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of 刘慧泉 [liuhuiquan at nwsuaf.edu.cn]
Sent: Tuesday, April 16, 2013 2:16 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files

Hello maker users and developers,

I'm trying to annotate a small fungal genome using Maker-2.27-beta. For test purposes, I just used augustus and genemark for de novo gene prediction and supplied the PASA assembled transcripts to the est option. When maker2 finished, I used the gff3_merge and fasta_merge scripts to extract the results. There were 5608, 6255, 5084, and 254 sequences in the resulting protein files: augustus_masked, genemark, non-overlapping ab initio, and maker, respectively.

My questions are:

1. By viewing the GFF file produced by maker2, I have found that most of the predicted gene loci have EST matches. So why did maker2 produce only 254 gene annotations?

2. In the 'non-overlapping ab initio' file, I found the sequences are all from the augustus_masked prediction. Does the non-overlapping file only include the best gene models predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes?

Thanks in advance for any advice. I appreciate your help!

best,
Huiquan

the maker_opts.ctl file:

#-----Genome (these are always required)
genome=my_gnm.fa #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est=my_est.fa #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=fungi #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=RepeatPeps.lib #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=my_ges.mod #GeneMark HMM file
augustus_species=my2 #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=14 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=20 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=1500 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=200 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=1 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

From carsonhh at gmail.com Tue Apr 16 10:01:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 12:01:27 -0400
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
In-Reply-To:
Message-ID:

1. By viewing the GFF file produced by maker2, I have found that most of the predicted gene loci have EST matches. So why did maker2 produce only 254 gene annotations?

>> I'd really have to see the results to tell you why.

2. In the 'non-overlapping ab initio' file, I found the sequences are all from the augustus_masked prediction. Does the non-overlapping file only include the best gene models predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes?

>> The "non-overlapping" file should have the one with best consensus if there
>> are 3 or more predictors, and the longest one otherwise. It should be able
>> to have augustus and genemark genes. Try it with only genemark and let me
>> know if the file is empty.

Thanks,
Carson
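
Since this thread turns on which control-file options were actually set (for example the empty protein= line, keep_preds and correct_est_fusion), a small script can make a maker_opts.ctl easier to review before posting it. The following is a minimal sketch that assumes only the "key=value #comment" layout visible in the file above; the function name parse_maker_ctl is made up for illustration and is not part of MAKER.

```python
def parse_maker_ctl(text):
    """Return {option: value} from control-file text in 'key=value #comment' form."""
    options = {}
    for line in text.splitlines():
        # Drop trailing comments; section banners such as '#-----Genome' become empty
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


# A few lines copied from the control file posted in this thread
sample = """#-----Gene Prediction
gmhmm=my_ges.mod #GeneMark HMM file
augustus_species=my2 #Augustus gene prediction species model
protein= #protein sequence file in fasta format
keep_preds=0 #Concordance threshold to add unsupported gene prediction
correct_est_fusion=1 #limits use of ESTs in annotation to avoid fusion genes
"""

opts = parse_maker_ctl(sample)
set_opts = {k: v for k, v in opts.items() if v}      # options with a value
blank_opts = [k for k, v in opts.items() if not v]   # options left empty

print("set:", set_opts)
print("blank:", blank_opts)
```

Run on the full file above, the blank list would include protein and protein_gff, which matches the later advice in this thread to supply protein evidence alongside the ESTs.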

From liuhuiquan at nwsuaf.edu.cn Tue Apr 16 19:49:04 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Wed, 17 Apr 2013 09:49:04 +0800
Subject: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files
Message-ID:

An HTML attachment was scrubbed...

From carsonhh at gmail.com Thu Apr 18 08:23:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Apr 2013 10:23:54 -0400
Subject: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files
In-Reply-To:
Message-ID:

correct_est_fusion is not guaranteed to never merge a gene. If you are giving maker imperfect evidence, there is only so much it can do. Also, you should be using protein evidence in combination with EST evidence, especially when using the correct_est_fusion option, or you are limiting its effectiveness. MAKER does not work as well on ESTs alone, especially for organisms with few introns, as the internal logic relies on the combination of evidence support.

--Carson

From: 刘慧泉
Date: Tuesday, 16 April, 2013 9:49 PM
To:
Subject: Re: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files

Hi Carson and Daniel,

Thank you very much for your quick responses!

After multiple tries, I now know the reason why only a few genes were annotated by maker. This is due to turning on the 'correct_est_fusion' option. I got about 8000 transcripts from the PASA assembly. Because the gene density of my fungus is very high, many of the assembled transcripts merged adjacent genes even though Trinity and PASA were used with the relevant parameters. Maker may not use the merged transcripts as evidence if the 'correct_est_fusion' option is turned on. However, even though the 'correct_est_fusion' option is used, I also found that many genes produced by maker have merged more than one gene. I'm now using the ORFs (trainingSetCandidates.cds) extracted from the transcripts by PASA as the EST evidence supplied to maker.
I found that most of the extracted ORFs accurately match the gene models predicted by augustus and genemark. This can better resolve the 'merged gene' issues for fungi with high gene density.

For the 'non-overlapping' file, if only genemark is used, its predictions can be found in the 'non-overlapping' file. Was the previous issue due to the gene models generated by augustus being better than those from genemark, so that only augustus genes were put into the 'non-overlapping' file? Will genes predicted by only one program not be found in the 'non-overlapping' file? How can I get these genes?

Thank you
Huiquan

From carsonhh at gmail.com Thu Apr 18 08:16:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Apr 2013 10:16:18 -0400
Subject: [maker-devel] some strange examples of maker annotation
In-Reply-To:
Message-ID:

maker seems to prefer to select the snap gene model, but not genemark

> Genemark generally scores lower, and has a very large tendency to overlap
> transposons (it can't handle masked fragments, so has to be run on the
> unmasked genome). Looking through the code base, I now see a section where
> the non-overlapping model is set to always exclude genemark from the
> non-overlapping consensus set if there are masked gene predictors such as snap
> or augustus, and to only accept its models when the evidence supports it.
> I'd need to filter genemark candidates for transposon overlap before I could
> lift this limitation.

Fig 1. the snap gene model of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation.

> The non-overlapping is stranded. These are on different strands. This really
> does happen in eukaryotes, so if the evidence supports it we have to allow
> it, and if you set keep_preds=1 you can get it just because the gene predictor
> supports it, regardless of physical evidence support.

Fig 2. the snap gene model of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation.

> On different strands.

Fig 3. there is gene redundancy even within the maker gene annotation

> They are on opposite strands.

Fig 4. no evidence supports the snap gene model. augustus and genemark have similar results but different from snap. But the snap gene was selected as non_overlapping_ab_initio

> Try using Apollo rather than IGV; it becomes so much more obvious because
> Apollo separates the strands into separate panels.

Thanks,
Carson

From: 刘慧泉
Date: Thursday, 18 April, 2013 9:37 AM
To: Carson Holt
Subject: some strange examples of maker annotation

Hi Carson,

I ran maker on my genome with "keep_preds=1" or "keep_preds=0" respectively. When I manually checked the results of maker in Integrative Genomics Viewer (IGV), I found most of the genes annotated by maker were good. But I also saw some strange examples in the results. I don't know how to interpret these. I hope you can give me some suggestions. Please see the attached file.

Thank you very much.

Best regards,
Huiquan

From liuhuiquan at nwsuaf.edu.cn  Thu Apr 18 07:37:07 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Thu, 18 Apr 2013 21:37:07 +0800
Subject: [maker-devel] some strange examples of maker annotation

A non-text attachment was scrubbed...
Name: examples of maker annotation.docx
Type: application/octet-stream
Size: 1037235 bytes
Desc: not available

From carsonhh at gmail.com  Fri Apr 19 08:55:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 19 Apr 2013 10:55:58 -0400
Subject: [maker-devel] FW: some strange examples of maker annotation

Just forwarding this to the devel list, so it is archived.

--Carson
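The strand rule from Carson's reply in this thread (two gene models only count as overlapping when they sit on the same strand) can be sketched in a few lines of Python. This is an illustration only, not MAKER's actual code; the `Feature` type and both function names are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A genomic interval with 1-based inclusive coordinates, as in GFF3."""
    seqid: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps_unstranded(a: Feature, b: Feature) -> bool:
    """True if a and b share at least one base, ignoring strand."""
    if a.seqid != b.seqid:
        return False
    return a.start <= b.end and b.start <= a.end


def overlaps_stranded(a: Feature, b: Feature) -> bool:
    """True only if a and b share a base AND lie on the same strand."""
    return a.strand == b.strand and overlaps_unstranded(a, b)
```

Under this rule a snap model on the minus strand and a maker gene on the plus strand can cover the same bases without being treated as redundant, which is why they look overlapping in a viewer that draws both strands in one track.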
From Bob_Freeman at hms.harvard.edu  Mon Apr 22 08:09:34 2013
From: Bob_Freeman at hms.harvard.edu (Freeman, Robert M.)
Date: Mon, 22 Apr 2013 10:09:34 -0400
Subject: [maker-devel] Repeatmasker error?
Message-ID: <7EAEB66D-346C-4E9A-B487-B7D5BB352328@hms.harvard.edu>

Greetings,

Am using MAKER 2.27b to annotate a ciliate genome and am finding that my log files are growing to GB sizes. When looking more carefully, an error seems to be occurring around the Repeatmasker stage:

....
Now starting the contig!!
--
setting up GFF3 output and fasta chunks
doing repeat masking
doing blastx repeats
doing blastx of proteins
doing blastx of proteins
doing blastx of proteins
doing blastx repeats
collecting blastx repeatmasking
processing all repeats
ERROR: Can't open seq file: /files/.retain-snapshots.d14d-w60d/SysBio/klab_genome/maker/stentor/run_current_r3/soapPrice1.cycle7.maker.output/soapPrice1.cycle7_datastore/03/EF/contig_157//theVoid.contig_157/query.masked.gff.seq
No such file or directory
 at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Dumper/GFF/GFFV3.pm line 182
 Dumper::GFF::GFFV3::finalize('Dumper::GFF::GFFV3=HASH(0x50547f8)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Process/MpiChunk.pm line 691
 Process::MpiChunk::__ANON__() called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 415
 eval {...} called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 407
 Error::subs::try('CODE(0x5b859c0)', 'HASH(0x11ea63f0)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib
...

I don't seem to have this problem when I fall back to the 2.25b version (though I start having major DBD:SQLite issues).

I'm doing this on a cluster, running this under MPI with 50 cores.

Any help/suggestions would be appreciated!

-Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA 02115
617/432.2294, vox

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace

From carsonhh at gmail.com  Mon Apr 22 14:25:06 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 22 Apr 2013 16:25:06 -0400
Subject: [maker-devel] Repeatmasker error?

Just forwarding this e-mail chain to the devel list for archiving.

--Carson

From: "Freeman, Robert M."
Date: Monday, 22 April, 2013 4:16 PM
To: Carson Holt
Subject: Re: [maker-devel] Repeatmasker error?

Already looks better ... been checking stderr and it looks error-free so far (knock on wood).

Thanks for the help, and sorry for the bother!

Oh, should I fall back to the 2.27beta release that you announced on the list??

-b

On Apr 22, 2013, at 4:09 PM, Carson Holt wrote:

> Let me know if you still get problems. Redirecting TMP back locally will also
> give a big performance boost.
>
> Thanks,
> Carson
>
> From: "Freeman, Robert M."
> Date: Monday, 22 April, 2013 4:01 PM
> To: Carson Holt
> Subject: Re: [maker-devel] Repeatmasker error?
>
> (chuckle) wow, always something new to learn -- been working with IT systems
> for > 20 years, and HPC > 8, and no one has ever explained this to me.
>
> Have directed TMP to /scratch, which also turns out to be an Isilon-related
> mount. Will re-direct all this to /tmp to see if this eliminates the problems.
>
> -b
>
> On Apr 22, 2013, at 3:43 PM, Carson Holt wrote:
>
>> The missing file is part of the GFF3 output, the fasta sequence to be
>> specific. Sometimes NFS (network mounted file systems) can return
>> status 'success' even though the IO event really has not succeeded yet (this
>> is called asynchronous IO). The result is a certain speed gain but it also
>> means that you can write a file, then immediately try and open it, and the
>> system will say that it doesn't exist. On some systems you get weird files
>> starting with the name '.nfs000' when these types of errors occur. NFS type
>> errors are more common when you use many cpus or other jobs on the cluster
>> (not just maker) are using a large amount of IO. To avoid this, MAKER tries
>> to do as much work as possible in the directory specified by TMP in the
>> control files. By default this is /tmp, and if you set it to something else,
>> make sure that the location is locally mounted and not NFS mounted (otherwise
>> it can't perform its purpose of bypassing NFS for certain quick read/write
>> operations). The newest version of MAKER offloads exonerate and even most
>> gene prediction operations into TMP in addition to other steps that were
>> already offloaded there in other versions of the pipeline, and I've been able
>> to scale up to > 1500 cpus.
>>
>> Thanks,
>> Carson
>>
>> From: "Freeman, Robert M."
>> Date: Monday, 22 April, 2013 3:24 PM
>> To: Carson Holt
>> Subject: Re: [maker-devel] Repeatmasker error?
>>
>> Thanks, Carson. I'll give this a try.
>>
>> Randomly? Not sure ... I'll have to go back thru the logs to see if this is
>> happening consistently or not. Right now, this log is close to 1 GB in size.
>> When I saw it getting this large, I stopped the run as I knew errors were
>> getting spewed into the log file.
>>
>> Thought it might be filesystem as well, but unlikely -- the location for the
>> MAKER runs is on our Isilon, and these problems appear only with MAKER.
>>
>> Other files seem to be present...
>>
>>> % ls -alt
>>> drwxrwx--- 3 rmf1 SYSTEMBIO_klab_genome 236 Apr 21 14:50 ..
>>> drwxrwx--- 2 rmf1 SYSTEMBIO_klab_genome 48225 Apr 21 12:38 .
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 0 Apr 21 12:34 run.log.child.0
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 1055922 Apr 21 12:34 contig_157.0.final.section
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 1055922 Apr 21 12:34 contig_157.0.raw.section
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 388512 Apr 21 12:34 evidence_0.gff
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7269 Apr 21 12:34 contig_157.102049-103030.gi%7C145478069%7Cref%7CXP_001425057%2E1%7C.p_exonerate
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 4561 Apr 21 12:34 contig_157.101950-103090.gi%7C145514179%7Cref%7CXP_001443000%2E1%7C.p_exonerate
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7088 Apr 21 12:34 contig_157.101956-103435.gi%7C145505343%7Cref%7CXP_001438638%2E1%7C.p_exonerate
>>> ....
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7184469 Apr 21 12:33 contig_157.0.sequences_r5%2Efasta.blastx
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 0 Apr 21 12:25 query.masked.gff.def
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 9885 Apr 21 12:25 query.masked.gff.ann
>>> -rw-r--r-- 1 rmf1 SYSTEMBIO_klab_genome 49152 Apr 21 12:25 query.masked.fasta.index
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 106522 Apr 21 12:25 query.masked.fasta
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 9991 Apr 21 12:25 query.masked.gff
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 28002 Apr 21 12:25 contig_157.0.te_proteins%2Efasta.repeatrunner
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 3589 Apr 21 12:24 contig_157.0.all.rb.out
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 106522 Apr 21 12:23 query.fasta
>>
>> It's just that the one output file MAKER is looking for isn't there.
>>
>> I guess the other question I should ask: as there are exonerate sequences
>> there, does it appear that the pipeline is running OK, and just ignore these
>> errors (somehow)?
>>
>> -b
>>
>> On Apr 22, 2013, at 12:39 PM, Carson Holt wrote:
>>
>>> Could you give the devel version a try to see if it experiences the same
>>> failure, as it's easier to debug off of the most current code.
>>>
>>> Type this on the command line to download-->
>>> **************************
>>>
>>> user: *******
>>> password: *******
>>>
>>> The error appears to be filesystem related though. Does it appear to happen
>>> randomly?
>>>
>>> Thanks,
>>> Carson

From ejr at stowers.org  Mon Apr 29 09:58:09 2013
From: ejr at stowers.org (Ross, Eric)
Date: Mon, 29 Apr 2013 15:58:09 +0000
Subject: [maker-devel] repeat statistics

Does anyone have a good tool for yanking repeat statistics out of MAKER gff files?

SOBA can give some basic stats, but it doesn't play well with my giant files and I haven't figured out a way to run it locally.

For that matter does anyone have a script that will calculate SOBA like stats locally? I'd rather avoid writing one myself if something else is out there.

Thanks,

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From barry.moore at genetics.utah.edu  Mon Apr 29 11:59:14 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 29 Apr 2013 11:59:14 -0600
Subject: [maker-devel] repeat statistics

Hi Eric,

There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details:

http://www.sequenceontology.org/resources/sobacl.html

Ultimately you'll get it like this:

svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA

Then run:

SOBA/bin/SOBAcl --help

For a lot of command line examples have a look in:

SOBA/t/sobacl_test.sh

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
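For readers who only need rough repeat numbers and cannot run SOBA, a small streaming script may be enough. The sketch below is not SOBA and not part of MAKER. It assumes that repeat evidence in the GFF3 appears as `match` features whose source column is `repeatmasker` or `repeatrunner`; check a few lines of your own file and adjust the `sources` and `feature_type` parameters if they differ. The function name `repeat_stats` is made up for this example.

```python
from collections import defaultdict


def repeat_stats(gff_lines, sources=("repeatmasker", "repeatrunner"),
                 feature_type="match"):
    """Count repeat features per source and sum the bases they cover.

    gff_lines is any iterable of GFF3 lines (an open file works), so the
    file is read one line at a time. Overlapping or adjacent repeats on
    the same sequence are merged before counting covered bases.
    """
    counts = defaultdict(int)
    intervals = defaultdict(list)  # seqid -> list of (start, end)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the embedded sequence section has no features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        if cols[1] not in sources or cols[2] != feature_type:
            continue
        counts[cols[1]] += 1
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))

    covered = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:  # overlaps or touches the current run
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
    return dict(counts), covered
```

Typical use would be `with open("genome.all.gff") as fh: counts, covered = repeat_stats(fh)`, where the file name is a placeholder. Only the repeat intervals are held in memory, so this should cope with large files better than loading the whole GFF3.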

From jason.stajich at gmail.com  Mon Apr 29 16:49:12 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 29 Apr 2013 15:49:12 -0700
Subject: [maker-devel] repeat statistics

Barry - I think you mean topaz instead of malachite?

svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA

Jason Stajich
jason at bioperl.org
jason.stajich at gmail.com
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

From barry.moore at genetics.utah.edu  Tue Apr 30 00:14:44 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 30 Apr 2013 00:14:44 -0600
Subject: [maker-devel] repeat statistics

Correct. And web page is now updated as well.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com  Mon Apr 1 08:50:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 10:50:58 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364760124.37890.YahooMailNeo@web164901.mail.bf1.yahoo.com>

This appears to be a permissions issue either for the /u1/local/bin/ directory or RepeatMasker setup. Did you set maker up yourself as your own user or did someone else do it for you, perhaps as root? Is /u1/local/bin/ on an NFS mount? If it's a mounting issue, I found this via google for the exact same issue-->

>> I needed to add the 'exec' option to the /etc/fstab file when mounting that
>> partition.
>> If it says 'defaults' on the line in /etc/fstab, then it also means you don't
>> have exec rights on it.

Are you using the same perl to run maker as you are using for RepeatMasker?
For example, are you calling perl directly and giving the path to maker or are you calling maker directly and letting it use the version of perl it was installed with? Try this to see which perl maker was installed with -->

cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!"

You may have to reinstall RepeatMasker and possibly maker.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Sunday, 31 March, 2013 4:02 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Help on error-Repeat masker

Hello,

I have some problem when running maker. I've got this kind of error, what could possibly go wrong here? Thanks so much

setting up GFF3 output and fasta chunks
doing repeat masking
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_WOVHsi; /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172/contig172.0.simple.rb -dir /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172 -pa 1 -lib /tmp/maker_WOVHsi/b1piBcWHlH
#-------------------------------#
sh: /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker: /u1/local/bin/perl: bad interpreter: Permission denied
ERROR: RepeatMasker failed
--> rank=NA, hostname=Homis
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig172

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:172

examining contents of the fasta file and run log
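The check Carson suggests (look at the `#!` line of the script and see which perl it names) can also be scripted. The sketch below is an illustration, not part of MAKER or RepeatMasker, and the function names are hypothetical. It reads the first line of a script and reports whether the named interpreter exists and is executable by the current user. It will not catch every cause of "bad interpreter: Permission denied", so treat a clean result as a hint rather than proof.

```python
import os


def shebang_interpreter(script_path):
    """Return the interpreter path on a script's '#!' line, or None."""
    with open(script_path, "rb") as handle:
        first = handle.readline().decode("utf-8", errors="replace").strip()
    if not first.startswith("#!"):
        return None
    parts = first[2:].split()
    return parts[0] if parts else None


def interpreter_problem(script_path):
    """Describe why the script's interpreter cannot be run, or return None."""
    interpreter = shebang_interpreter(script_path)
    if interpreter is None:
        return "no #! line"
    if not os.path.exists(interpreter):
        return f"{interpreter}: does not exist"
    if not os.access(interpreter, os.X_OK):
        return f"{interpreter}: not executable by this user"
    return None
```

Running `interpreter_problem()` on both `bin/maker` and `exe/RepeatMasker/RepeatMasker` from the installation in this thread would show whether the two scripts point at the same perl.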

From dence at genetics.utah.edu  Mon Apr 1 10:27:23 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 1 Apr 2013 16:27:23 +0000
Subject: [maker-devel] Why are some start positions minus in the gff result?

Hi,

I seem to remember some discussion of the possibility of negative coordinates in a gff3 file when the genomic material is circular. Since you're annotating viral genomes, could this be what's happening here? Like Carson said, I've never seen this before, but that's just an idea I had.

Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Hung-Wei Hsu [ares711122 at gmail.com]
Sent: Monday, March 25, 2013 8:50 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] Why are some start positions minus in the gff result?

Hi MAKER developers,

I could successfully run MAKER and get the final gff. But I found some start positions in the gff were minus. That led to an error in the gff reader. Is this a bug? Could you please help to resolve this problem?

Thanks a lot in advance.

Best regards,
Hung-Wei

From carsonhh at gmail.com  Mon Apr 1 10:38:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 12:38:18 -0400
Subject: [maker-devel] Why are some start positions minus in the gff result?

I'm thinking the same thing. Reviewing how I parse GeneMark's output, I just use their start and end coordinates (no changes). Over the weekend I altered the GeneMark parser to walk the gene start away from the supposed origin so as not to let this happen. In your E. coli test case, since you have multiple contigs for what is likely a single circular genome, this would be the correct behavior as you don't want to treat each contig as an independent circular chromosome. I should probably add an is_circular option to the control files so users can select for this.

I've updated the maker subversion repository so you can do an 'svn update' (I believe you are using the devel version of MAKER, correct?)

Thanks,
Carson
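To find out whether a GFF3 file contains the kind of coordinates reported in this thread, the file can be scanned before it is handed to a stricter reader. The GFF3 specification requires start and end to be positive 1-based integers with start less than or equal to end; for features that cross the origin of a circular sequence it extends the end coordinate past the sequence length instead of using negative values. The sketch below only applies those two rules. It is not MAKER's fix, and `bad_coordinate_lines` is a hypothetical name.

```python
def bad_coordinate_lines(gff_lines):
    """Yield (line_number, reason) for features with invalid coordinates."""
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # nothing but sequence after this directive
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            yield number, "non-integer coordinate"
            continue
        if start < 1:
            yield number, f"start {start} is less than 1"
        elif start > end:
            yield number, f"start {start} is greater than end {end}"
```

Passing an open file handle, for example `list(bad_coordinate_lines(open("contig.gff")))` with a placeholder file name, returns the offending line numbers so they can be inspected by hand.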

From carsonhh at gmail.com  Mon Apr 1 13:59:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 15:59:18 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364846015.96057.YahooMailNeo@web164901.mail.bf1.yahoo.com>

What kind of system (OS) are you running on? 'perl.exe' seems odd. It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line?

I think you need to reinstall RepeatMasker at a minimum. To do that -->

> cd /home/maker-2.27-beta/maker/src
> ./Build repeatmasker

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 3:53 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Thanks for the reply

1. Yes, I set up maker myself as my own user, but I don't know how to check the mounting things
2. I'm calling maker directly and I've tried this

cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!"

and it gave me this

#!/usr/bin/perl.exe
URL: From carsonhh at gmail.com Mon Apr 1 14:29:40 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 01 Apr 2013 16:29:40 -0400 Subject: [maker-devel] Help on error-Repeat masker In-Reply-To: <1364847674.21064.YahooMailNeo@web164901.mail.bf1.yahoo.com> Message-ID: I found it odd because perl.exe is a windows extension not used in Linux, but it confirmed my suspicions. You can't use maker with cygwin. There are several things that will break because it's not really Linux. You can use Virtual Box instead to install a virtual Linux machine --> https://www.virtualbox.org/. Alternatively you can try and dual boot your system with a Linux partition. Virtual Box will allow you to run maker on small datasets, depending on the size of the genome you want to run maker with it may be fine. But I would not recommend running anything over 10 megabases (it won't fail, it will just take a very long time). Thanks, Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 4:21 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker 1. owh its odd?im using windows8 but for maker im using cygwin 2. when i type which perl i get this /usr/bin/perl 3. when i type ./Build repeatmasker i got this cygwin warning: MS-DOS style path detected: \Users\Dora Preferred POSIX equivalent is: /cygdrive/c/Users/Dora CYGWIN environment variable option "nodosfilewarning" turns off this warning. Consult the user's guide for more details about POSIX paths: http://cygwin.com/cygwin-ug-net/using.html#using-pathnames WARNING: RepeatMasker was already found on this system. Do you still want MAKER to install RepeatMasker for you? is there any prob with this, or can i just proceed with the installation? From: Carson Holt To: Hud Hud Cc: "maker-devel at yandell-lab.org" Sent: Tuesday, April 2, 2013 3:59 AM Subject: Re: [maker-devel] Help on error-Repeat masker What kind of system (OS) are you running on? 'perl.exe' seems odd. 
It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line? I think you need to reinstall RepeatMasker at a minimum. To do that --> > cd /home/maker-2.27-beta/maker/src > ./Build repeatmasker --Carson From: Hud Hud Reply-To: Hud Hud Date: Monday, 1 April, 2013 3:53 PM To: Carson Holt Subject: Re: [maker-devel] Help on error-Repeat masker Thanks for the reply 1. Yes i set up the maker myself as own user but i dont know how to check for the mounting things 2. Im calling maker directly and i've tried this cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" and it gaves me this #!/usr/bin/perl.exe From: Carson Holt To: Hud Hud ; "maker-devel at yandell-lab.org" Sent: Monday, April 1, 2013 10:50 PM Subject: Re: [maker-devel] Help on error-Repeat masker This appears to be a permissions issue either for the /u1/local/bin/ directory or RepeatMasker setup. Did you set maker up yourself as your own user or did someone else do it for you, perhaps as root? Is /u1/local/bin/ on an NFS mount. If it's a mounting issue I found this via google the exact same issue--> >> I needed to add the 'exec' option to the /etc/fstab file when mounting that >> partition. >> If it says 'defaults' on the line in /etc/fstab, then it also means you don't >> have exec rights on it. Are you using the same perl to run maker as you are using for RepeatMasker? For example, are you calling perl directly and giving the path to maker or are you calling maker directly and letting it use the version of perl it was installed with. Try this to see which perl maker was installed with --> cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!" You may have to have to reinstall RepeatMasker and possibly maker. 
Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Sunday, 31 March, 2013 4:02 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Help on error-Repeat masker

Hello, I have a problem when running maker. I got this error; what could be going wrong here? Thanks so much.

setting up GFF3 output and fasta chunks
doing repeat masking
running repeat masker.
#--------- command -------------#
Widget::RepeatMasker:
cd /tmp/maker_WOVHsi; /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172/contig172.0.simple.rb -dir /home/maker-2.27-beta/maker/data/contig.maker.output/contig_datastore/61/0D/contig172//theVoid.contig172 -pa 1 -lib /tmp/maker_WOVHsi/b1piBcWHlH
#-------------------------------#
sh: /home/maker-2.27-beta/maker/exe/RepeatMasker/RepeatMasker: /u1/local/bin/perl: bad interpreter: Permission denied
ERROR: RepeatMasker failed
--> rank=NA, hostname=Homis
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:contig172

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:172

examining contents of the fasta file and run log

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Apr 1 15:47:38 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 17:47:38 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To:
Message-ID:

That's not too bad. It's best to choose a few large contigs (1-2Mb total) to run with at first and then use those results to help configure the rest of the run. For the final run you may want to consider splitting onto multiple machines if your machine has limited cpu power.
It will take you ~150 hours on 1 cpu core depending on the size of alignment datasets - ESTs and proteins. More cpu cores will allow it to run faster (see graph below from the MAKER2 paper). I imagine that your machine probably has at least 4 cpu cores. Most bioinformatics labs have multi cpu Linux boxes (I.e. 24-32 cpu cores), some have clusters available to them (100's to 1000's of cpu cores), and a few just launch maker on multiple lab desktop machines all writing to the same network mounted output directory.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:48 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Its about 50mb

From: Carson Holt
To: Hud Hud
Sent: Tuesday, April 2, 2013 4:44 AM
Subject: Re: [maker-devel] Help on error-Repeat masker

How big is the genome?

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:37 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

owh thanks so much,now i know whats going wrong, its the cygwin... i'll try dual boot then as my genome over 10 mb..thanks for your time

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 08EB6777-DA72-45CA-8E05-07928457B9BE.png
Type: image/png
Size: 61806 bytes
Desc: not available
URL:

From mnuhn at ebi.ac.uk Tue Apr 2 07:09:18 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Tue, 02 Apr 2013 14:09:18 +0100
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
Message-ID: <515AD87E.1010800@ebi.ac.uk>

Hello Carson!

(Mpi) Maker (2.27) is failing when it runs blast searches.

It prints out the command it is trying to run. When I try to run this command manually on the command line, blast terminates with an error, because it either can't find the input file or it can't find a file ending in .pin, which I think is a protein index file it expects to be there.

I've looked at a few contigs on which maker fails and they were all rather short contigs.
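The missing .pin file mentioned above can be checked for directly before rerunning the printed command by hand. A formatted protein BLAST database normally consists of three files sharing one prefix (.phr, .pin and .psq), so a small test like the following reports which of them are absent or empty. This is only a sketch: `missing_blast_index` is a made-up helper name, and it does not handle multi-volume databases that use a .pal alias file.

```shell
#!/bin/sh
# Print the index extensions (phr, pin, psq) that are missing or empty for a
# protein BLAST database prefix. Prints an empty line when all are present.
# Hypothetical helper, not part of MAKER or BLAST.
missing_blast_index() {
    prefix=$1
    missing=""
    for ext in phr pin psq; do
        [ -s "$prefix.$ext" ] || missing="$missing $ext"
    done
    # Drop the leading space left by the concatenation above.
    echo "${missing# }"
}

# Example: pass the value given to blastall's -d option.
# missing_blast_index /path/to/te_proteins%2Efasta.mpi.10.0
```

Running this from several cluster nodes at once may help show whether the index files are simply not yet visible on some of them.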
Maker works fine if I

- run it without mpi or
- run it with mpi, but with a maximum of 4 processors.

(Mpi) Maker used to run fine with 128 processors before this.

The contigs are sorted descending by size in the genome file. I think maker has processed the large ones and the problems it is having now might have something to do with it running on smaller contigs.

From looking at the error messages I thought at first the index file of the genome might be corrupted, so I deleted it and let maker rebuild it. This didn't fix the issue though. I have also set the path for temporary files manually to make sure maker is not running out of temporary space.

Any idea how to overcome this?

Cheers,
Michael.

P.S.: A typical error message I'm getting is this:

--Next Contig--

[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
running blast search.
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSalAtl2s8087.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LSalAtl2s8087//theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:LSalAtl2s8083

doing blastx repeats
setting up GFF3 output and fasta chunks
doing blastx repeats
re reading repeat masker report.
/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/LSalAtl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
running blast search.
running blast search.
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSalAtl2s8135.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/LSalAtl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSalAtl2s8119.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/LSalAtl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: LSalAtl2s8449
Length: 2187
Tries: 18!!
#---------------------------------------------------------------------

From carsonhh at gmail.com Tue Apr 2 07:15:28 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 09:15:28 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364865389.66083.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

The best evidence is from mRNAseq or ESTs of the same species. ESTs and mRNAseq from related species can be used, but if protein annotations are available use those instead. This is because cross species nucleotide alignments must be translated in all 6 reading frames (3 for the query and 3 for the subject), which would basically make run times increase by 6 fold. You can try giving the cross species alignments to maker as if they were from the same species instead (est= option); fewer will align, but run times will not be overwhelming. Then provide the protein annotations from the related species combined with uniprot (maker can take comma separated lists for the input files).

You can use either the program CEGMA from Ian Korf's lab or alternatively maker's protein2genome option to build an initial annotation set to use for training. Then train SNAP, Augustus, and GeneMark (GeneMark self trains). For the last run let all 3 predictors run together with protein2genome now turned off. Given that the genome is only 50Mb and you have a lack of alignment evidence, you can probably safely set keep_preds=1 on the second run, as the false positive rate is usually quite low for gene dense organisms and you won't get many annotations from maker otherwise without more evidence alignments. Perform your first and second runs in the same location, so maker can automatically reuse the same alignments (the second run is always very fast this way as maker won't have to rerun blast and exonerate).
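The two-pass procedure described above mostly comes down to changing a handful of key=value lines in the control file between runs. The sketch below shows one way to script that, assuming the control file uses `key=value #comment` lines. `set_opt` is a made-up helper, the FASTA file names are placeholders, and the `protein` key is an assumption, since only `est=`, `protein2genome` and `keep_preds` are named in this thread.

```shell
#!/bin/sh
# set_opt FILE KEY VALUE
# Rewrite the line starting with "KEY=" in a control file, keeping any
# trailing "#comment". Hypothetical helper, not shipped with MAKER.
set_opt() {
    file=$1
    key=$2
    value=$3
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 {
            c = index($0, "#")
            comment = (c > 0) ? (" " substr($0, c)) : ""
            print k "=" v comment
            next
        }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# First pass: evidence only, to build a training set (placeholder file names).
# set_opt maker_opts.ctl est related_species_ests.fasta
# set_opt maker_opts.ctl protein related_species_proteins.fasta,uniprot.fasta
# set_opt maker_opts.ctl protein2genome 1

# Second pass, run in the same directory so alignments are reused:
# set_opt maker_opts.ctl protein2genome 0
# set_opt maker_opts.ctl keep_preds 1
```

Matching on the full `KEY=` prefix keeps `est` from also rewriting a line such as `est2genome=0`. Pointing the second pass at the trained SNAP, Augustus and GeneMark parameters is a separate step not shown here.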
If your organism is a fungi (I'm just guessing because of the small genome size) you can also use this gene prediction parameter resource from Jason Stajich --> https://github.com/hyphaltip/fungi-gene-prediction-params

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 9:16 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Thanks, i now have better insight regard to the cpu cores.. i have other questions...i dont have other info or evidences of my own genome, i only have assembled contigs....recently JGI sequenced a species that closely related to my genome (at genus level), and i have access to the data (protein, est, rna-seq reads,transcript, gene models,gff)

1.I have run maker (MWAS) using diff set of evidences, such as protein and est(JGI) and est(JGI) and uniprot database ..but both run produced diferent no of predicted genes....so my question, what is the best evidences to be used to support my annotation..is it more preferred to use larger dataset such as uniprot rather than using the data from JGI (even it closely related)

2. can i use rna-seq data (from JGI) to be used in maker...ive denovo assembled the rnaseq using clc genomics.

Thanks

From carsonhh at gmail.com Tue Apr 2 07:57:08 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 09:57:08 -0400
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <515AD87E.1010800@ebi.ac.uk>
Message-ID:

Could you set the TMP= option to a non-NFS mounted location (the default /tmp should work) and let me know if it still fails? You can also try completely deleting the LSalAtl2s.maker.output/mpi_blastdb directory before restarting.

Thanks,
Carson

From mnuhn at ebi.ac.uk Tue Apr 2 08:38:31 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Tue, 02 Apr 2013 15:38:31 +0100
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <00E9A24F-728F-496D-A30C-6EA83676FF64@sanger.ac.uk>
References: <515AD87E.1010800@ebi.ac.uk> <00E9A24F-728F-496D-A30C-6EA83676FF64@sanger.ac.uk>
Message-ID: <515AED67.5060906@ebi.ac.uk>

On 04/02/2013 02:01 PM, Eleanor Stanley wrote:
> what version of Blast are you using?
> I was getting similar errors with NCBI BLAST+ 2.2.23 that were resolved using BLAST+ 2.2.27 instead

I was using blast version 2.2.14. I've now swapped it out for ncbi+ 2.2.9.

I am running it on one mpi instance with 128 processors and it seems to be working now.

Thanks!
Michael.

> Ele

From carsonhh at gmail.com Tue Apr 2 08:16:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 10:16:44 -0400
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
In-Reply-To: <515AED67.5060906@ebi.ac.uk>
Message-ID:

Good to know.

Thanks,
Carson

On 13-04-02 10:38 AM, "Michael Nuhn" wrote: > >On 04/02/2013 02:01 PM, Eleanor Stanley wrote: >> what version of Blast are you using? >> I was getting similar errors with NCBI BLAST+ 2.2.23 that were resolved >>using BLAST+ 2.2.27 instead > >I was using blast version 2.2.14. I've now swapped it out for ncbi+ 2.2.9. > >I am running it on one mpi instance with 128 processors and it seems to >be working now. > >Thanks! >Michael. > >> Ele >> >> >> On 2 Apr 2013, at 14:09, Michael Nuhn wrote: >> >>> Hello Carson! >>> >>> (Mpi) Maker (2.27) is failing when it runs blast searches. >>> >>> It prints out the command it is trying to run. When I try to run this >>>command manually on the command line, blast terminates with an error, >>>because it either can't find the input file or it can't find a file >>>ending in .pin, which I think is a protein index file it expects to be >>>there. >>> >>> I've looked at a few contigs on which maker fails and they were all >>>rather short contigs. >>> >>> Maker works fine, if I >>> >>> - run it without mpi or >>> - run it with mpi, but a maximum of 4 processors. >>> >>> (Mpi) Maker used to run fine with 128 processors before this. >>> >>> The contigs are sorted descending by size in the genome file. I think >>>maker has processed the large ones and the problems it is having now >>>might have something to do with it running on smaller contigs.
>>> >>> From looking at the error messages I thought at first the index file >>>of the genome might be corrupted, so I deleted it and let maker rebuild >>>it. This didn't fix the issue though. I have also set the path for >>>temporary files manually to make sure maker is not running out of >>>temporary space. >>> >>> Any idea how to overcome this?. >>> >>> Cheers, >>> Michael. >>> >>> P.S.: A typical error message I'm getting is this: >>> >>> --Next Contig-- >>> >>> [blastall] FATAL ERROR: search cannot proceed due to errors in all >>>contexts/frames of query sequences >>> running blast search. >>> #--------- command -------------# >>> Widget::blastx: >>> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins >>>%2Efasta.mpi.10.0 >>> -i >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSal >>>Atl2s8087.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T >>>-I T -o /n >>> >>>fs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/make >>>r_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LS >>>alAtl2s8087// >>> >>>theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.t >>>emp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >>> #-------------------------------# >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:LSalAtl2s8083 >>> >>> doing blastx repeats >>> setting up GFF3 output and fasta chunks >>> doing blastx repeats >>> re reading repeat masker report. 
>>> >>>/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/ma >>>ker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/ >>>LSalAtl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out >>> [blastall] FATAL ERROR: search cannot proceed due to errors in all >>>contexts/frames of query sequences >>> [blastall] FATAL ERROR: search cannot proceed due to errors in all >>>contexts/frames of query sequences >>> running blast search. >>> running blast search. >>> #--------- command -------------# >>> Widget::blastx: >>> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins >>>%2Efasta.mpi.10.0 -i >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSal >>>Atl2s8135.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T >>>-I T -o >>>/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/ma >>>ker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/ >>>LSalAtl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta >>>.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >>> #-------------------------------# >>> #--------- command -------------# >>> Widget::blastx: >>> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins >>>%2Efasta.mpi.10.0 -i >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSal >>>Atl2s8119.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T >>>-I T -o >>>/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/ma >>>ker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/ >>>LSalAtl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta >>>.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >>> #-------------------------------# >>> [blastall] FATAL ERROR: search cannot proceed due to 
errors in all >>>contexts/frames of query sequences >>> #--------------------------------------------------------------------- >>> Now retrying the contig!! >>> SeqID: LSalAtl2s8449 >>> Length: 2187 >>> Tries: 18!! >>> #--------------------------------------------------------------------- >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Carson.Holt at oicr.on.ca Thu Apr 4 11:29:24 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Thu, 4 Apr 2013 17:29:24 +0000
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
Message-ID:

Since you are using 12-core nodes (hyperthreaded cores are virtual; you still only have 12 cores of power, not 24) and your performance curve drops off at 12, I'm thinking there is a possibility that the other processes did not start on a separate node. Try launching the Linux command 'hostname' the same way you are launching maker. If all 24 lines of output from hostname show the same host, then maker is only getting launched on a single node. Since there are really only 12 cores (not 24), you would not see any significant performance improvement above 12, because each process above 12 reduces the power allocated to the remaining processes. So the difference from 12 to 24 (~25% performance gain) is just what can be gained from process saturation (not all maker processes are always at 100% CPU usage because of calls to IO, so adding a few more processes than you have CPU cores sometimes runs a little faster).
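
The hostname check described above can be tallied with a few lines of Python. This is only a minimal sketch, assuming the output of the launched 'hostname' command has been captured as text with one host name per line; the function name count_ranks_per_host and the node names in the sample are hypothetical and not part of maker or any MPI launcher.

```python
from collections import Counter


def count_ranks_per_host(hostname_output: str) -> dict:
    """Tally how many launched processes reported each host name."""
    hosts = [line.strip() for line in hostname_output.splitlines() if line.strip()]
    return dict(Counter(hosts))


# Hypothetical captured output from a 4-process launch of `hostname`.
sample = "node01\nnode01\nnode02\nnode02\n"
counts = count_ranks_per_host(sample)
print(counts)
if len(counts) == 1:
    # Every process reported the same host, so nothing ran on a second node.
    print("all processes ran on a single node")
```

If the tally shows a single host for a job that was meant to span two nodes, the launch itself (rather than maker) is the first thing to look at.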
Thanks, Carson From: Ram?n Fallon > Date: Thursday, 4 April, 2013 1:03 PM To: "maker-devel at yandell-lab.org" > Subject: second maker2 benchmark, this time, on a cluster Hi I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes, this time I aimed for 48. In contrast to the last 12 core speed check, the target hardware was a computer cluster, with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence in one file with different names). The nodes in the cluster are (again) HP Proliant SL390 with two Intel X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running Centos 6.2 with (as before) 2.6.32 linux kernel. A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via infiniband quadspeed, and because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed. Results were: #processes time(secs) Megabases/hr 1 6585.00 2.20 2 7137.00 2.03 4 2479.00 5.84 8 1088.00 13.30 10 866.00 16.71 12 715.00 20.24 14 666.00 21.72 16 651.00 22.22 18 613.00 23.60 24 559.00 25.88 Graph is attached to this mail. Some notes: * A free queue on the gridengine were used so there was no load on these nodes when run. Two nodes are available on this queue, giving a max of 48 simualtaneous processes. * Some processor number (6,20, etc) were deleted because I couldn't guarantee "No load" conditions during those runs, and I had one or two anomalies so I'd rather not include them right now. However, I expect them to be in line with the other results. * In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24. 
Because this is also the max number of processes per node, it's possible that interconnects between the nodes had something to do with runs > 24 processes being inconsistent, however, it's not usually an issue in other programs because quadspeed (40Gbit/s) is already a fairly fast interconnect). * Process runs 26,28, and 30 would almost - but not quite - finish (just a few sequences unfinished), But after this number, the analysis would hardly get off the ground, seeming to get stuck at Repeatmasker phase. I suppose this is our main concern at the moment, that we can't speed up beyond 24 processes. Cheers / Ram?n. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Carson.Holt at oicr.on.ca Thu Apr 4 11:40:48 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Thu, 4 Apr 2013 17:40:48 +0000 Subject: [maker-devel] second maker2 benchmark, this time, on a cluster In-Reply-To: Message-ID: One more thought. If 26,28, and 30 process jobs are failing this could also be because they are not starting across nodes correctly (all end up on the same node). You would then start to run into memory problems and the job would freeze. So validating the proper cross node launch of MPI using the 'hostname' command is still probably the first thing to do. --Carson From: Carson Holt > Date: Thursday, 4 April, 2013 1:29 PM To: Ram?n Fallon >, "maker-devel at yandell-lab.org" > Subject: Re: second maker2 benchmark, this time, on a cluster Since you are using 12 core nodes (hyperthreaded cores are virtual ? you still only have 12 cores of power not 24) and your performance curve drops off at 12, I'm thinking there is a possibility that the other processes did not start on a separate node. Try launching the Linux command 'hostname' the same way you are launching maker. If all 24 lines of output from hostname have the same host, then maker is only getting launched on a single node. 
Then since there are really only 12 cores (not 24) you would not see any significant performance improvement above 12. So each process above 12 will reduce the power allocated to remaining processes. So the difference from 12 to 24 (~25% performance gain) is just what can be gained from process saturation (not all maker processes are always at 100% cpu usage because of calls to IO so adding a few more processes than you have cpu cores sometimes runs a little faster). Thanks, Carson From: Ram?n Fallon > Date: Thursday, 4 April, 2013 1:03 PM To: "maker-devel at yandell-lab.org" > Subject: second maker2 benchmark, this time, on a cluster Hi I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes, this time I aimed for 48. In contrast to the last 12 core speed check, the target hardware was a computer cluster, with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence in one file with different names). The nodes in the cluster are (again) HP Proliant SL390 with two Intel X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running Centos 6.2 with (as before) 2.6.32 linux kernel. A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via infiniband quadspeed, and because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed. Results were: #processes time(secs) Megabases/hr 1 6585.00 2.20 2 7137.00 2.03 4 2479.00 5.84 8 1088.00 13.30 10 866.00 16.71 12 715.00 20.24 14 666.00 21.72 16 651.00 22.22 18 613.00 23.60 24 559.00 25.88 Graph is attached to this mail. Some notes: * A free queue on the gridengine were used so there was no load on these nodes when run. Two nodes are available on this queue, giving a max of 48 simualtaneous processes. 
* Some processor number (6,20, etc) were deleted because I couldn't guarantee "No load" conditions during those runs, and I had one or two anomalies so I'd rather not include them right now. However, I expect them to be in line with the other results. * In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24. Because this is also the max number of processes per node, it's possible that interconnects between the nodes had something to do with runs > 24 processes being inconsistent, however, it's not usually an issue in other programs because quadspeed (40Gbit/s) is already a fairly fast interconnect). * Process runs 26,28, and 30 would almost - but not quite - finish (just a few sequences unfinished), But after this number, the analysis would hardly get off the ground, seeming to get stuck at Repeatmasker phase. I suppose this is our main concern at the moment, that we can't speed up beyond 24 processes. Cheers / Ram?n. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramonfallon at gmail.com Thu Apr 4 11:03:43 2013 From: ramonfallon at gmail.com (=?ISO-8859-1?Q?Ram=F3n_Fallon?=) Date: Thu, 4 Apr 2013 19:03:43 +0200 Subject: [maker-devel] second maker2 benchmark, this time, on a cluster Message-ID: Hi I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes, this time I aimed for 48. In contrast to the last 12 core speed check, the target hardware was a computer cluster, with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence in one file with different names). The nodes in the cluster are (again) HP Proliant SL390 with two Intel X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running Centos 6.2 with (as before) 2.6.32 linux kernel. 
A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via quadspeed InfiniBand and, because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed.

Results were:

#processes  time(secs)  Megabases/hr
 1          6585.00      2.20
 2          7137.00      2.03
 4          2479.00      5.84
 8          1088.00     13.30
10           866.00     16.71
12           715.00     20.24
14           666.00     21.72
16           651.00     22.22
18           613.00     23.60
24           559.00     25.88

Graph is attached to this mail. Some notes:
* A free queue on the gridengine was used, so there was no load on these nodes during the runs. Two nodes are available on this queue, giving a max of 48 simultaneous processes.
* Some process counts (6, 20, etc.) were omitted because I couldn't guarantee "no load" conditions during those runs, and I had one or two anomalies, so I'd rather not include them right now. However, I expect them to be in line with the other results.
* In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24. Because this is also the max number of processes per node, it's possible that the interconnects between the nodes had something to do with runs of more than 24 processes being inconsistent. However, this is not usually an issue in other programs, because quadspeed (40Gbit/s) is already a fairly fast interconnect.
* Process runs 26, 28, and 30 would almost, but not quite, finish (just a few sequences unfinished), but beyond that number the analysis would hardly get off the ground, seeming to get stuck at the RepeatMasker phase. I suppose this is our main concern at the moment: that we can't speed up beyond 24 processes.

Cheers / Ramón.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 48proc.png Type: image/png Size: 24644 bytes Desc: not available URL: From ramonfallon at gmail.com Fri Apr 5 10:00:53 2013 From: ramonfallon at gmail.com (=?ISO-8859-1?Q?Ram=F3n_Fallon?=) Date: Fri, 5 Apr 2013 18:00:53 +0200 Subject: [maker-devel] second maker2 benchmark, this time, on a cluster In-Reply-To: References: Message-ID: Thanks for the replies Carson, Our cluster has got busy all of a sudden, so I have to wait a bit to do the hostname test. However, I'm fairly sure (not 100%, mind you) that when the process number is over 24 if will definitely run the extra processes on a separate node, and so do a proper cross node launch. On Thu, Apr 4, 2013 at 7:40 PM, Carson Holt wrote: > One more thought. If 26,28, and 30 process jobs are failing this could > also be because they are not starting across nodes correctly (all end up on > the same node). You would then start to run into memory problems and the > job would freeze. So validating the proper cross node launch of MPI using > the 'hostname' command is still probably the first thing to do. > > --Carson > > > * > * > From: Carson Holt > Date: Thursday, 4 April, 2013 1:29 PM > To: Ram?n Fallon , "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org> > Subject: Re: second maker2 benchmark, this time, on a cluster > > Since you are using 12 core nodes (hyperthreaded cores are virtual ? > you still only have 12 cores of power not 24) and your performance curve > drops off at 12, I'm thinking there is a possibility that the other > processes did not start on a separate node. Try launching the Linux > command 'hostname' the same way you are launching maker. If all 24 lines > of output from hostname have the same host, then maker is only getting > launched on a single node. Then since there are really only 12 cores (not > 24) you would not see any significant performance improvement above 12. So > each process above 12 will reduce the power allocated to remaining > processes. 
So the difference from 12 to 24 (~25% performance gain) is just > what can be gained from process saturation (not all maker processes are > always at 100% cpu usage because of calls to IO so adding a few more > processes than you have cpu cores sometimes runs a little faster). > > Thanks, > Carson > > > > From: Ram?n Fallon > Date: Thursday, 4 April, 2013 1:03 PM > To: "maker-devel at yandell-lab.org" > Subject: second maker2 benchmark, this time, on a cluster > > Hi > > I've done another of my own benchmarks with the Maker2 svn (rev 1017) > code. Last time I went up to 12 processes, this time I aimed for 48. In > contrast to the last 12 core speed check, the target hardware was a > computer cluster, with the Gridengine queue manager. The same data set of > 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence > in one file with different names). > > The nodes in the cluster are (again) HP Proliant SL390 with two Intel > X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running > Centos 6.2 with (as before) 2.6.32 linux kernel. A marked difference is > that Maker2 was launched from an NFS3 shared home directory, although the > /tmp directories are local to the process running on each node. Nodes are > interconnected via infiniband quadspeed, and because of hyperthreading, can > offer 24 "process-cores" to a job. No overlap between runs was allowed. > > Results were: > #processes time(secs) Megabases/hr > 1 6585.00 2.20 > 2 7137.00 2.03 > 4 2479.00 5.84 > 8 1088.00 13.30 > 10 866.00 16.71 > 12 715.00 20.24 > 14 666.00 21.72 > 16 651.00 22.22 > 18 613.00 23.60 > 24 559.00 25.88 > > Graph is attached to this mail. Some notes: > * A free queue on the gridengine were used so there was no load on these > nodes when run. Two nodes are available on this queue, giving a max of 48 > simualtaneous processes. 
> * Some processor number (6,20, etc) were deleted because I couldn't > guarantee "No load" conditions during those runs, and I had one or two > anomalies so I'd rather not include them right now. However, I expect them > to be in line with the other results. > * In general the graph shows more consistent performance than last time, > but unfortunately I got incomplete runs after processes=24. Because this is > also the max number of processes per node, it's possible that interconnects > between the nodes had something to do with runs > 24 processes being > inconsistent, however, it's not usually an issue in other programs because > quadspeed (40Gbit/s) is already a fairly fast interconnect). > * Process runs 26,28, and 30 would almost - but not quite - finish (just a > few sequences unfinished), But after this number, the analysis would hardly > get off the ground, seeming to get stuck at Repeatmasker phase. I suppose > this is our main concern at the moment, that we can't speed up beyond 24 > processes. > > Cheers / Ram?n. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kangyangjae at gmail.com Sat Apr 6 01:25:40 2013 From: kangyangjae at gmail.com (Kang, Yang Jae) Date: Sat, 6 Apr 2013 16:25:40 +0900 Subject: [maker-devel] CDS retrieve from augustus_masked Message-ID: <145c01ce3297$f318eab0$d94ac010$@gmail.com> Dear everyone! I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information. Thank you! Kang, Yang Jae Ph.D. Cropgenomics Lab. College of Agriculture and Life Science Seoul National University Korea -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mike.thon at gmail.com Sat Apr 6 05:20:16 2013 From: mike.thon at gmail.com (Michael Thon) Date: Sat, 6 Apr 2013 13:20:16 +0200 Subject: [maker-devel] CDS retrieve from augustus_masked In-Reply-To: <145c01ce3297$f318eab0$d94ac010$@gmail.com> References: <145c01ce3297$f318eab0$d94ac010$@gmail.com> Message-ID: Hi Kang - After running fasta_merge there should be a file: [prefix].all.maker.augustus_masked.transcripts.fasta in the output directory. Is that what you need? Mike On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" wrote: > Dear everyone! > > I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information. > > Thank you! > > Kang, Yang Jae > Ph.D. > Cropgenomics Lab. > College of Agriculture and Life Science > Seoul National University > Korea > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From kangyangjae at gmail.com Sat Apr 6 05:24:31 2013 From: kangyangjae at gmail.com (Kang, Yang Jae) Date: Sat, 6 Apr 2013 20:24:31 +0900 Subject: [maker-devel] CDS retrieve from augustus_masked In-Reply-To: References: <145c01ce3297$f318eab0$d94ac010$@gmail.com> Message-ID: <148d01ce32b9$51407380$f3c15a80$@gmail.com> Thank for your quick response Mike I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file? 
Thank you

From carsonhh at gmail.com Sat Apr 6 07:54:15 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 09:54:15 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <148d01ce32b9$51407380$f3c15a80$@gmail.com>
Message-ID:

It's all CDS, from start to finish. There is never any UTR in the ab initio reference match/match_part alignments. There are two reasons for this. First, most ab initio predictors don't produce UTR. Second, GFF3 has no is_analysis flag, so it is impossible to separate final gene models from predicted gene models if they are both in the form gene/mRNA/exon/CDS. Augustus can predict UTR, but given the limitation just mentioned, if I reject the model, I have to trim it before adding it to the reference information. We've actually been in discussion with the apollo development group over this limitation.
Original apollo found the same limitation, so they make the same assumption for loading data into the browsing window (gene/mRNA/exon/CDS features always go in the middle annotation track and everything else goes in the reference evidence track). With the new web apollo, we're working on getting the default behavior to allow UTR in the gene predictions by using the SO predicted gene term in the GFF3 (which previously wasn't available for use in apollo and maker). So in summary: nothing but CDS for now, but it will include UTR when available in the sequence in the near future.

Thanks,
Carson

From: "Kang, Yang Jae" Date: Saturday, 6 April, 2013 7:24 AM To: 'Michael Thon' Cc: Subject: Re: [maker-devel] CDS retrieve from augustus_masked Thank for your quick response Mike I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file? Thank you From: Michael Thon [mailto:mike.thon at gmail.com] Sent: Saturday, April 06, 2013 8:20 PM To: Kang, Yang Jae Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] CDS retrieve from augustus_masked Hi Kang - After running fasta_merge there should be a file: [prefix].all.maker.augustus_masked.transcripts.fasta in the output directory. Is that what you need? Mike On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" wrote: Dear everyone! I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information. Thank you! Kang, Yang Jae Ph.D. Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea

From mike.thon at gmail.com Sat Apr 6 08:37:28 2013
From: mike.thon at gmail.com (Michael Thon)
Date: Sat, 6 Apr 2013 16:37:28 +0200
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <148d01ce32b9$51407380$f3c15a80$@gmail.com>
References: <145c01ce3297$f318eab0$d94ac010$@gmail.com> <148d01ce32b9$51407380$f3c15a80$@gmail.com>
Message-ID: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>

That's a good point, because 'transcripts' implies that the file would have the UTRs. Does Augustus predict the UTRs? I manually checked the translations in the transcripts file and found only valid translations, but that does not mean that UTRs could not be present...

From carsonhh at gmail.com Sat Apr 6 09:13:16 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 11:13:16 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>
Message-ID:

Augustus only predicts UTRs for a handful of organisms. I trim them off the rejected models before outputting them to the GFF3 as match/match_part features (per my previous e-mail concerning the limitations of GFF3).

--Carson

From kangyangjae at gmail.com Sat Apr 6 12:45:02 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Sun, 7 Apr 2013 03:45:02 +0900
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To:
References: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com>
Message-ID: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>

Thank you for the quick response again!

I found non-ATG starting sequences in the transcript file. I thought these would be UTR traces, and I additionally found an offset value at some position after the '>' character. Does that indicate the starting ATG?
Secondly, there are several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What are the criteria for splitting those files? The reason I'm asking is that some genes are redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

Thank you

From barry.utah at gmail.com Sat Apr 6 14:50:29 2013
From: barry.utah at gmail.com (Barry Moore)
Date: Sat, 6 Apr 2013 14:50:29 -0600
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
References: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com> <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
Message-ID: <3B421D2C-590D-4593-8FA5-3CAA10A19FD4@genetics.utah.edu>

On Apr 6, 2013, at 12:45 PM, Kang, Yang Jae wrote:

> Thank you for the quick response again!
>
> I found non-ATG starting sequences in the transcript file. I thought these would be UTR traces, and

The gene predictors will occasionally produce a transcript with no start/stop codon. Set always_complete=1 in maker_opts.ctl to get MAKER to try hard to force a start/stop codon.

> I additionally found an offset value at some position after the '>' character. Does that indicate the starting ATG?

I didn't really understand that question...

> Secondly, there are several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What are the criteria for splitting those files?
> The reason I'm asking is that some genes were

augustus_masked is a file that contains the proteins of all predictions made by Augustus when working on masked sequence. Setting unmask=1 in maker_opts.ctl would instruct MAKER to also run the gene predictors on unmasked sequence, and then you'd have an augustus_unmasked file for those predictions. The non_overlapping_ab_initio files contain proteins predicted by all gene predictors for which MAKER could not find protein/RNA evidence, so they are unsupported by physical evidence. These unsupported predictions are not promoted by MAKER into annotations in its final output, but they are included in these files in case you want to work with them. The non_overlapping part of the name means that if multiple gene predictors produce overlapping unsupported ab initio predictions, then MAKER will only output one of them.

> redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

Yes, the proteins for genes for which MAKER creates annotations will be in both files.

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Sat Apr 6 15:00:19 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 17:00:19 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
Message-ID:

> I additionally found the offset value some position after '>' letter. Is that indicate the starting ATG?

Only maker.transcripts.fasta will have offsets other than 0; you can use these to get the translation offset. All other *.transcripts.fasta files will always have an offset of 0, for the reason previously mentioned. Some genes will not start with ATG or have stop codons. These are partial models. Set always_complete=1 to reduce these.

> Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files?

Final selected annotations go in the maker.proteins.fasta and maker.transcripts.fasta files. Raw, unfiltered ab initio predictions from Augustus go in the augustus_masked.proteins.fasta and augustus_masked.transcripts.fasta files (these are for reference purposes). A set of non-redundant rejected models goes in the non_overlapping_ab_initio.transcripts.fasta and non_overlapping_ab_initio.proteins.fasta files (if you are missing a gene you expected to find, look in these files first; you can add models back if, for example, you find protein domains in them).

> The reason why I'm asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

This is because some of the Augustus-generated models made it into the final annotation set.
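
The offsets discussed above can be used to cut a coding sequence out of a transcript record. What follows is only a minimal Python sketch, not part of MAKER. It assumes that the FASTA header carries a field of the form offset:N, where N is the number of bases preceding the start of the CDS, and it ends the CDS at the first in-frame stop codon (or at the last complete codon for partial models). The function names and the example header are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cds_from_transcript(header, seq):
    """Skip the 5' UTR using the header offset, then stop at the first in-frame stop."""
    match = re.search(r"offset:(\d+)", header)
    offset = int(match.group(1)) if match else 0  # assumption: no field means offset 0
    seq = seq.upper()
    end = offset
    for i in range(offset, len(seq) - 2, 3):
        end = i + 3  # include this codon
        if seq[i:i + 3] in STOP_CODONS:
            break
    return seq[offset:end]


# Hypothetical record: 2 bases of UTR, then ATG AAA TAA, then 3' UTR.
example = [">t1 transcript offset:2 AED:0.10", "ggATGAAA", "TAAccc"]
for name, sequence in read_fasta(example):
    print(name.split()[0], cds_from_transcript(name, sequence))  # t1 ATGAAATAA
```

For files whose offset is always 0, the same function simply reads from the first base, so a model that lacks a start codon is returned as-is up to its first in-frame stop.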

Thanks,
Carson

From xzhang at genome.wustl.edu Wed Apr 10 10:30:38 2013
From: xzhang at genome.wustl.edu (xu zhang)
Date: Wed, 10 Apr 2013 11:30:38 -0500
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To:
References:
Message-ID: <516593AE.8000909@genome.wustl.edu>

Hi All,

Does anybody have a GeneMark .mod file for yeast? I tried to create my own model file using the command "gm_es.pl S288C_reference_sequence_R64-1-1_20110203.fsa", where the sequence was downloaded from NCBI. It failed with this error:

  warning, error in input file format:
  -3
  error reading parameter BRANCH_MAT
  error in model file /gscmnt/gc2124/info/annotation/personal_dir/xzhang/yeast/s_cerevisiae/genemark/training2/mod/es.mod
  Error on system: prediction step

and "Error: unknown line format".

I also tried the sample file (pythium_ultimum_scaffolds.fasta) from Carson. A mod file was created, although it also produced some error messages:

  warning, error in input file format:
  -13
  5654 dna.fa.good.gb.acc.ph2
  first order for ACC 2
  Error: unknown line format
  GC% ntron

Any suggestions and comments are appreciated.

Thanks,
Xu

From xzhang at genome.wustl.edu Fri Apr 12 06:47:08 2013
From: xzhang at genome.wustl.edu (xu zhang)
Date: Fri, 12 Apr 2013 07:47:08 -0500
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To: <516593AE.8000909@genome.wustl.edu>
References: <516593AE.8000909@genome.wustl.edu>
Message-ID: <5168024C.9040808@genome.wustl.edu>

I know how to do that.
I tried different initial mod files, and it worked on my sequences with the org_S1_55.0mtx initial mod. I don't know why. If somebody knows, please let me know.

Thanks,
Xu

From jason.stajich at gmail.com Fri Apr 12 09:48:53 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 12 Apr 2013 08:48:53 -0700
Subject: [maker-devel] genemark .mod file for yeast
In-Reply-To: <5168024C.9040808@genome.wustl.edu>
References: <516593AE.8000909@genome.wustl.edu> <5168024C.9040808@genome.wustl.edu>
Message-ID: <256F7975-9744-4A53-974F-B92B0179A5B2@gmail.com>

Did you email the GeneMark authors? They would be a better source for help.

I experienced the same problems when training on the yeast data and didn't use GeneMark for those species. It may be that it is expecting more introns and the files for training are empty on some rounds.

Jason

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From jteckert at gmail.com Sun Apr 14 15:07:33 2013
From: jteckert at gmail.com (James Eckert)
Date: Sun, 14 Apr 2013 17:07:33 -0400
Subject: [maker-devel] Annotation quality and converting gff3 to gtf
Message-ID:

Hello,

I'm currently trying to figure out ways to evaluate the quality of the annotations that MAKER produces. I'm working on a novel species, so there isn't a reference genome to compare the annotation quality to. After a bit of searching on the web, I came across the EVAL tool, which I thought might be useful for checking the output quality. EVAL takes GTF files, not GFF3; however, MAKER seems to have addressed this problem through its accessory scripts.

I first used the script "gff3_merge" to get my whole annotation into one GFF3 file. Next I used "add_utr_start_stop_gff", which explicitly adds the UTRs that are needed for converting the GFF3 file to GTF. The problem arose when I tried to run "gff3_to_eval_gtf". I was expecting MAKER to process the whole GFF3 file, but it seems to have processed only 2 nodes. The same thing happens when running the "gff3_2_gtf" script. Here is the command I'm running, along with the output:

  gff3_to_eval_gtf assem_kmer_57_utr.gff3

  NODE_20666_length_66353_cov_18.405483 maker CDS 8801 8984 . - 0 gene_id "1"; transcript_id "2";
  NODE_20666_length_66353_cov_18.405483 maker CDS 8113 8717 . - 2 gene_id "1"; transcript_id "2";

My question is whether the "gff3_to_eval_gtf" and "gff3_2_gtf" scripts have a bug in them, or whether I'm just doing the process wrong. If the conversion doesn't work, is there perhaps an alternative to EVAL that works with native MAKER annotations?

Attached is my whole-genome GFF3 file, along with the file I ran "gff3_to_eval_gtf" on.

assem_kmer-57_exp-44_covcutoff-auto_contigs.all.gff3
assem_kmer_57_utr.gff3

Thank you in advance for your help,
James

From liuhuiquan at nwsuaf.edu.cn Tue Apr 16 02:16:34 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Tue, 16 Apr 2013 16:16:34 +0800
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
Message-ID:

An HTML attachment was scrubbed...

From carsonhh at gmail.com Tue Apr 16 08:20:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 10:20:01 -0400
Subject: [maker-devel] Annotation quality and converting gff3 to gtf
In-Reply-To:
Message-ID:

Does the input GFF3 file you linked to contain only one gene? If so, then you should only get one gene in the output. The resulting GTF should only have the genes (ignoring all the evidence). To convert for EVAL, use these command lines (note the flags, such as -g for gff3_merge so that you are only looking at genes; the FASTA must be included in the file, so no -n flag):

  gff3_merge -d maker_datastore_index.log -g -o some_file.gff
  add_utr_start_stop_gff some_file.gff > some_file2.gff
  maker2eval_gtf some_file2.gff

Note that versions of MAKER after 2.09 no longer have add_utr_start_stop_gff; the UTR is now always there explicitly, so you go straight from gff3_merge to maker2eval_gtf.

With that explanation given, however, I have to wonder whether EVAL is appropriate for you. EVAL requires a reference annotation set (assumed to be 100% perfect) for comparison, and you get a perfect score whenever you call the genes exactly as they are in the reference set (which in itself has obvious bias, but we won't get into that). Given that you have no reference set, it will not give you anything other than statistics for the distribution of intron and exon sizes.
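
Intron and exon size distributions of that kind can also be tallied directly from a merged GFF3 file. The following Python sketch is only an illustration and is not what EVAL itself does. It assumes standard nine-column GFF3 with "exon" features that carry a Parent attribute, and the function name is hypothetical.

```python
from collections import defaultdict


def exon_intron_lengths(gff3_lines):
    """Return (exon_lengths, intron_lengths) computed from GFF3 exon features."""
    exons_by_parent = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])  # 1-based, inclusive
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons_by_parent[parent].append((start, end))

    exon_lengths, intron_lengths = [], []
    for exons in exons_by_parent.values():
        exons.sort()
        exon_lengths += [end - start + 1 for start, end in exons]
        # An intron is the gap between consecutive exons of one transcript.
        intron_lengths += [b[0] - a[1] - 1 for a, b in zip(exons, exons[1:])]
    return exon_lengths, intron_lengths


# Two hypothetical exons of one transcript, using the coordinates quoted in this thread.
demo = [
    "NODE_1\tmaker\texon\t8801\t8984\t.\t-\t.\tID=e1;Parent=mRNA1\n",
    "NODE_1\tmaker\texon\t8113\t8717\t.\t-\t.\tID=e2;Parent=mRNA1\n",
]
print(exon_intron_lengths(demo))  # ([605, 184], [83])
```

The two lists can then be passed to any summary or histogram routine. Exons shared by several transcripts are counted once per parent transcript.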

Alternative measures of quality when there is no reference genome are AED (computed for each gene as part of the MAKER run), which is basically a variation of EVAL-like statistics run against evidence clusters rather than a reference genome, or simply % domain content. See these links for examples of the statistics -->

http://www.biomedcentral.com/1471-2105/12/491
http://www.biomedcentral.com/1471-2105/10/67

Also attached is a figure with an example of quality analysis using combined AED, domain content, and comparative orthologs.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: B563F1FF-1E85-42E3-B79D-F7F6449F1AE9.png
Type: image/png
Size: 227568 bytes
Desc: not available

From carsonhh at gmail.com Tue Apr 16 09:34:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 11:34:44 -0400
Subject: [maker-devel] maker output
In-Reply-To: <1366084495.59030.YahooMailNeo@web164906.mail.bf1.yahoo.com>
Message-ID:

For AED in general, lower is better, but you have to understand the caveats. With mRNA-seq, not all genes may be expressed, not all exons may be captured (mRNA can fold, blocking some sequencing reactions), and sometimes the alignment may extend improperly into an intron or even merge into the neighboring gene. Also, mRNA-seq captures a lot of things that aren't coding genes. But in general, as mRNA-seq coverage increases, the AED values trend toward 0, and mRNA-seq is the single most informative piece of evidence you can get for annotation (I've seen several very poor genome assemblies with horrible annotations that were saved by mRNA-seq). For mRNA-seq, give MAKER the assembled reads (Trinity works well).

Also, for fungi, the UTRs tend to overlap between genes. This can create false merging in the mRNA-seq assemblies (their AED is lower, but it's a false merge). Use the correct_est_fusion option in the control files to help handle that.
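
To make the "lower is better" reading of AED concrete, here is a minimal Python sketch of the published definition, AED = 1 - (SN + SP) / 2, computed at the nucleotide level. It is a simplification and not MAKER's own code. The function names are hypothetical, intervals are assumed to be 1-based and inclusive, and the evidence is treated as one flat set of aligned bases.

```python
def covered_bases(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_exons):
    """Annotation edit distance between a gene model and its aligned evidence."""
    annotation = covered_bases(annotation_exons)
    evidence = covered_bases(evidence_exons)
    if not annotation or not evidence:
        return 1.0  # nothing to agree with
    shared = len(annotation & evidence)
    sensitivity = shared / len(evidence)    # fraction of evidence bases annotated
    specificity = shared / len(annotation)  # fraction of annotated bases supported
    return 1.0 - (sensitivity + specificity) / 2.0


print(aed([(1, 100)], [(1, 100)]))   # 0.0, model and evidence agree exactly
print(aed([(1, 100)], [(51, 150)]))  # 0.5, half of each overlaps the other
print(aed([(1, 100)], [(1, 50)]))    # 0.25, evidence supports half the model
```

The last case shows why a falsely merged transcript assembly can lower AED: evidence that spans more of the model raises specificity even when the extra span is wrong.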

I know there are also several members of the MAKER mailing list with extensive experience using mRNA-seq to annotate fungi who may want to add their two cents.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 15 April, 2013 11:54 PM
To: Carson Holt
Subject: maker output

Hi Carson, have a nice day.

I have a question about the output file from MAKER. I recently ran my longest contigs (100 kb) through MAKER using RNA-seq data from another, related species (same genus), and I noticed that I managed to get expressed-sequence matches for the annotation, compared with using just EST and cDNA. Is this due to the different size of the dataset? I am using a larger dataset when I incorporate the RNA-seq data (assembled transcripts combined with cDNA and EST). The AED values for the predicted mRNA are 0.15 (with RNA-seq data) and 0.06 (without RNA-seq data). My question is which one is the more accurate prediction. Can I just depend on the AED value (the lower the better)? And what about the incorporation of RNA-seq data in this case: can I conclude that RNA-seq improves the annotation (based on the image I attached)?

Thanks for your time, I really appreciate it.

From dence at genetics.utah.edu Tue Apr 16 09:52:07 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 16 Apr 2013 15:52:07 +0000
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
In-Reply-To:
References:
Message-ID:

Hi Huiquan,

1) The default behavior for MAKER is that it will only annotate gene models when there is support from both the evidence (EST and protein alignments) and the ab initio predictors. How many transcripts did you get from PASA? I expect there are about 254 sequences, which is about how many genes you annotated. If you want to get more gene models, then you need to supply more evidence.
For our annotation projects, we often use some derivative of Swiss-Prot, which is a hand-curated database of proteins across all kingdoms.

2) The non-overlapping ab initio file includes ab initio predictions that didn't overlap any gene models. If Augustus and GeneMark predictions overlap, I think it should include both, but if one prediction completely covers the other, I think the longer of the two would be included.

Does that answer your questions?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of 刘慧泉 [liuhuiquan at nwsuaf.edu.cn]
Sent: Tuesday, April 16, 2013 2:16 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files

Hello maker users and developers,

I'm trying to annotate a small fungal genome using Maker-2.27-beta. For testing purposes, I used only Augustus and GeneMark for de novo gene prediction and supplied the PASA-assembled transcripts to the est option. When maker2 finished, I used the gff3_merge and fasta_merge scripts to extract the results. There were 5608, 6255, 5084, and 254 sequences in the resulting protein files: augustus_masked, genemark, non-overlapping ab initio, and maker, respectively.

My questions are:

1. Viewing the GFF file produced by maker2, I found that most of the predicted gene loci have EST matches. Why, then, did maker2 produce only 254 gene annotations?

2. In the "non-overlapping ab initio" file, I found that the sequences are all from the augustus_masked prediction. Does the non-overlapping file only include the best gene models predicted by both Augustus and GeneMark? Does it include GeneMark- or Augustus-specific genes?

Thanks in advance for any advice. I appreciate your help!
best,
Huiquan

the maker_opts.ctl file:

#-----Genome (these are always required)
genome=my_gnm.fa #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est=my_est.fa #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=fungi #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=RepeatPeps.lib #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=my_ges.mod #GeneMark HMM file
augustus_species=my2 #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=14 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=20 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=1500 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=200 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=1 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

From carsonhh at gmail.com Tue Apr 16 10:01:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 12:01:27 -0400
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
In-Reply-To:
Message-ID:

1. Viewing the gff file produced by maker2, I found that most of the predicted gene loci have est matches. Why, then, did maker2 produce only 254 gene annotations?

>> I'd really have to see the results to tell you why.

2. In the 'non-overlapping ab initio' file, the sequences I found are all from the augustus_masked prediction. Does the non-overlapping file only include the best gene models predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes?

>> The "non-overlapping" file should have the one with best consensus if there
>> are 3 or more predictors, and the longest one otherwise. It should be able
>> to have augustus and genemark genes. Try it with only genemark and let me
>> know if the file is empty.

Thanks,
Carson
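The "longest one otherwise" rule Carson describes for two predictors can be illustrated with a short Python sketch. This is not MAKER's code: the `Prediction` record and `pick_longest_per_locus` function are hypothetical names, and treating overlapping same-strand predictions as one locus is an assumption made only for illustration.

```python
from collections import namedtuple

# Hypothetical record for one ab-initio prediction (not a MAKER data structure).
Prediction = namedtuple("Prediction", "source seqid strand start end")


def pick_longest_per_locus(preds):
    """Group overlapping same-strand predictions and keep the longest per group."""
    def length(p):
        return p.end - p.start + 1

    kept = []
    cluster, cluster_end, cluster_key = [], None, None
    for p in sorted(preds, key=lambda q: (q.seqid, q.strand, q.start, q.end)):
        same_locus = (cluster
                      and cluster_key == (p.seqid, p.strand)
                      and p.start <= cluster_end)
        if same_locus:
            cluster.append(p)
            cluster_end = max(cluster_end, p.end)
        else:
            if cluster:
                kept.append(max(cluster, key=length))
            cluster, cluster_end, cluster_key = [p], p.end, (p.seqid, p.strand)
    if cluster:
        kept.append(max(cluster, key=length))
    return kept
```

With an augustus model spanning 100-900 and a genemark model spanning 200-700 on the same strand, only the augustus model survives, which would be consistent with a non-overlapping file dominated by one predictor.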
From liuhuiquan at nwsuaf.edu.cn Tue Apr 16 19:49:04 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Wed, 17 Apr 2013 09:49:04 +0800
Subject: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files
Message-ID:

An HTML attachment was scrubbed...

From carsonhh at gmail.com Thu Apr 18 08:23:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Apr 2013 10:23:54 -0400
Subject: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files
In-Reply-To:
Message-ID:

correct_est_fusion is not guaranteed to never merge a gene. If you are giving maker imperfect evidence, there is only so much it can do. Also you should be using protein evidence in combination with EST evidence, especially when using the correct_est_fusion option, or you are limiting its effectiveness. MAKER does not work as well on ESTs alone, especially for organisms with few introns, as internal logic relies on the combination of evidence support.

--Carson

From: 刘慧泉
Date: Tuesday, 16 April, 2013 9:49 PM
To:
Subject: Re: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files

Hi Carson and Daniel,

Thank you very much for your quick responses! After multiple tries, I have found the reason why only a few genes were annotated by maker: the 'correct_est_fusion' option was turned on. I got about 8000 transcripts from the PASA assembly. Because the gene density of my fungus is very high, many of the assembled transcripts merged adjacent genes even when trinity and PASA were run with the relevant parameters. Maker may not use the merged transcripts as evidence if the 'correct_est_fusion' option is turned on. However, even with the 'correct_est_fusion' option on, I also found that many genes produced by maker merged more than one gene.

I'm now using the ORFs (trainingSetCandidates.cds) extracted from the transcripts by PASA as the EST evidence supplied to maker. I found that most of the extracted ORFs accurately match the gene models predicted by augustus and genemark. This can better resolve the 'merged gene' issue for fungi with high gene density.

For the 'non-overlapping' file, if only genemark is used, its predictions can be found in the 'non-overlapping' file. Was the earlier issue because the gene models generated by augustus are better than genemark's, so only augustus genes were put into the 'non-overlapping' file? Will genes predicted by only one program not be found in the 'non-overlapping' file? How can I get these genes?

Thank you
Huiquan

From: Carson Holt
Sent: 2013-04-16 24:01
To: 刘慧泉; maker-devel at yandell-lab.org
Subject: Re:Re: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files

1. Viewing the gff file produced by maker2, I found that most of the predicted gene loci have est matches. Why, then, did maker2 produce only 254 gene annotations?

>> I'd really have to see the results to tell you why.

2. In the 'non-overlapping ab initio' file, the sequences I found are all from the augustus_masked prediction. Does the non-overlapping file only include the best gene models predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes?

>> The 'non-overlapping' file should have the one with best consensus if there
>> are 3 or more predictors, and the longest one otherwise. It should be able
>> to have augustus and genemark genes. Try it with only genemark and let me
>> know if the file is empty.

Thanks,
Carson
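When comparing runs such as the correct_est_fusion on/off experiments described in this thread, counting the records in each merged protein FASTA file is a quick sanity check. A minimal Python sketch, assuming plain uncompressed FASTA files; the function names are illustrative and not part of MAKER:

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize(paths):
    """Return {path: record_count} for several FASTA files."""
    return {path: count_fasta_records(path) for path in paths}
```

Calling `summarize()` on the files produced by fasta_merge for each run gives counts that can be compared side by side, in the same way as the 5608 / 6255 / 5084 / 254 figures quoted earlier in the thread.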
From carsonhh at gmail.com Thu Apr 18 08:16:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Apr 2013 10:16:18 -0400
Subject: [maker-devel] some strange examples of maker annotation
In-Reply-To:
Message-ID:

maker seems to prefer to select the snap gene model, but not genemark

> Genemark generally scores lower, and has a very large tendency to overlap
> transposons (it can't handle masked fragments, so has to be run on the
> unmasked genome). Looking through the code base, I now see a section where
> the non-overlapping model is set to always exclude genemark from the
> non-overlapping consensus set if there are masked gene predictors such as snap
> or augustus, and to only accept its models when the evidence supports it.
> I'd need to filter genemark candidates for transposon overlap before I could
> lift this limitation.

Fig 1. the snap gene model of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation.

> The non-overlapping set is stranded. These are on different strands. This really
> does happen in eukaryotes, so if the evidence supports it we have to allow
> it, and if you set keep_preds=1 you can get it just because the gene predictor
> supports it regardless of physical evidence support.

Fig 2. the snap gene model of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation.

> On different strands.

Fig 3. there is gene redundancy even within the maker gene annotation

> They are on opposite strands.

Fig 4. no evidence supports the snap gene model. augustus and genemark have similar results but different from snap. But the snap gene was selected as non_overlapping_ab_initio

> Try using Apollo rather than IGV, it becomes so much more obvious because
> apollo separates the strands into separate panels.

Thanks,
Carson

From: 刘慧泉
Date: Thursday, 18 April, 2013 9:37 AM
To: Carson Holt
Subject: some strange examples of maker annotation

Hi Carson,

I ran maker on my genome with 'keep_preds=1' and 'keep_preds=0' respectively. When I manually checked the results of maker in Integrative Genomics Viewer (IGV), I found that most of the genes annotated by maker were good. But I also saw some strange examples in the results, and I don't know how to interpret them. I hope you can give me some suggestions. Please see the attached file.

Thank you very much.

best regards,
Huiquan

From liuhuiquan at nwsuaf.edu.cn Thu Apr 18 07:37:07 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Thu, 18 Apr 2013 21:37:07 +0800
Subject: [maker-devel] some strange examples of maker annotation
Message-ID:

A non-text attachment was scrubbed...
Name: examples of maker annotation.docx
Type: application/octet-stream
Size: 1037235 bytes
Desc: not available

From carsonhh at gmail.com Fri Apr 19 08:55:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 19 Apr 2013 10:55:58 -0400
Subject: [maker-devel] FW: some strange examples of maker annotation
In-Reply-To:
Message-ID:

Just forwarding this to the devel list, so it is archived.

--Carson
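Carson's point in this thread, that models on opposite strands are not redundant even when their coordinates overlap, can be checked with a few lines of Python. This is only a sketch: the tuple layout `(seqid, start, end, strand)` and the function name are assumptions for illustration, not anything taken from MAKER.

```python
def overlaps_on_same_strand(a, b):
    """Return True if two features overlap AND lie on the same strand.

    a and b are (seqid, start, end, strand) tuples using 1-based,
    inclusive coordinates as in GFF3.
    """
    same_place = a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]
    return same_place and a[3] == b[3]
```

Two models at contig1:100-500 (+) and contig1:200-600 (-) overlap in a strand-blind viewer track, but this check reports them as non-redundant.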
From Bob_Freeman at hms.harvard.edu Mon Apr 22 08:09:34 2013
From: Bob_Freeman at hms.harvard.edu (Freeman, Robert M.)
Date: Mon, 22 Apr 2013 10:09:34 -0400
Subject: [maker-devel] Repeatmasker error?
Message-ID: <7EAEB66D-346C-4E9A-B487-B7D5BB352328@hms.harvard.edu>

Greetings,

Am using MAKER 2.27b to annotate a ciliate genome and am finding that my log files are growing to GB sizes. When looking more carefully, an error seems to be occurring around the Repeatmasker stage:

....
Now starting the contig!!
--

setting up GFF3 output and fasta chunks
doing repeat masking
doing blastx repeats
doing blastx of proteins
doing blastx of proteins
doing blastx of proteins
doing blastx repeats
collecting blastx repeatmasking
processing all repeats
ERROR: Can't open seq file: /files/.retain-snapshots.d14d-w60d/SysBio/klab_genome/maker/stentor/run_current_r3/soapPrice1.cycle7.maker.output/soapPrice1.cycle7_datastore/03/EF/contig_157//theVoid.contig_157/query.masked.gff.seq
No such file or directory

 at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Dumper/GFF/GFFV3.pm line 182
    Dumper::GFF::GFFV3::finalize('Dumper::GFF::GFFV3=HASH(0x50547f8)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Process/MpiChunk.pm line 691
    Process::MpiChunk::__ANON__() called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 415
    eval {...} called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 407
    Error::subs::try('CODE(0x5b859c0)', 'HASH(0x11ea63f0)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib
...

I don't seem to have this problem when I fall back to the 2.25b version (though I start having major DBD::SQLite issues).

I'm doing this on a cluster, running this under MPI with 50 cores.

Any help/suggestions would be appreciated!

-Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA 02115
617/432.2294, vox

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace

From carsonhh at gmail.com Mon Apr 22 14:25:06 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 22 Apr 2013 16:25:06 -0400
Subject: [maker-devel] Repeatmasker error?
In-Reply-To:
Message-ID:

Just forwarding this e-mail chain to the devel list for archiving.

--Carson

From: "Freeman, Robert M."
Date: Monday, 22 April, 2013 4:16 PM
To: Carson Holt
Subject: Re: [maker-devel] Repeatmasker error?

Already looks better ... been checking stderr and it looks error-free so far (knock on wood). Thanks for the help, and sorry for the bother!

Oh, should I fall back to the 2.27beta release that you announced on the list??

-b

On Apr 22, 2013, at 4:09 PM, Carson Holt wrote:

> Let me know if you still get problems. Redirecting TMP back locally will also
> give a big performance boost.
>
> Thanks,
> Carson
>
>
> From: "Freeman, Robert M."
> Date: Monday, 22 April, 2013 4:01 PM
> To: Carson Holt
> Subject: Re: [maker-devel] Repeatmasker error?
>
> (chuckle) wow, always something new to learn -- been working with IT systems
> for > 20 years, and HPC > 8, and no one has ever explained this to me.
>
> Have directed TMP to /scratch, which also turns out to be an Isilon-related
> mount. Will re-direct all this to /tmp to see if this eliminates the problems.
>
> -b
>
> On Apr 22, 2013, at 3:43 PM, Carson Holt wrote:
>
>> The missing file is part of the GFF3 output, the fasta sequence to be
>> specific. Sometimes on NFS (network mounted file systems), they can return
>> status 'success' even though the IO event really has not succeeded yet (this
>> is called asynchronous IO). The result is a certain speed gain but it also
>> means that you can write a file, then immediately try and open it, and the
>> system will say that it doesn't exist. On some systems you get weird files
>> starting with the name '.nfs000' when these types of errors occur. NFS type
>> errors are more common when you use many cpus or other jobs on the cluster
>> (not just maker) are using a large amount of IO. To avoid this, MAKER tries
>> to do as much work as possible in the directory specified by TMP in the
>> control files. By default this is /tmp, and if you set it to something else,
>> make sure that the location is locally mounted and not NFS mounted (otherwise
>> it can't perform its purpose of bypassing NFS for certain quick read/write
>> operations). The newest version of MAKER unloads exonerate and even most
>> gene prediction operations into TMP in addition to other steps that were
>> already unloaded there in other versions of the pipeline, and I've been able
>> to scale up to > 1500 cpus.
>>
>> Thanks,
>> Carson
>>
>>
>>
>> From: "Freeman, Robert M."
>> Date: Monday, 22 April, 2013 3:24 PM
>> To: Carson Holt
>> Subject: Re: [maker-devel] Repeatmasker error?
>>
>> Thanks, Carson. I'll give this a try.
>>
>> Randomly? Not sure ... I'll have to go back thru the logs to see if this is
>> happening consistently or not. Right now, this log is close to 1 GB in size.
>> When I saw it getting this large, I stopped the run as I knew errors were
>> getting spewed into the log file.
>>
>> Thought it might be filesystem as well, but unlikely -- the location for the
>> MAKER runs is on our Isilon, and these problems appear only with MAKER.
>>
>> Other files seem to be present...
>>
>>> % ls -alt
>>> drwxrwx--- 3 rmf1 SYSTEMBIO_klab_genome 236 Apr 21 14:50 ..
>>> drwxrwx--- 2 rmf1 SYSTEMBIO_klab_genome 48225 Apr 21 12:38 .
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 0 Apr 21 12:34 run.log.child.0
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 1055922 Apr 21 12:34 contig_157.0.final.section
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 1055922 Apr 21 12:34 contig_157.0.raw.section
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 388512 Apr 21 12:34 evidence_0.gff
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7269 Apr 21 12:34 contig_157.102049-103030.gi%7C145478069%7Cref%7CXP_001425057%2E1%7C.p_exonerate
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 4561 Apr 21 12:34 contig_157.101950-103090.gi%7C145514179%7Cref%7CXP_001443000%2E1%7C.p_exonerate
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7088 Apr 21 12:34 contig_157.101956-103435.gi%7C145505343%7Cref%7CXP_001438638%2E1%7C.p_exonerate
>>> ....
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7184469 Apr 21 12:33 contig_157.0.sequences_r5%2Efasta.blastx
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 0 Apr 21 12:25 query.masked.gff.def
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 9885 Apr 21 12:25 query.masked.gff.ann
>>> -rw-r--r-- 1 rmf1 SYSTEMBIO_klab_genome 49152 Apr 21 12:25 query.masked.fasta.index
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 106522 Apr 21 12:25 query.masked.fasta
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 9991 Apr 21 12:25 query.masked.gff
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 28002 Apr 21 12:25 contig_157.0.te_proteins%2Efasta.repeatrunner
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 3589 Apr 21 12:24 contig_157.0.all.rb.out
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 106522 Apr 21 12:23 query.fasta
>>>
>> It's just that the one output file MAKER is looking for isn't there.
>>
>> I guess the other question I should ask: as there are exonerate sequences
>> there, does it appear that the pipeline is running OK, and just ignore these
>> errors (somehow)?
>>
>> -b
>>
>> On Apr 22, 2013, at 12:39 PM, Carson Holt wrote:
>>
>>> Could you give the devel version a try to see if it experiences the same
>>> failure, as it's easier to debug off of the most current code.
>>>
>>> Type this on the command line to download-->
>>> **************************
>>>
>>> user: *******
>>> password: *******
>>>
>>> The error appears to be filesystem related though. Does it appear to happen
>>> randomly?
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: "Freeman, Robert M."
>>> Date: Monday, 22 April, 2013 10:09 AM
>>> To: "maker-devel at yandell-lab.org"
>>> Subject: [maker-devel] Repeatmasker error?
>>>
>>> Greetings,
>>>
>>> Am using MAKER 2.27b to annotate a ciliate genome and am finding that my log
>>> files are growing to GB sizes. When looking more carefully, an error seems
>>> to be occurring around the Repeatmasker stage:
>>>
>>>> ....
>>>> Now starting the contig!!
>>>> --
>>>>
>>>> setting up GFF3 output and fasta chunks
>>>> doing repeat masking
>>>> doing blastx repeats
>>>> doing blastx of proteins
>>>> doing blastx of proteins
>>>> doing blastx of proteins
>>>> doing blastx repeats
>>>> collecting blastx repeatmasking
>>>> processing all repeats
>>>> ERROR: Can't open seq file:
>>>> /files/.retain-snapshots.d14d-w60d/SysBio/klab_genome/maker/stentor/run_current_r3/soapPrice1.cycle7.maker.output/soapPrice1.cycle7_datastore/03/EF/contig_157//theVoid.contig_157/query.masked.gff.seq
>>>> No such file or directory
>>>>
>>>> at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Dumper/GFF/GFFV3.pm line 182
>>>> Dumper::GFF::GFFV3::finalize('Dumper::GFF::GFFV3=HASH(0x50547f8)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Process/MpiChunk.pm line 691
>>>> Process::MpiChunk::__ANON__() called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 415
>>>> eval {...} called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 407
>>>> Error::subs::try('CODE(0x5b859c0)', 'HASH(0x11ea63f0)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib
>>>> ...
>>>
>>> I don't seem to have this problem when I fall back to the 2.25b version
>>> (though I start having major DBD:SQLite issues).
>>>
>>> I'm doing this on a cluster, running this under MPI with 50 cores.
>>>
>>> Any help/suggestions would be appreciated!
>>>
>>> -Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab
Dept of Systems Biology, Alpert 524
Harvard Medical School
200 Longwood Avenue
Boston, MA 02115
617/432.2294, vox

"Sorry I'm late. Oh, God, that sounded insincere. I'm late."
-- Karen Walker, from Will and Grace

From ejr at stowers.org Mon Apr 29 09:58:09 2013
From: ejr at stowers.org (Ross, Eric)
Date: Mon, 29 Apr 2013 15:58:09 +0000
Subject: [maker-devel] repeat statistics
Message-ID:

Does anyone have a good tool for yanking repeat statistics out of MAKER gff files?

SOBA can give some basic stats, but it doesn't play well with my giant files and I haven't figured out a way to run it locally.

For that matter does anyone have a script that will calculate SOBA like stats locally? I'd rather avoid writing one myself if something else is out there.
Thanks,

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From barry.moore at genetics.utah.edu Mon Apr 29 11:59:14 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 29 Apr 2013 11:59:14 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID:

Hi Eric,

There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details:

http://www.sequenceontology.org/resources/sobacl.html

Ultimately you'll get it like this:

svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA

Then run:

SOBA/bin/SOBAcl --help

For a lot of command line examples have a look in:

SOBA/t/sobacl_test.sh

B

On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:

> Does anyone have a good tool for yanking repeat statistics out of MAKER
> gff files?
>
> SOBA can give some basic stats, but it doesn't play well with my giant
> files and I haven't figured out a way to run it locally.
>
> For that matter does anyone have a script that will calculate SOBA like
> stats locally? I'd rather avoid writing one myself if something else is
> out there.
>
> Thanks,
>
> Eric
>
> --
> Eric Ross
> Bioinformatic Specialist I
> Alejandro Sánchez Alvarado Laboratory
> Stowers Institute for Medical Research
> Howard Hughes Medical Institute
> ejr at stowers.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
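For very large files, a few lines of Python can also tally basic repeat statistics while reading the GFF3 line by line. This is only a sketch and not a replacement for SOBA: it assumes that repeat features carry `repeatmasker` or `repeatrunner` in the source column and `match` in the type column, which should be checked against the actual MAKER output, and the function name is hypothetical.

```python
from collections import defaultdict

# Assumption: source tags used for repeat features in the GFF3 being read.
REPEAT_SOURCES = frozenset({"repeatmasker", "repeatrunner"})


def repeat_stats(gff_lines, sources=REPEAT_SOURCES):
    """Return {source: (feature_count, masked_bp)} for top-level 'match' features.

    masked_bp merges overlapping or adjacent intervals per sequence so that
    bases covered by several repeat hits are counted once.
    """
    counts = defaultdict(int)
    intervals = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] not in sources or cols[2] != "match":
            continue
        counts[cols[1]] += 1
        intervals[(cols[1], cols[0])].append((int(cols[3]), int(cols[4])))

    bases = defaultdict(int)
    for (source, _seqid), spans in intervals.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:
                cur_end = max(cur_end, end)
            else:
                bases[source] += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        bases[source] += cur_end - cur_start + 1
    return {source: (counts[source], bases[source]) for source in counts}
```

Passing an open file handle, for example `repeat_stats(open("genome.all.gff"))` with a hypothetical file name, avoids loading the whole file into memory; only the repeat intervals themselves are kept.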
From jason.stajich at gmail.com Mon Apr 29 16:49:12 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 29 Apr 2013 15:49:12 -0700
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID:

Barry - I think you mean topaz instead of malachite?

svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA

Jason Stajich
jason at bioperl.org
jason.stajich at gmail.com
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote:

> Hi Eric,
>
> There is a command line version of SOBA. It does the same things as the
> web version and much more. This page has some basic details:
>
> http://www.sequenceontology.org/resources/sobacl.html
>
> Ultimately you'll get it like this:
>
> svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA
>
> Then run:
>
> SOBA/bin/SOBAcl --help
>
> For a lot of command line examples have a look in:
>
> SOBA/t/sobacl_test.sh
>
> B
>
> On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:
>
> Does anyone have a good tool for yanking repeat statistics out of MAKER
> gff files?
>
> SOBA can give some basic stats, but it doesn't play well with my giant
> files and I haven't figured out a way to run it locally.
>
> For that matter does anyone have a script that will calculate SOBA like
> stats locally? I'd rather avoid writing one myself if something else is
> out there.
>
> Thanks,
>
> Eric
>
> --
> Eric Ross
> Bioinformatic Specialist I
> Alejandro Sánchez Alvarado Laboratory
> Stowers Institute for Medical Research
> Howard Hughes Medical Institute
> ejr at stowers.org
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543

From barry.moore at genetics.utah.edu Tue Apr 30 00:14:44 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 30 Apr 2013 00:14:44 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID:

Correct. And web page is now updated as well.

B

On Apr 29, 2013, at 4:49 PM, Jason Stajich wrote:

> Barry - I think you mean topaz instead of malachite?
>
> svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA
>
>
> Jason Stajich
> jason at bioperl.org
> jason.stajich at gmail.com
> http://bioperl.org/wiki/User:Jason
> http://twitter.com/hyphaltip
>
>
> On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote:
> Hi Eric,
>
> There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details:
>
> http://www.sequenceontology.org/resources/sobacl.html
>
> Ultimately you'll get it like this:
>
> svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA
>
> Then run:
>
> SOBA/bin/SOBAcl --help
>
> For a lot of command line examples have a look in:
>
> SOBA/t/sobacl_test.sh
>
> B
>
> On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:
>
>> Does anyone have a good tool for yanking repeat statistics out of MAKER
>> gff files?
>>
>> SOBA can give some basic stats, but it doesn't play well with my giant
>> files and I haven't figured out a way to run it locally.
>>
>> For that matter does anyone have a script that will calculate SOBA like
>> stats locally? I'd rather avoid writing one myself if something else is
>> out there.
>>
>> Thanks,
>>
>> Eric
>>
>> --
>> Eric Ross
>> Bioinformatic Specialist I
>> Alejandro Sánchez Alvarado Laboratory
>> Stowers Institute for Medical Research
>> Howard Hughes Medical Institute
>> ejr at stowers.org
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
URL:

From dence at genetics.utah.edu Mon Apr 1 10:27:23 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Mon, 1 Apr 2013 16:27:23 +0000
Subject: [maker-devel] Why are some start positions minus in the gff result?
In-Reply-To:
References:
Message-ID:

Hi,

I seem to remember some discussion of the possibility of negative coordinates in a gff3 file when the genomic material is circular. Since you're annotating viral genomes, could this be what's happening here? Like Carson said, I've never seen this before, but that's just an idea I had.

Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Hung-Wei Hsu [ares711122 at gmail.com]
Sent: Monday, March 25, 2013 8:50 PM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] Why are some start positions minus in the gff result?

Hi MAKER developers,

I could successfully run MAKER and get the final gff. But I found some start positions in the gff were minus. That led to an error in the gff reader. Is this a bug? Could you please help to resolve this problem? Thanks a lot in advance.

Best regards,
Hung-Wei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Apr 1 10:38:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 12:38:18 -0400
Subject: [maker-devel] Why are some start positions minus in the gff result?
In-Reply-To:
Message-ID:

I'm thinking the same thing. Reviewing how I parse GeneMark's output, I just use their start and end coordinates (no changes). Over the weekend I altered the GeneMark parser to walk the gene start away from the supposed origin so as not to let this happen. In your E. coli test case, since you have multiple contigs for what is likely a single circular genome, this would be the correct behavior, as you don't want to treat each contig as an independent circular chromosome. I should probably add an is_circular option to the control files so users can select for this.

I've updated the maker subversion repository so you can do an 'svn update' (I believe you are using the devel version of MAKER, correct?)

Thanks,
Carson

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
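[Editor's sketch] For anyone wanting to see whether a GFF3 file contains the negative start positions discussed in this thread, something like the following could be used. It is a rough sketch assuming a POSIX shell with awk; `bad_gff_coords` is a hypothetical helper name, not a MAKER tool. It relies only on the GFF3 column layout (tab separated, column 4 = start, column 5 = end, 1-based coordinates).

```shell
#!/bin/sh
# bad_gff_coords is a hypothetical helper name, not a MAKER tool.
# It prints every feature line whose start is below 1 or whose end
# comes before its start.
bad_gff_coords() {
    awk -F '\t' '
        /^##FASTA/ { exit }          # stop at the embedded sequence section
        /^#/ || NF < 9 { next }      # skip directives, comments, short lines
        $4 < 1 || $5 < $4 { print }  # report suspicious coordinates
    ' "$1"
}
```

Usage would be along the lines of `bad_gff_coords your_output.gff`, where the file name is a placeholder. An empty result means no feature has a start below 1.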
URL:

From carsonhh at gmail.com Mon Apr 1 13:59:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 15:59:18 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364846015.96057.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

What kind of system (OS) are you running on? 'perl.exe' seems odd. It appears that the perl is different for maker and RepeatMasker. What do you get when you type 'which perl' on the command line?

I think you need to reinstall RepeatMasker at a minimum. To do that -->

> cd /home/maker-2.27-beta/maker/src
> ./Build repeatmasker

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 3:53 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Thanks for the reply.

1. Yes, I set up maker myself as my own user, but I don't know how to check the mounting.
2. I'm calling maker directly, and I've tried this:

cat /home/maker-2.27-beta/maker/bin/maker | grep "#\!"

and it gave me this:

#!/usr/bin/perl.exe

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Apr 1 14:29:40 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 16:29:40 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364847674.21064.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

I found it odd because perl.exe is a Windows extension not used in Linux, but it confirmed my suspicions. You can't use maker with Cygwin. There are several things that will break because it's not really Linux. You can use VirtualBox instead to install a virtual Linux machine --> https://www.virtualbox.org/. Alternatively you can try to dual boot your system with a Linux partition. VirtualBox will allow you to run maker on small datasets; depending on the size of the genome you want to run maker with, it may be fine. But I would not recommend running anything over 10 megabases (it won't fail, it will just take a very long time).

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:21 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

1. Oh, it's odd? I'm using Windows 8, but for maker I'm using Cygwin.
2. When I type 'which perl' I get this:

/usr/bin/perl

3. When I type ./Build repeatmasker I get this:

cygwin warning:
MS-DOS style path detected: \Users\Dora
Preferred POSIX equivalent is: /cygdrive/c/Users/Dora
CYGWIN environment variable option "nodosfilewarning" turns off this warning.
Consult the user's guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
WARNING: RepeatMasker was already found on this system.
Do you still want MAKER to install RepeatMasker for you?

Is there any problem with this, or can I just proceed with the installation?

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Apr 1 15:47:38 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 01 Apr 2013 17:47:38 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To:
Message-ID:

That's not too bad. It's best to choose a few large contigs (1-2Mb total) to run with at first and then use those results to help configure the rest of the run. For the final run you may want to consider splitting onto multiple machines if your machine has limited cpu power. It will take you ~150 hours on 1 cpu core depending on the size of alignment datasets - ESTs and proteins. More cpu cores will allow it to run faster (see graph below from the MAKER2 paper). I imagine that your machine probably has at least 4 cpu cores. Most bioinformatics labs have multi cpu Linux boxes (i.e. 24-32 cpu cores), some have clusters available to them (100's to 1000's of cpu cores), and a few just launch maker on multiple lab desktop machines all writing to the same network mounted output directory.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:48 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

It's about 50 Mb.

From: Carson Holt
To: Hud Hud
Sent: Tuesday, April 2, 2013 4:44 AM
Subject: Re: [maker-devel] Help on error-Repeat masker

How big is the genome?

--Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 4:37 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Oh, thanks so much, now I know what's going wrong: it's Cygwin. I'll try dual boot then, as my genome is over 10 Mb. Thanks for your time.

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 08EB6777-DA72-45CA-8E05-07928457B9BE.png
Type: image/png
Size: 61806 bytes
Desc: not available
URL:

From mnuhn at ebi.ac.uk Tue Apr 2 07:09:18 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Tue, 02 Apr 2013 14:09:18 +0100
Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs
Message-ID: <515AD87E.1010800@ebi.ac.uk>

Hello Carson!

(Mpi) Maker (2.27) is failing when it runs blast searches.

It prints out the command it is trying to run. When I try to run this command manually on the command line, blast terminates with an error, because it either can't find the input file or it can't find a file ending in .pin, which I think is a protein index file it expects to be there.

I've looked at a few contigs on which maker fails and they were all rather short contigs.
Maker works fine, if I

- run it without mpi or
- run it with mpi, but a maximum of 4 processors.

(Mpi) Maker used to run fine with 128 processors before this.

The contigs are sorted descending by size in the genome file. I think maker has processed the large ones and the problems it is having now might have something to do with it running on smaller contigs.

From looking at the error messages I thought at first the index file of the genome might be corrupted, so I deleted it and let maker rebuild it. This didn't fix the issue though. I have also set the path for temporary files manually to make sure maker is not running out of temporary space.

Any idea how to overcome this?

Cheers,
Michael.

P.S.: A typical error message I'm getting is this:

--Next Contig--

[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
running blast search.
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSalAtl2s8087.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LSalAtl2s8087//theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:LSalAtl2s8083

doing blastx repeats
setting up GFF3 output and fasta chunks
doing blastx repeats
re reading repeat masker report.
/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/LSalAtl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
running blast search.
running blast search.
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSalAtl2s8135.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/LSalAtl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
#--------- command -------------#
Widget::blastx:
/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSalAtl2s8119.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/LSalAtl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner
#-------------------------------#
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: LSalAtl2s8449
Length: 2187
Tries: 18!!
#---------------------------------------------------------------------

From carsonhh at gmail.com Tue Apr 2 07:15:28 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 02 Apr 2013 09:15:28 -0400
Subject: [maker-devel] Help on error-Repeat masker
In-Reply-To: <1364865389.66083.YahooMailNeo@web164901.mail.bf1.yahoo.com>
Message-ID:

The best evidence is from mRNAseq or ESTs of the same species. ESTs and mRNAseq from related species can be used, but if protein annotations are available use those instead. This is because cross species nucleotide alignments must be translated in all 6 reading frames (3 for the query and 3 for the subject), which would basically make run times increase by 6 fold. You can try giving the cross species ESTs to maker as if they were from the same species instead (est= option); fewer will align, but run times will not be overwhelming. Then provide the protein annotations from the related species combined with uniprot (maker can take comma separated lists for the input files).

You can use either the program CEGMA from Ian Korf's lab or alternatively maker's protein2genome option to build an initial annotation set to use for training. Then train SNAP, Augustus, and GeneMark (GeneMark self trains). For the last run let all 3 predictors run together with protein2genome now turned off. Given that the genome is only 50Mb and you have a lack of alignment evidence, you can probably safely set keep_preds=1 on the second run, as the false positive rate is usually quite low for gene dense organisms and you won't get many annotations from maker otherwise without more evidence alignments. Perform your first and second runs in the same location, so maker can automatically reuse the same alignments (the second run is always very fast this way as maker won't have to rerun blast and exonerate).
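[Editor's sketch] The two-run workflow described above could be set up roughly as follows. This is only an illustration under stated assumptions: `set_ctl`, `configure_pass1` and `configure_pass2` are hypothetical helper names (nothing like them ships with MAKER), the control file is assumed to use plain `key=value #comment` lines, and values must not contain `|` or `&` because they are passed straight to sed. Only the option names est, protein, protein2genome and keep_preds come from MAKER itself.

```shell
#!/bin/sh
# set_ctl is a hypothetical helper, not shipped with MAKER.
# It rewrites "key=value" in a control file and keeps any trailing "#comment".
set_ctl() {
    file=$1 key=$2 value=$3
    sed "s|^${key}=[^#]*|${key}=${value} |" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# First run: alignment evidence only, with gene models taken directly
# from protein alignments (protein2genome) to produce a training set.
# $1 = control file, $2 = EST fasta, $3 = comma separated protein fastas
configure_pass1() {
    set_ctl "$1" est "$2"
    set_ctl "$1" protein "$3"
    set_ctl "$1" protein2genome 1
}

# Second run: the trained predictors take over, so protein2genome goes
# off and unsupported predictions are kept (keep_preds=1).
# $1 = control file
configure_pass2() {
    set_ctl "$1" protein2genome 0
    set_ctl "$1" keep_preds 1
}
```

A call such as `configure_pass1 maker_opts.ctl related_est.fasta related_proteins.fasta,uniprot.fasta` uses placeholder file names. The trained SNAP, Augustus and GeneMark parameters still have to be entered in the control file by hand before the second run.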
If your organism is a fungus (I'm just guessing because of the small genome size) you can also use this gene prediction parameter resource from Jason Stajich --> https://github.com/hyphaltip/fungi-gene-prediction-params

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 1 April, 2013 9:16 PM
To: Carson Holt
Subject: Re: [maker-devel] Help on error-Repeat masker

Thanks, I now have better insight regarding the cpu cores. I have other questions. I don't have other info or evidence for my own genome, I only have assembled contigs. Recently JGI sequenced a species that is closely related to my genome (at genus level), and I have access to the data (protein, est, rna-seq reads, transcripts, gene models, gff).

1. I have run maker (MWAS) using different sets of evidence, such as protein and est (JGI), and est (JGI) and the uniprot database, but the two runs produced different numbers of predicted genes. So my question: what is the best evidence to use to support my annotation? Is it preferable to use a larger dataset such as uniprot rather than the data from JGI (even though it is closely related)?

2. Can I use rna-seq data (from JGI) in maker? I've de novo assembled the rna-seq using CLC Genomics.

Thanks

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was
scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 08EB6777-DA72-45CA-8E05-07928457B9BE.png Type: image/png Size: 61806 bytes Desc: not available URL: From carsonhh at gmail.com Tue Apr 2 07:57:08 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 02 Apr 2013 09:57:08 -0400 Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs In-Reply-To: <515AD87E.1010800@ebi.ac.uk> Message-ID: Could you set the TMP= option to a non-NFS mounted location (the default /tmp should work) and let me know if it still fails? You can also try completely deleting the LSalAtl2s.maker.output/mpi_blastdb directory before restarting. Thanks, Carson On 13-04-02 9:09 AM, "Michael Nuhn" wrote: >Hello Carson! > >(Mpi) Maker (2.27) is failing when it runs blast searches. > >It prints out the command it is trying to run. When I try to run this >command manually on the command line, blast terminates with an error, >because it either can't find the input file or it can't find a file >ending in .pin, which I think is a protein index file it expects to be >there. > >I've looked at a few contigs on which maker fails and they were all >rather short contigs. > >Maker works fine, if I > >- run it without mpi or >- run it with mpi, but a maximum of 4 processors. > >(Mpi) Maker used to run fine with 128 processors before this. > >The contigs are sorted descending by size in the genome file. I think >maker has processed the large ones and the problems it is having now >might have something to do with it running on smaller contigs. > > From looking at the error messages I thought at first the index file of >the genome might be corrupted, so I deleted it and let maker rebuild it. >This didn't fix the issue though. I have also set the path for temporary >files manually to make sure maker is not running out of temporary space. > >Any idea how to overcome this?. > >Cheers, >Michael. 
> >P.S.: A typical error message I'm getting is this: > >--Next Contig-- > >[blastall] FATAL ERROR: search cannot proceed due to errors in all >contexts/frames of query sequences >running blast search. >#--------- command -------------# >Widget::blastx: >/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2 >Efasta.mpi.10.0 > >-i >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSalAt >l2s8087.0 >-b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /n >fs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_ >final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LSalAt >l2s8087// >theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.tem >p_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >#-------------------------------# >ERROR: Chunk failed at level:2, tier_type:0 >FAILED CONTIG:LSalAtl2s8083 > >doing blastx repeats >setting up GFF3 output and fasta chunks >doing blastx repeats >re reading repeat masker report. >/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/make >r_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/LSal >Atl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out >[blastall] FATAL ERROR: search cannot proceed due to errors in all >contexts/frames of query sequences >[blastall] FATAL ERROR: search cannot proceed due to errors in all >contexts/frames of query sequences >running blast search. >running blast search. 
>#--------- command -------------# >Widget::blastx: >/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2 >Efasta.mpi.10.0 >-i >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSalAt >l2s8135.0 >-b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o >/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/make >r_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/LSal >Atl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta.repea >trunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >#-------------------------------# >#--------- command -------------# >Widget::blastx: >/nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2 >Efasta.mpi.10.0 >-i >/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSalAt >l2s8119.0 >-b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o >/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/make >r_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/LSal >Atl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta.repea >trunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >#-------------------------------# >[blastall] FATAL ERROR: search cannot proceed due to errors in all >contexts/frames of query sequences >#--------------------------------------------------------------------- >Now retrying the contig!! >SeqID: LSalAtl2s8449 >Length: 2187 >Tries: 18!! 
>#--------------------------------------------------------------------- > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mnuhn at ebi.ac.uk Tue Apr 2 08:38:31 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Tue, 02 Apr 2013 15:38:31 +0100 Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs In-Reply-To: <00E9A24F-728F-496D-A30C-6EA83676FF64@sanger.ac.uk> References: <515AD87E.1010800@ebi.ac.uk> <00E9A24F-728F-496D-A30C-6EA83676FF64@sanger.ac.uk> Message-ID: <515AED67.5060906@ebi.ac.uk> On 04/02/2013 02:01 PM, Eleanor Stanley wrote: > what version of Blast are you using? > I was getting similar errors with NCBI BLAST+ 2.2.23 that were resolved using BLAST+ 2.2.27 instead I was using blast version 2.2.14. I've now swapped it out for ncbi+ 2.2.9. I am running it on one mpi instance with 128 processors and it seems to be working now. Thanks! Michael. > Ele > > > On 2 Apr 2013, at 14:09, Michael Nuhn wrote: > >> Hello Carson! >> >> (Mpi) Maker (2.27) is failing when it runs blast searches. >> >> It prints out the command it is trying to run. When I try to run this command manually on the command line, blast terminates with an error, because it either can't find the input file or it can't find a file ending in .pin, which I think is a protein index file it expects to be there. >> >> I've looked at a few contigs on which maker fails and they were all rather short contigs. >> >> Maker works fine, if I >> >> - run it without mpi or >> - run it with mpi, but a maximum of 4 processors. >> >> (Mpi) Maker used to run fine with 128 processors before this. >> >> The contigs are sorted descending by size in the genome file. I think maker has processed the large ones and the problems it is having now might have something to do with it running on smaller contigs. 
>> >> From looking at the error messages I thought at first the index file of the genome might be corrupted, so I deleted it and let maker rebuild it. This didn't fix the issue though. I have also set the path for temporary files manually to make sure maker is not running out of temporary space. >> >> Any idea how to overcome this?. >> >> Cheers, >> Michael. >> >> P.S.: A typical error message I'm getting is this: >> >> --Next Contig-- >> >> [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences >> running blast search. >> #--------- command -------------# >> Widget::blastx: >> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 >> -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSalAtl2s8087.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /n >> fs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LSalAtl2s8087// >> theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >> #-------------------------------# >> ERROR: Chunk failed at level:2, tier_type:0 >> FAILED CONTIG:LSalAtl2s8083 >> >> doing blastx repeats >> setting up GFF3 output and fasta chunks >> doing blastx repeats >> re reading repeat masker report. >> /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/LSalAtl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out >> [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences >> [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences >> running blast search. >> running blast search. 
>> #--------- command -------------# >> Widget::blastx: >> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSalAtl2s8135.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/LSalAtl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >> #-------------------------------# >> #--------- command -------------# >> Widget::blastx: >> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins%2Efasta.mpi.10.0 -i /nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSalAtl2s8119.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/maker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/LSalAtl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >> #-------------------------------# >> [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences >> #--------------------------------------------------------------------- >> Now retrying the contig!! >> SeqID: LSalAtl2s8449 >> Length: 2187 >> Tries: 18!! 
>> #--------------------------------------------------------------------- >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > From carsonhh at gmail.com Tue Apr 2 08:16:44 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 02 Apr 2013 10:16:44 -0400 Subject: [maker-devel] Blastx of repeats with mpi maker failing on small contigs In-Reply-To: <515AED67.5060906@ebi.ac.uk> Message-ID: Good to know. Thanks, Carson On 13-04-02 10:38 AM, "Michael Nuhn" wrote: > >On 04/02/2013 02:01 PM, Eleanor Stanley wrote: >> what version of Blast are you using? >> I was getting similar errors with NCBI BLAST+ 2.2.23 that were resolved >>using BLAST+ 2.2.27 instead > >I was using blast version 2.2.14. I've now swapped it out for ncbi+ 2.2.9. > >I am running it on one mpi instance with 128 processors and it seems to >be working now. > >Thanks! >Michael. > >> Ele >> >> >> On 2 Apr 2013, at 14:09, Michael Nuhn wrote: >> >>> Hello Carson! >>> >>> (Mpi) Maker (2.27) is failing when it runs blast searches. >>> >>> It prints out the command it is trying to run. When I try to run this >>>command manually on the command line, blast terminates with an error, >>>because it either can't find the input file or it can't find a file >>>ending in .pin, which I think is a protein index file it expects to be >>>there. >>> >>> I've looked at a few contigs on which maker fails and they were all >>>rather short contigs. >>> >>> Maker works fine, if I >>> >>> - run it without mpi or >>> - run it with mpi, but a maximum of 4 processors. >>> >>> (Mpi) Maker used to run fine with 128 processors before this. >>> >>> The contigs are sorted descending by size in the genome file. I think >>>maker has processed the large ones and the problems it is having now >>>might have something to do with it running on smaller contigs. 
>>> >>> From looking at the error messages I thought at first the index file >>>of the genome might be corrupted, so I deleted it and let maker rebuild >>>it. This didn't fix the issue though. I have also set the path for >>>temporary files manually to make sure maker is not running out of >>>temporary space. >>> >>> Any idea how to overcome this?. >>> >>> Cheers, >>> Michael. >>> >>> P.S.: A typical error message I'm getting is this: >>> >>> --Next Contig-- >>> >>> [blastall] FATAL ERROR: search cannot proceed due to errors in all >>>contexts/frames of query sequences >>> running blast search. >>> #--------- command -------------# >>> Widget::blastx: >>> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins >>>%2Efasta.mpi.10.0 >>> -i >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank16/LSal >>>Atl2s8087.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T >>>-I T -o /n >>> >>>fs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/make >>>r_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/A2/0B/LS >>>alAtl2s8087// >>> >>>theVoid.LSalAtl2s8087/LSalAtl2s8087.0.te_proteins%2Efasta.repeatrunner.t >>>emp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >>> #-------------------------------# >>> ERROR: Chunk failed at level:2, tier_type:0 >>> FAILED CONTIG:LSalAtl2s8083 >>> >>> doing blastx repeats >>> setting up GFF3 output and fasta chunks >>> doing blastx repeats >>> re reading repeat masker report. 
>>> >>>/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/ma >>>ker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/2C/53/ >>>LSalAtl2s8249//theVoid.LSalAtl2s8249/LSalAtl2s8249.0.all.rb.out >>> [blastall] FATAL ERROR: search cannot proceed due to errors in all >>>contexts/frames of query sequences >>> [blastall] FATAL ERROR: search cannot proceed due to errors in all >>>contexts/frames of query sequences >>> running blast search. >>> running blast search. >>> #--------- command -------------# >>> Widget::blastx: >>> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins >>>%2Efasta.mpi.10.0 -i >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank26/LSal >>>Atl2s8135.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T >>>-I T -o >>>/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/ma >>>ker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/EF/10/ >>>LSalAtl2s8135//theVoid.LSalAtl2s8135/LSalAtl2s8135.0.te_proteins%2Efasta >>>.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >>> #-------------------------------# >>> #--------- command -------------# >>> Widget::blastx: >>> /nfs/panda/ensemblgenomes/external/blast/bin/blastall -p blastx -d >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/te_proteins >>>%2Efasta.mpi.10.0 -i >>>/nfs/nobackup2/ensemblgenomes/mnuhn/maker/temp2/maker_u5Dl1K/rank19/LSal >>>Atl2s8119.0 -b 10000 -v 10000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T >>>-I T -o >>>/nfs/production/panda/ensemblgenomes/development/mnuhn/Sea_louse/test/ma >>>ker_final_assembly_III/LSalAtl2s.maker.output/LSalAtl2s_datastore/CA/2E/ >>>LSalAtl2s8119//theVoid.LSalAtl2s8119/LSalAtl2s8119.0.te_proteins%2Efasta >>>.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.0.repeatrunner >>> #-------------------------------# >>> [blastall] FATAL ERROR: search cannot proceed due to 
errors in all
>>>contexts/frames of query sequences
>>> #---------------------------------------------------------------------
>>> Now retrying the contig!!
>>> SeqID: LSalAtl2s8449
>>> Length: 2187
>>> Tries: 18!!
>>> #---------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Carson.Holt at oicr.on.ca Thu Apr 4 11:29:24 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Thu, 4 Apr 2013 17:29:24 +0000
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
Message-ID:

Since you are using 12-core nodes (hyperthreaded cores are virtual, so you still only have 12 cores of power, not 24) and your performance curve drops off at 12, I'm thinking there is a possibility that the other processes did not start on a separate node. Try launching the Linux command 'hostname' the same way you are launching maker. If all 24 lines of output from hostname show the same host, then maker is only being launched on a single node. Since there are really only 12 cores (not 24), you would then not see any significant performance improvement above 12: each process above 12 reduces the power allocated to the remaining processes. So the difference from 12 to 24 (~25% performance gain) is just what can be gained from process saturation (not all maker processes are always at 100% CPU usage because of calls to IO, so adding a few more processes than you have CPU cores sometimes runs a little faster).
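
The 'hostname' check described above can be sketched in a few lines of Python. This is only an illustration and not part of MAKER. It assumes you have captured the output of running hostname through the same launcher you use for maker (for example, something like "mpiexec -n 24 hostname > hosts.txt"; the launcher name, the process count, and the file name are assumptions that depend on your MPI and queue setup) and it simply counts how many processes landed on each node. The default of 12 physical cores per node is the figure given in this thread; the function names are hypothetical.

```python
from collections import Counter


def count_processes_per_host(lines):
    """Count how many launched processes reported each hostname.

    `lines` is any iterable of strings, one hostname per process,
    e.g. an open file holding the captured output. Blank lines are ignored.
    """
    return Counter(line.strip() for line in lines if line.strip())


def summarize(counts, physical_cores_per_node=12):
    """Return a list of warnings about the process placement.

    physical_cores_per_node=12 matches the nodes discussed in the thread
    (an assumption for any other hardware).
    """
    findings = []
    if len(counts) == 1:
        findings.append("all processes ran on a single node")
    for host, n in sorted(counts.items()):
        if n > physical_cores_per_node:
            findings.append(
                f"{host}: {n} processes on {physical_cores_per_node} physical cores"
            )
    return findings


# Small in-memory demonstration with made-up host names.
demo_output = ["node01\n"] * 24
demo_counts = count_processes_per_host(demo_output)
for host, n in sorted(demo_counts.items()):
    print(f"{host}\t{n}")
for finding in summarize(demo_counts):
    print("WARNING:", finding)
```

With real output you would pass the open file instead of the demo list, e.g. count_processes_per_host(open("hosts.txt")). If every process reports the same host, the job is not being spread across nodes.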
Thanks,
Carson

From: Ramón Fallon
Date: Thursday, 4 April, 2013 1:03 PM
To: "maker-devel at yandell-lab.org"
Subject: second maker2 benchmark, this time, on a cluster

Hi

I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes; this time I aimed for 48. In contrast to the last 12-core speed check, the target hardware was a computer cluster with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 copies of the dpp_contig.fasta sequence in one file, each with a different name).

The nodes in the cluster are (again) HP ProLiant SL390 with two Intel X5675 @ 3.07GHz, this time with only 48GB RAM and 1TB local disk, running CentOS 6.2 with (as before) a 2.6.32 Linux kernel. A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via InfiniBand quadspeed and, because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed.

Results were:

#processes  time(secs)  Megabases/hr
 1          6585.00      2.20
 2          7137.00      2.03
 4          2479.00      5.84
 8          1088.00     13.30
10           866.00     16.71
12           715.00     20.24
14           666.00     21.72
16           651.00     22.22
18           613.00     23.60
24           559.00     25.88

Graph is attached to this mail. Some notes:
* A free queue on the Gridengine was used, so there was no load on these nodes during the runs. Two nodes are available on this queue, giving a max of 48 simultaneous processes.
* Some process counts (6, 20, etc.) were deleted because I couldn't guarantee "no load" conditions during those runs, and I had one or two anomalies, so I'd rather not include them right now. However, I expect them to be in line with the other results.
* In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24.
Because this is also the max number of processes per node, it's possible that the interconnects between the nodes had something to do with runs of more than 24 processes being inconsistent. However, that's not usually an issue in other programs, because quadspeed (40Gbit/s) is already a fairly fast interconnect.
* Process runs 26, 28, and 30 would almost, but not quite, finish (just a few sequences unfinished). Beyond that number, the analysis would hardly get off the ground, seeming to get stuck at the RepeatMasker phase. I suppose this is our main concern at the moment: that we can't speed up beyond 24 processes.

Cheers / Ramón.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Carson.Holt at oicr.on.ca Thu Apr 4 11:40:48 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Thu, 4 Apr 2013 17:40:48 +0000
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
Message-ID:

One more thought. If the 26, 28, and 30 process jobs are failing, this could also be because they are not starting across nodes correctly (all end up on the same node). You would then start to run into memory problems and the job would freeze. So validating the proper cross-node launch of MPI using the 'hostname' command is still probably the first thing to do.

--Carson

From: Carson Holt
Date: Thursday, 4 April, 2013 1:29 PM
To: Ramón Fallon, "maker-devel at yandell-lab.org"
Subject: Re: second maker2 benchmark, this time, on a cluster

Since you are using 12-core nodes (hyperthreaded cores are virtual, so you still only have 12 cores of power, not 24) and your performance curve drops off at 12, I'm thinking there is a possibility that the other processes did not start on a separate node. Try launching the Linux command 'hostname' the same way you are launching maker. If all 24 lines of output from hostname show the same host, then maker is only being launched on a single node.
Then, since there are really only 12 cores (not 24), you would not see any significant performance improvement above 12. So each process above 12 will reduce the power allocated to the remaining processes. So the difference from 12 to 24 (~25% performance gain) is just what can be gained from process saturation (not all maker processes are always at 100% CPU usage because of calls to IO, so adding a few more processes than you have CPU cores sometimes runs a little faster).

Thanks,
Carson

From: Ramón Fallon
Date: Thursday, 4 April, 2013 1:03 PM
To: "maker-devel at yandell-lab.org"
Subject: second maker2 benchmark, this time, on a cluster

Hi

I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes; this time I aimed for 48. In contrast to the last 12-core speed check, the target hardware was a computer cluster with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 copies of the dpp_contig.fasta sequence in one file, each with a different name).

The nodes in the cluster are (again) HP ProLiant SL390 with two Intel X5675 @ 3.07GHz, this time with only 48GB RAM and 1TB local disk, running CentOS 6.2 with (as before) a 2.6.32 Linux kernel. A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via InfiniBand quadspeed and, because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed.

Results were:

#processes  time(secs)  Megabases/hr
 1          6585.00      2.20
 2          7137.00      2.03
 4          2479.00      5.84
 8          1088.00     13.30
10           866.00     16.71
12           715.00     20.24
14           666.00     21.72
16           651.00     22.22
18           613.00     23.60
24           559.00     25.88

Graph is attached to this mail. Some notes:
* A free queue on the Gridengine was used, so there was no load on these nodes during the runs. Two nodes are available on this queue, giving a max of 48 simultaneous processes.
* Some process counts (6, 20, etc.) were deleted because I couldn't guarantee "no load" conditions during those runs, and I had one or two anomalies, so I'd rather not include them right now. However, I expect them to be in line with the other results.
* In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24. Because this is also the max number of processes per node, it's possible that the interconnects between the nodes had something to do with runs of more than 24 processes being inconsistent. However, that's not usually an issue in other programs, because quadspeed (40Gbit/s) is already a fairly fast interconnect.
* Process runs 26, 28, and 30 would almost, but not quite, finish (just a few sequences unfinished). Beyond that number, the analysis would hardly get off the ground, seeming to get stuck at the RepeatMasker phase. I suppose this is our main concern at the moment: that we can't speed up beyond 24 processes.

Cheers / Ramón.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ramonfallon at gmail.com Thu Apr 4 11:03:43 2013
From: ramonfallon at gmail.com (Ramón Fallon)
Date: Thu, 4 Apr 2013 19:03:43 +0200
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
Message-ID:

Hi

I've done another of my own benchmarks with the Maker2 svn (rev 1017) code. Last time I went up to 12 processes; this time I aimed for 48. In contrast to the last 12-core speed check, the target hardware was a computer cluster with the Gridengine queue manager. The same data set of 4.019 megabases was used as before (125 copies of the dpp_contig.fasta sequence in one file, each with a different name).

The nodes in the cluster are (again) HP ProLiant SL390 with two Intel X5675 @ 3.07GHz, this time with only 48GB RAM and 1TB local disk, running CentOS 6.2 with (as before) a 2.6.32 Linux kernel.
A marked difference is that Maker2 was launched from an NFS3 shared home directory, although the /tmp directories are local to the process running on each node. Nodes are interconnected via InfiniBand quadspeed and, because of hyperthreading, can offer 24 "process-cores" to a job. No overlap between runs was allowed.

Results were:

#processes  time(secs)  Megabases/hr
 1          6585.00      2.20
 2          7137.00      2.03
 4          2479.00      5.84
 8          1088.00     13.30
10           866.00     16.71
12           715.00     20.24
14           666.00     21.72
16           651.00     22.22
18           613.00     23.60
24           559.00     25.88

Graph is attached to this mail. Some notes:
* A free queue on the Gridengine was used, so there was no load on these nodes during the runs. Two nodes are available on this queue, giving a max of 48 simultaneous processes.
* Some process counts (6, 20, etc.) were deleted because I couldn't guarantee "no load" conditions during those runs, and I had one or two anomalies, so I'd rather not include them right now. However, I expect them to be in line with the other results.
* In general the graph shows more consistent performance than last time, but unfortunately I got incomplete runs after processes=24. Because this is also the max number of processes per node, it's possible that the interconnects between the nodes had something to do with runs of more than 24 processes being inconsistent. However, that's not usually an issue in other programs, because quadspeed (40Gbit/s) is already a fairly fast interconnect.
* Process runs 26, 28, and 30 would almost, but not quite, finish (just a few sequences unfinished). Beyond that number, the analysis would hardly get off the ground, seeming to get stuck at the RepeatMasker phase. I suppose this is our main concern at the moment: that we can't speed up beyond 24 processes.

Cheers / Ramón.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 48proc.png
Type: image/png
Size: 24644 bytes
Desc: not available
URL:

From ramonfallon at gmail.com Fri Apr 5 10:00:53 2013
From: ramonfallon at gmail.com (Ramón Fallon)
Date: Fri, 5 Apr 2013 18:00:53 +0200
Subject: [maker-devel] second maker2 benchmark, this time, on a cluster
In-Reply-To:
References:
Message-ID:

Thanks for the replies, Carson.

Our cluster has got busy all of a sudden, so I have to wait a bit to do the hostname test. However, I'm fairly sure (not 100%, mind you) that when the process number is over 24 it will run the extra processes on a separate node, and so do a proper cross-node launch.

On Thu, Apr 4, 2013 at 7:40 PM, Carson Holt wrote:

> One more thought. If 26,28, and 30 process jobs are failing this could
> also be because they are not starting across nodes correctly (all end up on
> the same node). You would then start to run into memory problems and the
> job would freeze. So validating the proper cross node launch of MPI using
> the 'hostname' command is still probably the first thing to do.
>
> --Carson
>
> From: Carson Holt
> Date: Thursday, 4 April, 2013 1:29 PM
> To: Ramón Fallon, "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>
> Subject: Re: second maker2 benchmark, this time, on a cluster
>
> Since you are using 12 core nodes (hyperthreaded cores are virtual, so
> you still only have 12 cores of power not 24) and your performance curve
> drops off at 12, I'm thinking there is a possibility that the other
> processes did not start on a separate node. Try launching the Linux
> command 'hostname' the same way you are launching maker. If all 24 lines
> of output from hostname have the same host, then maker is only getting
> launched on a single node. Then since there are really only 12 cores (not
> 24) you would not see any significant performance improvement above 12. So
> each process above 12 will reduce the power allocated to remaining
> processes.
So the difference from 12 to 24 (~25% performance gain) is just
> what can be gained from process saturation (not all maker processes are
> always at 100% cpu usage because of calls to IO so adding a few more
> processes than you have cpu cores sometimes runs a little faster).
>
> Thanks,
> Carson
>
> From: Ramón Fallon
> Date: Thursday, 4 April, 2013 1:03 PM
> To: "maker-devel at yandell-lab.org"
> Subject: second maker2 benchmark, this time, on a cluster
>
> Hi
>
> I've done another of my own benchmarks with the Maker2 svn (rev 1017)
> code. Last time I went up to 12 processes, this time I aimed for 48. In
> contrast to the last 12 core speed check, the target hardware was a
> computer cluster, with the Gridengine queue manager. The same data set of
> 4.019 megabases was used as before (125 times the dpp_contig.fasta sequence
> in one file with different names).
>
> The nodes in the cluster are (again) HP Proliant SL390 with two Intel
> X5675 @ 3.07GHz, with this time only 48GB RAM and 1TB local disk running
> Centos 6.2 with (as before) 2.6.32 linux kernel. A marked difference is
> that Maker2 was launched from an NFS3 shared home directory, although the
> /tmp directories are local to the process running on each node. Nodes are
> interconnected via infiniband quadspeed, and because of hyperthreading, can
> offer 24 "process-cores" to a job. No overlap between runs was allowed.
>
> Results were:
> #processes  time(secs)  Megabases/hr
>  1          6585.00      2.20
>  2          7137.00      2.03
>  4          2479.00      5.84
>  8          1088.00     13.30
> 10           866.00     16.71
> 12           715.00     20.24
> 14           666.00     21.72
> 16           651.00     22.22
> 18           613.00     23.60
> 24           559.00     25.88
>
> Graph is attached to this mail. Some notes:
> * A free queue on the gridengine were used so there was no load on these
> nodes when run. Two nodes are available on this queue, giving a max of 48
> simultaneous processes.
> * Some processor number (6,20, etc) were deleted because I couldn't
> guarantee "No load" conditions during those runs, and I had one or two
> anomalies so I'd rather not include them right now. However, I expect them
> to be in line with the other results.
> * In general the graph shows more consistent performance than last time,
> but unfortunately I got incomplete runs after processes=24. Because this is
> also the max number of processes per node, it's possible that interconnects
> between the nodes had something to do with runs > 24 processes being
> inconsistent, however, it's not usually an issue in other programs because
> quadspeed (40Gbit/s) is already a fairly fast interconnect.
> * Process runs 26,28, and 30 would almost - but not quite - finish (just a
> few sequences unfinished), But after this number, the analysis would hardly
> get off the ground, seeming to get stuck at Repeatmasker phase. I suppose
> this is our main concern at the moment, that we can't speed up beyond 24
> processes.
>
> Cheers / Ramón.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kangyangjae at gmail.com Sat Apr 6 01:25:40 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Sat, 6 Apr 2013 16:25:40 +0900
Subject: [maker-devel] CDS retrieve from augustus_masked
Message-ID: <145c01ce3297$f318eab0$d94ac010$@gmail.com>

Dear everyone!

I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information.

Thank you!

Kang, Yang Jae
Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From mike.thon at gmail.com Sat Apr 6 05:20:16 2013 From: mike.thon at gmail.com (Michael Thon) Date: Sat, 6 Apr 2013 13:20:16 +0200 Subject: [maker-devel] CDS retrieve from augustus_masked In-Reply-To: <145c01ce3297$f318eab0$d94ac010$@gmail.com> References: <145c01ce3297$f318eab0$d94ac010$@gmail.com> Message-ID: Hi Kang - After running fasta_merge there should be a file: [prefix].all.maker.augustus_masked.transcripts.fasta in the output directory. Is that what you need? Mike On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" wrote: > Dear everyone! > > I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information. > > Thank you! > > Kang, Yang Jae > Ph.D. > Cropgenomics Lab. > College of Agriculture and Life Science > Seoul National University > Korea > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From kangyangjae at gmail.com Sat Apr 6 05:24:31 2013 From: kangyangjae at gmail.com (Kang, Yang Jae) Date: Sat, 6 Apr 2013 20:24:31 +0900 Subject: [maker-devel] CDS retrieve from augustus_masked In-Reply-To: References: <145c01ce3297$f318eab0$d94ac010$@gmail.com> Message-ID: <148d01ce32b9$51407380$f3c15a80$@gmail.com> Thank for your quick response Mike I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file? 
Thank you

From: Michael Thon [mailto:mike.thon at gmail.com]
Sent: Saturday, April 06, 2013 8:20 PM
To: Kang, Yang Jae
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] CDS retrieve from augustus_masked

Hi Kang - After running fasta_merge there should be a file:

[prefix].all.maker.augustus_masked.transcripts.fasta

in the output directory. Is that what you need?
Mike

On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" wrote:

Dear everyone!

I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information.

Thank you!

Kang, Yang Jae
Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Sat Apr 6 07:54:15 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 09:54:15 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <148d01ce32b9$51407380$f3c15a80$@gmail.com>
Message-ID:

It's all CDS, from start to finish. There is never any UTR in the ab initio reference match/match_part alignments. There are two reasons for this. First, most ab initio predictors don't produce UTR. Second, GFF3 has no is_analysis flag, so it is impossible to separate final gene models from predicted gene models if they are both in the form gene/mRNA/exon/CDS. Augustus can predict UTR, but given the limitation just mentioned, if I reject the model, I have to trim it before adding it to the reference information. We've actually been in discussion with the apollo development group over this limitation.
Original apollo found the same limitation, so they make the same assumption for loading data into the browsing window (gene/mRNA/exon/CDS features always go in the middle annotation track and everything else goes in the reference evidence track). With the new web apollo, we're working on getting the default behavior to allow UTR in the gene predictions by using the SO predicted gene term in the GFF3 (which previously wasn't available for use in apollo and maker).

So in summary: nothing but CDS for now, but the sequence will include UTR when available in the near future.

Thanks,
Carson

From: "Kang, Yang Jae"
Date: Saturday, 6 April, 2013 7:24 AM
To: 'Michael Thon'
Cc:
Subject: Re: [maker-devel] CDS retrieve from augustus_masked

Thank for your quick response Mike
I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file?

Thank you

From: Michael Thon [mailto:mike.thon at gmail.com]
Sent: Saturday, April 06, 2013 8:20 PM
To: Kang, Yang Jae
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] CDS retrieve from augustus_masked

Hi Kang - After running fasta_merge there should be a file:

[prefix].all.maker.augustus_masked.transcripts.fasta

in the output directory. Is that what you need?
Mike

On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" wrote:

Dear everyone!

I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information.

Thank you!

Kang, Yang Jae
Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science Seoul National University Korea _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike.thon at gmail.com Sat Apr 6 08:37:28 2013 From: mike.thon at gmail.com (Michael Thon) Date: Sat, 6 Apr 2013 16:37:28 +0200 Subject: [maker-devel] CDS retrieve from augustus_masked In-Reply-To: <148d01ce32b9$51407380$f3c15a80$@gmail.com> References: <145c01ce3297$f318eab0$d94ac010$@gmail.com> <148d01ce32b9$51407380$f3c15a80$@gmail.com> Message-ID: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com> Thats a good point because 'transcripts' implies that it would have the UTRs. Does augustus predict the UTRs? I manually checked the translations of the .transcript. file and I only found valid translations but that does not mean that UTRs could not be present... On Apr 6, 2013, at 1:24 PM, "Kang, Yang Jae" wrote: > Thank for your quick response Mike > I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file? > > Thank you > > > From: Michael Thon [mailto:mike.thon at gmail.com] > Sent: Saturday, April 06, 2013 8:20 PM > To: Kang, Yang Jae > Cc: maker-devel at yandell-lab.org > Subject: Re: [maker-devel] CDS retrieve from augustus_masked > > Hi Kang - After running fasta_merge there should be a file: > > [prefix].all.maker.augustus_masked.transcripts.fasta > > in the output directory. Is that what you need? > Mike > > On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" wrote: > > > Dear everyone! 
> > I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information. > > Thank you! > > Kang, Yang Jae > Ph.D. > Cropgenomics Lab. > College of Agriculture and Life Science > Seoul National University > Korea > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Sat Apr 6 09:13:16 2013 From: carsonhh at gmail.com (Carson Holt) Date: Sat, 06 Apr 2013 11:13:16 -0400 Subject: [maker-devel] CDS retrieve from augustus_masked In-Reply-To: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com> Message-ID: Augustus only predicts UTR for a handful of organisms. I trim them off the rejected models before outputting to the GFF3 as match/match_part features (per my previous e-mail concerning the limitations of GFF3). --Carson From: Michael Thon Date: Saturday, 6 April, 2013 10:37 AM To: "Kang, Yang Jae" Cc: Subject: Re: [maker-devel] CDS retrieve from augustus_masked Thats a good point because 'transcripts' implies that it would have the UTRs. Does augustus predict the UTRs? I manually checked the translations of the .transcript. file and I only found valid translations but that does not mean that UTRs could not be present... On Apr 6, 2013, at 1:24 PM, "Kang, Yang Jae" wrote: > Thank for your quick response Mike > I looked the file named transcript, but it might include UTRs I suspect. What > I want to do is calculating Ka Ks values so that I need coding sequences. Is > there any indication where is exact START and STOP in the transcript file? 
> > Thank you > > > From: Michael Thon [mailto:mike.thon at gmail.com ] > Sent: Saturday, April 06, 2013 8:20 PM > To: Kang, Yang Jae > Cc: maker-devel at yandell-lab.org > Subject: Re: [maker-devel] CDS retrieve from augustus_masked > > Hi Kang - After running fasta_merge there should be a file: > > [prefix].all.maker.augustus_masked.transcripts.fasta > > in the output directory. Is that what you need? > Mike > > On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" wrote: > > > Dear everyone! > > I want to retrieve CDS sequences from the output of maker; however, in the > augustus_masked feature there is no indication of CDS or Exon like maker > features. Is there any way for me to retrieve CDS from augustus_masked? There > were protein sequences in outdir but no CDS information. > > Thank you! > > Kang, Yang Jae > Ph.D. > Cropgenomics Lab. > College of Agriculture and Life Science > Seoul National University > Korea > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From kangyangjae at gmail.com Sat Apr 6 12:45:02 2013 From: kangyangjae at gmail.com (Kang, Yang Jae) Date: Sun, 7 Apr 2013 03:45:02 +0900 Subject: [maker-devel] CDS retrieve from augustus_masked In-Reply-To: References: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com> Message-ID: <14df01ce32f6$db2a9e30$917fda90$@gmail.com> Thank you for quick response again! I found the non-ATG starting sequences in transcript file. I thought this would be the UTR traces, and I additionally found the offset value some position after '>' letter. Is that indicate the starting ATG? 
Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files? The reason why I'm asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta. Thank you From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Sunday, April 07, 2013 12:13 AM To: Michael Thon; Kang, Yang Jae Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] CDS retrieve from augustus_masked Augustus only predicts UTR for a handful of organisms. I trim them off the rejected models before outputting to the GFF3 as match/match_part features (per my previous e-mail concerning the limitations of GFF3). --Carson From: Michael Thon Date: Saturday, 6 April, 2013 10:37 AM To: "Kang, Yang Jae" Cc: Subject: Re: [maker-devel] CDS retrieve from augustus_masked Thats a good point because 'transcripts' implies that it would have the UTRs. Does augustus predict the UTRs? I manually checked the translations of the .transcript. file and I only found valid translations but that does not mean that UTRs could not be present... On Apr 6, 2013, at 1:24 PM, "Kang, Yang Jae" wrote: Thank for your quick response Mike I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file? Thank you From: Michael Thon [mailto:mike.thon at gmail.com] Sent: Saturday, April 06, 2013 8:20 PM To: Kang, Yang Jae Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] CDS retrieve from augustus_masked Hi Kang - After running fasta_merge there should be a file: [prefix].all.maker.augustus_masked.transcripts.fasta in the output directory. Is that what you need? Mike On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" < kangyangjae at gmail.com> wrote: Dear everyone! 
I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information.

Thank you!

Kang, Yang Jae
Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From barry.utah at gmail.com Sat Apr 6 14:50:29 2013
From: barry.utah at gmail.com (Barry Moore)
Date: Sat, 6 Apr 2013 14:50:29 -0600
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
References: <1E30F6C6-753C-4397-AE1E-70C034976C37@gmail.com> <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
Message-ID: <3B421D2C-590D-4593-8FA5-3CAA10A19FD4@genetics.utah.edu>

On Apr 6, 2013, at 12:45 PM, Kang, Yang Jae wrote:

> Thank you for quick response again!
>
> I found the non-ATG starting sequences in transcript file. I thought this would be the UTR traces, and

The gene predictors will occasionally produce a transcript with no start/stop codon. Set always_complete=1 in maker_opts.ctl to get MAKER to try hard to force a start/stop codon.

> I additionally found the offset value some position after '>' letter. Is that indicate the starting ATG?

I didn't really understand that question...

> Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files?
augustus_masked is a file that contains proteins of all predictions made by Augustus when working on masked sequence. Setting unmask=1 in maker_opts.ctl would instruct MAKER to also run the gene predictors on unmasked sequence, and then you'd have an augustus_unmasked file for those predictions. The non_overlapping_ab_initio files contain proteins predicted by all gene predictors for which MAKER could not find protein/RNA evidence, so they are unsupported by physical evidence. These unsupported predictions are not promoted by MAKER into annotations in its final output, but they are included in these files in case you want to work with them. The non_overlapping part of the name means that if multiple gene predictors produce overlapping unsupported ab initio predictions, then MAKER will only output one of them.

> The reason why I'm asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

Yes, the proteins for genes for which MAKER creates annotations will be in both files.

>
> Thank you
>
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Sunday, April 07, 2013 12:13 AM
> To: Michael Thon; Kang, Yang Jae
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] CDS retrieve from augustus_masked
>
> Augustus only predicts UTR for a handful of organisms. I trim them off the rejected models before outputting to the GFF3 as match/match_part features (per my previous e-mail concerning the limitations of GFF3).
>
> --Carson
>
> From: Michael Thon
> Date: Saturday, 6 April, 2013 10:37 AM
> To: "Kang, Yang Jae"
> Cc:
> Subject: Re: [maker-devel] CDS retrieve from augustus_masked
>
> Thats a good point because 'transcripts' implies that it would have the UTRs. Does augustus predict the UTRs? I manually checked the translations of the .transcript. file and I only found valid translations but that does not mean that UTRs could not be present...
> On Apr 6, 2013, at 1:24 PM, "Kang, Yang Jae" wrote: > > > Thank for your quick response Mike > I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file? > > Thank you > > > From: Michael Thon [mailto:mike.thon at gmail.com] > Sent: Saturday, April 06, 2013 8:20 PM > To: Kang, Yang Jae > Cc: maker-devel at yandell-lab.org > Subject: Re: [maker-devel] CDS retrieve from augustus_masked > > Hi Kang - After running fasta_merge there should be a file: > > [prefix].all.maker.augustus_masked.transcripts.fasta > > in the output directory. Is that what you need? > Mike > > On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" wrote: > > > > Dear everyone! > > I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information. > > Thank you! > > Kang, Yang Jae > Ph.D. > Cropgenomics Lab. > College of Agriculture and Life Science > Seoul National University > Korea > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. 
of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Sat Apr 6 15:00:19 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Sat, 06 Apr 2013 17:00:19 -0400
Subject: [maker-devel] CDS retrieve from augustus_masked
In-Reply-To: <14df01ce32f6$db2a9e30$917fda90$@gmail.com>
Message-ID:

I additionally found the offset value some position after '>' letter. Is that indicate the starting ATG?

> Only maker.transcripts.fasta will have offsets other than 0; you can
> use these to get the translation offset. All other *.transcripts.fasta files
> will always have an offset of 0 for the reason previously mentioned. Some
> genes will not start with ATG or have stop codons. These are partial models.
> Set always_complete=1 to reduce these.

Secondly, there is several files named *.augustus_masked.proteins.fasta, *.non_overlapping_ab_initio.proteins.fasta, and *.proteins.fasta. What is the criteria of splitting those files?

> Final selected annotations go in the maker.proteins.fasta and
> maker.transcripts.fasta files. Raw unfiltered ab initio predictions from
> augustus go in the augustus_masked.proteins.fasta and
> augustus_masked.transcripts.fasta files (these are for reference purposes). A
> set of non-redundant rejected models goes in the
> non_overlapping_ab_initio.transcripts.fasta and
> non_overlapping_ab_initio.proteins.fasta files (if you are missing a gene you
> expected to find, look in these files first; you can add the models back if
> you find protein domains in them, for example).

The reason why I'm asking is that some genes were redundant between *.augustus_masked.proteins.fasta and *.proteins.fasta.

> This is because some of the augustus generated models made it into the final
> annotation set.
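
For readers who want to check the offset and partial-model points above on their own files, here is a minimal sketch. It assumes that each FASTA header carries an `offset:N` token (if the token is absent, 0 is used) and that the offset marks where the coding sequence begins within the transcript. The names `read_fasta` and `check_model` are hypothetical helpers, not part of MAKER.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_model(header, seq):
    """Report whether a transcript starts with ATG at its offset.

    Assumption: the header contains 'offset:N'; if it does not, 0 is used.
    'ends_with_stop' is only meaningful for CDS-only files (offset 0, no
    3' UTR), because a 3' UTR would follow the stop codon.
    """
    match = re.search(r"offset:(\d+)", header)
    offset = int(match.group(1)) if match else 0
    seq = seq.upper()
    return {
        "id": header.split()[0] if header.strip() else "",
        "offset": offset,
        "has_start": seq[offset:offset + 3] == "ATG",
        "ends_with_stop": seq[-3:] in STOP_CODONS,
    }
```

Models for which `has_start` is false are candidates for the partial models described above. Whether to rerun with always_complete=1 or to filter them afterwards is a judgment call that depends on the downstream analysis (for Ka/Ks, complete coding sequences are usually what is wanted).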
> > Thanks, Carson From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Sunday, April 07, 2013 12:13 AM To: Michael Thon; Kang, Yang Jae Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] CDS retrieve from augustus_masked Augustus only predicts UTR for a handful of organisms. I trim them off the rejected models before outputting to the GFF3 as match/match_part features (per my previous e-mail concerning the limitations of GFF3). --Carson From: Michael Thon Date: Saturday, 6 April, 2013 10:37 AM To: "Kang, Yang Jae" Cc: Subject: Re: [maker-devel] CDS retrieve from augustus_masked Thats a good point because 'transcripts' implies that it would have the UTRs. Does augustus predict the UTRs? I manually checked the translations of the .transcript. file and I only found valid translations but that does not mean that UTRs could not be present... On Apr 6, 2013, at 1:24 PM, "Kang, Yang Jae" wrote: Thank for your quick response Mike I looked the file named transcript, but it might include UTRs I suspect. What I want to do is calculating Ka Ks values so that I need coding sequences. Is there any indication where is exact START and STOP in the transcript file? Thank you From: Michael Thon [mailto:mike.thon at gmail.com ] Sent: Saturday, April 06, 2013 8:20 PM To: Kang, Yang Jae Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] CDS retrieve from augustus_masked Hi Kang - After running fasta_merge there should be a file: [prefix].all.maker.augustus_masked.transcripts.fasta in the output directory. Is that what you need? Mike On Apr 6, 2013, at 9:25 AM, "Kang, Yang Jae" > wrote: Dear everyone! I want to retrieve CDS sequences from the output of maker; however, in the augustus_masked feature there is no indication of CDS or Exon like maker features. Is there any way for me to retrieve CDS from augustus_masked? There were protein sequences in outdir but no CDS information. Thank you! Kang, Yang Jae Ph.D. Cropgenomics Lab. 
College of Agriculture and Life Science Seoul National University Korea _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From xzhang at genome.wustl.edu Wed Apr 10 10:30:38 2013 From: xzhang at genome.wustl.edu (xu zhang) Date: Wed, 10 Apr 2013 11:30:38 -0500 Subject: [maker-devel] genemark .mod file for yeast In-Reply-To: References: Message-ID: <516593AE.8000909@genome.wustl.edu> Hi All, Does anybody have genemark .mod file for yeast? I tried to create my own model file using this command" gm_es.pl S288C_reference_sequence_R64-1-1_20110203.fsa", where the sequence was downloaded from ncbi". it failed with this error " warning, error in input file format: -3 error reading parameter BRANCH_MAT error in model file /gscmnt/gc2124/info/annotation/personal_dir/xzhang/yeast/s_cerevisiae/genemark/training2/mod/es.mod Error on system: prediction step" and "Error: unknown line format". and I tried the sample file(pythium_ultimum_scaffolds.fasta) from Carson. a mod file was created, although it also had some error information " warning, error in input file format: -13 5654 dna.fa.good.gb.acc.ph2 first order for ACC 2 Error: unknown line format GC% ntron". any suggestion and comments are appreciated Thanks, Xu From xzhang at genome.wustl.edu Fri Apr 12 06:47:08 2013 From: xzhang at genome.wustl.edu (xu zhang) Date: Fri, 12 Apr 2013 07:47:08 -0500 Subject: [maker-devel] genemark .mod file for yeast In-Reply-To: <516593AE.8000909@genome.wustl.edu> References: <516593AE.8000909@genome.wustl.edu> Message-ID: <5168024C.9040808@genome.wustl.edu> I know how to do that. 
I tried different initial mod file and it worked on my sequences with org_S1_55.0mtx initial mod. I don't know why. if somebody knows, please let me know. Thanks, Xu On 04/10/2013 11:30 AM, xu zhang wrote: > Hi All, > > Does anybody have genemark .mod file for yeast? I tried to create my > own model file using this command" gm_es.pl > S288C_reference_sequence_R64-1-1_20110203.fsa", where the sequence was > downloaded from ncbi". it failed with this error " > warning, error in input file format: > -3 > error reading parameter BRANCH_MAT > error in model file > /gscmnt/gc2124/info/annotation/personal_dir/xzhang/yeast/s_cerevisiae/genemark/training2/mod/es.mod > Error on system: prediction step" and "Error: unknown line format". > > and I tried the sample file(pythium_ultimum_scaffolds.fasta) from > Carson. a mod file was created, although it also had some error > information > " warning, error in input file format: > -13 > 5654 dna.fa.good.gb.acc.ph2 > first order for ACC 2 > Error: unknown line format > GC% ntron". > > any suggestion and comments are appreciated > > Thanks, > Xu > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From jason.stajich at gmail.com Fri Apr 12 09:48:53 2013 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 12 Apr 2013 08:48:53 -0700 Subject: [maker-devel] genemark .mod file for yeast In-Reply-To: <5168024C.9040808@genome.wustl.edu> References: <516593AE.8000909@genome.wustl.edu> <5168024C.9040808@genome.wustl.edu> Message-ID: <256F7975-9744-4A53-974F-B92B0179A5B2@gmail.com> Did you email the genemark authors? They would be a better source for help. I experienced the same problems with the yeast data to train from and didn't use genemark for those species - it may be that it is expecting more introns and the files for training are empty on some rounds. 
Jason On Apr 12, 2013, at 5:47 AM, xu zhang wrote: > I know how to do that. I tried different initial mod file and it worked on my sequences with org_S1_55.0mtx initial mod. I don't know why. if somebody knows, please let me know. > > Thanks, > Xu > > On 04/10/2013 11:30 AM, xu zhang wrote: >> Hi All, >> >> Does anybody have genemark .mod file for yeast? I tried to create my own model file using this command" gm_es.pl S288C_reference_sequence_R64-1-1_20110203.fsa", where the sequence was downloaded from ncbi". it failed with this error " >> warning, error in input file format: >> -3 >> error reading parameter BRANCH_MAT >> error in model file /gscmnt/gc2124/info/annotation/personal_dir/xzhang/yeast/s_cerevisiae/genemark/training2/mod/es.mod >> Error on system: prediction step" and "Error: unknown line format". >> >> and I tried the sample file(pythium_ultimum_scaffolds.fasta) from Carson. a mod file was created, although it also had some error information >> " warning, error in input file format: >> -13 >> 5654 dna.fa.good.gb.acc.ph2 >> first order for ACC 2 >> Error: unknown line format >> GC% ntron". >> >> any suggestion and comments are appreciated >> >> Thanks, >> Xu >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Jason Stajich jason.stajich at gmail.com jason at bioperl.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jteckert at gmail.com Sun Apr 14 15:07:33 2013 From: jteckert at gmail.com (James Eckert) Date: Sun, 14 Apr 2013 17:07:33 -0400 Subject: [maker-devel] Annotation quality and converting gff3 to gtf Message-ID: Hello, I'm currently trying to figure out ways to evaluate the quality of annotations that MAKER produces. I'm working on a novel species, so there isn't a reference genome to compare the annotation quality to. After doing a bit of searching on the web, I came across the EVAL tool, which I thought may be useful for checking the output quality. EVAL takes in gtf files, not gff3, however MAKER seems to have addressed this problem through its accessory scripts. I first used the script "gff3_merge" to have my whole annotation under one gff3 file. Next I used "add_utr_start_stop_gff". This would explicitly add the UTRs, which would be needed for converting the gff3 file to gtf. The problem arose when trying to run "gff3_to_eval_gtf". I was expecting MAKER to process the whole gff3 file, but it seems to have only processed 2 nodes. The same thing happens when running the "gff3_2_gtf" script. Here is the command I'm running, along with the output: gff3_to_eval_gtf assem_kmer_57_utr.gff3 NODE_20666_length_66353_cov_18.405483 maker CDS 8801 8984 . - 0 gene_id "1"; transcript_id "2"; NODE_20666_length_66353_cov_18.405483 maker CDS 8113 8717 . - 2 gene_id "1"; transcript_id "2"; My question is whether the "gff3_to_eval_gtf" and "gff3_2_gtf" scripts have a bug in them, or whether I'm just doing the process wrong? Perhaps if the conversion doesn't work, there exists an alternative to EVAL that works with native MAKER annotations? Attached is my whole genome gff3 file, along with the file I ran "gff3_to_eval_gtf" on. assem_kmer-57_exp-44_covcutoff-auto_contigs.all.gff3 assem_kmer_57_utr.gff3 Thank you in advance for your help, James -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From liuhuiquan at nwsuaf.edu.cn Tue Apr 16 02:16:34 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Tue, 16 Apr 2013 16:16:34 +0800
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
Message-ID:

An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Tue Apr 16 08:20:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 10:20:01 -0400
Subject: [maker-devel] Annotation quality and converting gff3 to gtf
In-Reply-To:
Message-ID:

The input GFF3 file you linked to only contains one gene. Is that correct? If so, then you should only get one gene in the output. The resulting GTF should only have the genes (ignoring all the evidence).

To convert for eval, use these command lines (note the flags, such as -g for gff3_merge so you are only looking at genes; the fasta must be included in the file, so no -n flag):

    gff3_merge -d maker_datastore_index.log -g -o some_file.gff
    add_utr_start_stop_gff some_file.gff > some_file2.gff
    maker2eval some_file2.gff

Note that all versions of MAKER after 2.09 no longer have add_utr_start_stop_gff; the UTR is now always there explicitly, so you go straight from gff3_merge to maker2eval_gtf.

However, with that explanation, I have to wonder if EVAL is appropriate for you. EVAL requires a reference annotation set (that is assumed to be 100% perfect) for comparison, and you get a perfect score whenever you call the genes exactly identical to the reference set (which in itself has obvious bias, but we won't get into that). Given that you have no reference set, it will not give you anything other than statistics for the distribution of intron and exon sizes.
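
The exon and intron size distributions mentioned above can also be pulled straight from a merged GFF3 file. The following is a minimal sketch, assuming exon features carry a `Parent` attribute naming their transcript and that the file may end with a `##FASTA` section; `exon_intron_lengths` is a hypothetical helper name, not a MAKER or EVAL function.

```python
from collections import defaultdict


def exon_intron_lengths(gff_lines):
    """Return (exon_lengths, intron_lengths) from GFF3 exon features.

    Assumption: each exon line has a Parent attribute; exons sharing a
    parent belong to one transcript, and introns are the gaps between
    consecutive exons once they are sorted by start coordinate.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))

    exon_lens, intron_lens = [], []
    for spans in exons.values():
        spans.sort()
        # GFF3 coordinates are 1-based and inclusive
        exon_lens.extend(end - start + 1 for start, end in spans)
        intron_lens.extend(nxt[0] - cur[1] - 1 for cur, nxt in zip(spans, spans[1:]))
    return exon_lens, intron_lens
```

The two lists can then be summarised with the `statistics` module or plotted as histograms.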

Alternative ways to assess quality without a reference genome are AED (computed for each gene as part of the MAKER run), which is basically a variation of EVAL-like statistics run against evidence clusters rather than a reference genome, or simply % domain content. See these links for examples of the statistics -->

http://www.biomedcentral.com/1471-2105/12/491
http://www.biomedcentral.com/1471-2105/10/67

Also, a figure is attached with an example of quality analysis using combined AED, domain content, and comparative orthologs.

--Carson

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: B563F1FF-1E85-42E3-B79D-F7F6449F1AE9.png
Type: image/png
Size: 227568 bytes
Desc: not available
URL:

From carsonhh at gmail.com  Tue Apr 16 09:34:44 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 11:34:44 -0400
Subject: [maker-devel] maker output
In-Reply-To: <1366084495.59030.YahooMailNeo@web164906.mail.bf1.yahoo.com>
Message-ID:

For AED in general, lower is better, but you have to understand the caveats. With mRNA-seq, not all genes may be expressed, not all exons may be captured (mRNA can fold, blocking some sequencing reactions), and sometimes the alignment may extend improperly into the intron or even merge into the neighboring gene. Also, mRNA-seq captures a lot of things that aren't coding genes. But in general for mRNA-seq, as coverage increases the AED values trend toward 0, and mRNA-seq is the single most informative piece of evidence you can get for annotation (I've seen several very poor genome assemblies with horrible annotations that were saved by mRNA-seq). For mRNA-seq, give MAKER the assembled reads (Trinity works well).

Also, for fungi, the UTRs tend to overlap between genes. This can create false merging in the mRNA-seq assemblies (their AED is lower, but it's a false merge). Use the correct_est_fusion option in the control files to help handle that.
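
Because a single AED value is hard to interpret on its own, it can help to look at the distribution across all transcripts, for example the fraction of models at or below some cutoff, and to compare that between runs. The following is a minimal Python sketch under one stated assumption: that MAKER stores the value as an `_AED=` tag in column 9 of its mRNA features. The function names, the example records, and the 0.25 cutoff are illustrative choices, not MAKER conventions.

```python
def aed_values(gff3_lines):
    """Collect the AED of every mRNA feature (assumes an `_AED=` tag in column 9)."""
    values = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # only embedded sequence follows this directive
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key == "_AED":
                values.append(float(value))
    return values


def fraction_at_or_below(values, cutoff):
    """Fraction of AED values that are <= cutoff (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(v <= cutoff for v in values) / len(values)


# Made-up records for illustration only (hypothetical IDs and coordinates).
example = [
    "contig1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1;_AED=0.06;_eAED=0.06",
    "contig1\tmaker\tmRNA\t1500\t2400\t.\t-\t.\tID=m2;Parent=g2;_AED=0.15;_eAED=0.20",
    "contig1\tmaker\tmRNA\t3000\t3900\t.\t+\t.\tID=m3;Parent=g3;_AED=0.40;_eAED=0.45",
    "contig1\tmaker\texon\t100\t900\t.\t+\t.\tID=e1;Parent=m1",
]
vals = aed_values(example)
print(len(vals), round(fraction_at_or_below(vals, 0.25), 3))
```

Comparing this fraction for a run with and without the mRNA-seq evidence gives a broader picture than one transcript's AED, keeping in mind the caveat above that a falsely merged model can also score a low AED.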

I know there are also several members of the MAKER mailing list who have extensive experience using mRNA-seq to annotate fungi and who may want to add their two cents.

Thanks,
Carson

From: Hud Hud
Reply-To: Hud Hud
Date: Monday, 15 April, 2013 11:54 PM
To: Carson Holt
Subject: maker output

Hi Carson, have a nice day.

I have a question about the output file from MAKER. Recently I ran my longest contigs (100 kb) through MAKER using RNA-seq data from a related species (same genus), and I've noticed that I managed to get expressed-sequence matches to the annotation, compared with using just EST and cDNA. Is this due to the different size of the dataset? I'm using a larger dataset when I incorporate the RNA-seq data (assembled transcripts combined with cDNA and EST).

The AED values for the predicted mRNA are 0.15 (with RNA-seq data) and 0.06 (without RNA-seq data). My question is which one is the more accurate prediction: can I just depend on the value of AED (the lower the better)? What about the incorporation of RNA-seq data in this case; can I conclude that RNA-seq improves the annotation (based on the image I attached)?

Thanks for your time, really appreciate it.

From dence at genetics.utah.edu  Tue Apr 16 09:52:07 2013
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 16 Apr 2013 15:52:07 +0000
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
In-Reply-To:
References:
Message-ID:

Hi Huiquan,

1) The default behavior for MAKER is that it will only annotate gene models when there is support from both the evidence (EST and protein alignments) and the ab-initio predictors. How many transcripts did you get from PASA? I expect there are about 254 sequences, which is about how many genes you annotated. If you want to get more gene models, then you need to supply more evidence.
For our annotation projects, we often use some derivation of Swiss-Prot, which is a hand-curated database of proteins across all kingdoms.

2) The non-overlapping ab-initio file includes ab-initio predictions that didn't overlap any gene models. If augustus and genemark predictions overlap, I think it should include both, but if one prediction completely covers the other, I think the longer of the two would be included.

Does that answer your questions?

Thanks,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
________________________________
From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of 刘慧泉 [liuhuiquan at nwsuaf.edu.cn]
Sent: Tuesday, April 16, 2013 2:16 AM
To: maker-devel at yandell-lab.org
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files

Hello maker users and developers,

I'm trying to annotate a small fungal genome using Maker-2.27-beta. For test purposes, I just used augustus and genemark for de novo gene prediction and supplied the PASA-assembled transcripts to the est option. When maker2 finished, I used the gff3_merge and fasta_merge scripts to extract the results. There were 5608, 6255, 5084, and 254 sequences in the resulting protein files: augustus_masked, genemark, non-overlapping ab initio, and maker, respectively.

My questions are:

1. Viewing the gff file produced by maker2, I found that most of the predicted gene loci have est matches. Why, then, did maker2 produce only 254 gene annotations?

2. In the 'non-overlapping ab initio' file, I found that the sequences are all from the augustus_masked prediction. Does the non-overlapping file only include the best gene models from those predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes?

Thanks in advance for any advice. I appreciate your help!

best,
Huiquan

the maker_opts.ctl file:

#-----Genome (these are always required)
genome=my_gnm.fa #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est=my_est.fa #set of ESTs or assembled mRNA-seq in fasta format
altest= #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closly relate species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein= #protein sequence file in fasta format (i.e. from mutiple oransisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=fungi #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=RepeatPeps.lib #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=my_ges.mod #GeneMark HMM file
augustus_species=my2 #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=14 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=20 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
split_hit=1500 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=200 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=1 #limits use of ESTs in annotation to avoid fusion genes
tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

--

From carsonhh at gmail.com  Tue Apr 16 10:01:27 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 16 Apr 2013 12:01:27 -0400
Subject: [maker-devel] *maker.proteins and *non_overlapping_ab_initio.proteins files
In-Reply-To:
Message-ID:

1. by view the gff file produced by maker2, I have found most of the predicted gene loci have est matches. but why only 254 gene annotations got by maker2 ?

>> I'd really have to see the results to tell you why.

2. in the 'non-overlapping ab initio' file, I found sequences are all from augustus_masked prediction. Does the non-overlapping file only include the best gene modes from predicted by both augustus and genemark? Does it include genemark- or augustus-specific genes ?

>> The "non-overlapping" file should have the one with best consensus if there
>> are 3 or more predictors, and the longest one otherwise. It should be able
>> to have augustus and genemark genes. Try it with only genemark and let me
>> know if the file is empty.

Thanks,
Carson
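
With a control file this long, it is easy to miss that an entire evidence class was left blank, as the protein options are in the file quoted above. The following is a minimal Python sketch that lists unset evidence options; the parser and the EVIDENCE_KEYS grouping are assumptions made for illustration (they are not a MAKER API), and the parser simply treats everything after the first '#' on a line as a comment.

```python
# Evidence options as named in the control file quoted in this thread;
# grouping them this way is an editorial choice, not something MAKER defines.
EVIDENCE_KEYS = {
    "EST": ["est", "altest", "est_gff", "altest_gff"],
    "protein": ["protein", "protein_gff"],
}


def parse_ctl(lines):
    """Parse `key=value #comment` lines into a dict; header/comment lines are skipped."""
    opts = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        opts[key.strip()] = value.strip()
    return opts


def evidence_classes_missing(opts):
    """Return the evidence classes for which every option is empty or absent."""
    return [name for name, keys in EVIDENCE_KEYS.items()
            if not any(opts.get(k) for k in keys)]


# A few lines taken from the control file quoted above.
example = [
    "#-----EST Evidence (for best results provide a file for at least one)",
    "est=my_est.fa #set of ESTs or assembled mRNA-seq in fasta format",
    "altest= #EST/cDNA sequence file in fasta format from an alternate organism",
    "#-----Protein Homology Evidence (for best results provide a file for at least one)",
    "protein= #protein sequence file in fasta format (i.e. from mutiple oransisms)",
    "protein_gff= #aligned protein homology evidence from an external GFF3 file",
    "correct_est_fusion=1 #limits use of ESTs in annotation to avoid fusion genes",
]
opts = parse_ctl(example)
print(evidence_classes_missing(opts))
```

For the quoted settings this reports the protein class as missing, which is worth knowing before relying on options such as correct_est_fusion that work on EST evidence.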

From liuhuiquan at nwsuaf.edu.cn  Tue Apr 16 19:49:04 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Wed, 17 Apr 2013 09:49:04 +0800
Subject: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files
Message-ID:

An HTML attachment was scrubbed...

From carsonhh at gmail.com  Thu Apr 18 08:23:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Apr 2013 10:23:54 -0400
Subject: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files
In-Reply-To:
Message-ID:

correct_est_fusion is not guaranteed to never merge a gene. If you are giving maker imperfect evidence, there is only so much it can do. Also, you should be using protein evidence in combination with EST evidence, especially when using the correct_est_fusion option, or you are limiting its effectiveness. MAKER does not work as well on ESTs alone, especially for organisms with few introns, as the internal logic relies on the combination of evidence support.

--Carson

From: 刘慧泉
Date: Tuesday, 16 April, 2013 9:49 PM
To:
Subject: Re: [maker-devel] *maker.proteins and*non_overlapping_ab_initio.proteins files

Hi Carson and Daniel,

Thank you very much for your quick responses!

After several tries, I have found the reason why only a few genes were annotated by maker: the 'correct_est_fusion' option was turned on. I got about 8000 transcripts from the PASA assembly. Because the gene density of my fungus is very high, many of the assembled transcripts merged adjacent genes, even though Trinity and PASA were run with the relevant parameters. Maker may not use the merged transcripts as evidence if the 'correct_est_fusion' option is turned on. However, even with the 'correct_est_fusion' option in use, I also found that many genes produced by maker have merged more than one gene.

I'm now using the ORFs (trainingSetCandidates.cds) extracted from the transcripts by PASA as the EST evidence supplied to maker.
I found that most of the extracted ORFs accurately match the gene models predicted by augustus and genemark. This better resolves the 'merged gene' issue for fungi with high gene density.

For the 'non-overlapping' file: when using only genemark, its predictions can be found in the 'non-overlapping' file. Was the earlier issue because the gene models generated by augustus are better than those from genemark, so only augustus genes were put into the 'non-overlapping' file? Will genes predicted by only one program not be found in the 'non-overlapping' file? How can I get these genes?

Thank you
Huiquan

From carsonhh at gmail.com  Thu Apr 18 08:16:18 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 18 Apr 2013 10:16:18 -0400
Subject: [maker-devel] some strange examples of maker annotation
In-Reply-To:
Message-ID:

maker seems to prefer to select the snap gene model, but not genemark

> Genemark generally scores lower, and has a very large tendency to overlap
> transposons (it can't handle masked fragments, so it has to be run on the
> unmasked genome). Looking through the code base, I now see a section where
> the non-overlapping model is set to always exclude genemark from the
> non-overlapping consensus set if there are masked gene predictors such as snap
> or augustus, and to only accept its models when the evidence supports them.
> I'd need to filter genemark candidates for transposon overlap before I could
> lift this limitation.

Fig 1. the snap gene model of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation.

> The non-overlapping set is stranded. These are on different strands. This really
> does happen in eukaryotes, so if the evidence supports it we have to allow
> it, and if you set keep_preds=1 you can get it just because the gene predictor
> supports it, regardless of physical evidence support.

Fig 2. the snap gene model of non_overlapping_ab_initio is redundant (overlapping) with the maker gene annotation.

> On different strands.

Fig 3. there is gene redundancy even within the maker gene annotation

> They are on opposite strands.

Fig 4. no evidence supports the snap gene model. augustus and genemark have similar results but different from snap. But the snap gene was selected as non_overlapping_ab_initio

> Try using Apollo rather than IGV; it becomes so much more obvious because
> Apollo separates the strands into separate panels.

Thanks,
Carson

From: 刘慧泉
Date: Thursday, 18 April, 2013 9:37 AM
To: Carson Holt ,
Subject: some strange examples of maker annotation

Hi Carson,

I ran maker on my genome with 'keep_preds=1' and with 'keep_preds=0'. When I manually checked the maker results in the Integrative Genomics Viewer (IGV), I found that most of the genes annotated by maker were good. But I also saw some strange examples in the results, and I don't know how to interpret them. I hope you can give me some suggestions. Please see the attached file.

Thank you very much.

best regards,
Huiquan

From liuhuiquan at nwsuaf.edu.cn  Thu Apr 18 07:37:07 2013
From: liuhuiquan at nwsuaf.edu.cn (刘慧泉)
Date: Thu, 18 Apr 2013 21:37:07 +0800
Subject: [maker-devel] some strange examples of maker annotation
Message-ID:

An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: examples of maker annotation.docx
Type: application/octet-stream
Size: 1037235 bytes
Desc: not available
URL:

From carsonhh at gmail.com  Fri Apr 19 08:55:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 19 Apr 2013 10:55:58 -0400
Subject: [maker-devel] FW: some strange examples of maker annotation
In-Reply-To:
Message-ID:

Just forwarding this to the devel list, so it is archived.

--Carson

From Bob_Freeman at hms.harvard.edu  Mon Apr 22 08:09:34 2013
From: Bob_Freeman at hms.harvard.edu (Freeman, Robert M.)
Date: Mon, 22 Apr 2013 10:09:34 -0400
Subject: [maker-devel] Repeatmasker error?
Message-ID: <7EAEB66D-346C-4E9A-B487-B7D5BB352328@hms.harvard.edu>

Greetings,

Am using MAKER 2.27b to annotate a ciliate genome and am finding that my log files are growing to GB sizes. When looking more carefully, an error seems to be occurring around the Repeatmasker stage:

....
Now starting the contig!!
--

setting up GFF3 output and fasta chunks
doing repeat masking
doing blastx repeats
doing blastx of proteins
doing blastx of proteins
doing blastx of proteins
doing blastx repeats
collecting blastx repeatmasking
processing all repeats
ERROR: Can't open seq file: /files/.retain-snapshots.d14d-w60d/SysBio/klab_genome/maker/stentor/run_current_r3/soapPrice1.cycle7.maker.output/soapPrice1.cycle7_datastore/03/EF/contig_157//theVoid.contig_157/query.masked.gff.seq
No such file or directory

 at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Dumper/GFF/GFFV3.pm line 182
 Dumper::GFF::GFFV3::finalize('Dumper::GFF::GFFV3=HASH(0x50547f8)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Process/MpiChunk.pm line 691
 Process::MpiChunk::__ANON__() called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 415
 eval {...} called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 407
 Error::subs::try('CODE(0x5b859c0)', 'HASH(0x11ea63f0)') called at /groups/acornworm/opt/maker-2.27-beta/bin/../lib
...

I don't seem to have this problem when I fall back to the 2.25b version (though I start having major DBD:SQLite issues).

I'm doing this on a cluster, running this under MPI with 50 cores.

Any help/suggestions would be appreciated!

-Bob

-----------------------------------------------------
Bob Freeman, Ph.D.
Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Apr 22 14:25:06 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 22 Apr 2013 16:25:06 -0400 Subject: [maker-devel] Repeatmasker error? In-Reply-To: Message-ID: Just forwarding this e-mail chain to the devel list for archiving. --Carson From: "Freeman, Robert M." Date: Monday, 22 April, 2013 4:16 PM To: Carson Holt Subject: Re: [maker-devel] Repeatmasker error? Already looks better ... been checking stderr and it looks error-free so far (knock on wood). Thanks for the help, and sorry for the bother! Oh, should I fall back to the 2.27beta release that you announced on the list?? -b On Apr 22, 2013, at 4:09 PM, Carson Holt wrote: > Let me know you still get problems. Redirecting TMP back locally will also > give a big performance boost. > > Thanks, > Carson > > > From: "Freeman, Robert M." > Date: Monday, 22 April, 2013 4:01 PM > To: Carson Holt > Subject: Re: [maker-devel] Repeatmasker error? > > (chuckle) wow, always something new to learn -- been working with IT systems > for > 20 years, and HPC > 8, and no one has ever explained this to me. > > Have directed TMP to /scratch, which also turns out to be an Isilon-related > mount. Will re-direct all this to /tmp to see if this eliminates the problems. > > -b > > On Apr 22, 2013, at 3:43 PM, Carson Holt wrote: > >> The missing file is part of the GFF3 output, the fasta sequence to be >> specific. Sometimes on NFS (network mounted file systems), they can return >> status 'success' even though the IO event really has not succeeded yet (this >> is called asynchronous IO). 
The result is a certain speed gain but it also >> means that you can write a file, then immediately try and open it, and the >> system will say that it doesn't exist. On some systems you get weird files >> starting with the name '.nfs000' when these types of errors occur. NFS type >> errors are more common when you use many cpus or other jobs on the cluster >> (not just maker) are using a large amount of IO. To avoid this, MAKER tries >> to do as much work as possible in the directory specified by TMP in the >> control files. By default this is /tmp, and if you set it to something else, >> make sure that the location is locally mounted and not NFS mounted (otherwise >> it can't perform it's purpose of bypassing NFS for certain quick read/write >> operations). The newest version of MAKER unloads exonerate and even most >> gene prediction operations into TMP in addition to other steps that were >> already unloaded there in other versions of the pipeline, and I've been able >> to scale up to > 1500 cpus. >> >> Thanks, >> Carson >> >> >> >> From: "Freeman, Robert M." >> Date: Monday, 22 April, 2013 3:24 PM >> To: Carson Holt >> Subject: Re: [maker-devel] Repeatmasker error? >> >> Thanks, Carson. I'll give this a try. >> >> Randomly? Not sure ... I'll have to go back thru the logs to see if this is >> happening consistently or not. Right now, this log is close to 1 GB in size. >> When I saw it getting this large, I stopped the run as I knew errors were >> getting spewed into the log file. >> >> Thought it might be filesystem as well, but unlikely -- the location for the >> MAKER runs is on our Isilon, and these problems appear only with MAKER. >> >> Other files seem to be present... >> >>> % ls -alt >>> drwxrwx--- 3 rmf1 SYSTEMBIO_klab_genome 236 Apr 21 14:50 .. >>> drwxrwx--- 2 rmf1 SYSTEMBIO_klab_genome 48225 Apr 21 12:38 . 
>>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 0 Apr 21 12:34 run.log.child.0 >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 1055922 Apr 21 12:34 >>> contig_157.0.final.section >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 1055922 Apr 21 12:34 >>> contig_157.0.raw.section >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 388512 Apr 21 12:34 evidence_0.gff >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7269 Apr 21 12:34 >>> contig_157.102049-103030.gi%7C145478069%7Cref%7CXP_001425057% >>> 2E1%7C.p_exonerate >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 4561 Apr 21 12:34 >>> contig_157.101950-103090.gi%7C145514179%7Cref%7CXP_001443000% >>> 2E1%7C.p_exonerate >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7088 Apr 21 12:34 >>> contig_157.101956-103435.gi%7C145505343%7Cref%7CXP_001438638% >>> 2E1%7C.p_exonerate >>> .... >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 7184469 Apr 21 12:33 >>> contig_157.0.sequences_r5%2Efasta.blastx >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 0 Apr 21 12:25 >>> query.masked.gff.def >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 9885 Apr 21 12:25 >>> query.masked.gff.ann >>> -rw-r--r-- 1 rmf1 SYSTEMBIO_klab_genome 49152 Apr 21 12:25 >>> query.masked.fasta.index >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 106522 Apr 21 12:25 >>> query.masked.fasta >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 9991 Apr 21 12:25 >>> query.masked.gff >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 28002 Apr 21 12:25 >>> contig_157.0.te_proteins%2Efasta.repeatrunner >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 3589 Apr 21 12:24 >>> contig_157.0.all.rb.out >>> -rwxrwx--- 1 rmf1 SYSTEMBIO_klab_genome 106522 Apr 21 12:23 query.fasta >>> >> It's just that the one output file MAKER is looking for isn't there. >> >> I guess the other question I should ask: as there are exonerate sequences >> there, does it appear that the pipeline is running OK, and just ignore these >> errors (somehow)? 
>> >> -b >> >> On Apr 22, 2013, at 12:39 PM, Carson Holt wrote: >> >>> Could you give the devel version a try to see if it experiences the same >>> failure, as it's easier to debug off of the most current code. >>> >>> Type this on the command line to download--> >>> ************************** >>> >>> user: ******* >>> password: ******* >>> >>> The error appears to be filesystem related though. Does it appear to happen >>> randomly? >>> >>> Thanks, >>> Carson >>> >>> From: "Freeman, Robert M." >>> Date: Monday, 22 April, 2013 10:09 AM >>> To: "maker-devel at yandell-lab.org" >>> Subject: [maker-devel] Repeatmasker error? >>> >>> Greetings, >>> >>> Am using MAKER 2.27b to annotate a ciliate genome and am finding that my log >>> files are growing to GB sizes. When looking more carefully, an error seems >>> to be occurring around the Repeatmasker stage: >>> >>>> .... >>>> Now starting the contig!! >>>> -- >>>> >>>> setting up GFF3 output and fasta chunks >>>> doing repeat masking >>>> doing blastx repeats >>>> doing blastx of proteins >>>> doing blastx of proteins >>>> doing blastx of proteins >>>> doing blastx repeats >>>> collecting blastx repeatmasking >>>> processing all repeats >>>> ERROR: Can't open seq file: >>>> /files/.retain-snapshots.d14d-w60d/SysBio/klab_genome/maker/stentor/run_cur >>>> rent_r3/soapPrice1.cycle7.maker.output/soapPrice1.cycle7_datastore/03/EF/co >>>> ntig_157//theVoid.contig_157/query.masked.gff.seq >>>> No such file or directory >>>> >>>> at /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Dumper/GFF/GFFV3.pm >>>> line 182 >>>> Dumper::GFF::GFFV3::finalize('Dumper::GFF::GFFV3=HASH(0x50547f8)') >>>> called at >>>> /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Process/MpiChunk.pm line >>>> 691 >>>> Process::MpiChunk::__ANON__() called at >>>> /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 415 >>>> eval {...} called at >>>> /groups/acornworm/opt/maker-2.27-beta/bin/../lib/Error.pm line 407 >>>> 
Error::subs::try('CODE(0x5b859c0)', 'HASH(0x11ea63f0)') called at >>>> /groups/acornworm/opt/maker-2.27-beta/bin/../lib >>>> ... >>> >>> I don't seem to have this problem when I fall back to the 2.25b version >>> (though I start having major DBD:SQLite issues). >>> >>> I'm doing this on a cluster, running this under MPI with 50 cores. >>> >>> Any help/suggestions would be appreciated! >>> >>> -Bob >>> >>> _______________________________________________ maker-devel mailing list >>> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/m >>> aker-devel_yandell-lab.org >> >> ----------------------------------------------------- Bob Freeman, Ph.D. Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From ejr at stowers.org Mon Apr 29 09:58:09 2013 From: ejr at stowers.org (Ross, Eric) Date: Mon, 29 Apr 2013 15:58:09 +0000 Subject: [maker-devel] repeat statistics Message-ID: Does anyone have a good tool for yanking repeat statistics out of MAKER gff files? SOBA can give some basic stats, but it doesn't play well with my giant files and I haven't figured out a way to run it locally. For that matter does anyone have a script that will calculate SOBA like stats locally? I'd rather avoid writing one myself if something else is out there. 

Thanks,

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From barry.moore at genetics.utah.edu Mon Apr 29 11:59:14 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon, 29 Apr 2013 11:59:14 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID:

Hi Eric,

There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details:

http://www.sequenceontology.org/resources/sobacl.html

Ultimately you'll get it like this:

svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA

Then run:

SOBA/bin/SOBAcl --help

For a lot of command line examples have a look in:

SOBA/t/sobacl_test.sh

B

On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:

> Does anyone have a good tool for yanking repeat statistics out of MAKER
> gff files?
>
> SOBA can give some basic stats, but it doesn't play well with my giant
> files and I haven't figured out a way to run it locally.
>
> For that matter does anyone have a script that will calculate SOBA like
> stats locally? I'd rather avoid writing one myself if something else is
> out there.
>
> Thanks,
>
> Eric
>
> --
> Eric Ross
> Bioinformatic Specialist I
> Alejandro Sánchez Alvarado Laboratory
> Stowers Institute for Medical Research
> Howard Hughes Medical Institute
> ejr at stowers.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From jason.stajich at gmail.com Mon Apr 29 16:49:12 2013
From: jason.stajich at gmail.com (Jason Stajich)
Date: Mon, 29 Apr 2013 15:49:12 -0700
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID:

Barry - I think you mean topaz instead of malachite?

svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA

Jason Stajich
jason at bioperl.org
jason.stajich at gmail.com
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote:

> Hi Eric,
>
> There is a command line version of SOBA. It does the same things as the
> web version and much more. This page has some basic details:
>
> http://www.sequenceontology.org/resources/sobacl.html
>
> Ultimately you'll get it like this:
>
> svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA
>
> Then run:
>
> SOBA/bin/SOBAcl --help
>
> For a lot of command line examples have a look in:
>
> SOBA/t/sobacl_test.sh
>
> B
>
> On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:
>
> Does anyone have a good tool for yanking repeat statistics out of MAKER
> gff files?
>
> SOBA can give some basic stats, but it doesn't play well with my giant
> files and I haven't figured out a way to run it locally.
>
> For that matter does anyone have a script that will calculate SOBA like
> stats locally? I'd rather avoid writing one myself if something else is
> out there.
>
> Thanks,
>
> Eric
>
> --
> Eric Ross
> Bioinformatic Specialist I
> Alejandro Sánchez Alvarado Laboratory
> Stowers Institute for Medical Research
> Howard Hughes Medical Institute
> ejr at stowers.org
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> Barry Moore
> Research Scientist
> Dept.
of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From barry.moore at genetics.utah.edu Tue Apr 30 00:14:44 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 30 Apr 2013 00:14:44 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID:

Correct. And web page is now updated as well.

B

On Apr 29, 2013, at 4:49 PM, Jason Stajich wrote:

> Barry - I think you mean topaz instead of malachite?
>
> svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA
>
> Jason Stajich
> jason at bioperl.org
> jason.stajich at gmail.com
> http://bioperl.org/wiki/User:Jason
> http://twitter.com/hyphaltip
>
> On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote:
> Hi Eric,
>
> There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details:
>
> http://www.sequenceontology.org/resources/sobacl.html
>
> Ultimately you'll get it like this:
>
> svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA
>
> Then run:
>
> SOBA/bin/SOBAcl --help
>
> For a lot of command line examples have a look in:
>
> SOBA/t/sobacl_test.sh
>
> B
>
> On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:
>
>> Does anyone have a good tool for yanking repeat statistics out of MAKER
>> gff files?
>>
>> SOBA can give some basic stats, but it doesn't play well with my giant
>> files and I haven't figured out a way to run it locally.
>>
>> For that matter does anyone have a script that will calculate SOBA like
>> stats locally? I'd rather avoid writing one myself if something else is
>> out there.
>>
>> Thanks,
>>
>> Eric
>>
>> --
>> Eric Ross
>> Bioinformatic Specialist I
>> Alejandro Sánchez Alvarado Laboratory
>> Stowers Institute for Medical Research
>> Howard Hughes Medical Institute
>> ejr at stowers.org
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543