From mnuhn at ebi.ac.uk Wed May 1 06:38:52 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Wed, 01 May 2013 12:38:52 +0100
Subject: [maker-devel] substr outside of string
Message-ID: <5180FECC.2020308@ebi.ac.uk>

Hello!

I have run maker with est and rna seq data to create a training set for
SNAP. Then I trained SNAP and added the hmm to the snaphmm option and
reran maker.

Maker is giving me error messages like this:

"
setting up GFF3 output and fasta chunks
doing repeat masking
re reading repeat masker report.

substr outside of string at /maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 140.
--> rank=NA, hostname=ebi-209.ebi.ac.uk
"

The line from which this error message originates is:

    substr($$seq, $b -1 , $l, "$replace"x$l);

After getting these error messages I replaced it with

    eval {
        substr($$seq, $b -1 , $l, "$replace"x$l);
    };
    if ($@) {
        use Carp;
        use Data::Dumper;
        confess(
            $@
            . "\n\n"
            . Dumper($p)
            . "\n\n"
            . "Length of sequence: " . (length $$seq)
        );
    }

After that I got this:

$VAR1 = [
          98926,
          99033
        ];


Length of sequence: 98686 at /maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 145

I have not changed the genome file.

I'm also concerned with the reported length of 98686, because I have a
list of all sequences in the file and their lengths, and none of them
has a length of 98686 bp. The sequences with the closest lengths are
these:

98367 LSalAtl2s1200
98438 LSalAtl2s1473
98776 LSalAtl2s1613
98876 LSalAtl2s1199

so they are not even close.

$$seq is a sequence as a string, when I print it.

Sometimes maker prints a message like this:

"
--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: LSalAtl2s63
Length: 3997709
Tries: 5!!
#---------------------------------------------------------------------
"

But according to my list, which I generated from the exact same file
that maker has in genome_file option, the length of that sequence is
1169407.
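
The failing call can be examined in isolation. Below is a minimal sketch
in Python (not MAKER's code), assuming the dumped pair is a 1-based
inclusive [begin, end] interval and that $l in the Perl line equals
end - begin + 1; mask_interval is a hypothetical helper. It checks the
bounds explicitly and shows why the interval [98926, 99033] cannot be
applied to a 98686-character string:

```python
def mask_interval(seq: str, begin: int, end: int, replace: str = "N") -> str:
    """Return seq with the 1-based inclusive range [begin, end] masked.

    Assumption: coordinates are 1-based and inclusive, so the masked
    length is end - begin + 1.
    """
    if begin < 1 or end < begin:
        raise ValueError(f"invalid interval [{begin}, {end}]")
    if end > len(seq):
        # Report both the interval and the string length, so an
        # off-the-edge coordinate is visible instead of a bare failure.
        raise ValueError(
            f"interval [{begin}, {end}] runs past the end of a "
            f"{len(seq)}-character sequence"
        )
    length = end - begin + 1
    return seq[: begin - 1] + replace * length + seq[end:]


if __name__ == "__main__":
    # Small in-bounds case: positions 3..5 are masked.
    print(mask_interval("ACGTACGT", 3, 5))

    # The numbers from the report above: the interval starts beyond
    # the end of the string, so the bounds check rejects it.
    chunk = "A" * 98686
    try:
        mask_interval(chunk, 98926, 99033)
    except ValueError as err:
        print(err)
```

Raising with both the interval and the string length gives the same
information as the eval/confess wrapper above.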
Any idea why I am getting these problems and what to do about them?

Cheers,
Michael.

From carsonhh at gmail.com Wed May 1 08:17:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 01 May 2013 09:17:50 -0400
Subject: [maker-devel] substr outside of string
In-Reply-To: <5180FECC.2020308@ebi.ac.uk>
Message-ID:

The length you are printing is not the length of the contig, but rather
the length of the piece of the contig MAKER is working with at that
moment. The fact that the length is not exactly 100000 tells me that
this is a piece at the end of the contig.

By any chance are you using GFF3 pass-through of repeat elements? If
not, there may be a repeatmasker parsing bug, as the start and end
coordinates are off the edge of the contig. If you run maker on the
command line (not via MPI), what is the repeatmasker report read
immediately before the error? Could you then attach it and the fasta
sequence for the contig that fails?

Thanks,
Carson

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From ejr at stowers.org Wed May 1 10:57:11 2013
From: ejr at stowers.org (Ross, Eric)
Date: Wed, 1 May 2013 15:57:11 +0000
Subject: [maker-devel] repeat statistics
In-Reply-To:
Message-ID:

Should this be accessible anonymously?

I'm unable to connect.

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From: Jason Stajich
Date: Monday, April 29, 2013 5:49 PM
To: Barry Moore
Cc: Eric Ross, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] repeat statistics

Barry - I think you mean topaz instead of malachite?

svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA

Jason Stajich
jason at bioperl.org
jason.stajich at gmail.com
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote:

Hi Eric,

There is a command line version of SOBA. It does the same things as the
web version and much more. This page has some basic details:

http://www.sequenceontology.org/resources/sobacl.html

Ultimately you'll get it like this:

svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA

Then run:

SOBA/bin/SOBAcl --help

For a lot of command line examples have a look in:

SOBA/t/sobacl_test.sh

B

On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:

Does anyone have a good tool for yanking repeat statistics out of MAKER
gff files?

SOBA can give some basic stats, but it doesn't play well with my giant
files and I haven't figured out a way to run it locally.

For that matter does anyone have a script that will calculate SOBA like
stats locally? I'd rather avoid writing one myself if something else is
out there.

Thanks,

Eric

From barry.moore at genetics.utah.edu Wed May 1 18:42:47 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 1 May 2013 17:42:47 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID:

Eric,

Try again. It should have been world readable before, but I've opened it
a bit wider now, so it should definitely be readable now. Let me know if
you have problems.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From ejr at stowers.org Wed May 1 18:53:08 2013
From: ejr at stowers.org (Ross, Eric)
Date: Wed, 1 May 2013 23:53:08 +0000
Subject: [maker-devel] repeat statistics
In-Reply-To:
Message-ID:

Works now.

Thanks much,

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From guoyunfei1989 at gmail.com Fri May 3 11:33:42 2013
From: guoyunfei1989 at gmail.com (Yunfei Guo)
Date: Fri, 3 May 2013 09:33:42 -0700
Subject: [maker-devel] maker doesn't pick up where it stopped
Message-ID:

Dear MAKER community,

I have a problem: maker doesn't pick up where it stopped last time;
instead, it discards all previous results.

command:
echo 'mpiexec -n 12 maker -q' | qsub -V -cwd -l h_vmem=2g -pe mpich 12
maker version:
2.26
mpich version:
1.5rc3
in maker_opts:
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no

It never happened before. Any advice?

Thank you!

Yunfei

From barry.moore at genetics.utah.edu Fri May 3 17:51:27 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 3 May 2013 16:51:27 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID: <37BA6893-1175-4F3E-B3AA-6C1E23C4364E@genetics.utah.edu>

Let me know how it works out for you - feedback either positive or
negative is useful.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From jmdoyle at purdue.edu Sun May 5 06:55:47 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Sun, 5 May 2013 07:55:47 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <1109250054.216072.1367754420354.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID: <261748058.216082.1367754947403.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi!

I've recently attempted to install MAKER (Mac OS X). I installed blast
and exonerate using the ./Build blast and ./Build exonerate commands, and
I manually installed repeatmasker, snap and augustus (I couldn't get the
./Build commands to work). I then attempted to test out maker following
the 2012 MAKER tutorial. I received the blastx error message pasted
below, and there is additional information in the maker log I've attached
to this email. I was wondering if anyone had any suggestions about
debugging, as I'm not quite sure where to begin...

Best wishes and thanks, Jackie


#--------- command -------------#
Widget::formater:
/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in /tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0
#-------------------------------#
dyld: lazy symbol binding failed: Symbol not found: __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
  Expected in: flat namespace

dyld: Symbol not found: __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
  Expected in: flat namespace

ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in Widget::formater

FATAL ERROR
ERROR: Failed while doing blastx repeats!!

ERROR: Chunk failed at level 3
!!
FAILED CONTIG:contig-dpp-500-500


Department of Forestry and Natural Resources
Purdue University
West Lafayette, IN 47907
Phone: 270-293-9486
E-mail: jmdoyle at purdue.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Build status.odt
Type: application/vnd.oasis.opendocument.text
Size: 2740 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.odt
Type: application/vnd.oasis.opendocument.text
Size: 2772 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.odt
Type: application/vnd.oasis.opendocument.text
Size: 3479 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.odt
Type: application/vnd.oasis.opendocument.text
Size: 2821 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker log.odt
Type: application/vnd.oasis.opendocument.text
Size: 3340 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon May 6 07:32:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 06 May 2013 08:32:52 -0400
Subject: [maker-devel] maker doesn't pick up where it stopped
In-Reply-To:
Message-ID:

You would have to send me the captured STDERR. MAKER will print out a
number of messages whenever it restarts a contig, and will explain why it
deletes any files before restarting.

Thanks,
Carson

From carsonhh at gmail.com Mon May 6 09:02:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 06 May 2013 10:02:52 -0400
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <261748058.216082.1367754947403.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

Most maker development and debugging actually happens on a Mac (OS X
10.7.5). Blast, Augustus, SNAP all install for me just fine with maker
2.27. What errors do you get during installation? Do you by any chance
have non-standard libraries, via MacPorts for example? Do you have Xcode
installed (it provides the appropriate 'make' command for compiling C)?

Thanks,
Carson

From guoyunfei1989 at gmail.com Mon May 6 10:33:57 2013
From: guoyunfei1989 at gmail.com (Yunfei Guo)
Date: Mon, 6 May 2013 08:33:57 -0700
Subject: [maker-devel] maker doesn't pick up where it stopped
In-Reply-To:
References:
Message-ID:

Hi Carson,

I used quiet mode, here's stderr (I only show one 'now starting the
contig' msg). When I check the maker master log upon restart with
'grep -ic finished master_log', all 'finished' tags were gone.
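
The same check can be broken down by status instead of giving a single
count. A minimal sketch in Python, assuming the master datastore index
log is tab-separated with the run status in the last column (an
assumption about the file layout, not something stated in this thread);
count_statuses, the status names and the sample lines are hypothetical
and for illustration only:

```python
from collections import Counter
from typing import Iterable


def count_statuses(lines: Iterable[str]) -> Counter:
    """Tally the last tab-separated field of each index line, upper-cased.

    Assumption: each line looks like "<seq id>\t<directory>\t<status>".
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:  # skip blank or malformed lines
            continue
        counts[fields[-1].strip().upper()] += 1
    return counts


if __name__ == "__main__":
    # Made-up index lines for illustration; not taken from a real run.
    sample = [
        "scaffold105\tdatastore/AA/BB/scaffold105/\tSTARTED",
        "scaffold105\tdatastore/AA/BB/scaffold105/\tFINISHED",
        "scaffold6034\tdatastore/CC/DD/scaffold6034/\tSTARTED",
        "scaffold6034\tdatastore/CC/DD/scaffold6034/\tFAILED",
    ]
    for status, n in sorted(count_statuses(sample).items()):
        print(status, n)
```

Comparing the tallies before and after a restart would show whether
finished entries really disappeared or were only joined by new entries
for the same contigs.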
A data structure will be created for you at: /home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k _datastore To access files for individual sequences use the datastore index: /home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k _master_datastore_index.log #--------------------------------------------------------------------- Now starting the contig!! SeqID: scaffold105 Length: 8761 #--------------------------------------------------------------------- ... MAKER WARNING: The file GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore/C8/27/scaffold5690//theVoid.scaffold5690/scaffold5690.0.HumanUCSCProteins%2Efasta.blastx did not finish on the last run and must be erased ... ERROR: Could not open '/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore/A4/F7/scaffold6034//theVoid.scaffold6034/scaffold6034.0.Srub%2Elib.specific.out' ERROR: Failed while doing repeat masking ERROR: Chunk failed at level:0, tier_type:1 FAILED CONTIG:scaffold6034 ERROR: Chunk failed at level:2, tier_type:0 FAILED CONTIG:scaffold6034 ... On Mon, May 6, 2013 at 5:32 AM, Carson Holt wrote: > You would have to send me the captured STDERR. MAKER will print out a > number of messages whenever it restarts a contig, and will explain why it > deletes any files before restarting. > > Thanks, > Carson > > > From: Yunfei Guo > Date: Friday, 3 May, 2013 12:33 PM > To: > Subject: [maker-devel] maker doesn't pick up where it stopped > > Dear MAKER community, > > I got a problem that maker doesn't pick up where it stopped last time, > rather, it will discard all previous results. 
> > command: > echo 'mpiexec -n 12 maker -q' | qsub -V -cwd -l h_vmem=2g -pe mpich 12 > maker version: > 2.26 > mpich version: > 1.5rc3 > in maker_opts: > clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 > = no > clean_up=0 #removes theVoid directory with individual analysis files, 1 = > yes, 0 = no > > It never happened before. Any advice? > > Thank you! > > Yunfei > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Carson.Holt at oicr.on.ca Mon May 6 21:22:23 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Tue, 7 May 2013 02:22:23 +0000 Subject: [maker-devel] gene models overlapping with TEs In-Reply-To: <51881E6E.9010202@cals.arizona.edu> Message-ID: Repeats can still happen in genes. So an outright block actually causes more errors than it avoids, and a mixed approach of hard and soft masking becomes more appropriate. The masking step stops alignments from seeding in repeat regions, but if alignments seed in non-repeat regions then they can still extend through repeat regions during polishing steps (I.e. The EST evidence supports extension through the repeat and inclusion of the TE). --Carson From: Dario Copetti > Organization: AGI Date: Monday, 6 May, 2013 5:19 PM To: > Cc: "kapeel at cals.arizona.edu" >, "Stein, Joshua" >, Rod Wing > Subject: gene models overlapping with TEs Carson, Analyzing the output of a MAKER run on a rice-sized genome I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to determine the intersection of "CDS" and "repeatmasker" or "repeatrunner" and some 2400 genes overlap for at least 30% of their respective length. 
I am wondering how the gene models still appear in the final output, since I thought that the masking step was giving us the absoulte confirmation that in our endogenous gene list we do not include TE coding regions. Here below an example of a gene (attached picture too): ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . 
ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816

This holds for output lines from both repeatmasker and repeatrunner, and the gene models come from either FGENESH or SNAP predictions.
How can I explain this problem?
Thanks,

Dario

-- 
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5

1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dcopetti at cals.arizona.edu Mon May 6 16:19:42 2013
From: dcopetti at cals.arizona.edu (Dario Copetti)
Date: Mon, 06 May 2013 14:19:42 -0700
Subject: [maker-devel] gene models overlapping with TEs
Message-ID: <51881E6E.9010202@cals.arizona.edu>

Carson,

Analyzing the output of a MAKER run on a rice-sized genome, I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to determine the intersection of "CDS" and "repeatmasker" or "repeatrunner" features, and some 2400 genes overlap for at least 30% of their respective length. I am wondering how the gene models still appear in the final output, since I thought that the masking step was giving us absolute confirmation that our endogenous gene list does not include TE coding regions. Below is an example of a gene (picture attached too):

ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1

ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816

This holds for output lines from both repeatmasker and repeatrunner, and the gene models come from either FGENESH or SNAP predictions.
How can I explain this problem?
Thanks,

Dario

-- 
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5

1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene_TE.jpg
Type: image/jpeg
Size: 177299 bytes
Desc: not available
URL: 

From myandell at genetics.utah.edu Mon May 6 22:47:49 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 7 May 2013 03:47:49 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
References: <51881E6E.9010202@cals.arizona.edu>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E02CEE@mxb2.hg.genetics.utah.edu>

Could the TEs be in the UTRs? Also, maybe some of these are low-complexity regions?

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

From myandell at genetics.utah.edu Mon May 6 22:49:51 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 7 May 2013 03:49:51 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
References: <51881E6E.9010202@cals.arizona.edu>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E02D13@mxb2.hg.genetics.utah.edu>

Hmm, eyeballing it, then, it doesn't look like it's the UTRs.

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

From carsonhh at gmail.com Tue May 7 06:39:17 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 07:39:17 -0400
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365E02D13@mxb2.hg.genetics.utah.edu>
Message-ID: 

If I had to guess, I imagine the EST evidence includes assembled mRNA-seq reads. Is that correct?

--Carson

From jmdoyle at purdue.edu Tue May 7 10:12:38 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Tue, 7 May 2013 11:12:38 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: 
Message-ID: <1393522124.220153.1367939558646.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi Carson,

Thanks for the quick reply! I don't remember any errors during the Blast installation; it appeared to install fine with the ./Build command. Augustus, RepeatMasker and SNAP were the programs I could not install with the ./Build commands, and instead installed manually. I've attached the error messages I received when I tried to use the ./Build commands. 
I've tested out the three programs I installed manually and they seem to work fine on their own.

I do have xcode installed. How would I determine if I have "non-standard libraries via Mac ports"?

Thanks again for your help with this.

Best wishes, Jackie

Department of Forestry and Natural Resources
Purdue University
West Lafayette, IN 47907
Phone: 270-293-9486
E-mail: jmdoyle at purdue.edu

----- Original Message -----
From: "Carson Holt"
To: "Jacqueline R M Doyle" , maker-devel at yandell-lab.org
Sent: Monday, May 6, 2013 10:02:52 AM
Subject: Re: [maker-devel] MAKER installation debugging

Most maker development and debugging actually happens on a Mac (OS X 10.7.5). Blast, Augustus, SNAP all install for me just fine with maker 2.27. What errors do you get during installation? Do you by any chance have non-standard libraries via Mac ports for example. Do you have xcode installed (it provides the appropriate 'make' command for compiling C)?

Thanks,
Carson

On 13-05-05 7:55 AM, "Jacqueline R M Doyle" wrote:

>Hi!
>
>I've recently attempted to install MAKER (Mac OS X). I installed blast
>and exonerate using the ./Build blast and ./Build exonerate commands, and
>I manually installed repeatmasker, snap and augustus (I couldn't get the
>./Build commands to work). I then attempted to test out maker following
>the 2012 MAKER tutorial. I received the blastx error message pasted
>below, and there is additional information in the maker log I've attached
>to this email. I was wondering if anyone had any suggestions about
>debugging, as I'm not quite sure where to begin...
>
>Best wishes and thanks, Jackie
>
>
>#--------- command -------------#
>Widget::formater:
>/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in
>/tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0
>#-------------------------------#
>dyld: lazy symbol binding failed: Symbol not found:
>__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
>  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
>  Expected in: flat namespace
>
>dyld: Symbol not found:
>__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
>  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
>  Expected in: flat namespace
>
>ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in
>Widget::formater
>
>FATAL ERROR
>ERROR: Failed while doing blastx repeats!!
>
>ERROR: Chunk failed at level 3
>!!
>FAILED CONTIG:contig-dpp-500-500
>
>
>Department of Forestry and Natural Resources
>Purdue University
>West Lafayette, IN 47907
>Phone: 270-293-9486
>E-mail: jmdoyle at purdue.edu
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: repeatmasker installation error.rtf
Type: application/rtf
Size: 1264 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snap installation error.rtf
Type: application/rtf
Size: 1095 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: agustus installation error.rtf
Type: application/rtf
Size: 1124 bytes
Desc: not available
URL: 

From carsonhh at gmail.com Tue May 7 10:19:57 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 11:19:57 -0400
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <1393522124.220153.1367939558646.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID: 

Which version of MAKER are you using? Is it 2.10 or 2.27?

Thanks,
Carson

From jmdoyle at purdue.edu Tue May 7 11:54:22 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Tue, 7 May 2013 12:54:22 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: 
Message-ID: <963584633.220449.1367945662870.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi Carson,

I am using MAKER 2.10 (I downloaded it so long ago I'd forgotten there were two options). Would it be better for me to start over with 2.27?

Best wishes, Jackie

Department of Forestry and Natural Resources
Purdue University
West Lafayette, IN 47907
Phone: 270-293-9486
E-mail: jmdoyle at purdue.edu

From carsonhh at gmail.com Tue May 7 13:20:19 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 14:20:19 -0400
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51892ABA.2060100@cals.arizona.edu>
Message-ID: 

This is really more of an evidence issue. Because you have assembled mRNA-seq evidence, you are probably getting TEs improperly included in the assembled ESTs, so MAKER just follows the evidence. It tries to mask the repeat out, but the alignment of the longer EST heavily supports the repeat's inclusion in the model during alignment polishing.

Solutions:

1. 
You can set softmask=0 instead of softmask=1 (1 is the default) to make everything hard masked instead (it will be a hard 'N', so no alignment can happen).
2. You can pre-mask the genome. The easiest way to do this would be to collect the query.masked.fasta files inside each theVoid directory in the datastore and use them as the input. Then none of the polishing steps can ever extend the alignment.
3. You can filter the mRNA-seq data for TE elements before assembly.

Thanks,
Carson

From dcopetti at cals.arizona.edu Tue May 7 11:24:26 2013
From: dcopetti at cals.arizona.edu (Dario Copetti)
Date: Tue, 07 May 2013 09:24:26 -0700
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: 
References: 
Message-ID: <51892ABA.2060100@cals.arizona.edu>

Yes, there was RNA-seq evidence as well. Still, I would like to have this evidence annotated as TE, and not as a gene (or at least to have it tagged in some way).

As you suggested, a good solution could be to sequentially soft mask with the RepeatMasker output and then hard mask with the RepeatRunner result. In this way we hide TE coding regions from all predictors and alignments, leaving all the other types of repeats soft masked. This meets Mark's target of having MITEs and other non-autonomous TEs (as well as simple/low-complexity repeats) annotated in UTRs or CDSs, if present. In my opinion, this could be one of the few cases (or the only one?) where gene and repeat annotation can overlap.

For our genomes I will have a list of these genes overlapping TE coding regions, and we will likely remove them. Please let us know how you intend to fix this problem and in which MAKER version the fix will appear.
Thanks for the assistance and suggestions,

Dario

On 05/07/2013 04:39 AM, Carson Holt wrote:
> If I had to guess. I imagine the EST evidence includes assembled mRNA-seq
> reads? Is that correct?
> > --Carson > > > > On 13-05-06 11:49 PM, "Mark Yandell" wrote: > >> humm, eballing then it doesn't look lie its the UTRss.. >> >> Mark Yandell >> Professor of Human Genetics >> H.A. & Edna Benning Presidential Endowed Chair >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ph:801-587-7707 >> >> ________________________________________ >> From: maker-devel-bounces at yandell-lab.org >> [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti >> [dcopetti at cals.arizona.edu] >> Sent: Monday, May 06, 2013 3:19 PM >> To: maker-devel at yandell-lab.org >> Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu >> Subject: [maker-devel] gene models overlapping with TEs >> >> Carson, >> >> Analyzing the output of a MAKER run on a rice-sized genome I noticed that >> some gene models (~10%) overlap with TE coding regions. As a QC step, I >> used BEDtools to determine the intersection of "CDS" and "repeatmasker" >> or "repeatrunner" and some 2400 genes overlap for at least 30% of their >> respective length. I am wondering how the gene models still appear in the >> final output, since I thought that the masking step was giving us the >> absoulte confirmation that in our endogenous gene list we do not include >> TE coding regions. Here below an example of a gene (attached picture too): >> >> ObracChr10 maker mRNA 355,056 358,075 . - . >> ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eA >> ED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 >> ObracChr10 maker exon 355,056 356,874 . - . >> ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 356,965 357,081 . - . >> ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 357,209 357,319 . - . >> ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 357,756 358,075 . - . >> ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 357,756 358,075 . 
- 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 357,209 357,319 . - 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 356,965 357,081 . - 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 355,056 356,874 . - 0 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ObracChr10 repeatrunner match_part 357,755 358,084 566 - >> . >> ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >> ObracChr10 repeatrunner protein_match 357,755 358,084 566 - >> . >> ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >> ObracChr10 repeatrunner match_part 357,202 357,294 142 - >> . >> ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >> ObracChr10 repeatrunner protein_match 357,202 357,294 142 - >> . >> ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >> ObracChr10 repeatrunner match_part 355,059 357,092 3367 - >> . >> ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >> ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - >> . >> ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >> >> >> This result is valid both for output lines from repeatmasker or >> repeatrunner, and the gene models come from either FGENESH or SNAP >> predictions. >> How can I explain this problem? >> Thanks, >> >> Dario >> >> >> >> >> >> -- >> Dario Copetti, PhD >> Research Associate >> Arizona Genomics Institute >> University of Arizona - BIO5 >> >> 1657 E. 
Helen St. >> Tucson, AZ 85721 >> www.genome.arizona.edu >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Dario Copetti, PhD Research Associate Arizona Genomics Institute University of Arizona - BIO5 1657 E. Helen St. Tucson, AZ 85721 www.genome.arizona.edu From jmdoyle at purdue.edu Tue May 7 23:09:48 2013 From: jmdoyle at purdue.edu (Jacqueline R M Doyle) Date: Wed, 8 May 2013 00:09:48 -0400 (EDT) Subject: [maker-devel] MAKER installation debugging In-Reply-To: Message-ID: <1621518279.221945.1367986188482.JavaMail.root@mailhub042.itcs.purdue.edu> I downloaded MAKER 2.27 and it installed perfectly! I worked through the tutorial without any problems. Thanks for your help with this! Best wishes, Jackie From carsonhh at gmail.com Tue May 7 23:10:33 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 08 May 2013 00:10:33 -0400 Subject: [maker-devel] MAKER installation debugging In-Reply-To: <1621518279.221945.1367986188482.JavaMail.root@mailhub042.itcs.purdue.edu> Message-ID: I'm glad it worked. --Carson On 13-05-08 12:09 AM, "Jacqueline R M Doyle" wrote: >I downloaded MAKER 2.27 and it installed perfectly! I worked through the >tutorial without any problems. Thanks for your help with this! > >Best wishes, Jackie > From Carson.Holt at oicr.on.ca Wed May 8 14:25:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 8 May 2013 19:25:52 +0000 Subject: [maker-devel] Non-standard genetic code In-Reply-To: <97533c275fa3e6b05709c92455c9e6b8@fbb.msu.ru> Message-ID: It's not possible yet. It is one of the things we have on a list to do. It's not a small task either to make the necessary changes to the code, as the codon usage affects blastx alignments, exonerate protein2genome alignments, ab initio gene prediction, gene extension/boundary polishing, and UTR addition. 
So the changes have to go into many many locations. Thanks, Carson On 13-05-08 11:44 AM, "Daniil Alexeyevsky" wrote: >Hi, > >I want to use MAKER to annotate organism with non-standard genetic >code. (It only has UGA stop-codon, UAA and UAG code glutamine). > >Is it possible to use MAKER in this case? If I am bound to editing some >source codes, could you please point me where to look? > >With best regards, >-- Daniil > From mnuhn at ebi.ac.uk Fri May 10 07:10:35 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Fri, 10 May 2013 13:10:35 +0100 Subject: [maker-devel] Duplicated exons Message-ID: <518CE3BB.3060003@ebi.ac.uk> Hello Carson! I have been trying to get to the bottom of an error message when (re)training snap. Snap, or more precisely fathom, was giving me unclear error messages about misordered and overlapping exons. I have looked into the gff files from which these exons originate and noticed that a lot of exons in that file were duplicated. For example I have found these: LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 and then about four hundred lines later there are these: LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 which are identical except for the order number after "exon:". This seems to have happened to a lot of features in that file. How can I avoid this? Or if this is just a rare problem, can I have maker recompute the gff file without redoing all the computations again? Cheers, Michael. 
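The duplication described above (same coordinates and Parent, different "exon:" number) can be checked for mechanically. The following is a minimal sketch, assuming a tab-separated GFF3 file with nine columns; the function name find_duplicate_exons and the EXAMPLE records (rebuilt from the four lines quoted in the message) are illustrative and are not part of MAKER.

```python
from collections import defaultdict

def find_duplicate_exons(gff_lines):
    """Group exon features by (seqid, start, end, strand, Parent) and
    return only the groups that contain more than one ID."""
    groups = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # not a well-formed exon record
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("Parent"))
        groups[key].append(attrs.get("ID"))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical test data mirroring the records quoted in the message.
PARENT = "maker-LSalAtl2s75-snap-gene-2.15-mRNA-1"

def exon(start, end, number):
    return (
        f"LSalAtl2s75\tmaker\texon\t{start}\t{end}\t.\t+\t.\t"
        f"ID={PARENT}:exon:{number};Parent={PARENT}"
    )

EXAMPLE = [
    exon(186317, 186936, 3),
    exon(187007, 191531, 4),
    exon(186317, 186936, 1),
    exon(187007, 191531, 2),
]

if __name__ == "__main__":
    for key, ids in find_duplicate_exons(EXAMPLE).items():
        print(key, ids)
```

For a real file, pass an open file handle instead of EXAMPLE. The sketch only reports duplicates; it does not decide which copy to keep.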
From daa at fbb.msu.ru Wed May 8 10:44:50 2013
From: daa at fbb.msu.ru (Daniil Alexeyevsky)
Date: Wed, 08 May 2013 19:44:50 +0400
Subject: [maker-devel] Non-standard genetic code
Message-ID: <97533c275fa3e6b05709c92455c9e6b8@fbb.msu.ru>

Hi,

I want to use MAKER to annotate an organism with a non-standard genetic code (it has only UGA as a stop codon; UAA and UAG code for glutamine).

Is it possible to use MAKER in this case? If I have to edit the source code, could you please point me to where to look?

With best regards,
-- Daniil

From diana_leduc at eva.mpg.de Fri May 10 09:44:50 2013
From: diana_leduc at eva.mpg.de (Diana LeDuc)
Date: Fri, 10 May 2013 16:44:50 +0200 (CEST)
Subject: [maker-devel] Maker consensus
Message-ID: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de>

Dear maker developers,

I am a PhD student working on de novo assembly and annotation of a bird genome. I used Maker as the annotation pipeline, which ran very well, and I obtained different annotations with evidence from the Augustus gene predictor, a small EST dataset from my organism and protein sequences from chicken, turkey and zebrafinch. I could combine the different gff files from different scaffolds into one gff file with annotations for the entire genome.

I now have two questions:

1. What could be the reason that I haven't gotten the protein.fasta and transcript.fasta files?

2. How can I obtain a consensus gene list of different evidences from maker? What I would actually need is the scaffold, coordinates and annotation (gene name) according to the 3 other bird species.

Thank you in advance.

Best regards,

Diana Le Duc

--
Max Planck Institute for Evolutionary Anthropology
Department of Evolutionary Genetics
Deutscher Platz 6
D-04103 Leipzig
Phone +49 (0)341-3550-554
www.eva.mpg.de

From carsonhh at gmail.com Fri May 10 11:13:33 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 12:13:33 -0400
Subject: [maker-devel] Maker consensus
In-Reply-To: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID:

I'm sorry, I don't understand question 1. You are missing the resulting fasta files, correct? Did your resulting GFF3 file have any features of type "gene"? Did you run fasta_merge after running gff3_merge?

Could you give me more details on what you are trying to do, so I can take a stab at question 2 as well.

Thanks,
Carson

From: Diana LeDuc
Reply-To: Diana LeDuc
Date: Friday, 10 May, 2013 10:44 AM
To:
Cc: Gabriel Renaud, Janet Kelso, Torsten Schoeneberg
Subject: [maker-devel] Maker consensus

Dear maker developers,

I am a PhD student working on de novo assembly and annotation of a bird genome. I used Maker as the annotation pipeline, which ran very well, and I obtained different annotations with evidence from the Augustus gene predictor, a small EST dataset from my organism and protein sequences from chicken, turkey and zebrafinch. I could combine the different gff files from different scaffolds into one gff file with annotations for the entire genome.

I now have two questions:

1. What could be the reason that I haven't gotten the protein.fasta and transcript.fasta files?

2. How can I obtain a consensus gene list of different evidences from maker? What I would actually need is the scaffold, coordinates and annotation (gene name) according to the 3 other bird species.

Thank you in advance.
Best regards,

Diana Le Duc

--
Max Planck Institute for Evolutionary Anthropology
Department of Evolutionary Genetics
Deutscher Platz 6
D-04103 Leipzig
Phone +49 (0)341-3550-554
www.eva.mpg.de

From carsonhh at gmail.com Fri May 10 11:25:17 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 12:25:17 -0400
Subject: [maker-devel] Duplicated exons
In-Reply-To: <518CE3BB.3060003@ebi.ac.uk>
Message-ID:

Very odd. Which version of MAKER are you using? Are you using GFF3 passthrough in the run that generates the duplication?

Thanks,
Carson

On 13-05-10 8:10 AM, "Michael Nuhn" wrote:

>Hello Carson!
>
>I have been trying to get to the bottom of an error message when
>(re)training snap. Snap, or more precisely fathom, was giving me unclear
>error messages about misordered and overlapping exons.
>
>I have looked into the gff files from which these exons originate and
>noticed that a lot of exons in that file were duplicated. For example I
>have found these:
>
>LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>
>and then about four hundred lines later there are these:
>
>LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>
>which are identical except for the order number after "exon:".
> >This seems to have happened to a lot of features in that file. > >How can I avoid this? Or if this is just a rare problem, can I have >maker recompute the gff file without redoing all the computations again? > >Cheers, >Michael. > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From j.zohren at qmul.ac.uk Fri May 10 12:07:30 2013 From: j.zohren at qmul.ac.uk (Jasmin Zohren) Date: Fri, 10 May 2013 18:07:30 +0100 Subject: [maker-devel] annotation of birch genome Message-ID: <005e01ce4da0$db3cbce0$91b636a0$@qmul.ac.uk> Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker") I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long. It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it. 
Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4526 bytes Desc: not available URL: From mnuhn at ebi.ac.uk Fri May 10 12:35:37 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Fri, 10 May 2013 18:35:37 +0100 Subject: [maker-devel] Duplicated exons In-Reply-To: References: Message-ID: <518D2FE9.6080900@ebi.ac.uk> On 05/10/2013 05:25 PM, Carson Holt wrote: > Very odd. Which version of MAEKR are you using. Are you using GFF3 > passthrough in the run that generates the duplication? I am using version 2.27 of maker. I am not using the passthrough option. Cheers, Michael. > Thanks, > Carson > > > On 13-05-10 8:10 AM, "Michael Nuhn" wrote: > >> Hello Carson! >> >> I have been trying to get to the bottom of an error message when >> (re)training snap. Snap, or more precisely fathom, was giving me unclear >> error messages about misordered and overlapping exons. >> >> I have looked into the gff files from which these exons originate and >> noticed that a lot of exons in that file were duplicated. For example I >> have found these: >> >> LSalAtl2s75 maker exon 186317 186936 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> LSalAtl2s75 maker exon 187007 191531 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> >> and then about four hundred lines later there are these: >> >> LSalAtl2s75 maker exon 186317 186936 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> LSalAtl2s75 maker exon 187007 191531 . + . 
>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>
>> which are identical except for the order number after "exon:".
>>
>> This seems to have happened to a lot of features in that file.
>>
>> How can I avoid this? Or if this is just a rare problem, can I have
>> maker recompute the gff file without redoing all the computations again?
>>
>> Cheers,
>> Michael.

From carsonhh at gmail.com Fri May 10 12:25:15 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 13:25:15 -0400
Subject: [maker-devel] annotation of birch genome
In-Reply-To: <005e01ce4da0$db3cbce0$91b636a0$@qmul.ac.uk>
Message-ID:

Really only 560 Mb (Pine is 20 Gb by comparison).

The single longest step for MAKER is alignment, which is done via BLAST, so the evidence dataset tends to be what can be filtered to get to a reasonable size. Protein alignments take long because they must be aligned against the 3 translated reading frames of the genome (so a minimum of 3x longer than DNA2DNA alignment, but in practice much, much more). Alt_EST is even worse, as it must translate all 3 reading frames of the genome and all 3 of the data to be aligned (TBLASTX-type alignment), so a minimum of 3x longer than protein alignment, or 9x longer than DNA2DNA alignment (but in practice much more).

So the single best thing to do to reduce run time is to use protein evidence where possible instead of alt_EST evidence, or to use ESTs from the same species and limit the use of proteins (ESTs from the same species are aligned as DNA2DNA, so it is very fast).

Set all the blast_depth parameters in the maker_bopts.ctl file to 20 or 30. This will help if you have a very deep evidence dataset, by trimming overly deep alignment regions (less exonerate polishing).
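As a rough illustration of what a depth cutoff does to a deep pile of alignments, the sketch below keeps the best-scoring hits and drops any hit whose whole span is already covered by the requested number of kept hits. This is an assumption-laden toy, not MAKER's actual implementation: the function name trim_by_depth, the (start, end, score) tuples, the exact dropping rule and the choice to treat 0 as "no cutoff" are all hypothetical.

```python
def trim_by_depth(hits, depth):
    """Keep the best-scoring hits; drop a hit once every base it spans is
    already covered by `depth` kept hits.

    hits  -- list of (start, end, score) tuples, 1-based inclusive coordinates
    depth -- maximum stack height to keep; 0 or less disables the cutoff
             (an assumption made for this sketch)
    """
    if depth <= 0:
        return list(hits)
    coverage = {}  # position -> number of kept hits covering that position
    kept = []
    # Visit hits from best to worst score so the strongest evidence survives.
    for start, end, score in sorted(hits, key=lambda h: h[2], reverse=True):
        positions = range(start, end + 1)
        if all(coverage.get(p, 0) >= depth for p in positions):
            continue  # region is already deep enough, so skip this hit
        for p in positions:
            coverage[p] = coverage.get(p, 0) + 1
        kept.append((start, end, score))
    return kept

if __name__ == "__main__":
    hits = [(1, 100, 50), (1, 100, 40), (1, 100, 30), (50, 150, 20)]
    print(trim_by_depth(hits, 2))
```

With a depth of 2, the third hit over bases 1 to 100 is dropped, while the lower-scoring hit that extends into uncovered sequence is kept. Fewer retained hits means fewer alignments handed on for polishing, which is where the time saving would come from.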
Also, you can try running MAKER on 40 CPUs rather than 20 (basically doubling up even though you only have 20). This can work because, even though you gave MAKER 20 CPUs to use, all 20 will rarely be at 100% usage simultaneously. So launching 40 processes will give a slight boost in many instances by filling in the gaps when "wait" operations let CPUs idle for a fraction of a second.

One good thing, though, is that you only pay the price for data generation once. If you ever rerun with slightly modified parameters, MAKER is smart enough to reuse old results, so BLAST won't have to rerun.

Thanks,
Carson

From: Jasmin Zohren
Date: Friday, 10 May, 2013 1:07 PM
To:
Subject: [maker-devel] annotation of birch genome

Dear Maker developers,

I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway.

My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details:

- Genome size: 560 Mb
- Size of EST file (from a related species): 28 Mb
- I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker")

I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now.

My problem is, that the analysis is taking very long. It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it.
Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 10 12:44:01 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 13:44:01 -0400 Subject: [maker-devel] annotation of birch genome In-Reply-To: Message-ID: Also, if you will be annotating more genomes, you should look into getting allocation on your university's cluster. Queen Mary University has a 2000 cpu cluster. Most cluster managers bend over backwards to help Biologists use their systems as it looks good on progress reports and funding requests as they can show they have a broader user base (i.e. departments other than physics :-) --Carson From: Carson Holt Date: Friday, 10 May, 2013 1:25 PM To: Jasmin Zohren , Subject: Re: [maker-devel] annotation of birch genome Really only 560 Mb (Pine is 20 Gb by comparison). The single longest step for MAKER, is alignment which is done via BLAST. So the evidence dataset tends to be what can be filtered to get to a reasonable size. Protein alignments take long as they must be aligned against the 3 translated reading frames of the genome (so minimum 3x longer than DNA2DNA alignment, but in practice much much more). Alt_EST is even worse, as it must translate all 3 reading frames of the genome and all 3 of the data to be aligned (TBLASTX type alignment). So minimum 3x longer than protein alignment or 9X times longer than DNA2DNA alignment (but in practice much more). 
So the single best thing to do to reduce run time is to use protein evidence where possible instead of alt_EST evidence, or to ESTs from the same species and limit the use of proteins (ESTs from the same species are aligned as DNA2DNA, so it is very fast). Set all the blast_depth parameters in the maker_bopts.ctl file to 20 or 30. This will help if you have a very deep evidence dataset, by trimming overly deep alignment regions (less exonerate polishing). Also you can try running MAKER on 40 cpus rather than 20 (basically doubling up even though you only have 20). This can work because, even though you gave MAKER 20 cpus to use, all 20 will rarely be using 100% of each CPU simultaneously. So launching 40 threads will give a slight boost in many instances by filling in the gaps when "wait" operations let cpus idle for a fraction of a second. One good thing though, is that you only pay the price for data generation once. If you ever rerun with slightly modified parameters, MAKER is smart enough to reuse old results, so BLAST won't have to rerun. Thanks, Carson From: Jasmin Zohren Date: Friday, 10 May, 2013 1:07 PM To: Subject: [maker-devel] annotation of birch genome Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using ?mpiexec -n 20 maker?) I?ve also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long. 
It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it.

Many thanks in advance and kind regards,
Jasmin

-----------------------------
Jasmin Zohren
PhD student in the INTERCROSSING ITN
Queen Mary University of London
intercrossing.wikispaces.com
evolve.sbcs.qmul.ac.uk

From diana_leduc at eva.mpg.de Fri May 10 12:41:55 2013
From: diana_leduc at eva.mpg.de (Diana LeDuc)
Date: Fri, 10 May 2013 19:41:55 +0200 (CEST)
Subject: [maker-devel] Maker consensus
In-Reply-To:
References: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de>

Hi Carson,

Thank you for the quick answer.

I ran gff3_merge to merge all the gff files and this resulted in a gff file, which has these types of fields:

scaffold32239 blastx protein_match 22905 34500 174 + . ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;
scaffold32239 blastx match_part 22905 23045 174 + . ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47;

In comparison to the dpp_contig test file, I am missing est2genome evidence, most probably because my EST data set is pretty poor. I have blastx and protein2genome evidence though.

My goal is to extract the genes that could be annotated on the scaffolds.
In the gff files the hits overlap most of the times, I can visualize this properly in apollo: for example one scaffold hits DSCAML gene in both zebrafinch and chicken, but extracting the coordinates between which this scaffold fits this annotated gene is difficult from the gff. Manually curating the genes is also not an option, since I am trying to do this for a 1.7Gb genome. I hope this explains better what we are after. Thank you once again. Best regards, Diana On May 10, 2013 at 6:13 PM Carson Holt wrote: > I'm sorry I don?t' understand question 1. You are you missing resulting > fasta files, correct? Did your resulting GFF3 file have any features of type > "gene"? Did you run fasta_merge after running gff3_merge? > > Could you give me more details on what you are trying to do, so I can take a > stab at question 2 as well. > > Thanks, > Carson > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 10:44 AM > To: < maker-devel at yandell-lab.org > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > >, Janet Kelso < kelso at eva.mpg.de > >, Torsten Schoeneberg < > torsten.schoeneberg at medizin.uni-leipzig.de > > > Subject: [maker-devel] Maker consensus > > > Dear maker developers, > > I am a phD student working on de novo assembly and annotation of a bird > genome. I used Maker as annotation pipeline, which ran very well, and I > obtained different annotations with evidence from Augustus gene predictor, > small EST dataset from my organism and protein sequences from chicken, turkey > and zebrafinch. I could combine the different gff files from different > scaffolds into one gff file with annotations for the entire genome. > > I now have two questions: > > 1. What could be the reason that I haven't gotten the protein.fasta and > trancript.fasta files > > 2. How can I obtain a consensus gene list of different evidences from maker? 
> What I would actually need is the scaffold, coordinates and annotation (gene
> name) according to the 3 other bird species.
>
> Thank you in advance.
>
> Best regards,
>
> Diana Le Duc
>
> --
>
> Max Planck Institute for Evolutionary Anthropology
> Department of Evolutionary Genetics
> Deutscher Platz 6
> D-04103 Leipzig
>
> Phone +49 (0)341-3550-554
> www.eva.mpg.de

From carsonhh at gmail.com Fri May 10 12:51:48 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 13:51:48 -0400
Subject: [maker-devel] Maker consensus
In-Reply-To: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID:

OK. You just ran the evidence and didn't give a gene predictor. You need to provide an HMM file for SNAP or a species for Augustus, or for rough annotations you can set protein2genome=1 and est2genome=1. This will try to generate models directly from the alignments. If you provide a gene predictor, then MAKER can talk to it about the evidence alignments so it can make a best gene call for the region. Then there will be gene/mRNA/exon models in the GFF3 file and entries in the proteins.fasta and transcripts.fasta files.

If you need to train a predictor, you can train SNAP using the maker2zff script and the SNAP documentation or the MAKER GMOD tutorial. If you want to train Augustus, Jason Stajich wrote an excellent explanation, as well as tools, in a previous list message.
list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

Script is in this github repo - https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

Thanks,
Carson

From carsonhh at gmail.com Fri May 10 13:08:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 14:08:35 -0400
Subject: [maker-devel] Duplicated exons
In-Reply-To: <518D2FE9.6080900@ebi.ac.uk>
Message-ID:

2.27 from the website download or the SVN devel version?

Thanks,
Carson

On 13-05-10 1:35 PM, "Michael Nuhn" wrote:

>On 05/10/2013 05:25 PM, Carson Holt wrote:
>> Very odd. Which version of MAKER are you using? Are you using GFF3
>> passthrough in the run that generates the duplication?
>
>I am using version 2.27 of maker. I am not using the passthrough option.
>
>Cheers,
>Michael.
>
>> Thanks,
>> Carson
>>
>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote:
>>
>>> Hello Carson!
>>>
>>> I have been trying to get to the bottom of an error message when
>>> (re)training snap. Snap, or more precisely fathom, was giving me unclear
>>> error messages about misordered and overlapping exons.
>>>
>>> I have looked into the gff files from which these exons originate and
>>> noticed that a lot of exons in that file were duplicated. For example I
>>> have found these:
>>>
>>> LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>> LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>>
>>> and then about four hundred lines later there are these:
>>>
>>> LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>> LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>>
>>> which are identical except for the order number after "exon:".
>>>
>>> This seems to have happened to a lot of features in that file.
>>>
>>> How can I avoid this? Or if this is just a rare problem, can I have
>>> maker recompute the gff file without redoing all the computations again?
>>>
>>> Cheers,
>>> Michael.

From carsonhh at gmail.com Fri May 10 13:29:32 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 14:29:32 -0400
Subject: [maker-devel] Maker consensus
In-Reply-To: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID:

You can use any species Augustus already has. If it doesn't have yours, then you train it yourself. The species folder sits under the directory given by the AUGUSTUS_CONFIG_PATH environment variable, and is usually .../augustus/config/species

Thanks,
Carson

From mnuhn at ebi.ac.uk Fri May 10 19:29:10 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Sat, 11 May 2013 01:29:10 +0100
Subject: [maker-devel] Duplicated exons
In-Reply-To:
References:
Message-ID: <518D90D6.4080603@ebi.ac.uk>

On 05/10/2013 07:08 PM, Carson Holt wrote:
> 2.27 from the website download or the SVN devel version?

SVN. I checked it out on 19/03/2013.

From dsth at ebi.ac.uk Fri May 10 19:20:42 2013
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Sat, 11 May 2013 01:20:42 +0100
Subject: [maker-devel] Duplicated exons
Message-ID:

That is odd. I've run that version of maker 30-40x at EBI lately and never seen it. Is it just one scaffold? I'd be surprised if it's the cause, but have you been playing with the file locking options Carson mentioned a while back? I'd definitely be inclined to re-process it if it's just the one scaffold.

Dan
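
Editorial sketch (not posted by anyone in the thread): the duplicated exon rows described in the "Duplicated exons" messages could be located with a short script along the following lines. It assumes a standard tab-separated, nine-column GFF3 file and treats two exon rows as duplicates when seqid, start, end, strand and Parent all match while the IDs differ. The names find_duplicate_exons, parse_attributes, PARENT and SAMPLE are hypothetical helpers introduced for illustration; they are not part of MAKER, and the sample rows are modelled on the example quoted in the thread.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_duplicate_exons(lines):
    """Group exon rows by (seqid, start, end, strand, Parent).

    Returns a dict mapping every key seen more than once to the list of
    exon IDs that share it, in input order.
    """
    seen = defaultdict(list)
    for line in lines:
        # Skip blank lines and GFF3 comments/directives.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6],
               attrs.get("Parent", ""))
        seen[key].append(attrs.get("ID", ""))
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# Hypothetical sample rows, modelled on the example quoted in the thread.
PARENT = "maker-LSalAtl2s75-snap-gene-2.15-mRNA-1"
SAMPLE = [
    f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={PARENT}:exon:3;Parent={PARENT}",
    f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={PARENT}:exon:4;Parent={PARENT}",
    f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={PARENT}:exon:1;Parent={PARENT}",
    f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={PARENT}:exon:2;Parent={PARENT}",
]

if __name__ == "__main__":
    for key, ids in find_duplicate_exons(SAMPLE).items():
        print(key, ids)
```

To check a real file, the same function can be fed an open file handle, for example find_duplicate_exons(open("merged.gff")), where the file name is again only a placeholder. Exons that legitimately belong to two transcripts carry a comma-separated Parent value on a single row, so they are not reported by this check.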
From diana_leduc at eva.mpg.de Fri May 10 13:16:34 2013
From: diana_leduc at eva.mpg.de (Diana LeDuc)
Date: Fri, 10 May 2013 20:16:34 +0200 (CEST)
Subject: [maker-devel] Maker consensus
In-Reply-To:
References: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de>

Hi Carson,

In maker_exe.ctl I would have to provide the path to Augustus. Augustus has a training set for chicken that I would use. Is it possible to specify the species I want to use, or is the only way to train Augustus myself?

Thank you!

Best,
Diana

From linzkl007 at hotmail.com Sat May 11 12:28:47 2013
From: linzkl007 at hotmail.com (=?gb2312?B?7OTs5A==?=)
Date: Sun, 12 May 2013 01:28:47 +0800
Subject: [maker-devel] about predictor training
Message-ID:

Hi,

I'm trying to use MAKER to annotate a new genome sequence which I assembled myself. I used TopHat and Cufflinks to align the RNA-seq data we have to the assembly. Based on the MAKER tutorial, I may need three FASTA files (the assembly, ESTs and a protein database) to train SNAP. I may use SwissProt as the protein database. Can I use the GTF result from Cufflinks directly as ESTs during the training?

Another question: if I want to use Augustus for the ab initio gene prediction, do I need to proceed the same way as for SNAP? I ask because I saw some posts saying that the ab initio results would be used as evidence to train the predictor. Is there a particular order in which the different predictors should be run?

Thank you so much for your help.

Lin

From kangyangjae at gmail.com Sun May 12 22:53:34 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Mon, 13 May 2013 12:53:34 +0900
Subject: [maker-devel] exon numbering bug?
Message-ID: <070c01ce4f8d$73862fc0$5a928f40$@gmail.com>

Hello,

I want to check whether this is a bug or my misunderstanding.

The following is the GFF3 result of the MAKER pipeline. I think those red marks should be mRNA-2.
This type of error was found only in exon lines:

scaffold_22 maker mRNA 604856 612126 . + . ID=211342;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2;Parent=211320
scaffold_22 maker exon 604856 605185 0.51 + . ID=211343;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2788;Parent=211342
scaffold_22 maker exon 608362 608456 0.51 + . ID=211344;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2791;Parent=211342
scaffold_22 maker exon 610193 610286 0.51 + . ID=211345;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2792;Parent=211342
scaffold_22 maker exon 610583 610714 0.51 + . ID=211346;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2793;Parent=211342
scaffold_22 maker exon 610838 610942 0.51 + . ID=211347;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2794;Parent=211342
scaffold_22 maker exon 611458 612126 0.51 + . ID=211348;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2795;Parent=211342
scaffold_22 maker five_prime_UTR 604856 604972 . + . ID=211349;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR1;Parent=211342
scaffold_22 maker CDS 604973 605185 . + 0 ID=211350;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2905;Parent=211342
scaffold_22 maker CDS 608362 608456 . + 0 ID=211351;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2906;Parent=211342
scaffold_22 maker CDS 610193 610286 . + 1 ID=211352;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2907;Parent=211342
scaffold_22 maker CDS 610583 610714 . + 0 ID=211353;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2908;Parent=211342
scaffold_22 maker CDS 610838 610942 . + 0 ID=211354;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2909;Parent=211342
scaffold_22 maker CDS 611458 611661 . + 0 ID=211355;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2910;Parent=211342
scaffold_22 maker three_prime_UTR 611662 612126 . + .
ID=211356;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR2;Parent=211342
scaffold_22 maker start_codon 604973 604975 . + . ID=211357;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:start1;Parent=211342
scaffold_22 maker stop_codon 611659 611661 . + . ID=211358;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:stop2;Parent=211342

Thank you!

Kang, Yang Jae Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea

From carsonhh at gmail.com Sun May 12 23:01:41 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 00:01:41 -0400
Subject: [maker-devel] exon numbering bug?
In-Reply-To: <070c01ce4f8d$73862fc0$5a928f40$@gmail.com>
Message-ID:

There has been some post-processing of the GFF3. It is not an original MAKER result file. I can tell based on the IDs (MAKER doesn't assign numerical IDs). Most likely it was processed to make exons unique without having dual parentage. Normally, if the same exon is found in two transcripts, it will have two parents separated by a comma. I imagine that the post-processing script duplicated the exon, creating independent IDs and splitting the parents, but left the Name= tag the same. Since the Name= tag was based on the first transcript the exon belonged to, it stayed the same.

--Carson

From carsonhh at gmail.com Mon May 13 09:00:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 10:00:01 -0400
Subject: [maker-devel] about predictor training
In-Reply-To:
Message-ID:

You need to convert the GTF files to GFF3. There are tophat2gff3 and cufflinks2gff3 scripts that come with MAKER. I recommend using only the Cufflinks results and ignoring the TopHat results, though, as the latter tend to be a lot more spurious.

Jason Stajich wrote an excellent explanation on training Augustus on the list previously - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

He also included scripts to assist with the training - https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

Overall the strategy is similar to the one used to train SNAP.

Thanks,
Carson

From carsonhh at gmail.com Mon May 13 09:01:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 10:01:58 -0400
Subject: [maker-devel] Duplicated exons
In-Reply-To: <518D90D6.4080603@ebi.ac.uk>
Message-ID:

Could you send me your maker opts files, the contig that fails, and the evidence files you use for that contig?

Thanks,
Carson

From mnuhn at ebi.ac.uk Mon May 13 11:30:36 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Mon, 13 May 2013 17:30:36 +0100
Subject: [maker-devel] Duplicated exons
In-Reply-To:
References:
Message-ID: <5191152C.5030103@ebi.ac.uk>

Hello Carson!

On 05/13/2013 03:01 PM, Carson Holt wrote:
> Could you send me your maker opts files, the contig that fails, and the
> evidence files you use for that contig.

Thanks for offering your help. I worked around the problem this morning by removing all exons from the training set for which I was getting the error.
Now I'm rerunning maker and I can't find any gff files at the moment with
this problem.

If the problem reappears, I'll send you the files.

Cheers,
Michael.

From carsonhh at gmail.com Mon May 13 11:07:13 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 12:07:13 -0400
Subject: [maker-devel] Duplicated exons
In-Reply-To: <5191152C.5030103@ebi.ac.uk>
Message-ID:

Ok.

Thanks,
Carson

From rob.syme at gmail.com Tue May 14 01:54:18 2013
From: rob.syme at gmail.com (Rob Syme)
Date: Tue, 14 May 2013 14:54:18 +0800
Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy
Message-ID:

Hi all

I'm trying to get mpi_maker up and running. I've installed the latest
version of MPICH from mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz,
making sure to "./configure --enable-shared"

Everything seems to install without trouble, but running

mpiexec -n 1 mpi_maker

gives:

/usr/bin/perl: symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy

Does anybody here know how to fix this? Do I need to downgrade to an older
version of MPICH?

Thanks!

Rob Syme
PhD Student
Curtin University

From carsonhh at gmail.com Tue May 14 06:20:00 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 May 2013 07:20:00 -0400
Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy
In-Reply-To:
Message-ID:

You have to use MPICH2, the new MPICH3 is not compatible. MPI version 3 is a
completely new protocol implemented in MPICH3, and it breaks MAKER. You can
also use OpenMPI with the MAKER version 2.27.
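
[Editor's sketch] The version rule in the reply above can be written down as a
small Python check. This is only an illustration, not part of MAKER: the
function names are hypothetical, and it assumes the input is an MPICH2-or-later
version string or tarball name, relying on the fact that MPICH2 releases were
numbered 1.x while the 3.x series is the one referred to here as MPICH3.

```python
import re


def mpich_major_version(name):
    """Return the major version found in a string such as
    'mpich-3.0.4.tar.gz' or '1.4.1p1', or None if there is none."""
    match = re.search(r"(\d+)\.\d+", name)
    return int(match.group(1)) if match else None


def is_pre_mpich3(name):
    """True when the version is older than the 3.x series.

    Assumption: the string describes MPICH2 or later, so anything
    below major version 3 is treated as MPICH2.
    """
    major = mpich_major_version(name)
    return major is not None and major < 3
```

Under these assumptions the tarball name from the original report
(mpich-3.0.4.tar.gz) is rejected, while a version string such as 1.4.1p1 is
accepted.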
Thanks,
Carson

From heywood at cshl.edu Tue May 14 15:42:33 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Tue, 14 May 2013 20:42:33 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
Message-ID: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu>

We have been getting hung NFS mounts on some nodes when running MPI MAKER
(version 2.27). Processes go into a "D" state and cannot be killed. We end up
having to reboot nodes to recover them. We are running MPICH2 version 1.4.1p1
with RHEL 6.3. Questions:

(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung on a
sync_page system call under NFS. That *might* imply some locking issues.

(2) Has anyone else seen this?

(3) The root directory (parent of genome.maker.output directory) has lots of
mpi***** files, all of which have the first line "pst0Process::MpiChunk". Is
this expected?

I'm able to reproducibly hang NFS on some nodes when using at least 4 32-core
nodes and 128 running MPI tasks.
Thanks,

Todd Heywood
CSHL

From Carson.Holt at oicr.on.ca Tue May 14 20:01:00 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Wed, 15 May 2013 01:01:00 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu>
Message-ID:

No, it does not use ROMIO.

The locking may be due to how your NFS is implemented. MAKER does a lot of
small writes. Some NFS implementations do not handle that well and only like
large infrequent writes and frequent reads. MAKER also uses a variant of the
File::NFSLock module, which uses hard links to force a flush of the NFS IO
cache when asynchronous IO is enabled (described here
http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
I know that the FhGFS implementation of NFS has broken hard link
functionality.

Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS
mounted location. It must be local (/tmp for example). This is because
certain types of operations are not always NFS safe and need a local location
to work with (anything involving Berkeley DB or SQLite, for example). Make
sure you are not setting that to an NFS mounted scratch location. The mpi****
files are examples of short-lived files that should not be in NFS. They hold
chunks of data from threads that are processing the genome and are very
rapidly created and deleted. They will be cleaned up automatically when maker
finishes or is killed by standard signals, such as when you hit ^C or use
kill -15.

Thanks,
Carson

From eernst at cshl.edu Wed May 15 12:08:08 2013
From: eernst at cshl.edu (Evan Ernst)
Date: Wed, 15 May 2013 13:08:08 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
References: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu>
Message-ID:

Hi Carson,

For these runs, -TMP is set to the $TMPDIR environment variable via maker
command line argument in the cluster job script to use the local disk on each
node. We can see files being generated in those locations on each node, so it
seems this is working as expected. In maker_opts.ctl, I commented out the TMP
line.

I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to
consolidate the databases onto a different, faster nfs mount than the working
dir where the mpi**** files are being written.

Thanks,
Evan
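
[Editor's sketch] The advice in this thread about keeping the TMP location off
NFS can be sanity-checked before a run. The Python below is a minimal
illustration, not something MAKER provides: the function names are
hypothetical, and it assumes a Linux-style mount table (device, mount point,
filesystem type per line, as in /proc/mounts) with no escaped spaces in mount
points.

```python
import posixpath


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the longest mount point that
    contains `path`, given mount-table text in /proc/mounts layout."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /home does not match /homework.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_nfs(path, mounts_text):
    """True when `path` sits on a filesystem whose type starts with 'nfs'."""
    fs_type = filesystem_type(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")
```

On a compute node one could read /proc/mounts, pass its contents as
mounts_text, and test the directory that TMP or $TMPDIR points to on that
node. The check only looks at the filesystem type, so other network
filesystems would need their own type names added.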
From Carson.Holt at oicr.on.ca Wed May 15 12:15:52 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Wed, 15 May 2013 17:15:52 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID:

The mpi**** files should be generated in the $TMPDIR or TMP= location. If
they are appearing in the working directory, then there is a problem. If you
are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is
launched. You may have to manually specify that it needs to be exported to
the other nodes using the mpiexec command line flags. OpenMPI, for example,
does not export all environment variables to the other nodes by default.

Thanks,
Carson

From uma at ebi.ac.uk Thu May 16 11:08:43 2013
From: uma at ebi.ac.uk (Uma Maheswari)
Date: Thu, 16 May 2013 17:08:43 +0100
Subject: [maker-devel] duplicate exons?
In-Reply-To: References: Message-ID: <5195048B.9080707@ebi.ac.uk> Hi Carson, When I was trying to load the Maker-2.27 results into ensembl, I found that few hundreds of genes with 'duplicate exons' . When I looked in the gff file, I found cases like this, where the exons are not actually duplicated but have two Parents with same mRNA ID. This can be a potential alternate transcript, attached to the same transcript by mistake? Many thanks Uma 3 maker gene 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179 3 maker mRNA 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_AED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 3 maker exon 524271 524480 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker exon 524538 525182 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker exon 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker CDS 524538 524903 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker CDS 524538 525182 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker CDS 524271 524480 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker five_prime_UTR 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker five_prime_UTR 524904 525182 . - . 
ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 From carsonhh at gmail.com Thu May 16 11:13:05 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:13:05 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195048B.9080707@ebi.ac.uk> Message-ID: I've had one other report of this on the devel list, but haven't gotten data to test with. Do you have the run files that produced the duplicate exon? If so, cCould you send me theVoid directory for the contig that shows the dulicate, and the maker_opts.ctl file? Thanks, Carson On 13-05-16 12:08 PM, "Uma Maheswari" wrote: >Hi Carson, > >When I was trying to load the Maker-2.27 results into ensembl, I found >that few hundreds of genes with 'duplicate exons' . When I looked in the >gff file, I found cases like this, where the exons are not actually >duplicated but have two Parents with same mRNA ID. This can be a >potential alternate transcript, attached to the same transcript by >mistake? > >Many thanks >Uma > > > > > >3 maker gene 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >-gene-6.179 >3 maker mRNA 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >3 maker exon 524271 524480 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker exon 524538 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >9-mRNA-1 >3 maker exon 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 524903 . 
- 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 525182 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524271 524480 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524904 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu May 16 11:25:36 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:25:36 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195048B.9080707@ebi.ac.uk> Message-ID: I think this also may be a result of using GFF3 pass-through. So if that is the case, could you send me any GFF3 files you gave maker in addition to the other files I asked for. Thanks, Carson On 13-05-16 12:08 PM, "Uma Maheswari" wrote: >Hi Carson, > >When I was trying to load the Maker-2.27 results into ensembl, I found >that few hundreds of genes with 'duplicate exons' . When I looked in the >gff file, I found cases like this, where the exons are not actually >duplicated but have two Parents with same mRNA ID. This can be a >potential alternate transcript, attached to the same transcript by >mistake? > >Many thanks >Uma > > > > > >3 maker gene 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >-gene-6.179 >3 maker mRNA 524271 525467 . - . 
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >3 maker exon 524271 524480 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker exon 524538 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >9-mRNA-1 >3 maker exon 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 524903 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 525182 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524271 524480 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524904 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dsth at ebi.ac.uk Thu May 16 11:38:35 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Thu, 16 May 2013 17:38:35 +0100 Subject: [maker-devel] duplicate exons? 
In-Reply-To: References: <5195048B.9080707@ebi.ac.uk> Message-ID: hiya, are you using the same instance as michael at ebi as this sounds like the same problem he had last week and he wasn't running pass through. i've run 2.27 here 30+ times here and not seen this? is something very strange corrupted? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2013/5/16 Carson Holt > I think this also may be a result of using GFF3 pass-through. So if that > is the case, could you send me any GFF3 files you gave maker in addition > to the other files I asked for. > > Thanks, > Carson > > > > On 13-05-16 12:08 PM, "Uma Maheswari" wrote: > > >Hi Carson, > > > >When I was trying to load the Maker-2.27 results into ensembl, I found > >that few hundreds of genes with 'duplicate exons' . When I looked in the > >gff file, I found cases like this, where the exons are not actually > >duplicated but have two Parents with same mRNA ID. This can be a > >potential alternate transcript, attached to the same transcript by > >mistake? > > > >Many thanks > >Uma > > > > > > > > > > > >3 maker gene 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed > >-gene-6.179 > >3 maker mRNA 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- > >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A > >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 > >3 maker exon 524271 524480 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker exon 524538 525182 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 > >9-mRNA-1 > >3 maker exon 524271 525467 . - . 
> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 524903 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 525182 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524271 524480 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524904 525182 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > > > > > >_______________________________________________ > >maker-devel mailing list > >maker-devel at box290.bluehost.com > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 16 11:50:50 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:50:50 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: Message-ID: Yes. Perhaps this is the same issue Michael saw, although the one difference I see from his post is the Parent= attribute. --> Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-proce ssed-gene-6.179-mRNA-1 I have seen duplicate exons from GFF3 pass-through in the past, but if that's not being used I'd be very appreciative of any test dataset you could give me. 
Thanks,
Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From uma at ebi.ac.uk Fri May 17 03:41:56 2013
From: uma at ebi.ac.uk (Uma Maheswari)
Date: Fri, 17 May 2013 09:41:56 +0100
Subject: [maker-devel] duplicate exons?
In-Reply-To:
References:
Message-ID: <5195ED54.4090501@ebi.ac.uk>

Hi Carson,

I checked with Michael, and this is different from what he saw: he had entire segments of the GFF files duplicated. In this case, only the Parent ID is duplicated.
I am preparing the files you asked for and will send them soon.

Thanks,
Uma

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From luciano.abriata at epfl.ch Fri May 17 04:45:41 2013
From: luciano.abriata at epfl.ch (Luciano Abriata)
Date: Fri, 17 May 2013 09:45:41 +0000
Subject: [maker-devel] getting protein sequences from genomes
Message-ID: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>

Hello,

I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).

My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:

1) How can I match proteins predicted for the same gene in two genomes?

2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)?

maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541

3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta, snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and proteins.fasta

Which of these files contains the definitive set of predicted protein sequences?

Thanks in advance!

Luciano

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From heywood at cshl.edu Fri May 17 08:25:16 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Fri, 17 May 2013 13:25:16 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu>

It appears that a kernel bug caused the NFS hang, at least for limited-scale testing (6 nodes, 192 tasks). I upgraded the kernel from 2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and cannot reproduce the hangs.

As far as TMPDIR goes, I'm not really sure I understand. We use SGE, and the TMPDIR we are referring to is set by SGE within a job to be /tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?

Todd

From: Carson Holt
Date: Wednesday, May 15, 2013 1:15 PM
To: "Ernst, Evan"
Cc: Todd Heywood, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

The mpi**** files should be generated in the $TMPDIR or TMP= location. If they are appearing in the working directory, then there is a problem. If you are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is launched. You may have to manually specify that it needs to be exported to the other nodes using the mpiexec command line flags. OpenMPI, for example, does not export all environment variables to the other nodes by default.

Thanks,
Carson

From: Evan Ernst
Date: Wednesday, 15 May, 2013 1:08 PM
To: Carson Holt
Cc: "Heywood, Todd", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

For these runs, -TMP is set to the $TMPDIR environment variable via a maker command line argument in the cluster job script, to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected.

In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster NFS mount than the working dir where the mpi**** files are being written.

Thanks,
Evan

On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote:

No, it does not use ROMIO.

The locking may be due to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and only like large infrequent writes and frequent reads. MAKER also uses a variant of the File::NFSLock module, which uses hard links to force a flush of the NFS IO cache when asynchronous IO is enabled (described here: http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality.

Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS-mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS-safe and need a local location to work with (anything involving Berkeley DB or SQLite, for example). Make sure you are not setting that to an NFS-mounted scratch location. The mpi**** files are examples of short-lived files that should not be in NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finishes or is killed by standard signals, such as when you hit ^C or use kill -15.

Thanks,
Carson

On 13-05-14 4:42 PM, "Heywood, Todd" wrote:

>We have been getting hung NFS mounts on some nodes when running MPI MAKER
>(version 2.27). Processes go into a "D" state and cannot be killed. We
>end up having to reboot nodes to recover them. We are running MPICH2
>version 1.4.1p1 with RHEL 6.3. Questions:
>
>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>on a sync_page system call under NFS. That *might* imply some locking
>issues.
>
>(2) Has anyone else seen this?
>
>(3) The root directory (parent of genome.maker.output directory) has lots
>of mpi***** files, all of which have the first line
>"pst0Process::MpiChunk". Is this expected?
>
>I'm able to reproducibly hang NFS on some nodes when using at least 4
>32-core nodes and 128 running MPI tasks.
>
>Thanks,
>
>Todd Heywood
>CSHL

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Carson.Holt at oicr.on.ca Fri May 17 08:40:50 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Fri, 17 May 2013 13:40:50 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu>
Message-ID:

I'm glad you're getting better results.

With respect to environment variables: one common error in MPI execution is that the environment variables will not always be the same on the other nodes, since only the root node is attached to a terminal, so variables set in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE, for example, have scripts that wrap mpiexec to guarantee export of all environment variables when using MPI, to avoid just this type of common error. So, like anything, you start with the most common cause of errors and then work to the less common. Kernel bugs usually rank low on the list :-) But I'm glad it's working for you now.

Thanks,
Carson

On 13-05-17 9:25 AM, "Heywood, Todd" wrote:

>It appears that a kernel bug caused the NFS hang, at least for limited
>scale testing (6 nodes, 192 tasks). I upgraded the kernel from
>2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
>cannot reproduce the hangs.
>
>As far as TMPDIR, I'm not really sure I understand. We use SGE, and the
>TMPDIR we are referring to is set by SGE within a job to be
>/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?
From barry.moore at genetics.utah.edu Fri May 17 14:02:31 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 17 May 2013 13:02:31 -0600
Subject: [maker-devel] getting protein sequences from genomes
In-Reply-To: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>
References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>
Message-ID:

On May 17, 2013, at 3:45 AM, Luciano Abriata wrote:

> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).
>
> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:
>
> 1) How can I match proteins predicted for the same gene in two genomes?

Use blastp, with parameters tuned to favor near-perfect matches.

> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)
>
> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541

AED = Annotation edit distance. It describes how closely the prediction matches the evidence. This is a distance measure, so 0 is a perfect match and 1 is no overlap.

eAED = Exon-adjusted annotation edit distance. This metric is the same as AED with two exceptions. First, for a protein-coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-Seq data is used as evidence and both ends of an exon are supported by splice-site-spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon. For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.

QI values are as follows, in order:

5' UTR length.
Fraction of splice sites confirmed by an EST alignment.
Fraction of exons that overlap an EST alignment.
Fraction of exons that overlap an EST or protein alignment.
Fraction of splice sites confirmed by an ab initio prediction.
Fraction of exons that overlap an ab initio prediction.
Number of exons in the transcript.
3' UTR length.
Length of the encoded protein.

> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta
>
> Which of these files contains the definite set of predicted protein sequences?

The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for.

> Thanks in advance!
>
> Luciano

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ares711122 at gmail.com Sun May 19 23:16:10 2013
From: ares711122 at gmail.com (Hung-Wei Hsu)
Date: Mon, 20 May 2013 12:16:10 +0800
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
Message-ID:

Hi MAKER developers,

I was using MAKER to perform gene prediction and annotation on my contigs. I used Artemis to examine the GFF and found that some CDS with complete structure were absent from the final results. They really are predicted and annotated on the ref genome. I'm wondering if they were discarded due to overlapping with another CDS. How can I preserve these CDS?
Thanks a lot in advance.

Hung-Wei

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eernst at cshl.edu Mon May 20 15:36:38 2013
From: eernst at cshl.edu (Evan Ernst)
Date: Mon, 20 May 2013 16:36:38 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <561e317e5e8246978eccdf53ed96067b@EX-HS-HT02.cshl.edu>
References: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu> <561e317e5e8246978eccdf53ed96067b@EX-HS-HT02.cshl.edu>
Message-ID:

Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell on the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. Despite this, the mpi*** files are still being created in the working directory.

If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case.

Thanks,
Evan

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-05-20 at 4.14.09 PM.png
Type: image/png
Size: 22634 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon May 20 18:50:28 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 19:50:28 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID:

Could you run the following command for me and share the output with me?
mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eernst at cshl.edu Mon May 20 19:20:22 2013
From: eernst at cshl.edu (Evan Ernst)
Date: Mon, 20 May 2013 20:20:22 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
References:
Message-ID:

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:

> Could you run the following command for me and share the output with me?
>
> mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'
>
> Thanks,
> Carson

From carsonhh at gmail.com Mon May 20 19:38:41 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 20:38:41 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC4685A8A130DC@ex-hs-mbx06.cshl.edu>
Message-ID:

It may have just been a random failure. Try launching it again. Basically one instance failed to launch hydra_pmi_proxy which wraps the command being called via mpiexec. So you get 7 lines of output instead of the 8 that should be there.
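
The 7-of-8 check described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: `check_rank_output` is a hypothetical name, not part of MAKER, MPICH or SGE, and it assumes that a line reported by a rank is an absolute path with no spaces or colons (which is what separates it from the wrapper's error lines in the output shown in this thread).

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER or MPICH): reads the combined output
# of the File::Spec->tmpdir() test on stdin and compares the number of ranks
# that printed a temp directory with the number of ranks requested.
#
# Assumed usage, with the command taken from this thread:
#   mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' 2>&1 \
#       | check_rank_output 8
check_rank_output() {
    expected=$1
    input=$(cat)

    # Assumption: a rank's line is a bare absolute path (no spaces, no colons).
    # Wrapper errors such as "...starter_with_limit.sh: line 4: ..." do not match.
    got=$(printf '%s\n' "$input" | grep -c '^/[^ :]*$')

    # Number of different temp directories reported across the ranks.
    distinct=$(printf '%s\n' "$input" | grep '^/[^ :]*$' | sort -u | wc -l | tr -d ' ')

    echo "ranks reporting: $got of $expected; distinct tmpdirs: $distinct"

    # Succeed only if every requested rank reported a directory.
    [ "$got" -eq "$expected" ]
}
```

Fed the output posted earlier (seven path lines plus two wrapper errors) with 8 as the expected count, this would report 7 of 8 ranks and a single distinct temp directory, and return a non-zero status.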
--Carson

On 13-05-20 8:33 PM, "Heywood, Todd" wrote:

>All starter_with_limit.sh does is set a ulimit for the top process for
>the job, then start it passing all parameters:
>
>#!/bin/sh
>ulimit -c 0
>exec $*

From heywood at cshl.edu Mon May 20 19:33:32 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Tue, 21 May 2013 00:33:32 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685A8A130DC@ex-hs-mbx06.cshl.edu>

All starter_with_limit.sh does is set a ulimit for the top process for the job, then start it passing all parameters:

#!/bin/sh
ulimit -c 0
exec $*

From: Evan Ernst
Date: Monday, May 20, 2013 8:20 PM
To: Carson Holt
Cc: Carson Holt, "maker-devel at yandell-lab.org", Todd Heywood
Subject: Re: [maker-devel] MPI MAKER hanging NFS

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:
Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

From heywood at cshl.edu Mon May 20 19:34:48 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Tue, 21 May 2013 00:34:48 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685A8A130FB@ex-hs-mbx06.cshl.edu>

Actually, line 4 is the exec (one line is commented out):

#!/bin/sh
ulimit -c 0
#ulimit -n 262144
exec $*

From: Evan Ernst
Date: Monday, May 20, 2013 8:20 PM
To: Carson Holt
Cc: Carson Holt, "maker-devel at yandell-lab.org", Todd Heywood
Subject: Re: [maker-devel] MPI MAKER hanging NFS

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:
Could you run the following command for me and share the output with me?
mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

From: Evan Ernst
Date: Monday, 20 May, 2013 4:36 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org", "Heywood, Todd"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. Despite this, the mpi*** files are still being created in the working directory.

If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case.

Thanks,
Evan

On Fri, May 17, 2013 at 9:40 AM, Carson Holt wrote:

I'm glad you're getting better results. With respect to environment variables: one common error in MPI execution is that the environment variables will not always be the same on the other nodes, since only the root node is attached to a terminal, so variables set in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE, for example, have scripts that wrap mpiexec to guarantee export of all environment variables when using MPI, to avoid just this type of common error. So, like anything, you start with the most common cause of errors and then work toward the less common. Kernel bugs usually rank low on the list :-) But I'm glad it's working for you now.

Thanks,
Carson

On 13-05-17 9:25 AM, "Heywood, Todd" wrote:

>It appears that a kernel bug caused the NFS hang, at least for limited
>scale testing (6 nodes, 192 tasks). I upgraded the kernel from
>2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
>cannot reproduce the hangs.
>
>As far as TMPDIR, I'm not really sure I understand. We use SGE, and the
>TMPDIR we are referring to is set by SGE within a job to be
>/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?
>
>Todd
>
>From: Carson Holt
>Date: Wednesday, May 15, 2013 1:15 PM
>To: "Ernst, Evan"
>Cc: Todd Heywood, "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>The mpi**** files should be generated in the $TMPDIR or TMP= location.
>If they are appearing in the working directory, then there is a problem.
>If you are not setting TMP=, perhaps TMPDIR is not being exported when
>'mpiexec' is launched. You may have to manually specify that it needs to
>be exported to the other nodes using the mpiexec command line flags.
>OpenMPI for example does not export all environment variables by
>default to the other nodes.
>
>Thanks,
>Carson
>
>From: Evan Ernst
>Date: Wednesday, 15 May, 2013 1:08 PM
>To: Carson Holt
>Cc: "Heywood, Todd", "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>Hi Carson,
>
>For these runs, -TMP is set to the $TMPDIR environment variable via maker
>command line argument in the cluster job script to use the local disk on
>each node. We can see files being generated in those locations on each
>node, so it seems this is working as expected.
>
>In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is
>relevant, but I'm also setting mpi_blastdb= to consolidate the databases
>onto a different, faster nfs mount than the working dir where the mpi****
>files are being written.
>
>Thanks,
>Evan
>
>On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote:
>No it does not use ROMIO.
>
>The locking may be due to how your NFS is implemented. MAKER does a lot of
>small writes. Some NFS implementations do not handle that well and only
>like large infrequent writes and frequent reads.
>MAKER also uses a variant of the File::NFSLock module which uses
>hardlinks to force a flush of the NFS IO cache when asynchronous IO is
>enabled (described here
>http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
>I know that the FhGFS implementation of NFS has broken hard link
>functionality.
>
>Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS
>mounted location. It must be local (/tmp for example). This is because
>certain types of operations are not always NFS safe and need a local
>location to work with (anything involving Berkeley DB or SQLite for
>example). Make sure you are not setting that to an NFS mounted scratch
>location. The mpi**** files are examples of some short lived files that
>should not be in NFS. They hold chunks of data from threads that are
>processing the genome and are very rapidly created and deleted. They will
>be cleaned up automatically when maker finishes or is killed by standard
>signals such as when you hit ^C or use kill 15.
>
>Thanks,
>Carson
>
>On 13-05-14 4:42 PM, "Heywood, Todd" wrote:
>
>>We have been getting hung NFS mounts on some nodes when running MPI MAKER
>>(version 2.27). Processes go into a "D" state and cannot be killed. We
>>end up having to reboot nodes to recover them. We are running MPICH2
>>version 1.4.1p1
>>with RHEL 6.3. Questions:
>>
>>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>>on a sync_page system call under NFS. That *might* imply some locking
>>issues.
>>
>>(2) Has anyone else seen this?
>>
>>(3) The root directory (parent of genome.maker.output directory) has lots
>>of mpi***** files, all of which have the first line
>>"pst0Process::MpiChunk". Is this expected?
>>
>>I'm able to reproducibly hang NFS on some nodes when using at least 4
>>32-core nodes and 128 running MPI tasks.
>>
>>Thanks,
>>
>>Todd Heywood
>>CSHL
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Carson.Holt at oicr.on.ca Mon May 20 19:48:32 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 21 May 2013 00:48:32 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID:

Could you use the attached file to replace maker/src/bin/maker and maker/bin/maker? You will have to rerun 'maker/src/Build install' or just edit the shebang line (#!) if perl is located anywhere other than /usr/bin/perl. I explicitly tell it to use the system TMPDIR rather than letting it get set implicitly. See if that stops the mpi***** files from appearing in the working directory. It's always possible that this is just a slight difference in behavior for the version of the File::Temp module that is packaged with your perl.

--Carson

From: Evan Ernst
Date: Monday, 20 May, 2013 8:20 PM
To: Carson Holt
Cc: Carson Holt, "maker-devel at yandell-lab.org", "Heywood, Todd"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?
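
The thread's recurring advice is that the temp directory MAKER uses must be node-local rather than NFS-mounted. A minimal sketch of checking that, written in Python rather than MAKER's Perl, follows. It is an illustration only: `fs_type_for`, `is_nfs_path`, and `SAMPLE_MOUNTS` are hypothetical names, the mount table is invented, and the filesystem types shown for / and /sonas-hs are assumptions, not facts about the cluster discussed here.

```python
import posixpath

# Filesystem type names treated as NFS in this sketch (an assumption).
NFS_TYPES = {"nfs", "nfs4"}

# Hypothetical mount table in the Linux /proc/mounts layout:
# device, mount point, filesystem type, options, dump, pass.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw 0 0
nas01:/export/hpc /sonas-hs nfs rw 0 0
"""


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the deepest mount point containing path."""
    path = posixpath.normpath(path)
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with a trailing slash so /tmpfoo does not match /tmp.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > best_len:
            best_len, best_type = len(mount_point), fs_type
    return best_type


def is_nfs_path(path, mounts_text):
    """True when path sits on a mount whose type is listed in NFS_TYPES."""
    return fs_type_for(path, mounts_text) in NFS_TYPES
```

On a Linux node the real table could be read from /proc/mounts and the path taken from `tempfile.gettempdir()`, which consults TMPDIR much as the File::Spec one-liner above does.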

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:

Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker
Type: application/octet-stream
Size: 49266 bytes
Desc: maker
URL:

From carsonhh at gmail.com Mon May 20 20:08:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 21:08:51 -0400
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
Message-ID:

On default settings MAKER will only put ab initio predictions that have some sort of evidence support (EST or protein) in the final gene set. The rejected predictions are still in the GFF3 for reference purposes as match/match_part features, but not as gene/mRNA/exon/CDS features. So a lack of evidence might be why it is not there. You can add all rejected models that don't overlap an accepted model by setting keep_preds=1. This usually brings a lot more into the final gene set than you really want (lots of false positives), but for some organisms like fungi, which have high gene densities, this approach is relatively safe.

Alternatively the gene is missing because it overlaps another gene model that was accepted.
MAKER won't allow overlapping models on the same strand in eukaryotes. The only way to force that kind of overlap is to give MAKER the reference models in model_gff and not let it call its own models (then maker is really just aligning evidence and scoring the reference models).

One final note. If there is no evidence supporting the model, and that is why it is rejected, you can also try adding more evidence to the maker run, or you can consider the possibility that the gene model in the reference is not real to begin with (i.e. a false positive gene model called during the initial annotation process and not supported by protein or expression data from any source).

Thanks,
Carson

From: Hung-Wei Hsu
Date: Monday, 20 May, 2013 12:16 AM
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?

Hi MAKER developers,

I was using MAKER to perform gene prediction and annotation on my contigs. I used Artemis to examine the gff and found that some CDS with complete structure were absent from the final results. They are predicted and annotated on the reference genome. I'm wondering if they were discarded due to overlapping with another CDS. How can I preserve these CDS? Thanks a lot in advance.

Hung-Wei

From ares711122 at gmail.com Mon May 20 20:19:20 2013
From: ares711122 at gmail.com (Hung-Wei Hsu)
Date: Tue, 21 May 2013 09:19:20 +0800
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
References:
Message-ID:

Thanks a lot for your help. Your suggestions will be very helpful for our analysis. I've tried adding EST sequences to improve the gene predictions. The EST sequences I used were CDS sequences of the same organism. But I got the error below.

substr outside of string at .../TranslationMachine.pm line 162
ERROR: Failed while polishig ESTs
ERROR: Chunk failed at level:2, tier_type:3

What's wrong with my analysis? Are the EST sequences I used wrong? Thank you.

Hung-Wei

From rob.syme at gmail.com Tue May 21 00:57:19 2013
From: rob.syme at gmail.com (Rob Syme)
Date: Tue, 21 May 2013 13:57:19 +0800
Subject: [maker-devel] Maker-derived CDS GFF3 phase column
Message-ID:

Hi all

By my reading of the GFF3 spec (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker that has odd data in the phase column. For example, see some example Maker output at https://gist.github.com/robsyme/5617399

There are two exons, 5617 <- 5737 and 5793 <- 5953, with phases 0 and 2, respectively. Both exons are on the reverse strand.

From the spec, phase indicates "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon", and for "reverse strand features, phase is counted from the end field".

In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is 5953. The base at the end field is the first base of the translated CDS, so there should be no bases removed "to reach the first base of the next codon". I suggest that this phase should be 0, not 2.

There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png.
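
The phase rule quoted from the spec can be worked through for these two segments. The sketch below is in Python and is not code from MAKER; `cds_phases` is a hypothetical helper, and it assumes the two coordinate pairs are the complete CDS of a single transcript, which the message implies but the gist is not reproduced here. Sorting from the end field for a reverse-strand feature gives the phases the spec appears to call for; sorting left to right, as if the feature were on the forward strand, gives the 0 and 2 reported above.

```python
def cds_phases(segments, strand):
    """Map each (start, end) CDS segment to its GFF3 phase.

    Segments are 1-based inclusive. Translation order runs from the lowest
    start on '+' and from the highest end on '-'.
    """
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = {}
    phase = 0  # assumes the first segment begins on a codon boundary
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left over after whole codons decide how many bases of the
        # next segment are needed to finish the split codon.
        phase = (3 - (length - phase) % 3) % 3
    return phases


segments = [(5617, 5737), (5793, 5953)]

# Counted from the end field, as the spec describes for the reverse strand.
by_spec = cds_phases(segments, "-")

# Counted from the start field, as if the feature were on the forward strand.
left_to_right = cds_phases(segments, "+")
```

Under these assumptions `by_spec` assigns phase 0 to 5793..5953 and phase 1 to 5617..5737, while `left_to_right` assigns 0 to 5617..5737 and 2 to 5793..5953.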

The output gff3 is correct if "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon" is measured from the 'left-hand' end of this feature (the start field) rather than the end field.

Has anybody else run into this problem, or am I misreading the gff3 spec?

Rob Syme
PhD Student
Curtin University

From Sean.Li at csiro.au Tue May 21 02:36:37 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Tue, 21 May 2013 07:36:37 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID:

Hi Carson,

We are currently working on the annotation for the Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. Checking the annotation results from maker 2.10, we saw that some loci have a fusion problem: two separate neighbouring genes are fused together and reported as a single candidate by maker. If we go further and look at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct. We are also using an RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects.

We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide whether to merge models or not. If possible, can you please explain what these two parameters do with the predicted genes, RNA-Seq and protein evidence?

Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing a critical question: how can we effectively reduce the gene fusion problem? 1) set pred_flank lower than 100? 2) turn correct_est_fusion on? 3) anything else?

Thank you.

With best regards,
Xi (Sean) Li, Ph. D.

Bioinformatics Analyst, Bioinformatics Core,
CSIRO Mathematics, Informatics and Statistics
Phone: +61 2 6216 7138
Address: GPO Box 664, Canberra, ACT 2601

From barry.moore at genetics.utah.edu Tue May 21 18:54:40 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 21 May 2013 17:54:40 -0600
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>

Hi Sean,

I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence, and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of correct_est_fusion. Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript, and this can help as well.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From onson001 at umn.edu Tue May 21 10:58:43 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Tue, 21 May 2013 10:58:43 -0500
Subject: [maker-devel] Maker: Re-annotation
Message-ID:

Maker Development Team,

I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated the fields in maker_opts.ctl as below:

genome=CGS01058.fasta #genome sequence file in fasta format
est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT
pred_gff=Augustus.gff3 #ab-initio predictions from
other_gff=Promoters.gff3 #promoter annotations
other_gff=CpG_Islands.gff3 # CpG island annotations

Maker runs to completion and, according to the log file, annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file, but when I load the GFF3 file into IGV it does not contain any of the exon definitions in Augustus.gff3. Am I missing something?

Regards,
Getiria

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

From carsonhh at gmail.com Tue May 21 19:59:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 20:59:09 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>
Message-ID:

Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR, this tends to fix many cases. Of course, avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that, together with the correct_est_fusion option in MAKER, is probably the way to go.

I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).
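
Barry and Carson both describe pred_flank as the amount of flanking sequence around an evidence cluster that is handed to the gene predictor. A minimal Python sketch of that idea follows; it is not MAKER's actual (Perl) code. `predictor_window` is a hypothetical helper, coordinates are assumed to be 1-based and inclusive, clamping at the contig ends is an assumption, and the default of 200 is only an illustrative value.

```python
def predictor_window(cluster_start, cluster_end, contig_length, pred_flank=200):
    """Return the (start, end) region passed to a predictor for one cluster.

    The evidence cluster is widened by pred_flank on both sides and then
    clamped so it never runs off either end of the contig.
    """
    if cluster_start > cluster_end:
        raise ValueError("cluster_start must not exceed cluster_end")
    start = max(1, cluster_start - pred_flank)
    end = min(contig_length, cluster_end + pred_flank)
    return start, end


# A cluster in the middle of a contig gets the full flank on both sides.
middle = predictor_window(1000, 2000, 98686, pred_flank=200)

# A cluster near the contig start has its upstream flank cut short, which
# leaves the predictor with little intergenic sequence to start from.
near_start = predictor_window(50, 300, 98686, pred_flank=100)
```

A smaller flank shrinks the window on both sides, which is why very low values can leave a predictor's HMM without the intergenic context described above.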
Thanks, Carson From: Barry Moore Date: Tuesday, 21 May, 2013 7:54 PM To: Cc: Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27? Hi Sean, I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion. Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript and this can help as well. B On May 21, 2013, at 1:36 AM, wrote: > Hi Carson, > > We are currently working on the annotation of Helicoverpa genome project. > Maker has been chosen as the preliminary tool for the task. By checking the > annotation results by using maker 2.10, we saw some loci have the fusion > problem: two separate neighbour genes are likely to be fused together and > regarded as a single candidate output by maker. If we go further by looking at > the outputs from each individual de novo algorithm, e.g. augustus or snap, the > prediction was correct. We are also using RNA-Seq assembly from cufflinks and > some protein evidence data from closely related insects. > > We noticed that the parameters ?pred_flank? in maker v2.10 and > ?correct_est_fusion? in maker v2.27 might be useful for maker to decide when > to merge models or not. If possible, can you please explain what these two > parameters can do with the predicted genes, RNA-Seq and protein evidence? 
> > Also, our current plan is to install maker 2.27, train the algorithms to > predict UTRs, enlarge the protein evidence datasets and input our previous > annotations as model_gff. We are facing with an critical question: in which > way we could effectively improve the gene fusing problem? 1) setting the > pred_flank lower than 100? 2) turn the correct_est_fusion on? 3) anything > else? > > Thank you. > > With best regards, > Xi (Sean) Li, Ph. D. > > Bioinformatics Analyst, Bioinformatics Core, > CSIRO Mathematics, Informatics and Statistics > Phone: +61 2 6216 7138 > Address: GPO Box 664, Canberra, ACT 2601 > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sean.Li at csiro.au Tue May 21 20:23:48 2013 From: Sean.Li at csiro.au (Sean.Li at csiro.au) Date: Wed, 22 May 2013 01:23:48 +0000 Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27? In-Reply-To: References: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu> Message-ID: Thanks Barry and Carson for your detailed explanation. Now I have a better understand of "pred_flank". 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads? 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? 
Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model? Regards, Sean From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Wednesday, 22 May 2013 10:59 AM To: Barry Moore; Li, Sean (CMIS, Acton) Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27? Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR this tends to fix many cases. Of course avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that together with the correct_est_option option in MAKER is probably the way to go. I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well). Thanks, Carson From: Barry Moore > Date: Tuesday, 21 May, 2013 7:54 PM To: > Cc: > Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27? Hi Sean, I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion. 
Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript and this can help as well. B On May 21, 2013, at 1:36 AM, > wrote: Hi Carson, We are currently working on the annotation of Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. By checking the annotation results by using maker 2.10, we saw some loci have the fusion problem: two separate neighbour genes are likely to be fused together and regarded as a single candidate output by maker. If we go further by looking at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct. We are also using RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects. We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide when to merge models or not. If possible, can you please explain what these two parameters can do with the predicted genes, RNA-Seq and protein evidence? Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing with an critical question: in which way we could effectively improve the gene fusing problem? 1) setting the pred_flank lower than 100? 2) turn the correct_est_fusion on? 3) anything else? Thank you. With best regards, Xi (Sean) Li, Ph. D. Bioinformatics Analyst, Bioinformatics Core, CSIRO Mathematics, Informatics and Statistics Phone: +61 2 6216 7138 Address: GPO Box 664, Canberra, ACT 2601 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. 
of Human Genetics University of Utah Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From carsonhh at gmail.com Tue May 21 20:37:02 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 21:37:02 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

> 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?

No. Trinity would probably be a better approach to avoid merging.

> 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model?

MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that, it can also add missing exons and extend CDS. The correct_est_fusion, just causes it to clip really long UTR if it looks like it was added due to merged evidence, and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expending the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2 , or more genes loci (you loose the true UTR boundaries).
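
For reference, the two settings discussed in this thread are set in maker_opts.ctl. The fragment below is only a sketch, assuming MAKER 2.27 or later: the values are illustrative (100 is the lower bound suggested earlier in the thread for pred_flank), and the trailing comments paraphrase the discussion rather than quote the shipped control file.

```
pred_flank=100 #flanking bases added around each evidence cluster before it is passed to the gene predictors
correct_est_fusion=1 #1 = clip UTR that looks like it came from merged EST/mRNA-seq evidence, 0 = off
```

Going below 100 for pred_flank may leave the predictors without the intergenic sequence their HMMs need, as noted above.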
--Carson

From carsonhh at gmail.com Tue May 21 20:39:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 21:39:01 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

One more time, but I fixed a few obvious spelling errors -->

> 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?

No. Trinity would probably be a better approach to avoid merging.

> 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model?

MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that, it can also add missing exons and extend CDS.
The correct_est_fusion option just causes it to clip really long UTR if it looks like it was added due to merged evidence and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expanding the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2, or more gene loci (you lose the true UTR boundaries).

--Carson

From Sean.Li at csiro.au Tue May 21 21:23:26 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Wed, 22 May 2013 02:23:26 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

Thank you Carson. It has been a very helpful conversation with you! I will pass this information back to our group.
Best regards,
Sean

From carsonhh at gmail.com Tue May 21 21:28:46 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 22:28:46 -0400
Subject: [maker-devel] Maker: Re-annotation

The option in trinity is --jaccard_clip --> http://trinityrnaseq.sourceforge.net/#jaccard_clip

--Carson

From: Innocent Onsongo
Date: Tuesday, 21 May, 2013 11:58 AM
Subject: [maker-devel] Maker: Re-annotation

Maker Development Team,

I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields in maker_opts.ctl as below:

genome=CGS01058.fasta #genome sequence file in fasta format
est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT
pred_gff=Augustus.gff3 #ab-initio predictions from
other_gff=Promoters.gff3 #promoter annotations
other_gff=CpG_Islands.gff3 # CpG island annotations

Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file, but when I load the GFF3 file into IGV and look, it does not contain any of the exon definitions in Augustus.gff3.
Am I missing something?

Regards,
Getiria

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

From carsonhh at gmail.com Tue May 21 21:32:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 22:32:54 -0400
Subject: [maker-devel] Maker-derived CDS GFF3 phase column

It looks like the phase was calculated from the wrong strand orientation. I believe I have corrected this now. I'm checking a few more things, but I'll have 2.28 as the latest release likely tomorrow with the cumulative bug fixes since the last release.

Thanks,
Carson

From: Rob Syme
Date: Tuesday, 21 May, 2013 1:57 AM
Subject: [maker-devel] Maker-derived CDS GFF3 phase column

Hi all

By my reading of the GFF3 spec (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker that has odd data in the phase column.

For example, see some example Maker output at https://gist.github.com/robsyme/5617399

There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, respectively. Both exons are in the reverse strand.

From the spec, phase indicates "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon", and for "reverse strand features, phase is counted from the end field".

In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is 5953. The base at the end field is the first base of the translated CDS, so there should be no bases removed "to reach the first base of the next codon". I suggest that this phase should be 0, not 2.
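
The phase rule quoted from the spec can be sketched in a few lines of Python. This is an illustration only, not MAKER's code: `cds_phases` is a hypothetical helper, and it assumes the two features above are the only CDS segments of the transcript and that translation starts at the first base in transcription order (base 5953 here, as stated in the message).

```python
def cds_phases(segments, strand):
    """Return {(start, end): phase} for the CDS segments of one transcript.

    segments: list of (start, end) pairs, 1-based inclusive GFF3 coordinates.
    strand:   '+' or '-'.
    Assumes the first segment in transcription order begins on a codon boundary.
    """
    # Walk the segments in transcription order: ascending start on '+',
    # descending end on '-' (phase is counted from the end field there).
    ordered = sorted(segments, key=lambda s: s[0] if strand == '+' else -s[1])
    phases = {}
    phase = 0
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left over after skipping `phase` bases and reading whole
        # codons must be completed by the start of the next segment.
        phase = (3 - (length - phase) % 3) % 3
    return phases


segments = [(5617, 5737), (5793, 5953)]
print(cds_phases(segments, '-'))  # counted from the end field (reverse strand)
print(cds_phases(segments, '+'))  # what you get if the strand is ignored
```

Under these assumptions the reverse-strand call gives phase 0 for 5793-5953 and phase 1 for 5617-5737, while treating the same segments as forward-strand reproduces the reported 0 and 2, which is consistent with the phase having been computed from the wrong strand orientation.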
There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png.

The output gff3 is correct if "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon" is measured from the 'left-hand' end of this feature (the start field) rather than the end field.

Has anybody else run into this problem or am I misreading the gff3 spec?

Rob Syme
PhD Student
Curtin University

From barry.utah at gmail.com Tue May 21 22:37:30 2013
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 21 May 2013 21:37:30 -0600
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
Message-ID: <2024BE21-4293-4E9D-BE13-92774C7BC96D@gmail.com>

Sean,

The Trinity option to manage fusion transcripts is --jaccard_clip and is described here: http://trinityrnaseq.sourceforge.net/#jaccard_clip

Trinity has also added functionality to use a hybrid reference-guided/de-novo assembly approach which you might also consider: http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
From barry.utah at gmail.com Tue May 21 22:43:47 2013
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 21 May 2013 21:43:47 -0600
Subject: [maker-devel] Maker: Re-annotation

Hi Getiria,

Does the MAKER produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl?

B

Barry Moore Research Scientist Dept.
of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.syme at gmail.com Tue May 21 23:04:04 2013 From: rob.syme at gmail.com (Rob Syme) Date: Wed, 22 May 2013 12:04:04 +0800 Subject: [maker-devel] Maker-derived CDS GFF3 phase column In-Reply-To: References: Message-ID: Fantastic. I thought that might have been the problem. Looking forward to 2.28. Thanks! Rob On Wed, May 22, 2013 at 10:32 AM, Carson Holt wrote: > It looks like the phase was calculated from the wrong strand orientation. > I believe I have corrected this now. I'm checking a few more things, but > I'll have 2.28 as the latest release likely tomorrow with the cumulative > bug fixes since the last release. > > Thanks, > Carson > > > > From: Rob Syme > Date: Tuesday, 21 May, 2013 1:57 AM > To: > Subject: [maker-devel] Maker-derived CDS GFF3 phase column > > Hi all > > By my reading of the GFF3 spec ( > http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from > Maker that has odd data in the phase column. > > For example, see some example Maker output at > https://gist.github.com/robsyme/5617399 > > There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, > respectively. Both exons are in the reverse strand. > > From the spec, phase indicates "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next > codon", and for "reverse strand features, phase is counted from the end > field". > > In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) > is 5953. > The base at the end field is the first base of the translated CDS, so > there should be no bases removed "to reach the first base of the next > codon". I suggest that this phase should be 0, not 2. > > There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. 
> > The output gff3 is correct if "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next > codon" is measured from the 'left-hand' end of this feature (the start > field) rather than the end field. > > Has anybody else ran into this problem or am I misreading the gff3 spec? > > Rob Syme > PhD Student > Curtin University > > > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed May 22 07:50:26 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 22 May 2013 07:50:26 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: One other thing, I ran MAKER with the RM_off flag (maker -f -RM_off -q) the input sequences had already been masked. On Wed, May 22, 2013 at 7:47 AM, Innocent Onsongo wrote: > No. The MAKER produced GFF3 file does not contain any annotations. I even > tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it > will pass annotations from the Augustus produced GFF file into the final > annotation but that didn't work. I have attached the maker_opts.ctl file > I used together with the first 100 lines of the GFF files it's using. I > also include the GFF file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can >> you send the first ~100 lines each of the MAKER produced GFF3 file and of >> the GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >> Maker Development Team, >> >> I am trying to use Maker for re-annotation using gene predictions from >> Augustus. 
We had previously used Augustus for gene prediction but now want >> to combine these annotations with some EST data. I updated >> fields maker_opts.ctl as below >> >> genome=CGS01058.fasta #genome sequence file in fasta format >> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >> pred_gff=Augustus.gff3 #ab-initio predictions from >> other_gff=Promoters.gff3 #promoter annotations >> other_gff=CpG_Islands.gff3 # CpG island annotations >> >> Maker runs to completion and according to the log file annotation was >> successful. However, it also gives a "Segmentation fault (core dumped)" >> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >> and look it does not contain any of the exon definitions in Augustus.gff3. >> Am I missing something? >> >> Regards, >> Getiria >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> Barry Moore >> Research Scientist >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> >> > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From onson001 at umn.edu Wed May 22 07:47:30 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 22 May 2013 07:47:30 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: No. The MAKER produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will pass annotations from the Augustus produced GFF file into the final annotation but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff) On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > Hi Getiria, > > Does the MAKER produced GFF3 file contain any annotations at all? Can you > send the first ~100 lines each of the MAKER produced GFF3 file and of the > GFF3 files that you passed via maker_opts.ctl? > > B > > On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: > > Maker Development Team, > > I am trying to use Maker for re-annotation using gene predictions from > Augustus. We had previously used Augustus for gene prediction but now want > to combine these annotations with some EST data. I updated > fields maker_opts.ctl as below > > genome=CGS01058.fasta #genome sequence file in fasta format > est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT > pred_gff=Augustus.gff3 #ab-initio predictions from > other_gff=Promoters.gff3 #promoter annotations > other_gff=CpG_Islands.gff3 # CpG island annotations > > Maker runs to completion and according to the log file annotation was > successful. However, it also gives a "Segmentation fault (core dumped)" > message. It does produce a GFF3 file but when I load the GFF3 file into IGV > and look it does not contain any of the exon definitions in Augustus.gff3. > Am I missing something? > > Regards, > Getiria > > -- > Getiria Onsongo, Ph.D. 
> Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AugustusFirst100.gff3 Type: application/octet-stream Size: 9702 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CGS01058First100.gff Type: application/octet-stream Size: 5664 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CpG_IslandsFirst100.gff3 Type: application/octet-stream Size: 1963 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: EST2ScaffoldFirst100.gff3 Type: application/octet-stream Size: 9900 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4578 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PromotersFirst100.gff3 Type: application/octet-stream Size: 112 bytes Desc: not available URL: From Carson.Holt at oicr.on.ca Wed May 22 09:03:14 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 22 May 2013 14:03:14 +0000 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: Are you using MAKER version 2.10? I ask because there is an issue with other_gff in that version that has since been fixed. So if you don't get other_gff to pass through, you will need to upgrade to 2.28 (release date is later today, coincidentally). For the Augustus GFF3 file, the format is a little unusual, which is causing the problem: the mRNA features are not attached to genes. Rather than build the expected 3-level gene/mRNA/exon structure for these, it is simpler just to convert them to the 2-level match/match_part structure. Just convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. Rename the GFF3 file when you're done so that MAKER is forced to rebuild the GFF3 database when you run again. Thanks, Carson From: Innocent Onsongo Date: Wednesday, 22 May, 2013 8:47 AM To: Barry Moore Cc: Subject: Re: [maker-devel] Maker: Re-annotation No. The MAKER produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will pass annotations from the Augustus produced GFF file into the final annotation but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff) On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: Hi Getiria, Does the MAKER produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl?
B On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: Maker Development Team, I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below genome=CGS01058.fasta #genome sequence file in fasta format est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT pred_gff=Augustus.gff3 #ab-initio predictions from other_gff=Promoters.gff3 #promoter annotations other_gff=CpG_Islands.gff3 # CpG island annotations Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3. Am I missing something? Regards, Getiria -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Wed May 22 11:38:50 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 22 May 2013 12:38:50 -0400 Subject: [maker-devel] Why are some complete gene predictions not present in the final results? In-Reply-To: Message-ID: I've released 2.28 on the website. This is one of the bugs that was fixed. It happens under a very specific set of circumstances. You need to run maker with the -a command line flag to get it to recalculate upstream variables after upgrading. Alternatively you can also just give maker your old GFF3 file (make all other options blank except for the *_pass= options), and maker will just rebuild it. Thanks, Carson From: Hung-Wei Hsu Date: Monday, 20 May, 2013 9:19 PM To: Carson Holt Cc: Subject: Re: [maker-devel] Why are some complete gene predictions not present in the final results? Thanks a lot for your help. Your suggestions will be very helpful for our analysis. I've tried to add EST sequences to improve gene predictions. The EST sequences I used were CDS sequences of the same organism. But I got the error below. substr outside of string at .../TranslationMachine.pm line 162 ERROR: Failed while polishig ESTs ERROR: Chunk failed at level:2, tier_type:3 What's wrong with my analysis? Are the EST sequences I used wrong? Thank you. Hung-Wei 2013/5/21 Carson Holt > With default settings MAKER will only put ab initio predictions that have some > sort of evidence support (EST or protein) in the final gene set. The rejected > predictions are still in the GFF3 for reference purposes as match/match_part > features, but not as gene/mRNA/exon/CDS features. So a lack of evidence might > be why it is not there. You can add all rejected models that don't overlap an > accepted model by setting keep_preds=1 (though this usually brings a lot more into > the final gene set than you really want, including lots of false positives). But > for some organisms like fungi, which have high gene densities, this approach > is relatively safe.
> > Alternatively, the gene may be missing because it overlaps another gene model that > was accepted. MAKER won't allow overlapping models on the same strand in > eukaryotes. The only way to force that kind of overlap is to give MAKER the > reference models in model_gff and not let it call its own models (then maker > is really just aligning evidence and scoring the reference models). > > One final note. If there is no evidence supporting the model, and that is why > it is rejected, you can also try adding more evidence to the maker run or you > can consider the possibility that the gene model in the reference is not real > to begin with (i.e. a false positive gene model called during the initial > annotation process and not supported by protein or expression data from any > source). > > Thanks, > Carson > > > > From: Hung-Wei Hsu > Date: Monday, 20 May, 2013 12:16 AM > To: > Subject: [maker-devel] Why are some complete gene predictions not present in > the final results? > > Hi MAKER developers, > > I was using MAKER to perform gene prediction and annotation on my > contigs. > I used Artemis to examine the gff and found that some CDS with complete structure were > absent in the final results. > They are predicted and annotated on the reference genome. > I'm wondering if they were discarded due to overlapping with another CDS. > How can I preserve these CDS? > Thanks a lot in advance. > > Hung-Wei > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 22 11:39:53 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 22 May 2013 12:39:53 -0400 Subject: [maker-devel] Maker-derived CDS GFF3 phase column In-Reply-To: Message-ID: Ok. It's available for download.
--Carson From: Rob Syme Date: Wednesday, 22 May, 2013 12:04 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Maker-derived CDS GFF3 phase column Fantastic. I thought that might have been the problem. Looking forward to 2.28. Thanks! Rob On Wed, May 22, 2013 at 10:32 AM, Carson Holt wrote: > It looks like the phase was calculated from the wrong strand orientation. I > believe I have corrected this now. I'm checking a few more things, but I'll > have 2.28 as the latest release likely tomorrow with the cumulative bug fixes > since the last release. > > Thanks, > Carson > > > > From: Rob Syme > Date: Tuesday, 21 May, 2013 1:57 AM > To: > Subject: [maker-devel] Maker-derived CDS GFF3 phase column > > Hi all > > By my reading of the GFF3 spec > (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker > that has odd data in the phase column. > > For example, see some example Maker output at > https://gist.github.com/robsyme/5617399 > > There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, > respectively. Both exons are in the reverse strand. > > From the spec, phase indicates "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next codon", > and for "reverse strand features, phase is counted from the end field". > > In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is > 5953. > The base at the end field is the first base of the translated CDS, so there > should be no bases removed "to reach the first base of the next codon". I > suggest that this phase should be 0, not 2. > > There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. > > The output gff3 is correct if "the number of bases that should be removed from > the beginning of this feature to reach the first base of the next codon" is > measured from the 'left-hand' end of this feature (the start field) rather > than the end field. 
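The phase rule quoted above can be checked with a few lines of code. What follows is only a minimal Python sketch, not MAKER's implementation: the function name cds_phases is hypothetical, and it assumes that every base of each listed segment is coding and that translation starts at the first base of the first segment in transcription order. For a reverse-strand feature, "first" means the segment with the highest end coordinate.

```python
def cds_phases(segments, strand):
    """Return {(start, end): phase} for the CDS segments of one transcript.

    segments: list of (start, end) tuples, 1-based inclusive (GFF3 style).
    strand:   '+' or '-'.
    Assumption (hypothetical simplification): every base of every segment
    is coding, and the first segment begins on a codon boundary.
    """
    # Transcription order: ascending start on '+', descending end on '-'.
    ordered = sorted(segments, key=lambda s: s[0] if strand == '+' else -s[1])
    phases = {}
    phase = 0
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left in this segment's trailing partial codon decide how many
        # bases the next segment must skip to reach the start of a codon.
        phase = (3 - (length - phase) % 3) % 3
    return phases


# The two segments discussed in the thread, treated as reverse-strand CDS.
print(cds_phases([(5617, 5737), (5793, 5953)], '-'))
# The same segments ordered as if they were on the forward strand.
print(cds_phases([(5617, 5737), (5793, 5953)], '+'))
```

Under those assumptions the 5793-5953 segment gets phase 0 and the 5617-5737 segment gets phase 1. Ordering the same segments as if they were on the forward strand gives 0 and 2, which are the values reported in the original post.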
> > Has anybody else run into this problem or am I misreading the gff3 spec? > > Rob Syme > PhD Student > Curtin University > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.utah at gmail.com Thu May 23 11:40:23 2013 From: barry.utah at gmail.com (Barry Moore) Date: Thu, 23 May 2013 10:40:23 -0600 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>, <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> Message-ID: <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Hi Luciano, If I understand correctly you are including translations of SNAP and Augustus predictions as well as the predictions themselves. If so, you don't want to do that. Overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of the prediction along with the prediction you guarantee that every prediction will become an annotation, and that means you lose the benefit of the evidence-supervised annotation that MAKER provides. Include the proteins from the D. melanogaster reference and, if you want to cast a broader net, include proteins from other dipterans or even UniProt; it just depends on how aggressive you want to be in capturing new annotations. B On May 23, 2013, at 8:41 AM, Luciano Abriata wrote: > Thanks for your reply! > > One more question, can you think of any tips to get the best possible predictions of protein sequences? > > I am asking because I am getting a few proteins that are too big to be real and don't exist if I blast them, plus a few others which don't start with Methionine...
So far I am including transcripts and translations from flybase, and snap and augustus with their available trainings for flies. Do you see any possible source of error in that? > > Thanks again, > > Luciano > > From: Barry Moore [barry.moore at genetics.utah.edu] > Sent: Friday, May 17, 2013 09:02 PM > To: Luciano Abriata > Cc: maker-devel at yandell-lab.org > Subject: Re: [maker-devel] getting protein sequences from genomes > > > On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: > >> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies). >> >> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are: >> >> 1) How can I match proteins predicted for the same gene in two genomes? > > blastp tweaked with parameters to optimize near perfect match > >> >> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output) >> >> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 >> > > AED = Annotation edit distance describes how closely the prediction matches the evidence. This is a distance measure and thus 0 is a perfect match and 1 is no overlap. > > eAED = Exon adjusted annotation edit distance: This metric is the same as AED with a couple of exceptions. First, for a protein coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-Seq data is used as evidence and both ends of an exon are supported with splice site spanning reads, the middle of that exon is counted as supported as well even if coverage drops off in the interior of the exon. For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.
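To make the AED description above concrete, here is a simplified nucleotide-level sketch in Python. It is not MAKER's code. It assumes the commonly cited definition AED = 1 - (sensitivity + specificity) / 2, compares one annotation against one pooled set of evidence intervals, and ignores the reading-frame and splice-site refinements that eAED adds. The function names covered_bases and aed are hypothetical.

```python
def covered_bases(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_exons):
    """Simplified annotation edit distance (assumed: 1 - (SN + SP) / 2)."""
    ann = covered_bases(annotation_exons)
    ev = covered_bases(evidence_exons)
    if not ann or not ev:
        return 1.0  # nothing to compare: treat as no overlap
    shared = len(ann & ev)
    sensitivity = shared / len(ev)   # fraction of evidence bases recovered
    specificity = shared / len(ann)  # fraction of annotated bases supported
    return 1.0 - (sensitivity + specificity) / 2.0


print(aed([(100, 200), (300, 400)], [(100, 200), (300, 400)]))  # identical
print(aed([(100, 200)], [(500, 600)]))                          # disjoint
print(aed([(100, 199)], [(150, 249)]))                          # half overlap
```

In this sketch an annotation identical to its evidence scores 0, one with no overlap scores 1, and one sharing half of its bases with equally long evidence scores 0.5, which matches the 0 = perfect match, 1 = no overlap reading given above.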
> > QI values are as follows: > > 1. 5' UTR length. > 2. Fraction of splice sites confirmed by EST alignment. > 3. Fraction of exons that overlap an EST alignment. > 4. Fraction of exons that overlap EST or protein alignment. > 5. Fraction of splice sites confirmed by an ab initio prediction. > 6. Fraction of exons that overlap an ab initio prediction. > 7. Number of exons in the transcript. > 8. 3' UTR length. > 9. Length of encoded protein. > > >> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta, snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and proteins.fasta >> >> Which of these files contains the definitive set of predicted protein sequences? > > The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for. > >> >> >> >> Thanks in advance! >> >> Luciano >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dsth at ebi.ac.uk Thu May 23 11:48:05 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Thu, 23 May 2013 17:48:05 +0100 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Message-ID: would gene annotation by projection using synteny/WGA not be more appropriate? either way what's wrong with running one of the standard orthology predictions tools or just basic best reciprocal blast? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2013/5/23 Barry Moore > Hi Liciano, > > If I understand correctly you are including translations of SNAP and > Augustus predictions as well as the predictions. If so, you don't want to > do that. An overlapping protein evidence is sufficient to promote a > prediction to an annotation, so by providing the protein translation of the > prediction along with the prediction you will guarantee that every > prediction will become an annotation and that means you lose the benefit of > evidence supervised annotation that MAKER provides. Include the proteins > from the D mel reference and if you want to cast a broader net include > proteins from other dipterans or even Uniprot - just depend on how > aggressive you want to try to be in capturing new annotations. > > B > > On May 23, 2013, at 8:41 AM, Luciano Abriata wrote: > > Thanks for your reply! > > One more question, can you think of any tips to get the best possible > predictions of protein sequences? > > I am asking because I am getting a few proteins that are too big to be > real and don't exist if I blast them, plus a few others which don't start > with Methionine... 
So far I am including transcripts and translations from > flybase, and snap and augustus with their available trainings for flies. Do > you see any possible source of error in that? > > Thanks again, > > Luciano > > ------------------------------ > *From:* Barry Moore [barry.moore at genetics.utah.edu] > *Sent:* Friday, May 17, 2013 09:02 PM > *To:* Luciano Abriata > *Cc:* maker-devel at yandell-lab.org > *Subject:* Re: [maker-devel] getting protein sequences from genomes > > > On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: > > Hello, I am trying to use Maker to annotate genomes from different > individuals of a population (D. melanogaster flies). > > My ultimate goal is to get, for each gene, the amino acid sequences of the > coded proteins as they are expressed from each genome. My questions are: > > 1) How can I match proteins predicted for the same gene in two genomes? > > > blastp tweaked with parameters to optimize near perfect match > > > 2) What is the meaning of all the data in a line such as the following one > (taken from the protein.fasta output) > > maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 > eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 > > > AED = Annotation edit distance describes how closely the prediction > matches the evidence. This is a distance measure and thus 0 is a perfect > match and 1 is no overlap. > > eAED = Exon adjusted annotation edit distance: This metric is the same as > AED with a couple of exceptions. First, for a protein coding exon to be counted > as overlapping protein evidence, the reading frame must be the same in the > coding exon and the protein evidence. Second, when mRNA-Seq data is used > as evidence and both ends of an exon are supported with splice site > spanning reads, the middle of that exon is counted as supported as well > even if coverage drops off in the interior of the exon.
For the most part > AED and eAED will be the same, but eAED tends to work better on many > fringe cases. > > QI values are as follows: > > > 1. 5' UTR length. > 2. Fraction of splice sites confirmed by EST alignment. > 3. Fraction of exons that overlap an EST alignment. > 4. Fraction of exons that overlap EST or protein alignment. > 5. Fraction of splice sites confirmed by an ab initio prediction. > 6. Fraction of exons that overlap an ab initio prediction. > 7. Number of exons in the transcript. > 8. 3' UTR length. > 9. Length of encoded protein. > > > > 3) If I include snap and augustus to improve protein predictions, I get > several protein.fasta files: augustus_masked.proteins.fasta, > snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and > proteins.fasta > > Which of these files contains the definitive set of predicted protein > sequences? > > > The proteins.fasta file is the final set of proteins for all genes that > MAKER created annotations for. > > > > > Thanks in advance! > > Luciano > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bob_Freeman at hms.harvard.edu Thu May 23 15:17:00 2013 From: Bob_Freeman at hms.harvard.edu (Freeman, Robert M.)
Date: Thu, 23 May 2013 16:17:00 -0400 Subject: [maker-devel] Advice on params for ciliates Message-ID: <9D9882BB-3A26-45D6-A5B0-9B18F9BF5C31@hms.harvard.edu> Dear MAKER community, I am embarking on updating models for a ciliate (taxon Ciliophora) and was wondering if folks had recommendations for MAKER parameters. Thanks, Bob ----------------------------------------------------- Bob Freeman, Ph.D. Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Fri May 24 08:10:15 2013 From: daniel.standage at gmail.com (Daniel Standage) Date: Fri, 24 May 2013 09:10:15 -0400 Subject: [maker-devel] Using maker with precomputed transcript / protein alignments Message-ID: Greetings! I have some precomputed transcript and protein alignments that I would like to use with Maker. I have converted them into GFF3 format (see attached examples) and provided them to their corresponding entries (est_gff, altest_gff, protein_gff) in the maker_opts.ctl file. Unfortunately, Maker seems to be getting caught up on processing these GFF3 files. I've tried running Maker 2.10 as well as the development version (checked out a few months ago; the svn server isn't responding so I can't give a precise revision number), and in both cases Maker hangs while trying to create the GFF3 database. These are the last lines I see in STDERR when *--debug* is set. STATUS: Setting up database for any GFF3 input... Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 587. I can't find any documentation specifying any explicit requirements for the alignment-containing GFF3 input files.
Maker output uses the pretty canonical expressed_sequence_match, protein_match, and match_part features for encoding alignments, and I have used this convention with my input (see attached examples). I have also double-checked that my examples are valid GFF3, so my guess is that Maker has additional constraints/expectations for certain fields in the GFF3 files (score column? required attributes?). Is this correct, and if so would you be able to point me toward any related documentation I may have missed?

Many thanks.

--
Daniel S. Standage
Ph.D. Candidate
Bioinformatics and Computational Biology Program
Department of Genetics, Development, and Cell Biology
Iowa State University
-------------- next part --------------
A non-text attachment was scrubbed...
Name: prot-example.gff3
Type: application/octet-stream
Size: 1079 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: trans-example.gff3
Type: application/octet-stream
Size: 1305 bytes
Desc: not available

From guoyunfei1989 at gmail.com Fri May 24 11:15:19 2013
From: guoyunfei1989 at gmail.com (Yunfei Guo)
Date: Fri, 24 May 2013 09:15:19 -0700
Subject: [maker-devel] ./FINISHED/FINISHED.gff
Message-ID:

Hi Carson,

When I tried to merge all gff files, I got this error:

ERROR: The file './FINISHED/FINISHED.gff' does not exist

and I found something like below in master_datastore_index.log. Is this caused by the duplicate scaffold?

C12919781 GapCloser-Nigro-Min1k_datastore/28/79/C12919781/ FINISHED FINISHED scaffold138015 GapCloser-Nigro-Min1k_datastore/F7/0C/scaffold138015/ FINISHED

FASTA lines for C12919781 and scaffold138015

>C12919781 36.0 >C12919781 36.0
CGTAAATGCATCCGCGTATAAATGCGACAGTAAGAGTTAATGATGCAGTATAAAAAGCAAGAAAAAGCGTTTATGGTGGGAGGCGGAGGCATCCAACTAACACCAGACTGTTAACCCGGAGACCAGTGGTCGACACCGTCG(skip...)
>scaffold138015 35.1
ATATGCATATGCATATGCATATGCATATGCATATGCATATATAGACATGTAGATATAGACATCAATCATACACGTAACCCATCATTCGTATTATTAAATCACATTTTGTGACTTTGCCCATCTGTCTTTAAAGGGACAATGTGTATG(skip...)

maker 2.27

Thanks,
Yunfei

From carsonhh at gmail.com Fri May 24 11:22:05 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 24 May 2013 12:22:05 -0400
Subject: [maker-devel] ./FINISHED/FINISHED.gff
In-Reply-To:
Message-ID:

Sometimes the master_datastore_index.log gets munged by MPI (processes print at the same time). You can rebuild it by running a single instance of maker with the -dsindex flag. It only takes about 60 seconds to rebuild.

Example: cd maker -dsindex

--Carson

From: Yunfei Guo
Date: Friday, 24 May, 2013 12:15 PM
To:
Subject: [maker-devel] ./FINISHED/FINISHED.gff

Hi Carson,

When I tried to merge all gff files, I got this error:

ERROR: The file './FINISHED/FINISHED.gff' does not exist

and I found something like below in master_datastore_index.log. Is this caused by the duplicate scaffold?

C12919781 GapCloser-Nigro-Min1k_datastore/28/79/C12919781/ FINISHED FINISHED scaffold138015 GapCloser-Nigro-Min1k_datastore/F7/0C/scaffold138015/ FINISHED

FASTA lines for C12919781 and scaffold138015

>C12919781 36.0 >C12919781 36.0
CGTAAATGCATCCGCGTATAAATGCGACAGTAAGAGTTAATGATGCAGTATAAAAAGCAAGAAAAAGCGTTTATGGTGGGAGGCGGAGGCATCCAACTAACACCAGACTGTTAACCCGGAGACCAGTGGTCGACACCGTCG(skip...)
>scaffold138015 35.1
ATATGCATATGCATATGCATATGCATATGCATATGCATATATAGACATGTAGATATAGACATCAATCATACACGTAACCCATCATTCGTATTATTAAATCACATTTTGTGACTTTGCCCATCTGTCTTTAAAGGGACAATGTGTATG(skip...)

maker 2.27

Thanks,
Yunfei
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
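
The interleaved index entries described in the reply above can be spotted before (or after) rebuilding the index with -dsindex. What follows is only a minimal sketch and not part of MAKER: it assumes that every well-formed line of master_datastore_index.log has exactly three whitespace-separated fields (sequence ID, datastore directory, status), and the function name find_munged_lines is made up for illustration.

```python
def find_munged_lines(lines, expected_fields=3):
    """Return (line_number, text) pairs for lines that do not split into
    exactly `expected_fields` whitespace-separated fields.

    Assumption (not taken from MAKER's documentation): a well-formed index
    line looks like "<seq_id> <datastore_dir> <status>".
    """
    suspicious = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are ignored rather than reported
        if len(text.split()) != expected_fields:
            suspicious.append((number, text))
    return suspicious
```

To use it, one could open the index file and pass the file handle as `lines`, then print whatever comes back. An empty result only means the field counts look sane; it does not prove the index is complete.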
From carsonhh at gmail.com Fri May 24 15:06:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 24 May 2013 16:06:51 -0400
Subject: [maker-devel] Using maker with precomputed transcript / protein alignments
In-Reply-To:
Message-ID:

I'm glad it's working. I think I'll add a check for '/' characters in the base name, as I think having it be a directory will get me in trouble somewhere with hidden bugs.

Thanks,
Carson

From: Daniel Standage
Date: Friday, 24 May, 2013 4:00 PM
To: Carson Holt
Subject: Re: [maker-devel] Using maker with precomputed transcript / protein alignments

Oh wow, you are going to LOVE this. I kept on messing around with things to see if I could tease out any patterns, and eventually it hit me.

In my working directory, I have an outputs directory, which is intended to contain output directories from various different maker runs. However, since my submission scripts launch Maker from the working directory, I use -base outputs/blahblahblah as the base parameter. So when it tries to create output files using the base name (the SQLite3 db just happens to be the first), it tries to create outputs/blahblahblah/outputs/blahblahblah.db, and of course that internal outputs directory doesn't exist.

Every time I've had problems, I've been using a basename with a / character (relative directory path). Every time I haven't had problems, it was because the / wasn't there. Since the base parameter determines the name of the output directory, I assumed I could also use it to specify a nested output directory.

So it looks like I just need to be more careful that the basenames I use don't contain / characters or any other special UNIX characters. Of course, this could be made explicit in the usage statement, or you could add something like this right after parsing the command line arguments.

    if($OPT{"out_name"} =~ m/\//)
    {
      printf(STDERR "base '%s' invalid: basenames containing relative directory paths cause errors; please provide a simple string instead", $OPT{"out_name"});
      exit_maker(0);
    }

Alternatively, you could handle things like I had originally expected: if I provide path/to/mybase as my base parameter, maker would create the path/to/mybase directory initially, but then in the creation of subsequent files it would simply use mybase. I don't imagine this would be too extensive of a change, but I understand Maker has a huge codebase.

Anyway, just some suggestions, take them for what they're worth. Thanks for your help!

--
Daniel S. Standage
Ph.D. Candidate
Bioinformatics and Computational Biology Program
Department of Genetics, Development, and Cell Biology
Iowa State University

On Fri, May 24, 2013 at 3:29 PM, Carson Holt wrote:
> NFS is weird. It's hard to say why it was freezing the first times, and did
> not appear to freeze on your very last try. I definitely want to know if it
> starts to freeze again, or if stack traces show a consistent point where it
> freezes. If it keeps happening, I can try making the database in the local
> /tmp and then just copying it to the current working directory once it's
> populated to get around any weird NFS issues. But before going through all
> the effort to do that, I'd like to know that it's not some other weird bug
> related to the perl you're using or other modules that are installed. Top
> candidates on the list would be modules such as forks, forks::shared, DBI, or
> DBD::SQLite. Try reinstalling those.
>
> Thanks,
> Carson
>
>
> From: Daniel Standage
> Date: Friday, 24 May, 2013 3:19 PM
> To: Carson Holt
> Subject: Re: [maker-devel] Using maker with precomputed transcript / protein
> alignments
>
> I admit I killed these last few runs too quickly, I guess I was getting
> impatient, especially since waiting hours or days hasn't made a difference
> before. Either way, that was sloppy on my part.
> > However, I always specify the base parameter, whether or not I'm running > mulitple maker jobs from the same directory. And if I ever restarted a job, I > have always removed the original output directory entirely before > relaunching--precisely to avoid the types of mistakes you mention arising from > residual files. > > -- > Daniel S. Standage > Ph.D. Candidate > Bioinformatics and Computational Biology Program > Department of Genetics, Development, and Cell Biology > Iowa State University > > > On Fri, May 24, 2013 at 3:10 PM, Carson Holt wrote: >> Correct if you use the -base parameter you should get a different output >> directory. And if you have never used that base before, and it still >> freezes, then there is a problem. You do need to give it a little more time >> until killing it, as the stack trace in both cases showed that it was less >> than 25% finished reading the input GFF3 files and even less than that in the >> first case (so give it about 5x as long before giving up). >> >> It might just be that the NFS mount is slow. Or because of how weird the >> error is, other options include reinstalling perl and all modules. The >> weirdest bugs are often broken perl or inadvertently using modules from >> different perl versions via the PERL5LIB environmental variable (this is very >> common and can cause very wacky behavior). Another option is verifying all >> software for the lustre NFS mount is up to date. Lastly there was an odd NFS >> bug that came up on the e-mail list last week that was fixed by a kernel >> upgrade. >> >> --Carson >> >> >> >> From: Daniel Standage >> Date: Friday, 24 May, 2013 3:01 PM >> >> To: Carson Holt >> Subject: Re: [maker-devel] Using maker with precomputed transcript / protein >> alignments >> >> The file locks are created only in the output directory, no? 
So there is a >> problem if I have multiple maker runs launched from the same directory, but >> writing to different output directories (as specified by different base >> parameters)? >> >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Bioinformatics and Computational Biology Program >> Department of Genetics, Development, and Cell Biology >> Iowa State University >> >> >> On Fri, May 24, 2013 at 2:57 PM, Carson Holt wrote: >>> To clarify, that means you need to use a different working directory. Can >>> be a subdirectory of your original. >>> >>> --Carson >>> >>> >>> From: Carson Holt >>> Date: Friday, 24 May, 2013 2:56 PM >>> To: Daniel Standage >>> >>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>> protein alignments >>> >>> Both stack traces show different locations in the code and file being read. >>> So it appears it was not frozen, just interrupted by control-C. >>> >>> If you restart make sure you do so in a completely new directory from the >>> original run. This is because I wonder if there is a failed job that still >>> has active processes and is holding onto file locks in that directory. >>> >>> --Carson >>> >>> >>> From: Daniel Standage >>> Date: Friday, 24 May, 2013 2:50 PM >>> To: Carson Holt >>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>> protein alignments >>> >>> Deleted output directory and re-ran. Stack trace looks pretty similar. >>> >>> >>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line >>> 607. >>> SIGINT received >>> at /N/u/dstandag/Mason/local/src/PerlLibs/lib64/perl5/forks/signals.pm >>> line 97, <$IN >>>> > line 243676. 
>>> forks::signals::__ANON__('INT') called at /usr/lib64/perl5/DBI.pm >>> line 1590 >>> eval {...} called at /usr/lib64/perl5/DBI.pm line 1590 >>> DBD::_::db::do('DBI::db=HASH(0x4987228)', 'INSERT INTO est_gff >>> (seqid, source, parent, start, end, line)...') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 493 >>> GFFDB::_add_to_db('GFFDB=HASH(0x49727a0)', >>> 'DBI::db=HASH(0x49871e0)', 'est_gff', 'HASH(0x49877e0)') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 432 >>> GFFDB::_add_type('GFFDB=HASH(0x49727a0)', >>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...', >>> 'est_gff') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 324 >>> GFFDB::add_est('GFFDB=HASH(0x49727a0)', >>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...') called >>> at /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 57 >>> GFFDB::new('GFFDB', 'HASH(0x489c488)') called at >>> /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 608 >>> >>> >>> -- >>> Daniel S. Standage >>> Ph.D. Candidate >>> Bioinformatics and Computational Biology Program >>> Department of Genetics, Development, and Cell Biology >>> Iowa State University >>> >>> >>> On Fri, May 24, 2013 at 2:45 PM, Carson Holt wrote: >>>> Could you run again, and so I can see if the stack trace is the same each >>>> time. >>>> >>>> --Carson >>>> >>>> >>>> From: Daniel Standage >>>> Date: Friday, 24 May, 2013 2:39 PM >>>> >>>> To: Carson Holt >>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>> protein alignments >>>> >>>> Restarted in the original NSF-mounted directory, never saw the .db file, >>>> and got this as the stack trace upon termination. >>>> >>>> STATUS: Setting up database for any GFF3 input... >>>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker >>>> line 607. 
>>>> SIGINT received >>>> at /N/u/dstandag/Mason/local/src/PerlLibs/lib64/perl5/forks/signals.pm >>>> line 97, <$IN> line 170294. >>>> forks::signals::__ANON__('INT') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 475 >>>> eval {...} called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 475 >>>> GFFDB::_parse_line('GFFDB=HASH(0x4e5c730)', 'SCALAR(0x4e714b8)', >>>> 'est_gff') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 431 >>>> GFFDB::_add_type('GFFDB=HASH(0x4e5c730)', >>>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...', >>>> 'est_gff') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 324 >>>> GFFDB::add_est('GFFDB=HASH(0x4e5c730)', >>>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...') called >>>> at /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 57 >>>> GFFDB::new('GFFDB', 'HASH(0x4d86488)') called at >>>> /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 608 >>>> >>>> >>>> -- >>>> Daniel S. Standage >>>> Ph.D. Candidate >>>> Bioinformatics and Computational Biology Program >>>> Department of Genetics, Development, and Cell Biology >>>> Iowa State University >>>> >>>> >>>> On Fri, May 24, 2013 at 2:25 PM, Carson Holt wrote: >>>>> Start a new job in a new directory from the original job (NFS mount). Use >>>>> the new maker executable I sent. If it still freezes, hit control-C to >>>>> get a stack trace. >>>>> >>>>> --Carson >>>>> >>>>> >>>>> From: Daniel Standage >>>>> Date: Friday, 24 May, 2013 2:21 PM >>>>> >>>>> To: Carson Holt >>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>> protein alignments >>>>> >>>>> The job from several hours ago is still running with no changes. 
>>>>> >>>>> I just relaunched the job with a locally mounted working directory: I >>>>> could see the .db file almost immediately, and it took less than 5 minutes >>>>> to successfully build the SQLite3 db and proceed to the next steps of the >>>>> pipeline. Any ideas? >>>>> >>>>> -- >>>>> Daniel S. Standage >>>>> Ph.D. Candidate >>>>> Bioinformatics and Computational Biology Program >>>>> Department of Genetics, Development, and Cell Biology >>>>> Iowa State University >>>>> >>>>> >>>>> On Fri, May 24, 2013 at 2:01 PM, Carson Holt wrote: >>>>>> The NFS mount appears to be configured correctly. >>>>>> >>>>>> Here is what the maker.output directory should look like while the >>>>>> database is being generated. >>>>>> >>>>>> drwxr-xr-x 10 cholt staff 340 24 May 13:51 . >>>>>> drwxr-xr-x 10 cholt staff 340 24 May 13:50 .. >>>>>> -rw------x 1 cholt staff 85 24 May 13:50 >>>>>> .NFSLock.gi_lock.NFSLock >>>>>> -rw------- 1 cholt staff 52 24 May 13:50 >>>>>> .NFSLock.pdom-annot-v0.41-1.db.NFSLock >>>>>> -rw-r--r-- 1 cholt staff 1413 24 May 13:50 maker_bopts.log >>>>>> -rw-r--r-- 1 cholt staff 1666 24 May 13:50 maker_exe.log >>>>>> -rw-r--r-- 1 cholt staff 4610 24 May 13:50 maker_opts.log >>>>>> drwxr-xr-x 4 cholt staff 136 24 May 13:50 mpi_blastdb >>>>>> -rw-r--r-- 1 cholt staff 29326336 24 May 13:51 pdom-annot-v0.41-1.db >>>>>> -rw-r--r-- 1 cholt staff 6704 24 May 13:51 >>>>>> pdom-annot-v0.41-1.db-journal >>>>>> >>>>>> >>>>>> Could you watch while maker is running to see if this file is created --> >>>>>> .NFSLock.pdom-annot-v0.41-1.db.NFSLock >>>>>> You must use ls with the -a flag to see it or it will be hidden. >>>>>> >>>>>> Just keep letting it run until that file shows up. Shortly after it sows >>>>>> up, this one should appear --> pdom-annot-v0.41-1.db-journal >>>>>> >>>>>> Also could you try running MAKER once with the working directory being >>>>>> locally mounted (/tmp for example). 
>>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> >>>>>> From: Daniel Standage >>>>>> Date: Friday, 24 May, 2013 1:36 PM >>>>>> >>>>>> To: Carson Holt >>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>> protein alignments >>>>>> >>>>>> Here is the output. >>>>>> >>>>>> [dstandag at mason annot-v0.41] ls -al >>>>>> outputs/pdom-annot-v0.41-1.maker.output/ >>>>>> total 32 >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 13:34 . >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 12:39 .. >>>>>> -rw-r--r-- 1 dstandag biol 1413 May 24 12:39 maker_bopts.log >>>>>> -rw-r--r-- 1 dstandag biol 1355 May 24 12:39 maker_exe.log >>>>>> -rw-r--r-- 1 dstandag biol 4883 May 24 12:39 maker_opts.log >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 12:39 mpi_blastdb >>>>>> -rw------x 1 dstandag biol 70 May 24 13:34 .NFSLock.gi_lock.NFSLock >>>>>> [dstandag at mason annot-v0.41] df outputs/pdom-annot-v0.41-1.maker.output/ >>>>>> Filesystem 1K-blocks Used Available Use% Mounted on >>>>>> dc-mds01.uits.indiana.edu:/dc >>>>>> 1144318908992 928977247792 203869022296 83% /N/dc >>>>>> [dstandag at mason annot-v0.41] mount >>>>>> login_x86_64 on / type tmpfs (rw) >>>>>> proc on /proc type proc (rw) >>>>>> sysfs on /sys type sysfs (rw) >>>>>> devpts on /dev/pts type devpts (rw,gid=5,mode=620) >>>>>> tmpfs on /dev/shm type tmpfs (rw) >>>>>> tmpfs on /var/tmp type tmpfs (rw,size=10m) >>>>>> /dev/sdb2 on /tmp type ext4 (rw,relatime,barrier=1,data=ordered) >>>>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) >>>>>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) >>>>>> AFS on /afs type afs (rw) >>>>>> bl-nas1:/vol/hd00 on /N/hd00 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> bl-nas1:/vol/hd01 on /N/hd01 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. 
>>>>>> 165.226.129) >>>>>> bl-nas2:/vol/hd02 on /N/hd02 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas2:/vol/hd03 on /N/hd03 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas1:/vol/hdln on /N/u type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> bl-nas2:/vol/soft on /N/soft type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas1:/vol/logs on /N/logs type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> none on /dev/cpuset type cpuset (rw) >>>>>> dc-mds01.uits.indiana.edu:/dc on /N/dc type lustre (rw,localflock) >>>>>> 149.165.235.173:/mds-wan/client on /N/dcwan type lustre (rw,localflock) >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Daniel S. Standage >>>>>> Ph.D. Candidate >>>>>> Bioinformatics and Computational Biology Program >>>>>> Department of Genetics, Development, and Cell Biology >>>>>> Iowa State University >>>>>> >>>>>> >>>>>> On Fri, May 24, 2013 at 1:29 PM, Carson Holt wrote: >>>>>>> They load fine for me. It is an SQLite database. I know that SQLlite >>>>>>> can freeze on NFS if it's not configured properly. >>>>>>> >>>>>>> Could you send me the output from these 3 commands. >>>>>>> >>>>>>> ls -al >>>>>>> df >>>>>>> mount >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 1:13 PM >>>>>>> >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I deleted the entire output directory before relaunching. No .db files >>>>>>> are even created, only the mpi_blastdb directory with the genomic >>>>>>> sequence data and corresponding index, before it hangs. 
>>>>>>> >>>>>>> The GFF3 files are attached. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 12:57 PM, Carson Holt >>>>>>> wrote: >>>>>>> Did you delete any *.db files in the maker.output directory first. If >>>>>>> not do that, and check on the rerun if that file is growing in size. It >>>>>>> is a database to hold the GFF3 file entries. It's final size should be >>>>>>> ~ 2x the size of the combined GFF3 files. If it is growing, then it is >>>>>>> not really frozen (you just need to give it more time). If it is not >>>>>>> growing, send me your GFF3 files and I can try and duplicate the error. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 12:50 PM >>>>>>> >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I installed BioPerl-1.6.901, rebuilt Maker, and re-launched the job. >>>>>>> After running for 10-15 minutes, it seems to be hanging in the same >>>>>>> place as before. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 11:38 AM, Carson Holt >>>>>>> wrote: >>>>>>> That is the CPAN version and the last stable release on bioperl.org >>>>>>> . Older version as well as the bio-perl live >>>>>>> version will cause MAKER to fail. The both have issues with the Fasta >>>>>>> indexing module that maker uses. 
>>>>>>> >>>>>>> http://search.cpan.org/CPAN/authors/id/C/CJ/CJFIELDS/BioPerl-1.6.901.tar >>>>>>> .gz >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 11:34 AM >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I'm not sure if a rebuild of Maker was necessary, but I tried running it >>>>>>> just to be safe. It's complaining about Bio::Root::Version dependency >>>>>>> not being met. Looking at the Build.PL file, it requires >>>>>>> Bio::Root::Version version 1.006901. Is there really such a version, or >>>>>>> should this be changed to 1.006 or 1.006001? >>>>>>> >>>>>>> For now I'll change it to 1.006001 (the installed version) and proceed >>>>>>> with another test. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 9:45 AM, Carson Holt wrote: >>>>>>> Could you run this command in the maker devel base directory. >>>>>>> >>>>>>> svn switch --relocate svn://* >>>>>>> ************ >>>>>>> svn://* *************** >>>>>>> >>>>>>> Then do 'svn update', and then tell me what happens. Make sure to >>>>>>> delete the and *.db files in the *.maker.output/ directory before >>>>>>> retrying. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 9:10 AM >>>>>>> To: Maker Mailing List >>>>>>> Subject: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> Greetings! >>>>>>> >>>>>>> I have some precomputed transcript and protein alignments that I would >>>>>>> like to use with Maker. 
I have converted them into GFF3 format (see >>>>>>> attached examples) and provided them to their corresponding entries >>>>>>> (est_gff, altest_gff, protein_gff) in the maker_opts.ctl file. >>>>>>> >>>>>>> Unfortunately, Maker seems to be getting caught up on processing these >>>>>>> GFF3 files. I've tried running Maker 2.10 as well as the development >>>>>>> version (checked out a few months ago--svn server isn't responding so I >>>>>>> can't give a precise revision number), and in both cases Maker hangs >>>>>>> while trying to create the GFF3 database. These are the last lines I see >>>>>>> in STDERR when --debug is set. >>>>>>> >>>>>>> STATUS: Setting up database for any GFF3 input... >>>>>>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker >>>>>>> line 587. >>>>>>> >>>>>>> I can't find any documentation specifying any explicit requirements for >>>>>>> the alignment-containing GFF3 input files. Maker output uses the pretty >>>>>>> canonical expressed_sequence_match, protein_match, and match_part >>>>>>> features for encoding alignments, and I have used this convention with >>>>>>> my input (see attached examples). I have also double-checked that my >>>>>>> examples are valid GFF3, so my guess is that Maker has additional >>>>>>> constraints/expectations for certain fields in the GFF3 files (score >>>>>>> column? required attributes?). Is this correct, and if so would you be >>>>>>> able to point me toward any related documentation I may have missed? >>>>>>> >>>>>>> Many thanks. >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. 
Candidate
>>>>>>> Bioinformatics and Computational Biology Program
>>>>>>> Department of Genetics, Development, and Cell Biology
>>>>>>> Iowa State University
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From rob.syme at gmail.com Sun May 26 21:26:58 2013
From: rob.syme at gmail.com (Rob Syme)
Date: Mon, 27 May 2013 10:26:58 +0800
Subject: [maker-devel] Can map2assembly be run outside the maker pipeline?
Message-ID:

Hi all,

I'm looking to move existing transcripts from one genome assembly to another, keeping the transcript names if possible. Running map2assembly seems to require MPI (stderr example below). Is it possible to run map2assembly outside of the Maker pipeline and without MPI?

Stderr head:

INFO: All repeat masking options will be skipped.
A data structure will be created for you at:
/path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_datastore
To access files for individual sequences use the datastore index:
/path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_master_datastore_index.log
Can't call method "get_Seq_by_id" on an undefined value at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 226, line 1.
FATAL ERROR
ERROR: Failed in tier preparation
WARNING: You must always set a rank before running MpiTiers
FATAL: argument `q_def` does not exist in MpiTier object at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 86, line 1.
Process::MpiChunk::_initialize_vars('Process::MpiChunk=HASH(0x332dac8)', 'HASH(0x332db88)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 47 Process::MpiChunk::new('Process::MpiChunk', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 413 Process::MpiChunk::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f49498)', 'HASH(0x332d728)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 4165 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x2f35e88)', 'load', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 316 Process::MpiChunk::_loader('Process::MpiChunk=HASH(0x2f35e88)', 'HASH(0x2ef85a8)', 0, 0, 'Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 364 Process::MpiTiers::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f411a0)', 'HASH(0x2f491c8)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 375 Process::MpiTiers::_load_chunks('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 185 Process::MpiTiers::next_chunk('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 816 Process::MpiTiers::_handler('Process::MpiTiers=HASH(0x79f3d0)', 'Error::Simple=HASH(0x2f35c18)', 'Failed in tier preparation') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 78 Process::MpiTiers::__ANON__('Error::Simple=HASH(0x2f35c18)', 'SCALAR(0x1179c30)') called at /path/to/maker/bin/../lib/Error.pm line 339 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 329 Error::subs::run_clauses('HASH(0x2f36230)', 'Can\'t call method "get_Seq_by_id" on an undefined value at /...', undef, 'ARRAY(0x117a1e8)') called at 
/path/to/maker/bin/../lib/Error.pm line 426 Error::subs::try('CODE(0x2f28898)', 'HASH(0x2f36230)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 79 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 56 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x79f508)', 0, 'Process::MpiChunk') called at ./map2assembly line 205 -------------- next part -------------- An HTML attachment was scrubbed... URL: From uma at ebi.ac.uk Tue May 28 07:00:54 2013 From: uma at ebi.ac.uk (Uma Maheswari) Date: Tue, 28 May 2013 13:00:54 +0100 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195ED54.4090501@ebi.ac.uk> References: <5195ED54.4090501@ebi.ac.uk> Message-ID: <51A49C76.3060801@ebi.ac.uk> Thanks Carson, 2.28 with -a command line flag fixed this problem. Uma On 17/05/13 09:41, Uma Maheswari wrote: > Hi Carson, > > I checked with Michael, this is different from what he saw, he had > entire segements of gff files duplicated, In this case, just Parent id > is. > I am preparing the files you asked for, will send them soon > > thanks > Uma > > > On 16/05/13 17:50, Carson Holt wrote: >> Yes. Perhaps this is the same issue Michael saw, although the one >> difference I see from his post is the Parent= attribute. >> >> --> >> Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1 >> >> I have seen duplicate exons from GFF3 pass-through in the past, but >> if that's not being used I'd be very appreciative of any test dataset >> you could give me. >> >> Thanks, >> Carson >> >> >> >> >> From: Daniel Hughes > >> Date: Thursday, 16 May, 2013 12:38 PM >> To: Carson Holt > >> Cc: Uma Maheswari >, >> "maker-devel at yandell-lab.org " >> > >> Subject: Re: [maker-devel] duplicate exons? >> >> hiya, are you using the same instance as michael at ebi as this >> sounds like the same problem he had last week and he wasn't running >> pass through. 
i've run 2.27 here 30+ times here and not seen this? is >> something very strange corrupted? >> >> dan. >> >> Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) >> ------------------------------------------------------------------------------------- >> dsth at cantab.net >> dsth at cpan.org >> >> >> 2013/5/16 Carson Holt > >> >> I think this also may be a result of using GFF3 pass-through. So >> if that >> is the case, could you send me any GFF3 files you gave maker in >> addition >> to the other files I asked for. >> >> Thanks, >> Carson >> >> >> >> On 13-05-16 12:08 PM, "Uma Maheswari" > > wrote: >> >> >Hi Carson, >> > >> >When I was trying to load the Maker-2.27 results into ensembl, I >> found >> >that few hundreds of genes with 'duplicate exons' . When I looked >> in the >> >gff file, I found cases like this, where the exons are not actually >> >duplicated but have two Parents with same mRNA ID. This can be a >> >potential alternate transcript, attached to the same transcript by >> >mistake? >> > >> >Many thanks >> >Uma >> > >> > >> > >> > >> > >> >3 maker gene 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >> >-gene-6.179 >> >3 maker mRNA 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >> >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >> >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >> >3 maker exon 524271 524480 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker exon 524538 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >> >9-mRNA-1 >> >3 maker exon 524271 525467 . - . 
>> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 524903 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 525182 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524271 524480 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524904 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> > >> > >> >_______________________________________________ >> >maker-devel mailing list >> >maker-devel at box290.bluehost.com >> >> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jjin01 at mail.rockefeller.edu Tue May 28 20:37:58 2013 From: jjin01 at mail.rockefeller.edu (Jingjing Jin) Date: Wed, 29 May 2013 01:37:58 +0000 Subject: [maker-devel] maker running error Message-ID: Dear all, When I try to run maker on my datasets, there is an error like this: #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. 
#--------- command -------------# Widget::blastn: /usr/local/bin/blastall -p blastn -d /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blastn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn #-------------------------------# [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options ERROR: BLASTN failed FATAL ERROR ERROR: Failed while doing blastn of ESTs!! ERROR: Chunk failed at level 8 !! FAILED CONTIG:C4345703 Could anyone give me some suggestions? Thanks! Jingjing -------------- next part -------------- An HTML attachment was scrubbed... URL: From myandell at genetics.utah.edu Tue May 28 20:58:51 2013 From: myandell at genetics.utah.edu (Mark Yandell) Date: Wed, 29 May 2013 01:58:51 +0000 Subject: [maker-devel] maker running error In-Reply-To: References: Message-ID: <558EECF8-8B9C-4C5D-9968-439D421C315F@genetics.utah.edu> Hi Jingjing, looks like your fasta files have problems. Have you checked to see if they are formatted correctly? 
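A quick sanity pass over the input FASTA files is one way to act on the suggestion above. What follows is a minimal Python sketch, not part of MAKER or BLAST. The function name check_fasta and the accepted character set (IUPAC nucleotide codes plus gap and stop symbols) are assumptions made for illustration. It only flags common formatting problems (data before the first header, missing or duplicate identifiers, empty records, unexpected characters) and does not reproduce whatever validation blastall itself performs.

```python
import re

# IUPAC nucleotide codes plus gap/stop symbols; an assumption for this sketch.
# Protein files would need a different character set.
VALID_SEQ = re.compile(r"^[ACGTUNRYKMSWBDHV*-]*$", re.IGNORECASE)


def check_fasta(lines):
    """Return (line_number, message) tuples for problems found in FASTA lines."""
    problems = []
    seen_ids = set()
    header_at = None   # line number of the record currently being read
    residues = 0       # residues seen so far in that record
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header_at is not None and residues == 0:
                problems.append((header_at, "record has no sequence"))
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if not seq_id:
                problems.append((number, "header has no identifier"))
            elif seq_id in seen_ids:
                problems.append((number, "duplicate identifier " + seq_id))
            seen_ids.add(seq_id)
            header_at, residues = number, 0
        elif line.strip():
            if header_at is None:
                problems.append((number, "sequence data before first header"))
            if not VALID_SEQ.match(line.strip()):
                problems.append((number, "unexpected characters in sequence"))
            residues += len(line.strip())
    # The last record is only closed once the input ends.
    if header_at is not None and residues == 0:
        problems.append((header_at, "record has no sequence"))
    return problems
```

To use it on a file, pass an open handle, for example check_fasta(open("genome.fasta")), where genome.fasta is a placeholder name, and print the returned tuples. An empty list means only that none of these particular problems were found.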
cheers,
--mark

From carsonhh at gmail.com Wed May 29 07:45:30 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 29 May 2013 08:45:30 -0400
Subject: [maker-devel] Can map2assembly be run outside the maker pipeline?
In-Reply-To:
Message-ID:

It's not an MPI requirement, just an execution error. I've attached a fixed version of that script. Really it is just a wrapper that runs maker with a few parameter changes. You can do the exact same thing by removing all repeat mask options, setting est2genome=1, and then adding est_forward=1 to the maker_opts.ctl file.
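The manual route described above can be sketched as a small Python helper that rewrites the text of a maker_opts.ctl file. This is an illustration only, not the attached map2assembly script. The function name map_forward_opts is hypothetical, and the list of repeat-masking keys (model_org, rmlib, repeat_protein, rm_gff) is an assumption that should be checked against the control file your MAKER version generates. "Removing" the repeat mask options is interpreted here as leaving their values blank.

```python
# Keys assumed to control repeat masking in maker_opts.ctl (an assumption;
# verify against the control file produced by your MAKER version).
REPEAT_MASK_KEYS = ("model_org", "rmlib", "repeat_protein", "rm_gff")

# Switches named in the message above.
FORCED = {"est2genome": "1", "est_forward": "1"}


def map_forward_opts(ctl_text):
    """Return control-file text with repeat masking blanked and the
    est2genome / est_forward switches set to 1."""
    out = []
    pending = dict(FORCED)  # switches not yet written to the output
    for line in ctl_text.splitlines():
        # Split "key=value #comment" into its setting and comment parts.
        setting, hash_mark, comment = line.partition("#")
        key, equals, _value = setting.partition("=")
        key = key.strip()
        if equals and key in REPEAT_MASK_KEYS:
            line = key + "= " + hash_mark + comment
        elif equals and key in pending:
            line = key + "=" + pending.pop(key) + " " + hash_mark + comment
        out.append(line.rstrip())
    # Any switch that was not already listed in the file is appended.
    out.extend(k + "=" + v for k, v in sorted(pending.items()))
    return "\n".join(out) + "\n"
```

Comment-only lines and unrelated options pass through unchanged, so the result can be written back over a copy of the original control file before running maker as usual.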
Thanks, Carson From: Rob Syme Date: Sunday, 26 May, 2013 10:26 PM To: Subject: [maker-devel] Can map2assembly be run outside the maker pipeline? Hi all I'm looking to move existing transcripts from one genome assembly to another, keeping the transcript names if possible. Running map2assembly seems to require MPI (stderr example below). Is is possible to run map2assembly outside of the Maker pipeline and without MPI? Stderr head: INFO: All repeat masking options will be skipped. A data structure will be created for you at: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_datastore To access files for individual sequences use the datastore index: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_master_dat astore_index.log Can't call method "get_Seq_by_id" on an undefined value at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 226, line 1. FATAL ERROR ERROR: Failed in tier preparation WARNING: You must always set a rank before running MpiTiers FATAL: argument `q_def` does not exist in MpiTier object at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 86, line 1. 
Process::MpiChunk::_initialize_vars('Process::MpiChunk=HASH(0x332dac8)', 'HASH(0x332db88)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 47 Process::MpiChunk::new('Process::MpiChunk', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 413 Process::MpiChunk::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f49498)', 'HASH(0x332d728)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 4165 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x2f35e88)', 'load', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 316 Process::MpiChunk::_loader('Process::MpiChunk=HASH(0x2f35e88)', 'HASH(0x2ef85a8)', 0, 0, 'Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 364 Process::MpiTiers::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f411a0)', 'HASH(0x2f491c8)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 375 Process::MpiTiers::_load_chunks('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 185 Process::MpiTiers::next_chunk('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 816 Process::MpiTiers::_handler('Process::MpiTiers=HASH(0x79f3d0)', 'Error::Simple=HASH(0x2f35c18)', 'Failed in tier preparation') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 78 Process::MpiTiers::__ANON__('Error::Simple=HASH(0x2f35c18)', 'SCALAR(0x1179c30)') called at /path/to/maker/bin/../lib/Error.pm line 339 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 329 Error::subs::run_clauses('HASH(0x2f36230)', 'Can\'t call method "get_Seq_by_id" on an undefined value at /...', undef, 'ARRAY(0x117a1e8)') called at 
/path/to/maker/bin/../lib/Error.pm line 426
Error::subs::try('CODE(0x2f28898)', 'HASH(0x2f36230)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 79
Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 56
Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x79f508)', 0, 'Process::MpiChunk') called at ./map2assembly line 205

-------------- next part --------------
A non-text attachment was scrubbed...
Name: map2assembly
Type: application/octet-stream
Size: 6412 bytes
Desc: not available
URL:

From carsonhh at gmail.com Wed May 29 07:49:39 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 29 May 2013 08:49:39 -0400
Subject: [maker-devel] maker running error
In-Reply-To: <558EECF8-8B9C-4C5D-9968-439D421C315F@genetics.utah.edu>
Message-ID:

Yes, most likely an input fasta error. If that is not the case, some versions of BLAST also have version-specific failures that are fixed by upgrading BLAST. For example, I see you are using blastall, which is from the older NCBI BLAST as opposed to the newer BLAST+.

--Carson

From: Mark Yandell
Date: Tuesday, 28 May, 2013 9:58 PM
To: Jingjing Jin
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] maker running error

Hi Jingjing, looks like your fasta files have problems. Have you checked to see if they are formatted correctly?

cheers,
--mark

From carsonhh at gmail.com Wed May 29 07:54:30 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 29 May 2013 08:54:30 -0400
Subject: [maker-devel] Maker consensus
In-Reply-To: <1539398593.274033.1369743400254.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID:

Yes. That's ok, but you would get better performance by installing MPI and using that.
Alternatively just start maker several times in the same directory without splitting the input fasta. You can usually start about 10-15 concurrent maker processes safely, but would still get better performance with MPI.

--Carson

From: Diana LeDuc
Reply-To: Diana LeDuc
Date: Tuesday, 28 May, 2013 8:16 AM
To: , Carson Holt
Cc: Gabriel Renaud , Janet Kelso
Subject: Re: [maker-devel] Maker consensus

Hi Carson,

I have now restarted maker with specification of augustus path and species. I am trying to run it separately on each scaffold just to parallelise the process and speed it up. It happens that some of the scaffolds which run ok in the complete datatset now fail. Do you have any idea why this happens? Is it ok to have a separate directory for each of the scaffolds and run maker in each of them?

Thank you for the help.

Best regards,
Diana

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From diana_leduc at eva.mpg.de Tue May 28 07:16:40 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Tue, 28 May 2013 14:16:40 +0200 (CEST) Subject: [maker-devel] Maker consensus In-Reply-To: References: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: <1539398593.274033.1369743400254.JavaMail.open-xchange@oxchange.eva.mpg.de> Hi Carson, I have now restarted maker with specification of augustus path and species. I am trying to run it separately on each scaffold just to parallelise the process and speed it up. It happens that some of the scaffolds which run ok in the complete datatset now fail. Do you have any idea why this happens? Is it ok to have a separate directory for each of the scaffolds and run maker in each of them? Thank you for the help. Best regards, Diana On May 10, 2013 at 8:29 PM Carson Holt wrote: > You can use any species augustus already has. If it doesn't then you train > it yourself. The species folder is pointed to by the AUGUSTUS_CONFIG_PATH > environmental variable, and is usually ?/augusts/config/species > > Thanks, > Carson > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 2:16 PM > To: < maker-devel at yandell-lab.org >, > Carson Holt < carsonhh at gmail.com > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > >, Gabriel Renaud < > gabriel_renaud at eva.mpg.de >, Janet Kelso < > kelso at eva.mpg.de > > Subject: Re: [maker-devel] Maker consensus > > Hi Carson, > > In maker_exe.ctl I would have to provide the path to augustus. Augustus has a > training set for chicken that I would use. Is it possible to specify the > species i want to use, or the only way is training Augustus myself? > > Thank you! > > Best, > > Diana > On May 10, 2013 at 7:51 PM Carson Holt < carsonhh at gmail.com > > wrote: > > > > Ok. You just ran the evidence and didn't give a gene predictor. 
You > > > need to provide an HMM file for SNAP a species for augustus, or for > > > rough annotations you can set protein3genome=1 and est2genome=1. This > > > will try and generate models direct from the alignments. > > > > If you provide a gene predictor, then MAKER can talk to it about the > > evidence alignments so it can make a best gene call for the region. Then > > there will be gene/mRNA/exon model in the GFF3 file and entires in the > > proteins.fasta and transcripts.fasta. If you need to train a predictor, you > > can train SNAP using the maker2zff script and the SNAP documentation or > > maker GMOD tutorial. If you want to train augustus Jason Stajich wrote an > > excellent explanation as well as tools in a previous list message. > > > > list msg - > > http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html > > > > Script is in this github repo - > > > > https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl > > > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Date: Friday, 10 May, 2013 1:41 PM > > To: < maker-devel at yandell-lab.org >, > > Carson Holt < carsonhh at gmail.com > > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > > >, Gabriel Renaud < > > gabriel_renaud at eva.mpg.de >, Janet Kelso > > < kelso at eva.mpg.de > > > Subject: Re: [maker-devel] Maker consensus > > > > Hi Carson, > > > > Thank you for the quick answer. > > I ran gff3_merge to merge all the gff files and this resulted in a gff > > file, which has these type of fields: > > scaffold32239 blastx protein_match 22905 34500 174 + . > > ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; > > scaffold32239 blastx match_part 22905 23045 174 + . 
> > ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 > > 172 218;Gap=M47; > > In comparison to the dpp_contig test file, I am missing est2genome > > evidence, most probably because my est data set is pretty poor. I have > > blastx and protein2genome evidence though. > > > > My goal is to extract the genes that could be annotated on the scaffolds. > > In the gff files the hits overlap most of the times, I can visualize this > > properly in apollo: for example one scaffold hits DSCAML gene in both > > zebrafinch and chicken, but extracting the coordinates between which this > > scaffold fits this annotated gene is difficult from the gff. Manually > > curating the genes is also not an option, since I am trying to do this for a > > 1.7Gb genome. > > > > I hope this explains better what we are after. > > > > Thank you once again. > > > > Best regards, > > > > Diana > > On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com > > > wrote: > > > > > > > I'm sorry I don?t' understand question 1. You are you missing > > > > > resulting fasta files, correct? Did your resulting GFF3 file have > > > > > any features of type "gene"? Did you run fasta_merge after running > > > > > gff3_merge? > > > > > > Could you give me more details on what you are trying to do, so I can > > > take a stab at question 2 as well. 
> > > > > > Thanks, > > > Carson > > > > > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > > > Date: Friday, 10 May, 2013 10:44 AM > > > To: < maker-devel at yandell-lab.org > > > > > > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > > > >, Janet Kelso < kelso at eva.mpg.de > > > >, Torsten Schoeneberg < > > > torsten.schoeneberg at medizin.uni-leipzig.de > > > > > > > Subject: [maker-devel] Maker consensus > > > > > > > > > Dear maker developers, > > > > > > I am a phD student working on de novo assembly and annotation of a bird > > > genome. I used Maker as annotation pipeline, which ran very well, and I > > > obtained different annotations with evidence from Augustus gene predictor, > > > small EST dataset from my organism and protein sequences from chicken, > > > turkey and zebrafinch. I could combine the different gff files from > > > different scaffolds into one gff file with annotations for the entire > > > genome. > > > > > > I now have two questions: > > > > > > 1. What could be the reason that I haven't gotten the protein.fasta and > > > trancript.fasta files > > > > > > 2. How can I obtain a consensus gene list of different evidences from > > > maker? What I would actually need is the scaffold, coordinates and > > > annotation (gene name) according to the 3 other bird species. > > > > > > Thank you in advance. 
> > > > > > Best regards, > > > > > > Diana Le Duc > > > > > > -- > > > > > > Max Planck Institute for Evolutionary Anthropology > > > Department of Evolutionary Genetics > > > Deutscher Platz 6 > > > D-04103 Leipzig > > > > > > Phone +49 (0)341-3550-554 > > > www.eva.mpg.de > > > _______________________________________________ maker-devel mailing > > > list maker-devel at box290.bluehost.com > > > > > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaganjot.kaur at sickkids.ca Wed May 29 14:34:19 2013 From: gaganjot.kaur at sickkids.ca (Gaganjot Kaur) Date: Wed, 29 May 2013 19:34:19 +0000 Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs Message-ID: <5A46EF8CDF7C4F46AED4F14FC3AE17645F2B65@SKMBXX01.sickkids.ca> Hi Maker community, I have been trying to annotate a fungal genome using maker. As I do not have ests from the same species I have been using proteins and ests from two related species. Maker finishes successfully for all the scaffolds except two. These two scaffolds are around 2 mega bases each. I am running maker-2.27, using mpiexec to run over multiple compute nodes . Please see the error log below. 

Error from first scaffold:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Sequence with id BL_ORD_ID:1562 no longer exists in database...alignment skipped
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472
STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888
STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114
STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95
STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676
STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685
STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858
STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335
STACK: main::node_thread /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:1381
STACK: threads::new /home/gkaur/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm:799
STACK: /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:864
-----------------------------------------------------------
--> rank=1, hostname=cn-r56
--> rank=1, hostname=cn-r56
--> rank=1, hostname=cn-r56
--> rank=1, hostname=cn-r56
ERROR: Failed while doing tblastx of alt-ESTs
ERROR: Chunk failed at level:4, tier_type:2
FAILED CONTIG:scaffold_52379

ERROR: Chunk failed at level:5, tier_type:0
FAILED CONTIG:scaffold_52379

Error from second scaffold:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Sequence with id BL_ORD_ID:3014 no longer exists in database...alignment skipped
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472
STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888
STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114
STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95
STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676
STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685
STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858
STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335
STACK: /home/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:926
-----------------------------------------------------------
--> rank=1, hostname=cn-r12
--> rank=1, hostname=cn-r12
--> rank=1, hostname=cn-r12
--> rank=1, hostname=cn-r12
ERROR: Failed while doing tblastx of alt-ESTs
ERROR: Chunk failed at level:4, tier_type:2
FAILED CONTIG:scaffold_52359

ERROR: Chunk failed at level:5, tier_type:0
FAILED CONTIG:scaffold_52359

The errors seem to come from alt-est that I have been using. I have tried running maker more than once over these two scaffolds and the same error appears each time. I have no idea what is going wrong here. Your help in understanding and resolving the error will be greatly appreciated.

Thanks in advance,
Gagan

- - - - - - - - - - - - - - - - -
Gaganjot Kaur
Bioinformatics Analyst
The Centre for Applied Genomics (TCAG)
The Hospital for Sick Children
MaRS Building - East Tower
101 College St., Room 14-701
Toronto, ON M5G 1L7

From Carson.Holt at oicr.on.ca Wed May 29 22:10:52 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Thu, 30 May 2013 03:10:52 +0000
Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs
In-Reply-To: <5A46EF8CDF7C4F46AED4F14FC3AE17645F2B65@SKMBXX01.sickkids.ca>
Message-ID:

This is a parsing error coming from BioPerl. Could you run maker with the --debug flag and redirect STDERR to a file? You can kill it after a few seconds; I really just want to see the version of your BioPerl installation. Also, what version of BLAST are you running?

--Carson

From mnuhn at ebi.ac.uk Wed May 1 05:38:52 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Wed, 01 May 2013 12:38:52 +0100
Subject: [maker-devel] substr outside of string
Message-ID: <5180FECC.2020308@ebi.ac.uk>

Hello!

I have run maker with EST and RNA-seq data to create a training set for SNAP. Then I trained SNAP, added the hmm to the snaphmm option, and reran maker.

Maker is giving me error messages like this:

"
setting up GFF3 output and fasta chunks
doing repeat masking
re reading repeat masker report.

substr outside of string at /maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 140.
--> rank=NA, hostname=ebi-209.ebi.ac.uk
"

The line from which this error message originates is:

    substr($$seq, $b -1 , $l, "$replace"x$l);

After getting these error messages I replaced it with

    eval {
        substr($$seq, $b -1 , $l, "$replace"x$l);
    };
    if ($@) {
        use Carp;
        use Data::Dumper;
        confess(
            $@
            . "\n\n"
            . Dumper($p)
            . "\n\n"
            . "Length of sequence: " . (length $$seq)
        );
    }

After that I got this:

$VAR1 = [
          98926,
          99033
        ];

Length of sequence: 98686 at /maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 145

I have not changed the genome file.

I'm also concerned about the reported length of 98686, because I have a list of all sequences in the file and their lengths, and none of them has a length of 98686 bp. The sequences with the closest lengths are these:

98367 LSalAtl2s1200
98438 LSalAtl2s1473
98776 LSalAtl2s1613
98876 LSalAtl2s1199

so they are not even close.

$$seq is a sequence as a string, when I print it.

Sometimes maker prints a message like this:

"
--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: LSalAtl2s63
Length: 3997709
Tries: 5!!
#---------------------------------------------------------------------
"

But according to my list, which I generated from the exact same file that maker has in the genome_file option, the length of that sequence is 1169407.

Any idea why I am getting these problems and what to do about them?

Cheers,
Michael.

From carsonhh at gmail.com Wed May 1 07:17:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 01 May 2013 09:17:50 -0400
Subject: [maker-devel] substr outside of string
In-Reply-To: <5180FECC.2020308@ebi.ac.uk>
Message-ID:

The length you are printing is not the length of the contig, but rather the length of the piece of the contig MAKER is working with at that moment. The fact that the length is not exactly 100000 tells me that this is a piece at the end of the contig. By any chance are you using GFF3 pass-through of repeat elements? If not, there may be a repeatmasker parsing bug, as the start and end coordinates are off the edge of the contig. If you run maker on the command line (not via MPI), what is the repeatmasker report read immediately before the error? Could you then attach it and the fasta sequence for the contig that fails?
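
To make the coordinate problem above concrete, here is a minimal sketch in Python (not MAKER's actual Perl code) of checking a repeat interval against the length of the piece being masked before replacing anything. The function name `mask_interval` and the choice to clip out-of-range coordinates are assumptions made purely for illustration; the numbers are the ones reported in this thread.

```python
def mask_interval(seq, start, end, replace="N"):
    """Mask the 1-based, inclusive interval [start, end] of seq.

    Returns (masked_seq, n_masked). Coordinates outside the sequence are
    clipped rather than raising, so an interval that lies entirely past
    the end of the sequence masks nothing.
    """
    lo = max(start, 1)
    hi = min(end, len(seq))
    if lo > hi:
        # Interval is completely off the edge of this piece.
        return seq, 0
    n = hi - lo + 1
    # Convert 1-based inclusive coordinates to Python slices.
    return seq[:lo - 1] + replace * n + seq[hi:], n


# Values from the report: a 98686 bp piece and a repeat at 98926..99033.
chunk = "A" * 98686
masked, n_masked = mask_interval(chunk, 98926, 99033)
print(n_masked)  # 0
```

With these values the interval starts 240 bases past the end of the piece, which is the situation in which Perl's four-argument substr stops with "substr outside of string". Whether clipping or failing loudly is the right response depends on where the bad coordinates come from, which is what the questions above are trying to establish.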

Thanks,
Carson

From ejr at stowers.org Wed May 1 09:57:11 2013
From: ejr at stowers.org (Ross, Eric)
Date: Wed, 1 May 2013 15:57:11 +0000
Subject: [maker-devel] repeat statistics
In-Reply-To:
Message-ID:

Should this be accessible anonymously?

I'm unable to connect.

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From: Jason Stajich
Date: Monday, April 29, 2013 5:49 PM
To: Barry Moore
Cc: Eric Ross, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] repeat statistics

Barry - I think you mean topaz instead of malachite?

svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA

Jason Stajich
jason at bioperl.org
jason.stajich at gmail.com
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote:
> Hi Eric,
>
> There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details:
>
> http://www.sequenceontology.org/resources/sobacl.html
>
> Ultimately you'll get it like this:
>
> svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA
>
> Then run:
>
> SOBA/bin/SOBAcl --help
>
> For a lot of command line examples have a look in:
>
> SOBA/t/sobacl_test.sh
>
> B
>
> On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:
>
>> Does anyone have a good tool for yanking repeat statistics out of MAKER
>> gff files?
>>
>> SOBA can give some basic stats, but it doesn't play well with my giant
>> files and I haven't figured out a way to run it locally.
>>
>> For that matter does anyone have a script that will calculate SOBA like
>> stats locally? I'd rather avoid writing one myself if something else is
>> out there.
>>
>> Thanks,
>>
>> Eric

From barry.moore at genetics.utah.edu Wed May 1 17:42:47 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Wed, 1 May 2013 17:42:47 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID:

Eric,

Try again. It should have been world-readable before, but I've opened it a bit wider now, so it should definitely be readable now. Let me know if you have problems.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From ejr at stowers.org Wed May 1 17:53:08 2013
From: ejr at stowers.org (Ross, Eric)
Date: Wed, 1 May 2013 23:53:08 +0000
Subject: [maker-devel] repeat statistics
In-Reply-To:
Message-ID:

Works now.

Thanks much,

Eric

From guoyunfei1989 at gmail.com Fri May 3 10:33:42 2013
From: guoyunfei1989 at gmail.com (Yunfei Guo)
Date: Fri, 3 May 2013 09:33:42 -0700
Subject: [maker-devel] maker doesn't pick up where it stopped
Message-ID:

Dear MAKER community,

I have a problem: maker doesn't pick up where it stopped last time; instead, it discards all previous results.

command:
echo 'mpiexec -n 12 maker -q' | qsub -V -cwd -l h_vmem=2g -pe mpich 12

maker version:
2.26

mpich version:
1.5rc3

in maker_opts:
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no

It never happened before. Any advice?

Thank you!

Yunfei

From barry.moore at genetics.utah.edu Fri May 3 16:51:27 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 3 May 2013 16:51:27 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID: <37BA6893-1175-4F3E-B3AA-6C1E23C4364E@genetics.utah.edu>

Let me know how it works out for you - feedback either positive or negative is useful.

B

From jmdoyle at purdue.edu Sun May 5 05:55:47 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Sun, 5 May 2013 07:55:47 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <1109250054.216072.1367754420354.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID: <261748058.216082.1367754947403.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi!

I've recently attempted to install MAKER (Mac OS X). I installed blast and exonerate using the ./Build blast and ./Build exonerate commands, and I manually installed repeatmasker, snap and augustus (I couldn't get the ./Build commands to work). I then attempted to test out maker following the 2012 MAKER tutorial.
I received the blastx error message pasted below, and there is additional information in the maker log I've attached to this email. I was wondering if anyone had any suggestions about debugging, as I'm not quite sure where to begin...

Best wishes and thanks,
Jackie

#--------- command -------------#
Widget::formater:
/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in /tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0
#-------------------------------#
dyld: lazy symbol binding failed: Symbol not found: __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
  Expected in: flat namespace

dyld: Symbol not found: __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
  Expected in: flat namespace

ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in Widget::formater

FATAL ERROR
ERROR: Failed while doing blastx repeats!!

ERROR: Chunk failed at level 3
!!
FAILED CONTIG:contig-dpp-500-500

Department of Forestry and Natural Resources
Purdue University
West Lafayette, IN 47907
Phone: 270-293-9486
E-mail: jmdoyle at purdue.edu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Build status.odt
Type: application/vnd.oasis.opendocument.text
Size: 2740 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.odt
Type: application/vnd.oasis.opendocument.text
Size: 2772 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.odt
Type: application/vnd.oasis.opendocument.text
Size: 3479 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.odt
Type: application/vnd.oasis.opendocument.text
Size: 2821 bytes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker log.odt
Type: application/vnd.oasis.opendocument.text
Size: 3340 bytes

From carsonhh at gmail.com Mon May 6 06:32:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 06 May 2013 08:32:52 -0400
Subject: [maker-devel] maker doesn't pick up where it stopped
In-Reply-To:
Message-ID:

You would have to send me the captured STDERR. MAKER will print out a number of messages whenever it restarts a contig, and will explain why it deletes any files before restarting.

Thanks,
Carson

From carsonhh at gmail.com Mon May 6 08:02:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 06 May 2013 10:02:52 -0400
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <261748058.216082.1367754947403.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

Most maker development and debugging actually happens on a Mac (OS X 10.7.5).
Blast, Augustus, SNAP all install for me just fine with maker 2.27. What errors do you get during installation? Do you by any chance have non-standard libraries via Mac ports for example. Do you have xcode installed (it provides the appropriate 'make' command for compiling C)? Thanks, Carson On 13-05-05 7:55 AM, "Jacqueline R M Doyle" wrote: >Hi! > >I've recently attempted to install MAKER (Mac OS X). I installed blast >and exonerate using the ./Build blast and ./Build exonerate commands, and >I manually installed repeatmasker, snap and augustus (I couldn't get the >./Build commands to work). I then attempted to test out maker following >the 2012 MAKER tutorial. I received the blastx error message pasted >below, and there is additional information in the maker log I've attached >to this email. I was wondering if anyone had any suggestions about >debugging, as I'm not quite sure where to begin... > >Best wishes and thanks, Jackie > > >#--------- command -------------# >Widget::formater: >/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in >/tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0 >#-------------------------------# >dyld: lazy symbol binding failed: Symbol not found: >__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PK >S3_i > Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb > Expected in: flat namespace > >dyld: Symbol not found: >__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PK >S3_i > Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb > Expected in: flat namespace > >ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in >Widget::formater > >FATAL ERROR >ERROR: Failed while doing blastx repeats!! > >ERROR: Chunk failed at level 3 >!! 
>FAILED CONTIG:contig-dpp-500-500
>
>
>Department of Forestry and Natural Resources
>Purdue University
>West Lafayette, IN 47907
>Phone: 270-293-9486
>E-mail: jmdoyle at purdue.edu

From guoyunfei1989 at gmail.com Mon May 6 09:33:57 2013
From: guoyunfei1989 at gmail.com (Yunfei Guo)
Date: Mon, 6 May 2013 08:33:57 -0700
Subject: [maker-devel] maker doesn't pick up where it stopped
In-Reply-To:
References:
Message-ID:

Hi Carson,

I used quiet mode; here is the stderr (I only show one 'Now starting the contig' message). When I check the maker master log upon restart with 'grep -ic finished master_log', all 'finished' tags are gone.

A data structure will be created for you at:
/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore

To access files for individual sequences use the datastore index:
/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_master_datastore_index.log

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: scaffold105
Length: 8761
#---------------------------------------------------------------------
...
MAKER WARNING: The file GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore/C8/27/scaffold5690//theVoid.scaffold5690/scaffold5690.0.HumanUCSCProteins%2Efasta.blastx
did not finish on the last run and must be erased
...
ERROR: Could not open '/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore/A4/F7/scaffold6034//theVoid.scaffold6034/scaffold6034.0.Srub%2Elib.specific.out'
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold6034

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold6034
...

From Carson.Holt at oicr.on.ca Mon May 6 20:22:23 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 7 May 2013 02:22:23 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
Message-ID:

Repeats can still happen in genes.
So an outright block actually causes more errors than it avoids, and a mixed approach of hard and soft masking becomes more appropriate. The masking step stops alignments from seeding in repeat regions, but if alignments seed in non-repeat regions then they can still extend through repeat regions during polishing steps (i.e., the EST evidence supports extension through the repeat and inclusion of the TE).

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dcopetti at cals.arizona.edu Mon May 6 15:19:42 2013
From: dcopetti at cals.arizona.edu (Dario Copetti)
Date: Mon, 06 May 2013 14:19:42 -0700
Subject: [maker-devel] gene models overlapping with TEs
Message-ID: <51881E6E.9010202@cals.arizona.edu>

Carson,

Analyzing the output of a MAKER run on a rice-sized genome, I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to determine the intersection of "CDS" and "repeatmasker" or "repeatrunner" features, and some 2400 genes overlap for at least 30% of their respective length. I am wondering how these gene models still appear in the final output, since I thought that the masking step gave us absolute confirmation that our endogenous gene list does not include TE coding regions. Below is an example of a gene (picture attached too):

ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816

This result holds for output lines from both repeatmasker and repeatrunner, and the gene models come from either FGENESH or SNAP predictions.
How can I explain this problem?

Thanks,

Dario

--
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5
1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene_TE.jpg
Type: image/jpeg
Size: 177299 bytes
Desc: not available
URL:

From myandell at genetics.utah.edu Mon May 6 21:47:49 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 7 May 2013 03:47:49 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
References: <51881E6E.9010202@cals.arizona.edu>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E02CEE@mxb2.hg.genetics.utah.edu>

Could the TEs be in the UTRs? Also, maybe some of these are low complexity regions?
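The UTR question can be checked directly against the Obrac10g00240.1 example in Dario's message by counting how many CDS bases fall inside the repeatrunner hits. What follows is only a minimal Python sketch under stated assumptions: it is not part of MAKER or BEDtools, the function names `merge_intervals` and `overlap_bp` are made up for illustration, and the coordinates are copied by hand from the GFF3 lines (1-based, inclusive).

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extends (or touches) the previous interval: grow it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def overlap_bp(features, repeats):
    """Count bases of `features` that are covered by `repeats`."""
    merged_repeats = merge_intervals(repeats)
    total = 0
    for f_start, f_end in merge_intervals(features):
        for r_start, r_end in merged_repeats:
            lo, hi = max(f_start, r_start), min(f_end, r_end)
            if lo <= hi:
                total += hi - lo + 1
    return total


# CDS segments of Obrac10g00240.1, taken from the example GFF3 lines.
cds = [(357756, 358075), (357209, 357319), (356965, 357081), (355056, 356874)]
# repeatrunner protein_match hits over the same locus.
repeats = [(357755, 358084), (357202, 357294), (355059, 357092)]

cds_length = sum(end - start + 1 for start, end in cds)
covered = overlap_bp(cds, repeats)
print(f"CDS length: {cds_length} bp")
print(f"CDS covered by repeat hits: {covered} bp ({covered / cds_length:.1%})")
```

For this one model the sketch reports 2339 of 2367 CDS bases inside repeatrunner hits, so here the overlap sits in the coding sequence itself rather than in UTRs. It says nothing about the other ~2400 genes, which would need the same check run over the full GFF3.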
Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

From myandell at genetics.utah.edu Mon May 6 21:49:51 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 7 May 2013 03:49:51 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
References: <51881E6E.9010202@cals.arizona.edu>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E02D13@mxb2.hg.genetics.utah.edu>

Hmm, eyeballing it, then, it doesn't look like it's the UTRs.

Mark Yandell

From carsonhh at gmail.com Tue May 7 05:39:17 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 07:39:17 -0400
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365E02D13@mxb2.hg.genetics.utah.edu>
Message-ID:

If I had to guess, I'd imagine the EST evidence includes assembled mRNA-seq reads. Is that correct?

--Carson

From jmdoyle at purdue.edu Tue May 7 09:12:38 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Tue, 7 May 2013 11:12:38 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To:
Message-ID: <1393522124.220153.1367939558646.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi Carson,

Thanks for the quick reply! I don't remember any errors during Blast installation; it appeared to install fine with the ./Build command. Augustus, Repeatmasker and SNAP were the programs I could not install with the ./Build commands, and instead installed manually. I've attached the error messages I received when I tried to use the ./Build commands. I've tested out the three programs I installed manually and they seem to work fine on their own.

I do have xcode installed. How would I determine if I have "non-standard libraries via Mac ports"?

Thanks again for your help with this.
Best wishes, Jackie

Department of Forestry and Natural Resources
Purdue University
West Lafayette, IN 47907
Phone: 270-293-9486
E-mail: jmdoyle at purdue.edu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: repeatmasker installation error.rtf
Type: application/rtf
Size: 1264 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snap installation error.rtf
Type: application/rtf
Size: 1095 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: agustus installation error.rtf
Type: application/rtf
Size: 1124 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue May 7 09:19:57 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 11:19:57 -0400
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <1393522124.220153.1367939558646.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

Which version of MAKER are you using? Is it 2.10 or 2.27?

Thanks,
Carson

From jmdoyle at purdue.edu Tue May 7 10:54:22 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Tue, 7 May 2013 12:54:22 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To:
Message-ID: <963584633.220449.1367945662870.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi Carson,

I am using MAKER 2.10 (I downloaded it so long ago I'd forgotten there were two options). Would it be better for me to start over with 2.27?

Best wishes, Jackie

From carsonhh at gmail.com Tue May 7 12:20:19 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 14:20:19 -0400
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51892ABA.2060100@cals.arizona.edu>
Message-ID:

This is really more of an evidence issue. Because you have assembled mRNA-seq evidence, you are probably getting the repeats improperly included in the assembled ESTs, so MAKER just follows the evidence. It tries to mask the repeat out, but the alignment of the longer EST heavily supports the repeat's inclusion in the model during alignment polishing.

Solutions:
1.
You can set softmask=0 instead of softmask=1 (1 is the default) to hard mask everything instead (masked bases become a hard 'N', so no alignment can happen there).

2. You can pre-mask the genome. The easiest way to do this is to collect the query.masked.fasta files inside each theVoid directory in the datastore and use them as the input. Then none of the polishing steps can ever extend the alignment.

3. You can filter the mRNA-seq data for TE elements before assembly.

Thanks,
Carson

On 13-05-07 12:24 PM, "Dario Copetti" wrote:

>Yes, there was RNA-seq evidence as well. Still I would like to have this
>evidence annotated as TE, and not as a gene (or at least to have it
>tagged in some way).
>
>As you suggested, a good solution could be to sequentially soft mask
>with the RMasker output and then hard mask with the RRunner result. In
>this way we hide TE coding regions from all predictors and alignments,
>leaving all the other types of repeats softmasked. This meets Mark's
>target of having MITEs and other non-autonomous TEs (as well as
>simple/low compl. repeats) annotated in UTRs or CDSs, if present. In my
>opinion, this case could be one of the few cases (or the only one?)
>where gene and repeat annotation can overlap.
>
>For our genomes I will have a list of these genes overlapping TE coding
>regions, and we will likely remove them. Please let us know how you
>intend to fix this problem and on which MAKER version it will appear.
>Thanks for the assistance and suggestions,
>
>Dario
>
>On 05/07/2013 04:39 AM, Carson Holt wrote:
>> If I had to guess. I imagine the EST evidence includes assembled mRNA-seq
>> reads? Is that correct?
>>
>> --Carson
>>
>> On 13-05-06 11:49 PM, "Mark Yandell" wrote:
>>
>>> Hmm, eyeballing it then, it doesn't look like it's the UTRs.
>>>
>>> Mark Yandell
>>> Professor of Human Genetics
>>> H.A.
& Edna Benning Presidential Endowed Chair >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ph:801-587-7707 >>> >>> ________________________________________ >>> From: maker-devel-bounces at yandell-lab.org >>> [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti >>> [dcopetti at cals.arizona.edu] >>> Sent: Monday, May 06, 2013 3:19 PM >>> To: maker-devel at yandell-lab.org >>> Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu >>> Subject: [maker-devel] gene models overlapping with TEs >>> >>> Carson, >>> >>> Analyzing the output of a MAKER run on a rice-sized genome I noticed >>>that >>> some gene models (~10%) overlap with TE coding regions. As a QC step, I >>> used BEDtools to determine the intersection of "CDS" and "repeatmasker" >>> or "repeatrunner" and some 2400 genes overlap for at least 30% of their >>> respective length. I am wondering how the gene models still appear in >>>the >>> final output, since I thought that the masking step was giving us the >>> absoulte confirmation that in our endogenous gene list we do not >>>include >>> TE coding regions. Here below an example of a gene (attached picture >>>too): >>> >>> ObracChr10 maker mRNA 355,056 358,075 . - . >>> >>>ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_ >>>eA >>> ED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 >>> ObracChr10 maker exon 355,056 356,874 . - . >>> ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 >>> ObracChr10 maker exon 356,965 357,081 . - . >>> ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 >>> ObracChr10 maker exon 357,209 357,319 . - . >>> ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 >>> ObracChr10 maker exon 357,756 358,075 . - . >>> ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 >>> ObracChr10 maker CDS 357,756 358,075 . - 2 >>> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >>> ObracChr10 maker CDS 357,209 357,319 . 
- 2 >>> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >>> ObracChr10 maker CDS 356,965 357,081 . - 2 >>> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >>> ObracChr10 maker CDS 355,056 356,874 . - 0 >>> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ObracChr10 repeatrunner match_part 357,755 358,084 566 >>> - >>> . >>> >>>ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM >>>_g >>> i_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >>> ObracChr10 repeatrunner protein_match 357,755 358,084 566 >>> - >>> . >>> >>>ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothet >>>ic >>> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >>> ObracChr10 repeatrunner match_part 357,202 357,294 142 >>> - >>> . >>> >>>ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM >>>_g >>> i_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >>> ObracChr10 repeatrunner protein_match 357,202 357,294 142 >>> - >>> . >>> >>>ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothet >>>ic >>> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >>> ObracChr10 repeatrunner match_part 355,059 357,092 3367 >>> - >>> . >>> >>>ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM >>>_g >>> i_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >>> ObracChr10 repeatrunner protein_match 355,059 357,092 3367 >>> - >>> . >>> >>>ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothet >>>ic >>> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >>> >>> >>> This result is valid both for output lines from repeatmasker or >>> repeatrunner, and the gene models come from either FGENESH or SNAP >>> predictions. >>> How can I explain this problem? 
>>> Thanks, >>> >>> Dario >>> >>> >>> >>> >>> >>> -- >>> Dario Copetti, PhD >>> Research Associate >>> Arizona Genomics Institute >>> University of Arizona - BIO5 >>> >>> 1657 E. Helen St. >>> Tucson, AZ 85721 >>> www.genome.arizona.edu >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > >-- >Dario Copetti, PhD >Research Associate >Arizona Genomics Institute >University of Arizona - BIO5 > >1657 E. Helen St. >Tucson, AZ 85721 >www.genome.arizona.edu > From dcopetti at cals.arizona.edu Tue May 7 10:24:26 2013 From: dcopetti at cals.arizona.edu (Dario Copetti) Date: Tue, 07 May 2013 09:24:26 -0700 Subject: [maker-devel] gene models overlapping with TEs In-Reply-To: References: Message-ID: <51892ABA.2060100@cals.arizona.edu> Yes, there was RNA-seq evidence as well. Still I would like to have this evidence annotated as TE, and not as a gene (or at least to have it tagged in some way). As you suggested, a good solution could be to sequentially soft mask with the RMasker output and then hard mask with the RRunner result. In this way we hide TE coding regions from all predictors and alignments, leaving all the other types of repeats softmasked. This meets Mark's target of having MITEs and other non-autonomous TEs (as well as simple/low compl. repeats) annotated in UTRs or CDSs, if present. In my opinion, this case could be one of the few cases (or the only one?) where gene and repeat annotation can overlap. For our genomes I will have a list of these genes overlapping TE coding regions, and we will likely remove them. Please let us know how you intend to fix this problem and on which MAKER version it will appear. Thanks for the assistance and suggestions, Dario On 05/07/2013 04:39 AM, Carson Holt wrote: > If I had to guess. I imagine the EST evidence includes assembled mRNA-seq > reads? Is that correct? 
> > --Carson > > > > On 13-05-06 11:49 PM, "Mark Yandell" wrote: > >> humm, eballing then it doesn't look lie its the UTRss.. >> >> Mark Yandell >> Professor of Human Genetics >> H.A. & Edna Benning Presidential Endowed Chair >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ph:801-587-7707 >> >> ________________________________________ >> From: maker-devel-bounces at yandell-lab.org >> [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti >> [dcopetti at cals.arizona.edu] >> Sent: Monday, May 06, 2013 3:19 PM >> To: maker-devel at yandell-lab.org >> Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu >> Subject: [maker-devel] gene models overlapping with TEs >> >> Carson, >> >> Analyzing the output of a MAKER run on a rice-sized genome I noticed that >> some gene models (~10%) overlap with TE coding regions. As a QC step, I >> used BEDtools to determine the intersection of "CDS" and "repeatmasker" >> or "repeatrunner" and some 2400 genes overlap for at least 30% of their >> respective length. I am wondering how the gene models still appear in the >> final output, since I thought that the masking step was giving us the >> absoulte confirmation that in our endogenous gene list we do not include >> TE coding regions. Here below an example of a gene (attached picture too): >> >> ObracChr10 maker mRNA 355,056 358,075 . - . >> ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eA >> ED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 >> ObracChr10 maker exon 355,056 356,874 . - . >> ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 356,965 357,081 . - . >> ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 357,209 357,319 . - . >> ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 357,756 358,075 . - . >> ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 357,756 358,075 . 
- 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 357,209 357,319 . - 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 356,965 357,081 . - 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 355,056 356,874 . - 0 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ObracChr10 repeatrunner match_part 357,755 358,084 566 - >> . >> ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >> ObracChr10 repeatrunner protein_match 357,755 358,084 566 - >> . >> ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >> ObracChr10 repeatrunner match_part 357,202 357,294 142 - >> . >> ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >> ObracChr10 repeatrunner protein_match 357,202 357,294 142 - >> . >> ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >> ObracChr10 repeatrunner match_part 355,059 357,092 3367 - >> . >> ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >> ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - >> . >> ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >> >> >> This result is valid both for output lines from repeatmasker or >> repeatrunner, and the gene models come from either FGENESH or SNAP >> predictions. >> How can I explain this problem? >> Thanks, >> >> Dario >> >> >> >> >> >> -- >> Dario Copetti, PhD >> Research Associate >> Arizona Genomics Institute >> University of Arizona - BIO5 >> >> 1657 E. 
Helen St. >> Tucson, AZ 85721 >> www.genome.arizona.edu >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Dario Copetti, PhD Research Associate Arizona Genomics Institute University of Arizona - BIO5 1657 E. Helen St. Tucson, AZ 85721 www.genome.arizona.edu From jmdoyle at purdue.edu Tue May 7 22:09:48 2013 From: jmdoyle at purdue.edu (Jacqueline R M Doyle) Date: Wed, 8 May 2013 00:09:48 -0400 (EDT) Subject: [maker-devel] MAKER installation debugging In-Reply-To: Message-ID: <1621518279.221945.1367986188482.JavaMail.root@mailhub042.itcs.purdue.edu> I downloaded MAKER 2.27 and it installed perfectly! I worked through the tutorial without any problems. Thanks for your help with this! Best wishes, Jackie From carsonhh at gmail.com Tue May 7 22:10:33 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 08 May 2013 00:10:33 -0400 Subject: [maker-devel] MAKER installation debugging In-Reply-To: <1621518279.221945.1367986188482.JavaMail.root@mailhub042.itcs.purdue.edu> Message-ID: I'm glad it worked. --Carson On 13-05-08 12:09 AM, "Jacqueline R M Doyle" wrote: >I downloaded MAKER 2.27 and it installed perfectly! I worked through the >tutorial without any problems. Thanks for your help with this! > >Best wishes, Jackie > From Carson.Holt at oicr.on.ca Wed May 8 13:25:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 8 May 2013 19:25:52 +0000 Subject: [maker-devel] Non-standard genetic code In-Reply-To: <97533c275fa3e6b05709c92455c9e6b8@fbb.msu.ru> Message-ID: It's not possible yet. It is one of the things we have on a list to do. It's not a small task either to make the necessary changes to the code, as the codon usage affects blastx alignments, exonerate protein2genome alignments, ab initio gene prediction, gene extension/boundary polishing, and UTR addition. 
So the changes have to go into many many locations. Thanks, Carson On 13-05-08 11:44 AM, "Daniil Alexeyevsky" wrote: >Hi, > >I want to use MAKER to annotate organism with non-standard genetic >code. (It only has UGA stop-codon, UAA and UAG code glutamine). > >Is it possible to use MAKER in this case? If I am bound to editing some >source codes, could you please point me where to look? > >With best regards, >-- Daniil > From mnuhn at ebi.ac.uk Fri May 10 06:10:35 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Fri, 10 May 2013 13:10:35 +0100 Subject: [maker-devel] Duplicated exons Message-ID: <518CE3BB.3060003@ebi.ac.uk> Hello Carson! I have been trying to get to the bottom of an error message when (re)training snap. Snap, or more precisely fathom, was giving me unclear error messages about misordered and overlapping exons. I have looked into the gff files from which these exons originate and noticed that a lot of exons in that file were duplicated. For example I have found these: LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 and then about four hundred lines later there are these: LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 which are identical except for the order number after "exon:". This seems to have happened to a lot of features in that file. How can I avoid this? Or if this is just a rare problem, can I have maker recompute the gff file without redoing all the computations again? Cheers, Michael. 
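The duplication described in the message above (exon lines with the same coordinates, strand and Parent, differing only in the number after "exon:" in the ID) can be checked for with a short script. This is a minimal sketch and not part of MAKER: the function names are hypothetical, and it assumes a tab-delimited, nine-column GFF3 file in which each exon has a single Parent.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding is done)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def find_duplicate_features(lines, feature_type="exon"):
    """Group feature IDs by (seqid, start, end, strand, Parent).

    Returns only the groups that contain more than one ID, i.e. features
    that are identical apart from their ID attribute.
    """
    seen = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("Parent"))
        seen[key].append(attrs.get("ID"))
    return {key: ids for key, ids in seen.items() if len(ids) > 1}
```

Passing an open file handle for the merged GFF3 to find_duplicate_features() returns one entry per duplicated exon, which gives a quick count of how widespread the problem is before retraining SNAP.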
From daa at fbb.msu.ru Wed May 8 09:44:50 2013
From: daa at fbb.msu.ru (Daniil Alexeyevsky)
Date: Wed, 08 May 2013 19:44:50 +0400
Subject: [maker-devel] Non-standard genetic code
Message-ID: <97533c275fa3e6b05709c92455c9e6b8@fbb.msu.ru>

Hi,

I want to use MAKER to annotate an organism with a non-standard genetic code. (UGA is its only stop codon; UAA and UAG code for glutamine.)

Is it possible to use MAKER in this case? If I have to edit the source code, could you please point me to where to look?

With best regards,
-- Daniil

From diana_leduc at eva.mpg.de Fri May 10 08:44:50 2013
From: diana_leduc at eva.mpg.de (Diana LeDuc)
Date: Fri, 10 May 2013 16:44:50 +0200 (CEST)
Subject: [maker-devel] Maker consensus
Message-ID: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de>

Dear maker developers,

I am a PhD student working on de novo assembly and annotation of a bird genome. I used MAKER as the annotation pipeline, which ran very well, and I obtained different annotations with evidence from the Augustus gene predictor, a small EST dataset from my organism, and protein sequences from chicken, turkey and zebra finch. I could combine the different gff files from different scaffolds into one gff file with annotations for the entire genome.

I now have two questions:

1. What could be the reason that I haven't gotten the protein.fasta and transcript.fasta files?

2. How can I obtain a consensus gene list of different evidences from maker? What I would actually need is the scaffold, coordinates and annotation (gene name) according to the 3 other bird species.

Thank you in advance.

Best regards,
Diana Le Duc

--
Max Planck Institute for Evolutionary Anthropology
Department of Evolutionary Genetics
Deutscher Platz 6
D-04103 Leipzig
Phone +49 (0)341-3550-554
www.eva.mpg.de
From carsonhh at gmail.com Fri May 10 10:13:33 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 12:13:33 -0400
Subject: [maker-devel] Maker consensus
In-Reply-To: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID:

I'm sorry, I don't understand question 1. You are missing the resulting FASTA files, correct? Did your resulting GFF3 file have any features of type "gene"? Did you run fasta_merge after running gff3_merge?

Could you give me more details on what you are trying to do, so I can take a stab at question 2 as well?

Thanks,
Carson

From: Diana LeDuc
Reply-To: Diana LeDuc
Date: Friday, 10 May, 2013 10:44 AM
To:
Cc: Gabriel Renaud, Janet Kelso, Torsten Schoeneberg
Subject: [maker-devel] Maker consensus

Dear maker developers,

I am a phD student working on de novo assembly and annotation of a bird genome. I used Maker as annotation pipeline, which ran very well, and I obtained different annotations with evidence from Augustus gene predictor, small EST dataset from my organism and protein sequences from chicken, turkey and zebrafinch. I could combine the different gff files from different scaffolds into one gff file with annotations for the entire genome.

I now have two questions:

1. What could be the reason that I haven't gotten the protein.fasta and trancript.fasta files

2. How can I obtain a consensus gene list of different evidences from maker? What I would actually need is the scaffold, coordinates and annotation (gene name) according to the 3 other bird species.

Thank you in advance.
Best regards,
Diana Le Duc

--
Max Planck Institute for Evolutionary Anthropology
Department of Evolutionary Genetics
Deutscher Platz 6
D-04103 Leipzig
Phone +49 (0)341-3550-554
www.eva.mpg.de

From carsonhh at gmail.com Fri May 10 10:25:17 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 12:25:17 -0400
Subject: [maker-devel] Duplicated exons
In-Reply-To: <518CE3BB.3060003@ebi.ac.uk>
Message-ID:

Very odd. Which version of MAKER are you using? Are you using GFF3 passthrough in the run that generates the duplication?

Thanks,
Carson

On 13-05-10 8:10 AM, "Michael Nuhn" wrote:

>Hello Carson!
>
>I have been trying to get to the bottom of an error message when
>(re)training snap. Snap, or more precisely fathom, was giving me unclear
>error messages about misordered and overlapping exons.
>
>I have looked into the gff files from which these exons originate and
>noticed that a lot of exons in that file were duplicated. For example I
>have found these:
>
>LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>
>and then about four hundred lines later there are these:
>
>LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>
>which are identical except for the order number after "exon:".
> >This seems to have happened to a lot of features in that file. > >How can I avoid this? Or if this is just a rare problem, can I have >maker recompute the gff file without redoing all the computations again? > >Cheers, >Michael. > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From j.zohren at qmul.ac.uk Fri May 10 11:07:30 2013 From: j.zohren at qmul.ac.uk (Jasmin Zohren) Date: Fri, 10 May 2013 18:07:30 +0100 Subject: [maker-devel] annotation of birch genome Message-ID: <005e01ce4da0$db3cbce0$91b636a0$@qmul.ac.uk> Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker") I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long. It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it. 
Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4526 bytes Desc: not available URL: From mnuhn at ebi.ac.uk Fri May 10 11:35:37 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Fri, 10 May 2013 18:35:37 +0100 Subject: [maker-devel] Duplicated exons In-Reply-To: References: Message-ID: <518D2FE9.6080900@ebi.ac.uk> On 05/10/2013 05:25 PM, Carson Holt wrote: > Very odd. Which version of MAEKR are you using. Are you using GFF3 > passthrough in the run that generates the duplication? I am using version 2.27 of maker. I am not using the passthrough option. Cheers, Michael. > Thanks, > Carson > > > On 13-05-10 8:10 AM, "Michael Nuhn" wrote: > >> Hello Carson! >> >> I have been trying to get to the bottom of an error message when >> (re)training snap. Snap, or more precisely fathom, was giving me unclear >> error messages about misordered and overlapping exons. >> >> I have looked into the gff files from which these exons originate and >> noticed that a lot of exons in that file were duplicated. For example I >> have found these: >> >> LSalAtl2s75 maker exon 186317 186936 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> LSalAtl2s75 maker exon 187007 191531 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> >> and then about four hundred lines later there are these: >> >> LSalAtl2s75 maker exon 186317 186936 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> LSalAtl2s75 maker exon 187007 191531 . + . 
>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>
>> which are identical except for the order number after "exon:".
>>
>> This seems to have happened to a lot of features in that file.
>>
>> How can I avoid this? Or if this is just a rare problem, can I have
>> maker recompute the gff file without redoing all the computations again?
>>
>> Cheers,
>> Michael.

From carsonhh at gmail.com Fri May 10 11:25:15 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 13:25:15 -0400
Subject: [maker-devel] annotation of birch genome
In-Reply-To: <005e01ce4da0$db3cbce0$91b636a0$@qmul.ac.uk>
Message-ID:

Really only 560 Mb (Pine is 20 Gb by comparison).

The single longest step for MAKER is alignment, which is done via BLAST. So the evidence dataset tends to be what can be filtered to get to a reasonable size. Protein alignments take long as they must be aligned against the 3 translated reading frames of the genome (so minimum 3x longer than DNA2DNA alignment, but in practice much much more). Alt_EST is even worse, as it must translate all 3 reading frames of the genome and all 3 of the data to be aligned (TBLASTX type alignment). So minimum 3x longer than protein alignment or 9x longer than DNA2DNA alignment (but in practice much more).

So the single best thing to do to reduce run time is to use protein evidence where possible instead of alt_EST evidence, or to use ESTs from the same species and limit the use of proteins (ESTs from the same species are aligned as DNA2DNA, so it is very fast).

Set all the blast_depth parameters in the maker_bopts.ctl file to 20 or 30. This will help if you have a very deep evidence dataset, by trimming overly deep alignment regions (less exonerate polishing).
Also, you can try running MAKER on 40 cpus rather than 20 (basically doubling up even though you only have 20). This can work because, even though you gave MAKER 20 cpus to use, all 20 will rarely be using 100% of their CPU simultaneously. So launching 40 threads will give a slight boost in many instances by filling in the gaps when "wait" operations let cpus idle for a fraction of a second.

One good thing, though, is that you only pay the price for data generation once. If you ever rerun with slightly modified parameters, MAKER is smart enough to reuse old results, so BLAST won't have to rerun.

Thanks,
Carson

From: Jasmin Zohren
Date: Friday, 10 May, 2013 1:07 PM
To:
Subject: [maker-devel] annotation of birch genome

Dear Maker developers,

I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway.

My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details:

- Genome size: 560 Mb
- Size of EST file (from a related species): 28 Mb
- I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker")

I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now.

My problem is, that the analysis is taking very long. It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it.
Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 10 11:44:01 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 13:44:01 -0400 Subject: [maker-devel] annotation of birch genome In-Reply-To: Message-ID: Also, if you will be annotating more genomes, you should look into getting allocation on your university's cluster. Queen Mary University has a 2000 cpu cluster. Most cluster managers bend over backwards to help Biologists use their systems as it looks good on progress reports and funding requests as they can show they have a broader user base (i.e. departments other than physics :-) --Carson From: Carson Holt Date: Friday, 10 May, 2013 1:25 PM To: Jasmin Zohren , Subject: Re: [maker-devel] annotation of birch genome Really only 560 Mb (Pine is 20 Gb by comparison). The single longest step for MAKER, is alignment which is done via BLAST. So the evidence dataset tends to be what can be filtered to get to a reasonable size. Protein alignments take long as they must be aligned against the 3 translated reading frames of the genome (so minimum 3x longer than DNA2DNA alignment, but in practice much much more). Alt_EST is even worse, as it must translate all 3 reading frames of the genome and all 3 of the data to be aligned (TBLASTX type alignment). So minimum 3x longer than protein alignment or 9X times longer than DNA2DNA alignment (but in practice much more). 
So the single best thing to do to reduce run time is to use protein evidence where possible instead of alt_EST evidence, or to use ESTs from the same species and limit the use of proteins (ESTs from the same species are aligned as DNA2DNA, so it is very fast).

Set all the blast_depth parameters in the maker_bopts.ctl file to 20 or 30. This will help if you have a very deep evidence dataset, by trimming overly deep alignment regions (less exonerate polishing).

Also you can try running MAKER on 40 cpus rather than 20 (basically doubling up even though you only have 20). This can work because, even though you gave MAKER 20 cpus to use, all 20 will rarely be using 100% of each CPU simultaneously. So launching 40 threads will give a slight boost in many instances by filling in the gaps when "wait" operations let cpus idle for a fraction of a second.

One good thing though, is that you only pay the price for data generation once. If you ever rerun with slightly modified parameters, MAKER is smart enough to reuse old results, so BLAST won't have to rerun.

Thanks,
Carson

From: Jasmin Zohren
Date: Friday, 10 May, 2013 1:07 PM
To:
Subject: [maker-devel] annotation of birch genome

Dear Maker developers,

I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway.

My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details:

- Genome size: 560 Mb
- Size of EST file (from a related species): 28 Mb
- I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker")

I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now.

My problem is, that the analysis is taking very long.
It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it.

Many thanks in advance and kind regards,
Jasmin

-----------------------------
Jasmin Zohren
PhD student in the INTERCROSSING ITN
Queen Mary University of London
intercrossing.wikispaces.com
evolve.sbcs.qmul.ac.uk
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From diana_leduc at eva.mpg.de Fri May 10 11:41:55 2013
From: diana_leduc at eva.mpg.de (Diana LeDuc)
Date: Fri, 10 May 2013 19:41:55 +0200 (CEST)
Subject: [maker-devel] Maker consensus
In-Reply-To:
References: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de>

Hi Carson,

Thank you for the quick answer.

I ran gff3_merge to merge all the gff files and this resulted in a gff file, which has these types of fields:

scaffold32239 blastx protein_match 22905 34500 174 + . ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;
scaffold32239 blastx match_part 22905 23045 174 + . ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47;

In comparison to the dpp_contig test file, I am missing est2genome evidence, most probably because my est data set is pretty poor. I have blastx and protein2genome evidence though.

My goal is to extract the genes that could be annotated on the scaffolds.
In the gff files the hits overlap most of the times, I can visualize this properly in apollo: for example one scaffold hits DSCAML gene in both zebrafinch and chicken, but extracting the coordinates between which this scaffold fits this annotated gene is difficult from the gff. Manually curating the genes is also not an option, since I am trying to do this for a 1.7Gb genome. I hope this explains better what we are after. Thank you once again. Best regards, Diana On May 10, 2013 at 6:13 PM Carson Holt wrote: > I'm sorry I don?t' understand question 1. You are you missing resulting > fasta files, correct? Did your resulting GFF3 file have any features of type > "gene"? Did you run fasta_merge after running gff3_merge? > > Could you give me more details on what you are trying to do, so I can take a > stab at question 2 as well. > > Thanks, > Carson > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 10:44 AM > To: < maker-devel at yandell-lab.org > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > >, Janet Kelso < kelso at eva.mpg.de > >, Torsten Schoeneberg < > torsten.schoeneberg at medizin.uni-leipzig.de > > > Subject: [maker-devel] Maker consensus > > > Dear maker developers, > > I am a phD student working on de novo assembly and annotation of a bird > genome. I used Maker as annotation pipeline, which ran very well, and I > obtained different annotations with evidence from Augustus gene predictor, > small EST dataset from my organism and protein sequences from chicken, turkey > and zebrafinch. I could combine the different gff files from different > scaffolds into one gff file with annotations for the entire genome. > > I now have two questions: > > 1. What could be the reason that I haven't gotten the protein.fasta and > trancript.fasta files > > 2. How can I obtain a consensus gene list of different evidences from maker? 
> What I would actually need is the scaffold, coordinates and annotation (gene
> name) according to the 3 other bird species.
>
> Thank you in advance.
>
> Best regards,
>
> Diana Le Duc
>
> --
>
> Max Planck Institute for Evolutionary Anthropology
> Department of Evolutionary Genetics
> Deutscher Platz 6
> D-04103 Leipzig
>
> Phone +49 (0)341-3550-554
> www.eva.mpg.de
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri May 10 11:51:48 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 13:51:48 -0400
Subject: [maker-devel] Maker consensus
In-Reply-To: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID:

Ok. You just ran the evidence and didn't give a gene predictor. You need to provide an HMM file for SNAP, a species for Augustus, or for rough annotations you can set protein2genome=1 and est2genome=1. This will try to generate models directly from the alignments.

If you provide a gene predictor, then MAKER can talk to it about the evidence alignments so it can make a best gene call for the region. Then there will be gene/mRNA/exon models in the GFF3 file and entries in the proteins.fasta and transcripts.fasta. If you need to train a predictor, you can train SNAP using the maker2zff script and the SNAP documentation or maker GMOD tutorial. If you want to train Augustus, Jason Stajich wrote an excellent explanation as well as tools in a previous list message.
list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

Script is in this github repo - https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

Thanks,
Carson

From: Diana LeDuc
Reply-To: Diana LeDuc
Date: Friday, 10 May, 2013 1:41 PM
To: , Carson Holt
Cc: Torsten Schoeneberg , Gabriel Renaud , Janet Kelso
Subject: Re: [maker-devel] Maker consensus

Hi Carson,

Thank you for the quick answer.

I ran gff3_merge to merge all the gff files and this resulted in a gff file, which has these types of fields:

scaffold32239 blastx protein_match 22905 34500 174 + . ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;
scaffold32239 blastx match_part 22905 23045 174 + . ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47;

In comparison to the dpp_contig test file, I am missing est2genome evidence, most probably because my est data set is pretty poor. I have blastx and protein2genome evidence though.

My goal is to extract the genes that could be annotated on the scaffolds. In the gff files the hits overlap most of the time. I can visualize this properly in apollo: for example one scaffold hits DSCAML gene in both zebrafinch and chicken, but extracting the coordinates between which this scaffold fits this annotated gene is difficult from the gff. Manually curating the genes is also not an option, since I am trying to do this for a 1.7Gb genome.

I hope this explains better what we are after.

Thank you once again.

Best regards,

Diana

On May 10, 2013 at 6:13 PM Carson Holt wrote:
>
> I'm sorry I don't understand question 1. You are missing resulting
> fasta files, correct? Did your resulting GFF3 file have any features of type
> "gene"? Did you run fasta_merge after running gff3_merge?
> > > > Could you give me more details on what you are trying to do, so I can take a > stab at question 2 as well. > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de> > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> > Date: Friday, 10 May, 2013 10:44 AM > To: < maker-devel at yandell-lab.org> > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < > kelso at eva.mpg.de>, Torsten Schoeneberg < > torsten.schoeneberg at medizin.uni-leipzig.de> > Subject: [maker-devel] Maker consensus > > > > > > > > Dear maker developers, > > > I am a phD student working on de novo assembly and annotation of a bird > genome. I used Maker as annotation pipeline, which ran very well, and I > obtained different annotations with evidence from Augustus gene predictor, > small EST dataset from my organism and protein sequences from chicken, turkey > and zebrafinch. I could combine the different gff files from different > scaffolds into one gff file with annotations for the entire genome. > > > I now have two questions: > > > 1. What could be the reason that I haven't gotten the protein.fasta and > trancript.fasta files > > > 2. How can I obtain a consensus gene list of different evidences from maker? > What I would actually need is the scaffold, coordinates and annotation (gene > name) according to the 3 other bird species. > Thank you in advance. > > > > Best regards, > > > > Diana Le Duc > > > > -- > > Max Planck Institute for Evolutionary Anthropology > Department of Evolutionary Genetics > Deutscher Platz 6 > D-04103 Leipzig > > Phone +49 (0)341-3550-554 > www.eva.mpg.de > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Fri May 10 12:08:35 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 14:08:35 -0400 Subject: [maker-devel] Duplicated exons In-Reply-To: <518D2FE9.6080900@ebi.ac.uk> Message-ID: 2.27 from the website download or the SVN devel version? Thanks, Carson On 13-05-10 1:35 PM, "Michael Nuhn" wrote: >On 05/10/2013 05:25 PM, Carson Holt wrote: >> Very odd. Which version of MAEKR are you using. Are you using GFF3 >> passthrough in the run that generates the duplication? > >I am using version 2.27 of maker. I am not using the passthrough option. > >Cheers, >Michael. > >> Thanks, >> Carson >> >> >> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >> >>> Hello Carson! >>> >>> I have been trying to get to the bottom of an error message when >>> (re)training snap. Snap, or more precisely fathom, was giving me >>>unclear >>> error messages about misordered and overlapping exons. >>> >>> I have looked into the gff files from which these exons originate and >>> noticed that a lot of exons in that file were duplicated. For example I >>> have found these: >>> >>> LSalAtl2s75 maker exon 186317 186936 . + . >>> >>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s >>>75 >>> -snap-gene-2.15-mRNA-1 >>> LSalAtl2s75 maker exon 187007 191531 . + . >>> >>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s >>>75 >>> -snap-gene-2.15-mRNA-1 >>> >>> and then about four hundred lines later there are these: >>> >>> LSalAtl2s75 maker exon 186317 186936 . + . >>> >>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s >>>75 >>> -snap-gene-2.15-mRNA-1 >>> LSalAtl2s75 maker exon 187007 191531 . + . >>> >>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s >>>75 >>> -snap-gene-2.15-mRNA-1 >>> >>> which are identical except for the order number after "exon:". >>> >>> This seems to have happened to a lot of features in that file. >>> >>> How can I avoid this? 
Or if this is just a rare problem, can I have
>>> maker recompute the gff file without redoing all the computations
>>>again?
>>>
>>> Cheers,
>>> Michael.
>>>
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>

From carsonhh at gmail.com Fri May 10 12:29:32 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 14:29:32 -0400
Subject: [maker-devel] Maker consensus
In-Reply-To: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID:

You can use any species Augustus already has. If it doesn't have one for your organism, then you train it yourself. The species folder sits under the directory that the AUGUSTUS_CONFIG_PATH environment variable points to, and is usually .../augustus/config/species

Thanks,
Carson

From: Diana LeDuc
Reply-To: Diana LeDuc
Date: Friday, 10 May, 2013 2:16 PM
To: , Carson Holt
Cc: Torsten Schoeneberg , Gabriel Renaud , Janet Kelso
Subject: Re: [maker-devel] Maker consensus

Hi Carson,

In maker_exe.ctl I would have to provide the path to augustus. Augustus has a training set for chicken that I would use. Is it possible to specify the species I want to use, or is the only way to train Augustus myself?

Thank you!

Best,
Diana

On May 10, 2013 at 7:51 PM Carson Holt wrote:
>
> Ok. You just ran the evidence and didn't give a gene predictor. You need to
> provide an HMM file for SNAP, a species for Augustus, or for rough annotations
> you can set protein2genome=1 and est2genome=1. This will try to generate
> models directly from the alignments.
>
>
>
> If you provide a gene predictor, then MAKER can talk to it about the evidence
> alignments so it can make a best gene call for the region. Then there will be
> gene/mRNA/exon models in the GFF3 file and entries in the proteins.fasta and
> transcripts.fasta.
If you need to train a predictor, you can train SNAP using > the maker2zff script and the SNAP documentation or maker GMOD tutorial. If > you want to train augustus Jason Stajich wrote an excellent explanation as > well as tools in a previous list message. > > > > > list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html > > Script is in this github repo - > > https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2au > gustus_gbk.pl > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de> > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> > Date: Friday, 10 May, 2013 1:41 PM > To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com> > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>, > Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < kelso at eva.mpg.de> > Subject: Re: [maker-devel] Maker consensus > > > > > > Hi Carson, > > > > Thank you for the quick answer. > > I ran gff3_merge to merge all the gff files and this resulted in a gff file, > which has these type of fields: > > scaffold32239 blastx protein_match 22905 34500 174 + . > ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1 > -2039; > scaffold32239 blastx match_part 22905 23045 174 + . > ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000 > 000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT000000 > 00219|DSCAML1-2039 172 218;Gap=M47; > > In comparison to the dpp_contig test file, I am missing est2genome evidence, > most probably because my est data set is pretty poor. I have blastx and > protein2genome evidence though. > > > > My goal is to extract the genes that could be annotated on the scaffolds. 
In > the gff files the hits overlap most of the times, I can visualize this > properly in apollo: for example one scaffold hits DSCAML gene in both > zebrafinch and chicken, but extracting the coordinates between which this > scaffold fits this annotated gene is difficult from the gff. Manually curating > the genes is also not an option, since I am trying to do this for a 1.7Gb > genome. > > > > I hope this explains better what we are after. > > > > Thank you once again. > > > > Best regards, > > > > Diana > On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com> wrote: > > >> >> I'm sorry I don?t' understand question 1. You are you missing resulting >> fasta files, correct? Did your resulting GFF3 file have any features of type >> "gene"? Did you run fasta_merge after running gff3_merge? >> >> >> >> Could you give me more details on what you are trying to do, so I can take a >> stab at question 2 as well. >> >> >> >> Thanks, >> >> Carson >> >> >> >> >> >> >> >> From: Diana LeDuc < diana_leduc at eva.mpg.de> >> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> >> Date: Friday, 10 May, 2013 10:44 AM >> To: < maker-devel at yandell-lab.org> >> Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < >> kelso at eva.mpg.de>, Torsten Schoeneberg < >> torsten.schoeneberg at medizin.uni-leipzig.de> >> Subject: [maker-devel] Maker consensus >> >> >> >> >> >> >> >> Dear maker developers, >> >> >> I am a phD student working on de novo assembly and annotation of a bird >> genome. I used Maker as annotation pipeline, which ran very well, and I >> obtained different annotations with evidence from Augustus gene predictor, >> small EST dataset from my organism and protein sequences from chicken, turkey >> and zebrafinch. I could combine the different gff files from different >> scaffolds into one gff file with annotations for the entire genome. >> >> >> I now have two questions: >> >> >> 1. 
What could be the reason that I haven't gotten the protein.fasta and >> trancript.fasta files >> >> >> 2. How can I obtain a consensus gene list of different evidences from maker? >> What I would actually need is the scaffold, coordinates and annotation (gene >> name) according to the 3 other bird species. >> Thank you in advance. >> >> >> >> Best regards, >> >> >> >> Diana Le Duc >> >> >> >> -- >> >> Max Planck Institute for Evolutionary Anthropology >> Department of Evolutionary Genetics >> Deutscher Platz 6 >> D-04103 Leipzig >> >> Phone +49 (0)341-3550-554 >> www.eva.mpg.de >> >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnuhn at ebi.ac.uk Fri May 10 18:29:10 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Sat, 11 May 2013 01:29:10 +0100 Subject: [maker-devel] Duplicated exons In-Reply-To: References: Message-ID: <518D90D6.4080603@ebi.ac.uk> On 05/10/2013 07:08 PM, Carson Holt wrote: > 2.27 from the website download or the SVN devel version? SVN. I checked it out on 19/03/2013. > Thanks, > Carson > > > On 13-05-10 1:35 PM, "Michael Nuhn" wrote: > >> On 05/10/2013 05:25 PM, Carson Holt wrote: >>> Very odd. Which version of MAEKR are you using. Are you using GFF3 >>> passthrough in the run that generates the duplication? >> >> I am using version 2.27 of maker. I am not using the passthrough option. >> >> Cheers, >> Michael. >> >>> Thanks, >>> Carson >>> >>> >>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >>> >>>> Hello Carson! >>>> >>>> I have been trying to get to the bottom of an error message when >>>> (re)training snap. Snap, or more precisely fathom, was giving me >>>> unclear >>>> error messages about misordered and overlapping exons. 
>>>> >>>> I have looked into the gff files from which these exons originate and >>>> noticed that a lot of exons in that file were duplicated. For example I >>>> have found these: >>>> >>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>> >>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s >>>> 75 >>>> -snap-gene-2.15-mRNA-1 >>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>> >>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s >>>> 75 >>>> -snap-gene-2.15-mRNA-1 >>>> >>>> and then about four hundred lines later there are these: >>>> >>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>> >>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s >>>> 75 >>>> -snap-gene-2.15-mRNA-1 >>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>> >>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s >>>> 75 >>>> -snap-gene-2.15-mRNA-1 >>>> >>>> which are identical except for the order number after "exon:". >>>> >>>> This seems to have happened to a lot of features in that file. >>>> >>>> How can I avoid this? Or if this is just a rare problem, can I have >>>> maker recompute the gff file without redoing all the computations >>>> again? >>>> >>>> Cheers, >>>> Michael. >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> > > From dsth at ebi.ac.uk Fri May 10 18:20:42 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Sat, 11 May 2013 01:20:42 +0100 Subject: [maker-devel] Duplicated exons Message-ID: That is odd. i've run that version of maker 30-40x at ebi lately and never seen it. Is it just one scaffold? While i'd be surprised if it's the cause but have you been playing with the file locking options Carson mentioned a while back? I'd definitely be inclined to re-process it if it's just the one scaffold. 
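
To check whether the duplication really is confined to one scaffold, the exons can be tallied per scaffold. The following is a minimal sketch, assuming a tab-separated GFF3 file such as the one gff3_merge writes; the function name `duplicated_exons` is hypothetical and not part of MAKER. It groups exon rows by scaffold, coordinates, strand and Parent, and reports groups that carry more than one ID, using rows taken from Michael's example.

```python
from collections import defaultdict


def duplicated_exons(gff_lines):
    """Map scaffold -> list of exons that appear more than once.

    Two exon rows count as duplicates when they share scaffold, start,
    end, strand and Parent, regardless of their ID.
    """
    seen = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # An exon shared by several transcripts lists its parents comma-separated.
        for parent in attrs.get("Parent", "").split(","):
            key = (cols[0], int(cols[3]), int(cols[4]), cols[6], parent)
            seen[key].append(attrs.get("ID", ""))

    by_scaffold = defaultdict(list)
    for (seqid, start, end, strand, parent), ids in seen.items():
        if len(ids) > 1:
            by_scaffold[seqid].append((start, end, strand, parent, ids))
    return dict(by_scaffold)


parent = "maker-LSalAtl2s75-snap-gene-2.15-mRNA-1"
sample = [
    f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={parent}:exon:3;Parent={parent}",
    f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={parent}:exon:4;Parent={parent}",
    f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={parent}:exon:1;Parent={parent}",
]
dups = duplicated_exons(sample)
for seqid, hits in dups.items():
    print(seqid, len(hits), "duplicated exon(s)")
```

For a real run, an open file handle can be passed in place of `sample`. If only one scaffold shows up in the result, re-processing just that scaffold as suggested above seems reasonable.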
Dan On May 10, 2013 12:45 PM, "Michael Nuhn" wrote: > > Hello Carson! > > I have been trying to get to the bottom of an error message when (re)training snap. Snap, or more precisely fathom, was giving me unclear error messages about misordered and overlapping exons. > > I have looked into the gff files from which these exons originate and noticed that a lot of exons in that file were duplicated. For example I have found these: > > LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 > LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 > > and then about four hundred lines later there are these: > > LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 > LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 > > which are identical except for the order number after "exon:". > > This seems to have happened to a lot of features in that file. > > How can I avoid this? Or if this is just a rare problem, can I have maker recompute the gff file without redoing all the computations again? > > Cheers, > Michael. > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From diana_leduc at eva.mpg.de Fri May 10 12:16:34 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Fri, 10 May 2013 20:16:34 +0200 (CEST) Subject: [maker-devel] Maker consensus In-Reply-To: References: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de> Hi Carson, In maker_exe.ctl I would have to provide the path to augustus. Augustus has a training set for chicken that I would use. Is it possible to specify the species i want to use, or the only way is training Augustus myself? Thank you! Best, Diana On May 10, 2013 at 7:51 PM Carson Holt wrote: > Ok. You just ran the evidence and didn't give a gene predictor. You need to > provide an HMM file for SNAP a species for augustus, or for rough annotations > you can set protein3genome=1 and est2genome=1. This will try and generate > models direct from the alignments. > > If you provide a gene predictor, then MAKER can talk to it about the evidence > alignments so it can make a best gene call for the region. Then there will be > gene/mRNA/exon model in the GFF3 file and entires in the proteins.fasta and > transcripts.fasta. If you need to train a predictor, you can train SNAP using > the maker2zff script and the SNAP documentation or maker GMOD tutorial. If > you want to train augustus Jason Stajich wrote an excellent explanation as > well as tools in a previous list message. 
> > list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html > > Script is in this github repo - > > https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl > > > Thanks, > Carson > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 1:41 PM > To: < maker-devel at yandell-lab.org >, > Carson Holt < carsonhh at gmail.com > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > >, Gabriel Renaud < > gabriel_renaud at eva.mpg.de >, Janet Kelso < > kelso at eva.mpg.de > > Subject: Re: [maker-devel] Maker consensus > > Hi Carson, > > Thank you for the quick answer. > I ran gff3_merge to merge all the gff files and this resulted in a gff file, > which has these type of fields: > scaffold32239 blastx protein_match 22905 34500 174 + . > ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; > scaffold32239 blastx match_part 22905 23045 174 + . > ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 > 172 218;Gap=M47; > In comparison to the dpp_contig test file, I am missing est2genome evidence, > most probably because my est data set is pretty poor. I have blastx and > protein2genome evidence though. > > My goal is to extract the genes that could be annotated on the scaffolds. In > the gff files the hits overlap most of the times, I can visualize this > properly in apollo: for example one scaffold hits DSCAML gene in both > zebrafinch and chicken, but extracting the coordinates between which this > scaffold fits this annotated gene is difficult from the gff. Manually curating > the genes is also not an option, since I am trying to do this for a 1.7Gb > genome. > > I hope this explains better what we are after. > > Thank you once again. 
> > Best regards, > > Diana > On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com > > wrote: > > > > I'm sorry I don?t' understand question 1. You are you missing > > > resulting fasta files, correct? Did your resulting GFF3 file have any > > > features of type "gene"? Did you run fasta_merge after running > > > gff3_merge? > > > > Could you give me more details on what you are trying to do, so I can take > > a stab at question 2 as well. > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Date: Friday, 10 May, 2013 10:44 AM > > To: < maker-devel at yandell-lab.org > > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > > >, Janet Kelso < kelso at eva.mpg.de > > >, Torsten Schoeneberg < > > torsten.schoeneberg at medizin.uni-leipzig.de > > > > > Subject: [maker-devel] Maker consensus > > > > > > Dear maker developers, > > > > I am a phD student working on de novo assembly and annotation of a bird > > genome. I used Maker as annotation pipeline, which ran very well, and I > > obtained different annotations with evidence from Augustus gene predictor, > > small EST dataset from my organism and protein sequences from chicken, > > turkey and zebrafinch. I could combine the different gff files from > > different scaffolds into one gff file with annotations for the entire > > genome. > > > > I now have two questions: > > > > 1. What could be the reason that I haven't gotten the protein.fasta and > > trancript.fasta files > > > > 2. How can I obtain a consensus gene list of different evidences from > > maker? What I would actually need is the scaffold, coordinates and > > annotation (gene name) according to the 3 other bird species. > > > > Thank you in advance. 
> >
> > Best regards,
> >
> > Diana Le Duc
> >
> > --
> >
> > Max Planck Institute for Evolutionary Anthropology
> > Department of Evolutionary Genetics
> > Deutscher Platz 6
> > D-04103 Leipzig
> >
> > Phone +49 (0)341-3550-554
> > www.eva.mpg.de
> > _______________________________________________ maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> >
> >
> >
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From linzkl007 at hotmail.com Sat May 11 11:28:47 2013
From: linzkl007 at hotmail.com (=?gb2312?B?7OTs5A==?=)
Date: Sun, 12 May 2013 01:28:47 +0800
Subject: [maker-devel] about predictor training
Message-ID:

Hi,

I'm trying to use MAKER to annotate a new genome sequence which I assembled myself. I used TopHat and Cufflinks to align the sequence based on the RNA-seq we have. Based on the MAKER tutorial, I may need three fasta format files (the assembly, ESTs and a protein database) to train SNAP. I may use SwissProt as the protein database. Can I use the gtf result from Cufflinks directly as ESTs during the training?

Another question: if I want to use Augustus for the ab initio gene prediction, do I need to proceed the same way as for SNAP? I ask because I saw some posts saying that the ab initio results would be used as evidence to train the predictor. Is there a particular order in which the different predictors should be run?

Thank you so much for your help.

Lin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kangyangjae at gmail.com Sun May 12 21:53:34 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Mon, 13 May 2013 12:53:34 +0900
Subject: [maker-devel] exon numbering bug?
Message-ID: <070c01ce4f8d$73862fc0$5a928f40$@gmail.com>

Hello

I want to check whether this is a bug or my misunderstanding. The following is the gff3 result of the maker pipeline. I think those red marks should be mRNA-2.
This type of error was found only in exon features.

scaffold_22 maker mRNA 604856 612126 . + . ID=211342;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2;Parent=211320
scaffold_22 maker exon 604856 605185 0.51 + . ID=211343;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2788;Parent=211342
scaffold_22 maker exon 608362 608456 0.51 + . ID=211344;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2791;Parent=211342
scaffold_22 maker exon 610193 610286 0.51 + . ID=211345;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2792;Parent=211342
scaffold_22 maker exon 610583 610714 0.51 + . ID=211346;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2793;Parent=211342
scaffold_22 maker exon 610838 610942 0.51 + . ID=211347;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2794;Parent=211342
scaffold_22 maker exon 611458 612126 0.51 + . ID=211348;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2795;Parent=211342
scaffold_22 maker five_prime_UTR 604856 604972 . + . ID=211349;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR1;Parent=211342
scaffold_22 maker CDS 604973 605185 . + 0 ID=211350;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2905;Parent=211342
scaffold_22 maker CDS 608362 608456 . + 0 ID=211351;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2906;Parent=211342
scaffold_22 maker CDS 610193 610286 . + 1 ID=211352;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2907;Parent=211342
scaffold_22 maker CDS 610583 610714 . + 0 ID=211353;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2908;Parent=211342
scaffold_22 maker CDS 610838 610942 . + 0 ID=211354;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2909;Parent=211342
scaffold_22 maker CDS 611458 611661 . + 0 ID=211355;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2910;Parent=211342
scaffold_22 maker three_prime_UTR 611662 612126 . + . ID=211356;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR2;Parent=211342
scaffold_22 maker start_codon 604973 604975 . + . ID=211357;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:start1;Parent=211342
scaffold_22 maker stop_codon 611659 611661 . + . ID=211358;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:stop2;Parent=211342

Thank you!

Kang, Yang Jae Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Sun May 12 22:01:41 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 00:01:41 -0400
Subject: [maker-devel] exon numbering bug?
In-Reply-To: <070c01ce4f8d$73862fc0$5a928f40$@gmail.com>
Message-ID:

There has been some post-processing of the GFF3. It is not an original
MAKER result file. I can tell based on the IDs (MAKER doesn't assign
numerical IDs). Most likely it was processed to make exons unique
without having dual parentage. Normally, if the same exon is found in two
transcripts, it will have two parents separated by a comma. I imagine
that the post-processing script duplicated the exon, creating
independent IDs, and split the parents, but left the Name= tag the same.
Since the Name= tag was based on the first transcript the exon belonged
to, it stayed the same.

--Carson

From: "Kang, Yang Jae"
Date: Sunday, 12 May, 2013 11:53 PM
To:
Subject: [maker-devel] exon numbering bug?

Hello

I want to check this is bug or my misunderstanding. The following is the
gff3 result of maker pipeline. I think those red marks should be mRNA-2.
This type of error was found only at exon

scaffold_22 maker mRNA 604856 612126 . + . ID=211342;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2;Parent=211320
scaffold_22 maker exon 604856 605185 0.51 + . ID=211343;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2788;Parent=211342
scaffold_22 maker exon 608362 608456 0.51 + .
ID=211344;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2791;Parent =211342 scaffold_22 maker exon 610193 610286 0.51 + . ID=211345;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2792;Parent =211342 scaffold_22 maker exon 610583 610714 0.51 + . ID=211346;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2793;Parent =211342 scaffold_22 maker exon 610838 610942 0.51 + . ID=211347;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2794;Parent =211342 scaffold_22 maker exon 611458 612126 0.51 + . ID=211348;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2795;Parent =211342 scaffold_22 maker five_prime_UTR 604856 604972 . + . ID=211349;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR1;Parent=2113 42 scaffold_22 maker CDS 604973 605185 . + 0 ID=211350;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2905;Parent= 211342 scaffold_22 maker CDS 608362 608456 . + 0 ID=211351;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2906;Parent= 211342 scaffold_22 maker CDS 610193 610286 . + 1 ID=211352;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2907;Parent= 211342 scaffold_22 maker CDS 610583 610714 . + 0 ID=211353;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2908;Parent= 211342 scaffold_22 maker CDS 610838 610942 . + 0 ID=211354;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2909;Parent= 211342 scaffold_22 maker CDS 611458 611661 . + 0 ID=211355;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2910;Parent= 211342 scaffold_22 maker three_prime_UTR 611662 612126 . + . ID=211356;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR2;Parent=2113 42 scaffold_22 maker start_codon 604973 604975 . + . ID=211357;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:start1;Parent=21 1342 scaffold_22 maker stop_codon 611659 611661 . + . ID=211358;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:stop2;Parent=211 342 Thank you! Kang, Yang Jae Ph.D. Cropgenomics Lab. 
College of Agriculture and Life Science
Seoul National University
Korea
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon May 13 08:00:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 10:00:01 -0400
Subject: [maker-devel] about predictor training
In-Reply-To:
Message-ID:

You need to convert the GTF files to GFF3. There are tophat2gff and
cufflinks2gff scripts that come with MAKER. I recommend using only the
Cufflinks results and ignoring the TopHat results, though, as the latter
tend to be a lot more spurious.

Jason Stajich wrote an excellent explanation on training Augustus on the
list previously -
http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

He also included scripts to assist with the training -
https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

Overall the strategy is similar to the one used to train SNAP.

Thanks,
Carson

From: ??
Date: Saturday, 11 May, 2013 1:28 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] about predictor training

Hi, I'm trying to use MAKER to annotate the new genome sequence which I assembled by myseft. I used TopHat and Cufflinks to align the sequence based on the RNA-seq we have. Based on the tutorial of MAKER, I may need three fasta format file including assembly data, ESTs and protein database to train the SNAP. I may use SwissProt as the protein database. Can I use the gtf result from Cufflinks directly as an ESTs during the training? Another is, if I want to use Augustus to do the ab initio gene prediction, do I need to do the same way as SNAP? Cause I saw some posts that the result from ab initio would be used as the evidence to train the predictor.
Can I ask is there has some order doing the prediction in different predictor? Thank you so much for you help. Lin _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 13 08:01:58 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 13 May 2013 10:01:58 -0400 Subject: [maker-devel] Duplicated exons In-Reply-To: <518D90D6.4080603@ebi.ac.uk> Message-ID: Could you send me your maker opts files, the contig that fails, and the evidence files you use for that contig. Thanks, Carson On 13-05-10 8:29 PM, "Michael Nuhn" wrote: >On 05/10/2013 07:08 PM, Carson Holt wrote: >> 2.27 from the website download or the SVN devel version? > >SVN. I checked it out on 19/03/2013. > >> Thanks, >> Carson >> >> >> On 13-05-10 1:35 PM, "Michael Nuhn" wrote: >> >>> On 05/10/2013 05:25 PM, Carson Holt wrote: >>>> Very odd. Which version of MAEKR are you using. Are you using GFF3 >>>> passthrough in the run that generates the duplication? >>> >>> I am using version 2.27 of maker. I am not using the passthrough >>>option. >>> >>> Cheers, >>> Michael. >>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >>>> >>>>> Hello Carson! >>>>> >>>>> I have been trying to get to the bottom of an error message when >>>>> (re)training snap. Snap, or more precisely fathom, was giving me >>>>> unclear >>>>> error messages about misordered and overlapping exons. >>>>> >>>>> I have looked into the gff files from which these exons originate and >>>>> noticed that a lot of exons in that file were duplicated. For >>>>>example I >>>>> have found these: >>>>> >>>>> LSalAtl2s75 maker exon 186317 186936 . + . 
>>>>> >>>>> >>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl >>>>>2s >>>>> 75 >>>>> -snap-gene-2.15-mRNA-1 >>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>> >>>>> >>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl >>>>>2s >>>>> 75 >>>>> -snap-gene-2.15-mRNA-1 >>>>> >>>>> and then about four hundred lines later there are these: >>>>> >>>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>>> >>>>> >>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl >>>>>2s >>>>> 75 >>>>> -snap-gene-2.15-mRNA-1 >>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>> >>>>> >>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl >>>>>2s >>>>> 75 >>>>> -snap-gene-2.15-mRNA-1 >>>>> >>>>> which are identical except for the order number after "exon:". >>>>> >>>>> This seems to have happened to a lot of features in that file. >>>>> >>>>> How can I avoid this? Or if this is just a rare problem, can I have >>>>> maker recompute the gff file without redoing all the computations >>>>> again? >>>>> >>>>> Cheers, >>>>> Michael. >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> >>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>>g >>>> >>>> >>> >> >> > From mnuhn at ebi.ac.uk Mon May 13 10:30:36 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Mon, 13 May 2013 17:30:36 +0100 Subject: [maker-devel] Duplicated exons In-Reply-To: References: Message-ID: <5191152C.5030103@ebi.ac.uk> Hello Carson! On 05/13/2013 03:01 PM, Carson Holt wrote: > Could you send me your maker opts files, the contig that fails, and the > evidence files you use for that contig. Thanks for offering your help. I worked around the problem this morning by removing all exons from the training set for which I was getting the error. 
Now I'm rerunning maker and I can't find any gff files at the moment with this problem. If the problem reappears, I'll send you the files. Cheers, Michael. > Thanks, > Carson > > > > On 13-05-10 8:29 PM, "Michael Nuhn" wrote: > >> On 05/10/2013 07:08 PM, Carson Holt wrote: >>> 2.27 from the website download or the SVN devel version? >> >> SVN. I checked it out on 19/03/2013. >> >>> Thanks, >>> Carson >>> >>> >>> On 13-05-10 1:35 PM, "Michael Nuhn" wrote: >>> >>>> On 05/10/2013 05:25 PM, Carson Holt wrote: >>>>> Very odd. Which version of MAEKR are you using. Are you using GFF3 >>>>> passthrough in the run that generates the duplication? >>>> >>>> I am using version 2.27 of maker. I am not using the passthrough >>>> option. >>>> >>>> Cheers, >>>> Michael. >>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> >>>>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >>>>> >>>>>> Hello Carson! >>>>>> >>>>>> I have been trying to get to the bottom of an error message when >>>>>> (re)training snap. Snap, or more precisely fathom, was giving me >>>>>> unclear >>>>>> error messages about misordered and overlapping exons. >>>>>> >>>>>> I have looked into the gff files from which these exons originate and >>>>>> noticed that a lot of exons in that file were duplicated. For >>>>>> example I >>>>>> have found these: >>>>>> >>>>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>>>> >>>>>> >>>>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl >>>>>> 2s >>>>>> 75 >>>>>> -snap-gene-2.15-mRNA-1 >>>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>>> >>>>>> >>>>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl >>>>>> 2s >>>>>> 75 >>>>>> -snap-gene-2.15-mRNA-1 >>>>>> >>>>>> and then about four hundred lines later there are these: >>>>>> >>>>>> LSalAtl2s75 maker exon 186317 186936 . + . 
>>>>>> >>>>>> >>>>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl >>>>>> 2s >>>>>> 75 >>>>>> -snap-gene-2.15-mRNA-1 >>>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>>> >>>>>> >>>>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl >>>>>> 2s >>>>>> 75 >>>>>> -snap-gene-2.15-mRNA-1 >>>>>> >>>>>> which are identical except for the order number after "exon:". >>>>>> >>>>>> This seems to have happened to a lot of features in that file. >>>>>> >>>>>> How can I avoid this? Or if this is just a rare problem, can I have >>>>>> maker recompute the gff file without redoing all the computations >>>>>> again? >>>>>> >>>>>> Cheers, >>>>>> Michael. >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com >>>>>> >>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>>> g >>>>> >>>>> >>>> >>> >>> >> > > From carsonhh at gmail.com Mon May 13 10:07:13 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 13 May 2013 12:07:13 -0400 Subject: [maker-devel] Duplicated exons In-Reply-To: <5191152C.5030103@ebi.ac.uk> Message-ID: Ok. Thanks, Carson On 13-05-13 12:30 PM, "Michael Nuhn" wrote: >Hello Carson! > >On 05/13/2013 03:01 PM, Carson Holt wrote: >> Could you send me your maker opts files, the contig that fails, and the >> evidence files you use for that contig. > >Thanks for offering your help. > >I worked around the problem this morning by removing all exons from the >training set for which I was getting the error. Now I'm rerunning maker >and I can't find any gff files at the moment with this problem. > >If the problem reappears, I'll send you the files. > >Cheers, >Michael. > >> Thanks, >> Carson >> >> >> >> On 13-05-10 8:29 PM, "Michael Nuhn" wrote: >> >>> On 05/10/2013 07:08 PM, Carson Holt wrote: >>>> 2.27 from the website download or the SVN devel version? >>> >>> SVN. I checked it out on 19/03/2013. 
>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> On 13-05-10 1:35 PM, "Michael Nuhn" wrote: >>>> >>>>> On 05/10/2013 05:25 PM, Carson Holt wrote: >>>>>> Very odd. Which version of MAEKR are you using. Are you using GFF3 >>>>>> passthrough in the run that generates the duplication? >>>>> >>>>> I am using version 2.27 of maker. I am not using the passthrough >>>>> option. >>>>> >>>>> Cheers, >>>>> Michael. >>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> >>>>>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >>>>>> >>>>>>> Hello Carson! >>>>>>> >>>>>>> I have been trying to get to the bottom of an error message when >>>>>>> (re)training snap. Snap, or more precisely fathom, was giving me >>>>>>> unclear >>>>>>> error messages about misordered and overlapping exons. >>>>>>> >>>>>>> I have looked into the gff files from which these exons originate >>>>>>>and >>>>>>> noticed that a lot of exons in that file were duplicated. For >>>>>>> example I >>>>>>> have found these: >>>>>>> >>>>>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>>>>> >>>>>>> >>>>>>> >>>>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalA >>>>>>>tl >>>>>>> 2s >>>>>>> 75 >>>>>>> -snap-gene-2.15-mRNA-1 >>>>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>>>> >>>>>>> >>>>>>> >>>>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalA >>>>>>>tl >>>>>>> 2s >>>>>>> 75 >>>>>>> -snap-gene-2.15-mRNA-1 >>>>>>> >>>>>>> and then about four hundred lines later there are these: >>>>>>> >>>>>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>>>>> >>>>>>> >>>>>>> >>>>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalA >>>>>>>tl >>>>>>> 2s >>>>>>> 75 >>>>>>> -snap-gene-2.15-mRNA-1 >>>>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>>>> >>>>>>> >>>>>>> >>>>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalA >>>>>>>tl >>>>>>> 2s >>>>>>> 75 >>>>>>> -snap-gene-2.15-mRNA-1 >>>>>>> >>>>>>> which are identical except for the order number after "exon:". 
>>>>>>> >>>>>>> This seems to have happened to a lot of features in that file. >>>>>>> >>>>>>> How can I avoid this? Or if this is just a rare problem, can I have >>>>>>> maker recompute the gff file without redoing all the computations >>>>>>> again? >>>>>>> >>>>>>> Cheers, >>>>>>> Michael. >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> >>>>>>> >>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab. >>>>>>>or >>>>>>> g >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > From rob.syme at gmail.com Tue May 14 00:54:18 2013 From: rob.syme at gmail.com (Rob Syme) Date: Tue, 14 May 2013 14:54:18 +0800 Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Message-ID: Hi all I'm trying to get mpi_maker up and running. I've installed the latest version of MPICH from mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz, making sure to "./configure --enable-shared" Everything seems to install without trouble, but running mpiexec -n 1 mpi_maker gives: /usr/bin/perl: symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Does anybody here know how to fix this? Do I need to downgrade to an older version of MPICH? Thanks! Rob Syme PhD Student Curtin University -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 14 05:20:00 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 May 2013 07:20:00 -0400 Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy In-Reply-To: Message-ID: You have to use MPICH2, the new MPICH3 is not compatible. MPI version 3 is a completely new protocol implemented in MPICH3, and it breaks MAKER. You can also use OpenMPI with the MAKER version 2.27. 
Thanks, Carson From: Rob Syme Date: Tuesday, 14 May, 2013 2:54 AM To: Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Hi all I'm trying to get mpi_maker up and running. I've installed the latest version of MPICH from mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz , making sure to "./configure --enable-shared" Everything seems to install without trouble, but running mpiexec -n 1 mpi_maker gives: /usr/bin/perl: symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Does anybody here know how to fix this? Do I need to downgrade to an older version of MPICH? Thanks! Rob Syme PhD Student Curtin University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From heywood at cshl.edu Tue May 14 14:42:33 2013 From: heywood at cshl.edu (Heywood, Todd) Date: Tue, 14 May 2013 20:42:33 +0000 Subject: [maker-devel] MPI MAKER hanging NFS Message-ID: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu> We have been getting hung NFS mounts on some nodes when running MPI MAKER (version 2.27). Processes go into a "D" state and cannot be killed. We end up having to reboot nodes to recover them. We are running MPICH2 version 1.4.1p1 with RHEL 6.3. Questions: (1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung on a sync_page system call under NFS. That *might* imply some locking issues. (2) Has anyone else seen this? (3) The root directory (parent of genome.maker.output directory) has lots of mpi***** files, all of which have the first line "pst0Process::MpiChunk". Is this expected? I'm able to reproducibly hang NFS on some nodes when using at least 4 32-core nodes and 128 running MPI tasks. 
Thanks,

Todd Heywood
CSHL

From Carson.Holt at oicr.on.ca Tue May 14 19:01:00 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Wed, 15 May 2013 01:01:00 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu>
Message-ID:

No, it does not use ROMIO.

The locking may be due to how your NFS is implemented. MAKER does a lot
of small writes. Some NFS implementations do not handle that well and
only like large infrequent writes and frequent reads. MAKER also uses a
variant of the File::NFSLock module, which uses hard links to force a
flush of the NFS IO cache when asynchronous IO is enabled (described here
http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
I know that the FhGFS implementation of NFS has broken hard link
functionality.

Also make sure you do not set TMP= in the maker_opts.ctl file to an
NFS-mounted location. It must be local (/tmp for example). This is
because certain types of operations are not always NFS safe and need a
local location to work with (anything involving Berkeley DB or SQLite,
for example). Make sure you are not setting that to an NFS-mounted
scratch location. The mpi**** files are examples of short-lived files
that should not be in NFS. They hold chunks of data from threads that
are processing the genome and are very rapidly created and deleted. They
will be cleaned up automatically when MAKER finishes or is killed by
standard signals, such as when you hit ^C or use kill -15.

Thanks,
Carson

On 13-05-14 4:42 PM, "Heywood, Todd" wrote:

>We have been getting hung NFS mounts on some nodes when running MPI MAKER
>(version 2.27). Processes go into a "D" state and cannot be killed. We
>end up having to reboot nodes to recover them. We are running MPICH2
>version 1.4.1p1
>with RHEL 6.3. Questions:
>
>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>on a sync_page system call under NFS. That *might* imply some locking
>issues.
> >(2) Has anyone else seen this? > >(3) The root directory (parent of genome.maker.output directory) has lots >of mpi***** files, all of which have the first line >"pst0Process::MpiChunk". Is this expected? > >I'm able to reproducibly hang NFS on some nodes when using at least 4 >32-core nodes and 128 running MPI tasks. > >Thanks, > >Todd Heywood >CSHL > > From eernst at cshl.edu Wed May 15 11:08:08 2013 From: eernst at cshl.edu (Evan Ernst) Date: Wed, 15 May 2013 13:08:08 -0400 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: References: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu> Message-ID: Hi Carson, For these runs, -TMP is set to the $TMPDIR environment variable via maker command line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected. In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster nfs mount than the working dir where the mpi**** files are being written. Thanks, Evan On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote: > No it does not use ROMIO. > > The locking may be do to how your NFS is implemented. MAKER does a lot of > small writes. Some NFS implementations do not handle that well and only > like large infrequent writes and frequent reads? > MAKER also uses a variant of the File:::NFSLock module which uses > hardlinks to force a flush of the NFS IO cache when asyncrynous IO is > enabled (described here > http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). > I know that the FhGFS implementation of NFS has broken hard link > functionality. > > > Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS > mounted location. It must be local (/tmp for example). 
This is because > certain types of operations are not always NFS safe and need a local > location to work with (anything involving berkley DB or SQLite for > example). Make sure you are not setting that to an NFS mounted scratch > location. The mpi**** files, are examples of some short lived files that > should not be in NFS. They hold chunks of data from threads that are > processing the genome and are very rapidly created and deleted. They will > be cleaned up automatically when maker finished or killed by standard > signals such as when you hit ^C or use kill 15. > > > Thanks, > Carson > > > > > On 13-05-14 4:42 PM, "Heywood, Todd" wrote: > > >We have been getting hung NFS mounts on some nodes when running MPI MAKER > >(version 2.27). Processes go into a "D" state and cannot be killed. We > >end up having to reboot nodes to recover them. We are running MPICH2 > >version 1.4.1p1 > >with RHEL 6.3. Questions: > > > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung > >on a sync_page system call under NFS. That *might* imply some locking > >issues. > > > >(2) Has anyone else seen this? > > > >(3) The root directory (parent of genome.maker.output directory) has lots > >of mpi***** files, all of which have the first line > >"pst0Process::MpiChunk". Is this expected? > > > >I'm able to reproducibly hang NFS on some nodes when using at least 4 > >32-core nodes and 128 running MPI tasks. > > > >Thanks, > > > >Todd Heywood > >CSHL > > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Carson.Holt at oicr.on.ca Wed May 15 11:15:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 15 May 2013 17:15:52 +0000 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: Message-ID: The mpi**** files should be generated in the $TMPDIR or TMP= location. If they are happening in the working directory, then there is a problem. If you are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is launched. You may have to manually specify that it needs to be exported to the other nodes using the mpiexec command line flags. OpenMPI for example does not export all environmental variables by default to the other nodes. Thanks, Carson From: Evan Ernst > Date: Wednesday, 15 May, 2013 1:08 PM To: Carson Holt > Cc: "Heywood, Todd" >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MPI MAKER hanging NFS Hi Carson, For these runs, -TMP is set to the $TMPDIR environment variable via maker command line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected. In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster nfs mount than the working dir where the mpi**** files are being written. Thanks, Evan On Tue, May 14, 2013 at 9:01 PM, Carson Holt > wrote: No it does not use ROMIO. The locking may be do to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and only like large infrequent writes and frequent reads? MAKER also uses a variant of the File:::NFSLock module which uses hardlinks to force a flush of the NFS IO cache when asyncrynous IO is enabled (described here http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality. 
Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS safe and need a local location to work with (anything involving berkley DB or SQLite for example). Make sure you are not setting that to an NFS mounted scratch location. The mpi**** files, are examples of some short lived files that should not be in NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finished or killed by standard signals such as when you hit ^C or use kill 15. Thanks, Carson On 13-05-14 4:42 PM, "Heywood, Todd" > wrote: >We have been getting hung NFS mounts on some nodes when running MPI MAKER >(version 2.27). Processes go into a "D" state and cannot be killed. We >end up having to reboot nodes to recover them. We are running MPICH2 >version 1.4.1p1 >with RHEL 6.3. Questions: > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >on a sync_page system call under NFS. That *might* imply some locking >issues. > >(2) Has anyone else seen this? > >(3) The root directory (parent of genome.maker.output directory) has lots >of mpi***** files, all of which have the first line >"pst0Process::MpiChunk". Is this expected? > >I'm able to reproducibly hang NFS on some nodes when using at least 4 >32-core nodes and 128 running MPI tasks. > >Thanks, > >Todd Heywood >CSHL > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From uma at ebi.ac.uk Thu May 16 10:08:43 2013 From: uma at ebi.ac.uk (Uma Maheswari) Date: Thu, 16 May 2013 17:08:43 +0100 Subject: [maker-devel] duplicate exons? 
In-Reply-To:
References:
Message-ID: <5195048B.9080707@ebi.ac.uk>

Hi Carson,

When I tried to load the MAKER 2.27 results into Ensembl, I found a few
hundred genes flagged with 'duplicate exons'. When I looked in the GFF
file, I found cases like this, where the exons are not actually
duplicated but list the same mRNA ID twice in Parent. Could this be a
potential alternate transcript that was attached to the same transcript
by mistake?

Many thanks
Uma

3 maker gene 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179
3 maker mRNA 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_AED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406
3 maker exon 524271 524480 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker exon 524538 525182 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker exon 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524538 524903 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524538 525182 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524271 524480 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker five_prime_UTR 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker five_prime_UTR 524904 525182 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1

From carsonhh at gmail.com Thu May 16 10:13:05 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 May 2013 12:13:05 -0400
Subject: [maker-devel] duplicate exons?
In-Reply-To: <5195048B.9080707@ebi.ac.uk>
Message-ID:

I've had one other report of this on the devel list, but haven't gotten
data to test with. Do you have the run files that produced the duplicate
exon? If so, could you send me theVoid directory for the contig that
shows the duplicate, and the maker_opts.ctl file?

Thanks,
Carson

On 13-05-16 12:08 PM, "Uma Maheswari" wrote:

>Hi Carson,
>
>When I was trying to load the Maker-2.27 results into ensembl, I found
>that few hundreds of genes with 'duplicate exons' . When I looked in the
>gff file, I found cases like this, where the exons are not actually
>duplicated but have two Parents with same mRNA ID. This can be a
>potential alternate transcript, attached to the same transcript by
>mistake?
>
>Many thanks
>Uma
>
>
>
>
>
>3 maker gene 524271 525467 . - .
>ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed
>-gene-6.179
>3 maker mRNA 524271 525467 . - .
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3-
>processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A
>ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406
>3 maker exon 524271 524480 . - .
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_
>masked-3-processed-gene-6.179-mRNA-1
>3 maker exon 524538 525182 . - .
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_
>masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17
>9-mRNA-1
>3 maker exon 524271 525467 . - .
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_
>masked-3-processed-gene-6.179-mRNA-1
>3 maker CDS 524538 524903 .
- 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 525182 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524271 524480 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524904 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu May 16 10:25:36 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:25:36 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195048B.9080707@ebi.ac.uk> Message-ID: I think this also may be a result of using GFF3 pass-through. So if that is the case, could you send me any GFF3 files you gave maker in addition to the other files I asked for. Thanks, Carson On 13-05-16 12:08 PM, "Uma Maheswari" wrote: >Hi Carson, > >When I was trying to load the Maker-2.27 results into ensembl, I found >that few hundreds of genes with 'duplicate exons' . When I looked in the >gff file, I found cases like this, where the exons are not actually >duplicated but have two Parents with same mRNA ID. This can be a >potential alternate transcript, attached to the same transcript by >mistake? > >Many thanks >Uma > > > > > >3 maker gene 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >-gene-6.179 >3 maker mRNA 524271 525467 . - . 
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >3 maker exon 524271 524480 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker exon 524538 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >9-mRNA-1 >3 maker exon 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 524903 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 525182 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524271 524480 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524904 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dsth at ebi.ac.uk Thu May 16 10:38:35 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Thu, 16 May 2013 17:38:35 +0100 Subject: [maker-devel] duplicate exons? 
In-Reply-To:
References: <5195048B.9080707@ebi.ac.uk>
Message-ID:

hiya,

are you using the same instance as michael at ebi? this sounds like the same problem he had last week, and he wasn't running pass-through. i've run 2.27 here 30+ times and not seen this. is something very strange corrupted?

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
dsth at cantab.net
dsth at cpan.org

From carsonhh at gmail.com Thu May 16 10:50:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 May 2013 12:50:50 -0400
Subject: [maker-devel] duplicate exons?
In-Reply-To:
Message-ID:

Yes. Perhaps this is the same issue Michael saw, although the one difference I see from his post is the Parent= attribute.

--> Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1

I have seen duplicate exons from GFF3 pass-through in the past, but if that's not being used I'd be very appreciative of any test dataset you could give me.
Thanks,
Carson

From uma at ebi.ac.uk Fri May 17 02:41:56 2013
From: uma at ebi.ac.uk (Uma Maheswari)
Date: Fri, 17 May 2013 09:41:56 +0100
Subject: [maker-devel] duplicate exons?
In-Reply-To:
References:
Message-ID: <5195ED54.4090501@ebi.ac.uk>

Hi Carson,

I checked with Michael, and this is different from what he saw: he had entire segments of GFF files duplicated. In this case, just the Parent ID is.
I am preparing the files you asked for and will send them soon.

Thanks,
Uma

From luciano.abriata at epfl.ch Fri May 17 03:45:41 2013
From: luciano.abriata at epfl.ch (Luciano Abriata)
Date: Fri, 17 May 2013 09:45:41 +0000
Subject: [maker-devel] getting protein sequences from genomes
Message-ID: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>

Hello,

I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).

My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:

1) How can I match proteins predicted for the same gene in two genomes?

2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)?

maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541

3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta, snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and proteins.fasta.

Which of these files contains the definitive set of predicted protein sequences?

Thanks in advance!

Luciano
From heywood at cshl.edu Fri May 17 07:25:16 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Fri, 17 May 2013 13:25:16 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu>

It appears that a kernel bug caused the NFS hang, at least for limited-scale testing (6 nodes, 192 tasks). I upgraded the kernel from 2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and cannot reproduce the hangs.

As far as TMPDIR goes, I'm not really sure I understand. We use SGE, and the TMPDIR we are referring to is set by SGE within a job to be /tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?

Todd

From: Carson Holt
Date: Wednesday, May 15, 2013 1:15 PM
To: "Ernst, Evan"
Cc: Todd Heywood, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

The mpi**** files should be generated in the $TMPDIR or TMP= location. If they are appearing in the working directory, then there is a problem. If you are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is launched. You may have to manually specify that it needs to be exported to the other nodes using the mpiexec command line flags. OpenMPI, for example, does not export all environment variables to the other nodes by default.

Thanks,
Carson

From: Evan Ernst
Date: Wednesday, 15 May, 2013 1:08 PM
To: Carson Holt
Cc: "Heywood, Todd", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

For these runs, -TMP is set to the $TMPDIR environment variable via maker command line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected.

In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster NFS mount than the working dir where the mpi**** files are being written.

Thanks,
Evan

On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote:

No, it does not use ROMIO.

The locking may be due to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and only like large infrequent writes and frequent reads. MAKER also uses a variant of the File::NFSLock module, which uses hard links to force a flush of the NFS IO cache when asynchronous IO is enabled (described here http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality.

Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS safe and need a local location to work with (anything involving Berkeley DB or SQLite, for example). Make sure you are not setting that to an NFS mounted scratch location. The mpi**** files are examples of some short-lived files that should not be in NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finishes or is killed by standard signals, such as when you hit ^C or use kill 15.

Thanks,
Carson

On 13-05-14 4:42 PM, "Heywood, Todd" wrote:

>We have been getting hung NFS mounts on some nodes when running MPI MAKER
>(version 2.27). Processes go into a "D" state and cannot be killed. We
>end up having to reboot nodes to recover them. We are running MPICH2
>version 1.4.1p1
>with RHEL 6.3. Questions:
>
>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>on a sync_page system call under NFS. That *might* imply some locking
>issues.
>
>(2) Has anyone else seen this?
>
>(3) The root directory (parent of genome.maker.output directory) has lots
>of mpi***** files, all of which have the first line
>"pst0Process::MpiChunk". Is this expected?
>
>I'm able to reproducibly hang NFS on some nodes when using at least 4
>32-core nodes and 128 running MPI tasks.
>
>Thanks,
>
>Todd Heywood
>CSHL

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Carson.Holt at oicr.on.ca Fri May 17 07:40:50 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Fri, 17 May 2013 13:40:50 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu>
Message-ID:

I'm glad you're getting better results.

With respect to environment variables: one common error in MPI execution is that the environment variables will not always be the same on the other nodes, since only the root node is attached to a terminal, so variables set in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE, for example, have scripts that wrap mpiexec to guarantee export of all environment variables when using MPI, to avoid just this type of common error. So, like anything, you start with the most common cause of errors and then work to the less common. Kernel bugs usually rank low on the list :-) But I'm glad it's working for you now.

Thanks,
Carson

On 13-05-17 9:25 AM, "Heywood, Todd" wrote:

>It appears that a kernel bug caused the NFS hang, at least for limited
>scale testing (6 nodes, 192 tasks). I upgraded the kernel from
>2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
>cannot reproduce the hangs.
>
>As far as TMPDIR, I'm not really sure I understand. We use SGE, and the
>TMPDIR we are referring to is set by SGE within a job to be
>/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?
From barry.moore at genetics.utah.edu Fri May 17 13:02:31 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 17 May 2013 13:02:31 -0600
Subject: [maker-devel] getting protein sequences from genomes
In-Reply-To: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>
References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>
Message-ID:

On May 17, 2013, at 3:45 AM, Luciano Abriata wrote:

> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).
>
> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:
>
> 1) How can I match proteins predicted for the same gene in two genomes?

Use blastp, with parameters tuned for near-perfect matches.

> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)
>
> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541

AED = Annotation edit distance. It describes how closely the prediction matches the evidence. This is a distance measure, so 0 is a perfect match and 1 is no overlap.

eAED = Exon-adjusted annotation edit distance. This metric is the same as AED, with a couple of exceptions. First, for a protein-coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-Seq data is used as evidence and both ends of an exon are supported with splice-site-spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon.
For the most part, AED and eAED will be the same, but eAED tends to work better on many fringe cases.

QI values are as follows, in order:

5' UTR length.
Fraction of splice sites confirmed by an EST alignment.
Fraction of exons that overlap an EST alignment.
Fraction of exons that overlap an EST or protein alignment.
Fraction of splice sites confirmed by an ab initio prediction.
Fraction of exons that overlap an ab initio prediction.
Number of exons in the transcript.
3' UTR length.
Length of the encoded protein.

> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta
>
> Which of these files contains the definite set of predicted protein sequences?

The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for.

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From ares711122 at gmail.com Sun May 19 22:16:10 2013
From: ares711122 at gmail.com (Hung-Wei Hsu)
Date: Mon, 20 May 2013 12:16:10 +0800
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
Message-ID:

Hi MAKER developers,

I was using MAKER to perform gene prediction and annotation on my contigs. I used Artemis to examine the GFF and found that some CDS with complete structure were absent from the final results. They are predicted and annotated on the reference genome. I'm wondering if they were discarded because they overlap another CDS. How can I preserve these CDS?

Thanks a lot in advance.

Hung-Wei

From eernst at cshl.edu Mon May 20 14:36:38 2013
From: eernst at cshl.edu (Evan Ernst)
Date: Mon, 20 May 2013 16:36:38 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <561e317e5e8246978eccdf53ed96067b@EX-HS-HT02.cshl.edu>
References: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu> <561e317e5e8246978eccdf53ed96067b@EX-HS-HT02.cshl.edu>
Message-ID:

Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. Despite this, the mpi*** files are still being created in the working directory. If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case.

Thanks,
Evan

On Fri, May 17, 2013 at 9:40 AM, Carson Holt wrote:

> I'm glad you're getting better results.
>
> With respect to environmental variables. One common error in MPI
> execution is that the environment variables will not always be the same on
> the other nodes since only the root node is attached to a terminal, so
> variables in launch scripts (.bashrc etc.) may not be available on all
> nodes. Many clusters that are part of the XSEDE network and use SGE for
> example have scripts that wrap mpiexec to guarantee export of all
> environmental variables when using MPI to avoid just this type of common
> error. So like anything, you start with the most common cause of errors
> and then work to the less common.
> Kernel bugs usually rank low on the
> list :-) But I'm glad it's working for you now.
>
> Thanks,
> Carson
>
> On 13-05-17 9:25 AM, "Heywood, Todd" wrote:
>
> >It appears that a kernel bug caused the NFS hang, at least for limited
> >scale testing (6 nodes, 192 tasks). I upgraded the kernel from
> >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
> >cannot reproduce the hangs.
> >
> >As far as TMPDIR, I'm not really sure I understand. We use SGE, and the
> >TMPDIR we are referring to is set by SGE within a job to be
> >/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?
> >
> >Todd
> >
> >From: Carson Holt
> >Date: Wednesday, May 15, 2013 1:15 PM
> >To: "Ernst, Evan"
> >Cc: Todd Heywood, "maker-devel at yandell-lab.org"
> >Subject: Re: [maker-devel] MPI MAKER hanging NFS
> >
> >The mpi**** files should be generated in the $TMPDIR or TMP= location.
> >If they are appearing in the working directory, then there is a problem.
> >If you are not setting TMP=, perhaps TMPDIR is not being exported when
> >'mpiexec' is launched. You may have to manually specify that it needs to
> >be exported to the other nodes using the mpiexec command line flags.
> >OpenMPI, for example, does not export all environment variables by
> >default to the other nodes.
> >
> >Thanks,
> >Carson
> >
> >From: Evan Ernst
> >Date: Wednesday, 15 May, 2013 1:08 PM
> >To: Carson Holt
> >Cc: "Heywood, Todd", "maker-devel at yandell-lab.org"
> >Subject: Re: [maker-devel] MPI MAKER hanging NFS
> >
> >Hi Carson,
> >
> >For these runs, -TMP is set to the $TMPDIR environment variable via maker
> >command line argument in the cluster job script to use the local disk on
> >each node. We can see files being generated in those locations on each
> >node, so it seems this is working as expected.
> >
> >In maker_opts.ctl, I commented out the TMP line.
> >I'm not sure if this is
> >relevant, but I'm also setting mpi_blastdb= to consolidate the databases
> >onto a different, faster NFS mount than the working dir where the mpi****
> >files are being written.
> >
> >Thanks,
> >Evan
> >
> >On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote:
> >No, it does not use ROMIO.
> >
> >The locking may be due to how your NFS is implemented. MAKER does a lot of
> >small writes. Some NFS implementations do not handle that well and only
> >like large infrequent writes and frequent reads.
> >MAKER also uses a variant of the File::NFSLock module which uses
> >hardlinks to force a flush of the NFS IO cache when asynchronous IO is
> >enabled (described here
> >http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
> >I know that the FhGFS implementation of NFS has broken hard link
> >functionality.
> >
> >Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS
> >mounted location. It must be local (/tmp for example). This is because
> >certain types of operations are not always NFS safe and need a local
> >location to work with (anything involving Berkeley DB or SQLite for
> >example). Make sure you are not setting that to an NFS mounted scratch
> >location. The mpi**** files are examples of some short-lived files that
> >should not be in NFS. They hold chunks of data from threads that are
> >processing the genome and are very rapidly created and deleted. They will
> >be cleaned up automatically when maker finishes or is killed by standard
> >signals, such as when you hit ^C or use kill 15.
> >
> >Thanks,
> >Carson
> >
> >On 13-05-14 4:42 PM, "Heywood, Todd" wrote:
> >
> >>We have been getting hung NFS mounts on some nodes when running MPI MAKER
> >>(version 2.27). Processes go into a "D" state and cannot be killed. We
> >>end up having to reboot nodes to recover them. We are running MPICH2
> >>version 1.4.1p1
> >>with RHEL 6.3.
> >>Questions:
> >>
> >>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
> >>on a sync_page system call under NFS. That *might* imply some locking
> >>issues.
> >>
> >>(2) Has anyone else seen this?
> >>
> >>(3) The root directory (parent of genome.maker.output directory) has lots
> >>of mpi***** files, all of which have the first line
> >>"pst0Process::MpiChunk". Is this expected?
> >>
> >>I'm able to reproducibly hang NFS on some nodes when using at least 4
> >>32-core nodes and 128 running MPI tasks.
> >>
> >>Thanks,
> >>
> >>Todd Heywood
> >>CSHL

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-05-20 at 4.14.09 PM.png
Type: image/png
Size: 22634 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon May 20 17:50:28 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 19:50:28 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID:

Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

From eernst at cshl.edu Mon May 20 18:20:22 2013
From: eernst at cshl.edu (Evan Ernst)
Date: Mon, 20 May 2013 20:20:22 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
References:
Message-ID:

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

From carsonhh at gmail.com Mon May 20 18:38:41 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 20:38:41 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC4685A8A130DC@ex-hs-mbx06.cshl.edu>
Message-ID:

It may have just been a random failure. Try launching it again. Basically one instance failed to launch hydra_pmi_proxy which wraps the command being called via mpiexec. So you get 7 lines of output instead of the 8 that should be there.
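
A quick way to spot a missing rank in that diagnostic is to tally the output by directory. The following is only a sketch: `count_tmpdirs` is a hypothetical helper name, not part of MAKER or MPICH2, and it assumes each rank prints exactly one path with no whitespace in it.

```shell
#!/bin/sh
# Hypothetical helper: tally identical lines from standard input and print
# "<path><TAB><count>" so a missing rank or an unexpected directory stands out.
# Assumes one path per line and no whitespace inside paths.
count_tmpdirs() {
    sort | uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example invocation on a cluster (not run here). Shell and launcher error
# messages normally go to stderr, so they are not counted as paths:
#   mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' | count_tmpdirs
```

With 8 ranks requested, a count of 7 for the expected local directory would point to one rank that never started, and any second path in the tally would point to a node resolving a different temporary directory.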

--Carson

From heywood at cshl.edu Mon May 20 18:33:32 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Tue, 21 May 2013 00:33:32 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685A8A130DC@ex-hs-mbx06.cshl.edu>

All starter_with_limit.sh does is set a ulimit for the top process for the job, then start it passing all parameters:

#!/bin/sh
ulimit -c 0
exec $*

From: Evan Ernst
Date: Monday, May 20, 2013 8:20 PM
To: Carson Holt
Cc: Carson Holt, "maker-devel at yandell-lab.org", Todd Heywood
Subject: Re: [maker-devel] MPI MAKER hanging NFS

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:

Could you run the following command for me and share the output with me?
mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

From: Evan Ernst
Date: Monday, 20 May, 2013 4:36 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org", "Heywood, Todd"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. Despite this, the mpi*** files are still being created in the working directory.

If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case.

Thanks,
Evan

On Fri, May 17, 2013 at 9:40 AM, Carson Holt wrote:

I'm glad you're getting better results.

With respect to environment variables: one common error in MPI execution is that the environment variables will not always be the same on the other nodes, since only the root node is attached to a terminal, so variables set in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE, for example, have scripts that wrap mpiexec to guarantee export of all environment variables when using MPI, to avoid just this type of common error. So like anything, you start with the most common cause of errors and then work to the less common. Kernel bugs usually rank low on the list :-)

But I'm glad it's working for you now.

Thanks,
Carson

On 13-05-17 9:25 AM, "Heywood, Todd" wrote:

>It appears that a kernel bug caused the NFS hang, at least for limited
>scale testing (6 nodes, 192 tasks). I upgraded the kernel from
>2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
>cannot reproduce the hangs.
>
>As far as TMPDIR, I'm not really sure I understand. We use SGE, and the
>TMPDIR we are referring to is set by SGE within a job to be
>/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?
>
>Todd
>
>
>From: Carson Holt
>Date: Wednesday, May 15, 2013 1:15 PM
>To: "Ernst, Evan"
>Cc: Todd Heywood, "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>The mpi**** files should be generated in the $TMPDIR or TMP= location.
>If they are happening in the working directory, then there is a problem.
>If you are not setting TMP=, perhaps TMPDIR is not being exported when
>'mpiexec' is launched. You may have to manually specify that it needs to
>be exported to the other nodes using the mpiexec command line flags.
>OpenMPI for example does not export all environment variables by
>default to the other nodes.
>
>Thanks,
>Carson
>
>
>From: Evan Ernst
>Date: Wednesday, 15 May, 2013 1:08 PM
>To: Carson Holt
>Cc: "Heywood, Todd", "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>Hi Carson,
>
>For these runs, -TMP is set to the $TMPDIR environment variable via maker
>command line argument in the cluster job script to use the local disk on
>each node. We can see files being generated in those locations on each
>node, so it seems this is working as expected.
>
>In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is
>relevant, but I'm also setting mpi_blastdb= to consolidate the databases
>onto a different, faster nfs mount than the working dir where the mpi****
>files are being written.
>
>Thanks,
>Evan
>
>
>On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote:
>No it does not use ROMIO.
>
>The locking may be due to how your NFS is implemented. MAKER does a lot of
>small writes. Some NFS implementations do not handle that well and only
>like large infrequent writes and frequent reads.
>MAKER also uses a variant of the File::NFSLock module which uses
>hardlinks to force a flush of the NFS IO cache when asynchronous IO is
>enabled (described here
>http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
>I know that the FhGFS implementation of NFS has broken hard link
>functionality.
>
>
>Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS
>mounted location. It must be local (/tmp for example). This is because
>certain types of operations are not always NFS safe and need a local
>location to work with (anything involving Berkeley DB or SQLite for
>example). Make sure you are not setting that to an NFS mounted scratch
>location. The mpi**** files are examples of some short lived files that
>should not be in NFS. They hold chunks of data from threads that are
>processing the genome and are very rapidly created and deleted. They will
>be cleaned up automatically when maker finishes or is killed by standard
>signals such as when you hit ^C or use kill 15.
>
>
>Thanks,
>Carson
>
>
>On 13-05-14 4:42 PM, "Heywood, Todd" wrote:
>
>>We have been getting hung NFS mounts on some nodes when running MPI MAKER
>>(version 2.27). Processes go into a "D" state and cannot be killed. We
>>end up having to reboot nodes to recover them. We are running MPICH2
>>version 1.4.1p1
>>with RHEL 6.3. Questions:
>>
>>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>>on a sync_page system call under NFS. That *might* imply some locking
>>issues.
>>
>>(2) Has anyone else seen this?
>>
>>(3) The root directory (parent of genome.maker.output directory) has lots
>>of mpi***** files, all of which have the first line
>>"pst0Process::MpiChunk". Is this expected?
>>
>>I'm able to reproducibly hang NFS on some nodes when using at least 4
>>32-core nodes and 128 running MPI tasks.
>>
>>Thanks,
>>
>>Todd Heywood
>>CSHL

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From heywood at cshl.edu  Mon May 20 18:34:48 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Tue, 21 May 2013 00:34:48 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685A8A130FB@ex-hs-mbx06.cshl.edu>

Actually, line 4 is the exec (one line is commented out):

#!/bin/sh
ulimit -c 0
#ulimit -n 262144
exec $*
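
The wrapper shown in this thread forwards its parameters with an unquoted $*. As a side note, and without claiming that this is the cause of the hydra_pmi_proxy errors reported earlier in the thread, the sketch below shows how unquoted $* and "$@" differ when forwarding arguments in sh. The helper names (show_args, forward_star, forward_at) are hypothetical and exist only for this illustration; the default IFS is assumed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of starter_with_limit.sh.

# Print the number of arguments received, then the first argument.
show_args() { printf '%s|%s\n' "$#" "$1"; }

# Style used by the wrapper: unquoted $* is re-split on whitespace.
forward_star() { show_args $*; }

# Alternative: "$@" passes each argument through unchanged.
forward_at() { show_args "$@"; }

forward_star "a b" c            # prints 3|a
forward_at "a b" c              # prints 2|a b

# Quote characters that arrive inside an argument stay literal in both styles.
forward_star '"/opt/bin/tool"'  # prints 1|"/opt/bin/tool"
```

The last call shows that an argument carrying literal double quotes is passed on with the quotes intact, so it no longer begins with "/" and would be looked up as a relative name. That is at least consistent with the working-directory prefix in the error text, but the thread does not confirm it.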
From Carson.Holt at oicr.on.ca  Mon May 20 18:48:32 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 21 May 2013 00:48:32 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID:

Could you use the attached file to replace maker/src/bin/maker and maker/bin/maker? You will have to rerun 'maker/src/Build install' or just edit the shebang line (#!) if perl is located anywhere other than /usr/bin/perl. I explicitly tell it to use the system TMPDIR rather than letting it get set implicitly. See if that stops the mpi***** files in the working directory. It's always possible that this is just a slight difference in behavior for the version of the File::Temp module that is packaged with your perl.

--Carson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker
Type: application/octet-stream
Size: 49266 bytes
Desc: maker
URL:

From carsonhh at gmail.com  Mon May 20 19:08:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 21:08:51 -0400
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
Message-ID:

On default settings MAKER will only put ab initio predictions that have some sort of evidence support (EST or protein) in the final gene set. The rejected predictions are still in the GFF3 for reference purposes as match/match_part features, but not as gene/mRNA/exon/CDS features. So a lack of evidence might be why it is not there. You can add all rejected models that don't overlap an accepted model by setting keep_preds=1 (this usually brings a lot more into the final gene set than you really want, though: lots of false positives). But for some organisms like fungi, which have high gene densities, this approach is relatively safe.

Alternatively the gene is missing because it overlaps another gene model that was accepted.
MAKER won't allow overlapping models on the same strand in eukaryotes. The only way to force that kind of overlap is to give MAKER the reference models in model_gff and not let it call its own models (then maker is really just aligning evidence and scoring the reference models).

One final note. If there is no evidence supporting the model, and that is why it is rejected, you can also try adding more evidence to the maker run, or you can consider the possibility that the gene model in the reference is not real to begin with (i.e. a false positive gene model called during the initial annotation process and not supported by protein or expression data from any source).

Thanks,
Carson

From: Hung-Wei Hsu
Date: Monday, 20 May, 2013 12:16 AM
To:
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?

Hi MAKER developers,

I was using MAKER to perform gene prediction and annotation on my contigs. I used Artemis to examine the gff and found that some CDS with complete structure were absent from the final results. They are predicted and annotated on the ref genome. I'm wondering if they were discarded due to overlapping with another CDS. How can I preserve these CDS?

Thanks a lot in advance.

Hung-Wei

From ares711122 at gmail.com  Mon May 20 19:19:20 2013
From: ares711122 at gmail.com (Hung-Wei Hsu)
Date: Tue, 21 May 2013 09:19:20 +0800
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
References:
Message-ID:

Thanks a lot for your help. Your suggestions will be very helpful for our analysis.

I've tried to add EST sequences to improve gene predictions. The EST sequences I used were CDS sequences of the same organism.
But I got the error below.

substr outside of string at .../TranslationMachine.pm line 162
ERROR: Failed while polishig ESTs
ERROR: Chunk failed at level:2, tier_type:3

What's wrong with my analysis? Are the EST sequences I used wrong?

Thank you.

Hung-Wei
From rob.syme at gmail.com  Mon May 20 23:57:19 2013
From: rob.syme at gmail.com (Rob Syme)
Date: Tue, 21 May 2013 13:57:19 +0800
Subject: [maker-devel] Maker-derived CDS GFF3 phase column
Message-ID:

Hi all

By my reading of the GFF3 spec (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker that has odd data in the phase column. For example, see some example Maker output at https://gist.github.com/robsyme/5617399

There are two exons, 5617 <- 5737 and 5793 <- 5953, with phases 0 and 2, respectively. Both exons are on the reverse strand.

From the spec, phase indicates "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon", and for "reverse strand features, phase is counted from the end field".

In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is 5953. The base at the end field is the first base of the translated CDS, so there should be no bases removed "to reach the first base of the next codon". I suggest that this phase should be 0, not 2.

There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png.
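
The arithmetic behind this reading of the spec can be sketched in a few lines of sh. This assumes both exons are coding over their full length (the gist is not reproduced here, so that is an assumption); next_phase is a hypothetical helper name, not something from MAKER or the GFF3 spec.

```shell
#!/bin/sh
# next_phase CONSUMED
# Phase of a CDS segment, given how many CDS bases were already
# translated before it (counted in translation order).
next_phase() { echo $(( (3 - $1 % 3) % 3 )); }

# Segment lengths from the coordinates quoted in the message.
len_high=$(( 5953 - 5793 + 1 ))   # 161 bases
len_low=$((  5737 - 5617 + 1 ))   # 121 bases

# Reverse strand: translation starts at the end field of the
# higher-coordinate segment, so that segment comes first.
next_phase 0             # 5793..5953 -> 0
next_phase "$len_high"   # 5617..5737 -> 1

# Counting from the left-hand end instead reproduces the reported values.
next_phase 0             # 5617..5737 -> 0
next_phase "$len_low"    # 5793..5953 -> 2
```

Under these assumptions, counting from the end field gives phases 0 and 1 for 5793..5953 and 5617..5737, while counting from the start field gives the 0 and 2 reported in the output.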
The output gff3 is correct if "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon" is measured from the 'left-hand' end of this feature (the start field) rather than the end field.

Has anybody else run into this problem, or am I misreading the gff3 spec?

Rob Syme
PhD Student
Curtin University

From Sean.Li at csiro.au  Tue May 21 01:36:37 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Tue, 21 May 2013 07:36:37 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID:

Hi Carson,

We are currently working on the annotation of the Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. Checking the annotation results from maker 2.10, we saw that some loci have a fusion problem: two separate neighbouring genes are fused together and reported by maker as a single candidate. If we look further at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct. We are also using an RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects.

We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide whether to merge models or not. If possible, can you please explain what these two parameters do with the predicted genes, RNA-Seq and protein evidence?

Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing a critical question: in which way could we effectively address the gene fusion problem? 1) setting pred_flank lower than 100? 2) turning correct_est_fusion on? 3) anything else?

Thank you.
With best regards,
Xi (Sean) Li, Ph. D.

Bioinformatics Analyst, Bioinformatics Core,
CSIRO Mathematics, Informatics and Statistics
Phone: +61 2 6216 7138
Address: GPO Box 664, Canberra, ACT 2601

From barry.moore at genetics.utah.edu  Tue May 21 17:54:40 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 21 May 2013 17:54:40 -0600
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>

Hi Sean,

I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion. Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript and this can help as well.

B
Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From onson001 at umn.edu  Tue May 21 09:58:43 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Tue, 21 May 2013 10:58:43 -0500
Subject: [maker-devel] Maker: Re-annotation
Message-ID:

Maker Development Team,

I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data.
I updated fields in maker_opts.ctl as below:

genome=CGS01058.fasta #genome sequence file in fasta format
est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT
pred_gff=Augustus.gff3 #ab-initio predictions from
other_gff=Promoters.gff3 #promoter annotations
other_gff=CpG_Islands.gff3 # CpG island annotations

Maker runs to completion and, according to the log file, annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file, but when I load the GFF3 file into IGV it does not contain any of the exon definitions in Augustus.gff3. Am I missing something?

Regards,
Getiria

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

From carsonhh at gmail.com Tue May 21 18:59:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 20:59:09 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>
Message-ID:

Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR, this tends to fix many cases. Of course, avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that, together with the correct_est_fusion option in MAKER, is probably the way to go.

I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).
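A toy sketch of the flank idea described above, in Python. This is not MAKER's code: the function name, the 1-based inclusive coordinates, and the clamping to the contig ends are assumptions made purely for illustration.

```python
def flanked_window(cluster_start, cluster_end, contig_length, pred_flank):
    """Widen an evidence cluster by pred_flank bases on each side.

    Coordinates are 1-based and inclusive. The window is clamped to the
    contig boundaries, so a cluster near a contig end gets less flank.
    Hypothetical helper for illustration; not part of MAKER.
    """
    if not 1 <= cluster_start <= cluster_end <= contig_length:
        raise ValueError("cluster must lie inside the contig")
    if pred_flank < 0:
        raise ValueError("pred_flank must be non-negative")
    start = max(1, cluster_start - pred_flank)
    end = min(contig_length, cluster_end + pred_flank)
    return start, end


if __name__ == "__main__":
    # Cluster in the middle of a contig: full flank on both sides.
    print(flanked_window(5000, 7000, 100000, 100))
    # Cluster near both contig ends: the window is clamped.
    print(flanked_window(50, 900, 1000, 200))
```

In this toy model, flanked_window(5000, 7000, 100000, 100) gives (4900, 7100). The smaller the flank, the less intergenic sequence the predictor sees on either side of the evidence.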
Thanks,
Carson

From Sean.Li at csiro.au Tue May 21 19:23:48 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Wed, 22 May 2013 01:23:48 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>
Message-ID:

Thanks Barry and Carson for your detailed explanation. Now I have a better understanding of "pred_flank".

1. To run maker, we use transcripts produced by the tophat+cufflinks approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?
2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such a case, when Maker turns off UTRs but our RNA-Seq data has got long UTRs, will these UTRs be added into the maker gene model?

Regards,
Sean

From carsonhh at gmail.com Tue May 21 19:37:02 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 21:37:02 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
Message-ID:

> 1. To run maker, we use transcripts produced by the tophat+cufflinks approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?

No. Trinity would probably be a better approach to avoid merging.

> 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such a case, when Maker turns off UTRs but our RNA-Seq data has got long UTRs, will these UTRs be added into the maker gene model?

MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that: it can also add missing exons and extend CDS. The correct_est_fusion option just causes it to clip really long UTR if it looks like it was added due to merged evidence and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expanding the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2, or more gene loci (you lose the true UTR boundaries).
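The condition described above (neighboring models whose spans overlap, but only within UTR) can be sketched in Python as below. This is a hypothetical illustration and not how MAKER implements correct_est_fusion: GeneModel and overlap_only_in_utr are made-up names, strand is ignored, and each CDS is treated as a single contiguous span.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    """Toy gene model with 1-based inclusive coordinates (hypothetical)."""
    start: int      # transcript start, including any 5'/3' UTR
    end: int        # transcript end, including any 5'/3' UTR
    cds_start: int  # leftmost coding base
    cds_end: int    # rightmost coding base


def spans_overlap(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def overlap_only_in_utr(a, b):
    """True if the transcripts overlap but neither CDS is touched.

    Because each CDS lies inside its own transcript, checking a CDS
    against the other transcript is the same as checking it against
    the overlapping region.
    """
    transcripts_overlap = spans_overlap(a.start, a.end, b.start, b.end)
    cds_a_touched = spans_overlap(a.cds_start, a.cds_end, b.start, b.end)
    cds_b_touched = spans_overlap(b.cds_start, b.cds_end, a.start, a.end)
    return transcripts_overlap and not cds_a_touched and not cds_b_touched


if __name__ == "__main__":
    upstream = GeneModel(100, 1200, 200, 900)
    downstream = GeneModel(1100, 2500, 1300, 2400)
    print(overlap_only_in_utr(upstream, downstream))
```

A pair that passes this check is the kind of case where trimming UTR would still leave two separate gene models; a pair whose coding spans collide is a different situation.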
--Carson

From carsonhh at gmail.com Tue May 21 19:39:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 21:39:01 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
Message-ID:

One more time, but I fixed a few obvious spelling errors -->

> 1. To run maker, we use transcripts produced by the tophat+cufflinks approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?

No. Trinity would probably be a better approach to avoid merging.

> 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such a case, when Maker turns off UTRs but our RNA-Seq data has got long UTRs, will these UTRs be added into the maker gene model?

MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that: it can also add missing exons and extend CDS. The correct_est_fusion option just causes it to clip really long UTR if it looks like it was added due to merged evidence and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expanding the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2, or more gene loci (you lose the true UTR boundaries).

--Carson

From Sean.Li at csiro.au Tue May 21 20:23:26 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Wed, 22 May 2013 02:23:26 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID:

Thank you Carson. It has been a very helpful conversation with you! I will pass this information back to our group.
Best regards,
Sean

From carsonhh at gmail.com Tue May 21 20:28:46 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 22:28:46 -0400
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
Message-ID:

The option in trinity is --jaccard_clip --> http://trinityrnaseq.sourceforge.net/#jaccard_clip

--Carson

From carsonhh at gmail.com Tue May 21 20:32:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 22:32:54 -0400
Subject: [maker-devel] Maker-derived CDS GFF3 phase column
In-Reply-To:
Message-ID:

It looks like the phase was calculated from the wrong strand orientation. I believe I have corrected this now. I'm checking a few more things, but I'll likely have 2.28 out as the latest release tomorrow, with the cumulative bug fixes since the last release.

Thanks,
Carson

From: Rob Syme
Date: Tuesday, 21 May, 2013 1:57 AM
To:
Subject: [maker-devel] Maker-derived CDS GFF3 phase column

Hi all

By my reading of the GFF3 spec (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker that has odd data in the phase column. For example, see some example Maker output at https://gist.github.com/robsyme/5617399

There are two exons, 5617 <- 5737 and 5793 <- 5953, with phases 0 and 2, respectively. Both exons are on the reverse strand.

From the spec, phase indicates "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon", and for "reverse strand features, phase is counted from the end field".

In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is 5953. The base at the end field is the first base of the translated CDS, so there should be no bases removed "to reach the first base of the next codon". I suggest that this phase should be 0, not 2.
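For reference, a minimal Python sketch of the phase rule quoted above. It assumes the first CDS segment in transcription order starts on a complete codon (phase 0); cds_phases is a made-up helper, not part of MAKER or of any GFF3 library.

```python
def cds_phases(segments, strand):
    """Return {(start, end): phase} for the CDS segments of one transcript.

    segments: iterable of (start, end) pairs, 1-based inclusive, start <= end.
    strand:   '+' or '-'.
    Segments are walked in transcription order: ascending start on the
    forward strand, descending on the reverse strand. Assumes the first
    segment begins on a complete codon.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    ordered = sorted(segments, key=lambda seg: seg[0], reverse=(strand == "-"))
    phases = {}
    phase = 0
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left over after the last full codon decide how many bases
        # the next segment must skip to reach a codon start.
        phase = (3 - (length - phase) % 3) % 3
    return phases


if __name__ == "__main__":
    example = [(5617, 5737), (5793, 5953)]
    print(cds_phases(example, "-"))
    print(cds_phases(example, "+"))
```

With the two segments from the example, cds_phases(example, "-") assigns phase 0 to 5793-5953 and phase 1 to 5617-5737. Walking the same segments as if they were on the forward strand gives 0 for 5617-5737 and 2 for 5793-5953, which matches the reported output.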
There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. The output gff3 is correct if "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon" is measured from the 'left-hand' end of this feature (the start field) rather than the end field. Has anybody else ran into this problem or am I misreading the gff3 spec? Rob Syme PhD Student Curtin University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.utah at gmail.com Tue May 21 21:37:30 2013 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 21 May 2013 21:37:30 -0600 Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27? In-Reply-To: References: Message-ID: <2024BE21-4293-4E9D-BE13-92774C7BC96D@gmail.com> Sean, The Trinity option to manage fusion transcripts is --jaccard_clip and is described here: http://trinityrnaseq.sourceforge.net/#jaccard_clip Trinity has also added functionality to use a hybrid reference-guided/de-novo assembly approach which you might also consider: http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html B On May 21, 2013, at 7:37 PM, Carson Holt wrote: > 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads? > > No. Trinity would probably be a better approach to avoid merging. > > > 2. If my understanding is correct, the ?correct_est_fusion? parameter needs to be turned off when we don?t ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model? > > MAKER will always try to add UTR if the EST evidence suggests it. 
Technically it's a little bit more than that: it can also add missing exons and extend CDS. The correct_est_fusion option just causes it to clip really long UTR if it looks like it was added due to merged evidence and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically extending the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi, where UTRs of neighboring genes often overlap and mRNA-seq assemblies falsely make it look like one transcript encompasses one, two, or more gene loci (you lose the true UTR boundaries).
>
> --Carson
>
> From:
> Date: Tuesday, 21 May, 2013 9:23 PM
> To: Carson Holt, Barry Moore
> Cc:
> Subject: RE: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
>
> Thanks Barry and Carson for your detailed explanation. Now I have a better understanding of "pred_flank".
>
> 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?
> 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such a case, when Maker turns off UTRs but our RNA-Seq data has long UTRs, will these UTRs be added into the maker gene model?
>
> Regards,
> Sean
>
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Wednesday, 22 May 2013 10:59 AM
> To: Barry Moore; Li, Sean (CMIS, Acton)
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
>
> Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR, this tends to fix many cases.
Of course, avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that, together with the correct_est_fusion option in MAKER, is probably the way to go.
>
> I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).
>
> Thanks,
> Carson
>
> From: Barry Moore
> Date: Tuesday, 21 May, 2013 7:54 PM
> To:
> Cc:
> Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
>
> Hi Sean,
>
> I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence, and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of correct_est_fusion.
>
> Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript, and this can help as well.
>
> B
>
> On May 21, 2013, at 1:36 AM, wrote:
>
> > Hi Carson,
> > We are currently working on the annotation of the Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. By checking the annotation results from maker 2.10, we saw that some loci have the fusion problem: two separate neighbouring genes are likely to be fused together and regarded as a single candidate output by maker. If we go further by looking at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct.
We are also using RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects. > > We noticed that the parameters ?pred_flank? in maker v2.10 and ?correct_est_fusion? in maker v2.27 might be useful for maker to decide when to merge models or not. If possible, can you please explain what these two parameters can do with the predicted genes, RNA-Seq and protein evidence? > > Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing with an critical question: in which way we could effectively improve the gene fusing problem? 1) setting the pred_flank lower than 100? 2) turn the correct_est_fusion on? 3) anything else? > > Thank you. > > With best regards, > Xi (Sean) Li, Ph. D. > > Bioinformatics Analyst, Bioinformatics Core, > CSIRO Mathematics, Informatics and Statistics > Phone: +61 2 6216 7138 > Address: GPO Box 664, Canberra, ACT 2601 > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... 
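Editorial sketch for the thread above: the two settings discussed (pred_flank no lower than about 100, correct_est_fusion turned on) live in maker_opts.ctl as `key=value #comment` lines, as in the excerpt quoted elsewhere in this archive. The helper below is a hedged Python illustration of changing such lines while keeping their trailing comments. The function name `set_ctl_options` and the sample values are assumptions made for the example, not part of MAKER.

```python
import re

def set_ctl_options(ctl_text, updates):
    """Return ctl_text with `key=value` lines rewritten for the keys in `updates`.

    Trailing `#comment` text on a rewritten line is preserved; every other
    line (including pure comment lines) is passed through unchanged.
    """
    out = []
    for line in ctl_text.splitlines():
        m = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if m and m.group(1) in updates:
            key = m.group(1)
            comment = m.group(3) or ""
            line = f"{key}={updates[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"

# Illustrative input only: the starting values and comments are assumptions.
sample = (
    "pred_flank=200 #illustrative comment\n"
    "correct_est_fusion=0 #illustrative comment\n"
)
print(set_ctl_options(sample, {"pred_flank": 100, "correct_est_fusion": 1}), end="")
```

Editing the file by hand works just as well; the point is only that both options are ordinary entries in the control file and that, per Carson's note, pred_flank should probably not go below 100.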
URL: From barry.utah at gmail.com Tue May 21 21:43:47 2013 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 21 May 2013 21:43:47 -0600 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: Hi Getiria, Does the MAKER produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl? B On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: > Maker Development Team, > > I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below > > genome=CGS01058.fasta #genome sequence file in fasta format > est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT > pred_gff=Augustus.gff3 #ab-initio predictions from > other_gff=Promoters.gff3 #promoter annotations > other_gff=CpG_Islands.gff3 # CpG island annotations > > Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3. Am I missing something? > > Regards, > Getiria > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. 
of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.syme at gmail.com Tue May 21 22:04:04 2013 From: rob.syme at gmail.com (Rob Syme) Date: Wed, 22 May 2013 12:04:04 +0800 Subject: [maker-devel] Maker-derived CDS GFF3 phase column In-Reply-To: References: Message-ID: Fantastic. I thought that might have been the problem. Looking forward to 2.28. Thanks! Rob On Wed, May 22, 2013 at 10:32 AM, Carson Holt wrote: > It looks like the phase was calculated from the wrong strand orientation. > I believe I have corrected this now. I'm checking a few more things, but > I'll have 2.28 as the latest release likely tomorrow with the cumulative > bug fixes since the last release. > > Thanks, > Carson > > > > From: Rob Syme > Date: Tuesday, 21 May, 2013 1:57 AM > To: > Subject: [maker-devel] Maker-derived CDS GFF3 phase column > > Hi all > > By my reading of the GFF3 spec ( > http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from > Maker that has odd data in the phase column. > > For example, see some example Maker output at > https://gist.github.com/robsyme/5617399 > > There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, > respectively. Both exons are in the reverse strand. > > From the spec, phase indicates "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next > codon", and for "reverse strand features, phase is counted from the end > field". > > In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) > is 5953. > The base at the end field is the first base of the translated CDS, so > there should be no bases removed "to reach the first base of the next > codon". I suggest that this phase should be 0, not 2. > > There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. 
> > The output gff3 is correct if "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next > codon" is measured from the 'left-hand' end of this feature (the start > field) rather than the end field. > > Has anybody else ran into this problem or am I misreading the gff3 spec? > > Rob Syme > PhD Student > Curtin University > > > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed May 22 06:50:26 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 22 May 2013 07:50:26 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: One other thing, I ran MAKER with the RM_off flag (maker -f -RM_off -q) the input sequences had already been masked. On Wed, May 22, 2013 at 7:47 AM, Innocent Onsongo wrote: > No. The MAKER produced GFF3 file does not contain any annotations. I even > tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it > will pass annotations from the Augustus produced GFF file into the final > annotation but that didn't work. I have attached the maker_opts.ctl file > I used together with the first 100 lines of the GFF files it's using. I > also include the GFF file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can >> you send the first ~100 lines each of the MAKER produced GFF3 file and of >> the GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >> Maker Development Team, >> >> I am trying to use Maker for re-annotation using gene predictions from >> Augustus. 
We had previously used Augustus for gene prediction but now want >> to combine these annotations with some EST data. I updated >> fields maker_opts.ctl as below >> >> genome=CGS01058.fasta #genome sequence file in fasta format >> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >> pred_gff=Augustus.gff3 #ab-initio predictions from >> other_gff=Promoters.gff3 #promoter annotations >> other_gff=CpG_Islands.gff3 # CpG island annotations >> >> Maker runs to completion and according to the log file annotation was >> successful. However, it also gives a "Segmentation fault (core dumped)" >> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >> and look it does not contain any of the exon definitions in Augustus.gff3. >> Am I missing something? >> >> Regards, >> Getiria >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> Barry Moore >> Research Scientist >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> >> > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From onson001 at umn.edu Wed May 22 06:47:30 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 22 May 2013 07:47:30 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: No. The MAKER produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will pass annotations from the Augustus produced GFF file into the final annotation but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff) On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > Hi Getiria, > > Does the MAKER produced GFF3 file contain any annotations at all? Can you > send the first ~100 lines each of the MAKER produced GFF3 file and of the > GFF3 files that you passed via maker_opts.ctl? > > B > > On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: > > Maker Development Team, > > I am trying to use Maker for re-annotation using gene predictions from > Augustus. We had previously used Augustus for gene prediction but now want > to combine these annotations with some EST data. I updated > fields maker_opts.ctl as below > > genome=CGS01058.fasta #genome sequence file in fasta format > est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT > pred_gff=Augustus.gff3 #ab-initio predictions from > other_gff=Promoters.gff3 #promoter annotations > other_gff=CpG_Islands.gff3 # CpG island annotations > > Maker runs to completion and according to the log file annotation was > successful. However, it also gives a "Segmentation fault (core dumped)" > message. It does produce a GFF3 file but when I load the GFF3 file into IGV > and look it does not contain any of the exon definitions in Augustus.gff3. > Am I missing something? > > Regards, > Getiria > > -- > Getiria Onsongo, Ph.D. 
> Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AugustusFirst100.gff3 Type: application/octet-stream Size: 9702 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CGS01058First100.gff Type: application/octet-stream Size: 5664 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CpG_IslandsFirst100.gff3 Type: application/octet-stream Size: 1963 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: EST2ScaffoldFirst100.gff3 Type: application/octet-stream Size: 9900 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4578 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PromotersFirst100.gff3
Type: application/octet-stream
Size: 112 bytes
Desc: not available
URL:

From Carson.Holt at oicr.on.ca Wed May 22 08:03:14 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Wed, 22 May 2013 14:03:14 +0000
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
Message-ID:

Are you using MAKER version 2.10? I ask because there is an issue with other_gff in that version that has since been fixed. So if you don't get other_gff to pass through, you will need to upgrade to 2.28 (release date is later today, coincidentally).

For the Augustus GFF3 file, the format is a little weird, which is causing the problem. They are mRNA features not attached to genes. Rather than build the expected 3-level gene/mRNA/exon structure for these, it is simpler just to convert it to the 2-level match/match_part structure. Just convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. Rename the GFF3 when you're done so that it will force a rebuild of the GFF3 database when you run again.

Thanks,
Carson

From: Innocent Onsongo
Date: Wednesday, 22 May, 2013 8:47 AM
To: Barry Moore
Cc:
Subject: Re: [maker-devel] Maker: Re-annotation

No. The MAKER produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will pass annotations from the Augustus produced GFF file into the final annotation but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff)

On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote:

Hi Getiria,

Does the MAKER produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl?
B On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: Maker Development Team, I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below genome=CGS01058.fasta #genome sequence file in fasta format est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT pred_gff=Augustus.gff3 #ab-initio predictions from other_gff=Promoters.gff3 #promoter annotations other_gff=CpG_Islands.gff3 # CpG island annotations Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3. Am I missing something? Regards, Getiria -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
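Editorial sketch of the conversion Carson describes above (rename 'mRNA' to 'match' and 'exon' to 'match_part' in the Augustus GFF3, then save under a new file name). This is a minimal Python illustration, assuming a standard tab-separated 9-column GFF3 in which the feature type is column 3; the function names are hypothetical and other feature types are deliberately left untouched because the thread does not say what to do with them.

```python
# Feature-type renames suggested in the thread.
TYPE_MAP = {"mRNA": "match", "exon": "match_part"}

def convert_gff3_line(line):
    """Rename the type column (column 3) of one GFF3 feature line.

    Comment lines, blank lines, and anything that is not a 9-column
    feature line are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line
    cols[2] = TYPE_MAP.get(cols[2], cols[2])
    return "\t".join(cols) + newline

def convert_gff3(in_path, out_path):
    """Write a converted copy of in_path to out_path (a new name, as advised)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(convert_gff3_line(line))
```

A call such as `convert_gff3("Augustus.gff3", "Augustus_match.gff3")` (the output name is made up) would then give a file to reference from pred_gff, and the new name forces the database rebuild mentioned in the message.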
URL:

From carsonhh at gmail.com Wed May 22 10:38:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 22 May 2013 12:38:50 -0400
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
Message-ID:

I've released 2.28 on the website. This is one of the bugs that was fixed. It happens under a very specific set of circumstances. You need to run maker with the -a command line flag to get it to recalculate upstream variables after upgrading. Alternatively you can also just give maker your old GFF3 file (make all other options blank except for the *_pass= options), and maker will just rebuild it.

Thanks,
Carson

From: Hung-Wei Hsu
Date: Monday, 20 May, 2013 9:19 PM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Why are some complete gene predictions not present in the final results?

Thanks a lot for your help. Your suggestions will be greatly helpful for our analysis.

I've tried to add EST sequences to improve gene predictions. The EST sequences I used were CDS sequences of the same organism. But I got an error as below.

substr outside of string at .../TranslationMachine.pm line 162
ERROR: Failed while polishig ESTs
ERROR: Chunk failed at level:2, tier_type:3

What's wrong with my analysis? Are the EST sequences I used wrong?

Thank you.
Hung-Wei

2013/5/21 Carson Holt
> On default settings MAKER will only put ab initio predictions that have some
> sort of evidence support (EST or protein) in the final gene set. The rejected
> predictions are still in the GFF3 for reference purposes as match/match_part
> features, but not as gene/mRNA/exon/CDS features. So a lack of evidence might
> be why it is not there. You can add all rejected models that don't overlap an
> accepted model by setting keep_preds=1 (this usually brings a lot more into
> the final gene set than you really want, though: lots of false positives). But
> for some organisms like fungi, which have high gene densities, this approach
> is relatively safe.
>
> Alternatively the gene is missing because it overlaps another gene model that
> was accepted. MAKER won't allow overlapping models on the same strand in
> eukaryotes. The only way to force that kind of overlap is to give MAKER the
> reference models in model_gff and not let it call its own models (then maker
> is really just aligning evidence and scoring the reference models).
>
> One final note. If there is no evidence supporting the model, and that is why
> it is rejected, you can also try adding more evidence to the maker run or you
> can consider the possibility that the gene model in the reference is not real
> to begin with (i.e. a false positive gene model called during the initial
> annotation process and not supported by protein or expression data from any
> source).
>
> Thanks,
> Carson
>
> From: Hung-Wei Hsu
> Date: Monday, 20 May, 2013 12:16 AM
> To:
> Subject: [maker-devel] Why are some complete gene predictions not present in
> the final results?
>
> Hi MAKER developers,
>
> I was exploiting MAKER to perform gene prediction and annotation on my
> contigs.
> I used Artemis to examine gff and found some CDS with complete structure were
> absent in the final results.
> They are really predicted and annotated on the ref genome.
> I'm wondering if they were discarded due to overlapping with another CDS.
> How can I preserve these CDS?
> Thanks a lot in advance.
>
> Hung-Wei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed May 22 10:39:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 22 May 2013 12:39:53 -0400
Subject: [maker-devel] Maker-derived CDS GFF3 phase column
In-Reply-To:
Message-ID:

Ok. It's available for download.
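Editorial sketch for the phase question in this thread: the GFF3 rule Rob quotes (phase counted from the end field on the reverse strand) can be written as a short Python function. It is a hedged illustration only, not MAKER's code. It assumes the two coordinate pairs from Rob's message are the complete CDS segments of one transcript and that translation starts at the first base of the first segment; the function name `cds_phases` is hypothetical.

```python
def cds_phases(segments, strand):
    """Compute the GFF3 phase of each CDS segment of a single transcript.

    segments: list of (start, end) pairs, 1-based inclusive coordinates.
    strand:   '+' or '-'.
    Returns a dict mapping (start, end) -> phase (0, 1 or 2).
    """
    # Translation order: ascending start on '+', descending on '-'.
    ordered = sorted(segments, key=lambda s: s[0], reverse=(strand == "-"))
    phases = {}
    consumed = 0  # CDS bases already seen, in translation order
    for start, end in ordered:
        # Bases to skip at the 5' end of this segment to reach a codon start.
        phases[(start, end)] = (3 - consumed % 3) % 3
        consumed += end - start + 1
    return phases

segments = [(5617, 5737), (5793, 5953)]
print(cds_phases(segments, "-"))  # {(5793, 5953): 0, (5617, 5737): 1}
print(cds_phases(segments, "+"))  # {(5617, 5737): 0, (5793, 5953): 2}
```

Under these assumptions the reverse-strand call gives phase 0 for 5793..5953, as Rob suggests, while treating the same segments as forward-strand reproduces the 0 and 2 he reported, which is consistent with Carson's diagnosis that the phase had been calculated from the wrong strand orientation.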
--Carson From: Rob Syme Date: Wednesday, 22 May, 2013 12:04 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Maker-derived CDS GFF3 phase column Fantastic. I thought that might have been the problem. Looking forward to 2.28. Thanks! Rob On Wed, May 22, 2013 at 10:32 AM, Carson Holt wrote: > It looks like the phase was calculated from the wrong strand orientation. I > believe I have corrected this now. I'm checking a few more things, but I'll > have 2.28 as the latest release likely tomorrow with the cumulative bug fixes > since the last release. > > Thanks, > Carson > > > > From: Rob Syme > Date: Tuesday, 21 May, 2013 1:57 AM > To: > Subject: [maker-devel] Maker-derived CDS GFF3 phase column > > Hi all > > By my reading of the GFF3 spec > (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker > that has odd data in the phase column. > > For example, see some example Maker output at > https://gist.github.com/robsyme/5617399 > > There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, > respectively. Both exons are in the reverse strand. > > From the spec, phase indicates "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next codon", > and for "reverse strand features, phase is counted from the end field". > > In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is > 5953. > The base at the end field is the first base of the translated CDS, so there > should be no bases removed "to reach the first base of the next codon". I > suggest that this phase should be 0, not 2. > > There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. > > The output gff3 is correct if "the number of bases that should be removed from > the beginning of this feature to reach the first base of the next codon" is > measured from the 'left-hand' end of this feature (the start field) rather > than the end field. 
> > Has anybody else ran into this problem or am I misreading the gff3 spec? > > Rob Syme > PhD Student > Curtin University > > > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.utah at gmail.com Thu May 23 10:40:23 2013 From: barry.utah at gmail.com (Barry Moore) Date: Thu, 23 May 2013 10:40:23 -0600 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>, <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> Message-ID: <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Hi Liciano, If I understand correctly you are including translations of SNAP and Augustus predictions as well as the predictions. If so, you don't want to do that. An overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of the prediction along with the prediction you will guarantee that every prediction will become an annotation and that means you lose the benefit of evidence supervised annotation that MAKER provides. Include the proteins from the D mel reference and if you want to cast a broader net include proteins from other dipterans or even Uniprot - just depend on how aggressive you want to try to be in capturing new annotations. B On May 23, 2013, at 8:41 AM, Luciano Abriata wrote: > Thanks for your reply! > > One more question, can you think of any tips to get the best possible predictions of protein sequences? > > I am asking because I am getting a few proteins that are too big to be real and don't exist if I blast them, plus a few others which don't start with Methionine... 
So far I am including transcripts and translations from flybase, and snap and augustus with their available trainings for flies. Do you see any possible source of error in that? > > Thanks again, > > Luciano > > De: Barry Moore [barry.moore at genetics.utah.edu] > Enviado el: viernes, 17 de mayo de 2013 09:02 p.m. > Para: Luciano Abriata > Cc: maker-devel at yandell-lab.org > Asunto: Re: [maker-devel] getting protein sequences from genomes > > > On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: > >> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies). >> >> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are: >> >> 1) How can I match proteins predicted for the same gene in two genomes? > > blastp tweaked with parameters to optimize near perfect match > >> >> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output) >> >> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 >> > > AED = Annotation edit distance describes how closely the prediction matches the evidence. This is a distance measure and thus 0 is a perfect match and 1 is no overlap. > > eAED = Exon adjusted annotation edit distance: This metric is the same as AED with a couple of exceptions. For a protein coding exon to be counted as overlapping protein evidence the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA Seq data is used as evidence and both ends of an exon are supported with splice site spanning reads, the middle of that exon is counted as supported as well even if coverage drops off in the interior of the exon.. For the most part AED and eAED will always be the same, but eAED tends to work better on many fringe cases. 
> > QI values are as follows: > > 1. 5' UTR length. > 2. Fraction of splice sites confirmed by an EST alignment. > 3. Fraction of exons that overlap an EST alignment. > 4. Fraction of exons that overlap an EST or protein alignment. > 5. Fraction of splice sites confirmed by an ab initio prediction. > 6. Fraction of exons that overlap an ab initio prediction. > 7. Number of exons in the transcript. > 8. 3' UTR length. > 9. Length of the encoded protein. > > >> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta, snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and proteins.fasta >> >> Which of these files contains the definitive set of predicted protein sequences? > > The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for. > >> >> Thanks in advance! >> >> Luciano > > Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dsth at ebi.ac.uk Thu May 23 10:48:05 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Thu, 23 May 2013 17:48:05 +0100 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Message-ID: would gene annotation by projection using synteny/WGA not be more appropriate? either way what's wrong with running one of the standard orthology predictions tools or just basic best reciprocal blast? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bob_Freeman at hms.harvard.edu Thu May 23 14:17:00 2013 From: Bob_Freeman at hms.harvard.edu (Freeman, Robert M.) 
Date: Thu, 23 May 2013 16:17:00 -0400 Subject: [maker-devel] Advice on params for ciliates Message-ID: <9D9882BB-3A26-45D6-A5B0-9B18F9BF5C31@hms.harvard.edu> Dear MAKER community, I am embarking on updating models for a ciliate (taxon Ciliophora) and was wondering if folks had recommendations for MAKER parameters. Thanks, Bob ----------------------------------------------------- Bob Freeman, Ph.D. Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Fri May 24 07:10:15 2013 From: daniel.standage at gmail.com (Daniel Standage) Date: Fri, 24 May 2013 09:10:15 -0400 Subject: [maker-devel] Using maker with precomputed transcript / protein alignments Message-ID: Greetings! I have some precomputed transcript and protein alignments that I would like to use with Maker. I have converted them into GFF3 format (see attached examples) and provided them to their corresponding entries (est_gff, altest_gff, protein_gff) in the maker_opts.ctl file. Unfortunately, Maker seems to be getting caught up on processing these GFF3 files. I've tried running Maker 2.10 as well as the development version (checked out a few months ago--svn server isn't responding so I can't give a precise revision number), and in both cases Maker hangs while trying to create the GFF3 database. These are the last lines I see in STDERR when --debug is set. STATUS: Setting up database for any GFF3 input... Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 587. I can't find any documentation specifying any explicit requirements for the alignment-containing GFF3 input files. 
Maker output uses the pretty canonical expressed_sequence_match, protein_match, and match_part features for encoding alignments, and I have used this convention with my input (see attached examples). I have also double-checked that my examples are valid GFF3, so my guess is that Maker has additional constraints/expectations for certain fields in the GFF3 files (score column? required attributes?). Is this correct, and if so would you be able to point me toward any related documentation I may have missed? Many thanks. -- Daniel S. Standage Ph.D. Candidate Bioinformatics and Computational Biology Program Department of Genetics, Development, and Cell Biology Iowa State University -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: prot-example.gff3 Type: application/octet-stream Size: 1079 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: trans-example.gff3 Type: application/octet-stream Size: 1305 bytes Desc: not available URL: From guoyunfei1989 at gmail.com Fri May 24 10:15:19 2013 From: guoyunfei1989 at gmail.com (Yunfei Guo) Date: Fri, 24 May 2013 09:15:19 -0700 Subject: [maker-devel] ./FINISHED/FINISHED.gff Message-ID: Hi Carson, When I tried to merge all gff files, I got this error: ERROR: The file './FINISHED/FINISHED.gff' does not exist and I found something like below in master_datastore_index.log. Is this caused by the duplicate scaffold? C12919781 GapCloser-Nigro-Min1k_datastore/28/79/C12919781/ FINISHED FINISHED scaffold138015 GapCloser-Nigro-Min1k_datastore/F7/0C/scaffold138015/ FINISHED FASTA lines for C12919781 and scaffold138015 >C12919781 36.0 >C12919781 36.0 CGTAAATGCATCCGCGTATAAATGCGACAGTAAGAGTTAATGATGCAGTATAAAAAGCAAGAAAAAGCGTTTATGGTGGGAGGCGGAGGCATCCAACTAACACCAGACTGTTAACCCGGAGACCAGTGGTCGACACCGTCG(skip...) 
>scaffold138015 35.1 ATATGCATATGCATATGCATATGCATATGCATATGCATATATAGACATGTAGATATAGACATCAATCATACACGTAACCCATCATTCGTATTATTAAATCACATTTTGTGACTTTGCCCATCTGTCTTTAAAGGGACAATGTGTATG(skip...) maker 2.27 Thanks, Yunfei -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 24 10:22:05 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 24 May 2013 12:22:05 -0400 Subject: [maker-devel] ./FINISHED/FINISHED.gff In-Reply-To: Message-ID: Sometimes the master_datastore_index.log gets munged by MPI (several processes print to it at the same time). You can rebuild it by running a single instance of maker with the -dsindex flag. It only takes about 60 seconds to rebuild. Example: cd maker -dsindex --Carson -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Fri May 24 14:06:51 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 24 May 2013 16:06:51 -0400 Subject: [maker-devel] Using maker with precomputed transcript / protein alignments In-Reply-To: Message-ID: I'm glad it's working. I think I'll add a check for '/' characters in the base name, as I think having it be a directory will get me in trouble somewhere with hidden bugs. Thanks, Carson From: Daniel Standage Date: Friday, 24 May, 2013 4:00 PM To: Carson Holt Subject: Re: [maker-devel] Using maker with precomputed transcript / protein alignments Oh wow, you are going to LOVE this. I kept on messing around with things to see if I could tease out any patterns, and eventually it hit me. In my working directory, I have an outputs directory, which is intended to contain output directories from various different maker runs. However, since my submission scripts launch Maker from the working directory, I use -base outputs/blahblahblah as the base parameter. So when it tries to create output files using the base name (the SQLite3 db just happens to be the first), it tries to create outputs/blahblahblah/outputs/blahblahblah.db, and of course that internal outputs directory doesn't exist. Every time I've had problems, I've been using a basename with a / character (relative directory path). Every time I haven't had problems, it was because the / wasn't there. Since the base parameter determines the name of the output directory, I assumed I could also use it to specify a nested output directory. So it looks like I just need to be more careful that the basenames I use don't contain / characters or any other special UNIX characters. Of course, this could be made explicit in the usage statement, or you could add something like this right after parsing the command line arguments. 
if($OPT{"out_name"} =~ m/\//) { printf(STDERR "base '%s' invalid: basenames containing relative directory paths cause errors; please provide a simple string instead\n", $OPT{"out_name"}); exit_maker(0); } Alternatively, you could handle things like I had originally expected: if I provide path/to/mybase as my base parameter, maker would create the path/to/mybase directory initially, but then in the creation of subsequent files it would simply use mybase. I don't imagine this would be too extensive of a change, but I understand Maker has a huge codebase. Anyway, just some suggestions, take them for what they're worth. Thanks for your help! -- Daniel S. Standage Ph.D. Candidate Bioinformatics and Computational Biology Program Department of Genetics, Development, and Cell Biology Iowa State University On Fri, May 24, 2013 at 3:29 PM, Carson Holt wrote: > NFS is weird. It's hard to say why it was freezing the first few times, and did > not appear to freeze on your very last try. I definitely want to know if it > starts to freeze again, or if stack traces show a consistent point where it > freezes. If it keeps happening, I can try making the database in the local > /tmp and then just copying it to the current working directory once it's > populated to get around any weird NFS issues. But before going through all > the effort to do that, I'd like to know that it's not some other weird bug > related to the Perl you're using or other modules that are installed. Top > candidates on the list would be modules such as forks, forks::shared, DBI, or > DBD::SQLite. Try reinstalling those. > > Thanks, > Carson > > > From: Daniel Standage > Date: Friday, 24 May, 2013 3:19 PM > > To: Carson Holt > Subject: Re: [maker-devel] Using maker with precomputed transcript / protein > alignments > > I admit I killed these last few runs too quickly, I guess I was getting > impatient, especially since waiting hours or days hasn't made a difference > before. Either way, that was sloppy on my part. 
> > However, I always specify the base parameter, whether or not I'm running > multiple maker jobs from the same directory. And if I ever restarted a job, I > have always removed the original output directory entirely before > relaunching--precisely to avoid the types of mistakes you mention arising from > residual files. > > -- > Daniel S. Standage > Ph.D. Candidate > Bioinformatics and Computational Biology Program > Department of Genetics, Development, and Cell Biology > Iowa State University > > > On Fri, May 24, 2013 at 3:10 PM, Carson Holt wrote: >> Correct, if you use the -base parameter you should get a different output >> directory. And if you have never used that base before, and it still >> freezes, then there is a problem. You do need to give it a little more time >> before killing it, as the stack trace in both cases showed that it was less >> than 25% finished reading the input GFF3 files and even less than that in the >> first case (so give it about 5x as long before giving up). >> >> It might just be that the NFS mount is slow. Or because of how weird the >> error is, other options include reinstalling perl and all modules. The >> weirdest bugs are often broken perl or inadvertently using modules from >> different perl versions via the PERL5LIB environment variable (this is very >> common and can cause very wacky behavior). Another option is verifying all >> software for the lustre NFS mount is up to date. Lastly there was an odd NFS >> bug that came up on the e-mail list last week that was fixed by a kernel >> upgrade. >> >> --Carson >> >> >> >> From: Daniel Standage >> Date: Friday, 24 May, 2013 3:01 PM >> >> To: Carson Holt >> Subject: Re: [maker-devel] Using maker with precomputed transcript / protein >> alignments >> >> The file locks are created only in the output directory, no? 
So there is a >> problem if I have multiple maker runs launched from the same directory, but >> writing to different output directories (as specified by different base >> parameters)? >> >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Bioinformatics and Computational Biology Program >> Department of Genetics, Development, and Cell Biology >> Iowa State University >> >> >> On Fri, May 24, 2013 at 2:57 PM, Carson Holt wrote: >>> To clarify, that means you need to use a different working directory. Can >>> be a subdirectory of your original. >>> >>> --Carson >>> >>> >>> From: Carson Holt >>> Date: Friday, 24 May, 2013 2:56 PM >>> To: Daniel Standage >>> >>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>> protein alignments >>> >>> Both stack traces show different locations in the code and file being read. >>> So it appears it was not frozen, just interrupted by control-C. >>> >>> If you restart make sure you do so in a completely new directory from the >>> original run. This is because I wonder if there is a failed job that still >>> has active processes and is holding onto file locks in that directory. >>> >>> --Carson >>> >>> >>> From: Daniel Standage >>> Date: Friday, 24 May, 2013 2:50 PM >>> To: Carson Holt >>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>> protein alignments >>> >>> Deleted output directory and re-ran. Stack trace looks pretty similar. >>> >>> >>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line >>> 607. >>> SIGINT received >>> at /N/u/dstandag/Mason/local/src/PerlLibs/lib64/perl5/forks/signals.pm >>> line 97, <$IN >>>> > line 243676. 
>>> forks::signals::__ANON__('INT') called at /usr/lib64/perl5/DBI.pm >>> line 1590 >>> eval {...} called at /usr/lib64/perl5/DBI.pm line 1590 >>> DBD::_::db::do('DBI::db=HASH(0x4987228)', 'INSERT INTO est_gff >>> (seqid, source, parent, start, end, line)...') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 493 >>> GFFDB::_add_to_db('GFFDB=HASH(0x49727a0)', >>> 'DBI::db=HASH(0x49871e0)', 'est_gff', 'HASH(0x49877e0)') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 432 >>> GFFDB::_add_type('GFFDB=HASH(0x49727a0)', >>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...', >>> 'est_gff') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 324 >>> GFFDB::add_est('GFFDB=HASH(0x49727a0)', >>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...') called >>> at /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 57 >>> GFFDB::new('GFFDB', 'HASH(0x489c488)') called at >>> /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 608 >>> >>> >>> -- >>> Daniel S. Standage >>> Ph.D. Candidate >>> Bioinformatics and Computational Biology Program >>> Department of Genetics, Development, and Cell Biology >>> Iowa State University >>> >>> >>> On Fri, May 24, 2013 at 2:45 PM, Carson Holt wrote: >>>> Could you run again, and so I can see if the stack trace is the same each >>>> time. >>>> >>>> --Carson >>>> >>>> >>>> From: Daniel Standage >>>> Date: Friday, 24 May, 2013 2:39 PM >>>> >>>> To: Carson Holt >>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>> protein alignments >>>> >>>> Restarted in the original NSF-mounted directory, never saw the .db file, >>>> and got this as the stack trace upon termination. >>>> >>>> STATUS: Setting up database for any GFF3 input... >>>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker >>>> line 607. 
>>>> SIGINT received >>>> at /N/u/dstandag/Mason/local/src/PerlLibs/lib64/perl5/forks/signals.pm >>>> line 97, <$IN> line 170294. >>>> forks::signals::__ANON__('INT') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 475 >>>> eval {...} called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 475 >>>> GFFDB::_parse_line('GFFDB=HASH(0x4e5c730)', 'SCALAR(0x4e714b8)', >>>> 'est_gff') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 431 >>>> GFFDB::_add_type('GFFDB=HASH(0x4e5c730)', >>>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...', >>>> 'est_gff') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 324 >>>> GFFDB::add_est('GFFDB=HASH(0x4e5c730)', >>>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...') called >>>> at /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 57 >>>> GFFDB::new('GFFDB', 'HASH(0x4d86488)') called at >>>> /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 608 >>>> >>>> >>>> -- >>>> Daniel S. Standage >>>> Ph.D. Candidate >>>> Bioinformatics and Computational Biology Program >>>> Department of Genetics, Development, and Cell Biology >>>> Iowa State University >>>> >>>> >>>> On Fri, May 24, 2013 at 2:25 PM, Carson Holt wrote: >>>>> Start a new job in a new directory from the original job (NFS mount). Use >>>>> the new maker executable I sent. If it still freezes, hit control-C to >>>>> get a stack trace. >>>>> >>>>> --Carson >>>>> >>>>> >>>>> From: Daniel Standage >>>>> Date: Friday, 24 May, 2013 2:21 PM >>>>> >>>>> To: Carson Holt >>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>> protein alignments >>>>> >>>>> The job from several hours ago is still running with no changes. 
>>>>> >>>>> I just relaunched the job with a locally mounted working directory: I >>>>> could see the .db file almost immediately, and it took less than 5 minutes >>>>> to successfully build the SQLite3 db and proceed to the next steps of the >>>>> pipeline. Any ideas? >>>>> >>>>> -- >>>>> Daniel S. Standage >>>>> Ph.D. Candidate >>>>> Bioinformatics and Computational Biology Program >>>>> Department of Genetics, Development, and Cell Biology >>>>> Iowa State University >>>>> >>>>> >>>>> On Fri, May 24, 2013 at 2:01 PM, Carson Holt wrote: >>>>>> The NFS mount appears to be configured correctly. >>>>>> >>>>>> Here is what the maker.output directory should look like while the >>>>>> database is being generated. >>>>>> >>>>>> drwxr-xr-x 10 cholt staff 340 24 May 13:51 . >>>>>> drwxr-xr-x 10 cholt staff 340 24 May 13:50 .. >>>>>> -rw------x 1 cholt staff 85 24 May 13:50 .NFSLock.gi_lock.NFSLock >>>>>> -rw------- 1 cholt staff 52 24 May 13:50 .NFSLock.pdom-annot-v0.41-1.db.NFSLock >>>>>> -rw-r--r-- 1 cholt staff 1413 24 May 13:50 maker_bopts.log >>>>>> -rw-r--r-- 1 cholt staff 1666 24 May 13:50 maker_exe.log >>>>>> -rw-r--r-- 1 cholt staff 4610 24 May 13:50 maker_opts.log >>>>>> drwxr-xr-x 4 cholt staff 136 24 May 13:50 mpi_blastdb >>>>>> -rw-r--r-- 1 cholt staff 29326336 24 May 13:51 pdom-annot-v0.41-1.db >>>>>> -rw-r--r-- 1 cholt staff 6704 24 May 13:51 pdom-annot-v0.41-1.db-journal >>>>>> >>>>>> >>>>>> Could you watch while maker is running to see if this file is created --> >>>>>> .NFSLock.pdom-annot-v0.41-1.db.NFSLock >>>>>> You must use ls with the -a flag to see it or it will be hidden. >>>>>> >>>>>> Just keep letting it run until that file shows up. Shortly after it shows >>>>>> up, this one should appear --> pdom-annot-v0.41-1.db-journal >>>>>> >>>>>> Also could you try running MAKER once with the working directory being >>>>>> locally mounted (/tmp for example). 
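The check described in this message can be sketched in Python: list the output directory including hidden entries (the equivalent of `ls -a`) and report whether the lock, database, and journal files exist and how large the database currently is. The function name `gff_db_status` is hypothetical, and the file-name pattern (`.NFSLock.<base>.db.NFSLock`, `<base>.db`, `<base>.db-journal`) is an assumption generalized from the single directory listing quoted above, not documented MAKER behavior.

```python
import os


def gff_db_status(output_dir, base):
    """Report which GFF3 database files exist in a maker.output directory.

    `base` is the run's base name, e.g. "pdom-annot-v0.41-1" in the listing
    above. The file-name pattern is assumed from that one example.
    """
    # os.listdir returns dotfiles too, so hidden lock files are visible here.
    entries = set(os.listdir(output_dir))
    db_name = base + ".db"
    status = {
        "lock": (".NFSLock." + db_name + ".NFSLock") in entries,
        "db": db_name in entries,
        "journal": (db_name + "-journal") in entries,
        "db_bytes": 0,
    }
    if status["db"]:
        status["db_bytes"] = os.path.getsize(os.path.join(output_dir, db_name))
    return status
```

Calling it twice, a minute or so apart, and comparing `db_bytes` would show whether the database is still growing or the run has genuinely stalled.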
>>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> >>>>>> From: Daniel Standage >>>>>> Date: Friday, 24 May, 2013 1:36 PM >>>>>> >>>>>> To: Carson Holt >>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>> protein alignments >>>>>> >>>>>> Here is the output. >>>>>> >>>>>> [dstandag at mason annot-v0.41] ls -al >>>>>> outputs/pdom-annot-v0.41-1.maker.output/ >>>>>> total 32 >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 13:34 . >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 12:39 .. >>>>>> -rw-r--r-- 1 dstandag biol 1413 May 24 12:39 maker_bopts.log >>>>>> -rw-r--r-- 1 dstandag biol 1355 May 24 12:39 maker_exe.log >>>>>> -rw-r--r-- 1 dstandag biol 4883 May 24 12:39 maker_opts.log >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 12:39 mpi_blastdb >>>>>> -rw------x 1 dstandag biol 70 May 24 13:34 .NFSLock.gi_lock.NFSLock >>>>>> [dstandag at mason annot-v0.41] df outputs/pdom-annot-v0.41-1.maker.output/ >>>>>> Filesystem 1K-blocks Used Available Use% Mounted on >>>>>> dc-mds01.uits.indiana.edu:/dc >>>>>> 1144318908992 928977247792 203869022296 83% /N/dc >>>>>> [dstandag at mason annot-v0.41] mount >>>>>> login_x86_64 on / type tmpfs (rw) >>>>>> proc on /proc type proc (rw) >>>>>> sysfs on /sys type sysfs (rw) >>>>>> devpts on /dev/pts type devpts (rw,gid=5,mode=620) >>>>>> tmpfs on /dev/shm type tmpfs (rw) >>>>>> tmpfs on /var/tmp type tmpfs (rw,size=10m) >>>>>> /dev/sdb2 on /tmp type ext4 (rw,relatime,barrier=1,data=ordered) >>>>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) >>>>>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) >>>>>> AFS on /afs type afs (rw) >>>>>> bl-nas1:/vol/hd00 on /N/hd00 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> bl-nas1:/vol/hd01 on /N/hd01 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. 
>>>>>> 165.226.129) >>>>>> bl-nas2:/vol/hd02 on /N/hd02 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas2:/vol/hd03 on /N/hd03 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas1:/vol/hdln on /N/u type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> bl-nas2:/vol/soft on /N/soft type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas1:/vol/logs on /N/logs type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> none on /dev/cpuset type cpuset (rw) >>>>>> dc-mds01.uits.indiana.edu:/dc on /N/dc type lustre (rw,localflock) >>>>>> 149.165.235.173:/mds-wan/client on /N/dcwan type lustre (rw,localflock) >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Daniel S. Standage >>>>>> Ph.D. Candidate >>>>>> Bioinformatics and Computational Biology Program >>>>>> Department of Genetics, Development, and Cell Biology >>>>>> Iowa State University >>>>>> >>>>>> >>>>>> On Fri, May 24, 2013 at 1:29 PM, Carson Holt wrote: >>>>>>> They load fine for me. It is an SQLite database. I know that SQLlite >>>>>>> can freeze on NFS if it's not configured properly. >>>>>>> >>>>>>> Could you send me the output from these 3 commands. >>>>>>> >>>>>>> ls -al >>>>>>> df >>>>>>> mount >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 1:13 PM >>>>>>> >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I deleted the entire output directory before relaunching. No .db files >>>>>>> are even created, only the mpi_blastdb directory with the genomic >>>>>>> sequence data and corresponding index, before it hangs. 
>>>>>>> >>>>>>> The GFF3 files are attached. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 12:57 PM, Carson Holt >>>>>>> wrote: >>>>>>> Did you delete any *.db files in the maker.output directory first? If >>>>>>> not, do that, and check on the rerun whether that file is growing in size. It >>>>>>> is a database to hold the GFF3 file entries. Its final size should be >>>>>>> ~ 2x the size of the combined GFF3 files. If it is growing, then it is >>>>>>> not really frozen (you just need to give it more time). If it is not >>>>>>> growing, send me your GFF3 files and I can try to duplicate the error. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 12:50 PM >>>>>>> >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I installed BioPerl-1.6.901, rebuilt Maker, and re-launched the job. >>>>>>> After running for 10-15 minutes, it seems to be hanging in the same >>>>>>> place as before. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 11:38 AM, Carson Holt >>>>>>> wrote: >>>>>>> That is the CPAN version and the last stable release on bioperl.org. >>>>>>> Older versions as well as the bio-perl live >>>>>>> version will cause MAKER to fail. They both have issues with the Fasta >>>>>>> indexing module that maker uses. 
>>>>>>> >>>>>>> http://search.cpan.org/CPAN/authors/id/C/CJ/CJFIELDS/BioPerl-1.6.901.tar.gz >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 11:34 AM >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I'm not sure if a rebuild of Maker was necessary, but I tried running it >>>>>>> just to be safe. It's complaining about Bio::Root::Version dependency >>>>>>> not being met. Looking at the Build.PL file, it requires >>>>>>> Bio::Root::Version version 1.006901. Is there really such a version, or >>>>>>> should this be changed to 1.006 or 1.006001? >>>>>>> >>>>>>> For now I'll change it to 1.006001 (the installed version) and proceed >>>>>>> with another test. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 9:45 AM, Carson Holt wrote: >>>>>>> Could you run this command in the maker devel base directory. >>>>>>> >>>>>>> svn switch --relocate svn://* >>>>>>> ************ >>>>>>> svn://* *************** >>>>>>> >>>>>>> Then do 'svn update', and then tell me what happens. Make sure to >>>>>>> delete any *.db files in the *.maker.output/ directory before >>>>>>> retrying. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 9:10 AM >>>>>>> To: Maker Mailing List >>>>>>> Subject: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> Greetings! >>>>>>> >>>>>>> I have some precomputed transcript and protein alignments that I would >>>>>>> like to use with Maker. 
I have converted them into GFF3 format (see >>>>>>> attached examples) and provided them to their corresponding entries >>>>>>> (est_gff, altest_gff, protein_gff) in the maker_opts.ctl file. >>>>>>> >>>>>>> Unfortunately, Maker seems to be getting caught up on processing these >>>>>>> GFF3 files. I've tried running Maker 2.10 as well as the development >>>>>>> version (checked out a few months ago--svn server isn't responding so I >>>>>>> can't give a precise revision number), and in both cases Maker hangs >>>>>>> while trying to create the GFF3 database. These are the last lines I see >>>>>>> in STDERR when --debug is set. >>>>>>> >>>>>>> STATUS: Setting up database for any GFF3 input... >>>>>>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker >>>>>>> line 587. >>>>>>> >>>>>>> I can't find any documentation specifying any explicit requirements for >>>>>>> the alignment-containing GFF3 input files. Maker output uses the pretty >>>>>>> canonical expressed_sequence_match, protein_match, and match_part >>>>>>> features for encoding alignments, and I have used this convention with >>>>>>> my input (see attached examples). I have also double-checked that my >>>>>>> examples are valid GFF3, so my guess is that Maker has additional >>>>>>> constraints/expectations for certain fields in the GFF3 files (score >>>>>>> column? required attributes?). Is this correct, and if so would you be >>>>>>> able to point me toward any related documentation I may have missed? >>>>>>> >>>>>>> Many thanks. >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. 
Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listin >>>>>>> fo/maker-devel_yandell-lab.org >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.syme at gmail.com Sun May 26 20:26:58 2013 From: rob.syme at gmail.com (Rob Syme) Date: Mon, 27 May 2013 10:26:58 +0800 Subject: [maker-devel] Can map2assembly be run outside the maker pipeline? Message-ID: Hi all I'm looking to move existing transcripts from one genome assembly to another, keeping the transcript names if possible. Running map2assembly seems to require MPI (stderr example below). Is is possible to run map2assembly outside of the Maker pipeline and without MPI? Stderr head: INFO: All repeat masking options will be skipped. A data structure will be created for you at: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_datastore To access files for individual sequences use the datastore index: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_master_datastore_index.log Can't call method "get_Seq_by_id" on an undefined value at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 226, line 1. FATAL ERROR ERROR: Failed in tier preparation WARNING: You must always set a rank before running MpiTiers FATAL: argument `q_def` does not exist in MpiTier object at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 86, line 1. 
Process::MpiChunk::_initialize_vars('Process::MpiChunk=HASH(0x332dac8)', 'HASH(0x332db88)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 47 Process::MpiChunk::new('Process::MpiChunk', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 413 Process::MpiChunk::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f49498)', 'HASH(0x332d728)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 4165 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x2f35e88)', 'load', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 316 Process::MpiChunk::_loader('Process::MpiChunk=HASH(0x2f35e88)', 'HASH(0x2ef85a8)', 0, 0, 'Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 364 Process::MpiTiers::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f411a0)', 'HASH(0x2f491c8)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 375 Process::MpiTiers::_load_chunks('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 185 Process::MpiTiers::next_chunk('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 816 Process::MpiTiers::_handler('Process::MpiTiers=HASH(0x79f3d0)', 'Error::Simple=HASH(0x2f35c18)', 'Failed in tier preparation') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 78 Process::MpiTiers::__ANON__('Error::Simple=HASH(0x2f35c18)', 'SCALAR(0x1179c30)') called at /path/to/maker/bin/../lib/Error.pm line 339 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 329 Error::subs::run_clauses('HASH(0x2f36230)', 'Can\'t call method "get_Seq_by_id" on an undefined value at /...', undef, 'ARRAY(0x117a1e8)') called at 
/path/to/maker/bin/../lib/Error.pm line 426 Error::subs::try('CODE(0x2f28898)', 'HASH(0x2f36230)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 79 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 56 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x79f508)', 0, 'Process::MpiChunk') called at ./map2assembly line 205 -------------- next part -------------- An HTML attachment was scrubbed... URL: From uma at ebi.ac.uk Tue May 28 06:00:54 2013 From: uma at ebi.ac.uk (Uma Maheswari) Date: Tue, 28 May 2013 13:00:54 +0100 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195ED54.4090501@ebi.ac.uk> References: <5195ED54.4090501@ebi.ac.uk> Message-ID: <51A49C76.3060801@ebi.ac.uk> Thanks Carson, 2.28 with -a command line flag fixed this problem. Uma On 17/05/13 09:41, Uma Maheswari wrote: > Hi Carson, > > I checked with Michael, this is different from what he saw, he had > entire segements of gff files duplicated, In this case, just Parent id > is. > I am preparing the files you asked for, will send them soon > > thanks > Uma > > > On 16/05/13 17:50, Carson Holt wrote: >> Yes. Perhaps this is the same issue Michael saw, although the one >> difference I see from his post is the Parent= attribute. >> >> --> >> Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1 >> >> I have seen duplicate exons from GFF3 pass-through in the past, but >> if that's not being used I'd be very appreciative of any test dataset >> you could give me. >> >> Thanks, >> Carson >> >> >> >> >> From: Daniel Hughes > >> Date: Thursday, 16 May, 2013 12:38 PM >> To: Carson Holt > >> Cc: Uma Maheswari >, >> "maker-devel at yandell-lab.org " >> > >> Subject: Re: [maker-devel] duplicate exons? >> >> hiya, are you using the same instance as michael at ebi as this >> sounds like the same problem he had last week and he wasn't running >> pass through. 
i've run 2.27 here 30+ times here and not seen this? is >> something very strange corrupted? >> >> dan. >> >> Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) >> ------------------------------------------------------------------------------------- >> dsth at cantab.net >> dsth at cpan.org >> >> >> 2013/5/16 Carson Holt > >> >> I think this also may be a result of using GFF3 pass-through. So >> if that >> is the case, could you send me any GFF3 files you gave maker in >> addition >> to the other files I asked for. >> >> Thanks, >> Carson >> >> >> >> On 13-05-16 12:08 PM, "Uma Maheswari" > > wrote: >> >> >Hi Carson, >> > >> >When I was trying to load the Maker-2.27 results into ensembl, I >> found >> >that few hundreds of genes with 'duplicate exons' . When I looked >> in the >> >gff file, I found cases like this, where the exons are not actually >> >duplicated but have two Parents with same mRNA ID. This can be a >> >potential alternate transcript, attached to the same transcript by >> >mistake? >> > >> >Many thanks >> >Uma >> > >> > >> > >> > >> > >> >3 maker gene 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >> >-gene-6.179 >> >3 maker mRNA 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >> >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >> >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >> >3 maker exon 524271 524480 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker exon 524538 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >> >9-mRNA-1 >> >3 maker exon 524271 525467 . - . 
>> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 524903 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 525182 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524271 524480 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524904 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> > >> > >> >_______________________________________________ >> >maker-devel mailing list >> >maker-devel at box290.bluehost.com >> >> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jjin01 at mail.rockefeller.edu Tue May 28 19:37:58 2013 From: jjin01 at mail.rockefeller.edu (Jingjing Jin) Date: Wed, 29 May 2013 01:37:58 +0000 Subject: [maker-devel] maker running error Message-ID: Dear all, When I try to run maker on my datasets, there is an error like this: #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. 
#--------- command -------------# Widget::blastn: /usr/local/bin/blastall -p blastn -d /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blastn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn #-------------------------------# [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options ERROR: BLASTN failed FATAL ERROR ERROR: Failed while doing blastn of ESTs!! ERROR: Chunk failed at level 8 !! FAILED CONTIG:C4345703 Could anyone give me some suggestions? Thanks! Jingjing -------------- next part -------------- An HTML attachment was scrubbed... URL: From myandell at genetics.utah.edu Tue May 28 19:58:51 2013 From: myandell at genetics.utah.edu (Mark Yandell) Date: Wed, 29 May 2013 01:58:51 +0000 Subject: [maker-devel] maker running error In-Reply-To: References: Message-ID: <558EECF8-8B9C-4C5D-9968-439D421C315F@genetics.utah.edu> Hi Jingjing, looks like your fasta files have problems. Have you checked to see if they are formatted correctly? 
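A quick way to act on that question is to scan the genome FASTA for common formatting problems before rerunning. The following is only a minimal sketch, not part of MAKER or BLAST: the function name `check_fasta` and the set of checks are assumptions chosen for illustration. It also flags records made up entirely of N, since an all-N query is one commonly reported cause of the Karlin-Altschul warning above (an assumption about this case, not a confirmed diagnosis).

```python
import re

# IUPAC nucleotide codes plus gap and stop symbols; anything else is flagged.
VALID_SEQ = re.compile(r"^[ACGTUNRYSWKMBDHVacgtunryswkmbdhv*-]+$")


def check_fasta(lines):
    """Return (line_number, message) tuples for problems in nucleotide FASTA lines."""
    problems = []
    seen_ids = set()
    record = None  # [identifier, header line number, total length, non-N length]

    def close(rec):
        # Report records that ended without any usable sequence.
        if rec is None:
            return
        ident, where, total, informative = rec
        if total == 0:
            problems.append((where, f"record {ident!r} has no sequence"))
        elif informative == 0:
            problems.append((where, f"record {ident!r} contains only N or gap characters"))

    for number, raw in enumerate(lines, start=1):
        if raw.endswith("\r\n") or raw.endswith("\r"):
            problems.append((number, "non-Unix line ending"))
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            close(record)
            fields = line[1:].split()
            ident = fields[0] if fields else ""
            if not ident:
                problems.append((number, "header has no identifier"))
            elif ident in seen_ids:
                problems.append((number, f"duplicate identifier {ident!r}"))
            seen_ids.add(ident)
            record = [ident, number, 0, 0]
        elif not line.strip():
            problems.append((number, "blank line"))
        else:
            if record is None:
                problems.append((number, "sequence data before the first header"))
                record = ["", number, 0, 0]
            if not VALID_SEQ.match(line):
                problems.append((number, "unexpected character in sequence"))
            record[2] += len(line)
            record[3] += sum(1 for ch in line if ch not in "Nn-*")
    close(record)
    return problems
```

To run it on a file, open the file with `newline=""` so that Windows line endings are preserved, pass the handle to `check_fasta`, and print whatever it returns. An empty list means none of these particular checks fired; it does not prove the file is valid for every tool.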
cheers, --mark On May 28, 2013, at 7:37 PM, Jingjing Jin wrote: Dear all, When I try to run maker on my datasets, there is an error like this: #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. 
#--------- command -------------#
Widget::blastn:
/usr/local/bin/blastall -p blastn -d /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blastn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn
#-------------------------------#
[blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options
[blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options
ERROR: BLASTN failed

FATAL ERROR
ERROR: Failed while doing blastn of ESTs!!

ERROR: Chunk failed at level 8
!!
FAILED CONTIG:C4345703

Could anyone give me some suggestions?

Thanks!

Jingjing

From carsonhh at gmail.com Wed May 29 06:45:30 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 29 May 2013 08:45:30 -0400
Subject: [maker-devel] Can map2assembly be run outside the maker pipeline?
In-Reply-To:
Message-ID:

It's not an MPI requirement, just an execution error. I've attached a fixed version of that script. Really it is just a wrapper that runs maker with a few parameter changes. You can do the exact same thing by removing all repeat mask options, setting est2genome=1, and then adding est_forward=1 to the maker_opts.ctl file.
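The manual route described above can be sketched as a small script that rewrites a maker_opts.ctl file. This is an illustration under stated assumptions, not the attached map2assembly script: `est2genome` and `est_forward` are the names given in the message, while the repeat-masking keys in `MAP2ASSEMBLY_OVERRIDES` (`model_org`, `rmlib`, `repeat_protein`, `rm_gff`) are assumed from a typical control file and should be checked against your own. The helper name `apply_overrides` is hypothetical, and the genome and transcript inputs still have to be set separately.

```python
import re

# est2genome and est_forward come from the message above; the repeat-masking
# keys are an assumption about a typical maker_opts.ctl, so verify them locally.
MAP2ASSEMBLY_OVERRIDES = {
    "model_org": "",
    "rmlib": "",
    "repeat_protein": "",
    "rm_gff": "",
    "est2genome": "1",
    "est_forward": "1",
}

# Matches "key=value #optional comment"; section lines starting with '#' do not match.
CTL_LINE = re.compile(r"^(\w+)=([^#]*)(#.*)?$")


def apply_overrides(ctl_text, overrides):
    """Return ctl_text with each override applied; keys not present are appended."""
    remaining = dict(overrides)
    out = []
    for line in ctl_text.splitlines():
        match = CTL_LINE.match(line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            # Keep the original trailing comment so the file stays self-documenting.
            out.append(f"{key}={remaining.pop(key)} {comment}".rstrip())
        else:
            out.append(line)
    # Options absent from the file (est_forward is not listed by default) go at the end.
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

Writing the result to a copy of the control file, rather than over the original, keeps the normal annotation settings available for later runs.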
Thanks,
Carson

From: Rob Syme
Date: Sunday, 26 May, 2013 10:26 PM
To:
Subject: [maker-devel] Can map2assembly be run outside the maker pipeline?

Hi all

I'm looking to move existing transcripts from one genome assembly to another, keeping the transcript names if possible. Running map2assembly seems to require MPI (stderr example below). Is it possible to run map2assembly outside of the Maker pipeline and without MPI?

Stderr head:

INFO: All repeat masking options will be skipped.
A data structure will be created for you at:
/path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_datastore
To access files for individual sequences use the datastore index:
/path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_master_datastore_index.log
Can't call method "get_Seq_by_id" on an undefined value at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 226, line 1.
FATAL ERROR
ERROR: Failed in tier preparation
WARNING: You must always set a rank before running MpiTiers
FATAL: argument `q_def` does not exist in MpiTier object at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 86, line 1.
Process::MpiChunk::_initialize_vars('Process::MpiChunk=HASH(0x332dac8)', 'HASH(0x332db88)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 47 Process::MpiChunk::new('Process::MpiChunk', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 413 Process::MpiChunk::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f49498)', 'HASH(0x332d728)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 4165 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x2f35e88)', 'load', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 316 Process::MpiChunk::_loader('Process::MpiChunk=HASH(0x2f35e88)', 'HASH(0x2ef85a8)', 0, 0, 'Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 364 Process::MpiTiers::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f411a0)', 'HASH(0x2f491c8)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 375 Process::MpiTiers::_load_chunks('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 185 Process::MpiTiers::next_chunk('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 816 Process::MpiTiers::_handler('Process::MpiTiers=HASH(0x79f3d0)', 'Error::Simple=HASH(0x2f35c18)', 'Failed in tier preparation') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 78 Process::MpiTiers::__ANON__('Error::Simple=HASH(0x2f35c18)', 'SCALAR(0x1179c30)') called at /path/to/maker/bin/../lib/Error.pm line 339 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 329 Error::subs::run_clauses('HASH(0x2f36230)', 'Can\'t call method "get_Seq_by_id" on an undefined value at /...', undef, 'ARRAY(0x117a1e8)') called at 
/path/to/maker/bin/../lib/Error.pm line 426
Error::subs::try('CODE(0x2f28898)', 'HASH(0x2f36230)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 79
Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 56
Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x79f508)', 0, 'Process::MpiChunk') called at ./map2assembly line 205

-------------- next part --------------
A non-text attachment was scrubbed...
Name: map2assembly
Type: application/octet-stream
Size: 6412 bytes
Desc: not available
URL:

From carsonhh at gmail.com Wed May 29 06:49:39 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 29 May 2013 08:49:39 -0400
Subject: [maker-devel] maker running error
In-Reply-To: <558EECF8-8B9C-4C5D-9968-439D421C315F@genetics.utah.edu>
Message-ID:

Yes, most likely an input FASTA error. If that is not the case, some versions of BLAST have version-specific failures that are fixed by upgrading BLAST. For example, I see you are using blastall, which is from the older NCBI BLAST as opposed to the newer BLAST+.

--Carson

From: Mark Yandell
Date: Tuesday, 28 May, 2013 9:58 PM
To: Jingjing Jin
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] maker running error

Hi Jingjing, looks like your fasta files have problems. Have you checked to see if they are formatted correctly?
cheers, --mark On May 28, 2013, at 7:37 PM, Jingjing Jin wrote: > Dear all, > > When I try to run maker on my datasets, there is an error like this: > > #--------- command -------------# > Widget::blastx: > /usr/local/bin/blastall -p blastx -d > /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y > 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta. > repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner > #-------------------------------# > [blastall] FATAL ERROR: search cannot proceed due to errors in all > contexts/frames of query sequences > deleted:0 hits > running blast search. > #--------- command -------------# > Widget::blastx: > /usr/local/bin/blastall -p blastx -d > /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y > 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta. > repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner > #-------------------------------# > [blastall] FATAL ERROR: search cannot proceed due to errors in all > contexts/frames of query sequences > deleted:0 hits > running blast search. 
> > #--------- command -------------# > Widget::blastn: > /usr/local/bin/blastall -p blastn -d > /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r > 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blas > tn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn > #-------------------------------# > [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul > parameters due to an invalid query sequence or its translation. Please verify > the query sequence(s) and/or filtering options > [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul > parameters due to an invalid query sequence or its translation. Please verify > the query sequence(s) and/or filtering options > ERROR: BLASTN failed > > FATAL ERROR > ERROR: Failed while doing blastn of ESTs!! > > ERROR: Chunk failed at level 8 > !! > FAILED CONTIG:C4345703 > > > Could anyone give me some suggestions? > > Thanks! > > Jingjing > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 29 06:54:30 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 29 May 2013 08:54:30 -0400 Subject: [maker-devel] Maker consensus In-Reply-To: <1539398593.274033.1369743400254.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: Yes. That's ok, but you would get better performance by installing MPI and using that. 
Alternatively, just start maker several times in the same directory without splitting the input FASTA. You can usually start about 10-15 concurrent maker processes safely, but you would still get better performance with MPI.

--Carson

From: Diana LeDuc
Reply-To: Diana LeDuc
Date: Tuesday, 28 May, 2013 8:16 AM
To: , Carson Holt
Cc: Gabriel Renaud , Janet Kelso
Subject: Re: [maker-devel] Maker consensus

Hi Carson,

I have now restarted maker with the augustus path and species specified. I am trying to run it separately on each scaffold just to parallelise the process and speed it up. Some of the scaffolds that ran fine in the complete dataset now fail. Do you have any idea why this happens? Is it ok to have a separate directory for each of the scaffolds and run maker in each of them?

Thank you for the help.

Best regards,

Diana

On May 10, 2013 at 8:29 PM Carson Holt wrote:
> You can use any species augustus already has. If it doesn't, then you train
> it yourself. The species folder is pointed to by the AUGUSTUS_CONFIG_PATH
> environment variable, and is usually .../augustus/config/species
>
> Thanks,
> Carson
>
> From: Diana LeDuc < diana_leduc at eva.mpg.de>
> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de>
> Date: Friday, 10 May, 2013 2:16 PM
> To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com>
> Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>,
> Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < kelso at eva.mpg.de>
> Subject: Re: [maker-devel] Maker consensus
>
> Hi Carson,
>
> In maker_exe.ctl I would have to provide the path to augustus. Augustus has a
> training set for chicken that I would use. Is it possible to specify the
> species I want to use, or is training Augustus myself the only way?
>
> Thank you!
>
> Best,
>
> Diana
>
> On May 10, 2013 at 7:51 PM Carson Holt < carsonhh at gmail.com> wrote:
>
>> Ok.
You just ran the evidence and didn't give a gene predictor. You need
>> to provide an HMM file for SNAP, a species for Augustus, or for rough
>> annotations you can set protein2genome=1 and est2genome=1. This will try to
>> generate models directly from the alignments.
>>
>> If you provide a gene predictor, then MAKER can talk to it about the
>> evidence alignments so it can make a best gene call for the region. Then
>> there will be gene/mRNA/exon models in the GFF3 file and entries in the
>> proteins.fasta and transcripts.fasta. If you need to train a predictor, you
>> can train SNAP using the maker2zff script and the SNAP documentation or maker
>> GMOD tutorial. If you want to train Augustus, Jason Stajich wrote an
>> excellent explanation as well as tools in a previous list message.
>>
>> list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html
>>
>> Script is in this github repo -
>>
>> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl
>>
>> Thanks,
>>
>> Carson
>>
>> From: Diana LeDuc < diana_leduc at eva.mpg.de>
>> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de>
>> Date: Friday, 10 May, 2013 1:41 PM
>> To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com>
>> Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>,
>> Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < kelso at eva.mpg.de>
>> Subject: Re: [maker-devel] Maker consensus
>>
>> Hi Carson,
>>
>> Thank you for the quick answer.
>>
>> I ran gff3_merge to merge all the gff files and this resulted in a gff file,
>> which has fields of this type:
>>
>> scaffold32239 blastx protein_match 22905 34500 174 + . ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;
>> scaffold32239 blastx match_part 22905 23045 174 + .
>> ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG0000 >> 0000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT0000 >> 0000219|DSCAML1-2039 172 218;Gap=M47; >> >> In comparison to the dpp_contig test file, I am missing est2genome evidence, >> most probably because my est data set is pretty poor. I have blastx and >> protein2genome evidence though. >> >> >> >> My goal is to extract the genes that could be annotated on the scaffolds. In >> the gff files the hits overlap most of the times, I can visualize this >> properly in apollo: for example one scaffold hits DSCAML gene in both >> zebrafinch and chicken, but extracting the coordinates between which this >> scaffold fits this annotated gene is difficult from the gff. Manually >> curating the genes is also not an option, since I am trying to do this for a >> 1.7Gb genome. >> >> >> >> I hope this explains better what we are after. >> >> >> >> Thank you once again. >> >> >> >> Best regards, >> >> >> >> Diana >> On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com> wrote: >> >> >>> >>> I'm sorry I don?t' understand question 1. You are you missing resulting >>> fasta files, correct? Did your resulting GFF3 file have any features of >>> type "gene"? Did you run fasta_merge after running gff3_merge? >>> >>> >>> >>> Could you give me more details on what you are trying to do, so I can take >>> a stab at question 2 as well. 
>>> >>> >>> >>> Thanks, >>> >>> Carson >>> >>> >>> >>> >>> >>> >>> >>> From: Diana LeDuc < diana_leduc at eva.mpg.de> >>> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> >>> Date: Friday, 10 May, 2013 10:44 AM >>> To: < maker-devel at yandell-lab.org> >>> Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < >>> kelso at eva.mpg.de>, Torsten Schoeneberg < >>> torsten.schoeneberg at medizin.uni-leipzig.de> >>> Subject: [maker-devel] Maker consensus >>> >>> >>> >>> >>> >>> >>> >>> Dear maker developers, >>> >>> >>> I am a phD student working on de novo assembly and annotation of a bird >>> genome. I used Maker as annotation pipeline, which ran very well, and I >>> obtained different annotations with evidence from Augustus gene predictor, >>> small EST dataset from my organism and protein sequences from chicken, >>> turkey and zebrafinch. I could combine the different gff files from >>> different scaffolds into one gff file with annotations for the entire >>> genome. >>> >>> >>> I now have two questions: >>> >>> >>> 1. What could be the reason that I haven't gotten the protein.fasta and >>> trancript.fasta files >>> >>> >>> 2. How can I obtain a consensus gene list of different evidences from maker? >>> What I would actually need is the scaffold, coordinates and annotation (gene >>> name) according to the 3 other bird species. >>> Thank you in advance. >>> >>> >>> >>> Best regards, >>> >>> >>> >>> Diana Le Duc >>> >>> >>> >>> -- >>> >>> Max Planck Institute for Evolutionary Anthropology >>> Department of Evolutionary Genetics >>> Deutscher Platz 6 >>> D-04103 Leipzig >>> >>> Phone +49 (0)341-3550-554 >>> www.eva.mpg.de >>> >>> >>> _______________________________________________ maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From diana_leduc at eva.mpg.de Tue May 28 06:16:40 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Tue, 28 May 2013 14:16:40 +0200 (CEST) Subject: [maker-devel] Maker consensus In-Reply-To: References: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: <1539398593.274033.1369743400254.JavaMail.open-xchange@oxchange.eva.mpg.de> Hi Carson, I have now restarted maker, specifying the augustus path and species. I am trying to run it separately on each scaffold to parallelise the process and speed it up. However, some of the scaffolds that ran OK in the complete dataset now fail. Do you have any idea why this happens? Is it OK to have a separate directory for each scaffold and run maker in each of them? Thank you for the help. Best regards, Diana On May 10, 2013 at 8:29 PM Carson Holt wrote: > You can use any species augustus already has. If it doesn't have one, then you train > it yourself. The species folder is pointed to by the AUGUSTUS_CONFIG_PATH > environment variable, and is usually .../augustus/config/species > > Thanks, > Carson > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 2:16 PM > To: < maker-devel at yandell-lab.org >, > Carson Holt < carsonhh at gmail.com > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > >, Gabriel Renaud < > gabriel_renaud at eva.mpg.de >, Janet Kelso < > kelso at eva.mpg.de > > Subject: Re: [maker-devel] Maker consensus > > Hi Carson, > > In maker_exe.ctl I would have to provide the path to augustus. Augustus has a > training set for chicken that I would use. Is it possible to specify the > species I want to use, or is training Augustus myself the only way? > > Thank you! > > Best, > > Diana > On May 10, 2013 at 7:51 PM Carson Holt < carsonhh at gmail.com > > wrote: > > > > Ok. You just ran the evidence and didn't give a gene predictor.
You > > > need to provide an HMM file for SNAP, a species for augustus, or for > > > rough annotations you can set protein2genome=1 and est2genome=1. This > > > will try to generate models directly from the alignments. > > > > If you provide a gene predictor, then MAKER can talk to it about the > > evidence alignments so it can make a best gene call for the region. Then > > there will be gene/mRNA/exon models in the GFF3 file and entries in the > > proteins.fasta and transcripts.fasta. If you need to train a predictor, you > > can train SNAP using the maker2zff script and the SNAP documentation or > > maker GMOD tutorial. If you want to train augustus, Jason Stajich wrote an > > excellent explanation as well as tools in a previous list message. > > > > list msg - > > http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html > > > > Script is in this github repo - > > > > https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl > > > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Date: Friday, 10 May, 2013 1:41 PM > > To: < maker-devel at yandell-lab.org >, > > Carson Holt < carsonhh at gmail.com > > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > > >, Gabriel Renaud < > > gabriel_renaud at eva.mpg.de >, Janet Kelso > > < kelso at eva.mpg.de > > > Subject: Re: [maker-devel] Maker consensus > > > > Hi Carson, > > > > Thank you for the quick answer. > > I ran gff3_merge to merge all the gff files and this resulted in a gff > > file, which has these types of fields: > > scaffold32239 blastx protein_match 22905 34500 174 + . > > ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; > > scaffold32239 blastx match_part 22905 23045 174 + .
> > ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 > > 172 218;Gap=M47; > > In comparison to the dpp_contig test file, I am missing est2genome > > evidence, most probably because my EST data set is pretty poor. I have > > blastx and protein2genome evidence though. > > > > My goal is to extract the genes that could be annotated on the scaffolds. > > In the gff files the hits overlap most of the time. I can visualize this > > properly in apollo: for example one scaffold hits the DSCAML gene in both > > zebrafinch and chicken, but extracting the coordinates between which this > > scaffold fits this annotated gene is difficult from the gff. Manually > > curating the genes is also not an option, since I am trying to do this for a > > 1.7Gb genome. > > > > I hope this explains better what we are after. > > > > Thank you once again. > > > > Best regards, > > > > Diana > > On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com > > > wrote: > > > > > > > I'm sorry, I don't understand question 1. You are missing the > > > > > resulting fasta files, correct? Did your resulting GFF3 file have > > > > > any features of type "gene"? Did you run fasta_merge after running > > > > > gff3_merge? > > > > > > Could you give me more details on what you are trying to do, so I can > > > take a stab at question 2 as well.
> > > > > > Thanks, > > > Carson > > > > > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > > > Date: Friday, 10 May, 2013 10:44 AM > > > To: < maker-devel at yandell-lab.org > > > > > > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > > > >, Janet Kelso < kelso at eva.mpg.de > > > >, Torsten Schoeneberg < > > > torsten.schoeneberg at medizin.uni-leipzig.de > > > > > > > Subject: [maker-devel] Maker consensus > > > > > > > > > Dear maker developers, > > > > > > I am a phD student working on de novo assembly and annotation of a bird > > > genome. I used Maker as annotation pipeline, which ran very well, and I > > > obtained different annotations with evidence from Augustus gene predictor, > > > small EST dataset from my organism and protein sequences from chicken, > > > turkey and zebrafinch. I could combine the different gff files from > > > different scaffolds into one gff file with annotations for the entire > > > genome. > > > > > > I now have two questions: > > > > > > 1. What could be the reason that I haven't gotten the protein.fasta and > > > trancript.fasta files > > > > > > 2. How can I obtain a consensus gene list of different evidences from > > > maker? What I would actually need is the scaffold, coordinates and > > > annotation (gene name) according to the 3 other bird species. > > > > > > Thank you in advance. 
> > > > > > Best regards, > > > > > > Diana Le Duc > > > > > > -- > > > > > > Max Planck Institute for Evolutionary Anthropology > > > Department of Evolutionary Genetics > > > Deutscher Platz 6 > > > D-04103 Leipzig > > > > > > Phone +49 (0)341-3550-554 > > > www.eva.mpg.de > > > _______________________________________________ maker-devel mailing > > > list maker-devel at box290.bluehost.com > > > > > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaganjot.kaur at sickkids.ca Wed May 29 13:34:19 2013 From: gaganjot.kaur at sickkids.ca (Gaganjot Kaur) Date: Wed, 29 May 2013 19:34:19 +0000 Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs Message-ID: <5A46EF8CDF7C4F46AED4F14FC3AE17645F2B65@SKMBXX01.sickkids.ca> Hi Maker community, I have been trying to annotate a fungal genome using maker. As I do not have ests from the same species I have been using proteins and ests from two related species. Maker finishes successfully for all the scaffolds except two. These two scaffolds are around 2 mega bases each. I am running maker-2.27, using mpiexec to run over multiple compute nodes . Please see the error log below. 
Error from first scaffold: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline Sequence with id BL_ORD_ID:1562 no longer exists in database...alignment skipped STACK: Error::throw STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472 STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888 STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114 STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685 STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858 STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335 STACK: main::node_thread /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:1381 STACK: threads::new /home/gkaur/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm:799 STACK: /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:864 ----------------------------------------------------------- --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 ERROR: Failed while doing tblastx of alt-ESTs ERROR: Chunk failed at level:4, tier_type:2 FAILED CONTIG:scaffold_52379 ERROR: Chunk failed at level:5, tier_type:0 FAILED CONTIG:scaffold_52379 Error from second scaffold: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline Sequence with id BL_ORD_ID:3014 no longer exists in database...alignment skipped STACK: 
Error::throw STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472 STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888 STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114 STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685 STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858 STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335 STACK: /home/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:926 ----------------------------------------------------------- --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 ERROR: Failed while doing tblastx of alt-ESTs ERROR: Chunk failed at level:4, tier_type:2 FAILED CONTIG:scaffold_52359 ERROR: Chunk failed at level:5, tier_type:0 FAILED CONTIG:scaffold_52359 The errors seem to come from alt-est that I have been using. I have tried running maker more than once over these two scaffolds and the same error appears each time. I have no idea what is going wrong here. Your help in understanding and resolving the error will be greatly appreciated. 
Thanks in advance, Gagan - - - - - - - - - - - - - - - - - Gaganjot Kaur Bioinformatics Analyst The Centre for Applied Genomics (TCAG) The Hospital for Sick Children MaRS Building - East Tower 101 College St., Room 14-701 Toronto, ON M5G 1L7 ________________________________ This e-mail may contain confidential, personal and/or health information (information which may be subject to legal restrictions on use, retention and/or disclosure) for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Carson.Holt at oicr.on.ca Wed May 29 21:10:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Thu, 30 May 2013 03:10:52 +0000 Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs In-Reply-To: <5A46EF8CDF7C4F46AED4F14FC3AE17645F2B65@SKMBXX01.sickkids.ca> Message-ID: This is a parsing error coming from BioPerl. Could you run maker with the --debug flag and redirect STDERR to a file? You can kill it after a few seconds; I really just want to see the version of your BioPerl installation. Also, what version of BLAST are you running? --Carson From: Gaganjot Kaur Date: Wednesday, 29 May, 2013 3:34 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs Hi Maker community, I have been trying to annotate a fungal genome using maker. As I do not have ESTs from the same species, I have been using proteins and ESTs from two related species. Maker finishes successfully for all the scaffolds except two. These two scaffolds are around 2 megabases each. I am running maker-2.27, using mpiexec to run over multiple compute nodes. Please see the error log below.
Thanks in advance, Gagan - - - - - - - - - - - - - - - - - Gaganjot Kaur Bioinformatics Analyst The Centre for Applied Genomics (TCAG) The Hospital for Sick Children MaRS Building - East Tower 101 College St., Room 14-701 Toronto, ON M5G 1L7 ________________________________ This e-mail may contain confidential, personal and/or health information(information which may be subject to legal restrictions on use, retention and/or disclosure) for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and delete all copies. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnuhn at ebi.ac.uk Wed May 1 05:38:52 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Wed, 01 May 2013 12:38:52 +0100 Subject: [maker-devel] substr outside of string Message-ID: <5180FECC.2020308@ebi.ac.uk> Hello! I have run maker with est and rna seq data to create a training set for SNAP. Then I trained SNAP and added the hmm to the snaphmm option and reran maker. Maker is giving me error messages like this: " setting up GFF3 output and fasta chunks doing repeat masking re reading repeat masker report. substr outside of string at /maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 140 . --> rank=NA, hostname=ebi-209.ebi.ac.uk " The line from which this error message originates is: substr($$seq, $b -1 , $l, "$replace"x$l); After getting these error messages I replaced it with eval { substr($$seq, $b -1 , $l, "$replace"x$l); }; if ($@) { use Carp; use Data::Dumper; confess( $@ . "\n\n" . Dumper($p) . "\n\n" . "Length of sequence: " . 
(length $$seq) ); } After that I got this: $VAR1 = [ 98926, 99033 ]; Length of sequence: 98686 at /maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 145 I have not changed the genome file. I'm also concerned with the reported length of 98686, because I have a list of all sequences in the file and their lengths, and none of them has a length of 98686 bp. The sequences with the closest lengths are these: 98367 LSalAtl2s1200 98438 LSalAtl2s1473 98776 LSalAtl2s1613 98876 LSalAtl2s1199 so they are not even close. $$seq is a sequence as a string, when I print it. Sometimes maker prints a message like this: " --Next Contig-- Processing run.log file... #--------------------------------------------------------------------- Now retrying the contig!! SeqID: LSalAtl2s63 Length: 3997709 Tries: 5!! #--------------------------------------------------------------------- " But according to my list, which I generated from the exact same file that maker has in the genome_file option, the length of that sequence is 1169407. Any idea why I am getting these problems and what to do about them? Cheers, Michael. From carsonhh at gmail.com Wed May 1 07:17:50 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 01 May 2013 09:17:50 -0400 Subject: [maker-devel] substr outside of string In-Reply-To: <5180FECC.2020308@ebi.ac.uk> Message-ID: The length you are printing is not the length of the contig, but rather the length of the piece of the contig MAKER is working with at that moment. The fact that the length is not exactly 100000 tells me that this is a piece at the end of the contig. By any chance are you using GFF3 pass-through of repeat elements? If not, there may be a repeatmasker parsing bug, as the start and end coordinates are off the edge of the contig. If you run maker on the command line (not via MPI), what is the repeatmasker report read immediately before the error? Could you then attach it and the fasta sequence for the contig that fails?
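A note on the out-of-range case described above: the dumped interval (98926, 99033) starts beyond the end of a 98686-base piece, so any masking routine has to decide what to do with coordinates that fall off the edge. The sketch below is in Python rather than Perl and is not the code in repeat_mask_seq.pm; the function name mask_interval and the decision to clamp are assumptions made purely for illustration, and it assumes 1-based inclusive coordinates relative to the piece being masked.

```python
def mask_interval(seq: str, begin: int, end: int, replace: str = "N") -> str:
    """Return seq with the 1-based inclusive range [begin, end] masked.

    Hypothetical helper, not MAKER code. Coordinates that fall partly or
    wholly outside the sequence are clamped instead of raising an error.
    """
    if begin > end:
        begin, end = end, begin
    # Clamp both ends to the valid 1-based range of this piece.
    begin = max(begin, 1)
    end = min(end, len(seq))
    if begin > end:
        # The whole interval lies beyond the piece: nothing to mask.
        return seq
    length = end - begin + 1
    return seq[:begin - 1] + replace * length + seq[end:]
```

Clamping only hides the symptom. If the coordinates come from a mis-parsed report, the parser still needs fixing, so reporting the skipped interval may be wiser than ignoring it silently.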
Thanks, Carson On 13-05-01 7:38 AM, "Michael Nuhn" wrote: >Hello! > >I have run maker with est and rna seq data to create a training set for >SNAP. Then I trained SNAP and added the hmm to the snaphmm option and >reran maker. > >Maker is giving me error messages like this: > >" >setting up GFF3 output and fasta chunks >doing repeat masking >re reading repeat masker report. > >substr outside of string at maker>/maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 140 >. >--> rank=NA, hostname=ebi-209.ebi.ac.uk >" > >The line from which this error message originates is: > > substr($$seq, $b -1 , $l, "$replace"x$l); > >After getting these error messages I replaced it with > > eval { > substr($$seq, $b -1 , $l, "$replace"x$l); > }; > if ($@) { > use Carp; > use Data::Dumper; > confess( > $@ > . "\n\n" > . Dumper($p) > . "\n\n" > . "Length of sequence: " . (length $$seq) > ); > } > >After that I got this: > >$VAR1 = [ > 98926, > 99033 > ]; > > >Length of sequence: 98686 at maker>/maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 14 >5 > >I have not changed the genome file. > >I'm also concerned with the reported length of 98686, because I have a >list of all sequences in the file and their lengths, and none of them >has a length of 98686 bp. The sequences with the closest lengths are >these: > >98367 LSalAtl2s1200 >98438 LSalAtl2s1473 >98776 LSalAtl2s1613 >98876 LSalAtl2s1199 > >so they are not even close. > >$$seq is a sequence as a string, when I print it. > >Sometimes maker prints a message like this: > >" >--Next Contig-- > >Processing run.log file... >#--------------------------------------------------------------------- >Now retrying the contig!! >SeqID: LSalAtl2s63 >Length: 3997709 >Tries: 5!! >#--------------------------------------------------------------------- >" > >But according to my list, which I generated from the exact same file >that maker has in genome_file option, the length of that sequence is >1169407. 
> >Any idea, why I am getting these problems and what to do about them? > >Cheers, >Michael. > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From ejr at stowers.org Wed May 1 09:57:11 2013 From: ejr at stowers.org (Ross, Eric) Date: Wed, 1 May 2013 15:57:11 +0000 Subject: [maker-devel] repeat statistics In-Reply-To: Message-ID: Should this be accessible anonymously? I'm unable to connect. Eric -- Eric Ross Bioinformatic Specialist I Alejandro S?nchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org From: Jason Stajich > Date: Monday, April 29, 2013 5:49 PM To: Barry Moore > Cc: Eric Ross >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] repeat statistics Barry - I think you mean topaz instead of malachite? svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA Jason Stajich jason at bioperl.org jason.stajich at gmail.com http://bioperl.org/wiki/User:Jason http://twitter.com/hyphaltip On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore > wrote: Hi Eric, There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details: http://www.sequenceontology.org/resources/sobacl.html Ultimately you'll get it like this: svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA Then run: SOBA/bin/SOBAcl --help For a lot of command line examples have a look in: SOBA/t/sobacl_test.sh B On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: Does anyone have a good tool for yanking repeat statistics out of MAKER gff files? SOBA can give some basic stats, but it doesn't play well with my giant files and I haven't figured out a way to run it locally. For that matter does anyone have a script that will calculate SOBA like stats locally? I'd rather avoid writing one myself if something else is out there. 
Thanks, Eric -- Eric Ross Bioinformatic Specialist I Alejandro Sánchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.moore at genetics.utah.edu Wed May 1 17:42:47 2013 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 1 May 2013 17:42:47 -0600 Subject: [maker-devel] repeat statistics In-Reply-To: References: Message-ID: Eric, Try again. It should have been world-readable before, but I've opened it a bit wider now, so it should definitely be accessible now. Let me know if you have problems. B On May 1, 2013, at 9:57 AM, Ross, Eric wrote: > Should this be accessible anonymously? > > I'm unable to connect. > > Eric > > > -- > Eric Ross > Bioinformatic Specialist I > Alejandro Sánchez Alvarado Laboratory > Stowers Institute for Medical Research > Howard Hughes Medical Institute > ejr at stowers.org > > From: Jason Stajich > Date: Monday, April 29, 2013 5:49 PM > To: Barry Moore > Cc: Eric Ross , "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] repeat statistics > > Barry - I think you mean topaz instead of malachite?
> > svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA > > > Jason Stajich > jason at bioperl.org > jason.stajich at gmail.com > http://bioperl.org/wiki/User:Jason > http://twitter.com/hyphaltip > > > On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote: >> Hi Eric, >> >> There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details: >> >> http://www.sequenceontology.org/resources/sobacl.html >> >> Ultimately you'll get it like this: >> >> svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA >> >> Then run: >> >> SOBA/bin/SOBAcl --help >> >> For a lot of command line examples have a look in: >> >> SOBA/t/sobacl_test.sh >> >> B >> >> On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: >> >>> Does anyone have a good tool for yanking repeat statistics out of MAKER >>> gff files? >>> >>> SOBA can give some basic stats, but it doesn't play well with my giant >>> files and I haven't figured out a way to run it locally. >>> >>> For that matter does anyone have a script that will calculate SOBA like >>> stats locally? I'd rather avoid writing one myself if something else is >>> out there. >>> >>> Thanks, >>> >>> Eric >>> >>> -- >>> Eric Ross >>> Bioinformatic Specialist I >>> Alejandro S?nchez Alvarado Laboratory >>> Stowers Institute for Medical Research >>> Howard Hughes Medical Institute >>> ejr at stowers.org >>> >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> Barry Moore >> Research Scientist >> Dept. 
of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ejr at stowers.org Wed May 1 17:53:08 2013 From: ejr at stowers.org (Ross, Eric) Date: Wed, 1 May 2013 23:53:08 +0000 Subject: [maker-devel] repeat statistics In-Reply-To: Message-ID: Works now. Thanks much, Eric -- Eric Ross Bioinformatic Specialist I Alejandro Sánchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org From: Barry Moore Date: Wednesday, May 1, 2013 6:42 PM To: Eric Ross Cc: Jason Stajich, "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] repeat statistics Eric, Try again. It should have been world-readable before, but I've opened it a bit wider now, so it should definitely be accessible now. Let me know if you have problems. B On May 1, 2013, at 9:57 AM, Ross, Eric wrote: Should this be accessible anonymously? I'm unable to connect. Eric -- Eric Ross Bioinformatic Specialist I Alejandro Sánchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org From: Jason Stajich Date: Monday, April 29, 2013 5:49 PM To: Barry Moore Cc: Eric Ross, "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] repeat statistics Barry - I think you mean topaz instead of malachite?
svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA Jason Stajich jason at bioperl.org jason.stajich at gmail.com http://bioperl.org/wiki/User:Jason http://twitter.com/hyphaltip On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote: Hi Eric, There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details: http://www.sequenceontology.org/resources/sobacl.html Ultimately you'll get it like this: svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA Then run: SOBA/bin/SOBAcl --help For a lot of command line examples have a look in: SOBA/t/sobacl_test.sh B On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: Does anyone have a good tool for yanking repeat statistics out of MAKER gff files? SOBA can give some basic stats, but it doesn't play well with my giant files and I haven't figured out a way to run it locally. For that matter, does anyone have a script that will calculate SOBA-like stats locally? I'd rather avoid writing one myself if something else is out there. Thanks, Eric -- Eric Ross Bioinformatic Specialist I Alejandro Sánchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed...
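For the question that opened this thread, tallying repeat features in a large MAKER GFF3 file locally, a rough alternative could look like the sketch below. It is not SOBA and does not reproduce its reports; feature_stats is a hypothetical helper written for illustration. It relies only on the standard GFF3 column layout (source in column 2, type in column 3, start and end in columns 4 and 5) and reads line by line, so a large file is never held in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def feature_stats(
    gff_lines: Iterable[str], sources: Optional[Set[str]] = None
) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Tally (feature count, summed length) per (source, type) pair.

    Hypothetical helper, not part of MAKER or SOBA. If `sources` is given,
    only lines whose column-2 source is in that set are counted.
    """
    stats = defaultdict(lambda: [0, 0])
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            # Embedded sequence follows; no more feature lines.
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a well-formed feature line
        source, ftype = cols[1], cols[2]
        if sources is not None and source not in sources:
            continue
        start, end = int(cols[3]), int(cols[4])
        entry = stats[(source, ftype)]
        entry[0] += 1
        entry[1] += end - start + 1  # GFF3 coordinates are 1-based, inclusive
    return {key: (count, bases) for key, (count, bases) in stats.items()}
```

It could be called as feature_stats(open("genome.all.gff"), sources={"repeatmasker", "repeatrunner"}), where the file name is made up and the two source names are an assumption about how repeat lines are labelled in column 2, so check them against your own file. Summed lengths count overlapping features twice, so they are an upper bound on the bases covered.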
URL:

From guoyunfei1989 at gmail.com Fri May 3 10:33:42 2013
From: guoyunfei1989 at gmail.com (Yunfei Guo)
Date: Fri, 3 May 2013 09:33:42 -0700
Subject: [maker-devel] maker doesn't pick up where it stopped
Message-ID:

Dear MAKER community,

I have a problem: maker doesn't pick up where it stopped last time; rather, it discards all previous results.

command:
echo 'mpiexec -n 12 maker -q' | qsub -V -cwd -l h_vmem=2g -pe mpich 12
maker version:
2.26
mpich version:
1.5rc3
in maker_opts:
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no

It never happened before. Any advice?

Thank you!

Yunfei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From barry.moore at genetics.utah.edu Fri May 3 16:51:27 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 3 May 2013 16:51:27 -0600
Subject: [maker-devel] repeat statistics
In-Reply-To:
References:
Message-ID: <37BA6893-1175-4F3E-B3AA-6C1E23C4364E@genetics.utah.edu>

Let me know how it works out for you - feedback either positive or negative is useful.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmdoyle at purdue.edu Sun May 5 05:55:47 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Sun, 5 May 2013 07:55:47 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <1109250054.216072.1367754420354.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID: <261748058.216082.1367754947403.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi!

I've recently attempted to install MAKER (Mac OS X). I installed blast and exonerate using the ./Build blast and ./Build exonerate commands, and I manually installed repeatmasker, snap and augustus (I couldn't get the ./Build commands to work). I then attempted to test out maker following the 2012 MAKER tutorial.
I received the blastx error message pasted below, and there is additional information in the maker log I've attached to this email. I was wondering if anyone had any suggestions about debugging, as I'm not quite sure where to begin...

Best wishes and thanks,
Jackie

#--------- command -------------#
Widget::formater:
/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in /tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0
#-------------------------------#
dyld: lazy symbol binding failed: Symbol not found: __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
  Expected in: flat namespace

dyld: Symbol not found: __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
  Expected in: flat namespace

ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in Widget::formater

FATAL ERROR
ERROR: Failed while doing blastx repeats!!

ERROR: Chunk failed at level 3
!!
FAILED CONTIG:contig-dpp-500-500

Department of Forestry and Natural Resources
Purdue University
West Lafayette, IN 47907
Phone: 270-293-9486
E-mail: jmdoyle at purdue.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Build status.odt
Type: application/vnd.oasis.opendocument.text
Size: 2740 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.odt
Type: application/vnd.oasis.opendocument.text
Size: 2772 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.odt
Type: application/vnd.oasis.opendocument.text
Size: 3479 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.odt
Type: application/vnd.oasis.opendocument.text
Size: 2821 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker log.odt
Type: application/vnd.oasis.opendocument.text
Size: 3340 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon May 6 06:32:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 06 May 2013 08:32:52 -0400
Subject: [maker-devel] maker doesn't pick up where it stopped
In-Reply-To:
Message-ID:

You would have to send me the captured STDERR. MAKER will print out a number of messages whenever it restarts a contig, and will explain why it deletes any files before restarting.

Thanks,
Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon May 6 08:02:52 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 06 May 2013 10:02:52 -0400
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <261748058.216082.1367754947403.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

Most maker development and debugging actually happens on a Mac (OS X 10.7.5).
Blast, Augustus, and SNAP all install for me just fine with maker 2.27. What errors do you get during installation? Do you by any chance have non-standard libraries via Mac ports, for example? Do you have xcode installed (it provides the appropriate 'make' command for compiling C)?

Thanks,
Carson

From guoyunfei1989 at gmail.com Mon May 6 09:33:57 2013
From: guoyunfei1989 at gmail.com (Yunfei Guo)
Date: Mon, 6 May 2013 08:33:57 -0700
Subject: [maker-devel] maker doesn't pick up where it stopped
In-Reply-To:
References:
Message-ID:

Hi Carson,

I used quiet mode; here's the stderr (I only show one 'now starting the contig' msg). When I check the maker master log upon restart with 'grep -ic finished master_log', all 'finished' tags are gone.

A data structure will be created for you at:
/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore

To access files for individual sequences use the datastore index:
/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_master_datastore_index.log

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: scaffold105
Length: 8761
#---------------------------------------------------------------------
...
MAKER WARNING: The file GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore/C8/27/scaffold5690//theVoid.scaffold5690/scaffold5690.0.HumanUCSCProteins%2Efasta.blastx
did not finish on the last run and must be erased
...
ERROR: Could not open '/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore/A4/F7/scaffold6034//theVoid.scaffold6034/scaffold6034.0.Srub%2Elib.specific.out'
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold6034

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold6034
...

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Carson.Holt at oicr.on.ca Mon May 6 20:22:23 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 7 May 2013 02:22:23 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
Message-ID:

Repeats can still happen in genes.
So an outright block actually causes more errors than it avoids, and a mixed approach of hard and soft masking becomes more appropriate. The masking step stops alignments from seeding in repeat regions, but if alignments seed in non-repeat regions then they can still extend through repeat regions during polishing steps (i.e., the EST evidence supports extension through the repeat and inclusion of the TE).

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dcopetti at cals.arizona.edu Mon May 6 15:19:42 2013
From: dcopetti at cals.arizona.edu (Dario Copetti)
Date: Mon, 06 May 2013 14:19:42 -0700
Subject: [maker-devel] gene models overlapping with TEs
Message-ID: <51881E6E.9010202@cals.arizona.edu>

Carson,

Analyzing the output of a MAKER run on a rice-sized genome, I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to determine the intersection of "CDS" and "repeatmasker" or "repeatrunner" features, and some 2400 genes overlap for at least 30% of their respective length. I am wondering how these gene models still appear in the final output, since I thought that the masking step gave us absolute confirmation that our endogenous gene list does not include TE coding regions. Below is an example of a gene (picture attached too):

ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1

ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816

This result is valid both for output lines from repeatmasker or repeatrunner, and the gene models come from either FGENESH or SNAP predictions.
How can I explain this problem?
Thanks,

Dario

--
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5

1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene_TE.jpg
Type: image/jpeg
Size: 177299 bytes
Desc: not available
URL:

From myandell at genetics.utah.edu Mon May 6 21:47:49 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 7 May 2013 03:47:49 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
References: <51881E6E.9010202@cals.arizona.edu>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E02CEE@mxb2.hg.genetics.utah.edu>

Could the TEs be in the UTRs? Also, maybe some of these are low-complexity regions?
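
The UTR question can be checked against the example records in Dario's post by comparing the exon, CDS, and repeatrunner intervals directly. What follows is only a minimal Python sketch, not part of MAKER or BEDtools: the helper names (merge_intervals, covered_bases, span_length) are made up for illustration, and it assumes GFF3-style 1-based inclusive coordinates with the thousands separators removed.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def span_length(intervals):
    """Total number of bases covered by the intervals, counted once each."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


def covered_bases(features, repeats):
    """Count bases of `features` that fall inside any interval of `repeats`."""
    repeats = merge_intervals(repeats)
    total = 0
    for f_start, f_end in merge_intervals(features):
        for r_start, r_end in repeats:
            lo, hi = max(f_start, r_start), min(f_end, r_end)
            if lo <= hi:
                total += hi - lo + 1
    return total


# Coordinates copied from the Obrac10g00240.1 records in the post
# (exon, CDS, and repeatrunner match_part lines).
exons = [(355056, 356874), (356965, 357081), (357209, 357319), (357756, 358075)]
cds = [(355056, 356874), (356965, 357081), (357209, 357319), (357756, 358075)]
repeats = [(357755, 358084), (357202, 357294), (355059, 357092)]

# Exon bases not covered by CDS are UTR (assumes every CDS lies within an exon).
utr_bases = span_length(exons) - span_length(cds)
cds_in_repeats = covered_bases(cds, repeats)
cds_fraction = cds_in_repeats / span_length(cds)

print("UTR bases:", utr_bases)
print("CDS bases inside repeatrunner hits:", cds_in_repeats, "of", span_length(cds))
print("fraction: %.3f" % cds_fraction)
```

For this one transcript the exon and CDS intervals are identical, so there are no annotated UTR bases, and 2339 of the 2367 CDS bases (about 98.8%) lie inside the repeatrunner hits. The same arithmetic over a whole genome is what a BEDtools intersection reports, so this is mainly useful for spot-checking individual models.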
Mark Yandell Professor of Human Genetics H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti [dcopetti at cals.arizona.edu] Sent: Monday, May 06, 2013 3:19 PM To: maker-devel at yandell-lab.org Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu Subject: [maker-devel] gene models overlapping with TEs Carson, Analyzing the output of a MAKER run on a rice-sized genome I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to determine the intersection of "CDS" and "repeatmasker" or "repeatrunner" and some 2400 genes overlap for at least 30% of their respective length. I am wondering how the gene models still appear in the final output, since I thought that the masking step was giving us the absoulte confirmation that in our endogenous gene list we do not include TE coding regions. Here below an example of a gene (attached picture too): ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 356,965 357,081 . 
- 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 This result is valid both for output lines from repeatmasker or repeatrunner, and the gene models come from either FGENESH or SNAP predictions. How can I explain this problem? Thanks, Dario -- Dario Copetti, PhD Research Associate Arizona Genomics Institute University of Arizona - BIO5 1657 E. Helen St. 
Tucson, AZ 85721 www.genome.arizona.edu From myandell at genetics.utah.edu Mon May 6 21:49:51 2013 From: myandell at genetics.utah.edu (Mark Yandell) Date: Tue, 7 May 2013 03:49:51 +0000 Subject: [maker-devel] gene models overlapping with TEs In-Reply-To: <51881E6E.9010202@cals.arizona.edu> References: <51881E6E.9010202@cals.arizona.edu> Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E02D13@mxb2.hg.genetics.utah.edu> humm, eballing then it doesn't look lie its the UTRss.. Mark Yandell Professor of Human Genetics H.A. & Edna Benning Presidential Endowed Chair Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: maker-devel-bounces at yandell-lab.org [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti [dcopetti at cals.arizona.edu] Sent: Monday, May 06, 2013 3:19 PM To: maker-devel at yandell-lab.org Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu Subject: [maker-devel] gene models overlapping with TEs Carson, Analyzing the output of a MAKER run on a rice-sized genome I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to determine the intersection of "CDS" and "repeatmasker" or "repeatrunner" and some 2400 genes overlap for at least 30% of their respective length. I am wondering how the gene models still appear in the final output, since I thought that the masking step was giving us the absoulte confirmation that in our endogenous gene list we do not include TE coding regions. Here below an example of a gene (attached picture too): ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 ObracChr10 maker exon 356,965 357,081 . - . 
ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 This result is valid both for output lines from repeatmasker or repeatrunner, and the gene models come from either FGENESH or SNAP predictions. How can I explain this problem? 
Thanks, Dario -- Dario Copetti, PhD Research Associate Arizona Genomics Institute University of Arizona - BIO5 1657 E. Helen St. Tucson, AZ 85721 www.genome.arizona.edu From carsonhh at gmail.com Tue May 7 05:39:17 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 07 May 2013 07:39:17 -0400 Subject: [maker-devel] gene models overlapping with TEs In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365E02D13@mxb2.hg.genetics.utah.edu> Message-ID: If I had to guess. I imagine the EST evidence includes assembled mRNA-seq reads? Is that correct? --Carson On 13-05-06 11:49 PM, "Mark Yandell" wrote: >humm, eballing then it doesn't look lie its the UTRss.. > >Mark Yandell >Professor of Human Genetics >H.A. & Edna Benning Presidential Endowed Chair >Eccles Institute of Human Genetics >University of Utah >15 North 2030 East, Room 2100 >Salt Lake City, UT 84112-5330 >ph:801-587-7707 > >________________________________________ >From: maker-devel-bounces at yandell-lab.org >[maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti >[dcopetti at cals.arizona.edu] >Sent: Monday, May 06, 2013 3:19 PM >To: maker-devel at yandell-lab.org >Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu >Subject: [maker-devel] gene models overlapping with TEs > >Carson, > >Analyzing the output of a MAKER run on a rice-sized genome I noticed that >some gene models (~10%) overlap with TE coding regions. As a QC step, I >used BEDtools to determine the intersection of "CDS" and "repeatmasker" >or "repeatrunner" and some 2400 genes overlap for at least 30% of their >respective length. I am wondering how the gene models still appear in the >final output, since I thought that the masking step was giving us the >absoulte confirmation that in our endogenous gene list we do not include >TE coding regions. Here below an example of a gene (attached picture too): > >ObracChr10 maker mRNA 355,056 358,075 . - . 
>ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eA >ED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 >ObracChr10 maker exon 355,056 356,874 . - . >ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 >ObracChr10 maker exon 356,965 357,081 . - . >ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 >ObracChr10 maker exon 357,209 357,319 . - . >ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 >ObracChr10 maker exon 357,756 358,075 . - . >ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 >ObracChr10 maker CDS 357,756 358,075 . - 2 >ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >ObracChr10 maker CDS 357,209 357,319 . - 2 >ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >ObracChr10 maker CDS 356,965 357,081 . - 2 >ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >ObracChr10 maker CDS 355,056 356,874 . - 0 >ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 > > > > > > > > > > > > > > > > > > > > >ObracChr10 repeatrunner match_part 357,755 358,084 566 - > . >ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_g >i_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >ObracChr10 repeatrunner protein_match 357,755 358,084 566 - > . >ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >ObracChr10 repeatrunner match_part 357,202 357,294 142 - > . >ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_g >i_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >ObracChr10 repeatrunner protein_match 357,202 357,294 142 - > . >ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >ObracChr10 repeatrunner match_part 355,059 357,092 3367 - > . >ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_g >i_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - > . 
>ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 > > >This result is valid both for output lines from repeatmasker or >repeatrunner, and the gene models come from either FGENESH or SNAP >predictions. >How can I explain this problem? >Thanks, > >Dario > > > > > >-- >Dario Copetti, PhD >Research Associate >Arizona Genomics Institute >University of Arizona - BIO5 > >1657 E. Helen St. >Tucson, AZ 85721 >www.genome.arizona.edu > > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From jmdoyle at purdue.edu Tue May 7 09:12:38 2013 From: jmdoyle at purdue.edu (Jacqueline R M Doyle) Date: Tue, 7 May 2013 11:12:38 -0400 (EDT) Subject: [maker-devel] MAKER installation debugging In-Reply-To: Message-ID: <1393522124.220153.1367939558646.JavaMail.root@mailhub042.itcs.purdue.edu> Hi Carson, Thanks for the quick reply! I don't remember any errors during Blast installation, it appeared to install fine with the ./Build command. Augustus, Repeatmasker and SNAP were the programs I could not install with the ./Build commands, and instead installed manually. I've attached the error messages I received when I tried to use the ./Build commands. I've tested out the three programs I installed manually and they seem to work fine on their own. I do have xcode installed. How would I determine if I have "non-standard libraries via Mac ports"? Thanks again for your help with this.
Best wishes, Jackie Department of Forestry and Natural Resources Purdue University West Lafayette, IN 47907 Phone: 270-293-9486 E-mail: jmdoyle at purdue.edu ----- Original Message ----- From: "Carson Holt" To: "Jacqueline R M Doyle" , maker-devel at yandell-lab.org Sent: Monday, May 6, 2013 10:02:52 AM Subject: Re: [maker-devel] MAKER installation debugging Most maker development and debugging actually happens on a Mac (OS X 10.7.5). Blast, Augustus, SNAP all install for me just fine with maker 2.27. What errors do you get during installation? Do you by any chance have non-standard libraries via Mac ports for example. Do you have xcode installed (it provides the appropriate 'make' command for compiling C)? Thanks, Carson On 13-05-05 7:55 AM, "Jacqueline R M Doyle" wrote: >Hi! > >I've recently attempted to install MAKER (Mac OS X). I installed blast >and exonerate using the ./Build blast and ./Build exonerate commands, and >I manually installed repeatmasker, snap and augustus (I couldn't get the >./Build commands to work). I then attempted to test out maker following >the 2012 MAKER tutorial. I received the blastx error message pasted >below, and there is additional information in the maker log I've attached >to this email. I was wondering if anyone had any suggestions about >debugging, as I'm not quite sure where to begin...
> >Best wishes and thanks, Jackie > > >#--------- command -------------# >Widget::formater: >/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in >/tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0 >#-------------------------------# >dyld: lazy symbol binding failed: Symbol not found: >__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PK >S3_i > Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb > Expected in: flat namespace > >dyld: Symbol not found: >__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PK >S3_i > Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb > Expected in: flat namespace > >ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in >Widget::formater > >FATAL ERROR >ERROR: Failed while doing blastx repeats!! > >ERROR: Chunk failed at level 3 >!! >FAILED CONTIG:contig-dpp-500-500 > > >Department of Forestry and Natural Resources >Purdue University >West Lafayette, IN 47907 >Phone: 270-293-9486 >E-mail: jmdoyle at purdue.edu >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- A non-text attachment was scrubbed... Name: repeatmasker installation error.rtf Type: application/rtf Size: 1264 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: snap installation error.rtf Type: application/rtf Size: 1095 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: agustus installation error.rtf Type: application/rtf Size: 1124 bytes Desc: not available URL: From carsonhh at gmail.com Tue May 7 09:19:57 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 07 May 2013 11:19:57 -0400 Subject: [maker-devel] MAKER installation debugging In-Reply-To: <1393522124.220153.1367939558646.JavaMail.root@mailhub042.itcs.purdue.edu> Message-ID: Which version of MAKER are you using. Is it 2.10 or 2.27? Thanks, Carson On 13-05-07 11:12 AM, "Jacqueline R M Doyle" wrote: >Hi Carson, > >Thanks for the quick reply! I don't remember any errors during Blast >installation, it appeared to install fine with the ./Build command. >Augustus, Repeatmasker and SNAP were the programs I could not install >with the ./Build commands, and instead installed manually. I've attached >the error messages I received when I tried to use the ./Build commands. >I've tested out the three programs I installed manually and they seem to >work fine on their own. > >I do have xcode installed. How would I determine if I have "non-standard >libraries via Mac ports"? > >Thanks again for your help with this. > >Best wishes, Jackie > >Department of Forestry and Natural Resources >Purdue University >West Lafayette, IN 47907 >Phone: 270-293-9486 >E-mail: jmdoyle at purdue.edu > >----- Original Message ----- >From: "Carson Holt" >To: "Jacqueline R M Doyle" , >maker-devel at yandell-lab.org >Sent: Monday, May 6, 2013 10:02:52 AM >Subject: Re: [maker-devel] MAKER installation debugging > >Most maker development and debugging actually happens on a Mac (OS X >10.7.5). Blast, Augustus, SNAP all install for me just fine with maker >2.27. What errors do you get during installation? Do you by any chance >have non-standard libraries via Mac ports for example. Do you have xcode >installed (it provides the appropriate 'make' command for compiling C)? > >Thanks, >Carson > > >On 13-05-05 7:55 AM, "Jacqueline R M Doyle" wrote: > >>Hi! 
>> >>I've recently attempted to install MAKER (Mac OS X). I installed blast >>and exonerate using the ./Build blast and ./Build exonerate commands, and >>I manually installed repeatmasker, snap and augustus (I couldn't get the >>./Build commands to work). I then attempted to test out maker following >>the 2012 MAKER tutorial. I received the blastx error message pasted >>below, and there is additional information in the maker log I've attached >>to this email. I was wondering if anyone had any suggestions about >>debugging, as I'm not quite sure where to begin... >> >>Best wishes and thanks, Jackie >> >> >>#--------- command -------------# >>Widget::formater: >>/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in >>/tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0 >>#-------------------------------# >>dyld: lazy symbol binding failed: Symbol not found: >>__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_P >>K >>S3_i >> Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb >> Expected in: flat namespace >> >>dyld: Symbol not found: >>__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_P >>K >>S3_i >> Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb >> Expected in: flat namespace >> >>ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in >>Widget::formater >> >>FATAL ERROR >>ERROR: Failed while doing blastx repeats!! >> >>ERROR: Chunk failed at level 3 >>!! 
>>FAILED CONTIG:contig-dpp-500-500 >> >> >>Department of Forestry and Natural Resources >>Purdue University >>West Lafayette, IN 47907 >>Phone: 270-293-9486 >>E-mail: jmdoyle at purdue.edu >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From jmdoyle at purdue.edu Tue May 7 10:54:22 2013 From: jmdoyle at purdue.edu (Jacqueline R M Doyle) Date: Tue, 7 May 2013 12:54:22 -0400 (EDT) Subject: [maker-devel] MAKER installation debugging In-Reply-To: Message-ID: <963584633.220449.1367945662870.JavaMail.root@mailhub042.itcs.purdue.edu> Hi Carson, I am using MAKER 2.10 (I downloaded it so long ago I'd forgotten there were two options). Would it be better for me to start over with 2.27? Best wishes, Jackie Department of Forestry and Natural Resources Purdue University West Lafayette, IN 47907 Phone: 270-293-9486 E-mail: jmdoyle at purdue.edu ----- Original Message ----- From: Carson Holt To: Jacqueline R M Doyle Cc: maker-devel at yandell-lab.org Sent: Tue, 07 May 2013 11:19:57 -0400 (EDT) Subject: Re: [maker-devel] MAKER installation debugging Which version of MAKER are you using. Is it 2.10 or 2.27? Thanks, Carson On 13-05-07 11:12 AM, "Jacqueline R M Doyle" wrote: >Hi Carson, > >Thanks for the quick reply! I don't remember any errors during Blast >installation, it appeared to install fine with the ./Build command. >Augustus, Repeatmasker and SNAP were the programs I could not install >with the ./Build commands, and instead installed manually. I've attached >the error messages I received when I tried to use the ./Build commands. >I've tested out the three programs I installed manually and they seem to >work fine on their own. > >I do have xcode installed. How would I determine if I have "non-standard >libraries via Mac ports"? > >Thanks again for your help with this. 
> >Best wishes, Jackie > >Department of Forestry and Natural Resources >Purdue University >West Lafayette, IN 47907 >Phone: 270-293-9486 >E-mail: jmdoyle at purdue.edu > >----- Original Message ----- >From: "Carson Holt" >To: "Jacqueline R M Doyle" , >maker-devel at yandell-lab.org >Sent: Monday, May 6, 2013 10:02:52 AM >Subject: Re: [maker-devel] MAKER installation debugging > >Most maker development and debugging actually happens on a Mac (OS X >10.7.5). Blast, Augustus, SNAP all install for me just fine with maker >2.27. What errors do you get during installation? Do you by any chance >have non-standard libraries via Mac ports for example. Do you have xcode >installed (it provides the appropriate 'make' command for compiling C)? > >Thanks, >Carson > > >On 13-05-05 7:55 AM, "Jacqueline R M Doyle" wrote: > >>Hi! >> >>I've recently attempted to install MAKER (Mac OS X). I installed blast >>and exonerate using the ./Build blast and ./Build exonerate commands, and >>I manually installed repeatmasker, snap and augustus (I couldn't get the >>./Build commands to work). I then attempted to test out maker following >>the 2012 MAKER tutorial. I received the blastx error message pasted >>below, and there is additional information in the maker log I've attached >>to this email. I was wondering if anyone had any suggestions about >>debugging, as I'm not quite sure where to begin... 
>> >>Best wishes and thanks, Jackie >> >> >>#--------- command -------------# >>Widget::formater: >>/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in >>/tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0 >>#-------------------------------# >>dyld: lazy symbol binding failed: Symbol not found: >>__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_P >>K >>S3_i >> Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb >> Expected in: flat namespace >> >>dyld: Symbol not found: >>__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_P >>K >>S3_i >> Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb >> Expected in: flat namespace >> >>ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in >>Widget::formater >> >>FATAL ERROR >>ERROR: Failed while doing blastx repeats!! >> >>ERROR: Chunk failed at level 3 >>!! >>FAILED CONTIG:contig-dpp-500-500 >> >> >>Department of Forestry and Natural Resources >>Purdue University >>West Lafayette, IN 47907 >>Phone: 270-293-9486 >>E-mail: jmdoyle at purdue.edu >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Tue May 7 12:20:19 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 07 May 2013 14:20:19 -0400 Subject: [maker-devel] gene models overlapping with TEs In-Reply-To: <51892ABA.2060100@cals.arizona.edu> Message-ID: This is really more of an evidence issue. Because you have assembled mRNA-seq evidence, you are probably getting TE sequences improperly included in the assembled ESTs, so MAKER just follows the evidence. It tries to mask the repeat out, but the alignment of the longer EST heavily supports the repeat's inclusion in the model during alignment polishing. Solutions: 1.
You can set softmask=0 instead of softmask=1 (1 is the default) to make everything hard masked (it will be a hard 'N', so no alignment can happen). 2. You can pre-mask the genome. The easiest way to do this would be to collect the query.masked.fasta files inside each theVoid directory in the datastore and use them as the input. Then none of the polishing steps can ever extend the alignment. 3. You can filter the mRNA-seq data for TE elements before assembly. Thanks, Carson On 13-05-07 12:24 PM, "Dario Copetti" wrote: >Yes, there was RNA-seq evidence as well. Still I would like to have this >evidence annotated as TE, and not as a gene (or at least to have it >tagged in some way). > >As you suggested, a good solution could be to sequentially soft mask >with the RMasker output and then hard mask with the RRunner result. In >this way we hide TE coding regions from all predictors and alignments, >leaving all the other types of repeats softmasked. This meets Mark's >target of having MITEs and other non-autonomous TEs (as well as >simple/low compl. repeats) annotated in UTRs or CDSs, if present. In my >opinion, this case could be one of the few cases (or the only one?) >where gene and repeat annotation can overlap. > >For our genomes I will have a list of these genes overlapping TE coding >regions, and we will likely remove them. Please let us know how you >intend to fix this problem and on which MAKER version it will appear. >Thanks for the assistance and suggestions, > >Dario > > > >On 05/07/2013 04:39 AM, Carson Holt wrote: >> If I had to guess. I imagine the EST evidence includes assembled >>mRNA-seq >> reads? Is that correct? >> >> --Carson >> >> >> >> On 13-05-06 11:49 PM, "Mark Yandell" wrote: >> >>> humm, eballing then it doesn't look lie its the UTRss.. >>> >>> Mark Yandell >>> Professor of Human Genetics >>> H.A.
& Edna Benning Presidential Endowed Chair >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> ph:801-587-7707 >>> >>> ________________________________________ >>> From: maker-devel-bounces at yandell-lab.org >>> [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti >>> [dcopetti at cals.arizona.edu] >>> Sent: Monday, May 06, 2013 3:19 PM >>> To: maker-devel at yandell-lab.org >>> Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu >>> Subject: [maker-devel] gene models overlapping with TEs >>> >>> Carson, >>> >>> Analyzing the output of a MAKER run on a rice-sized genome I noticed >>>that >>> some gene models (~10%) overlap with TE coding regions. As a QC step, I >>> used BEDtools to determine the intersection of "CDS" and "repeatmasker" >>> or "repeatrunner" and some 2400 genes overlap for at least 30% of their >>> respective length. I am wondering how the gene models still appear in >>>the >>> final output, since I thought that the masking step was giving us the >>> absoulte confirmation that in our endogenous gene list we do not >>>include >>> TE coding regions. Here below an example of a gene (attached picture >>>too): >>> >>> ObracChr10 maker mRNA 355,056 358,075 . - . >>> >>>ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_ >>>eA >>> ED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 >>> ObracChr10 maker exon 355,056 356,874 . - . >>> ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 >>> ObracChr10 maker exon 356,965 357,081 . - . >>> ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 >>> ObracChr10 maker exon 357,209 357,319 . - . >>> ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 >>> ObracChr10 maker exon 357,756 358,075 . - . >>> ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 >>> ObracChr10 maker CDS 357,756 358,075 . - 2 >>> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >>> ObracChr10 maker CDS 357,209 357,319 . 
- 2 >>> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >>> ObracChr10 maker CDS 356,965 357,081 . - 2 >>> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >>> ObracChr10 maker CDS 355,056 356,874 . - 0 >>> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ObracChr10 repeatrunner match_part 357,755 358,084 566 >>> - >>> . >>> >>>ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM >>>_g >>> i_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >>> ObracChr10 repeatrunner protein_match 357,755 358,084 566 >>> - >>> . >>> >>>ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothet >>>ic >>> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >>> ObracChr10 repeatrunner match_part 357,202 357,294 142 >>> - >>> . >>> >>>ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM >>>_g >>> i_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >>> ObracChr10 repeatrunner protein_match 357,202 357,294 142 >>> - >>> . >>> >>>ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothet >>>ic >>> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >>> ObracChr10 repeatrunner match_part 355,059 357,092 3367 >>> - >>> . >>> >>>ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM >>>_g >>> i_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >>> ObracChr10 repeatrunner protein_match 355,059 357,092 3367 >>> - >>> . >>> >>>ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothet >>>ic >>> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >>> >>> >>> This result is valid both for output lines from repeatmasker or >>> repeatrunner, and the gene models come from either FGENESH or SNAP >>> predictions. >>> How can I explain this problem? 
>>> Thanks, >>> >>> Dario >>> >>> >>> >>> >>> >>> -- >>> Dario Copetti, PhD >>> Research Associate >>> Arizona Genomics Institute >>> University of Arizona - BIO5 >>> >>> 1657 E. Helen St. >>> Tucson, AZ 85721 >>> www.genome.arizona.edu >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > >-- >Dario Copetti, PhD >Research Associate >Arizona Genomics Institute >University of Arizona - BIO5 > >1657 E. Helen St. >Tucson, AZ 85721 >www.genome.arizona.edu > From dcopetti at cals.arizona.edu Tue May 7 10:24:26 2013 From: dcopetti at cals.arizona.edu (Dario Copetti) Date: Tue, 07 May 2013 09:24:26 -0700 Subject: [maker-devel] gene models overlapping with TEs In-Reply-To: References: Message-ID: <51892ABA.2060100@cals.arizona.edu> Yes, there was RNA-seq evidence as well. Still I would like to have this evidence annotated as TE, and not as a gene (or at least to have it tagged in some way). As you suggested, a good solution could be to sequentially soft mask with the RMasker output and then hard mask with the RRunner result. In this way we hide TE coding regions from all predictors and alignments, leaving all the other types of repeats softmasked. This meets Mark's target of having MITEs and other non-autonomous TEs (as well as simple/low compl. repeats) annotated in UTRs or CDSs, if present. In my opinion, this case could be one of the few cases (or the only one?) where gene and repeat annotation can overlap. For our genomes I will have a list of these genes overlapping TE coding regions, and we will likely remove them. Please let us know how you intend to fix this problem and on which MAKER version it will appear. Thanks for the assistance and suggestions, Dario On 05/07/2013 04:39 AM, Carson Holt wrote: > If I had to guess. I imagine the EST evidence includes assembled mRNA-seq > reads? Is that correct? 
> > --Carson > > > > On 13-05-06 11:49 PM, "Mark Yandell" wrote: > >> humm, eballing then it doesn't look lie its the UTRss.. >> >> Mark Yandell >> Professor of Human Genetics >> H.A. & Edna Benning Presidential Endowed Chair >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ph:801-587-7707 >> >> ________________________________________ >> From: maker-devel-bounces at yandell-lab.org >> [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti >> [dcopetti at cals.arizona.edu] >> Sent: Monday, May 06, 2013 3:19 PM >> To: maker-devel at yandell-lab.org >> Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu >> Subject: [maker-devel] gene models overlapping with TEs >> >> Carson, >> >> Analyzing the output of a MAKER run on a rice-sized genome I noticed that >> some gene models (~10%) overlap with TE coding regions. As a QC step, I >> used BEDtools to determine the intersection of "CDS" and "repeatmasker" >> or "repeatrunner" and some 2400 genes overlap for at least 30% of their >> respective length. I am wondering how the gene models still appear in the >> final output, since I thought that the masking step was giving us the >> absoulte confirmation that in our endogenous gene list we do not include >> TE coding regions. Here below an example of a gene (attached picture too): >> >> ObracChr10 maker mRNA 355,056 358,075 . - . >> ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eA >> ED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 >> ObracChr10 maker exon 355,056 356,874 . - . >> ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 356,965 357,081 . - . >> ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 357,209 357,319 . - . >> ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 357,756 358,075 . - . >> ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 357,756 358,075 . 
- 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 357,209 357,319 . - 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 356,965 357,081 . - 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 355,056 356,874 . - 0 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ObracChr10 repeatrunner match_part 357,755 358,084 566 - >> . >> ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >> ObracChr10 repeatrunner protein_match 357,755 358,084 566 - >> . >> ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >> ObracChr10 repeatrunner match_part 357,202 357,294 142 - >> . >> ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >> ObracChr10 repeatrunner protein_match 357,202 357,294 142 - >> . >> ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >> ObracChr10 repeatrunner match_part 355,059 357,092 3367 - >> . >> ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >> ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - >> . >> ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >> >> >> This result is valid both for output lines from repeatmasker or >> repeatrunner, and the gene models come from either FGENESH or SNAP >> predictions. >> How can I explain this problem? >> Thanks, >> >> Dario >> >> >> >> >> >> -- >> Dario Copetti, PhD >> Research Associate >> Arizona Genomics Institute >> University of Arizona - BIO5 >> >> 1657 E. 
Helen St. >> Tucson, AZ 85721 >> www.genome.arizona.edu >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Dario Copetti, PhD Research Associate Arizona Genomics Institute University of Arizona - BIO5 1657 E. Helen St. Tucson, AZ 85721 www.genome.arizona.edu From jmdoyle at purdue.edu Tue May 7 22:09:48 2013 From: jmdoyle at purdue.edu (Jacqueline R M Doyle) Date: Wed, 8 May 2013 00:09:48 -0400 (EDT) Subject: [maker-devel] MAKER installation debugging In-Reply-To: Message-ID: <1621518279.221945.1367986188482.JavaMail.root@mailhub042.itcs.purdue.edu> I downloaded MAKER 2.27 and it installed perfectly! I worked through the tutorial without any problems. Thanks for your help with this! Best wishes, Jackie From carsonhh at gmail.com Tue May 7 22:10:33 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 08 May 2013 00:10:33 -0400 Subject: [maker-devel] MAKER installation debugging In-Reply-To: <1621518279.221945.1367986188482.JavaMail.root@mailhub042.itcs.purdue.edu> Message-ID: I'm glad it worked. --Carson On 13-05-08 12:09 AM, "Jacqueline R M Doyle" wrote: >I downloaded MAKER 2.27 and it installed perfectly! I worked through the >tutorial without any problems. Thanks for your help with this! > >Best wishes, Jackie > From Carson.Holt at oicr.on.ca Wed May 8 13:25:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 8 May 2013 19:25:52 +0000 Subject: [maker-devel] Non-standard genetic code In-Reply-To: <97533c275fa3e6b05709c92455c9e6b8@fbb.msu.ru> Message-ID: It's not possible yet. It is one of the things we have on a list to do. It's not a small task either to make the necessary changes to the code, as the codon usage affects blastx alignments, exonerate protein2genome alignments, ab initio gene prediction, gene extension/boundary polishing, and UTR addition. 
So the changes have to go into many many locations. Thanks, Carson On 13-05-08 11:44 AM, "Daniil Alexeyevsky" wrote: >Hi, > >I want to use MAKER to annotate organism with non-standard genetic >code. (It only has UGA stop-codon, UAA and UAG code glutamine). > >Is it possible to use MAKER in this case? If I am bound to editing some >source codes, could you please point me where to look? > >With best regards, >-- Daniil > From mnuhn at ebi.ac.uk Fri May 10 06:10:35 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Fri, 10 May 2013 13:10:35 +0100 Subject: [maker-devel] Duplicated exons Message-ID: <518CE3BB.3060003@ebi.ac.uk> Hello Carson! I have been trying to get to the bottom of an error message when (re)training snap. Snap, or more precisely fathom, was giving me unclear error messages about misordered and overlapping exons. I have looked into the gff files from which these exons originate and noticed that a lot of exons in that file were duplicated. For example I have found these: LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 and then about four hundred lines later there are these: LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 which are identical except for the order number after "exon:". This seems to have happened to a lot of features in that file. How can I avoid this? Or if this is just a rare problem, can I have maker recompute the gff file without redoing all the computations again? Cheers, Michael. 
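The duplication described in the message above (the same exon coordinates listed twice under one mRNA, differing only in the number after "exon:") can be checked for with a short script. What follows is only a minimal Python sketch, assuming a tab-separated GFF3 file whose exon rows carry ID and Parent attributes as in the excerpt. The function names are made up for illustration and are not part of MAKER or SNAP.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn a GFF3 attribute string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_duplicate_exons(lines):
    """Group exon rows by (seqid, start, end, strand, Parent).

    Returns only the groups that contain more than one row, mapped to the
    list of exon IDs found for that group, in file order.
    """
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # skip non-exon rows and any embedded FASTA lines
        attrs = parse_attributes(cols[8])
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6],
               attrs.get("Parent", ""))
        groups[key].append(attrs.get("ID", ""))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Rows modelled on the excerpt in the message above.
    parent = "maker-LSalAtl2s75-snap-gene-2.15-mRNA-1"
    example = [
        f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={parent}:exon:3;Parent={parent}",
        f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={parent}:exon:4;Parent={parent}",
        f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={parent}:exon:1;Parent={parent}",
        f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={parent}:exon:2;Parent={parent}",
    ]
    for key, ids in find_duplicate_exons(example).items():
        print(key, ids)
```

The function accepts any iterable of lines, so an open file handle for a real GFF3 file could be passed in place of the example list. Grouping on the Parent attribute as well as the coordinates is a deliberate choice here: exons shared between different transcripts of one gene are legitimate and should not be reported.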
From daa at fbb.msu.ru Wed May 8 09:44:50 2013 From: daa at fbb.msu.ru (Daniil Alexeyevsky) Date: Wed, 08 May 2013 19:44:50 +0400 Subject: [maker-devel] Non-standard genetic code Message-ID: <97533c275fa3e6b05709c92455c9e6b8@fbb.msu.ru> Hi, I want to use MAKER to annotate an organism with a non-standard genetic code. (It has only UGA as a stop codon; UAA and UAG code for glutamine.) Is it possible to use MAKER in this case? If I have to edit some of the source code, could you please point me to where to look? With best regards, -- Daniil From diana_leduc at eva.mpg.de Fri May 10 08:44:50 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Fri, 10 May 2013 16:44:50 +0200 (CEST) Subject: [maker-devel] Maker consensus Message-ID: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de> Dear maker developers, I am a PhD student working on de novo assembly and annotation of a bird genome. I used Maker as the annotation pipeline, which ran very well, and I obtained different annotations with evidence from the Augustus gene predictor, a small EST dataset from my organism and protein sequences from chicken, turkey and zebra finch. I could combine the different gff files from different scaffolds into one gff file with annotations for the entire genome. I now have two questions: 1. What could be the reason that I haven't gotten the protein.fasta and transcript.fasta files? 2. How can I obtain a consensus gene list from the different evidence sources in maker? What I would actually need is the scaffold, coordinates and annotation (gene name) according to the 3 other bird species. Thank you in advance. Best regards, Diana Le Duc -- Max Planck Institute for Evolutionary Anthropology Department of Evolutionary Genetics Deutscher Platz 6 D-04103 Leipzig Phone +49 (0)341-3550-554 www.eva.mpg.de -------------- next part -------------- An HTML attachment was scrubbed...
URL: From carsonhh at gmail.com Fri May 10 10:13:33 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 12:13:33 -0400 Subject: [maker-devel] Maker consensus In-Reply-To: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: I'm sorry, I don't understand question 1. You are missing the resulting fasta files, correct? Did your resulting GFF3 file have any features of type "gene"? Did you run fasta_merge after running gff3_merge? Could you give me more details on what you are trying to do, so I can take a stab at question 2 as well. Thanks, Carson From: Diana LeDuc Reply-To: Diana LeDuc Date: Friday, 10 May, 2013 10:44 AM To: Cc: Gabriel Renaud , Janet Kelso , Torsten Schoeneberg Subject: [maker-devel] Maker consensus Dear maker developers, I am a phD student working on de novo assembly and annotation of a bird genome. I used Maker as annotation pipeline, which ran very well, and I obtained different annotations with evidence from Augustus gene predictor, small EST dataset from my organism and protein sequences from chicken, turkey and zebrafinch. I could combine the different gff files from different scaffolds into one gff file with annotations for the entire genome. I now have two questions: 1. What could be the reason that I haven't gotten the protein.fasta and trancript.fasta files 2. How can I obtain a consensus gene list of different evidences from maker? What I would actually need is the scaffold, coordinates and annotation (gene name) according to the 3 other bird species. Thank you in advance.
Best regards, Diana Le Duc -- Max Planck Institute for Evolutionary Anthropology Department of Evolutionary Genetics Deutscher Platz 6 D-04103 Leipzig Phone +49 (0)341-3550-554 www.eva.mpg.de _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 10 10:25:17 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 12:25:17 -0400 Subject: [maker-devel] Duplicated exons In-Reply-To: <518CE3BB.3060003@ebi.ac.uk> Message-ID: Very odd. Which version of MAKER are you using? Are you using GFF3 passthrough in the run that generates the duplication? Thanks, Carson On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >Hello Carson! > >I have been trying to get to the bottom of an error message when >(re)training snap. Snap, or more precisely fathom, was giving me unclear >error messages about misordered and overlapping exons. > >I have looked into the gff files from which these exons originate and >noticed that a lot of exons in that file were duplicated. For example I >have found these: > >LSalAtl2s75 maker exon 186317 186936 . + . >ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75 >-snap-gene-2.15-mRNA-1 >LSalAtl2s75 maker exon 187007 191531 . + . >ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75 >-snap-gene-2.15-mRNA-1 > >and then about four hundred lines later there are these: > >LSalAtl2s75 maker exon 186317 186936 . + . >ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75 >-snap-gene-2.15-mRNA-1 >LSalAtl2s75 maker exon 187007 191531 . + . >ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75 >-snap-gene-2.15-mRNA-1 > >which are identical except for the order number after "exon:".
> >This seems to have happened to a lot of features in that file. > >How can I avoid this? Or if this is just a rare problem, can I have >maker recompute the gff file without redoing all the computations again? > >Cheers, >Michael. > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From j.zohren at qmul.ac.uk Fri May 10 11:07:30 2013 From: j.zohren at qmul.ac.uk (Jasmin Zohren) Date: Fri, 10 May 2013 18:07:30 +0100 Subject: [maker-devel] annotation of birch genome Message-ID: <005e01ce4da0$db3cbce0$91b636a0$@qmul.ac.uk> Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker") I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long. It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it. 
Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4526 bytes Desc: not available URL: From mnuhn at ebi.ac.uk Fri May 10 11:35:37 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Fri, 10 May 2013 18:35:37 +0100 Subject: [maker-devel] Duplicated exons In-Reply-To: References: Message-ID: <518D2FE9.6080900@ebi.ac.uk> On 05/10/2013 05:25 PM, Carson Holt wrote: > Very odd. Which version of MAEKR are you using. Are you using GFF3 > passthrough in the run that generates the duplication? I am using version 2.27 of maker. I am not using the passthrough option. Cheers, Michael. > Thanks, > Carson > > > On 13-05-10 8:10 AM, "Michael Nuhn" wrote: > >> Hello Carson! >> >> I have been trying to get to the bottom of an error message when >> (re)training snap. Snap, or more precisely fathom, was giving me unclear >> error messages about misordered and overlapping exons. >> >> I have looked into the gff files from which these exons originate and >> noticed that a lot of exons in that file were duplicated. For example I >> have found these: >> >> LSalAtl2s75 maker exon 186317 186936 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> LSalAtl2s75 maker exon 187007 191531 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> >> and then about four hundred lines later there are these: >> >> LSalAtl2s75 maker exon 186317 186936 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> LSalAtl2s75 maker exon 187007 191531 . + . 
>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> >> which are identical except for the order number after "exon:". >> >> This seems to have happened to a lot of features in that file. >> >> How can I avoid this? Or if this is just a rare problem, can I have >> maker recompute the gff file without redoing all the computations again? >> >> Cheers, >> Michael. >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Fri May 10 11:25:15 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 13:25:15 -0400 Subject: [maker-devel] annotation of birch genome In-Reply-To: <005e01ce4da0$db3cbce0$91b636a0$@qmul.ac.uk> Message-ID: Really only 560 Mb (Pine is 20 Gb by comparison). The single longest step for MAKER, is alignment which is done via BLAST. So the evidence dataset tends to be what can be filtered to get to a reasonable size. Protein alignments take long as they must be aligned against the 3 translated reading frames of the genome (so minimum 3x longer than DNA2DNA alignment, but in practice much much more). Alt_EST is even worse, as it must translate all 3 reading frames of the genome and all 3 of the data to be aligned (TBLASTX type alignment). So minimum 3x longer than protein alignment or 9X times longer than DNA2DNA alignment (but in practice much more). So the single best thing to do to reduce run time is to use protein evidence where possible instead of alt_EST evidence, or to ESTs from the same species and limit the use of proteins (ESTs from the same species are aligned as DNA2DNA, so it is very fast). Set all the blast_depth parameters in the maker_bopts.ctl file to 20 or 30. This will help if you have a very deep evidence dataset, by trimming overly deep alignment regions (less exonerate polishing). 
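The depth change described above can be sketched as a small helper that rewrites the cutoffs in the text of maker_bopts.ctl. This is only a sketch under assumptions: the key names `depth_blastn`, `depth_blastx` and `depth_tblastx` are taken to be what "the blast_depth parameters" refers to, and the function name `set_depths` is made up for illustration. Check the names against the control file your own MAKER version generates before relying on it.

```python
import re

# Assumed names of the depth cutoffs in maker_bopts.ctl; verify against your own file.
DEPTH_KEYS = ("depth_blastn", "depth_blastx", "depth_tblastx")


def set_depths(ctl_text, depth=20):
    """Return ctl_text with every assumed depth key set to `depth`.

    Trailing comments on each line are preserved; all other lines are untouched.
    """
    out = []
    for line in ctl_text.splitlines():
        # key=value followed by an optional trailing comment
        m = re.match(r"^(\w+)=(\S*)(.*)$", line)
        if m and m.group(1) in DEPTH_KEYS:
            line = f"{m.group(1)}={depth}{m.group(3)}"
        out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read maker_bopts.ctl, pass the text through `set_depths(text, 30)`, and write the result back before starting the run.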
Also you can try running MAKER on 40 cpus rather than 20 (basically doubling up even though you only have 20). This can work because, even though you gave MAKER 20 cpus to use, all 20 will rarely be using 100% of each CPU simultaneously. So launching 40 threads will give a slight boost in many instances by filling in the gaps when "wait" operations let cpus idle for a fraction of a second. One good thing though, is that you only pay the price for data generation once. If you ever rerun with slightly modified parameters, MAKER is smart enough to reuse old results, so BLAST won't have to rerun. Thanks, Carson From: Jasmin Zohren Date: Friday, 10 May, 2013 1:07 PM To: Subject: [maker-devel] annotation of birch genome Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker") I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long. It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it.
Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 10 11:44:01 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 13:44:01 -0400 Subject: [maker-devel] annotation of birch genome In-Reply-To: Message-ID: Also, if you will be annotating more genomes, you should look into getting allocation on your university's cluster. Queen Mary University has a 2000 cpu cluster. Most cluster managers bend over backwards to help Biologists use their systems as it looks good on progress reports and funding requests as they can show they have a broader user base (i.e. departments other than physics :-) --Carson From: Carson Holt Date: Friday, 10 May, 2013 1:25 PM To: Jasmin Zohren , Subject: Re: [maker-devel] annotation of birch genome Really only 560 Mb (Pine is 20 Gb by comparison). The single longest step for MAKER, is alignment which is done via BLAST. So the evidence dataset tends to be what can be filtered to get to a reasonable size. Protein alignments take long as they must be aligned against the 3 translated reading frames of the genome (so minimum 3x longer than DNA2DNA alignment, but in practice much much more). Alt_EST is even worse, as it must translate all 3 reading frames of the genome and all 3 of the data to be aligned (TBLASTX type alignment). So minimum 3x longer than protein alignment or 9X times longer than DNA2DNA alignment (but in practice much more). 
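The rule of thumb above can be written out as arithmetic. The sketch below is only an illustration: it encodes the stated lower bounds (1x for same-species ESTs, 3x for proteins, 9x for alt_EST), and the function name and the evidence labels are made up for this example. Actual run times are, as noted, usually much worse than these minimums.

```python
# Number of translated reading frames counted in the rule of thumb above.
FRAMES = 3


def min_relative_cost(evidence_type):
    """Lower-bound cost multiplier relative to a DNA-to-DNA alignment."""
    if evidence_type == "est":       # same-species ESTs: DNA2DNA
        return 1
    if evidence_type == "protein":   # genome translated, protein as given
        return FRAMES
    if evidence_type == "altest":    # genome and evidence both translated
        return FRAMES * FRAMES
    raise ValueError(f"unknown evidence type: {evidence_type}")


for kind in ("est", "protein", "altest"):
    print(kind, "at least", min_relative_cost(kind), "x DNA2DNA")
```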
So the single best thing to do to reduce run time is to use protein evidence where possible instead of alt_EST evidence, or to ESTs from the same species and limit the use of proteins (ESTs from the same species are aligned as DNA2DNA, so it is very fast). Set all the blast_depth parameters in the maker_bopts.ctl file to 20 or 30. This will help if you have a very deep evidence dataset, by trimming overly deep alignment regions (less exonerate polishing). Also you can try running MAKER on 40 cpus rather than 20 (basically doubling up even though you only have 20). This can work because, even though you gave MAKER 20 cpus to use, all 20 will rarely be using 100% of each CPU simultaneously. So launching 40 threads will give a slight boost in many instances by filling in the gaps when "wait" operations let cpus idle for a fraction of a second. One good thing though, is that you only pay the price for data generation once. If you ever rerun with slightly modified parameters, MAKER is smart enough to reuse old results, so BLAST won't have to rerun. Thanks, Carson From: Jasmin Zohren Date: Friday, 10 May, 2013 1:07 PM To: Subject: [maker-devel] annotation of birch genome Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker") I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long.
It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it. Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From diana_leduc at eva.mpg.de Fri May 10 11:41:55 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Fri, 10 May 2013 19:41:55 +0200 (CEST) Subject: [maker-devel] Maker consensus In-Reply-To: References: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de> Hi Carson, Thank you for the quick answer. I ran gff3_merge to merge all the gff files and this resulted in a gff file, which has these type of fields: scaffold32239 blastx protein_match 22905 34500 174 + . ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; scaffold32239 blastx match_part 22905 23045 174 + . ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47; In comparison to the dpp_contig test file, I am missing est2genome evidence, most probably because my est data set is pretty poor. I have blastx and protein2genome evidence though. My goal is to extract the genes that could be annotated on the scaffolds.
In the gff files the hits overlap most of the times, I can visualize this properly in apollo: for example one scaffold hits DSCAML gene in both zebrafinch and chicken, but extracting the coordinates between which this scaffold fits this annotated gene is difficult from the gff. Manually curating the genes is also not an option, since I am trying to do this for a 1.7Gb genome. I hope this explains better what we are after. Thank you once again. Best regards, Diana On May 10, 2013 at 6:13 PM Carson Holt wrote: > I'm sorry I don?t' understand question 1. You are you missing resulting > fasta files, correct? Did your resulting GFF3 file have any features of type > "gene"? Did you run fasta_merge after running gff3_merge? > > Could you give me more details on what you are trying to do, so I can take a > stab at question 2 as well. > > Thanks, > Carson > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 10:44 AM > To: < maker-devel at yandell-lab.org > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > >, Janet Kelso < kelso at eva.mpg.de > >, Torsten Schoeneberg < > torsten.schoeneberg at medizin.uni-leipzig.de > > > Subject: [maker-devel] Maker consensus > > > Dear maker developers, > > I am a phD student working on de novo assembly and annotation of a bird > genome. I used Maker as annotation pipeline, which ran very well, and I > obtained different annotations with evidence from Augustus gene predictor, > small EST dataset from my organism and protein sequences from chicken, turkey > and zebrafinch. I could combine the different gff files from different > scaffolds into one gff file with annotations for the entire genome. > > I now have two questions: > > 1. What could be the reason that I haven't gotten the protein.fasta and > trancript.fasta files > > 2. How can I obtain a consensus gene list of different evidences from maker? 
> What I would actually need is the scaffold, coordinates and annotation (gene > name) according to the 3 other bird species. > > Thank you in advance. > > Best regards, > > Diana Le Duc > > -- > > Max Planck Institute for Evolutionary Anthropology > Department of Evolutionary Genetics > Deutscher Platz 6 > D-04103 Leipzig > > Phone +49 (0)341-3550-554 > www.eva.mpg.de > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 10 11:51:48 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 13:51:48 -0400 Subject: [maker-devel] Maker consensus In-Reply-To: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: Ok. You just ran the evidence and didn't give a gene predictor. You need to provide an HMM file for SNAP or a species for Augustus, or for rough annotations you can set protein2genome=1 and est2genome=1. This will try to generate models directly from the alignments. If you provide a gene predictor, then MAKER can talk to it about the evidence alignments so it can make a best gene call for the region. Then there will be gene/mRNA/exon models in the GFF3 file and entries in the proteins.fasta and transcripts.fasta files. If you need to train a predictor, you can train SNAP using the maker2zff script and the SNAP documentation or the MAKER GMOD tutorial. If you want to train Augustus, Jason Stajich wrote an excellent explanation as well as tools in a previous list message.
list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html Script is in this github repo - https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl Thanks, Carson From: Diana LeDuc Reply-To: Diana LeDuc Date: Friday, 10 May, 2013 1:41 PM To: , Carson Holt Cc: Torsten Schoeneberg , Gabriel Renaud , Janet Kelso Subject: Re: [maker-devel] Maker consensus Hi Carson, Thank you for the quick answer. I ran gff3_merge to merge all the gff files and this resulted in a gff file, which has these type of fields: scaffold32239 blastx protein_match 22905 34500 174 + . ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; scaffold32239 blastx match_part 22905 23045 174 + . ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47; In comparison to the dpp_contig test file, I am missing est2genome evidence, most probably because my est data set is pretty poor. I have blastx and protein2genome evidence though. My goal is to extract the genes that could be annotated on the scaffolds. In the gff files the hits overlap most of the times, I can visualize this properly in apollo: for example one scaffold hits DSCAML gene in both zebrafinch and chicken, but extracting the coordinates between which this scaffold fits this annotated gene is difficult from the gff. Manually curating the genes is also not an option, since I am trying to do this for a 1.7Gb genome. I hope this explains better what we are after. Thank you once again. Best regards, Diana On May 10, 2013 at 6:13 PM Carson Holt wrote: > > I'm sorry I don't understand question 1. You are you missing resulting > fasta files, correct? Did your resulting GFF3 file have any features of type > "gene"? Did you run fasta_merge after running gff3_merge?
> > > > Could you give me more details on what you are trying to do, so I can take a > stab at question 2 as well. > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de> > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> > Date: Friday, 10 May, 2013 10:44 AM > To: < maker-devel at yandell-lab.org> > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < > kelso at eva.mpg.de>, Torsten Schoeneberg < > torsten.schoeneberg at medizin.uni-leipzig.de> > Subject: [maker-devel] Maker consensus > > > > > > > > Dear maker developers, > > > I am a phD student working on de novo assembly and annotation of a bird > genome. I used Maker as annotation pipeline, which ran very well, and I > obtained different annotations with evidence from Augustus gene predictor, > small EST dataset from my organism and protein sequences from chicken, turkey > and zebrafinch. I could combine the different gff files from different > scaffolds into one gff file with annotations for the entire genome. > > > I now have two questions: > > > 1. What could be the reason that I haven't gotten the protein.fasta and > trancript.fasta files > > > 2. How can I obtain a consensus gene list of different evidences from maker? > What I would actually need is the scaffold, coordinates and annotation (gene > name) according to the 3 other bird species. > Thank you in advance. > > > > Best regards, > > > > Diana Le Duc > > > > -- > > Max Planck Institute for Evolutionary Anthropology > Department of Evolutionary Genetics > Deutscher Platz 6 > D-04103 Leipzig > > Phone +49 (0)341-3550-554 > www.eva.mpg.de > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Fri May 10 12:08:35 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 14:08:35 -0400 Subject: [maker-devel] Duplicated exons In-Reply-To: <518D2FE9.6080900@ebi.ac.uk> Message-ID: 2.27 from the website download or the SVN devel version? Thanks, Carson On 13-05-10 1:35 PM, "Michael Nuhn" wrote: >On 05/10/2013 05:25 PM, Carson Holt wrote: >> Very odd. Which version of MAEKR are you using. Are you using GFF3 >> passthrough in the run that generates the duplication? > >I am using version 2.27 of maker. I am not using the passthrough option. > >Cheers, >Michael. > >> Thanks, >> Carson >> >> >> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >> >>> Hello Carson! >>> >>> I have been trying to get to the bottom of an error message when >>> (re)training snap. Snap, or more precisely fathom, was giving me >>>unclear >>> error messages about misordered and overlapping exons. >>> >>> I have looked into the gff files from which these exons originate and >>> noticed that a lot of exons in that file were duplicated. For example I >>> have found these: >>> >>> LSalAtl2s75 maker exon 186317 186936 . + . >>> >>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s >>>75 >>> -snap-gene-2.15-mRNA-1 >>> LSalAtl2s75 maker exon 187007 191531 . + . >>> >>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s >>>75 >>> -snap-gene-2.15-mRNA-1 >>> >>> and then about four hundred lines later there are these: >>> >>> LSalAtl2s75 maker exon 186317 186936 . + . >>> >>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s >>>75 >>> -snap-gene-2.15-mRNA-1 >>> LSalAtl2s75 maker exon 187007 191531 . + . >>> >>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s >>>75 >>> -snap-gene-2.15-mRNA-1 >>> >>> which are identical except for the order number after "exon:". >>> >>> This seems to have happened to a lot of features in that file. >>> >>> How can I avoid this? 
Or if this is just a rare problem, can I have >>> maker recompute the gff file without redoing all the computations >>>again? >>> >>> Cheers, >>> Michael. >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > From carsonhh at gmail.com Fri May 10 12:29:32 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 14:29:32 -0400 Subject: [maker-devel] Maker consensus In-Reply-To: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: You can use any species Augustus already has. If it doesn't have yours, then you train it yourself. The species folder sits under the directory that the AUGUSTUS_CONFIG_PATH environment variable points to, and is usually .../augustus/config/species Thanks, Carson From: Diana LeDuc Reply-To: Diana LeDuc Date: Friday, 10 May, 2013 2:16 PM To: , Carson Holt Cc: Torsten Schoeneberg , Gabriel Renaud , Janet Kelso Subject: Re: [maker-devel] Maker consensus Hi Carson, In maker_exe.ctl I would have to provide the path to augustus. Augustus has a training set for chicken that I would use. Is it possible to specify the species I want to use, or is the only way to train Augustus myself? Thank you! Best, Diana On May 10, 2013 at 7:51 PM Carson Holt wrote: > > Ok. You just ran the evidence and didn't give a gene predictor. You need to > provide an HMM file for SNAP or a species for Augustus, or for rough annotations > you can set protein2genome=1 and est2genome=1. This will try to generate > models directly from the alignments. > > > > If you provide a gene predictor, then MAKER can talk to it about the evidence > alignments so it can make a best gene call for the region. Then there will be > gene/mRNA/exon models in the GFF3 file and entries in the proteins.fasta and > transcripts.fasta.
If you need to train a predictor, you can train SNAP using > the maker2zff script and the SNAP documentation or maker GMOD tutorial. If > you want to train augustus Jason Stajich wrote an excellent explanation as > well as tools in a previous list message. > > > > > list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html > > Script is in this github repo - > > https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2au > gustus_gbk.pl > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de> > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> > Date: Friday, 10 May, 2013 1:41 PM > To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com> > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>, > Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < kelso at eva.mpg.de> > Subject: Re: [maker-devel] Maker consensus > > > > > > Hi Carson, > > > > Thank you for the quick answer. > > I ran gff3_merge to merge all the gff files and this resulted in a gff file, > which has these type of fields: > > scaffold32239 blastx protein_match 22905 34500 174 + . > ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1 > -2039; > scaffold32239 blastx match_part 22905 23045 174 + . > ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000 > 000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT000000 > 00219|DSCAML1-2039 172 218;Gap=M47; > > In comparison to the dpp_contig test file, I am missing est2genome evidence, > most probably because my est data set is pretty poor. I have blastx and > protein2genome evidence though. > > > > My goal is to extract the genes that could be annotated on the scaffolds. 
In > the gff files the hits overlap most of the times, I can visualize this > properly in apollo: for example one scaffold hits DSCAML gene in both > zebrafinch and chicken, but extracting the coordinates between which this > scaffold fits this annotated gene is difficult from the gff. Manually curating > the genes is also not an option, since I am trying to do this for a 1.7Gb > genome. > > > > I hope this explains better what we are after. > > > > Thank you once again. > > > > Best regards, > > > > Diana > On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com> wrote: > > >> >> I'm sorry I don?t' understand question 1. You are you missing resulting >> fasta files, correct? Did your resulting GFF3 file have any features of type >> "gene"? Did you run fasta_merge after running gff3_merge? >> >> >> >> Could you give me more details on what you are trying to do, so I can take a >> stab at question 2 as well. >> >> >> >> Thanks, >> >> Carson >> >> >> >> >> >> >> >> From: Diana LeDuc < diana_leduc at eva.mpg.de> >> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> >> Date: Friday, 10 May, 2013 10:44 AM >> To: < maker-devel at yandell-lab.org> >> Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < >> kelso at eva.mpg.de>, Torsten Schoeneberg < >> torsten.schoeneberg at medizin.uni-leipzig.de> >> Subject: [maker-devel] Maker consensus >> >> >> >> >> >> >> >> Dear maker developers, >> >> >> I am a phD student working on de novo assembly and annotation of a bird >> genome. I used Maker as annotation pipeline, which ran very well, and I >> obtained different annotations with evidence from Augustus gene predictor, >> small EST dataset from my organism and protein sequences from chicken, turkey >> and zebrafinch. I could combine the different gff files from different >> scaffolds into one gff file with annotations for the entire genome. >> >> >> I now have two questions: >> >> >> 1. 
What could be the reason that I haven't gotten the protein.fasta and >> trancript.fasta files >> >> >> 2. How can I obtain a consensus gene list of different evidences from maker? >> What I would actually need is the scaffold, coordinates and annotation (gene >> name) according to the 3 other bird species. >> Thank you in advance. >> >> >> >> Best regards, >> >> >> >> Diana Le Duc >> >> >> >> -- >> >> Max Planck Institute for Evolutionary Anthropology >> Department of Evolutionary Genetics >> Deutscher Platz 6 >> D-04103 Leipzig >> >> Phone +49 (0)341-3550-554 >> www.eva.mpg.de >> >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnuhn at ebi.ac.uk Fri May 10 18:29:10 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Sat, 11 May 2013 01:29:10 +0100 Subject: [maker-devel] Duplicated exons In-Reply-To: References: Message-ID: <518D90D6.4080603@ebi.ac.uk> On 05/10/2013 07:08 PM, Carson Holt wrote: > 2.27 from the website download or the SVN devel version? SVN. I checked it out on 19/03/2013. > Thanks, > Carson > > > On 13-05-10 1:35 PM, "Michael Nuhn" wrote: > >> On 05/10/2013 05:25 PM, Carson Holt wrote: >>> Very odd. Which version of MAEKR are you using. Are you using GFF3 >>> passthrough in the run that generates the duplication? >> >> I am using version 2.27 of maker. I am not using the passthrough option. >> >> Cheers, >> Michael. >> >>> Thanks, >>> Carson >>> >>> >>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >>> >>>> Hello Carson! >>>> >>>> I have been trying to get to the bottom of an error message when >>>> (re)training snap. Snap, or more precisely fathom, was giving me >>>> unclear >>>> error messages about misordered and overlapping exons. 
>>>> >>>> I have looked into the gff files from which these exons originate and >>>> noticed that a lot of exons in that file were duplicated. For example I >>>> have found these: >>>> >>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>> >>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s >>>> 75 >>>> -snap-gene-2.15-mRNA-1 >>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>> >>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s >>>> 75 >>>> -snap-gene-2.15-mRNA-1 >>>> >>>> and then about four hundred lines later there are these: >>>> >>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>> >>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s >>>> 75 >>>> -snap-gene-2.15-mRNA-1 >>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>> >>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s >>>> 75 >>>> -snap-gene-2.15-mRNA-1 >>>> >>>> which are identical except for the order number after "exon:". >>>> >>>> This seems to have happened to a lot of features in that file. >>>> >>>> How can I avoid this? Or if this is just a rare problem, can I have >>>> maker recompute the gff file without redoing all the computations >>>> again? >>>> >>>> Cheers, >>>> Michael. >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> > > From dsth at ebi.ac.uk Fri May 10 18:20:42 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Sat, 11 May 2013 01:20:42 +0100 Subject: [maker-devel] Duplicated exons Message-ID: That is odd. i've run that version of maker 30-40x at ebi lately and never seen it. Is it just one scaffold? While i'd be surprised if it's the cause but have you been playing with the file locking options Carson mentioned a while back? I'd definitely be inclined to re-process it if it's just the one scaffold. 
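One way to check whether the duplication is confined to a single scaffold is to tally repeated exon records per sequence. What follows is only a minimal sketch and not part of MAKER: the function name duplicated_exons_per_seqid is hypothetical, the input is assumed to be tab-separated GFF3, and a "duplicate" is assumed to mean an exon that shares sequence, coordinates, strand and Parent with an earlier record (the ID is ignored, since only the exon number differs).

```python
from collections import Counter


def duplicated_exons_per_seqid(gff_lines):
    """Count exon records that repeat an earlier exon with the same
    (seqid, start, end, strand, Parent). Hypothetical helper, not MAKER code."""
    seen = set()
    dup_counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only well-formed exon rows are considered
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        key = (cols[0], cols[3], cols[4], cols[6], attrs.get("Parent"))
        if key in seen:
            dup_counts[cols[0]] += 1
        else:
            seen.add(key)
    return dict(dup_counts)


# The four records quoted in this thread, rebuilt with tab separators.
parent = "maker-LSalAtl2s75-snap-gene-2.15-mRNA-1"
sample = [
    f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={parent}:exon:3;Parent={parent}",
    f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={parent}:exon:4;Parent={parent}",
    f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={parent}:exon:1;Parent={parent}",
    f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={parent}:exon:2;Parent={parent}",
]
print(duplicated_exons_per_seqid(sample))  # {'LSalAtl2s75': 2}
```

Run over a merged GFF3 file, a result with a single key would suggest that only one scaffold needs re-processing.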
Dan On May 10, 2013 12:45 PM, "Michael Nuhn" wrote: > > Hello Carson! > > I have been trying to get to the bottom of an error message when (re)training snap. Snap, or more precisely fathom, was giving me unclear error messages about misordered and overlapping exons. > > I have looked into the gff files from which these exons originate and noticed that a lot of exons in that file were duplicated. For example I have found these: > > LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 > LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 > > and then about four hundred lines later there are these: > > LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 > LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 > > which are identical except for the order number after "exon:". > > This seems to have happened to a lot of features in that file. > > How can I avoid this? Or if this is just a rare problem, can I have maker recompute the gff file without redoing all the computations again? > > Cheers, > Michael. > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From diana_leduc at eva.mpg.de Fri May 10 12:16:34 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Fri, 10 May 2013 20:16:34 +0200 (CEST) Subject: [maker-devel] Maker consensus In-Reply-To: References: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de> Hi Carson, In maker_exe.ctl I would have to provide the path to augustus. Augustus has a training set for chicken that I would use. Is it possible to specify the species i want to use, or the only way is training Augustus myself? Thank you! Best, Diana On May 10, 2013 at 7:51 PM Carson Holt wrote: > Ok. You just ran the evidence and didn't give a gene predictor. You need to > provide an HMM file for SNAP a species for augustus, or for rough annotations > you can set protein3genome=1 and est2genome=1. This will try and generate > models direct from the alignments. > > If you provide a gene predictor, then MAKER can talk to it about the evidence > alignments so it can make a best gene call for the region. Then there will be > gene/mRNA/exon model in the GFF3 file and entires in the proteins.fasta and > transcripts.fasta. If you need to train a predictor, you can train SNAP using > the maker2zff script and the SNAP documentation or maker GMOD tutorial. If > you want to train augustus Jason Stajich wrote an excellent explanation as > well as tools in a previous list message. 
> > list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html > > Script is in this github repo - > > https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl > > > Thanks, > Carson > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 1:41 PM > To: < maker-devel at yandell-lab.org >, > Carson Holt < carsonhh at gmail.com > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > >, Gabriel Renaud < > gabriel_renaud at eva.mpg.de >, Janet Kelso < > kelso at eva.mpg.de > > Subject: Re: [maker-devel] Maker consensus > > Hi Carson, > > Thank you for the quick answer. > I ran gff3_merge to merge all the gff files and this resulted in a gff file, > which has these type of fields: > scaffold32239 blastx protein_match 22905 34500 174 + . > ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; > scaffold32239 blastx match_part 22905 23045 174 + . > ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 > 172 218;Gap=M47; > In comparison to the dpp_contig test file, I am missing est2genome evidence, > most probably because my est data set is pretty poor. I have blastx and > protein2genome evidence though. > > My goal is to extract the genes that could be annotated on the scaffolds. In > the gff files the hits overlap most of the times, I can visualize this > properly in apollo: for example one scaffold hits DSCAML gene in both > zebrafinch and chicken, but extracting the coordinates between which this > scaffold fits this annotated gene is difficult from the gff. Manually curating > the genes is also not an option, since I am trying to do this for a 1.7Gb > genome. > > I hope this explains better what we are after. > > Thank you once again. 
> > Best regards, > > Diana > On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com > > wrote: > > > > I'm sorry I don?t' understand question 1. You are you missing > > > resulting fasta files, correct? Did your resulting GFF3 file have any > > > features of type "gene"? Did you run fasta_merge after running > > > gff3_merge? > > > > Could you give me more details on what you are trying to do, so I can take > > a stab at question 2 as well. > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Date: Friday, 10 May, 2013 10:44 AM > > To: < maker-devel at yandell-lab.org > > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > > >, Janet Kelso < kelso at eva.mpg.de > > >, Torsten Schoeneberg < > > torsten.schoeneberg at medizin.uni-leipzig.de > > > > > Subject: [maker-devel] Maker consensus > > > > > > Dear maker developers, > > > > I am a phD student working on de novo assembly and annotation of a bird > > genome. I used Maker as annotation pipeline, which ran very well, and I > > obtained different annotations with evidence from Augustus gene predictor, > > small EST dataset from my organism and protein sequences from chicken, > > turkey and zebrafinch. I could combine the different gff files from > > different scaffolds into one gff file with annotations for the entire > > genome. > > > > I now have two questions: > > > > 1. What could be the reason that I haven't gotten the protein.fasta and > > trancript.fasta files > > > > 2. How can I obtain a consensus gene list of different evidences from > > maker? What I would actually need is the scaffold, coordinates and > > annotation (gene name) according to the 3 other bird species. > > > > Thank you in advance. 
> > > > Best regards, > > > > Diana Le Duc > > > > -- > > > > Max Planck Institute for Evolutionary Anthropology > > Department of Evolutionary Genetics > > Deutscher Platz 6 > > D-04103 Leipzig > > > > Phone +49 (0)341-3550-554 > > www.eva.mpg.de > > _______________________________________________ maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From linzkl007 at hotmail.com Sat May 11 11:28:47 2013 From: linzkl007 at hotmail.com (=?gb2312?B?7OTs5A==?=) Date: Sun, 12 May 2013 01:28:47 +0800 Subject: [maker-devel] about predictor training Message-ID: Hi, I'm trying to use MAKER to annotate the new genome sequence which I assembled by myseft. I used TopHat and Cufflinks to align the sequence based on the RNA-seq we have. Based on the tutorial of MAKER, I may need three fasta format file including assembly data, ESTs and protein database to train the SNAP. I may use SwissProt as the protein database. Can I use the gtf result from Cufflinks directly as an ESTs during the training? Another is, if I want to use Augustus to do the ab initio gene prediction, do I need to do the same way as SNAP? Cause I saw some posts that the result from ab initio would be used as the evidence to train the predictor. Can I ask is there has some order doing the prediction in different predictor? Thank you so much for you help. Lin -------------- next part -------------- An HTML attachment was scrubbed... URL: From kangyangjae at gmail.com Sun May 12 21:53:34 2013 From: kangyangjae at gmail.com (Kang, Yang Jae) Date: Mon, 13 May 2013 12:53:34 +0900 Subject: [maker-devel] exon numbering bug? Message-ID: <070c01ce4f8d$73862fc0$5a928f40$@gmail.com> Hello I want to check this is bug or my misunderstanding. The following is the gff3 result of maker pipeline. I think those red marks should be mRNA-2. 
This type of error was found only at exon

scaffold_22 maker mRNA 604856 612126 . + . ID=211342;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2;Parent=211320
scaffold_22 maker exon 604856 605185 0.51 + . ID=211343;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2788;Parent=211342
scaffold_22 maker exon 608362 608456 0.51 + . ID=211344;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2791;Parent=211342
scaffold_22 maker exon 610193 610286 0.51 + . ID=211345;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2792;Parent=211342
scaffold_22 maker exon 610583 610714 0.51 + . ID=211346;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2793;Parent=211342
scaffold_22 maker exon 610838 610942 0.51 + . ID=211347;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2794;Parent=211342
scaffold_22 maker exon 611458 612126 0.51 + . ID=211348;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2795;Parent=211342
scaffold_22 maker five_prime_UTR 604856 604972 . + . ID=211349;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR1;Parent=211342
scaffold_22 maker CDS 604973 605185 . + 0 ID=211350;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2905;Parent=211342
scaffold_22 maker CDS 608362 608456 . + 0 ID=211351;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2906;Parent=211342
scaffold_22 maker CDS 610193 610286 . + 1 ID=211352;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2907;Parent=211342
scaffold_22 maker CDS 610583 610714 . + 0 ID=211353;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2908;Parent=211342
scaffold_22 maker CDS 610838 610942 . + 0 ID=211354;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2909;Parent=211342
scaffold_22 maker CDS 611458 611661 . + 0 ID=211355;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2910;Parent=211342
scaffold_22 maker three_prime_UTR 611662 612126 . + . ID=211356;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR2;Parent=211342
scaffold_22 maker start_codon 604973 604975 . + . ID=211357;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:start1;Parent=211342
scaffold_22 maker stop_codon 611659 611661 . + . ID=211358;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:stop2;Parent=211342

Thank you!

Kang, Yang Jae Ph.D.
Cropgenomics Lab.
College of Agriculture and Life Science
Seoul National University
Korea

From carsonhh at gmail.com Sun May 12 22:01:41 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 00:01:41 -0400
Subject: [maker-devel] exon numbering bug?
In-Reply-To: <070c01ce4f8d$73862fc0$5a928f40$@gmail.com>
Message-ID:

There has been some post-processing of the GFF3. It is not an original MAKER result file. I can tell based on the IDs (MAKER doesn't assign numerical IDs). Most likely it was processed to make exons unique without having dual parentage. Normally, if the same exon is found in two transcripts, it will have two parents separated by a comma. I imagine that the post-processing script duplicated the exon, creating independent IDs, and split the parents, but left the Name= tag the same. Since the Name= tag was based off of the first transcript the exon belonged to, it stayed the same.

--Carson

From carsonhh at gmail.com Mon May 13 08:00:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 10:00:01 -0400
Subject: [maker-devel] about predictor training
In-Reply-To:
Message-ID:

You need to convert the GTF files to GFF3. There are tophat2gff and cufflinks2gff scripts that come with MAKER. I recommend using only the cufflinks results and ignoring the tophat results, though, as they tend to be a lot more spurious.

Jason Stajich wrote an excellent explanation on training Augustus on the list previously -
http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html

He also included scripts to assist with the training -
https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

Overall the strategy is similar to the one used to train SNAP.

Thanks,
Carson

From: ??
Date: Saturday, 11 May, 2013 1:28 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] about predictor training

Hi, I'm trying to use MAKER to annotate the new genome sequence which I assembled by myseft. I used TopHat and Cufflinks to align the sequence based on the RNA-seq we have. Based on the tutorial of MAKER, I may need three fasta format file including assembly data, ESTs and protein database to train the SNAP. I may use SwissProt as the protein database. Can I use the gtf result from Cufflinks directly as an ESTs during the training? Another is, if I want to use Augustus to do the ab initio gene prediction, do I need to do the same way as SNAP? Cause I saw some posts that the result from ab initio would be used as the evidence to train the predictor.
Can I ask is there has some order doing the prediction in different predictor?

Thank you so much for you help.

Lin

From carsonhh at gmail.com Mon May 13 08:01:58 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 10:01:58 -0400
Subject: [maker-devel] Duplicated exons
In-Reply-To: <518D90D6.4080603@ebi.ac.uk>
Message-ID:

Could you send me your maker opts files, the contig that fails, and the evidence files you use for that contig.

Thanks,
Carson

From mnuhn at ebi.ac.uk Mon May 13 10:30:36 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Mon, 13 May 2013 17:30:36 +0100
Subject: [maker-devel] Duplicated exons
In-Reply-To:
References:
Message-ID: <5191152C.5030103@ebi.ac.uk>

Hello Carson!

On 05/13/2013 03:01 PM, Carson Holt wrote:
> Could you send me your maker opts files, the contig that fails, and the
> evidence files you use for that contig.

Thanks for offering your help.

I worked around the problem this morning by removing all exons from the training set for which I was getting the error.
Now I'm rerunning maker and I can't find any gff files at the moment with this problem.

If the problem reappears, I'll send you the files.

Cheers,
Michael.

From carsonhh at gmail.com Mon May 13 10:07:13 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 13 May 2013 12:07:13 -0400
Subject: [maker-devel] Duplicated exons
In-Reply-To: <5191152C.5030103@ebi.ac.uk>
Message-ID:

Ok.

Thanks,
Carson

From rob.syme at gmail.com Tue May 14 00:54:18 2013
From: rob.syme at gmail.com (Rob Syme)
Date: Tue, 14 May 2013 14:54:18 +0800
Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy
Message-ID:

Hi all

I'm trying to get mpi_maker up and running. I've installed the latest version of MPICH from mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz, making sure to "./configure --enable-shared"

Everything seems to install without trouble, but running

mpiexec -n 1 mpi_maker

gives:

/usr/bin/perl: symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy

Does anybody here know how to fix this? Do I need to downgrade to an older version of MPICH?

Thanks!

Rob Syme
PhD Student
Curtin University

From carsonhh at gmail.com Tue May 14 05:20:00 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 14 May 2013 07:20:00 -0400
Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy
In-Reply-To:
Message-ID:

You have to use MPICH2; the new MPICH3 is not compatible. MPI version 3 is a completely new protocol implemented in MPICH3, and it breaks MAKER. You can also use OpenMPI with MAKER version 2.27.
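The constraint above could be turned into a quick check before building. This is only a sketch under stated assumptions: the helper names mpich_generation and usable_with_maker_2_27 are hypothetical, the rule simply encodes the statement that MPICH2 works with MAKER 2.27 and MPICH3 does not, and the MPICH2 tarball name in the example is an assumed file name rather than one given in this thread.

```python
import re


def mpich_generation(tarball_name):
    """Guess the MPICH generation from a tarball name (hypothetical helper).

    Returns 2 for names starting with 'mpich2-', the major version number
    for names like 'mpich-3.0.4.tar.gz', and None if the name is not recognised.
    """
    match = re.match(r"^(mpich2?)-(\d+)\.", tarball_name)
    if match is None:
        return None
    product, major = match.group(1), int(match.group(2))
    if product == "mpich2":
        return 2  # MPICH2 releases carried their own 1.x version numbers
    return major


def usable_with_maker_2_27(tarball_name):
    """True only for MPICH2, following the advice given in this thread."""
    return mpich_generation(tarball_name) == 2


print(usable_with_maker_2_27("mpich-3.0.4.tar.gz"))     # False
print(usable_with_maker_2_27("mpich2-1.4.1p1.tar.gz"))  # True (assumed file name)
```

The check looks only at the file name, so it says nothing about OpenMPI or about a library that is already installed.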
Thanks, Carson From: Rob Syme Date: Tuesday, 14 May, 2013 2:54 AM To: Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Hi all I'm trying to get mpi_maker up and running. I've installed the latest version of MPICH from mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz , making sure to "./configure --enable-shared" Everything seems to install without trouble, but running mpiexec -n 1 mpi_maker gives: /usr/bin/perl: symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Does anybody here know how to fix this? Do I need to downgrade to an older version of MPICH? Thanks! Rob Syme PhD Student Curtin University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From heywood at cshl.edu Tue May 14 14:42:33 2013 From: heywood at cshl.edu (Heywood, Todd) Date: Tue, 14 May 2013 20:42:33 +0000 Subject: [maker-devel] MPI MAKER hanging NFS Message-ID: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu> We have been getting hung NFS mounts on some nodes when running MPI MAKER (version 2.27). Processes go into a "D" state and cannot be killed. We end up having to reboot nodes to recover them. We are running MPICH2 version 1.4.1p1 with RHEL 6.3. Questions: (1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung on a sync_page system call under NFS. That *might* imply some locking issues. (2) Has anyone else seen this? (3) The root directory (parent of genome.maker.output directory) has lots of mpi***** files, all of which have the first line "pst0Process::MpiChunk". Is this expected? I'm able to reproducibly hang NFS on some nodes when using at least 4 32-core nodes and 128 running MPI tasks. 
Thanks,

Todd Heywood
CSHL

From Carson.Holt at oicr.on.ca Tue May 14 19:01:00 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Wed, 15 May 2013 01:01:00 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu>
Message-ID:

No, it does not use ROMIO.

The locking may be due to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and only like large infrequent writes and frequent reads. MAKER also uses a variant of the File::NFSLock module, which uses hard links to force a flush of the NFS IO cache when asynchronous IO is enabled (described here http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality.

Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS-mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS-safe and need a local location to work with (anything involving Berkeley DB or SQLite, for example). Make sure you are not setting that to an NFS-mounted scratch location. The mpi**** files are examples of short-lived files that should not be in NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finishes or is killed by standard signals, such as when you hit ^C or use kill -15.

Thanks,
Carson

On 13-05-14 4:42 PM, "Heywood, Todd" wrote:

>We have been getting hung NFS mounts on some nodes when running MPI MAKER
>(version 2.27). Processes go into a "D" state and cannot be killed. We
>end up having to reboot nodes to recover them. We are running MPICH2
>version 1.4.1p1
>with RHEL 6.3. Questions:
>
>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>on a sync_page system call under NFS. That *might* imply some locking
>issues.
> >(2) Has anyone else seen this? > >(3) The root directory (parent of genome.maker.output directory) has lots >of mpi***** files, all of which have the first line >"pst0Process::MpiChunk". Is this expected? > >I'm able to reproducibly hang NFS on some nodes when using at least 4 >32-core nodes and 128 running MPI tasks. > >Thanks, > >Todd Heywood >CSHL > > From eernst at cshl.edu Wed May 15 11:08:08 2013 From: eernst at cshl.edu (Evan Ernst) Date: Wed, 15 May 2013 13:08:08 -0400 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: References: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu> Message-ID: Hi Carson, For these runs, -TMP is set to the $TMPDIR environment variable via maker command line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected. In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster nfs mount than the working dir where the mpi**** files are being written. Thanks, Evan On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote: > No it does not use ROMIO. > > The locking may be do to how your NFS is implemented. MAKER does a lot of > small writes. Some NFS implementations do not handle that well and only > like large infrequent writes and frequent reads? > MAKER also uses a variant of the File:::NFSLock module which uses > hardlinks to force a flush of the NFS IO cache when asyncrynous IO is > enabled (described here > http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). > I know that the FhGFS implementation of NFS has broken hard link > functionality. > > > Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS > mounted location. It must be local (/tmp for example). 
This is because > certain types of operations are not always NFS safe and need a local > location to work with (anything involving berkley DB or SQLite for > example). Make sure you are not setting that to an NFS mounted scratch > location. The mpi**** files, are examples of some short lived files that > should not be in NFS. They hold chunks of data from threads that are > processing the genome and are very rapidly created and deleted. They will > be cleaned up automatically when maker finished or killed by standard > signals such as when you hit ^C or use kill 15. > > > Thanks, > Carson > > > > > On 13-05-14 4:42 PM, "Heywood, Todd" wrote: > > >We have been getting hung NFS mounts on some nodes when running MPI MAKER > >(version 2.27). Processes go into a "D" state and cannot be killed. We > >end up having to reboot nodes to recover them. We are running MPICH2 > >version 1.4.1p1 > >with RHEL 6.3. Questions: > > > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung > >on a sync_page system call under NFS. That *might* imply some locking > >issues. > > > >(2) Has anyone else seen this? > > > >(3) The root directory (parent of genome.maker.output directory) has lots > >of mpi***** files, all of which have the first line > >"pst0Process::MpiChunk". Is this expected? > > > >I'm able to reproducibly hang NFS on some nodes when using at least 4 > >32-core nodes and 128 running MPI tasks. > > > >Thanks, > > > >Todd Heywood > >CSHL > > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Carson.Holt at oicr.on.ca Wed May 15 11:15:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 15 May 2013 17:15:52 +0000 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: Message-ID: The mpi**** files should be generated in the $TMPDIR or TMP= location. If they are happening in the working directory, then there is a problem. If you are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is launched. You may have to manually specify that it needs to be exported to the other nodes using the mpiexec command line flags. OpenMPI for example does not export all environmental variables by default to the other nodes. Thanks, Carson From: Evan Ernst > Date: Wednesday, 15 May, 2013 1:08 PM To: Carson Holt > Cc: "Heywood, Todd" >, "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] MPI MAKER hanging NFS Hi Carson, For these runs, -TMP is set to the $TMPDIR environment variable via maker command line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected. In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster nfs mount than the working dir where the mpi**** files are being written. Thanks, Evan On Tue, May 14, 2013 at 9:01 PM, Carson Holt > wrote: No it does not use ROMIO. The locking may be do to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and only like large infrequent writes and frequent reads? MAKER also uses a variant of the File:::NFSLock module which uses hardlinks to force a flush of the NFS IO cache when asyncrynous IO is enabled (described here http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality. 
Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS safe and need a local location to work with (anything involving berkley DB or SQLite for example). Make sure you are not setting that to an NFS mounted scratch location. The mpi**** files, are examples of some short lived files that should not be in NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finished or killed by standard signals such as when you hit ^C or use kill 15. Thanks, Carson On 13-05-14 4:42 PM, "Heywood, Todd" > wrote: >We have been getting hung NFS mounts on some nodes when running MPI MAKER >(version 2.27). Processes go into a "D" state and cannot be killed. We >end up having to reboot nodes to recover them. We are running MPICH2 >version 1.4.1p1 >with RHEL 6.3. Questions: > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >on a sync_page system call under NFS. That *might* imply some locking >issues. > >(2) Has anyone else seen this? > >(3) The root directory (parent of genome.maker.output directory) has lots >of mpi***** files, all of which have the first line >"pst0Process::MpiChunk". Is this expected? > >I'm able to reproducibly hang NFS on some nodes when using at least 4 >32-core nodes and 128 running MPI tasks. > >Thanks, > >Todd Heywood >CSHL > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From uma at ebi.ac.uk Thu May 16 10:08:43 2013 From: uma at ebi.ac.uk (Uma Maheswari) Date: Thu, 16 May 2013 17:08:43 +0100 Subject: [maker-devel] duplicate exons? 
In-Reply-To:
References:
Message-ID: <5195048B.9080707@ebi.ac.uk>

Hi Carson,

When I was trying to load the MAKER 2.27 results into Ensembl, I found a few hundred genes with 'duplicate exons'. When I looked in the GFF file, I found cases like this, where the exons are not actually duplicated but have two Parents with the same mRNA ID. Could this be a potential alternate transcript, attached to the same transcript by mistake?

Many thanks
Uma

3 maker gene 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179
3 maker mRNA 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_AED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406
3 maker exon 524271 524480 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker exon 524538 525182 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker exon 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524538 524903 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524538 525182 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524271 524480 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker five_prime_UTR 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker five_prime_UTR 524904 525182 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1

From carsonhh at gmail.com Thu May 16 10:13:05 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 16 May 2013 12:13:05 -0400
Subject: [maker-devel] duplicate exons?
In-Reply-To: <5195048B.9080707@ebi.ac.uk>
Message-ID:

I've had one other report of this on the devel list, but haven't gotten data to test with. Do you have the run files that produced the duplicate exon? If so, could you send me the theVoid directory for the contig that shows the duplicate, and the maker_opts.ctl file?

Thanks,
Carson

On 13-05-16 12:08 PM, "Uma Maheswari" wrote:

>Hi Carson,
>
>When I was trying to load the Maker-2.27 results into ensembl, I found
>that few hundreds of genes with 'duplicate exons' . When I looked in the
>gff file, I found cases like this, where the exons are not actually
>duplicated but have two Parents with same mRNA ID. This can be a
>potential alternate transcript, attached to the same transcript by
>mistake?
>
>Many thanks
>Uma
>
>
>
>
>
>3 maker gene 524271 525467 . - .
>ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed
>-gene-6.179
>3 maker mRNA 524271 525467 . - .
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3-
>processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A
>ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406
>3 maker exon 524271 524480 . - .
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_
>masked-3-processed-gene-6.179-mRNA-1
>3 maker exon 524538 525182 . - .
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_
>masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17
>9-mRNA-1
>3 maker exon 524271 525467 . - .
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_
>masked-3-processed-gene-6.179-mRNA-1
>3 maker CDS 524538 524903 .
- 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 525182 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524271 524480 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524904 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu May 16 10:25:36 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:25:36 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195048B.9080707@ebi.ac.uk> Message-ID: I think this also may be a result of using GFF3 pass-through. So if that is the case, could you send me any GFF3 files you gave maker in addition to the other files I asked for. Thanks, Carson On 13-05-16 12:08 PM, "Uma Maheswari" wrote: >Hi Carson, > >When I was trying to load the Maker-2.27 results into ensembl, I found >that few hundreds of genes with 'duplicate exons' . When I looked in the >gff file, I found cases like this, where the exons are not actually >duplicated but have two Parents with same mRNA ID. This can be a >potential alternate transcript, attached to the same transcript by >mistake? > >Many thanks >Uma > > > > > >3 maker gene 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >-gene-6.179 >3 maker mRNA 524271 525467 . - . 
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >3 maker exon 524271 524480 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker exon 524538 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >9-mRNA-1 >3 maker exon 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 524903 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 525182 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524271 524480 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524904 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dsth at ebi.ac.uk Thu May 16 10:38:35 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Thu, 16 May 2013 17:38:35 +0100 Subject: [maker-devel] duplicate exons? 
In-Reply-To: References: <5195048B.9080707@ebi.ac.uk> Message-ID: hiya, are you using the same instance as michael at ebi as this sounds like the same problem he had last week and he wasn't running pass through. i've run 2.27 here 30+ times here and not seen this? is something very strange corrupted? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2013/5/16 Carson Holt > I think this also may be a result of using GFF3 pass-through. So if that > is the case, could you send me any GFF3 files you gave maker in addition > to the other files I asked for. > > Thanks, > Carson > > > > On 13-05-16 12:08 PM, "Uma Maheswari" wrote: > > >Hi Carson, > > > >When I was trying to load the Maker-2.27 results into ensembl, I found > >that few hundreds of genes with 'duplicate exons' . When I looked in the > >gff file, I found cases like this, where the exons are not actually > >duplicated but have two Parents with same mRNA ID. This can be a > >potential alternate transcript, attached to the same transcript by > >mistake? > > > >Many thanks > >Uma > > > > > > > > > > > >3 maker gene 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed > >-gene-6.179 > >3 maker mRNA 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- > >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A > >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 > >3 maker exon 524271 524480 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker exon 524538 525182 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 > >9-mRNA-1 > >3 maker exon 524271 525467 . - . 
> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 524903 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 525182 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524271 524480 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524904 525182 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > > > > > >_______________________________________________ > >maker-devel mailing list > >maker-devel at box290.bluehost.com > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 16 10:50:50 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:50:50 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: Message-ID: Yes. Perhaps this is the same issue Michael saw, although the one difference I see from his post is the Parent= attribute. --> Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-proce ssed-gene-6.179-mRNA-1 I have seen duplicate exons from GFF3 pass-through in the past, but if that's not being used I'd be very appreciative of any test dataset you could give me. 
Thanks, Carson From: Daniel Hughes Date: Thursday, 16 May, 2013 12:38 PM To: Carson Holt Cc: Uma Maheswari , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] duplicate exons? hiya, are you using the same instance as michael at ebi as this sounds like the same problem he had last week and he wasn't running pass through. i've run 2.27 here 30+ times here and not seen this? is something very strange corrupted? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ---------------------------------------------------------------------------- --------- dsth at cantab.net dsth at cpan.org 2013/5/16 Carson Holt > I think this also may be a result of using GFF3 pass-through. So if that > is the case, could you send me any GFF3 files you gave maker in addition > to the other files I asked for. > > Thanks, > Carson > > > > On 13-05-16 12:08 PM, "Uma Maheswari" wrote: > >> >Hi Carson, >> > >> >When I was trying to load the Maker-2.27 results into ensembl, I found >> >that few hundreds of genes with 'duplicate exons' . When I looked in the >> >gff file, I found cases like this, where the exons are not actually >> >duplicated but have two Parents with same mRNA ID. This can be a >> >potential alternate transcript, attached to the same transcript by >> >mistake? >> > >> >Many thanks >> >Uma >> > >> > >> > >> > >> > >> >3 maker gene 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >> >-gene-6.179 >> >3 maker mRNA 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >> >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >> >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >> >3 maker exon 524271 524480 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker exon 524538 525182 . - . 
>> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >> >9-mRNA-1 >> >3 maker exon 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 524903 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 525182 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524271 524480 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524904 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> > >> > >> >_______________________________________________ >> >maker-devel mailing list >> >maker-devel at box290.bluehost.com >> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From uma at ebi.ac.uk Fri May 17 02:41:56 2013 From: uma at ebi.ac.uk (Uma Maheswari) Date: Fri, 17 May 2013 09:41:56 +0100 Subject: [maker-devel] duplicate exons? In-Reply-To: References: Message-ID: <5195ED54.4090501@ebi.ac.uk> Hi Carson, I checked with Michael, this is different from what he saw, he had entire segements of gff files duplicated, In this case, just Parent id is. 
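A rough way to list the affected features is sketched below. The function name dup_parents is hypothetical, and the sketch assumes a tab-separated GFF3 file with the attributes in column 9; it prints every feature whose Parent attribute names the same ID more than once.

```shell
#!/bin/sh
# dup_parents (hypothetical helper): print GFF3 feature lines whose
# Parent= attribute lists the same ID more than once.
# Reads the files given as arguments, or stdin when none are given.
dup_parents() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }              # skip comments and short lines
        {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] !~ /^Parent=/) continue
                value = substr(attrs[i], 8)  # text after "Parent="
                m = split(value, ids, ",")
                split("", seen)              # reset the per-feature lookup table
                for (j = 1; j <= m; j++) {
                    if (seen[ids[j]]++) { print; next }
                }
            }
        }
    ' "$@"
}

# Typical use (the file name is only an example):
#   dup_parents genome.all.gff | wc -l
```

Counting the printed lines gives a quick idea of how many features are affected.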
I am preparing the files you asked for, will send them soon thanks Uma On 16/05/13 17:50, Carson Holt wrote: > Yes. Perhaps this is the same issue Michael saw, although the one > difference I see from his post is the Parent= attribute. > > --> > Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1 > > I have seen duplicate exons from GFF3 pass-through in the past, but if > that's not being used I'd be very appreciative of any test dataset you > could give me. > > Thanks, > Carson > > > > > From: Daniel Hughes > > Date: Thursday, 16 May, 2013 12:38 PM > To: Carson Holt > > Cc: Uma Maheswari >, > "maker-devel at yandell-lab.org " > > > Subject: Re: [maker-devel] duplicate exons? > > hiya, are you using the same instance as michael at ebi as this sounds > like the same problem he had last week and he wasn't running pass > through. i've run 2.27 here 30+ times here and not seen this? is > something very strange corrupted? > > dan. > > Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > 2013/5/16 Carson Holt > > > I think this also may be a result of using GFF3 pass-through. So > if that > is the case, could you send me any GFF3 files you gave maker in > addition > to the other files I asked for. > > Thanks, > Carson > > > > On 13-05-16 12:08 PM, "Uma Maheswari" > wrote: > > >Hi Carson, > > > >When I was trying to load the Maker-2.27 results into ensembl, I found > >that few hundreds of genes with 'duplicate exons' . When I looked > in the > >gff file, I found cases like this, where the exons are not actually > >duplicated but have two Parents with same mRNA ID. This can be a > >potential alternate transcript, attached to the same transcript by > >mistake? > > > >Many thanks > >Uma > > > > > > > > > > > >3 maker gene 524271 525467 . - . 
> >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed > >-gene-6.179 > >3 maker mRNA 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- > >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A > >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 > >3 maker exon 524271 524480 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker exon 524538 525182 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 > >9-mRNA-1 > >3 maker exon 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 524903 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 525182 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524271 524480 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524904 525182 . - . 
> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > > > > > >_______________________________________________ > >maker-devel mailing list > >maker-devel at box290.bluehost.com > > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luciano.abriata at epfl.ch Fri May 17 03:45:41 2013 From: luciano.abriata at epfl.ch (Luciano Abriata) Date: Fri, 17 May 2013 09:45:41 +0000 Subject: [maker-devel] getting protein sequences from genomes Message-ID: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies). My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are: 1) How can I match proteins predicted for the same gene in two genomes? 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output) maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta Which of these files contains the definite set of predicted protein sequences? Thanks in advance! Luciano -------------- next part -------------- An HTML attachment was scrubbed... 
From heywood at cshl.edu Fri May 17 07:25:16 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Fri, 17 May 2013 13:25:16 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu>

It appears that a kernel bug caused the NFS hang, at least for limited-scale testing (6 nodes, 192 tasks). I upgraded the kernel from 2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and cannot reproduce the hangs.

As far as TMPDIR goes, I'm not really sure I understand. We use SGE, and the TMPDIR we are referring to is set by SGE within a job to be /tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?

Todd

From: Carson Holt
Date: Wednesday, May 15, 2013 1:15 PM
To: "Ernst, Evan"
Cc: Todd Heywood, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

The mpi**** files should be generated in the $TMPDIR or TMP= location. If they are happening in the working directory, then there is a problem. If you are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is launched. You may have to manually specify that it needs to be exported to the other nodes using the mpiexec command line flags. OpenMPI for example does not export all environmental variables by default to the other nodes.

Thanks,
Carson

From: Evan Ernst
Date: Wednesday, 15 May, 2013 1:08 PM
To: Carson Holt
Cc: "Heywood, Todd", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

For these runs, -TMP is set to the $TMPDIR environment variable via maker command line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected. In maker_opts.ctl, I commented out the TMP line.
I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster nfs mount than the working dir where the mpi**** files are being written. Thanks, Evan On Tue, May 14, 2013 at 9:01 PM, Carson Holt > wrote: No it does not use ROMIO. The locking may be do to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and only like large infrequent writes and frequent reads? MAKER also uses a variant of the File:::NFSLock module which uses hardlinks to force a flush of the NFS IO cache when asyncrynous IO is enabled (described here http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality. Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS safe and need a local location to work with (anything involving berkley DB or SQLite for example). Make sure you are not setting that to an NFS mounted scratch location. The mpi**** files, are examples of some short lived files that should not be in NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finished or killed by standard signals such as when you hit ^C or use kill 15. Thanks, Carson On 13-05-14 4:42 PM, "Heywood, Todd" > wrote: >We have been getting hung NFS mounts on some nodes when running MPI MAKER >(version 2.27). Processes go into a "D" state and cannot be killed. We >end up having to reboot nodes to recover them. We are running MPICH2 >version 1.4.1p1 >with RHEL 6.3. Questions: > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >on a sync_page system call under NFS. That *might* imply some locking >issues. > >(2) Has anyone else seen this? 
> >(3) The root directory (parent of genome.maker.output directory) has lots >of mpi***** files, all of which have the first line >"pst0Process::MpiChunk". Is this expected? > >I'm able to reproducibly hang NFS on some nodes when using at least 4 >32-core nodes and 128 running MPI tasks. > >Thanks, > >Todd Heywood >CSHL > > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Carson.Holt at oicr.on.ca Fri May 17 07:40:50 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Fri, 17 May 2013 13:40:50 +0000 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu> Message-ID: I'm glad your getting better results. With respect to environmental variables. One common error in MPI execution is that the environment variables will not always be the same on the other nodes since only the root node is attached to a terminal, so variables in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE for example have scripts that wrap mpiexec to guarantee export of all environmental variables when using MPI to avoid just this type of common error. So like anything, you start with the most common cause of errors and then work to the less common. Kernel bugs usually rank low on the list :-) But I'm glad it's working for you now. Thanks, Carson On 13-05-17 9:25 AM, "Heywood, Todd" wrote: >It appears that a kernel bug caused the NFS hang, at least for limlted >scale testing (6 nodes, 192 tasks). I upgraded the kernel from >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and >cannot reproduce the hangs. > >As far a TMPDIR, I'm not really sure I understand. We use SGE, and the >TMPDIR we are referring to is set by SGE within a job to be >/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE? 
>>
>>Thanks,
>>
>>Todd Heywood
>>CSHL
>>
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From barry.moore at genetics.utah.edu Fri May 17 13:02:31 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 17 May 2013 13:02:31 -0600
Subject: [maker-devel] getting protein sequences from genomes
In-Reply-To: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>
References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>
Message-ID:

On May 17, 2013, at 3:45 AM, Luciano Abriata wrote:

> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).
>
> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:
>
> 1) How can I match proteins predicted for the same gene in two genomes?

Use blastp, with parameters tuned to favor near-perfect matches.

>
> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)
>
> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541
>

AED = Annotation edit distance. It describes how closely the prediction matches the evidence. This is a distance measure, and thus 0 is a perfect match and 1 is no overlap.

eAED = Exon-adjusted annotation edit distance. This metric is the same as AED with a couple of exceptions. First, for a protein coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-Seq data is used as evidence and both ends of an exon are supported with splice-site-spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon. For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.

QI values (the |-separated fields, in order) are as follows:

1. 5' UTR length.
2. Fraction of splice sites confirmed by an EST alignment.
3. Fraction of exons that overlap an EST alignment.
4. Fraction of exons that overlap an EST or protein alignment.
5. Fraction of splice sites confirmed by an ab initio prediction.
6. Fraction of exons that overlap an ab initio prediction.
7. Number of exons in the transcript.
8. 3' UTR length.
9. Length of the encoded protein.

> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta
>
> Which of these files contains the definite set of predicted protein sequences?

The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for.

>
>
>
> Thanks in advance!
>
> Luciano
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ares711122 at gmail.com Sun May 19 22:16:10 2013
From: ares711122 at gmail.com (Hung-Wei Hsu)
Date: Mon, 20 May 2013 12:16:10 +0800
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
Message-ID:

Hi MAKER developers,

I have been using MAKER to perform gene prediction and annotation on my contigs. I used Artemis to examine the GFF and found that some CDS with complete structure were absent from the final results. They are indeed predicted and annotated on the reference genome. I'm wondering if they were discarded due to overlapping with another CDS. How can I preserve these CDS?
Thanks a lot in advance.

Hung-Wei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eernst at cshl.edu Mon May 20 14:36:38 2013
From: eernst at cshl.edu (Evan Ernst)
Date: Mon, 20 May 2013 16:36:38 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <561e317e5e8246978eccdf53ed96067b@EX-HS-HT02.cshl.edu>
References: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu> <561e317e5e8246978eccdf53ed96067b@EX-HS-HT02.cshl.edu>
Message-ID:

Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers.

Despite this, the mpi*** files are still being created in the working directory.

If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case.

Thanks,
Evan

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-05-20 at 4.14.09 PM.png
Type: image/png
Size: 22634 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon May 20 17:50:28 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 19:50:28 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID:

Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eernst at cshl.edu Mon May 20 18:20:22 2013
From: eernst at cshl.edu (Evan Ernst)
Date: Mon, 20 May 2013 20:20:22 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
References:
Message-ID:

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:

> Could you run the following command for me and share the output with me?
> > mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' > > Thanks, > Carson > > > > From: Evan Ernst > > Date: Monday, 20 May, 2013 4:36 PM > To: Carson Holt > > Cc: "maker-devel at yandell-lab.org" < > maker-devel at yandell-lab.org>, > "Heywood, Todd" > > Subject: Re: [maker-devel] MPI MAKER hanging NFS > > Hi Carson, > > The SGE launch script looks like this (sans SGE args): > > mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl > maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1 > > Snooping on the running jobs (see attached image), it looks like $TMPDIR > is evaluated to a local directory by the shell of the MPI master node as > intended, so the evaluated path, not the env var reference, is being passed > to the MPI workers. > > Despite this, the mpi*** files are still being created in the working > directory. > > If I understand correctly, these mpi*** files are meant to be written to > the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), > which should be equivalent, but this doesn't seem to be the case. > > Thanks, > Evan > > > > > On Fri, May 17, 2013 at 9:40 AM, Carson Holt > wrote: > I'm glad your getting better results. > > With respect to environmental variables. One common error in MPI > execution is that the environment variables will not always be the same on > the other nodes since only the root node is attached to a terminal, so > variables in launch scripts (.bashrc etc.) may not be available on all > nodes. Many clusters that are part of the XSEDE network and use SGE for > example have scripts that wrap mpiexec to guarantee export of all > environmental variables when using MPI to avoid just this type of common > error. So like anything, you start with the most common cause of errors > and then work to the less common. Kernel bugs usually rank low on the > list :-) But I'm glad it's working for you now. 
> > Thanks, > Carson > > > > > > On 13-05-17 9:25 AM, "Heywood, Todd" heywood at cshl.edu>> wrote: > > >It appears that a kernel bug caused the NFS hang, at least for limlted > >scale testing (6 nodes, 192 tasks). I upgraded the kernel from > >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and > >cannot reproduce the hangs. > > > >As far a TMPDIR, I'm not really sure I understand. We use SGE, and the > >TMPDIR we are referring to is set by SGE within a job to be > >/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE? > > > >Todd > > > > > > > > > >From: Carson Holt >>> > >Date: Wednesday, May 15, 2013 1:15 PM > >To: "Ernst, Evan" eernst at cshl.edu>> > >Cc: Todd Heywood heywood at cshl.edu>>, > >"maker-devel at yandell-lab.org maker-devel at yandell-lab.org>" > > maker-devel at yandell-lab.org>> > >Subject: Re: [maker-devel] MPI MAKER hanging NFS > > > >The mpi**** files should be generated in the $TMPDIR or TMP= location. > >If they are happening in the working directory, then there is a problem. > >If you are not setting TMP=, perhaps TMPDIR is not being exported when > >'mpiexec' is launched. You may have to manually specify that it needs to > >be exported to the other nodes using the mpiexec command line flags. > >OpenMPI for example does not export all environmental variables by > >default to the other nodes. > > > >Thanks, > >Carson > > > > > > > >From: Evan Ernst eernst at cshl.edu>> > >Date: Wednesday, 15 May, 2013 1:08 PM > >To: Carson Holt >>> > >Cc: "Heywood, Todd" heywood at cshl.edu>>, > >"maker-devel at yandell-lab.org maker-devel at yandell-lab.org>" > > maker-devel at yandell-lab.org>> > >Subject: Re: [maker-devel] MPI MAKER hanging NFS > > > >Hi Carson, > > > >For these runs, -TMP is set to the $TMPDIR environment variable via maker > >command line argument in the cluster job script to use the local disk on > >each node. 
We can see files being generated in those locations on each > >node, so it seems this is working as expected. > > > >In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is > >relevant, but I'm also setting mpi_blastdb= to consolidate the databases > >onto a different, faster nfs mount than the working dir where the mpi**** > >files are being written. > > > >Thanks, > >Evan > > > > > > > >On Tue, May 14, 2013 at 9:01 PM, Carson Holt > > Carson.Holt at oicr.on.ca>> wrote: > >No it does not use ROMIO. > > > >The locking may be do to how your NFS is implemented. MAKER does a lot of > >small writes. Some NFS implementations do not handle that well and only > >like large infrequent writes and frequent reads? > >MAKER also uses a variant of the File:::NFSLock module which uses > >hardlinks to force a flush of the NFS IO cache when asyncrynous IO is > >enabled (described here > >http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). > >I know that the FhGFS implementation of NFS has broken hard link > >functionality. > > > > > >Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS > >mounted location. It must be local (/tmp for example). This is because > >certain types of operations are not always NFS safe and need a local > >location to work with (anything involving berkley DB or SQLite for > >example). Make sure you are not setting that to an NFS mounted scratch > >location. The mpi**** files, are examples of some short lived files that > >should not be in NFS. They hold chunks of data from threads that are > >processing the genome and are very rapidly created and deleted. They will > >be cleaned up automatically when maker finished or killed by standard > >signals such as when you hit ^C or use kill 15. > > > > > >Thanks, > >Carson > > > > > > > > > >On 13-05-14 4:42 PM, "Heywood, Todd" > > >> wrote: > > > >>We have been getting hung NFS mounts on some nodes when running MPI MAKER > >>(version 2.27). 
> >>Processes go into a "D" state and cannot be killed. We
> >>end up having to reboot nodes to recover them. We are running MPICH2
> >>version 1.4.1p1
> >>with RHEL 6.3. Questions:
> >>
> >>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
> >>on a sync_page system call under NFS. That *might* imply some locking
> >>issues.
> >>
> >>(2) Has anyone else seen this?
> >>
> >>(3) The root directory (parent of genome.maker.output directory) has lots
> >>of mpi***** files, all of which have the first line
> >>"pst0Process::MpiChunk". Is this expected?
> >>
> >>I'm able to reproducibly hang NFS on some nodes when using at least 4
> >>32-core nodes and 128 running MPI tasks.
> >>
> >>Thanks,
> >>
> >>Todd Heywood
> >>CSHL
> >
> >_______________________________________________
> >maker-devel mailing list
> >maker-devel at box290.bluehost.com
> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Mon May 20 18:38:41 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 20:38:41 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC4685A8A130DC@ex-hs-mbx06.cshl.edu>
Message-ID:

It may have just been a random failure. Try launching it again. Basically, one instance failed to launch hydra_pmi_proxy, which wraps the command being called via mpiexec. So you get 7 lines of output instead of the 8 that should be there.
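
A minimal shell sketch of that check, assuming the diagnostic prints one temp-directory path per rank. It counts the lines that look like a bare absolute path and compares that count with the number of ranks requested. The function name check_rank_output is hypothetical and is not part of MAKER or MPICH.

```shell
#!/bin/sh
# Hypothetical helper, not part of MAKER or MPICH.
# Reads the merged output of the per-rank diagnostic on stdin and checks
# that the expected number of ranks printed a temp directory.
check_rank_output() {
    expected=$1
    # Count only lines that are a bare absolute path (no spaces or colons),
    # so launcher errors such as "...starter_with_limit.sh: line 4: ..."
    # are not mistaken for a rank's answer.
    reported=$(grep -c '^/[^ :]*$' || true)
    if [ "$reported" -eq "$expected" ]; then
        echo "ok: $reported of $expected ranks reported a temp dir"
    else
        echo "short: $reported of $expected ranks reported a temp dir"
        return 1
    fi
}

# Example use with the command from this thread (needs a working MPI launch):
# mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' 2>&1 \
#     | check_rank_output 8
```

A non-zero return only says that fewer ranks answered than were requested. It does not say why the launch failed, so relaunching and reading the launcher's own error lines is still needed.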

--Carson

On 13-05-20 8:33 PM, "Heywood, Todd" wrote:

>All starter_with_limit.sh does is set a ulimit for the top process for
>the job, then start it passing all parameters:
>
>#!/bin/sh
>ulimit -c 0
>exec $*
>
>
>From: Evan Ernst
>Date: Monday, May 20, 2013 8:20 PM
>To: Carson Holt
>Cc: Carson Holt, "maker-devel at yandell-lab.org", Todd Heywood
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/tmp/uge/1031236.1.primary.q
>/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
>/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory
>
>
>Todd, are these errors from the starter_with_limit.sh wrapper harmless?
>
>Thanks,
>Evan
>
>
>On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:
>Could you run the following command for me and share the output with me?
>
>mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'
>
>Thanks,
>Carson
>
>
>From: Evan Ernst <eernst at cshl.edu>
>Date: Monday, 20 May, 2013 4:36 PM
>To: Carson Holt <Carson.Holt at oicr.on.ca>
>Cc: "maker-devel at yandell-lab.org" <maker-devel at yandell-lab.org>,
>"Heywood, Todd" <heywood at cshl.edu>
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>Hi Carson,
>
>The SGE launch script looks like this (sans SGE args):
>
>mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1
>
>Snooping on the running jobs (see attached image), it looks like $TMPDIR
>is evaluated to a local directory by the shell of the MPI master node as
>intended, so the evaluated path, not the env var reference, is being
>passed to the MPI workers.
>
>Despite this, the mpi*** files are still being created in the working
>directory.
>
>If I understand correctly, these mpi*** files are meant to be written to
>the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg),
>which should be equivalent, but this doesn't seem to be the case.
>
>Thanks,
>Evan
>
>
>On Fri, May 17, 2013 at 9:40 AM, Carson Holt <Carson.Holt at oicr.on.ca> wrote:
>I'm glad you're getting better results.
>
>With respect to environment variables: one common error in MPI
>execution is that the environment variables will not always be the same on
>the other nodes since only the root node is attached to a terminal, so
>variables in launch scripts (.bashrc etc.) may not be available on all
>nodes. Many clusters that are part of the XSEDE network and use SGE, for
>example, have scripts that wrap mpiexec to guarantee export of all
>environment variables when using MPI to avoid just this type of common
>error. So like anything, you start with the most common cause of errors
>and then work to the less common. Kernel bugs usually rank low on the
>list :-) But I'm glad it's working for you now.
> >Thanks, >Carson > > > > > >On 13-05-17 9:25 AM, "Heywood, Todd" >heywood at cshl.edu>>> wrote: > >>It appears that a kernel bug caused the NFS hang, at least for limlted >>scale testing (6 nodes, 192 tasks). I upgraded the kernel from >>2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and >>cannot reproduce the hangs. >> >>As far a TMPDIR, I'm not really sure I understand. We use SGE, and the >>TMPDIR we are referring to is set by SGE within a job to be >>/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE? >> >>Todd >> >> >> >> >>From: Carson Holt >>>@oicr.on.ca>>>on.Holt at oicr.on.ca>>>> >>Date: Wednesday, May 15, 2013 1:15 PM >>To: "Ernst, Evan" >>>rnst at cshl.edu>>>nst at cshl.edu>>> >>Cc: Todd Heywood >>>:heywood at cshl.edu>>>to:heywood at cshl.edu>>>, >>"maker-devel at yandell-lab.org>aker-devel at yandell-lab.org>>ker-devel at yandell-lab.org>r-devel at yandell-lab.org>>" >>>aker-devel at yandell-lab.org>>ker-devel at yandell-lab.org>r-devel at yandell-lab.org>>> >>Subject: Re: [maker-devel] MPI MAKER hanging NFS >> >>The mpi**** files should be generated in the $TMPDIR or TMP= location. >>If they are happening in the working directory, then there is a problem. >>If you are not setting TMP=, perhaps TMPDIR is not being exported when >>'mpiexec' is launched. You may have to manually specify that it needs to >>be exported to the other nodes using the mpiexec command line flags. >>OpenMPI for example does not export all environmental variables by >>default to the other nodes. 
>> >>Thanks, >>Carson >> >> >> >>From: Evan Ernst >>>rnst at cshl.edu>>>nst at cshl.edu>>> >>Date: Wednesday, 15 May, 2013 1:08 PM >>To: Carson Holt >>>@oicr.on.ca>>>on.holt at oicr.on.ca>>>> >>Cc: "Heywood, Todd" >>>:heywood at cshl.edu>>>to:heywood at cshl.edu>>>, >>"maker-devel at yandell-lab.org>aker-devel at yandell-lab.org>>ker-devel at yandell-lab.org>r-devel at yandell-lab.org>>" >>>aker-devel at yandell-lab.org>>ker-devel at yandell-lab.org>r-devel at yandell-lab.org>>> >>Subject: Re: [maker-devel] MPI MAKER hanging NFS >> >>Hi Carson, >> >>For these runs, -TMP is set to the $TMPDIR environment variable via maker >>command line argument in the cluster job script to use the local disk on >>each node. We can see files being generated in those locations on each >>node, so it seems this is working as expected. >> >>In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is >>relevant, but I'm also setting mpi_blastdb= to consolidate the databases >>onto a different, faster nfs mount than the working dir where the mpi**** >>files are being written. >> >>Thanks, >>Evan >> >> >> >>On Tue, May 14, 2013 at 9:01 PM, Carson Holt >>>@oicr.on.ca>>>on.Holt at oicr.on.ca>>>> wrote: >>No it does not use ROMIO. >> >>The locking may be do to how your NFS is implemented. MAKER does a lot >>of >>small writes. Some NFS implementations do not handle that well and only >>like large infrequent writes and frequent reads? >>MAKER also uses a variant of the File:::NFSLock module which uses >>hardlinks to force a flush of the NFS IO cache when asyncrynous IO is >>enabled (described here >>http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). >>I know that the FhGFS implementation of NFS has broken hard link >>functionality. >> >> >>Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS >>mounted location. It must be local (/tmp for example). 
This is because >>certain types of operations are not always NFS safe and need a local >>location to work with (anything involving berkley DB or SQLite for >>example). Make sure you are not setting that to an NFS mounted scratch >>location. The mpi**** files, are examples of some short lived files that >>should not be in NFS. They hold chunks of data from threads that are >>processing the genome and are very rapidly created and deleted. They >>will >>be cleaned up automatically when maker finished or killed by standard >>signals such as when you hit ^C or use kill 15. >> >> >>Thanks, >>Carson >> >> >> >> >>On 13-05-14 4:42 PM, "Heywood, Todd" >>>:heywood at cshl.edu>>>to:heywood at cshl.edu>>> wrote: >> >>>We have been getting hung NFS mounts on some nodes when running MPI >>>MAKER >>>(version 2.27). Processes go into a "D" state and cannot be killed. We >>>end up having to reboot nodes to recover them. We are running MPICH2 >>>version 1.4.1p1 >>>with RHEL 6.3. Questions: >>> >>>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >>>on a sync_page system call under NFS. That *might* imply some locking >>>issues. >>> >>>(2) Has anyone else seen this? >>> >>>(3) The root directory (parent of genome.maker.output directory) has >>>lots >>>of mpi***** files, all of which have the first line >>>"pst0Process::MpiChunk". Is this expected? >>> >>>I'm able to reproducibly hang NFS on some nodes when using at least 4 >>>32-core nodes and 128 running MPI tasks. 
>>> >>>Thanks, >>> >>>Todd Heywood >>>CSHL >>> >>> >> >> >>_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com>ailto:maker-devel at box290.bluehost.com>com>>>uehost.com>>290.bluehost.com>>> >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > >_______________________________________________ maker-devel mailing list >maker-devel at box290.bluehost.comilto:maker-devel at box290.bluehost.comm>> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From heywood at cshl.edu Mon May 20 18:33:32 2013 From: heywood at cshl.edu (Heywood, Todd) Date: Tue, 21 May 2013 00:33:32 +0000 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: Message-ID: <0ED760096959DE4291A3550A46EC4685A8A130DC@ex-hs-mbx06.cshl.edu> All starter_with_limit.sh does is set a ulimit for the top process for the job, then start it passing all parameters: #!/bin/sh ulimit -c 0 exec $* From: Evan Ernst > Date: Monday, May 20, 2013 8:20 PM To: Carson Holt > Cc: Carson Holt >, "maker-devel at yandell-lab.org" >, Todd Heywood > Subject: Re: [maker-devel] MPI MAKER hanging NFS /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory /opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory Todd, are these errors from the starter_with_limit.sh wrapper harmless? Thanks, Evan On Mon, May 20, 2013 at 7:50 PM, Carson Holt > wrote: Could you run the following command for me and share the ouptut with me? 
mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' Thanks, Carson From: Evan Ernst >> Date: Monday, 20 May, 2013 4:36 PM To: Carson Holt >> Cc: "maker-devel at yandell-lab.org>" >>, "Heywood, Todd" >> Subject: Re: [maker-devel] MPI MAKER hanging NFS Hi Carson, The SGE launch script looks like this (sans SGE args): mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1 Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. Despite this, the mpi*** files are still being created in the working directory. If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case. Thanks, Evan On Fri, May 17, 2013 at 9:40 AM, Carson Holt >> wrote: I'm glad your getting better results. With respect to environmental variables. One common error in MPI execution is that the environment variables will not always be the same on the other nodes since only the root node is attached to a terminal, so variables in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE for example have scripts that wrap mpiexec to guarantee export of all environmental variables when using MPI to avoid just this type of common error. So like anything, you start with the most common cause of errors and then work to the less common. Kernel bugs usually rank low on the list :-) But I'm glad it's working for you now. Thanks, Carson On 13-05-17 9:25 AM, "Heywood, Todd" >> wrote: >It appears that a kernel bug caused the NFS hang, at least for limlted >scale testing (6 nodes, 192 tasks). 
I upgraded the kernel from >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and >cannot reproduce the hangs. > >As far a TMPDIR, I'm not really sure I understand. We use SGE, and the >TMPDIR we are referring to is set by SGE within a job to be >/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE? > >Todd > > > > >From: Carson Holt >>>> >Date: Wednesday, May 15, 2013 1:15 PM >To: "Ernst, Evan" >>>> >Cc: Todd Heywood >>>>, >"maker-devel at yandell-lab.org>>>" >>>>> >Subject: Re: [maker-devel] MPI MAKER hanging NFS > >The mpi**** files should be generated in the $TMPDIR or TMP= location. >If they are happening in the working directory, then there is a problem. >If you are not setting TMP=, perhaps TMPDIR is not being exported when >'mpiexec' is launched. You may have to manually specify that it needs to >be exported to the other nodes using the mpiexec command line flags. >OpenMPI for example does not export all environmental variables by >default to the other nodes. > >Thanks, >Carson > > > >From: Evan Ernst >>>> >Date: Wednesday, 15 May, 2013 1:08 PM >To: Carson Holt >>>> >Cc: "Heywood, Todd" >>>>, >"maker-devel at yandell-lab.org>>>" >>>>> >Subject: Re: [maker-devel] MPI MAKER hanging NFS > >Hi Carson, > >For these runs, -TMP is set to the $TMPDIR environment variable via maker >command line argument in the cluster job script to use the local disk on >each node. We can see files being generated in those locations on each >node, so it seems this is working as expected. > >In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is >relevant, but I'm also setting mpi_blastdb= to consolidate the databases >onto a different, faster nfs mount than the working dir where the mpi**** >files are being written. > >Thanks, >Evan > > > >On Tue, May 14, 2013 at 9:01 PM, Carson Holt >>>>> wrote: >No it does not use ROMIO. > >The locking may be do to how your NFS is implemented. MAKER does a lot of >small writes. 
Some NFS implementations do not handle that well and only >like large infrequent writes and frequent reads? >MAKER also uses a variant of the File:::NFSLock module which uses >hardlinks to force a flush of the NFS IO cache when asyncrynous IO is >enabled (described here >http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). >I know that the FhGFS implementation of NFS has broken hard link >functionality. > > >Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS >mounted location. It must be local (/tmp for example). This is because >certain types of operations are not always NFS safe and need a local >location to work with (anything involving berkley DB or SQLite for >example). Make sure you are not setting that to an NFS mounted scratch >location. The mpi**** files, are examples of some short lived files that >should not be in NFS. They hold chunks of data from threads that are >processing the genome and are very rapidly created and deleted. They will >be cleaned up automatically when maker finished or killed by standard >signals such as when you hit ^C or use kill 15. > > >Thanks, >Carson > > > > >On 13-05-14 4:42 PM, "Heywood, Todd" >>>>> wrote: > >>We have been getting hung NFS mounts on some nodes when running MPI MAKER >>(version 2.27). Processes go into a "D" state and cannot be killed. We >>end up having to reboot nodes to recover them. We are running MPICH2 >>version 1.4.1p1 >>with RHEL 6.3. Questions: >> >>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >>on a sync_page system call under NFS. That *might* imply some locking >>issues. >> >>(2) Has anyone else seen this? >> >>(3) The root directory (parent of genome.maker.output directory) has lots >>of mpi***** files, all of which have the first line >>"pst0Process::MpiChunk". Is this expected? >> >>I'm able to reproducibly hang NFS on some nodes when using at least 4 >>32-core nodes and 128 running MPI tasks. 
>> >>Thanks, >> >>Todd Heywood >>CSHL >> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com>>> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From heywood at cshl.edu Mon May 20 18:34:48 2013 From: heywood at cshl.edu (Heywood, Todd) Date: Tue, 21 May 2013 00:34:48 +0000 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: Message-ID: <0ED760096959DE4291A3550A46EC4685A8A130FB@ex-hs-mbx06.cshl.edu> Actually, line 4 is the exec (one line is commented out): #!/bin/sh ulimit -c 0 #ulimit -n 262144 exec $* From: Evan Ernst > Date: Monday, May 20, 2013 8:20 PM To: Carson Holt > Cc: Carson Holt >, "maker-devel at yandell-lab.org" >, Todd Heywood > Subject: Re: [maker-devel] MPI MAKER hanging NFS /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory /opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory Todd, are these errors from the starter_with_limit.sh wrapper harmless? Thanks, Evan On Mon, May 20, 2013 at 7:50 PM, Carson Holt > wrote: Could you run the following command for me and share the ouptut with me? 
mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' Thanks, Carson From: Evan Ernst >> Date: Monday, 20 May, 2013 4:36 PM To: Carson Holt >> Cc: "maker-devel at yandell-lab.org>" >>, "Heywood, Todd" >> Subject: Re: [maker-devel] MPI MAKER hanging NFS Hi Carson, The SGE launch script looks like this (sans SGE args): mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1 Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. Despite this, the mpi*** files are still being created in the working directory. If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case. Thanks, Evan On Fri, May 17, 2013 at 9:40 AM, Carson Holt >> wrote: I'm glad your getting better results. With respect to environmental variables. One common error in MPI execution is that the environment variables will not always be the same on the other nodes since only the root node is attached to a terminal, so variables in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE for example have scripts that wrap mpiexec to guarantee export of all environmental variables when using MPI to avoid just this type of common error. So like anything, you start with the most common cause of errors and then work to the less common. Kernel bugs usually rank low on the list :-) But I'm glad it's working for you now. Thanks, Carson On 13-05-17 9:25 AM, "Heywood, Todd" >> wrote: >It appears that a kernel bug caused the NFS hang, at least for limlted >scale testing (6 nodes, 192 tasks). 
I upgraded the kernel from >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and >cannot reproduce the hangs. > >As far a TMPDIR, I'm not really sure I understand. We use SGE, and the >TMPDIR we are referring to is set by SGE within a job to be >/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE? > >Todd > > > > >From: Carson Holt >>>> >Date: Wednesday, May 15, 2013 1:15 PM >To: "Ernst, Evan" >>>> >Cc: Todd Heywood >>>>, >"maker-devel at yandell-lab.org>>>" >>>>> >Subject: Re: [maker-devel] MPI MAKER hanging NFS > >The mpi**** files should be generated in the $TMPDIR or TMP= location. >If they are happening in the working directory, then there is a problem. >If you are not setting TMP=, perhaps TMPDIR is not being exported when >'mpiexec' is launched. You may have to manually specify that it needs to >be exported to the other nodes using the mpiexec command line flags. >OpenMPI for example does not export all environmental variables by >default to the other nodes. > >Thanks, >Carson > > > >From: Evan Ernst >>>> >Date: Wednesday, 15 May, 2013 1:08 PM >To: Carson Holt >>>> >Cc: "Heywood, Todd" >>>>, >"maker-devel at yandell-lab.org>>>" >>>>> >Subject: Re: [maker-devel] MPI MAKER hanging NFS > >Hi Carson, > >For these runs, -TMP is set to the $TMPDIR environment variable via maker >command line argument in the cluster job script to use the local disk on >each node. We can see files being generated in those locations on each >node, so it seems this is working as expected. > >In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is >relevant, but I'm also setting mpi_blastdb= to consolidate the databases >onto a different, faster nfs mount than the working dir where the mpi**** >files are being written. > >Thanks, >Evan > > > >On Tue, May 14, 2013 at 9:01 PM, Carson Holt >>>>> wrote: >No it does not use ROMIO. > >The locking may be do to how your NFS is implemented. MAKER does a lot of >small writes. 
Some NFS implementations do not handle that well and only >like large infrequent writes and frequent reads? >MAKER also uses a variant of the File:::NFSLock module which uses >hardlinks to force a flush of the NFS IO cache when asyncrynous IO is >enabled (described here >http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). >I know that the FhGFS implementation of NFS has broken hard link >functionality. > > >Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS >mounted location. It must be local (/tmp for example). This is because >certain types of operations are not always NFS safe and need a local >location to work with (anything involving berkley DB or SQLite for >example). Make sure you are not setting that to an NFS mounted scratch >location. The mpi**** files, are examples of some short lived files that >should not be in NFS. They hold chunks of data from threads that are >processing the genome and are very rapidly created and deleted. They will >be cleaned up automatically when maker finished or killed by standard >signals such as when you hit ^C or use kill 15. > > >Thanks, >Carson > > > > >On 13-05-14 4:42 PM, "Heywood, Todd" >>>>> wrote: > >>We have been getting hung NFS mounts on some nodes when running MPI MAKER >>(version 2.27). Processes go into a "D" state and cannot be killed. We >>end up having to reboot nodes to recover them. We are running MPICH2 >>version 1.4.1p1 >>with RHEL 6.3. Questions: >> >>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >>on a sync_page system call under NFS. That *might* imply some locking >>issues. >> >>(2) Has anyone else seen this? >> >>(3) The root directory (parent of genome.maker.output directory) has lots >>of mpi***** files, all of which have the first line >>"pst0Process::MpiChunk". Is this expected? >> >>I'm able to reproducibly hang NFS on some nodes when using at least 4 >>32-core nodes and 128 running MPI tasks. 
>> >>Thanks, >> >>Todd Heywood >>CSHL >> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com>>> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From Carson.Holt at oicr.on.ca Mon May 20 18:48:32 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Tue, 21 May 2013 00:48:32 +0000 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: Message-ID: Could you use the attached file to replace maker/src/bin/maker and maker/bin/maker? You will have to rerun 'maker/src/Build install' or just edit the shebang line (#!) if perl is located anywhere other than /usr/bin/perl. I explicitly tell it to use the system TMPDIR rather than letting it get set implicitly. See if that stops the mpi***** files in the working directory. It's always possible that this is just a slight difference in behavior for the version of the File::Temp module that is packaged with your perl. --Carson From: Evan Ernst > Date: Monday, 20 May, 2013 8:20 PM To: Carson Holt > Cc: Carson Holt >, "maker-devel at yandell-lab.org" >, "Heywood, Todd" > Subject: Re: [maker-devel] MPI MAKER hanging NFS /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /tmp/uge/1031236.1.primary.q /opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory /opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory Todd, are these errors from the starter_with_limit.sh wrapper harmless? 
Thanks, Evan On Mon, May 20, 2013 at 7:50 PM, Carson Holt > wrote: Could you run the following command for me and share the ouptut with me? mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' Thanks, Carson From: Evan Ernst >> Date: Monday, 20 May, 2013 4:36 PM To: Carson Holt >> Cc: "maker-devel at yandell-lab.org>" >>, "Heywood, Todd" >> Subject: Re: [maker-devel] MPI MAKER hanging NFS Hi Carson, The SGE launch script looks like this (sans SGE args): mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1 Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. Despite this, the mpi*** files are still being created in the working directory. If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case. Thanks, Evan On Fri, May 17, 2013 at 9:40 AM, Carson Holt >> wrote: I'm glad your getting better results. With respect to environmental variables. One common error in MPI execution is that the environment variables will not always be the same on the other nodes since only the root node is attached to a terminal, so variables in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE for example have scripts that wrap mpiexec to guarantee export of all environmental variables when using MPI to avoid just this type of common error. So like anything, you start with the most common cause of errors and then work to the less common. Kernel bugs usually rank low on the list :-) But I'm glad it's working for you now. 
Thanks, Carson On 13-05-17 9:25 AM, "Heywood, Todd" >> wrote: >It appears that a kernel bug caused the NFS hang, at least for limlted >scale testing (6 nodes, 192 tasks). I upgraded the kernel from >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and >cannot reproduce the hangs. > >As far a TMPDIR, I'm not really sure I understand. We use SGE, and the >TMPDIR we are referring to is set by SGE within a job to be >/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE? > >Todd > > > > >From: Carson Holt >>>> >Date: Wednesday, May 15, 2013 1:15 PM >To: "Ernst, Evan" >>>> >Cc: Todd Heywood >>>>, >"maker-devel at yandell-lab.org>>>" >>>>> >Subject: Re: [maker-devel] MPI MAKER hanging NFS > >The mpi**** files should be generated in the $TMPDIR or TMP= location. >If they are happening in the working directory, then there is a problem. >If you are not setting TMP=, perhaps TMPDIR is not being exported when >'mpiexec' is launched. You may have to manually specify that it needs to >be exported to the other nodes using the mpiexec command line flags. >OpenMPI for example does not export all environmental variables by >default to the other nodes. > >Thanks, >Carson > > > >From: Evan Ernst >>>> >Date: Wednesday, 15 May, 2013 1:08 PM >To: Carson Holt >>>> >Cc: "Heywood, Todd" >>>>, >"maker-devel at yandell-lab.org>>>" >>>>> >Subject: Re: [maker-devel] MPI MAKER hanging NFS > >Hi Carson, > >For these runs, -TMP is set to the $TMPDIR environment variable via maker >command line argument in the cluster job script to use the local disk on >each node. We can see files being generated in those locations on each >node, so it seems this is working as expected. > >In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is >relevant, but I'm also setting mpi_blastdb= to consolidate the databases >onto a different, faster nfs mount than the working dir where the mpi**** >files are being written. 
> >Thanks, >Evan > > > >On Tue, May 14, 2013 at 9:01 PM, Carson Holt >>>>> wrote: >No it does not use ROMIO. > >The locking may be do to how your NFS is implemented. MAKER does a lot of >small writes. Some NFS implementations do not handle that well and only >like large infrequent writes and frequent reads? >MAKER also uses a variant of the File:::NFSLock module which uses >hardlinks to force a flush of the NFS IO cache when asyncrynous IO is >enabled (described here >http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). >I know that the FhGFS implementation of NFS has broken hard link >functionality. > > >Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS >mounted location. It must be local (/tmp for example). This is because >certain types of operations are not always NFS safe and need a local >location to work with (anything involving berkley DB or SQLite for >example). Make sure you are not setting that to an NFS mounted scratch >location. The mpi**** files, are examples of some short lived files that >should not be in NFS. They hold chunks of data from threads that are >processing the genome and are very rapidly created and deleted. They will >be cleaned up automatically when maker finished or killed by standard >signals such as when you hit ^C or use kill 15. > > >Thanks, >Carson > > > > >On 13-05-14 4:42 PM, "Heywood, Todd" >>>>> wrote: > >>We have been getting hung NFS mounts on some nodes when running MPI MAKER >>(version 2.27). Processes go into a "D" state and cannot be killed. We >>end up having to reboot nodes to recover them. We are running MPICH2 >>version 1.4.1p1 >>with RHEL 6.3. Questions: >> >>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >>on a sync_page system call under NFS. That *might* imply some locking >>issues. >> >>(2) Has anyone else seen this? 
>>
>>(3) The root directory (parent of genome.maker.output directory) has lots
>>of mpi***** files, all of which have the first line
>>"pst0Process::MpiChunk". Is this expected?
>>
>>I'm able to reproducibly hang NFS on some nodes when using at least 4
>>32-core nodes and 128 running MPI tasks.
>>
>>Thanks,
>>
>>Todd Heywood
>>CSHL

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker
Type: application/octet-stream
Size: 49266 bytes
Desc: maker

From carsonhh at gmail.com Mon May 20 19:08:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 21:08:51 -0400
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?

With default settings MAKER will only put ab initio predictions that have some sort of evidence support (EST or protein) in the final gene set. The rejected predictions are still in the GFF3 for reference purposes as match/match_part features, but not as gene/mRNA/exon/CDS features. So a lack of evidence might be why it is not there. You can add all rejected models that don't overlap an accepted model by setting keep_preds=1 (this usually brings a lot more into the final gene set than you really want, though, including lots of false positives). But for some organisms like fungi, which have high gene densities, this approach is relatively safe.

Alternatively the gene is missing because it overlaps another gene model that was accepted.
MAKER won't allow overlapping models on the same strand in eukaryotes. The only way to force that kind of overlap is to give MAKER the reference models in model_gff and not let it call its own models (then MAKER is really just aligning evidence and scoring the reference models).

One final note. If there is no evidence supporting the model, and that is why it is rejected, you can also try adding more evidence to the maker run, or you can consider the possibility that the gene model in the reference is not real to begin with (i.e. a false positive gene model called during the initial annotation process and not supported by protein or expression data from any source).

Thanks,
Carson

From: Hung-Wei Hsu
Date: Monday, 20 May, 2013 12:16 AM
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?

Hi MAKER developers,

I was using MAKER to perform gene prediction and annotation on my contigs. I used Artemis to examine the GFF and found that some CDS with complete structure were absent from the final results. They are predicted and annotated on the reference genome. I'm wondering if they were discarded due to overlapping with another CDS. How can I preserve these CDS?
Thanks a lot in advance.

Hung-Wei

From ares711122 at gmail.com Mon May 20 19:19:20 2013
From: ares711122 at gmail.com (Hung-Wei Hsu)
Date: Tue, 21 May 2013 09:19:20 +0800
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?

Thanks a lot for your help. Your suggestions will be very helpful for our analysis. I've tried to add EST sequences to improve gene predictions. The EST sequences I used were CDS sequences of the same organism.
But I got the error below.

substr outside of string at .../TranslationMachine.pm line 162
ERROR: Failed while polishig ESTs
ERROR: Chunk failed at level:2, tier_type:3

What's wrong with my analysis? Are the EST sequences I used wrong?
Thank you.

Hung-Wei

From rob.syme at gmail.com Mon May 20 23:57:19 2013
From: rob.syme at gmail.com (Rob Syme)
Date: Tue, 21 May 2013 13:57:19 +0800
Subject: [maker-devel] Maker-derived CDS GFF3 phase column

Hi all

By my reading of the GFF3 spec (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker that has odd data in the phase column. For example, see some example Maker output at https://gist.github.com/robsyme/5617399

There are two exons, 5617 <- 5737 and 5793 <- 5953, with phases 0 and 2, respectively. Both exons are on the reverse strand.

From the spec, phase indicates "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon", and for "reverse strand features, phase is counted from the end field".

In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is 5953. The base at the end field is the first base of the translated CDS, so there should be no bases removed "to reach the first base of the next codon". I suggest that this phase should be 0, not 2.

There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png.
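The phase rule quoted from the spec can be checked with a small sketch. This is only an illustration: the helper name `cds_phases` is hypothetical, and it assumes the two segments above make up the complete CDS and that translation begins at the first base in transcript order (5953 on the reverse strand).

```python
def cds_phases(segments, strand):
    """Compute GFF3 phase for each CDS segment.

    segments: list of (start, end) tuples, 1-based inclusive coordinates.
    strand:   "+" or "-".
    Assumes the segments form one complete CDS whose first codon starts
    at the first base in transcript order.
    """
    # Transcript order is ascending on "+", descending on "-".
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = {}
    phase = 0
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left over after whole codons decide how many bases the
        # next segment must skip to reach the start of a codon.
        phase = (3 - (length - phase) % 3) % 3
    return phases
```

Under those assumptions, `cds_phases([(5617, 5737), (5793, 5953)], "-")` gives phase 0 for 5793-5953 and phase 1 for 5617-5737, while passing `"+"` gives 0 for 5617-5737 and 2 for 5793-5953, which is the pair of values reported in the Maker output.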
The output gff3 is correct if "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon" is measured from the 'left-hand' end of this feature (the start field) rather than the end field.

Has anybody else run into this problem or am I misreading the gff3 spec?

Rob Syme
PhD Student
Curtin University

From Sean.Li at csiro.au Tue May 21 01:36:37 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Tue, 21 May 2013 07:36:37 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

Hi Carson,

We are currently working on the annotation of the Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. Checking the annotation results from maker 2.10, we saw that some loci have a fusion problem: two separate neighbouring genes are likely to be fused together and regarded as a single candidate output by maker. If we go further and look at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct. We are also using an RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects.

We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide whether or not to merge models. If possible, can you please explain what these two parameters do with the predicted genes, RNA-Seq and protein evidence?

Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing a critical question: in which way could we effectively reduce the gene fusion problem? 1) setting the pred_flank lower than 100? 2) turn the correct_est_fusion on? 3) anything else?

Thank you.

With best regards,
Xi (Sean) Li, Ph. D.

Bioinformatics Analyst, Bioinformatics Core,
CSIRO Mathematics, Informatics and Statistics
Phone: +61 2 6216 7138
Address: GPO Box 664, Canberra, ACT 2601

From barry.moore at genetics.utah.edu Tue May 21 17:54:40 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 21 May 2013 17:54:40 -0600
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
Message-ID: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>

Hi Sean,

I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence, and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion. Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript, and this can help as well.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From onson001 at umn.edu Tue May 21 09:58:43 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Tue, 21 May 2013 10:58:43 -0500
Subject: [maker-devel] Maker: Re-annotation

Maker Development Team,

I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data.
I updated the fields in maker_opts.ctl as below:

genome=CGS01058.fasta #genome sequence file in fasta format
est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT
pred_gff=Augustus.gff3 #ab-initio predictions from
other_gff=Promoters.gff3 #promoter annotations
other_gff=CpG_Islands.gff3 # CpG island annotations

Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file, but when I load the GFF3 file into IGV it does not contain any of the exon definitions in Augustus.gff3.

Am I missing something?

Regards,
Getiria

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

From carsonhh at gmail.com Tue May 21 18:59:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 20:59:09 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>

Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR, this tends to fix many cases. Of course avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that together with the correct_est_fusion option in MAKER is probably the way to go. I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).
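
To make the role of the flank concrete, here is a rough sketch of how a flank value widens the window around an evidence cluster before it is handed to a predictor. It is not MAKER's actual code; the function name `flanked_region` and its arguments are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
def flanked_region(cluster_start, cluster_end, contig_length, pred_flank):
    """Widen an evidence cluster by pred_flank bases on each side.

    Coordinates are 1-based inclusive; the window is clipped so that it
    never runs off either end of the contig.
    """
    start = max(1, cluster_start - pred_flank)
    end = min(contig_length, cluster_end + pred_flank)
    return start, end
```

With a flank of 100, a cluster at 150-900 on a 1000 bp contig would become 50-1000. A smaller flank leaves the predictor less intergenic sequence on either side, which is the concern raised above.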

Thanks,
Carson

From Sean.Li at csiro.au Tue May 21 19:23:48 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Wed, 22 May 2013 01:23:48 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
References: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>

Thanks Barry and Carson for your detailed explanation. Now I have a better understanding of "pred_flank".

1. To run maker, we use transcripts produced by the tophat+cufflinks approach instead of de novo Trinity. Will it avoid the possible merging of RNA-Seq reads?

2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs?
Also, it makes me wonder: in that case, when Maker turns off UTRs but our RNA-Seq data has long UTRs, will these UTRs be added to the maker gene model?

Regards,
Sean

From carsonhh at gmail.com Tue May 21 19:37:02 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 21:37:02 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

> 1. To run maker, we use transcripts produced by the tophat+cufflinks
> approach instead of de novo Trinity. Will it avoid the possible merging
> of RNA-Seq reads?

No. Trinity would probably be a better approach to avoid merging.

> 2. If my understanding is correct, the "correct_est_fusion" parameter
> needs to be turned off when we don't ask Maker/prediction algorithms to
> predict UTRs? Also, it makes me wonder: in that case, when Maker turns
> off UTRs but our RNA-Seq data has long UTRs, will these UTRs be added
> to the maker gene model?

MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that, it can also add missing exons and extend CDS. The correct_est_fusion, just causes it to clip really long UTR if it looks like it was added due to merged evidence, and is probably not really a contiguous part of the gene.

The long UTRs that can result from mRNA-seq are often false. You are basically expending the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2 , or more genes loci (you loose the true UTR boundaries).
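
The condition described above, two neighboring models whose overlap falls only in UTR, can be sketched roughly as follows. This is an illustration only and not MAKER's implementation: the `GeneModel` class and `overlap_is_utr_only` are hypothetical names, and each CDS is simplified to a single start-end span (introns ignored).

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    start: int      # transcript start, 1-based inclusive
    end: int        # transcript end
    cds_start: int  # leftmost CDS base
    cds_end: int    # rightmost CDS base


def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def overlap_is_utr_only(a, b):
    """True if the transcripts overlap but the shared region touches
    neither model's CDS span, i.e. the overlap lies in UTR of both."""
    if not intervals_overlap(a.start, a.end, b.start, b.end):
        return False
    shared_start = max(a.start, b.start)
    shared_end = min(a.end, b.end)
    touches_a_cds = intervals_overlap(shared_start, shared_end, a.cds_start, a.cds_end)
    touches_b_cds = intervals_overlap(shared_start, shared_end, b.cds_start, b.cds_end)
    return not (touches_a_cds or touches_b_cds)
```

Two models that satisfy this check are the kind of pair where clipping the UTR would still leave both gene models intact.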

--Carson

From carsonhh at gmail.com Tue May 21 19:39:01 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 21:39:01 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

One more time, but I fixed a few obvious spelling errors -->

> 1. To run maker, we use transcripts produced by the tophat+cufflinks
> approach instead of de novo Trinity. Will it avoid the possible merging
> of RNA-Seq reads?

No. Trinity would probably be a better approach to avoid merging.

> 2. If my understanding is correct, the "correct_est_fusion" parameter
> needs to be turned off when we don't ask Maker/prediction algorithms to
> predict UTRs? Also, it makes me wonder: in that case, when Maker turns
> off UTRs but our RNA-Seq data has long UTRs, will these UTRs be added
> to the maker gene model?

MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that, it can also add missing exons and extend CDS.
The correct_est_fusion, just causes it to clip really long UTR if it looks like it was added due to merged evidence, and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expanding the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2 , or more gene loci (you lose the true UTR boundaries). --Carson From: Date: Tuesday, 21 May, 2013 9:23 PM To: Carson Holt , Barry Moore Cc: Subject: RE: [maker-devel] Fused gene problem, improvement in the Maker 2.27? Thanks Barry and Carson for your detailed explanation. Now I have a better understand of ?pred_flank?. 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads? 2. If my understanding is correct, the ?correct_est_fusion? parameter needs to be turned off when we don?t ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model? Regards, Sean From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Wednesday, 22 May 2013 10:59 AM To: Barry Moore; Li, Sean (CMIS, Acton) Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27? Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR this tends to fix many cases. Of course avoiding merging the mRNA-seq reads in the first place also works. 
So using Trinity's extra options to control that together with the correct_est_fusion option in MAKER is probably the way to go.

I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).

Thanks,
Carson

From: Barry Moore
Date: Tuesday, 21 May, 2013 7:54 PM
To:
Cc:
Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

Hi Sean,

I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion.

Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript and this can help as well.

B

On May 21, 2013, at 1:36 AM, wrote:

Hi Carson,

We are currently working on the annotation of Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. By checking the annotation results by using maker 2.10, we saw some loci have the fusion problem: two separate neighbour genes are likely to be fused together and regarded as a single candidate output by maker. If we go further by looking at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct. We are also using RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects.
We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide when to merge models or not. If possible, can you please explain what these two parameters can do with the predicted genes, RNA-Seq and protein evidence?

Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing a critical question: in which way could we effectively improve the gene fusing problem? 1) setting the pred_flank lower than 100? 2) turn the correct_est_fusion on? 3) anything else?

Thank you.

With best regards,
Xi (Sean) Li, Ph. D.
Bioinformatics Analyst, Bioinformatics Core,
CSIRO Mathematics, Informatics and Statistics
Phone: +61 2 6216 7138
Address: GPO Box 664, Canberra, ACT 2601

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From Sean.Li at csiro.au Tue May 21 20:23:26 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Wed, 22 May 2013 02:23:26 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID:

Thank you Carson. It has been a very helpful conversation with you! I will pass this information back to our group.
Best regards, Sean From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Wednesday, 22 May 2013 11:39 AM To: Li, Sean (CMIS, Acton); barry.moore at genetics.utah.edu Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27? One more time, but I fixed a few obvious spelling errors --> 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads? No. Trinity would probably be a better approach to avoid merging. 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model? MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that, it can also add missing exons and extend CDS. The correct_est_fusion, just causes it to clip really long UTR if it looks like it was added due to merged evidence, and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expanding the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2 , or more gene loci (you lose the true UTR boundaries). --Carson From: > Date: Tuesday, 21 May, 2013 9:23 PM To: Carson Holt >, Barry Moore > Cc: > Subject: RE: [maker-devel] Fused gene problem, improvement in the Maker 2.27? Thanks Barry and Carson for your detailed explanation. Now I have a better understand of "pred_flank". 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. 
Will it avoid the possible merging of RNA-Seq reads?
2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model?

Regards,
Sean

From: Carson Holt [mailto:carsonhh at gmail.com]
Sent: Wednesday, 22 May 2013 10:59 AM
To: Barry Moore; Li, Sean (CMIS, Acton)
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR this tends to fix many cases. Of course avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that together with the correct_est_fusion option in MAKER is probably the way to go.

I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).

Thanks,
Carson

From: Barry Moore
Date: Tuesday, 21 May, 2013 7:54 PM
To:
Cc:
Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27?

Hi Sean,

I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence and if you do not have some intergenic sequence for them to consider first they can't transition to their next state.
The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion. Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript and this can help as well. B On May 21, 2013, at 1:36 AM, > wrote: Hi Carson, We are currently working on the annotation of Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. By checking the annotation results by using maker 2.10, we saw some loci have the fusion problem: two separate neighbour genes are likely to be fused together and regarded as a single candidate output by maker. If we go further by looking at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct. We are also using RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects. We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide when to merge models or not. If possible, can you please explain what these two parameters can do with the predicted genes, RNA-Seq and protein evidence? Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing with an critical question: in which way we could effectively improve the gene fusing problem? 1) setting the pred_flank lower than 100? 2) turn the correct_est_fusion on? 3) anything else? Thank you. With best regards, Xi (Sean) Li, Ph. D. 
Bioinformatics Analyst, Bioinformatics Core, CSIRO Mathematics, Informatics and Statistics Phone: +61 2 6216 7138 Address: GPO Box 664, Canberra, ACT 2601 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 21 20:28:46 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 May 2013 22:28:46 -0400 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: The option in trinity is --jaccard_clip --> http://trinityrnaseq.sourceforge.net/#jaccard_clip --Carson From: Innocent Onsongo Date: Tuesday, 21 May, 2013 11:58 AM To: Subject: [maker-devel] Maker: Re-annotation Maker Development Team, I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below genome=CGS01058.fasta #genome sequence file in fasta format est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT pred_gff=Augustus.gff3 #ab-initio predictions from other_gff=Promoters.gff3 #promoter annotations other_gff=CpG_Islands.gff3 # CpG island annotations Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3. 
Am I missing something?

Regards,
Getiria

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

From carsonhh at gmail.com Tue May 21 20:32:54 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 22:32:54 -0400
Subject: [maker-devel] Maker-derived CDS GFF3 phase column
In-Reply-To:
Message-ID:

It looks like the phase was calculated from the wrong strand orientation. I believe I have corrected this now. I'm checking a few more things, but I'll likely have 2.28 out as the latest release tomorrow, with the cumulative bug fixes since the last release.

Thanks,
Carson

From: Rob Syme
Date: Tuesday, 21 May, 2013 1:57 AM
To:
Subject: [maker-devel] Maker-derived CDS GFF3 phase column

Hi all

By my reading of the GFF3 spec (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker that has odd data in the phase column.

For example, see some example Maker output at https://gist.github.com/robsyme/5617399

There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, respectively. Both exons are in the reverse strand. From the spec, phase indicates "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon", and for "reverse strand features, phase is counted from the end field".

In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is 5953. The base at the end field is the first base of the translated CDS, so there should be no bases removed "to reach the first base of the next codon". I suggest that this phase should be 0, not 2.
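
As a rough cross-check of the phase arithmetic described here, the following is a minimal Python sketch (it is not MAKER's code). It assumes the two features are the CDS segments of a single complete transcript whose translation starts at the first transcribed base, so the first segment in transcription order has phase 0. The function name cds_phases is made up for illustration.

```python
def cds_phases(segments, strand):
    """Return {(start, end): phase} for the CDS segments of one transcript.

    segments: list of (start, end) tuples, 1-based inclusive GFF3 coordinates.
    strand:   '+' or '-'.
    Assumption: the CDS is complete, so the first segment in
    transcription order starts on a codon boundary (phase 0).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "+":
        # Forward strand: transcription runs from the lowest start upward.
        ordered = sorted(segments, key=lambda seg: seg[0])
    else:
        # Reverse strand: transcription starts at the highest end field.
        ordered = sorted(segments, key=lambda seg: seg[1], reverse=True)

    phases = {}
    phase = 0
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left over after the last full codon of this segment decide
        # how many bases the next segment must skip to reach a codon start.
        phase = (3 - (length - phase) % 3) % 3
    return phases


segments = [(5617, 5737), (5793, 5953)]
print(cds_phases(segments, "-"))
print(cds_phases(segments, "+"))
```

Under those assumptions the reverse-strand call gives phase 0 for 5793-5953 and phase 1 for 5617-5737, while walking the same segments in forward orientation reproduces the 0 and 2 reported above.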
There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png.

The output gff3 is correct if "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon" is measured from the 'left-hand' end of this feature (the start field) rather than the end field.

Has anybody else run into this problem or am I misreading the gff3 spec?

Rob Syme
PhD Student
Curtin University

From barry.utah at gmail.com Tue May 21 21:37:30 2013
From: barry.utah at gmail.com (Barry Moore)
Date: Tue, 21 May 2013 21:37:30 -0600
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID: <2024BE21-4293-4E9D-BE13-92774C7BC96D@gmail.com>

Sean,

The Trinity option to manage fusion transcripts is --jaccard_clip and is described here: http://trinityrnaseq.sourceforge.net/#jaccard_clip

Trinity has also added functionality to use a hybrid reference-guided/de-novo assembly approach which you might also consider: http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html

B

On May 21, 2013, at 7:37 PM, Carson Holt wrote:

> 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?
>
> No. Trinity would probably be a better approach to avoid merging.
>
> 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model?
>
> MAKER will always try to add UTR if the EST evidence suggests it.
Technically it's a little bit more than that: it can also add missing exons and extend CDS. The correct_est_fusion option just causes it to clip really long UTR if it looks like it was added due to merged evidence, and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expanding the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2, or more gene loci (you lose the true UTR boundaries).
>
> --Carson
>
> From:
> Date: Tuesday, 21 May, 2013 9:23 PM
> To: Carson Holt, Barry Moore
> Cc:
> Subject: RE: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
>
> Thanks Barry and Carson for your detailed explanation. Now I have a better understanding of "pred_flank".
>
> 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?
> 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model?
>
> Regards,
> Sean
>
> From: Carson Holt [mailto:carsonhh at gmail.com]
> Sent: Wednesday, 22 May 2013 10:59 AM
> To: Barry Moore; Li, Sean (CMIS, Acton)
> Cc: maker-devel at yandell-lab.org
> Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
>
> Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR this tends to fix many cases.
Of course avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that together with the correct_est_fusion option in MAKER is probably the way to go.
>
> I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).
>
> Thanks,
> Carson
>
> From: Barry Moore
> Date: Tuesday, 21 May, 2013 7:54 PM
> To:
> Cc:
> Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
>
> Hi Sean,
>
> I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion.
>
> Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript and this can help as well.
>
> B
>
> On May 21, 2013, at 1:36 AM, wrote:
>
> Hi Carson,
>
> We are currently working on the annotation of Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. By checking the annotation results by using maker 2.10, we saw some loci have the fusion problem: two separate neighbour genes are likely to be fused together and regarded as a single candidate output by maker. If we go further by looking at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct.
We are also using RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects.
>
> We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide when to merge models or not. If possible, can you please explain what these two parameters can do with the predicted genes, RNA-Seq and protein evidence?
>
> Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing a critical question: in which way could we effectively improve the gene fusing problem? 1) setting the pred_flank lower than 100? 2) turn the correct_est_fusion on? 3) anything else?
>
> Thank you.
>
> With best regards,
> Xi (Sean) Li, Ph. D.
>
> Bioinformatics Analyst, Bioinformatics Core,
> CSIRO Mathematics, Informatics and Statistics
> Phone: +61 2 6216 7138
> Address: GPO Box 664, Canberra, ACT 2601
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From barry.utah at gmail.com Tue May 21 21:43:47 2013 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 21 May 2013 21:43:47 -0600 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: Hi Getiria, Does the MAKER produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl? B On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: > Maker Development Team, > > I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below > > genome=CGS01058.fasta #genome sequence file in fasta format > est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT > pred_gff=Augustus.gff3 #ab-initio predictions from > other_gff=Promoters.gff3 #promoter annotations > other_gff=CpG_Islands.gff3 # CpG island annotations > > Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3. Am I missing something? > > Regards, > Getiria > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. 
of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.syme at gmail.com Tue May 21 22:04:04 2013 From: rob.syme at gmail.com (Rob Syme) Date: Wed, 22 May 2013 12:04:04 +0800 Subject: [maker-devel] Maker-derived CDS GFF3 phase column In-Reply-To: References: Message-ID: Fantastic. I thought that might have been the problem. Looking forward to 2.28. Thanks! Rob On Wed, May 22, 2013 at 10:32 AM, Carson Holt wrote: > It looks like the phase was calculated from the wrong strand orientation. > I believe I have corrected this now. I'm checking a few more things, but > I'll have 2.28 as the latest release likely tomorrow with the cumulative > bug fixes since the last release. > > Thanks, > Carson > > > > From: Rob Syme > Date: Tuesday, 21 May, 2013 1:57 AM > To: > Subject: [maker-devel] Maker-derived CDS GFF3 phase column > > Hi all > > By my reading of the GFF3 spec ( > http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from > Maker that has odd data in the phase column. > > For example, see some example Maker output at > https://gist.github.com/robsyme/5617399 > > There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, > respectively. Both exons are in the reverse strand. > > From the spec, phase indicates "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next > codon", and for "reverse strand features, phase is counted from the end > field". > > In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) > is 5953. > The base at the end field is the first base of the translated CDS, so > there should be no bases removed "to reach the first base of the next > codon". I suggest that this phase should be 0, not 2. > > There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. 
> > The output gff3 is correct if "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next > codon" is measured from the 'left-hand' end of this feature (the start > field) rather than the end field. > > Has anybody else ran into this problem or am I misreading the gff3 spec? > > Rob Syme > PhD Student > Curtin University > > > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed May 22 06:50:26 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 22 May 2013 07:50:26 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: One other thing, I ran MAKER with the RM_off flag (maker -f -RM_off -q) the input sequences had already been masked. On Wed, May 22, 2013 at 7:47 AM, Innocent Onsongo wrote: > No. The MAKER produced GFF3 file does not contain any annotations. I even > tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it > will pass annotations from the Augustus produced GFF file into the final > annotation but that didn't work. I have attached the maker_opts.ctl file > I used together with the first 100 lines of the GFF files it's using. I > also include the GFF file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can >> you send the first ~100 lines each of the MAKER produced GFF3 file and of >> the GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >> Maker Development Team, >> >> I am trying to use Maker for re-annotation using gene predictions from >> Augustus. 
We had previously used Augustus for gene prediction but now want >> to combine these annotations with some EST data. I updated >> fields maker_opts.ctl as below >> >> genome=CGS01058.fasta #genome sequence file in fasta format >> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >> pred_gff=Augustus.gff3 #ab-initio predictions from >> other_gff=Promoters.gff3 #promoter annotations >> other_gff=CpG_Islands.gff3 # CpG island annotations >> >> Maker runs to completion and according to the log file annotation was >> successful. However, it also gives a "Segmentation fault (core dumped)" >> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >> and look it does not contain any of the exon definitions in Augustus.gff3. >> Am I missing something? >> >> Regards, >> Getiria >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> Barry Moore >> Research Scientist >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> >> > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From onson001 at umn.edu Wed May 22 06:47:30 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 22 May 2013 07:47:30 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: No. The MAKER produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will pass annotations from the Augustus produced GFF file into the final annotation but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff) On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > Hi Getiria, > > Does the MAKER produced GFF3 file contain any annotations at all? Can you > send the first ~100 lines each of the MAKER produced GFF3 file and of the > GFF3 files that you passed via maker_opts.ctl? > > B > > On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: > > Maker Development Team, > > I am trying to use Maker for re-annotation using gene predictions from > Augustus. We had previously used Augustus for gene prediction but now want > to combine these annotations with some EST data. I updated > fields maker_opts.ctl as below > > genome=CGS01058.fasta #genome sequence file in fasta format > est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT > pred_gff=Augustus.gff3 #ab-initio predictions from > other_gff=Promoters.gff3 #promoter annotations > other_gff=CpG_Islands.gff3 # CpG island annotations > > Maker runs to completion and according to the log file annotation was > successful. However, it also gives a "Segmentation fault (core dumped)" > message. It does produce a GFF3 file but when I load the GFF3 file into IGV > and look it does not contain any of the exon definitions in Augustus.gff3. > Am I missing something? > > Regards, > Getiria > > -- > Getiria Onsongo, Ph.D. 
> Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AugustusFirst100.gff3 Type: application/octet-stream Size: 9703 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CGS01058First100.gff Type: application/octet-stream Size: 5665 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CpG_IslandsFirst100.gff3 Type: application/octet-stream Size: 1964 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: EST2ScaffoldFirst100.gff3 Type: application/octet-stream Size: 9901 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4579 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PromotersFirst100.gff3
Type: application/octet-stream
Size: 113 bytes
Desc: not available
URL:

From Carson.Holt at oicr.on.ca Wed May 22 08:03:14 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Wed, 22 May 2013 14:03:14 +0000
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
Message-ID:

Are you using MAKER version 2.10? I ask because there is an issue with other_gff in that version that has since been fixed. So if you don't get other_gff to pass through, you will need to upgrade to 2.28 (release date is later today, coincidentally).

For the Augustus GFF3 file, the format is a little weird, which is causing the problem. The features are mRNAs not attached to genes. Rather than build the expected 3-level gene/mRNA/exon structure for these, it is simpler just to convert them to the 2-level match/match_part structure. Just convert the 'mRNA' tag to 'match' and all 'exon' tags to 'match_part'. Rename the GFF3 file when you're done so that it forces a rebuild of the GFF3 database when you run again.

Thanks,
Carson

From: Innocent Onsongo
Date: Wednesday, 22 May, 2013 8:47 AM
To: Barry Moore
Cc:
Subject: Re: [maker-devel] Maker: Re-annotation

No. The MAKER produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will pass annotations from the Augustus produced GFF file into the final annotation but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff)

On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote:

Hi Getiria,

Does the MAKER produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl?
B On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: Maker Development Team, I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below genome=CGS01058.fasta #genome sequence file in fasta format est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT pred_gff=Augustus.gff3 #ab-initio predictions from other_gff=Promoters.gff3 #promoter annotations other_gff=CpG_Islands.gff3 # CpG island annotations Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3. Am I missing something? Regards, Getiria -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
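For anyone following this thread, the conversion Carson describes (renaming 'mRNA' features to 'match' and 'exon' features to 'match_part') can be sketched roughly as below. This is only a minimal sketch, assuming a standard nine-column, tab-separated GFF3 file. The function name and the sample lines are made up for illustration and are not taken from the attached Augustus file; other feature types are passed through unchanged, which may or may not be what MAKER expects.

```python
def convert_feature_types(lines):
    """Rename mRNA -> match and exon -> match_part in GFF3 column 3."""
    rename = {"mRNA": "match", "exon": "match_part"}
    out = []
    for line in lines:
        stripped = line.rstrip("\n")
        # Leave comments, directives and blank lines untouched.
        if not stripped or stripped.startswith("#"):
            out.append(stripped)
            continue
        cols = stripped.split("\t")
        # Only rewrite well-formed nine-column feature lines.
        if len(cols) == 9:
            cols[2] = rename.get(cols[2], cols[2])
        out.append("\t".join(cols))
    return out


# Hypothetical sample records (invented IDs and coordinates).
sample = [
    "##gff-version 3",
    "scaf1\tAUGUSTUS\tmRNA\t100\t500\t0.9\t+\t.\tID=g1.t1",
    "scaf1\tAUGUSTUS\texon\t100\t200\t.\t+\t.\tParent=g1.t1",
]
converted = convert_feature_types(sample)
for record in converted:
    print(record)
```

Writing the converted records to a file with a new name also takes care of the renaming step mentioned in the reply, so the GFF3 database is rebuilt on the next run.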
URL:

From carsonhh at gmail.com Wed May 22 10:38:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 22 May 2013 12:38:50 -0400
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
Message-ID:

I've released 2.28 on the website. This is one of the bugs that was fixed. It happens under a very specific set of circumstances. You need to run maker with the -a command line flag to get it to recalculate upstream variables after upgrading. Alternatively, you can also just give maker your old GFF3 file (leave all other options blank except for the *_pass= options), and maker will just rebuild it.

Thanks,
Carson

From: Hung-Wei Hsu
Date: Monday, 20 May, 2013 9:19 PM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Why are some complete gene predictions not present in the final results?

Thanks a lot for your help. Your suggestions will be very helpful for our analysis. I've tried to add EST sequences to improve gene predictions. The EST sequences I used were CDS sequences of the same organism. But I got the error below.

substr outside of string at .../TranslationMachine.pm line 162
ERROR: Failed while polishig ESTs
ERROR: Chunk failed at level:2, tier_type:3

What's wrong with my analysis? Are the EST sequences I used wrong? Thank you.

Hung-Wei

2013/5/21 Carson Holt
> On default settings MAKER will only put ab initio predictions that have some
> sort of evidence support (EST or protein) in the final gene set. The rejected
> predictions are still in the GFF3 for reference purposes as match/match_part
> features, but not as gene/mRNA/exon/CDS features. So a lack of evidence might
> be why it is not there. You can add all rejected models that don't overlap an
> accepted model by setting keep_preds=1 (this usually brings a lot more into
> the final gene set than you really want, though: lots of false positives). But
> for some organisms like fungi, which have high gene densities, this approach
> is relatively safe.
> > Alternatively the gene is missing because it overlaps another gene model that > was accepted. MAKER won't allow overlapping models on the same strand in > eukaryotes. The only way to force that kind of overlap is to give MAKER the > reference models in model_gff and not let it call it's own models (then maker > is really just aligning evidence and scoring the reference models). > > One final note. If there is no evidence supporting the model, and that is why > it is rejected, you can also try adding more evidence to the maker run or you > can consider the possibility that the gene model in the reference is not real > to being with (i.e. a false positive gene model called during the initial > annotation process and not supported by protein or expression data from any > source). > > Thanks, > Carson > > > > From: Hung-Wei Hsu > Date: Monday, 20 May, 2013 12:16 AM > To: > Subject: [maker-devel] Why are some complete gene predictions not present in > the final results? > > Hi MAKER developers, > > I was exploiting MAKER to perform gene prediction and annotation on my > contigs. > I used Artemis to examine gff and found some CDS with complete structure were > absent in the final results. > They are really predicted and annotated on the ref genome. > I'm wondering if they were discarded due to overlapping with another CDS. > How can I preserve these CDS? > Thanks a lot in advance. > > Hung-Wei > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 22 10:39:53 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 22 May 2013 12:39:53 -0400 Subject: [maker-devel] Maker-derived CDS GFF3 phase column In-Reply-To: Message-ID: Ok. It's available for download. 
--Carson From: Rob Syme Date: Wednesday, 22 May, 2013 12:04 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Maker-derived CDS GFF3 phase column Fantastic. I thought that might have been the problem. Looking forward to 2.28. Thanks! Rob On Wed, May 22, 2013 at 10:32 AM, Carson Holt wrote: > It looks like the phase was calculated from the wrong strand orientation. I > believe I have corrected this now. I'm checking a few more things, but I'll > have 2.28 as the latest release likely tomorrow with the cumulative bug fixes > since the last release. > > Thanks, > Carson > > > > From: Rob Syme > Date: Tuesday, 21 May, 2013 1:57 AM > To: > Subject: [maker-devel] Maker-derived CDS GFF3 phase column > > Hi all > > By my reading of the GFF3 spec > (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker > that has odd data in the phase column. > > For example, see some example Maker output at > https://gist.github.com/robsyme/5617399 > > There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, > respectively. Both exons are in the reverse strand. > > From the spec, phase indicates "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next codon", > and for "reverse strand features, phase is counted from the end field". > > In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is > 5953. > The base at the end field is the first base of the translated CDS, so there > should be no bases removed "to reach the first base of the next codon". I > suggest that this phase should be 0, not 2. > > There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. > > The output gff3 is correct if "the number of bases that should be removed from > the beginning of this feature to reach the first base of the next codon" is > measured from the 'left-hand' end of this feature (the start field) rather > than the end field. 
>
> Has anybody else run into this problem or am I misreading the gff3 spec?
>
> Rob Syme
> PhD Student
> Curtin University
>
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From barry.utah at gmail.com Thu May 23 10:40:23 2013
From: barry.utah at gmail.com (Barry Moore)
Date: Thu, 23 May 2013 10:40:23 -0600
Subject: [maker-devel] getting protein sequences from genomes
In-Reply-To: <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch>
References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>, <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch>
Message-ID: <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu>

Hi Luciano,

If I understand correctly, you are including translations of SNAP and Augustus predictions as well as the predictions themselves. If so, you don't want to do that. Overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of the prediction along with the prediction you guarantee that every prediction will become an annotation, and that means you lose the benefit of the evidence-supervised annotation that MAKER provides. Include the proteins from the D. mel reference, and if you want to cast a broader net, include proteins from other dipterans or even Uniprot - it just depends on how aggressive you want to be in capturing new annotations.

B

On May 23, 2013, at 8:41 AM, Luciano Abriata wrote:

> Thanks for your reply!
>
> One more question, can you think of any tips to get the best possible predictions of protein sequences?
>
> I am asking because I am getting a few proteins that are too big to be real and don't exist if I blast them, plus a few others which don't start with Methionine...
So far I am including transcripts and translations from flybase, and snap and augustus with their available trainings for flies. Do you see any possible source of error in that? > > Thanks again, > > Luciano > > De: Barry Moore [barry.moore at genetics.utah.edu] > Enviado el: viernes, 17 de mayo de 2013 09:02 p.m. > Para: Luciano Abriata > Cc: maker-devel at yandell-lab.org > Asunto: Re: [maker-devel] getting protein sequences from genomes > > > On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: > >> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies). >> >> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are: >> >> 1) How can I match proteins predicted for the same gene in two genomes? > > blastp tweaked with parameters to optimize near perfect match > >> >> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output) >> >> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 >> > > AED = Annotation edit distance describes how closely the prediction matches the evidence. This is a distance measure and thus 0 is a perfect match and 1 is no overlap. > > eAED = Exon adjusted annotation edit distance: This metric is the same as AED with a couple of exceptions. For a protein coding exon to be counted as overlapping protein evidence the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA Seq data is used as evidence and both ends of an exon are supported with splice site spanning reads, the middle of that exon is counted as supported as well even if coverage drops off in the interior of the exon.. For the most part AED and eAED will always be the same, but eAED tends to work better on many fringe cases. 
>
> QI values are as follows:
>
> 1. 5' UTR length.
> 2. Fraction of splice sites confirmed by EST alignment.
> 3. Fraction of exons that overlap an EST alignment.
> 4. Fraction of exons that overlap EST or protein alignment.
> 5. Fraction of splice sites confirmed by an ab initio prediction.
> 6. Fraction of exons that overlap an ab initio prediction.
> 7. Number of exons in the transcript.
> 8. 3' UTR length.
> 9. Length of encoded protein.
>
>
>> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta, snap_masked.proteins.fasta, non_overlapping_ab_initio.proteins.fasta, and proteins.fasta
>>
>> Which of these files contains the definitive set of predicted protein sequences?
>
> The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for.
>
>>
>>
>>
>> Thanks in advance!
>>
>> Luciano
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>
>
>
>

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
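As an illustration of the AED, eAED and QI fields Barry explains in this message, the header line Luciano quoted can be split into named values roughly as follows. This is a minimal sketch, not code from MAKER itself: the function name and the dictionary keys are invented here, and the order of the nine QI values is taken from the list in the message above.

```python
# Key names are hypothetical labels for the nine QI positions listed above.
QI_FIELDS = [
    "five_prime_utr_length",
    "splice_sites_confirmed_by_est",
    "exons_overlapping_est",
    "exons_overlapping_est_or_protein",
    "splice_sites_confirmed_by_ab_initio",
    "exons_overlapping_ab_initio",
    "exon_count",
    "three_prime_utr_length",
    "protein_length",
]


def parse_maker_header(header):
    """Split a MAKER protein FASTA header into id, AED, eAED and QI values."""
    tokens = header.lstrip(">").split()
    record = {"id": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            continue  # tokens without a colon, e.g. the bare word "protein"
        if key in ("AED", "eAED"):
            record[key] = float(value)
        elif key == "QI":
            parts = value.split("|")
            record["QI"] = dict(zip(QI_FIELDS, (float(p) for p in parts)))
    return record


# Header line quoted in the thread.
header = ("maker-2L-augustus-gene-0.19-mRNA-1 protein "
          "AED:0.0322873164323667 eAED:0.0322873164323667 "
          "QI:2|1|0.66|1|1|1|3|208|541")
info = parse_maker_header(header)
print(info["id"], info["AED"], info["QI"]["exon_count"])
```

Read this way, the quoted example describes a three-exon transcript with a 2 bp 5' UTR, a 208 bp 3' UTR and a 541-residue protein.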
URL: From dsth at ebi.ac.uk Thu May 23 10:48:05 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Thu, 23 May 2013 17:48:05 +0100 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Message-ID: would gene annotation by projection using synteny/WGA not be more appropriate? either way what's wrong with running one of the standard orthology predictions tools or just basic best reciprocal blast? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2013/5/23 Barry Moore > Hi Liciano, > > If I understand correctly you are including translations of SNAP and > Augustus predictions as well as the predictions. If so, you don't want to > do that. An overlapping protein evidence is sufficient to promote a > prediction to an annotation, so by providing the protein translation of the > prediction along with the prediction you will guarantee that every > prediction will become an annotation and that means you lose the benefit of > evidence supervised annotation that MAKER provides. Include the proteins > from the D mel reference and if you want to cast a broader net include > proteins from other dipterans or even Uniprot - just depend on how > aggressive you want to try to be in capturing new annotations. > > B > > On May 23, 2013, at 8:41 AM, Luciano Abriata wrote: > > Thanks for your reply! > > One more question, can you think of any tips to get the best possible > predictions of protein sequences? > > I am asking because I am getting a few proteins that are too big to be > real and don't exist if I blast them, plus a few others which don't start > with Methionine... 
So far I am including transcripts and translations from > flybase, and snap and augustus with their available trainings for flies. Do > you see any possible source of error in that? > > Thanks again, > > Luciano > > ------------------------------ > *De:* Barry Moore [barry.moore at genetics.utah.edu] > *Enviado el:* viernes, 17 de mayo de 2013 09:02 p.m. > *Para:* Luciano Abriata > *Cc:* maker-devel at yandell-lab.org > *Asunto:* Re: [maker-devel] getting protein sequences from genomes > > > On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: > > Hello, I am trying to use Maker to annotate genomes from different > individuals of a population (D. melanogaster flies). > > My ultimate goal is to get, for each gene, the amino acid sequences of the > coded proteins as they are expressed from each genome. My questions are: > > 1) How can I match proteins predicted for the same gene in two genomes? > > > blastp tweaked with parameters to optimize near perfect match > > > 2) What is the meaning of all the data in a line such as the following one > (taken from the protein.fasta output) > > maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 > eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 > > > AED = Annotation edit distance describes how closely the prediction > matches the evidence. This is a distance measure and thus 0 is a perfect > match and 1 is no overlap. > > eAED = Exon adjusted annotation edit distance: This metric is the same as > AED with a couple of exceptions. For a protein coding exon to be counted > as overlapping protein evidence the reading frame must be the same in the > coding exon and the protein evidence. Second, when mRNA Seq data is used > as evidence and both ends of an exon are supported with splice site > spanning reads, the middle of that exon is counted as supported as well > even if coverage drops off in the interior of the exon.. 
For the most part > AED and eAED will always be the same, but eAED tends to work better on many > fringe cases. > > QI values are as follows: > > > 1. 5' UTR Length > 2. Fraction of splice sites confirmed by EST alignment. > 3. Fraction of exons that overlap and EST alignment. > 4. Fraction of exons that overlap EST or protein alignment. > 5. Fraction of splice sites confirmed by an ab initio prediction. > 6. Fraction of exons that overlap an ab intitio prediction. > 7. Number of exons in the transcript. > 8. 3' UTR length. > 9. Length of encoded protein. > > > > 3) If I include snap and augustus to improve protein predictions, I get > several protein.fasta files: augustus_masked.proteins.fasta , > snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and > proteins.fasta > > Which of these files contains the definite set of predicted protein > sequences? > > > The proteins.fasta file is the final set of proteins for all genes that > MAKER created annotations for. > > > > > Thanks in advance! > > Luciano > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bob_Freeman at hms.harvard.edu Thu May 23 14:17:00 2013 From: Bob_Freeman at hms.harvard.edu (Freeman, Robert M.) 
Date: Thu, 23 May 2013 16:17:00 -0400 Subject: [maker-devel] Advice on params for ciliates Message-ID: <9D9882BB-3A26-45D6-A5B0-9B18F9BF5C31@hms.harvard.edu> Dear MAKER community, Am embarking on updating models for a ciliate (taxa Ciliophora) and was wondering if folks had recommendations for MAKER parameters. Thanks, Bob ----------------------------------------------------- Bob Freeman, Ph.D. Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Fri May 24 07:10:15 2013 From: daniel.standage at gmail.com (Daniel Standage) Date: Fri, 24 May 2013 09:10:15 -0400 Subject: [maker-devel] Using maker with precomputed transcript / protein alignments Message-ID: Greetings! I have some precomputed transcript and protein alignments that I would like to use with Maker. I have converted them into GFF3 format (see attached examples) and provided them to their corresponding entries (est_gff, altest_gff, protein_gff) in the maker_opts.ctl file. Unfortunately, Maker seems to be getting caught up on processing these GFF3 files. I've tried running Maker 2.10 as well as the development version (checked out a few months ago--svn server isn't responding so I can't give a precise revision number), and in both cases Maker hangs while trying to create the GFF3 database. These are the last lines I see in STDERR when * --debug* is set. STATUS: Setting up database for any GFF3 input... Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 587. I can't find any documentation specifying any explicit requirements for the alignment-containing GFF3 input files. 
Maker output uses the pretty canonical *expressed_sequence_match*, *protein_match*, and *match_part*features for encoding alignments, and I have used this convention with my input (see attached examples). I have also double-checked that my examples are valid GFF3, so my guess is that Maker has additional constraints/expectations for certain fields in the GFF3 files (score column? required attributes?). Is this correct, and if so would you be able to point me toward any related documentation I may have missed? Many thanks. -- Daniel S. Standage Ph.D. Candidate Bioinformatics and Computational Biology Program Department of Genetics, Development, and Cell Biology Iowa State University -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: prot-example.gff3 Type: application/octet-stream Size: 1080 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: trans-example.gff3 Type: application/octet-stream Size: 1306 bytes Desc: not available URL: From guoyunfei1989 at gmail.com Fri May 24 10:15:19 2013 From: guoyunfei1989 at gmail.com (Yunfei Guo) Date: Fri, 24 May 2013 09:15:19 -0700 Subject: [maker-devel] ./FINISHED/FINISHED.gff Message-ID: Hi Carson, When I tried to merge all gff files, I got this error: ERROR: The file './FINISHED/FINISHED.gff' does not exist and I found something like below in master_datastore_index.log. Is this caused by the duplicate scaffold? C12919781 GapCloser-Nigro-Min1k_datastore/28/79/C12919781/ FINISHED FINISHED scaffold138015 GapCloser-Nigro-Min1k_datastore/F7/0C/scaffold138015/ FINISHED FASTA lines for C12919781 and scaffold138015 >C12919781 36.0 >C12919781 36.0 CGTAAATGCATCCGCGTATAAATGCGACAGTAAGAGTTAATGATGCAGTATAAAAAGCAAGAAAAAGCGTTTATGGTGGGAGGCGGAGGCATCCAACTAACACCAGACTGTTAACCCGGAGACCAGTGGTCGACACCGTCG(skip...) 
>scaffold138015 35.1
ATATGCATATGCATATGCATATGCATATGCATATGCATATATAGACATGTAGATATAGACATCAATCATACACGTAACCCATCATTCGTATTATTAAATCACATTTTGTGACTTTGCCCATCTGTCTTTAAAGGGACAATGTGTATG(skip...)

maker 2.27

Thanks,
Yunfei
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri May 24 10:22:05 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 24 May 2013 12:22:05 -0400
Subject: [maker-devel] ./FINISHED/FINISHED.gff
In-Reply-To:
Message-ID:

Sometimes the master_datastore_index.log gets munged by MPI (processes print at the same time). You can rebuild it by running a single instance of maker with the -dsindex flag. It only takes about 60 seconds to rebuild.

Example:
cd maker -dsindex

--Carson

From: Yunfei Guo
Date: Friday, 24 May, 2013 12:15 PM
To:
Subject: [maker-devel] ./FINISHED/FINISHED.gff

Hi Carson,

When I tried to merge all gff files, I got this error:

ERROR: The file './FINISHED/FINISHED.gff' does not exist

and I found something like below in master_datastore_index.log. Is this caused by the duplicate scaffold?

C12919781 GapCloser-Nigro-Min1k_datastore/28/79/C12919781/ FINISHED FINISHED scaffold138015 GapCloser-Nigro-Min1k_datastore/F7/0C/scaffold138015/ FINISHED

FASTA lines for C12919781 and scaffold138015

>C12919781 36.0
>C12919781 36.0
CGTAAATGCATCCGCGTATAAATGCGACAGTAAGAGTTAATGATGCAGTATAAAAAGCAAGAAAAAGCGTTTATGG
TGGGAGGCGGAGGCATCCAACTAACACCAGACTGTTAACCCGGAGACCAGTGGTCGACACCGTCG(skip...)
>scaffold138015 35.1
ATATGCATATGCATATGCATATGCATATGCATATGCATATATAGACATGTAGATATAGACATCAATCATACACGTA
ACCCATCATTCGTATTATTAAATCACATTTTGTGACTTTGCCCATCTGTCTTTAAAGGGACAATGTGTATG(skip...)

maker 2.27

Thanks,
Yunfei
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
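Before rebuilding the index with -dsindex as Carson suggests, it can help to confirm that the log really is munged. The sketch below is only a rough check and rests on an assumption that is not confirmed anywhere in this thread: that each well-formed line of master_datastore_index.log has exactly three whitespace-separated fields (sequence ID, datastore path, status). The function name is hypothetical and the sample lines are assembled from the fragments Yunfei posted, not copied from a real log.

```python
def find_munged_lines(lines):
    """Return (line_number, text) for non-empty lines without exactly 3 fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        # Assumption: ID, datastore path and status, with no embedded spaces.
        if len(text.split()) != 3:
            bad.append((number, text))
    return bad


# Illustrative lines only, pieced together from the message above.
sample_log = [
    "C12919781\tGapCloser-Nigro-Min1k_datastore/28/79/C12919781/\tFINISHED\tFINISHED",
    "scaffold138015\tGapCloser-Nigro-Min1k_datastore/F7/0C/scaffold138015/\tFINISHED",
]
suspect = find_munged_lines(sample_log)
for number, text in suspect:
    print("line %d looks munged: %s" % (number, text))
```

A check like this only detects the problem; the rebuild itself is still done by MAKER with the -dsindex flag.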
URL: From carsonhh at gmail.com Fri May 24 14:06:51 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 24 May 2013 16:06:51 -0400 Subject: [maker-devel] Using maker with precomputed transcript / protein alignments In-Reply-To: Message-ID: I'm glad it's working. I think I'll add a check for the '/' characters in the base name as I think having it be a directory will get me in trouble somewhere with hidden bugs. Thanks, Carson From: Daniel Standage Date: Friday, 24 May, 2013 4:00 PM To: Carson Holt Subject: Re: [maker-devel] Using maker with precomputed transcript / protein alignments Oh wow, you are going to LOVE this. I kept on messing around with things to see if I could tease out any patterns, and eventually it hit me. In my working directory, I have an outputs directory, which is intended to contain output directories from various different maker runs. However, since my submission scripts launch Maker from the working directory, I use -base outputs/blahblahblah as the base parameter. So when it tries to create output files using the base name (the SQLite3 db just happens to be the first), it tries to create outputs/blahblahblah/outputs/blahblahblah.db, and of course that internal outputs directory doesn't exist. Every time I've had problems, I've been using a basename with a / character (relative directory path). Every time I haven't had problems, it was because the / wasn't there. Since the base parameter determines the name of the output directory, I assumed I could also use it specify a nested output directory. So it looks like I just need to be more careful that the basenames I use don't contain / characters or any other special UNIX characters. Of course, this could be made explicit in the usage statement, or you could add something like this right after parsing the command line arguments. 
    # Reject base names that contain a path separator
    if ($OPT{"out_name"} =~ m/\//) {
        printf(STDERR "base '%s' invalid: basenames containing relative directory paths cause errors; please provide a simple string instead\n", $OPT{"out_name"});
        exit_maker(0);
    }

Alternatively, you could handle things like I had originally expected: if I provide path/to/mybase as my base parameter, maker would create the path/to/mybase directory initially, but then in the creation of subsequent files it would simply use mybase. I don't imagine this would be too extensive of a change, but I understand Maker has a huge codebase.

Anyway, just some suggestions, take them for what they're worth. Thanks for your help!

--
Daniel S. Standage
Ph.D. Candidate
Bioinformatics and Computational Biology Program
Department of Genetics, Development, and Cell Biology
Iowa State University

On Fri, May 24, 2013 at 3:29 PM, Carson Holt wrote:
> NFS is weird. It's hard to say why it was freezing the first few times, and did
> not appear to freeze on your very last try. I definitely want to know if it
> starts to freeze again, or if stack traces show a consistent point where it
> freezes. If it keeps happening, I can try making the database in the local
> /tmp and then just copying it to the current working directory once it's
> populated to get around any weird NFS issues. But before going through all
> the effort to do that, I'd like to know that it's not some other weird bug
> related to the Perl you're using or other modules that are installed. Top
> candidates on the list would be modules such as forks, forks::shared, DBI, or
> DBD::SQLite. Try reinstalling those.
>
> Thanks,
> Carson
>
>
> From: Daniel Standage
> Date: Friday, 24 May, 2013 3:19 PM
>
> To: Carson Holt
> Subject: Re: [maker-devel] Using maker with precomputed transcript / protein
> alignments
>
> I admit I killed these last few runs too quickly, I guess I was getting
> impatient, especially since waiting hours or days hasn't made a difference
> before. Either way, that was sloppy on my part.
> > However, I always specify the base parameter, whether or not I'm running > mulitple maker jobs from the same directory. And if I ever restarted a job, I > have always removed the original output directory entirely before > relaunching--precisely to avoid the types of mistakes you mention arising from > residual files. > > -- > Daniel S. Standage > Ph.D. Candidate > Bioinformatics and Computational Biology Program > Department of Genetics, Development, and Cell Biology > Iowa State University > > > On Fri, May 24, 2013 at 3:10 PM, Carson Holt wrote: >> Correct if you use the -base parameter you should get a different output >> directory. And if you have never used that base before, and it still >> freezes, then there is a problem. You do need to give it a little more time >> until killing it, as the stack trace in both cases showed that it was less >> than 25% finished reading the input GFF3 files and even less than that in the >> first case (so give it about 5x as long before giving up). >> >> It might just be that the NFS mount is slow. Or because of how weird the >> error is, other options include reinstalling perl and all modules. The >> weirdest bugs are often broken perl or inadvertently using modules from >> different perl versions via the PERL5LIB environmental variable (this is very >> common and can cause very wacky behavior). Another option is verifying all >> software for the lustre NFS mount is up to date. Lastly there was an odd NFS >> bug that came up on the e-mail list last week that was fixed by a kernel >> upgrade. >> >> --Carson >> >> >> >> From: Daniel Standage >> Date: Friday, 24 May, 2013 3:01 PM >> >> To: Carson Holt >> Subject: Re: [maker-devel] Using maker with precomputed transcript / protein >> alignments >> >> The file locks are created only in the output directory, no? 
So there is a >> problem if I have multiple maker runs launched from the same directory, but >> writing to different output directories (as specified by different base >> parameters)? >> >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Bioinformatics and Computational Biology Program >> Department of Genetics, Development, and Cell Biology >> Iowa State University >> >> >> On Fri, May 24, 2013 at 2:57 PM, Carson Holt wrote: >>> To clarify, that means you need to use a different working directory. Can >>> be a subdirectory of your original. >>> >>> --Carson >>> >>> >>> From: Carson Holt >>> Date: Friday, 24 May, 2013 2:56 PM >>> To: Daniel Standage >>> >>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>> protein alignments >>> >>> Both stack traces show different locations in the code and file being read. >>> So it appears it was not frozen, just interrupted by control-C. >>> >>> If you restart make sure you do so in a completely new directory from the >>> original run. This is because I wonder if there is a failed job that still >>> has active processes and is holding onto file locks in that directory. >>> >>> --Carson >>> >>> >>> From: Daniel Standage >>> Date: Friday, 24 May, 2013 2:50 PM >>> To: Carson Holt >>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>> protein alignments >>> >>> Deleted output directory and re-ran. Stack trace looks pretty similar. >>> >>> >>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line >>> 607. >>> SIGINT received >>> at /N/u/dstandag/Mason/local/src/PerlLibs/lib64/perl5/forks/signals.pm >>> line 97, <$IN >>>> > line 243676. 
>>> forks::signals::__ANON__('INT') called at /usr/lib64/perl5/DBI.pm >>> line 1590 >>> eval {...} called at /usr/lib64/perl5/DBI.pm line 1590 >>> DBD::_::db::do('DBI::db=HASH(0x4987228)', 'INSERT INTO est_gff >>> (seqid, source, parent, start, end, line)...') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 493 >>> GFFDB::_add_to_db('GFFDB=HASH(0x49727a0)', >>> 'DBI::db=HASH(0x49871e0)', 'est_gff', 'HASH(0x49877e0)') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 432 >>> GFFDB::_add_type('GFFDB=HASH(0x49727a0)', >>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...', >>> 'est_gff') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 324 >>> GFFDB::add_est('GFFDB=HASH(0x49727a0)', >>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...') called >>> at /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 57 >>> GFFDB::new('GFFDB', 'HASH(0x489c488)') called at >>> /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 608 >>> >>> >>> -- >>> Daniel S. Standage >>> Ph.D. Candidate >>> Bioinformatics and Computational Biology Program >>> Department of Genetics, Development, and Cell Biology >>> Iowa State University >>> >>> >>> On Fri, May 24, 2013 at 2:45 PM, Carson Holt wrote: >>>> Could you run again, and so I can see if the stack trace is the same each >>>> time. >>>> >>>> --Carson >>>> >>>> >>>> From: Daniel Standage >>>> Date: Friday, 24 May, 2013 2:39 PM >>>> >>>> To: Carson Holt >>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>> protein alignments >>>> >>>> Restarted in the original NSF-mounted directory, never saw the .db file, >>>> and got this as the stack trace upon termination. >>>> >>>> STATUS: Setting up database for any GFF3 input... >>>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker >>>> line 607. 
>>>> SIGINT received >>>> at /N/u/dstandag/Mason/local/src/PerlLibs/lib64/perl5/forks/signals.pm >>>> line 97, <$IN> line 170294. >>>> forks::signals::__ANON__('INT') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 475 >>>> eval {...} called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 475 >>>> GFFDB::_parse_line('GFFDB=HASH(0x4e5c730)', 'SCALAR(0x4e714b8)', >>>> 'est_gff') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 431 >>>> GFFDB::_add_type('GFFDB=HASH(0x4e5c730)', >>>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...', >>>> 'est_gff') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 324 >>>> GFFDB::add_est('GFFDB=HASH(0x4e5c730)', >>>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...') called >>>> at /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 57 >>>> GFFDB::new('GFFDB', 'HASH(0x4d86488)') called at >>>> /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 608 >>>> >>>> >>>> -- >>>> Daniel S. Standage >>>> Ph.D. Candidate >>>> Bioinformatics and Computational Biology Program >>>> Department of Genetics, Development, and Cell Biology >>>> Iowa State University >>>> >>>> >>>> On Fri, May 24, 2013 at 2:25 PM, Carson Holt wrote: >>>>> Start a new job in a new directory from the original job (NFS mount). Use >>>>> the new maker executable I sent. If it still freezes, hit control-C to >>>>> get a stack trace. >>>>> >>>>> --Carson >>>>> >>>>> >>>>> From: Daniel Standage >>>>> Date: Friday, 24 May, 2013 2:21 PM >>>>> >>>>> To: Carson Holt >>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>> protein alignments >>>>> >>>>> The job from several hours ago is still running with no changes. 
>>>>> >>>>> I just relaunched the job with a locally mounted working directory: I >>>>> could see the .db file almost immediately, and it took less than 5 minutes >>>>> to successfully build the SQLite3 db and proceed to the next steps of the >>>>> pipeline. Any ideas? >>>>> >>>>> -- >>>>> Daniel S. Standage >>>>> Ph.D. Candidate >>>>> Bioinformatics and Computational Biology Program >>>>> Department of Genetics, Development, and Cell Biology >>>>> Iowa State University >>>>> >>>>> >>>>> On Fri, May 24, 2013 at 2:01 PM, Carson Holt wrote: >>>>>> The NFS mount appears to be configured correctly. >>>>>> >>>>>> Here is what the maker.output directory should look like while the >>>>>> database is being generated. >>>>>> >>>>>> drwxr-xr-x 10 cholt staff 340 24 May 13:51 . >>>>>> drwxr-xr-x 10 cholt staff 340 24 May 13:50 .. >>>>>> -rw------x 1 cholt staff 85 24 May 13:50 >>>>>> .NFSLock.gi_lock.NFSLock >>>>>> -rw------- 1 cholt staff 52 24 May 13:50 >>>>>> .NFSLock.pdom-annot-v0.41-1.db.NFSLock >>>>>> -rw-r--r-- 1 cholt staff 1413 24 May 13:50 maker_bopts.log >>>>>> -rw-r--r-- 1 cholt staff 1666 24 May 13:50 maker_exe.log >>>>>> -rw-r--r-- 1 cholt staff 4610 24 May 13:50 maker_opts.log >>>>>> drwxr-xr-x 4 cholt staff 136 24 May 13:50 mpi_blastdb >>>>>> -rw-r--r-- 1 cholt staff 29326336 24 May 13:51 pdom-annot-v0.41-1.db >>>>>> -rw-r--r-- 1 cholt staff 6704 24 May 13:51 >>>>>> pdom-annot-v0.41-1.db-journal >>>>>> >>>>>> >>>>>> Could you watch while maker is running to see if this file is created --> >>>>>> .NFSLock.pdom-annot-v0.41-1.db.NFSLock >>>>>> You must use ls with the -a flag to see it or it will be hidden. >>>>>> >>>>>> Just keep letting it run until that file shows up. Shortly after it sows >>>>>> up, this one should appear --> pdom-annot-v0.41-1.db-journal >>>>>> >>>>>> Also could you try running MAKER once with the working directory being >>>>>> locally mounted (/tmp for example). 
>>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> >>>>>> From: Daniel Standage >>>>>> Date: Friday, 24 May, 2013 1:36 PM >>>>>> >>>>>> To: Carson Holt >>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>> protein alignments >>>>>> >>>>>> Here is the output. >>>>>> >>>>>> [dstandag at mason annot-v0.41] ls -al >>>>>> outputs/pdom-annot-v0.41-1.maker.output/ >>>>>> total 32 >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 13:34 . >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 12:39 .. >>>>>> -rw-r--r-- 1 dstandag biol 1413 May 24 12:39 maker_bopts.log >>>>>> -rw-r--r-- 1 dstandag biol 1355 May 24 12:39 maker_exe.log >>>>>> -rw-r--r-- 1 dstandag biol 4883 May 24 12:39 maker_opts.log >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 12:39 mpi_blastdb >>>>>> -rw------x 1 dstandag biol 70 May 24 13:34 .NFSLock.gi_lock.NFSLock >>>>>> [dstandag at mason annot-v0.41] df outputs/pdom-annot-v0.41-1.maker.output/ >>>>>> Filesystem 1K-blocks Used Available Use% Mounted on >>>>>> dc-mds01.uits.indiana.edu:/dc >>>>>> 1144318908992 928977247792 203869022296 83% /N/dc >>>>>> [dstandag at mason annot-v0.41] mount >>>>>> login_x86_64 on / type tmpfs (rw) >>>>>> proc on /proc type proc (rw) >>>>>> sysfs on /sys type sysfs (rw) >>>>>> devpts on /dev/pts type devpts (rw,gid=5,mode=620) >>>>>> tmpfs on /dev/shm type tmpfs (rw) >>>>>> tmpfs on /var/tmp type tmpfs (rw,size=10m) >>>>>> /dev/sdb2 on /tmp type ext4 (rw,relatime,barrier=1,data=ordered) >>>>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) >>>>>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) >>>>>> AFS on /afs type afs (rw) >>>>>> bl-nas1:/vol/hd00 on /N/hd00 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> bl-nas1:/vol/hd01 on /N/hd01 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. 
>>>>>> 165.226.129) >>>>>> bl-nas2:/vol/hd02 on /N/hd02 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas2:/vol/hd03 on /N/hd03 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas1:/vol/hdln on /N/u type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> bl-nas2:/vol/soft on /N/soft type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas1:/vol/logs on /N/logs type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> none on /dev/cpuset type cpuset (rw) >>>>>> dc-mds01.uits.indiana.edu:/dc on /N/dc type lustre (rw,localflock) >>>>>> 149.165.235.173:/mds-wan/client on /N/dcwan type lustre (rw,localflock) >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Daniel S. Standage >>>>>> Ph.D. Candidate >>>>>> Bioinformatics and Computational Biology Program >>>>>> Department of Genetics, Development, and Cell Biology >>>>>> Iowa State University >>>>>> >>>>>> >>>>>> On Fri, May 24, 2013 at 1:29 PM, Carson Holt wrote: >>>>>>> They load fine for me. It is an SQLite database. I know that SQLlite >>>>>>> can freeze on NFS if it's not configured properly. >>>>>>> >>>>>>> Could you send me the output from these 3 commands. >>>>>>> >>>>>>> ls -al >>>>>>> df >>>>>>> mount >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 1:13 PM >>>>>>> >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I deleted the entire output directory before relaunching. No .db files >>>>>>> are even created, only the mpi_blastdb directory with the genomic >>>>>>> sequence data and corresponding index, before it hangs. 
>>>>>>> >>>>>>> The GFF3 files are attached. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 12:57 PM, Carson Holt >>>>>>> wrote: >>>>>>> Did you delete any *.db files in the maker.output directory first. If >>>>>>> not do that, and check on the rerun if that file is growing in size. It >>>>>>> is a database to hold the GFF3 file entries. It's final size should be >>>>>>> ~ 2x the size of the combined GFF3 files. If it is growing, then it is >>>>>>> not really frozen (you just need to give it more time). If it is not >>>>>>> growing, send me your GFF3 files and I can try and duplicate the error. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 12:50 PM >>>>>>> >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I installed BioPerl-1.6.901, rebuilt Maker, and re-launched the job. >>>>>>> After running for 10-15 minutes, it seems to be hanging in the same >>>>>>> place as before. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 11:38 AM, Carson Holt >>>>>>> wrote: >>>>>>> That is the CPAN version and the last stable release on bioperl.org >>>>>>> . Older version as well as the bio-perl live >>>>>>> version will cause MAKER to fail. The both have issues with the Fasta >>>>>>> indexing module that maker uses. 
>>>>>>> >>>>>>> http://search.cpan.org/CPAN/authors/id/C/CJ/CJFIELDS/BioPerl-1.6.901.tar >>>>>>> .gz >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 11:34 AM >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I'm not sure if a rebuild of Maker was necessary, but I tried running it >>>>>>> just to be safe. It's complaining about Bio::Root::Version dependency >>>>>>> not being met. Looking at the Build.PL file, it requires >>>>>>> Bio::Root::Version version 1.006901. Is there really such a version, or >>>>>>> should this be changed to 1.006 or 1.006001? >>>>>>> >>>>>>> For now I'll change it to 1.006001 (the installed version) and proceed >>>>>>> with another test. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 9:45 AM, Carson Holt wrote: >>>>>>> Could you run this command in the maker devel base directory. >>>>>>> >>>>>>> svn switch --relocate svn://* >>>>>>> ************ >>>>>>> svn://* *************** >>>>>>> >>>>>>> Then do 'svn update', and then tell me what happens. Make sure to >>>>>>> delete the and *.db files in the *.maker.output/ directory before >>>>>>> retrying. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 9:10 AM >>>>>>> To: Maker Mailing List >>>>>>> Subject: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> Greetings! >>>>>>> >>>>>>> I have some precomputed transcript and protein alignments that I would >>>>>>> like to use with Maker. 
I have converted them into GFF3 format (see >>>>>>> attached examples) and provided them to their corresponding entries >>>>>>> (est_gff, altest_gff, protein_gff) in the maker_opts.ctl file. >>>>>>> >>>>>>> Unfortunately, Maker seems to be getting caught up on processing these >>>>>>> GFF3 files. I've tried running Maker 2.10 as well as the development >>>>>>> version (checked out a few months ago--svn server isn't responding so I >>>>>>> can't give a precise revision number), and in both cases Maker hangs >>>>>>> while trying to create the GFF3 database. These are the last lines I see >>>>>>> in STDERR when --debug is set. >>>>>>> >>>>>>> STATUS: Setting up database for any GFF3 input... >>>>>>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker >>>>>>> line 587. >>>>>>> >>>>>>> I can't find any documentation specifying any explicit requirements for >>>>>>> the alignment-containing GFF3 input files. Maker output uses the pretty >>>>>>> canonical expressed_sequence_match, protein_match, and match_part >>>>>>> features for encoding alignments, and I have used this convention with >>>>>>> my input (see attached examples). I have also double-checked that my >>>>>>> examples are valid GFF3, so my guess is that Maker has additional >>>>>>> constraints/expectations for certain fields in the GFF3 files (score >>>>>>> column? required attributes?). Is this correct, and if so would you be >>>>>>> able to point me toward any related documentation I may have missed? >>>>>>> >>>>>>> Many thanks. >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. 
Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listin >>>>>>> fo/maker-devel_yandell-lab.org >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.syme at gmail.com Sun May 26 20:26:58 2013 From: rob.syme at gmail.com (Rob Syme) Date: Mon, 27 May 2013 10:26:58 +0800 Subject: [maker-devel] Can map2assembly be run outside the maker pipeline? Message-ID: Hi all I'm looking to move existing transcripts from one genome assembly to another, keeping the transcript names if possible. Running map2assembly seems to require MPI (stderr example below). Is is possible to run map2assembly outside of the Maker pipeline and without MPI? Stderr head: INFO: All repeat masking options will be skipped. A data structure will be created for you at: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_datastore To access files for individual sequences use the datastore index: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_master_datastore_index.log Can't call method "get_Seq_by_id" on an undefined value at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 226, line 1. FATAL ERROR ERROR: Failed in tier preparation WARNING: You must always set a rank before running MpiTiers FATAL: argument `q_def` does not exist in MpiTier object at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 86, line 1. 
Process::MpiChunk::_initialize_vars('Process::MpiChunk=HASH(0x332dac8)', 'HASH(0x332db88)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 47 Process::MpiChunk::new('Process::MpiChunk', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 413 Process::MpiChunk::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f49498)', 'HASH(0x332d728)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 4165 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x2f35e88)', 'load', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 316 Process::MpiChunk::_loader('Process::MpiChunk=HASH(0x2f35e88)', 'HASH(0x2ef85a8)', 0, 0, 'Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 364 Process::MpiTiers::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f411a0)', 'HASH(0x2f491c8)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 375 Process::MpiTiers::_load_chunks('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 185 Process::MpiTiers::next_chunk('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 816 Process::MpiTiers::_handler('Process::MpiTiers=HASH(0x79f3d0)', 'Error::Simple=HASH(0x2f35c18)', 'Failed in tier preparation') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 78 Process::MpiTiers::__ANON__('Error::Simple=HASH(0x2f35c18)', 'SCALAR(0x1179c30)') called at /path/to/maker/bin/../lib/Error.pm line 339 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 329 Error::subs::run_clauses('HASH(0x2f36230)', 'Can\'t call method "get_Seq_by_id" on an undefined value at /...', undef, 'ARRAY(0x117a1e8)') called at 
/path/to/maker/bin/../lib/Error.pm line 426 Error::subs::try('CODE(0x2f28898)', 'HASH(0x2f36230)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 79 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 56 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x79f508)', 0, 'Process::MpiChunk') called at ./map2assembly line 205 -------------- next part -------------- An HTML attachment was scrubbed... URL: From uma at ebi.ac.uk Tue May 28 06:00:54 2013 From: uma at ebi.ac.uk (Uma Maheswari) Date: Tue, 28 May 2013 13:00:54 +0100 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195ED54.4090501@ebi.ac.uk> References: <5195ED54.4090501@ebi.ac.uk> Message-ID: <51A49C76.3060801@ebi.ac.uk> Thanks Carson, 2.28 with -a command line flag fixed this problem. Uma On 17/05/13 09:41, Uma Maheswari wrote: > Hi Carson, > > I checked with Michael, this is different from what he saw, he had > entire segements of gff files duplicated, In this case, just Parent id > is. > I am preparing the files you asked for, will send them soon > > thanks > Uma > > > On 16/05/13 17:50, Carson Holt wrote: >> Yes. Perhaps this is the same issue Michael saw, although the one >> difference I see from his post is the Parent= attribute. >> >> --> >> Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1 >> >> I have seen duplicate exons from GFF3 pass-through in the past, but >> if that's not being used I'd be very appreciative of any test dataset >> you could give me. >> >> Thanks, >> Carson >> >> >> >> >> From: Daniel Hughes > >> Date: Thursday, 16 May, 2013 12:38 PM >> To: Carson Holt > >> Cc: Uma Maheswari >, >> "maker-devel at yandell-lab.org " >> > >> Subject: Re: [maker-devel] duplicate exons? >> >> hiya, are you using the same instance as michael at ebi as this >> sounds like the same problem he had last week and he wasn't running >> pass through. 
i've run 2.27 here 30+ times here and not seen this? is >> something very strange corrupted? >> >> dan. >> >> Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) >> ------------------------------------------------------------------------------------- >> dsth at cantab.net >> dsth at cpan.org >> >> >> 2013/5/16 Carson Holt > >> >> I think this also may be a result of using GFF3 pass-through. So >> if that >> is the case, could you send me any GFF3 files you gave maker in >> addition >> to the other files I asked for. >> >> Thanks, >> Carson >> >> >> >> On 13-05-16 12:08 PM, "Uma Maheswari" > > wrote: >> >> >Hi Carson, >> > >> >When I was trying to load the Maker-2.27 results into ensembl, I >> found >> >that few hundreds of genes with 'duplicate exons' . When I looked >> in the >> >gff file, I found cases like this, where the exons are not actually >> >duplicated but have two Parents with same mRNA ID. This can be a >> >potential alternate transcript, attached to the same transcript by >> >mistake? >> > >> >Many thanks >> >Uma >> > >> > >> > >> > >> > >> >3 maker gene 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >> >-gene-6.179 >> >3 maker mRNA 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >> >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >> >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >> >3 maker exon 524271 524480 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker exon 524538 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >> >9-mRNA-1 >> >3 maker exon 524271 525467 . - . 
>> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 524903 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 525182 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524271 524480 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524904 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> > >> > >> >_______________________________________________ >> >maker-devel mailing list >> >maker-devel at box290.bluehost.com >> >> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From jjin01 at mail.rockefeller.edu Tue May 28 19:37:58 2013
From: jjin01 at mail.rockefeller.edu (Jingjing Jin)
Date: Wed, 29 May 2013 01:37:58 +0000
Subject: [maker-devel] maker running error
Message-ID:

Dear all,

When I try to run maker on my datasets, there is an error like this:

#--------- command -------------#
Widget::blastx:
/usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner
#-------------------------------#
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
deleted:0 hits
running blast search.
#--------- command -------------#
Widget::blastx:
/usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner
#-------------------------------#
[blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences
deleted:0 hits
running blast search.
#--------- command -------------#
Widget::blastn:
/usr/local/bin/blastall -p blastn -d /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blastn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn
#-------------------------------#
[blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options
[blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options
ERROR: BLASTN failed
FATAL ERROR
ERROR: Failed while doing blastn of ESTs!!
ERROR: Chunk failed at level 8 !!
FAILED CONTIG:C4345703

Could anyone give me some suggestions?

Thanks!
Jingjing

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From myandell at genetics.utah.edu Tue May 28 19:58:51 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Wed, 29 May 2013 01:58:51 +0000
Subject: [maker-devel] maker running error
In-Reply-To:
References:
Message-ID: <558EECF8-8B9C-4C5D-9968-439D421C315F@genetics.utah.edu>

Hi Jingjing,

looks like your fasta files have problems. Have you checked to see if they are formatted correctly?
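A quick way to act on this suggestion is to scan the FASTA inputs for obvious formatting problems before rerunning. The following is only a minimal sketch in Python and is not part of MAKER: the function name `check_fasta` and the accepted alphabet (IUPAC nucleotide codes) are assumptions made for illustration, and a protein file such as te_proteins.fasta would need a different alphabet. It reports records with no sequence, duplicate identifiers, and unexpected characters; it does not prove that any of these caused the blastall errors above.

```python
import sys

# Assumed alphabet: IUPAC nucleotide codes in either case (illustrative choice).
VALID = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv")


def check_fasta(lines):
    """Return a list of (line_number, message) problems found in FASTA text."""
    problems = []
    seen_ids = set()
    current_id = None   # identifier of the record being read
    current_len = 0     # residues seen so far for that record
    header_line = 0     # line number of that record's header
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if current_id is not None and current_len == 0:
                problems.append((header_line, f"record '{current_id}' has no sequence"))
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            header_line = n
            current_len = 0
            if not current_id:
                problems.append((n, "header has no identifier"))
            elif current_id in seen_ids:
                problems.append((n, f"duplicate identifier '{current_id}'"))
            seen_ids.add(current_id)
        elif line.strip():
            seq = line.strip()
            if current_id is None:
                problems.append((n, "sequence data before first header"))
            bad = sorted(set(seq) - VALID)
            if bad:
                problems.append((n, f"unexpected characters {bad}"))
            current_len += len(seq)
    # Close the final record.
    if current_id is not None and current_len == 0:
        problems.append((header_line, f"record '{current_id}' has no sequence"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for line_no, message in check_fasta(handle):
            print(f"{sys.argv[1]}:{line_no}: {message}")
```

Saved under a name of your choosing, it would be run with the FASTA file as its only argument; no output means none of the checked problems were found.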
cheers,
--mark

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed May 29 06:45:30 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 29 May 2013 08:45:30 -0400
Subject: [maker-devel] Can map2assembly be run outside the maker pipeline?
In-Reply-To:
Message-ID:

It's not an MPI requirement, just an execution error. I've attached a fixed version of that script. It is really just a wrapper that runs maker with a few parameter changes. You can do exactly the same thing by removing all repeat masking options, setting est2genome=1, and then adding est_forward=1 to the maker_opts.ctl file.
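As a rough illustration, the relevant part of maker_opts.ctl might then look like the fragment below. This is a sketch rather than a tested configuration: the file paths are placeholders, and the repeat masking option names (model_org, rmlib, repeat_protein, rm_gff) are assumed to match the ones in the control file generated by your MAKER version. Only est2genome and est_forward are taken from the description above.

```
#-----Inputs (placeholder paths)
genome=/path/to/new_assembly.fasta
est=/path/to/existing_transcripts.fasta

#-----Repeat masking (assumed option names; values left blank to skip masking)
model_org=
rmlib=
repeat_protein=
rm_gff=

#-----Build gene models directly from the transcript alignments
est2genome=1
#est_forward is added by hand, as described above
est_forward=1
```

All other options would stay as generated; check the result against the control file your installation produces before relying on it.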
Thanks, Carson From: Rob Syme Date: Sunday, 26 May, 2013 10:26 PM To: Subject: [maker-devel] Can map2assembly be run outside the maker pipeline? Hi all I'm looking to move existing transcripts from one genome assembly to another, keeping the transcript names if possible. Running map2assembly seems to require MPI (stderr example below). Is it possible to run map2assembly outside of the Maker pipeline and without MPI? Stderr head: INFO: All repeat masking options will be skipped. A data structure will be created for you at: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_datastore To access files for individual sequences use the datastore index: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_master_datastore_index.log Can't call method "get_Seq_by_id" on an undefined value at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 226, line 1. FATAL ERROR ERROR: Failed in tier preparation WARNING: You must always set a rank before running MpiTiers FATAL: argument `q_def` does not exist in MpiTier object at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 86, line 1.
Process::MpiChunk::_initialize_vars('Process::MpiChunk=HASH(0x332dac8)', 'HASH(0x332db88)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 47 Process::MpiChunk::new('Process::MpiChunk', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 413 Process::MpiChunk::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f49498)', 'HASH(0x332d728)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 4165 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x2f35e88)', 'load', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 316 Process::MpiChunk::_loader('Process::MpiChunk=HASH(0x2f35e88)', 'HASH(0x2ef85a8)', 0, 0, 'Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 364 Process::MpiTiers::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f411a0)', 'HASH(0x2f491c8)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 375 Process::MpiTiers::_load_chunks('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 185 Process::MpiTiers::next_chunk('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 816 Process::MpiTiers::_handler('Process::MpiTiers=HASH(0x79f3d0)', 'Error::Simple=HASH(0x2f35c18)', 'Failed in tier preparation') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 78 Process::MpiTiers::__ANON__('Error::Simple=HASH(0x2f35c18)', 'SCALAR(0x1179c30)') called at /path/to/maker/bin/../lib/Error.pm line 339 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 329 Error::subs::run_clauses('HASH(0x2f36230)', 'Can\'t call method "get_Seq_by_id" on an undefined value at /...', undef, 'ARRAY(0x117a1e8)') called at 
/path/to/maker/bin/../lib/Error.pm line 426 Error::subs::try('CODE(0x2f28898)', 'HASH(0x2f36230)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 79 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 56 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x79f508)', 0, 'Process::MpiChunk') called at ./map2assembly line 205 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: map2assembly Type: application/octet-stream Size: 6413 bytes Desc: not available URL: From carsonhh at gmail.com Wed May 29 06:49:39 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 29 May 2013 08:49:39 -0400 Subject: [maker-devel] maker running error In-Reply-To: <558EECF8-8B9C-4C5D-9968-439D421C315F@genetics.utah.edu> Message-ID: Yes, most likely an input fasta error. If that is not the case, there are also some versions of BLAST that have version-specific failures and are fixed by upgrading BLAST. For example, I see you are using blastall, which is from the older NCBI BLAST as opposed to the newer BLAST+. --Carson From: Mark Yandell Date: Tuesday, 28 May, 2013 9:58 PM To: Jingjing Jin Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker running error Hi Jingjing, looks like your fasta files have problems. Have you checked to see if they are formatted correctly?
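One way to act on the suggestion above is to scan the input FASTA for common formatting problems before rerunning. The Python sketch below is not a MAKER or BLAST tool; `check_fasta` is a hypothetical helper that assumes nucleotide input with IUPAC codes. It also flags records that are empty or entirely N, on the unconfirmed guess that such contigs could leave BLAST with no usable query.

```python
def _finish(problems, seq_id, header_line, residues, informative):
    """Record problems that are only visible once a record is complete."""
    if seq_id is None:
        return
    if residues == 0:
        problems.append((header_line, f"{seq_id}: record has no sequence"))
    elif informative == 0:
        problems.append((header_line, f"{seq_id}: sequence is entirely N"))


def check_fasta(lines, allowed="ACGTUNRYKMSWBDHV"):
    """Return a list of (line_number, message) for suspicious FASTA lines."""
    problems, seen = [], set()
    seq_id, header_line, residues, informative = None, 0, 0, 0
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            _finish(problems, seq_id, header_line, residues, informative)
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            header_line, residues, informative = number, 0, 0
            if not seq_id:
                problems.append((number, "header has no sequence ID"))
            elif seq_id in seen:
                problems.append((number, f"{seq_id}: duplicate sequence ID"))
            seen.add(seq_id)
        elif line.strip():
            if seq_id is None:
                problems.append((number, "sequence data before first header"))
                continue
            seq = line.strip().upper()
            bad = sorted(set(seq) - set(allowed))
            if bad:
                problems.append(
                    (number, f"{seq_id}: unexpected characters {''.join(bad)}")
                )
            residues += len(seq)
            informative += len(seq) - seq.count("N")
    _finish(problems, seq_id, header_line, residues, informative)
    return problems
```

Passing an open file handle works, for example `check_fasta(open("genome.fasta"))`, where the file name is a placeholder. An empty list means none of these particular checks fired, which does not by itself prove that BLAST will accept the file.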
cheers, --mark On May 28, 2013, at 7:37 PM, Jingjing Jin wrote: > Dear all, > > When I try to run maker on my datasets, there is an error like this: > > #--------- command -------------# > Widget::blastx: > /usr/local/bin/blastall -p blastx -d > /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y > 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta. > repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner > #-------------------------------# > [blastall] FATAL ERROR: search cannot proceed due to errors in all > contexts/frames of query sequences > deleted:0 hits > running blast search. > #--------- command -------------# > Widget::blastx: > /usr/local/bin/blastall -p blastx -d > /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y > 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta. > repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner > #-------------------------------# > [blastall] FATAL ERROR: search cannot proceed due to errors in all > contexts/frames of query sequences > deleted:0 hits > running blast search. 
> > #--------- command -------------# > Widget::blastn: > /usr/local/bin/blastall -p blastn -d > /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r > 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blas > tn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn > #-------------------------------# > [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul > parameters due to an invalid query sequence or its translation. Please verify > the query sequence(s) and/or filtering options > [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul > parameters due to an invalid query sequence or its translation. Please verify > the query sequence(s) and/or filtering options > ERROR: BLASTN failed > > FATAL ERROR > ERROR: Failed while doing blastn of ESTs!! > > ERROR: Chunk failed at level 8 > !! > FAILED CONTIG:C4345703 > > > Could anyone give me some suggestions? > > Thanks! > > Jingjing > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 29 06:54:30 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 29 May 2013 08:54:30 -0400 Subject: [maker-devel] Maker consensus In-Reply-To: <1539398593.274033.1369743400254.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: Yes. That's ok, but you would get better performance by installing MPI and using that. 
Alternatively just start maker several times in the same directory without splitting the input fasta. You can usually start about 10-15 concurrent maker processes safely, but would still get better performance with MPI. --Carson From: Diana LeDuc Reply-To: Diana LeDuc Date: Tuesday, 28 May, 2013 8:16 AM To: , Carson Holt Cc: Gabriel Renaud , Janet Kelso Subject: Re: [maker-devel] Maker consensus Hi Carson, I have now restarted maker with specification of augustus path and species. I am trying to run it separately on each scaffold just to parallelise the process and speed it up. It happens that some of the scaffolds which run ok in the complete dataset now fail. Do you have any idea why this happens? Is it ok to have a separate directory for each of the scaffolds and run maker in each of them? Thank you for the help. Best regards, Diana On May 10, 2013 at 8:29 PM Carson Holt wrote: > > You can use any species augustus already has. If it doesn't then you train > it yourself. The species folder is pointed to by the AUGUSTUS_CONFIG_PATH > environmental variable, and is usually .../augustus/config/species > > > > Thanks, > > Carson > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de> > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> > Date: Friday, 10 May, 2013 2:16 PM > To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com> > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>, > Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < kelso at eva.mpg.de> > Subject: Re: [maker-devel] Maker consensus > > > > > > Hi Carson, > > > > In maker_exe.ctl I would have to provide the path to augustus. Augustus has a > training set for chicken that I would use. Is it possible to specify the > species I want to use, or is the only way to train Augustus myself? > > > > Thank you! > > > > Best, > > > > Diana > > On May 10, 2013 at 7:51 PM Carson Holt < carsonhh at gmail.com> wrote: > > >> >> Ok.
You just ran the evidence and didn't give a gene predictor. You need >> to provide an HMM file for SNAP, a species for augustus, or for rough >> annotations you can set protein2genome=1 and est2genome=1. This will try and >> generate models directly from the alignments. >> >> >> >> If you provide a gene predictor, then MAKER can talk to it about the >> evidence alignments so it can make a best gene call for the region. Then >> there will be gene/mRNA/exon models in the GFF3 file and entries in the >> proteins.fasta and transcripts.fasta. If you need to train a predictor, you >> can train SNAP using the maker2zff script and the SNAP documentation or maker >> GMOD tutorial. If you want to train augustus, Jason Stajich wrote an >> excellent explanation as well as tools in a previous list message. >> >> >> >> >> list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html >> >> Script is in this github repo - >> >> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl >> >> >> >> Thanks, >> >> Carson >> >> >> >> >> >> >> >> From: Diana LeDuc < diana_leduc at eva.mpg.de> >> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> >> Date: Friday, 10 May, 2013 1:41 PM >> To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com> >> Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>, >> Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < >> kelso at eva.mpg.de> >> Subject: Re: [maker-devel] Maker consensus >> >> >> >> >> >> Hi Carson, >> >> >> >> Thank you for the quick answer. >> >> I ran gff3_merge to merge all the gff files and this resulted in a gff file, >> which has these types of fields: >> >> scaffold32239 blastx protein_match 22905 34500 174 + . >> ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; >> scaffold32239 blastx match_part 22905 23045 174 + .
>> ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47; >> >> In comparison to the dpp_contig test file, I am missing est2genome evidence, >> most probably because my est data set is pretty poor. I have blastx and >> protein2genome evidence though. >> >> >> >> My goal is to extract the genes that could be annotated on the scaffolds. In >> the gff files the hits overlap most of the time, I can visualize this >> properly in apollo: for example one scaffold hits DSCAML gene in both >> zebrafinch and chicken, but extracting the coordinates between which this >> scaffold fits this annotated gene is difficult from the gff. Manually >> curating the genes is also not an option, since I am trying to do this for a >> 1.7Gb genome. >> >> >> >> I hope this explains better what we are after. >> >> >> >> Thank you once again. >> >> >> >> Best regards, >> >> >> >> Diana >> On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com> wrote: >> >> >>> >>> I'm sorry I don't understand question 1. You are missing the resulting >>> fasta files, correct? Did your resulting GFF3 file have any features of >>> type "gene"? Did you run fasta_merge after running gff3_merge? >>> >>> >>> >>> Could you give me more details on what you are trying to do, so I can take >>> a stab at question 2 as well.
>>> >>> >>> >>> Thanks, >>> >>> Carson >>> >>> >>> >>> >>> >>> >>> >>> From: Diana LeDuc < diana_leduc at eva.mpg.de> >>> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> >>> Date: Friday, 10 May, 2013 10:44 AM >>> To: < maker-devel at yandell-lab.org> >>> Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < >>> kelso at eva.mpg.de>, Torsten Schoeneberg < >>> torsten.schoeneberg at medizin.uni-leipzig.de> >>> Subject: [maker-devel] Maker consensus >>> >>> >>> >>> >>> >>> >>> >>> Dear maker developers, >>> >>> >>> I am a phD student working on de novo assembly and annotation of a bird >>> genome. I used Maker as annotation pipeline, which ran very well, and I >>> obtained different annotations with evidence from Augustus gene predictor, >>> small EST dataset from my organism and protein sequences from chicken, >>> turkey and zebrafinch. I could combine the different gff files from >>> different scaffolds into one gff file with annotations for the entire >>> genome. >>> >>> >>> I now have two questions: >>> >>> >>> 1. What could be the reason that I haven't gotten the protein.fasta and >>> trancript.fasta files >>> >>> >>> 2. How can I obtain a consensus gene list of different evidences from maker? >>> What I would actually need is the scaffold, coordinates and annotation (gene >>> name) according to the 3 other bird species. >>> Thank you in advance. >>> >>> >>> >>> Best regards, >>> >>> >>> >>> Diana Le Duc >>> >>> >>> >>> -- >>> >>> Max Planck Institute for Evolutionary Anthropology >>> Department of Evolutionary Genetics >>> Deutscher Platz 6 >>> D-04103 Leipzig >>> >>> Phone +49 (0)341-3550-554 >>> www.eva.mpg.de >>> >>> >>> _______________________________________________ maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From diana_leduc at eva.mpg.de Tue May 28 06:16:40 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Tue, 28 May 2013 14:16:40 +0200 (CEST) Subject: [maker-devel] Maker consensus In-Reply-To: References: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: <1539398593.274033.1369743400254.JavaMail.open-xchange@oxchange.eva.mpg.de> Hi Carson, I have now restarted maker with specification of augustus path and species. I am trying to run it separately on each scaffold just to parallelise the process and speed it up. It happens that some of the scaffolds which run ok in the complete dataset now fail. Do you have any idea why this happens? Is it ok to have a separate directory for each of the scaffolds and run maker in each of them? Thank you for the help. Best regards, Diana On May 10, 2013 at 8:29 PM Carson Holt wrote: > You can use any species augustus already has. If it doesn't then you train > it yourself. The species folder is pointed to by the AUGUSTUS_CONFIG_PATH > environmental variable, and is usually .../augustus/config/species > > Thanks, > Carson > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 2:16 PM > To: < maker-devel at yandell-lab.org >, > Carson Holt < carsonhh at gmail.com > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > >, Gabriel Renaud < > gabriel_renaud at eva.mpg.de >, Janet Kelso < > kelso at eva.mpg.de > > Subject: Re: [maker-devel] Maker consensus > > Hi Carson, > > In maker_exe.ctl I would have to provide the path to augustus. Augustus has a > training set for chicken that I would use. Is it possible to specify the > species I want to use, or is the only way to train Augustus myself? > > Thank you! > > Best, > > Diana > On May 10, 2013 at 7:51 PM Carson Holt < carsonhh at gmail.com > > wrote: > > > > Ok. You just ran the evidence and didn't give a gene predictor.
You > > > need to provide an HMM file for SNAP, a species for augustus, or for > > > rough annotations you can set protein2genome=1 and est2genome=1. This > > > will try and generate models directly from the alignments. > > > > If you provide a gene predictor, then MAKER can talk to it about the > > evidence alignments so it can make a best gene call for the region. Then > > there will be gene/mRNA/exon models in the GFF3 file and entries in the > > proteins.fasta and transcripts.fasta. If you need to train a predictor, you > > can train SNAP using the maker2zff script and the SNAP documentation or > > maker GMOD tutorial. If you want to train augustus, Jason Stajich wrote an > > excellent explanation as well as tools in a previous list message. > > > > list msg - > > http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html > > > > Script is in this github repo - > > > > https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl > > > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Date: Friday, 10 May, 2013 1:41 PM > > To: < maker-devel at yandell-lab.org >, > > Carson Holt < carsonhh at gmail.com > > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > > >, Gabriel Renaud < > > gabriel_renaud at eva.mpg.de >, Janet Kelso > > < kelso at eva.mpg.de > > > Subject: Re: [maker-devel] Maker consensus > > > > Hi Carson, > > > > Thank you for the quick answer. > > I ran gff3_merge to merge all the gff files and this resulted in a gff > > file, which has these types of fields: > > scaffold32239 blastx protein_match 22905 34500 174 + . > > ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; > > scaffold32239 blastx match_part 22905 23045 174 + .
> > ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47; > > In comparison to the dpp_contig test file, I am missing est2genome > > evidence, most probably because my est data set is pretty poor. I have > > blastx and protein2genome evidence though. > > > > My goal is to extract the genes that could be annotated on the scaffolds. > > In the gff files the hits overlap most of the time, I can visualize this > > properly in apollo: for example one scaffold hits DSCAML gene in both > > zebrafinch and chicken, but extracting the coordinates between which this > > scaffold fits this annotated gene is difficult from the gff. Manually > > curating the genes is also not an option, since I am trying to do this for a > > 1.7Gb genome. > > > > I hope this explains better what we are after. > > > > Thank you once again. > > > > Best regards, > > > > Diana > > On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com > > > wrote: > > > > > > > I'm sorry I don't understand question 1. You are missing > > > > > the resulting fasta files, correct? Did your resulting GFF3 file have > > > > > any features of type "gene"? Did you run fasta_merge after running > > > > > gff3_merge? > > > > > > Could you give me more details on what you are trying to do, so I can > > > take a stab at question 2 as well.
> > > > > > Thanks, > > > Carson > > > > > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > > > Date: Friday, 10 May, 2013 10:44 AM > > > To: < maker-devel at yandell-lab.org > > > > > > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > > > >, Janet Kelso < kelso at eva.mpg.de > > > >, Torsten Schoeneberg < > > > torsten.schoeneberg at medizin.uni-leipzig.de > > > > > > > Subject: [maker-devel] Maker consensus > > > > > > > > > Dear maker developers, > > > > > > I am a phD student working on de novo assembly and annotation of a bird > > > genome. I used Maker as annotation pipeline, which ran very well, and I > > > obtained different annotations with evidence from Augustus gene predictor, > > > small EST dataset from my organism and protein sequences from chicken, > > > turkey and zebrafinch. I could combine the different gff files from > > > different scaffolds into one gff file with annotations for the entire > > > genome. > > > > > > I now have two questions: > > > > > > 1. What could be the reason that I haven't gotten the protein.fasta and > > > trancript.fasta files > > > > > > 2. How can I obtain a consensus gene list of different evidences from > > > maker? What I would actually need is the scaffold, coordinates and > > > annotation (gene name) according to the 3 other bird species. > > > > > > Thank you in advance. 
> > > > > > Best regards, > > > > > > Diana Le Duc > > > > > > -- > > > > > > Max Planck Institute for Evolutionary Anthropology > > > Department of Evolutionary Genetics > > > Deutscher Platz 6 > > > D-04103 Leipzig > > > > > > Phone +49 (0)341-3550-554 > > > www.eva.mpg.de > > > _______________________________________________ maker-devel mailing > > > list maker-devel at box290.bluehost.com > > > > > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaganjot.kaur at sickkids.ca Wed May 29 13:34:19 2013 From: gaganjot.kaur at sickkids.ca (Gaganjot Kaur) Date: Wed, 29 May 2013 19:34:19 +0000 Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs Message-ID: <5A46EF8CDF7C4F46AED4F14FC3AE17645F2B65@SKMBXX01.sickkids.ca> Hi Maker community, I have been trying to annotate a fungal genome using maker. As I do not have ests from the same species I have been using proteins and ests from two related species. Maker finishes successfully for all the scaffolds except two. These two scaffolds are around 2 mega bases each. I am running maker-2.27, using mpiexec to run over multiple compute nodes . Please see the error log below. 
Error from first scaffold: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline Sequence with id BL_ORD_ID:1562 no longer exists in database...alignment skipped STACK: Error::throw STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472 STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888 STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114 STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685 STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858 STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335 STACK: main::node_thread /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:1381 STACK: threads::new /home/gkaur/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm:799 STACK: /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:864 ----------------------------------------------------------- --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 ERROR: Failed while doing tblastx of alt-ESTs ERROR: Chunk failed at level:4, tier_type:2 FAILED CONTIG:scaffold_52379 ERROR: Chunk failed at level:5, tier_type:0 FAILED CONTIG:scaffold_52379 Error from second scaffold: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline Sequence with id BL_ORD_ID:3014 no longer exists in database...alignment skipped STACK: 
Error::throw STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472 STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888 STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114 STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685 STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858 STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335 STACK: /home/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:926 ----------------------------------------------------------- --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 ERROR: Failed while doing tblastx of alt-ESTs ERROR: Chunk failed at level:4, tier_type:2 FAILED CONTIG:scaffold_52359 ERROR: Chunk failed at level:5, tier_type:0 FAILED CONTIG:scaffold_52359 The errors seem to come from alt-est that I have been using. I have tried running maker more than once over these two scaffolds and the same error appears each time. I have no idea what is going wrong here. Your help in understanding and resolving the error will be greatly appreciated. 
Thanks in advance, Gagan - - - - - - - - - - - - - - - - - Gaganjot Kaur Bioinformatics Analyst The Centre for Applied Genomics (TCAG) The Hospital for Sick Children MaRS Building - East Tower 101 College St., Room 14-701 Toronto, ON M5G 1L7 ________________________________ This e-mail may contain confidential, personal and/or health information(information which may be subject to legal restrictions on use, retention and/or disclosure) for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and delete all copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Carson.Holt at oicr.on.ca Wed May 29 21:10:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Thu, 30 May 2013 03:10:52 +0000 Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs In-Reply-To: <5A46EF8CDF7C4F46AED4F14FC3AE17645F2B65@SKMBXX01.sickkids.ca> Message-ID: This is a parsing error coming from BioPerl. Could you run maker with the --debug flag. Redirect the STDERR to a file. You can kill it after a few seconds, I really just want to see the version for your BioPerl installation. Also what version of BLAST are you running. --Carson From: Gaganjot Kaur > Date: Wednesday, 29 May, 2013 3:34 PM To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs Hi Maker community, I have been trying to annotate a fungal genome using maker. As I do not have ests from the same species I have been using proteins and ests from two related species. Maker finishes successfully for all the scaffolds except two. These two scaffolds are around 2 mega bases each. I am running maker-2.27, using mpiexec to run over multiple compute nodes . Please see the error log below. 
Error from first scaffold: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline Sequence with id BL_ORD_ID:1562 no longer exists in database...alignment skipped STACK: Error::throw STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472 STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888 STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114 STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685 STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858 STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335 STACK: main::node_thread /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:1381 STACK: threads::new /home/gkaur/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm:799 STACK: /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:864 ----------------------------------------------------------- --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 --> rank=1, hostname=cn-r56 ERROR: Failed while doing tblastx of alt-ESTs ERROR: Chunk failed at level:4, tier_type:2 FAILED CONTIG:scaffold_52379 ERROR: Chunk failed at level:5, tier_type:0 FAILED CONTIG:scaffold_52379 Error from second scaffold: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline Sequence with id BL_ORD_ID:3014 no longer exists in database...alignment skipped STACK: 
Error::throw STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472 STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888 STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114 STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676 STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685 STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858 STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335 STACK: /home/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:926 ----------------------------------------------------------- --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 --> rank=1, hostname=cn-r12 ERROR: Failed while doing tblastx of alt-ESTs ERROR: Chunk failed at level:4, tier_type:2 FAILED CONTIG:scaffold_52359 ERROR: Chunk failed at level:5, tier_type:0 FAILED CONTIG:scaffold_52359 The errors seem to come from alt-est that I have been using. I have tried running maker more than once over these two scaffolds and the same error appears each time. I have no idea what is going wrong here. Your help in understanding and resolving the error will be greatly appreciated. 
Thanks in advance, Gagan - - - - - - - - - - - - - - - - - Gaganjot Kaur Bioinformatics Analyst The Centre for Applied Genomics (TCAG) The Hospital for Sick Children MaRS Building - East Tower 101 College St., Room 14-701 Toronto, ON M5G 1L7 ________________________________ This e-mail may contain confidential, personal and/or health information(information which may be subject to legal restrictions on use, retention and/or disclosure) for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and delete all copies. _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnuhn at ebi.ac.uk Wed May 1 05:38:52 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Wed, 01 May 2013 12:38:52 +0100 Subject: [maker-devel] substr outside of string Message-ID: <5180FECC.2020308@ebi.ac.uk> Hello! I have run maker with est and rna seq data to create a training set for SNAP. Then I trained SNAP and added the hmm to the snaphmm option and reran maker. Maker is giving me error messages like this: " setting up GFF3 output and fasta chunks doing repeat masking re reading repeat masker report. substr outside of string at /maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 140 . --> rank=NA, hostname=ebi-209.ebi.ac.uk " The line from which this error message originates is: substr($$seq, $b -1 , $l, "$replace"x$l); After getting these error messages I replaced it with eval { substr($$seq, $b -1 , $l, "$replace"x$l); }; if ($@) { use Carp; use Data::Dumper; confess( $@ . "\n\n" . Dumper($p) . "\n\n" . "Length of sequence: " . 
(length $$seq) ); }

After that I got this:

$VAR1 = [
          98926,
          99033
        ];

Length of sequence: 98686 at /maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 145

I have not changed the genome file.

I'm also concerned with the reported length of 98686, because I have a list of all sequences in the file and their lengths, and none of them has a length of 98686 bp. The sequences with the closest lengths are these:

98367 LSalAtl2s1200
98438 LSalAtl2s1473
98776 LSalAtl2s1613
98876 LSalAtl2s1199

so they are not even close.

$$seq is a sequence as a string, when I print it.

Sometimes maker prints a message like this:

"
--Next Contig--

Processing run.log file...
#---------------------------------------------------------------------
Now retrying the contig!!
SeqID: LSalAtl2s63
Length: 3997709
Tries: 5!!
#---------------------------------------------------------------------
"

But according to my list, which I generated from the exact same file that maker has in genome_file option, the length of that sequence is 1169407.

Any idea, why I am getting these problems and what to do about them?

Cheers,
Michael.

From carsonhh at gmail.com Wed May 1 07:17:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 01 May 2013 09:17:50 -0400
Subject: [maker-devel] substr outside of string
In-Reply-To: <5180FECC.2020308@ebi.ac.uk>
Message-ID:

The length you are printing is not the length of the contig, but rather the length of the piece of the contig MAKER is working with at that moment. The fact that the length is not exactly 100000 is telling me that this is a piece at the end of the contig.

By any chance are you using GFF3 pass-through of repeat elements? If not, there may be a repeatmasker parsing bug, as the start and end coordinates are off the edge of the contig. If you run maker on the command line (not via MPI), what is the repeatmasker report read immediately before the error? Could you then attach it and the fasta sequence for the contig that fails?
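
A minimal sketch of the situation Carson describes, written in Python rather than MAKER's Perl and not taken from MAKER's source: the repeat coordinates (98926 to 99033) lie past the end of the 98686 bp piece being masked, so there is nothing to replace. The function name `mask_interval` and the choice to clip intervals that run off the end are assumptions made only for illustration.

```python
def mask_interval(seq: str, begin: int, end: int, replace: str = "N") -> str:
    """Mask the 1-based, inclusive interval [begin, end] of seq.

    Raises ValueError when `begin` lies outside the sequence, which is the
    case reported in the thread. An interval that starts inside the
    sequence but runs past its end is clipped (an illustrative policy).
    """
    if begin > end:
        begin, end = end, begin
    if begin < 1 or begin > len(seq):
        raise ValueError(
            f"interval {begin}-{end} starts outside a sequence of length {len(seq)}"
        )
    end = min(end, len(seq))            # clip to the end of the piece
    length = end - begin + 1
    return seq[: begin - 1] + replace * length + seq[end:]


chunk = "A" * 98686                     # piece length reported in the thread
try:
    mask_interval(chunk, 98926, 99033)  # coordinates from the Dumper output
except ValueError as err:
    print(err)
```

A check of this kind reports both the interval and the length of the piece, which is the same information Michael extracted with `confess` and `Dumper`.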
Thanks, Carson On 13-05-01 7:38 AM, "Michael Nuhn" wrote: >Hello! > >I have run maker with est and rna seq data to create a training set for >SNAP. Then I trained SNAP and added the hmm to the snaphmm option and >reran maker. > >Maker is giving me error messages like this: > >" >setting up GFF3 output and fasta chunks >doing repeat masking >re reading repeat masker report. > >substr outside of string at maker>/maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 140 >. >--> rank=NA, hostname=ebi-209.ebi.ac.uk >" > >The line from which this error message originates is: > > substr($$seq, $b -1 , $l, "$replace"x$l); > >After getting these error messages I replaced it with > > eval { > substr($$seq, $b -1 , $l, "$replace"x$l); > }; > if ($@) { > use Carp; > use Data::Dumper; > confess( > $@ > . "\n\n" > . Dumper($p) > . "\n\n" > . "Length of sequence: " . (length $$seq) > ); > } > >After that I got this: > >$VAR1 = [ > 98926, > 99033 > ]; > > >Length of sequence: 98686 at maker>/maker/2.27/maker/bin/../lib/repeat_mask_seq.pm line 14 >5 > >I have not changed the genome file. > >I'm also concerned with the reported length of 98686, because I have a >list of all sequences in the file and their lengths, and none of them >has a length of 98686 bp. The sequences with the closest lengths are >these: > >98367 LSalAtl2s1200 >98438 LSalAtl2s1473 >98776 LSalAtl2s1613 >98876 LSalAtl2s1199 > >so they are not even close. > >$$seq is a sequence as a string, when I print it. > >Sometimes maker prints a message like this: > >" >--Next Contig-- > >Processing run.log file... >#--------------------------------------------------------------------- >Now retrying the contig!! >SeqID: LSalAtl2s63 >Length: 3997709 >Tries: 5!! >#--------------------------------------------------------------------- >" > >But according to my list, which I generated from the exact same file >that maker has in genome_file option, the length of that sequence is >1169407. 
>
>Any idea, why I am getting these problems and what to do about them?
>
>Cheers,
>Michael.
>
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From ejr at stowers.org Wed May 1 09:57:11 2013
From: ejr at stowers.org (Ross, Eric)
Date: Wed, 1 May 2013 15:57:11 +0000
Subject: [maker-devel] repeat statistics
In-Reply-To:
Message-ID:

Should this be accessible anonymously?

I'm unable to connect.

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From: Jason Stajich
Date: Monday, April 29, 2013 5:49 PM
To: Barry Moore
Cc: Eric Ross, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] repeat statistics

Barry - I think you mean topaz instead of malachite?

svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA

Jason Stajich
jason at bioperl.org
jason.stajich at gmail.com
http://bioperl.org/wiki/User:Jason
http://twitter.com/hyphaltip

On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote:

Hi Eric,

There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details:

http://www.sequenceontology.org/resources/sobacl.html

Ultimately you'll get it like this:

svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA

Then run:

SOBA/bin/SOBAcl --help

For a lot of command line examples have a look in:

SOBA/t/sobacl_test.sh

B

On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote:

Does anyone have a good tool for yanking repeat statistics out of MAKER gff files?

SOBA can give some basic stats, but it doesn't play well with my giant files and I haven't figured out a way to run it locally.

For that matter does anyone have a script that will calculate SOBA like stats locally? I'd rather avoid writing one myself if something else is out there.
Thanks, Eric -- Eric Ross Bioinformatic Specialist I Alejandro S?nchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.moore at genetics.utah.edu Wed May 1 17:42:47 2013 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Wed, 1 May 2013 17:42:47 -0600 Subject: [maker-devel] repeat statistics In-Reply-To: References: Message-ID: Eric, Try again, it should have been world readable before, but I've opened it a bit wider now, so should definitely be now. Let me know if you have problems. B On May 1, 2013, at 9:57 AM, Ross, Eric wrote: > Should this be accessible anonymously? > > I'm unable to connect. > > Eric > > > -- > Eric Ross > Bioinformatic Specialist I > Alejandro S?nchez Alvarado Laboratory > Stowers Institute for Medical Research > Howard Hughes Medical Institute > ejr at stowers.org > > From: Jason Stajich > Date: Monday, April 29, 2013 5:49 PM > To: Barry Moore > Cc: Eric Ross , "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] repeat statistics > > Barry - I think you mean topaz instead of malachite? 
> > svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA > > > Jason Stajich > jason at bioperl.org > jason.stajich at gmail.com > http://bioperl.org/wiki/User:Jason > http://twitter.com/hyphaltip > > > On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote: >> Hi Eric, >> >> There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details: >> >> http://www.sequenceontology.org/resources/sobacl.html >> >> Ultimately you'll get it like this: >> >> svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA >> >> Then run: >> >> SOBA/bin/SOBAcl --help >> >> For a lot of command line examples have a look in: >> >> SOBA/t/sobacl_test.sh >> >> B >> >> On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: >> >>> Does anyone have a good tool for yanking repeat statistics out of MAKER >>> gff files? >>> >>> SOBA can give some basic stats, but it doesn't play well with my giant >>> files and I haven't figured out a way to run it locally. >>> >>> For that matter does anyone have a script that will calculate SOBA like >>> stats locally? I'd rather avoid writing one myself if something else is >>> out there. >>> >>> Thanks, >>> >>> Eric >>> >>> -- >>> Eric Ross >>> Bioinformatic Specialist I >>> Alejandro S?nchez Alvarado Laboratory >>> Stowers Institute for Medical Research >>> Howard Hughes Medical Institute >>> ejr at stowers.org >>> >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> Barry Moore >> Research Scientist >> Dept. 
of Human Genetics
>> University of Utah
>> Salt Lake City, UT 84112
>> --------------------------------------------
>> (801) 585-3543
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ejr at stowers.org Wed May 1 17:53:08 2013
From: ejr at stowers.org (Ross, Eric)
Date: Wed, 1 May 2013 23:53:08 +0000
Subject: [maker-devel] repeat statistics
In-Reply-To:
Message-ID:

Works now.

Thanks much,

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From: Barry Moore
Date: Wednesday, May 1, 2013 6:42 PM
To: Eric Ross
Cc: Jason Stajich, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] repeat statistics

Eric,

Try again, it should have been world readable before, but I've opened it a bit wider now, so should definitely be now. Let me know if you have problems.

B

On May 1, 2013, at 9:57 AM, Ross, Eric wrote:

Should this be accessible anonymously?

I'm unable to connect.

Eric

--
Eric Ross
Bioinformatic Specialist I
Alejandro Sánchez Alvarado Laboratory
Stowers Institute for Medical Research
Howard Hughes Medical Institute
ejr at stowers.org

From: Jason Stajich
Date: Monday, April 29, 2013 5:49 PM
To: Barry Moore
Cc: Eric Ross, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] repeat statistics

Barry - I think you mean topaz instead of malachite?
svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA Jason Stajich jason at bioperl.org jason.stajich at gmail.com http://bioperl.org/wiki/User:Jason http://twitter.com/hyphaltip On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore > wrote: Hi Eric, There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details: http://www.sequenceontology.org/resources/sobacl.html Ultimately you'll get it like this: svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA Then run: SOBA/bin/SOBAcl --help For a lot of command line examples have a look in: SOBA/t/sobacl_test.sh B On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: Does anyone have a good tool for yanking repeat statistics out of MAKER gff files? SOBA can give some basic stats, but it doesn't play well with my giant files and I haven't figured out a way to run it locally. For that matter does anyone have a script that will calculate SOBA like stats locally? I'd rather avoid writing one myself if something else is out there. Thanks, Eric -- Eric Ross Bioinformatic Specialist I Alejandro S?nchez Alvarado Laboratory Stowers Institute for Medical Research Howard Hughes Medical Institute ejr at stowers.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... 
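
On Eric's original question in this thread (repeat statistics from MAKER GFF files, computed locally): SOBA is the tool recommended above. Purely as a hedged illustration of what such a tally involves, the Python sketch below counts top-level repeat features per source and sums the bases they cover after merging overlaps. The source names in `REPEAT_SOURCES`, the function name `repeat_stats`, and the example rows are assumptions, not output from a real MAKER run.

```python
from collections import defaultdict

# Assumed source names in column 2 of the GFF3; adjust to your own file.
REPEAT_SOURCES = {"repeatmasker", "repeatrunner"}


def repeat_stats(gff_lines):
    """Return {source: (feature_count, merged_bp)} for top-level repeat features."""
    counts = defaultdict(int)
    intervals = defaultdict(list)      # (source, seqid) -> [(start, end), ...]
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                      # embedded sequence follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] not in REPEAT_SOURCES:
            continue
        if "Parent=" in cols[8]:
            continue                   # child rows (e.g. match_part) would double count
        counts[cols[1]] += 1
        intervals[(cols[1], cols[0])].append((int(cols[3]), int(cols[4])))

    bases = defaultdict(int)
    for (source, _seqid), spans in intervals.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:       # overlaps the current block: extend it
                cur_end = max(cur_end, end)
            else:                      # gap: close the block and start a new one
                bases[source] += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        bases[source] += cur_end - cur_start + 1
    return {source: (counts[source], bases[source]) for source in counts}


# Made-up rows for illustration only (not taken from any real annotation).
example = [
    "##gff-version 3",
    "chr1\trepeatmasker\tmatch\t100\t199\t.\t+\t.\tID=r1",
    "chr1\trepeatmasker\tmatch_part\t100\t199\t.\t+\t.\tID=r1p;Parent=r1",
    "chr1\trepeatmasker\tmatch\t150\t249\t.\t+\t.\tID=r2",
    "chr1\tmaker\tgene\t1\t500\t.\t+\t.\tID=g1",
]
print(repeat_stats(example))
```

Because the function only iterates over lines, an open file handle can be passed in place of the list, so the GFF text itself is never loaded whole; only the repeat intervals are kept in memory.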
URL: From guoyunfei1989 at gmail.com Fri May 3 10:33:42 2013 From: guoyunfei1989 at gmail.com (Yunfei Guo) Date: Fri, 3 May 2013 09:33:42 -0700 Subject: [maker-devel] maker doesn't pick up where it stopped Message-ID: Dear MAKER community, I got a problem that maker doesn't pick up where it stopped last time, rather, it will discard all previous results. command: echo 'mpiexec -n 12 maker -q' | qsub -V -cwd -l h_vmem=2g -pe mpich 12 maker version: 2.26 mpich version: 1.5rc3 in maker_opts: clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no It never happened before. Any advice? Thank you! Yunfei -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.moore at genetics.utah.edu Fri May 3 16:51:27 2013 From: barry.moore at genetics.utah.edu (Barry Moore) Date: Fri, 3 May 2013 16:51:27 -0600 Subject: [maker-devel] repeat statistics In-Reply-To: References: Message-ID: <37BA6893-1175-4F3E-B3AA-6C1E23C4364E@genetics.utah.edu> Let me know how it works out for you - feedback either positive or negative is useful. B On May 1, 2013, at 5:53 PM, Ross, Eric wrote: > Works now. > > Thanks much, > > Eric > -- > Eric Ross > Bioinformatic Specialist I > Alejandro S?nchez Alvarado Laboratory > Stowers Institute for Medical Research > Howard Hughes Medical Institute > ejr at stowers.org > > From: Barry Moore > Date: Wednesday, May 1, 2013 6:42 PM > To: Eric Ross > Cc: Jason Stajich , "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] repeat statistics > > Eric, > > Try again, it should have been world readable before, but I've opened it a bit wider now, so should definitely be now. Let me know if you have problems. > > B > > On May 1, 2013, at 9:57 AM, Ross, Eric wrote: > >> Should this be accessible anonymously? >> >> I'm unable to connect. 
>> >> Eric >> >> >> -- >> Eric Ross >> Bioinformatic Specialist I >> Alejandro S?nchez Alvarado Laboratory >> Stowers Institute for Medical Research >> Howard Hughes Medical Institute >> ejr at stowers.org >> >> From: Jason Stajich >> Date: Monday, April 29, 2013 5:49 PM >> To: Barry Moore >> Cc: Eric Ross , "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] repeat statistics >> >> Barry - I think you mean topaz instead of malachite? >> >> svn co svn://topaz.genetics.utah.edu/SOBA/trunk SOBA >> >> >> Jason Stajich >> jason at bioperl.org >> jason.stajich at gmail.com >> http://bioperl.org/wiki/User:Jason >> http://twitter.com/hyphaltip >> >> >> On Mon, Apr 29, 2013 at 10:59 AM, Barry Moore wrote: >>> Hi Eric, >>> >>> There is a command line version of SOBA. It does the same things as the web version and much more. This page has some basic details: >>> >>> http://www.sequenceontology.org/resources/sobacl.html >>> >>> Ultimately you'll get it like this: >>> >>> svn co svn://malachite.genetics.utah.edu/SOBA/trunk SOBA >>> >>> Then run: >>> >>> SOBA/bin/SOBAcl --help >>> >>> For a lot of command line examples have a look in: >>> >>> SOBA/t/sobacl_test.sh >>> >>> B >>> >>> On Apr 29, 2013, at 9:58 AM, Ross, Eric wrote: >>> >>>> Does anyone have a good tool for yanking repeat statistics out of MAKER >>>> gff files? >>>> >>>> SOBA can give some basic stats, but it doesn't play well with my giant >>>> files and I haven't figured out a way to run it locally. >>>> >>>> For that matter does anyone have a script that will calculate SOBA like >>>> stats locally? I'd rather avoid writing one myself if something else is >>>> out there. 
>>>>
>>>> Thanks,
>>>>
>>>> Eric
>>>>
>>>> --
>>>> Eric Ross
>>>> Bioinformatic Specialist I
>>>> Alejandro Sánchez Alvarado Laboratory
>>>> Stowers Institute for Medical Research
>>>> Howard Hughes Medical Institute
>>>> ejr at stowers.org
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>> Barry Moore
>>> Research Scientist
>>> Dept. of Human Genetics
>>> University of Utah
>>> Salt Lake City, UT 84112
>>> --------------------------------------------
>>> (801) 585-3543
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>
>
> Barry Moore
> Research Scientist
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT 84112
> --------------------------------------------
> (801) 585-3543
>

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jmdoyle at purdue.edu Sun May 5 05:55:47 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Sun, 5 May 2013 07:55:47 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <1109250054.216072.1367754420354.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID: <261748058.216082.1367754947403.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi!

I've recently attempted to install MAKER (Mac OS X). I installed blast and exonerate using the ./Build blast and ./Build exonerate commands, and I manually installed repeatmasker, snap and augustus (I couldn't get the ./Build commands to work). I then attempted to test out maker following the 2012 MAKER tutorial. I received the blastx error message pasted below, and there is additional information in the maker log I've attached to this email. I was wondering if anyone had any suggestions about debugging, as I'm not quite sure where to begin...

Best wishes and thanks, Jackie

#--------- command -------------#
Widget::formater:
/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in /tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0
#-------------------------------#
dyld: lazy symbol binding failed: Symbol not found: __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
  Expected in: flat namespace

dyld: Symbol not found: __ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
  Expected in: flat namespace

ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in Widget::formater

FATAL ERROR
ERROR: Failed while doing blastx repeats!!

ERROR: Chunk failed at level 3
!!
FAILED CONTIG:contig-dpp-500-500

Department of Forestry and Natural Resources
Purdue University
West Lafayette, IN 47907
Phone: 270-293-9486
E-mail: jmdoyle at purdue.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Build status.odt
Type: application/vnd.oasis.opendocument.text
Size: 2740 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.odt
Type: application/vnd.oasis.opendocument.text
Size: 2772 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.odt
Type: application/vnd.oasis.opendocument.text
Size: 3479 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.odt Type: application/vnd.oasis.opendocument.text Size: 2821 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker log.odt Type: application/vnd.oasis.opendocument.text Size: 3340 bytes Desc: not available URL: From carsonhh at gmail.com Mon May 6 06:32:52 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 06 May 2013 08:32:52 -0400 Subject: [maker-devel] maker doesn't pick up where it stopped In-Reply-To: Message-ID: You would have to send me the captured STDERR. MAKER will print out a number of messages whenever it restarts a contig, and will explain why it deletes any files before restarting. Thanks, Carson From: Yunfei Guo Date: Friday, 3 May, 2013 12:33 PM To: Subject: [maker-devel] maker doesn't pick up where it stopped Dear MAKER community, I got a problem that maker doesn't pick up where it stopped last time, rather, it will discard all previous results. command: echo 'mpiexec -n 12 maker -q' | qsub -V -cwd -l h_vmem=2g -pe mpich 12 maker version: 2.26 mpich version: 1.5rc3 in maker_opts: clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no It never happened before. Any advice? Thank you! Yunfei _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 6 08:02:52 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 06 May 2013 10:02:52 -0400 Subject: [maker-devel] MAKER installation debugging In-Reply-To: <261748058.216082.1367754947403.JavaMail.root@mailhub042.itcs.purdue.edu> Message-ID: Most maker development and debugging actually happens on a Mac (OS X 10.7.5). 
Blast, Augustus, SNAP all install for me just fine with maker 2.27. What errors do you get during installation? Do you by any chance have non-standard libraries via Mac ports for example. Do you have xcode installed (it provides the appropriate 'make' command for compiling C)? Thanks, Carson On 13-05-05 7:55 AM, "Jacqueline R M Doyle" wrote: >Hi! > >I've recently attempted to install MAKER (Mac OS X). I installed blast >and exonerate using the ./Build blast and ./Build exonerate commands, and >I manually installed repeatmasker, snap and augustus (I couldn't get the >./Build commands to work). I then attempted to test out maker following >the 2012 MAKER tutorial. I received the blastx error message pasted >below, and there is additional information in the maker log I've attached >to this email. I was wondering if anyone had any suggestions about >debugging, as I'm not quite sure where to begin... > >Best wishes and thanks, Jackie > > >#--------- command -------------# >Widget::formater: >/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in >/tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0 >#-------------------------------# >dyld: lazy symbol binding failed: Symbol not found: >__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PK >S3_i > Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb > Expected in: flat namespace > >dyld: Symbol not found: >__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PK >S3_i > Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb > Expected in: flat namespace > >ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in >Widget::formater > >FATAL ERROR >ERROR: Failed while doing blastx repeats!! > >ERROR: Chunk failed at level 3 >!! 
>FAILED CONTIG:contig-dpp-500-500
>
>
>Department of Forestry and Natural Resources
>Purdue University
>West Lafayette, IN 47907
>Phone: 270-293-9486
>E-mail: jmdoyle at purdue.edu
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From guoyunfei1989 at gmail.com Mon May 6 09:33:57 2013
From: guoyunfei1989 at gmail.com (Yunfei Guo)
Date: Mon, 6 May 2013 08:33:57 -0700
Subject: [maker-devel] maker doesn't pick up where it stopped
In-Reply-To:
References:
Message-ID:

Hi Carson,

I used quiet mode; here is the stderr (I only show one 'now starting the contig' message). When I check the maker master log upon restart with 'grep -ic finished master_log', all 'finished' tags are gone.

A data structure will be created for you at:
/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore

To access files for individual sequences use the datastore index:
/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_master_datastore_index.log

#---------------------------------------------------------------------
Now starting the contig!!
SeqID: scaffold105
Length: 8761
#---------------------------------------------------------------------
...
MAKER WARNING: The file GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore/C8/27/scaffold5690//theVoid.scaffold5690/scaffold5690.0.HumanUCSCProteins%2Efasta.blastx
did not finish on the last run and must be erased
...
ERROR: Could not open '/home/yunfeiguo/projects/fish/scaffold/makerrun_2013_04_29/GapCloser-Nigro-Min1k.maker.output/GapCloser-Nigro-Min1k_datastore/A4/F7/scaffold6034//theVoid.scaffold6034/scaffold6034.0.Srub%2Elib.specific.out'
ERROR: Failed while doing repeat masking
ERROR: Chunk failed at level:0, tier_type:1
FAILED CONTIG:scaffold6034

ERROR: Chunk failed at level:2, tier_type:0
FAILED CONTIG:scaffold6034
...

On Mon, May 6, 2013 at 5:32 AM, Carson Holt wrote:

> You would have to send me the captured STDERR. MAKER will print out a
> number of messages whenever it restarts a contig, and will explain why it
> deletes any files before restarting.
>
> Thanks,
> Carson
>
>
> From: Yunfei Guo
> Date: Friday, 3 May, 2013 12:33 PM
> To:
> Subject: [maker-devel] maker doesn't pick up where it stopped
>
> Dear MAKER community,
>
> I got a problem that maker doesn't pick up where it stopped last time,
> rather, it will discard all previous results.
>
> command:
> echo 'mpiexec -n 12 maker -q' | qsub -V -cwd -l h_vmem=2g -pe mpich 12
> maker version:
> 2.26
> mpich version:
> 1.5rc3
> in maker_opts:
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
>
> It never happened before. Any advice?
>
> Thank you!
>
> Yunfei
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Carson.Holt at oicr.on.ca Mon May 6 20:22:23 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 7 May 2013 02:22:23 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
Message-ID:

Repeats can still happen in genes.
So an outright block actually causes more errors than it avoids, and a mixed approach of hard and soft masking becomes more appropriate. The masking step stops alignments from seeding in repeat regions, but if alignments seed in non-repeat regions then they can still extend through repeat regions during polishing steps (i.e., the EST evidence supports extension through the repeat and inclusion of the TE).

--Carson

From: Dario Copetti
Organization: AGI
Date: Monday, 6 May, 2013 5:19 PM
To:
Cc: "kapeel at cals.arizona.edu", "Stein, Joshua", Rod Wing
Subject: gene models overlapping with TEs

Carson,

Analyzing the output of a MAKER run on a rice-sized genome, I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to determine the intersection of "CDS" with "repeatmasker" or "repeatrunner" features, and some 2400 genes overlap over at least 30% of their respective length. I am wondering how these gene models still appear in the final output, since I thought that the masking step gave us absolute confirmation that our endogenous gene list does not include TE coding regions.

Below is an example of such a gene (picture attached too):

ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816

This result holds for output lines from both repeatmasker and repeatrunner, and the gene models come from either FGENESH or SNAP predictions. How can I explain this problem?

Thanks,
Dario

--
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5
1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dcopetti at cals.arizona.edu Mon May 6 15:19:42 2013
From: dcopetti at cals.arizona.edu (Dario Copetti)
Date: Mon, 06 May 2013 14:19:42 -0700
Subject: [maker-devel] gene models overlapping with TEs
Message-ID: <51881E6E.9010202@cals.arizona.edu>

Carson,

Analyzing the output of a MAKER run on a rice-sized genome, I noticed that some gene models (~10%) overlap with TE coding regions. As a QC step, I used BEDtools to determine the intersection of "CDS" and "repeatmasker" or "repeatrunner" features, and some 2400 genes overlap for at least 30% of their respective length. I am wondering how these gene models still appear in the final output, since I thought that the masking step gave us absolute confirmation that our endogenous gene list does not include TE coding regions. Below is an example of a gene (picture attached too):

ObracChr10 maker mRNA 355,056 358,075 . - . ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eAED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788
ObracChr10 maker exon 355,056 356,874 . - . ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1
ObracChr10 maker exon 356,965 357,081 . - . ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,209 357,319 . - . ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1
ObracChr10 maker exon 357,756 358,075 . - . ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,756 358,075 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 357,209 357,319 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 356,965 357,081 . - 2 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1
ObracChr10 maker CDS 355,056 356,874 . - 0 ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1

ObracChr10 repeatrunner match_part 357,755 358,084 566 - . ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner protein_match 357,755 358,084 566 - . ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320
ObracChr10 repeatrunner match_part 357,202 357,294 142 - . ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner protein_match 357,202 357,294 142 - . ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86
ObracChr10 repeatrunner match_part 355,059 357,092 3367 - . ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816
ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - . ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetical;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816

This result holds for output lines from both repeatmasker and repeatrunner, and the gene models come from either FGENESH or SNAP predictions.
How can I explain this problem?

Thanks,

Dario

--
Dario Copetti, PhD
Research Associate
Arizona Genomics Institute
University of Arizona - BIO5

1657 E. Helen St.
Tucson, AZ 85721
www.genome.arizona.edu

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene_TE.jpg
Type: image/jpeg
Size: 177299 bytes
Desc: not available
URL:

From myandell at genetics.utah.edu Mon May 6 21:47:49 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 7 May 2013 03:47:49 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
References: <51881E6E.9010202@cals.arizona.edu>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E02CEE@mxb2.hg.genetics.utah.edu>

Could the TEs be in the UTRs? Also, maybe some of these are low complexity regions?
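
One rough way to check the UTR question against the coordinates in Dario's example is plain interval arithmetic. The sketch below is only an illustration in Python: the coordinates are copied from the GFF3 lines in his message (thousands separators removed), and the helper name overlap_bp is made up for this example; it is not part of MAKER or BEDtools. In that example the exon and CDS coordinates are identical, so the model has no annotated UTR, and the question reduces to how much of the CDS sits inside the repeatrunner hits.

```python
# Sketch only: interval arithmetic on the coordinates quoted in Dario's example.
# overlap_bp is a hypothetical helper written for this illustration.

# CDS segments of Obrac10g00240.1 (1-based, inclusive, as in the GFF3 lines)
cds = [(357756, 358075), (357209, 357319), (356965, 357081), (355056, 356874)]

# repeatrunner protein_match features from the same example
repeats = [(357755, 358084), (357202, 357294), (355059, 357092)]


def overlap_bp(a, b):
    """Number of bases shared by two 1-based inclusive intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Total coding length of the model
cds_len = sum(end - start + 1 for start, end in cds)

# The three repeat hits do not overlap each other, so summing the
# pairwise overlaps does not count any base twice.
covered = sum(overlap_bp(c, r) for c in cds for r in repeats)

print(cds_len, covered, round(covered / cds_len, 3))
```

With these coordinates the script prints 2367 2339 0.988, so roughly 99% of the coding sequence of this model lies inside the repeatrunner hits, which argues against the overlap being limited to UTRs in this particular case.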

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

From myandell at genetics.utah.edu Mon May 6 21:49:51 2013
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 7 May 2013 03:49:51 +0000
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51881E6E.9010202@cals.arizona.edu>
References: <51881E6E.9010202@cals.arizona.edu>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365E02D13@mxb2.hg.genetics.utah.edu>

Hmm, eyeballing it, it doesn't look like it's the UTRs.

From carsonhh at gmail.com Tue May 7 05:39:17 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 07:39:17 -0400
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <7A60AB257EFF2B48B1F4C814817EA05365E02D13@mxb2.hg.genetics.utah.edu>
Message-ID:

If I had to guess, I imagine the EST evidence includes assembled mRNA-seq reads. Is that correct?

--Carson

From jmdoyle at purdue.edu Tue May 7 09:12:38 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Tue, 7 May 2013 11:12:38 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To:
Message-ID: <1393522124.220153.1367939558646.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi Carson,

Thanks for the quick reply! I don't remember any errors during the Blast installation; it appeared to install fine with the ./Build command. Augustus, RepeatMasker and SNAP were the programs I could not install with the ./Build commands, so I installed them manually instead. I've attached the error messages I received when I tried to use the ./Build commands. I've tested the three programs I installed manually and they seem to work fine on their own.

I do have Xcode installed. How would I determine if I have "non-standard libraries via Mac ports"?

Thanks again for your help with this.

Best wishes, Jackie

Department of Forestry and Natural Resources
Purdue University
West Lafayette, IN 47907
Phone: 270-293-9486
E-mail: jmdoyle at purdue.edu

----- Original Message -----
From: "Carson Holt"
To: "Jacqueline R M Doyle" , maker-devel at yandell-lab.org
Sent: Monday, May 6, 2013 10:02:52 AM
Subject: Re: [maker-devel] MAKER installation debugging

Most maker development and debugging actually happens on a Mac (OS X 10.7.5). Blast, Augustus, SNAP all install for me just fine with maker 2.27. What errors do you get during installation? Do you by any chance have non-standard libraries via Mac ports for example. Do you have xcode installed (it provides the appropriate 'make' command for compiling C)?

Thanks,
Carson

On 13-05-05 7:55 AM, "Jacqueline R M Doyle" wrote:

>Hi!
>
>I've recently attempted to install MAKER (Mac OS X). I installed blast
>and exonerate using the ./Build blast and ./Build exonerate commands, and
>I manually installed repeatmasker, snap and augustus (I couldn't get the
>./Build commands to work). I then attempted to test out maker following
>the 2012 MAKER tutorial. I received the blastx error message pasted
>below, and there is additional information in the maker log I've attached
>to this email. I was wondering if anyone had any suggestions about
>debugging, as I'm not quite sure where to begin...
>
>Best wishes and thanks, Jackie
>
>
>#--------- command -------------#
>Widget::formater:
>/usr/local/maker/bin/../exe/blast/bin/makeblastdb -dbtype prot -in /tmp/maker_0GBY28/te_proteins%2Efasta.mpi.10.0
>#-------------------------------#
>dyld: lazy symbol binding failed: Symbol not found:
>__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
>  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
>  Expected in: flat namespace
>
>dyld: Symbol not found:
>__ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_i
>  Referenced from: /usr/local/maker/bin/../exe/blast/bin/makeblastdb
>  Expected in: flat namespace
>
>ERROR: /usr/local/maker/bin/../exe/blast/bin/makeblastdb failed in
>Widget::formater
>
>FATAL ERROR
>ERROR: Failed while doing blastx repeats!!
>
>ERROR: Chunk failed at level 3
>!!
>FAILED CONTIG:contig-dpp-500-500
>
>
>Department of Forestry and Natural Resources
>Purdue University
>West Lafayette, IN 47907
>Phone: 270-293-9486
>E-mail: jmdoyle at purdue.edu
>_______________________________________________
>maker-devel mailing list
>maker-devel at box290.bluehost.com
>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: repeatmasker installation error.rtf
Type: application/rtf
Size: 1264 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snap installation error.rtf
Type: application/rtf
Size: 1095 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: agustus installation error.rtf
Type: application/rtf
Size: 1124 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue May 7 09:19:57 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 11:19:57 -0400
Subject: [maker-devel] MAKER installation debugging
In-Reply-To: <1393522124.220153.1367939558646.JavaMail.root@mailhub042.itcs.purdue.edu>
Message-ID:

Which version of MAKER are you using? Is it 2.10 or 2.27?

Thanks,
Carson

From jmdoyle at purdue.edu Tue May 7 10:54:22 2013
From: jmdoyle at purdue.edu (Jacqueline R M Doyle)
Date: Tue, 7 May 2013 12:54:22 -0400 (EDT)
Subject: [maker-devel] MAKER installation debugging
In-Reply-To:
Message-ID: <963584633.220449.1367945662870.JavaMail.root@mailhub042.itcs.purdue.edu>

Hi Carson,

I am using MAKER 2.10 (I downloaded it so long ago I'd forgotten there were two options). Would it be better for me to start over with 2.27?

Best wishes, Jackie

From carsonhh at gmail.com Tue May 7 12:20:19 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 07 May 2013 14:20:19 -0400
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To: <51892ABA.2060100@cals.arizona.edu>
Message-ID:

This is really more of an evidence issue. Because you have assembled mRNA-seq evidence, you are probably getting the TEs improperly included in the assembled ESTs, so MAKER just follows the evidence. It tries to mask the repeat out, but the alignment of the longer EST heavily supports the repeat's inclusion in the model during alignment polishing.

Solutions:

1. You can set softmask=0 instead of softmask=1 (1 is the default) to make everything hard masked instead (it will be a hard 'N', so no alignment can happen).
2. You can pre-mask the genome. The easiest way to do this would be to collect the query.masked.fasta files inside each theVoid directory in the datastore and use them as the input. Then none of the polishing steps can ever extend the alignment.
3. You can filter the mRNA-seq data for TE elements before assembly.

Thanks,
Carson

From dcopetti at cals.arizona.edu Tue May 7 10:24:26 2013
From: dcopetti at cals.arizona.edu (Dario Copetti)
Date: Tue, 07 May 2013 09:24:26 -0700
Subject: [maker-devel] gene models overlapping with TEs
In-Reply-To:
References:
Message-ID: <51892ABA.2060100@cals.arizona.edu>

Yes, there was RNA-seq evidence as well. Still, I would like to have this evidence annotated as TE, and not as a gene (or at least to have it tagged in some way).

As you suggested, a good solution could be to sequentially soft mask with the RepeatMasker output and then hard mask with the RepeatRunner result. In this way we hide TE coding regions from all predictors and alignments, leaving all the other types of repeats soft masked. This meets Mark's target of having MITEs and other non-autonomous TEs (as well as simple/low-complexity repeats) annotated in UTRs or CDSs, if present. In my opinion, this could be one of the few cases (or the only one?) where gene and repeat annotation can overlap.

For our genomes I will have a list of these genes overlapping TE coding regions, and we will likely remove them. Please let us know how you intend to fix this problem and in which MAKER version the fix will appear.

Thanks for the assistance and suggestions,

Dario

On 05/07/2013 04:39 AM, Carson Holt wrote:
> If I had to guess. I imagine the EST evidence includes assembled mRNA-seq
> reads? Is that correct?
> > --Carson > > > > On 13-05-06 11:49 PM, "Mark Yandell" wrote: > >> humm, eballing then it doesn't look lie its the UTRss.. >> >> Mark Yandell >> Professor of Human Genetics >> H.A. & Edna Benning Presidential Endowed Chair >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> ph:801-587-7707 >> >> ________________________________________ >> From: maker-devel-bounces at yandell-lab.org >> [maker-devel-bounces at yandell-lab.org] on behalf of Dario Copetti >> [dcopetti at cals.arizona.edu] >> Sent: Monday, May 06, 2013 3:19 PM >> To: maker-devel at yandell-lab.org >> Cc: Stein, Joshua; Rod Wing; kapeel at cals.arizona.edu >> Subject: [maker-devel] gene models overlapping with TEs >> >> Carson, >> >> Analyzing the output of a MAKER run on a rice-sized genome I noticed that >> some gene models (~10%) overlap with TE coding regions. As a QC step, I >> used BEDtools to determine the intersection of "CDS" and "repeatmasker" >> or "repeatrunner" and some 2400 genes overlap for at least 30% of their >> respective length. I am wondering how the gene models still appear in the >> final output, since I thought that the masking step was giving us the >> absoulte confirmation that in our endogenous gene list we do not include >> TE coding regions. Here below an example of a gene (attached picture too): >> >> ObracChr10 maker mRNA 355,056 358,075 . - . >> ID=Obrac10g00240.1;Parent=Obrac10g00240;Name=Obrac10g00240.1;_AED=0.24;_eA >> ED=0.24;_QI=0|0.66|0.5|1|1|1|4|0|788 >> ObracChr10 maker exon 355,056 356,874 . - . >> ID=Obrac10g00240.1:exon:4;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 356,965 357,081 . - . >> ID=Obrac10g00240.1:exon:3;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 357,209 357,319 . - . >> ID=Obrac10g00240.1:exon:2;Parent=Obrac10g00240.1 >> ObracChr10 maker exon 357,756 358,075 . - . >> ID=Obrac10g00240.1:exon:1;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 357,756 358,075 . 
- 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 357,209 357,319 . - 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 356,965 357,081 . - 2 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> ObracChr10 maker CDS 355,056 356,874 . - 0 >> ID=Obrac10g00240.1:cds;Parent=Obrac10g00240.1 >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ObracChr10 repeatrunner match_part 357,755 358,084 566 - >> . >> ID=ObracChr10:hsp:75:1.3.0.3;Parent=ObracChr10:hit:75:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >> ObracChr10 repeatrunner protein_match 357,755 358,084 566 - >> . >> ID=ObracChr10:hit:75:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 117 226 +320 >> ObracChr10 repeatrunner match_part 357,202 357,294 142 - >> . >> ID=ObracChr10:hsp:74:1.3.0.3;Parent=ObracChr10:hit:74:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >> ObracChr10 repeatrunner protein_match 357,202 357,294 142 - >> . >> ID=ObracChr10:hit:74:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 264 294 +86 >> ObracChr10 repeatrunner match_part 355,059 357,092 3367 - >> . >> ID=ObracChr10:hsp:73:1.3.0.3;Parent=ObracChr10:hit:73:1.3.0.3;Target=DTM_g >> i_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >> ObracChr10 repeatrunner protein_match 355,059 357,092 3367 - >> . >> ID=ObracChr10:hit:73:1.3.0.3;Name=DTM_gi_125573769_gb_EAZ15053.1hypothetic >> al;Target=DTM_gi_125573769_gb_EAZ15053.1hypothetical 289 937 +1816 >> >> >> This result is valid both for output lines from repeatmasker or >> repeatrunner, and the gene models come from either FGENESH or SNAP >> predictions. >> How can I explain this problem? >> Thanks, >> >> Dario >> >> >> >> >> >> -- >> Dario Copetti, PhD >> Research Associate >> Arizona Genomics Institute >> University of Arizona - BIO5 >> >> 1657 E. 
Helen St. >> Tucson, AZ 85721 >> www.genome.arizona.edu >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Dario Copetti, PhD Research Associate Arizona Genomics Institute University of Arizona - BIO5 1657 E. Helen St. Tucson, AZ 85721 www.genome.arizona.edu From jmdoyle at purdue.edu Tue May 7 22:09:48 2013 From: jmdoyle at purdue.edu (Jacqueline R M Doyle) Date: Wed, 8 May 2013 00:09:48 -0400 (EDT) Subject: [maker-devel] MAKER installation debugging In-Reply-To: Message-ID: <1621518279.221945.1367986188482.JavaMail.root@mailhub042.itcs.purdue.edu> I downloaded MAKER 2.27 and it installed perfectly! I worked through the tutorial without any problems. Thanks for your help with this! Best wishes, Jackie From carsonhh at gmail.com Tue May 7 22:10:33 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 08 May 2013 00:10:33 -0400 Subject: [maker-devel] MAKER installation debugging In-Reply-To: <1621518279.221945.1367986188482.JavaMail.root@mailhub042.itcs.purdue.edu> Message-ID: I'm glad it worked. --Carson On 13-05-08 12:09 AM, "Jacqueline R M Doyle" wrote: >I downloaded MAKER 2.27 and it installed perfectly! I worked through the >tutorial without any problems. Thanks for your help with this! > >Best wishes, Jackie > From Carson.Holt at oicr.on.ca Wed May 8 13:25:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 8 May 2013 19:25:52 +0000 Subject: [maker-devel] Non-standard genetic code In-Reply-To: <97533c275fa3e6b05709c92455c9e6b8@fbb.msu.ru> Message-ID: It's not possible yet. It is one of the things we have on a list to do. It's not a small task either to make the necessary changes to the code, as the codon usage affects blastx alignments, exonerate protein2genome alignments, ab initio gene prediction, gene extension/boundary polishing, and UTR addition. 
So the changes have to go into many many locations. Thanks, Carson On 13-05-08 11:44 AM, "Daniil Alexeyevsky" wrote: >Hi, > >I want to use MAKER to annotate organism with non-standard genetic >code. (It only has UGA stop-codon, UAA and UAG code glutamine). > >Is it possible to use MAKER in this case? If I am bound to editing some >source codes, could you please point me where to look? > >With best regards, >-- Daniil > From mnuhn at ebi.ac.uk Fri May 10 06:10:35 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Fri, 10 May 2013 13:10:35 +0100 Subject: [maker-devel] Duplicated exons Message-ID: <518CE3BB.3060003@ebi.ac.uk> Hello Carson! I have been trying to get to the bottom of an error message when (re)training snap. Snap, or more precisely fathom, was giving me unclear error messages about misordered and overlapping exons. I have looked into the gff files from which these exons originate and noticed that a lot of exons in that file were duplicated. For example I have found these: LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 and then about four hundred lines later there are these: LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1 which are identical except for the order number after "exon:". This seems to have happened to a lot of features in that file. How can I avoid this? Or if this is just a rare problem, can I have maker recompute the gff file without redoing all the computations again? Cheers, Michael. 
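The duplication described in the message above (exon lines that agree in scaffold, coordinates, strand and Parent but carry different IDs) can be checked for mechanically. What follows is a minimal sketch, not part of MAKER: the function name `find_duplicate_exons` and the `EXAMPLE` list are hypothetical, the input is assumed to be tab-separated GFF3 with nine columns, and the sample records are the four exon lines quoted in the message.

```python
from collections import defaultdict

# Hypothetical sample data: the four exon records quoted in the message,
# rebuilt as tab-separated GFF3 lines.
PARENT = "maker-LSalAtl2s75-snap-gene-2.15-mRNA-1"
EXAMPLE = [
    f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={PARENT}:exon:3;Parent={PARENT}",
    f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={PARENT}:exon:4;Parent={PARENT}",
    f"LSalAtl2s75\tmaker\texon\t186317\t186936\t.\t+\t.\tID={PARENT}:exon:1;Parent={PARENT}",
    f"LSalAtl2s75\tmaker\texon\t187007\t191531\t.\t+\t.\tID={PARENT}:exon:2;Parent={PARENT}",
]


def find_duplicate_exons(gff3_lines):
    """Return exons that occur more than once for the same Parent.

    Keys are (seqid, start, end, strand, Parent); values are the IDs of
    every exon line sharing that key, in file order.
    """
    seen = defaultdict(list)
    for line in gff3_lines:
        # Skip blank lines and GFF3 comments/directives.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("Parent"))
        seen[key].append(attrs.get("ID"))
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    for key, ids in find_duplicate_exons(EXAMPLE).items():
        print(key, ids)
```

To scan a whole file, the same function can be given an open file handle, since it only iterates over lines. It reports duplicates but does not decide which copy to keep.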
From daa at fbb.msu.ru Wed May 8 09:44:50 2013 From: daa at fbb.msu.ru (Daniil Alexeyevsky) Date: Wed, 08 May 2013 19:44:50 +0400 Subject: [maker-devel] Non-standard genetic code Message-ID: <97533c275fa3e6b05709c92455c9e6b8@fbb.msu.ru> Hi, I want to use MAKER to annotate an organism with a non-standard genetic code (its only stop codon is UGA; UAA and UAG code for glutamine). Is it possible to use MAKER in this case? If I am bound to editing some source code, could you please point me where to look? With best regards, -- Daniil From diana_leduc at eva.mpg.de Fri May 10 08:44:50 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Fri, 10 May 2013 16:44:50 +0200 (CEST) Subject: [maker-devel] Maker consensus Message-ID: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de> Dear maker developers, I am a PhD student working on de novo assembly and annotation of a bird genome. I used Maker as the annotation pipeline, which ran very well, and I obtained different annotations with evidence from the Augustus gene predictor, a small EST dataset from my organism and protein sequences from chicken, turkey and zebrafinch. I could combine the different gff files from different scaffolds into one gff file with annotations for the entire genome. I now have two questions: 1. What could be the reason that I haven't gotten the protein.fasta and transcript.fasta files? 2. How can I obtain a consensus gene list of different evidences from maker? What I would actually need is the scaffold, coordinates and annotation (gene name) according to the 3 other bird species. Thank you in advance. Best regards, Diana Le Duc -- Max Planck Institute for Evolutionary Anthropology Department of Evolutionary Genetics Deutscher Platz 6 D-04103 Leipzig Phone +49 (0)341-3550-554 www.eva.mpg.de
From carsonhh at gmail.com Fri May 10 10:13:33 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 12:13:33 -0400 Subject: [maker-devel] Maker consensus In-Reply-To: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: I'm sorry, I don't understand question 1. You are missing the resulting fasta files, correct? Did your resulting GFF3 file have any features of type "gene"? Did you run fasta_merge after running gff3_merge? Could you give me more details on what you are trying to do, so I can take a stab at question 2 as well. Thanks, Carson From: Diana LeDuc Reply-To: Diana LeDuc Date: Friday, 10 May, 2013 10:44 AM To: Cc: Gabriel Renaud , Janet Kelso , Torsten Schoeneberg Subject: [maker-devel] Maker consensus Dear maker developers, I am a PhD student working on de novo assembly and annotation of a bird genome. I used Maker as the annotation pipeline, which ran very well, and I obtained different annotations with evidence from the Augustus gene predictor, a small EST dataset from my organism and protein sequences from chicken, turkey and zebrafinch. I could combine the different gff files from different scaffolds into one gff file with annotations for the entire genome. I now have two questions: 1. What could be the reason that I haven't gotten the protein.fasta and transcript.fasta files? 2. How can I obtain a consensus gene list of different evidences from maker? What I would actually need is the scaffold, coordinates and annotation (gene name) according to the 3 other bird species. Thank you in advance.
Best regards, Diana Le Duc -- Max Planck Institute for Evolutionary Anthropology Department of Evolutionary Genetics Deutscher Platz 6 D-04103 Leipzig Phone +49 (0)341-3550-554 www.eva.mpg.de From carsonhh at gmail.com Fri May 10 10:25:17 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 12:25:17 -0400 Subject: [maker-devel] Duplicated exons In-Reply-To: <518CE3BB.3060003@ebi.ac.uk> Message-ID: Very odd. Which version of MAKER are you using? Are you using GFF3 passthrough in the run that generates the duplication? Thanks, Carson On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >Hello Carson! > >I have been trying to get to the bottom of an error message when >(re)training snap. Snap, or more precisely fathom, was giving me unclear >error messages about misordered and overlapping exons. > >I have looked into the gff files from which these exons originate and >noticed that a lot of exons in that file were duplicated. For example I >have found these: > >LSalAtl2s75 maker exon 186317 186936 . + . >ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75 >-snap-gene-2.15-mRNA-1 >LSalAtl2s75 maker exon 187007 191531 . + . >ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75 >-snap-gene-2.15-mRNA-1 > >and then about four hundred lines later there are these: > >LSalAtl2s75 maker exon 186317 186936 . + . >ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75 >-snap-gene-2.15-mRNA-1 >LSalAtl2s75 maker exon 187007 191531 . + . >ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75 >-snap-gene-2.15-mRNA-1 > >which are identical except for the order number after "exon:".
> >This seems to have happened to a lot of features in that file. > >How can I avoid this? Or if this is just a rare problem, can I have >maker recompute the gff file without redoing all the computations again? > >Cheers, >Michael. > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From j.zohren at qmul.ac.uk Fri May 10 11:07:30 2013 From: j.zohren at qmul.ac.uk (Jasmin Zohren) Date: Fri, 10 May 2013 18:07:30 +0100 Subject: [maker-devel] annotation of birch genome Message-ID: <005e01ce4da0$db3cbce0$91b636a0$@qmul.ac.uk> Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker") I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long. It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it. 
Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4526 bytes Desc: not available URL: From mnuhn at ebi.ac.uk Fri May 10 11:35:37 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Fri, 10 May 2013 18:35:37 +0100 Subject: [maker-devel] Duplicated exons In-Reply-To: References: Message-ID: <518D2FE9.6080900@ebi.ac.uk> On 05/10/2013 05:25 PM, Carson Holt wrote: > Very odd. Which version of MAEKR are you using. Are you using GFF3 > passthrough in the run that generates the duplication? I am using version 2.27 of maker. I am not using the passthrough option. Cheers, Michael. > Thanks, > Carson > > > On 13-05-10 8:10 AM, "Michael Nuhn" wrote: > >> Hello Carson! >> >> I have been trying to get to the bottom of an error message when >> (re)training snap. Snap, or more precisely fathom, was giving me unclear >> error messages about misordered and overlapping exons. >> >> I have looked into the gff files from which these exons originate and >> noticed that a lot of exons in that file were duplicated. For example I >> have found these: >> >> LSalAtl2s75 maker exon 186317 186936 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> LSalAtl2s75 maker exon 187007 191531 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> >> and then about four hundred lines later there are these: >> >> LSalAtl2s75 maker exon 186317 186936 . + . >> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> LSalAtl2s75 maker exon 187007 191531 . + . 
>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75 >> -snap-gene-2.15-mRNA-1 >> >> which are identical except for the order number after "exon:". >> >> This seems to have happened to a lot of features in that file. >> >> How can I avoid this? Or if this is just a rare problem, can I have >> maker recompute the gff file without redoing all the computations again? >> >> Cheers, >> Michael. >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > From carsonhh at gmail.com Fri May 10 11:25:15 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 13:25:15 -0400 Subject: [maker-devel] annotation of birch genome In-Reply-To: <005e01ce4da0$db3cbce0$91b636a0$@qmul.ac.uk> Message-ID: Really only 560 Mb (Pine is 20 Gb by comparison). The single longest step for MAKER is alignment, which is done via BLAST. So the evidence dataset tends to be what can be filtered to get to a reasonable size. Protein alignments take longer as they must be aligned against the 3 translated reading frames of the genome (so minimum 3x longer than DNA2DNA alignment, but in practice much much more). Alt_EST is even worse, as it must translate all 3 reading frames of the genome and all 3 of the data to be aligned (TBLASTX type alignment). So minimum 3x longer than protein alignment or 9x longer than DNA2DNA alignment (but in practice much more). So the single best thing to do to reduce run time is to use protein evidence where possible instead of alt_EST evidence, or to use ESTs from the same species and limit the use of proteins (ESTs from the same species are aligned as DNA2DNA, so it is very fast). Set all the blast_depth parameters in the maker_bopts.ctl file to 20 or 30. This will help if you have a very deep evidence dataset, by trimming overly deep alignment regions (less exonerate polishing).
Also you can try running MAKER on 40 cpus rather than 20 (basically doubling up even though you only have 20). This can work because, even though you gave MAKER 20 cpus to use, all 20 will rarely be at 100% CPU usage simultaneously. So launching 40 threads will give a slight boost in many instances by filling in the gaps when "wait" operations let cpus idle for a fraction of a second. One good thing, though, is that you only pay the price for data generation once. If you ever rerun with slightly modified parameters, MAKER is smart enough to reuse old results, so BLAST won't have to rerun. Thanks, Carson From: Jasmin Zohren Date: Friday, 10 May, 2013 1:07 PM To: Subject: [maker-devel] annotation of birch genome Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker") I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long. It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it.
Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk From carsonhh at gmail.com Fri May 10 11:44:01 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 13:44:01 -0400 Subject: [maker-devel] annotation of birch genome In-Reply-To: Message-ID: Also, if you will be annotating more genomes, you should look into getting an allocation on your university's cluster. Queen Mary University has a 2000 cpu cluster. Most cluster managers bend over backwards to help biologists use their systems, as it looks good on progress reports and funding requests when they can show they have a broader user base (i.e. departments other than physics :-) --Carson From: Carson Holt Date: Friday, 10 May, 2013 1:25 PM To: Jasmin Zohren , Subject: Re: [maker-devel] annotation of birch genome Really only 560 Mb (Pine is 20 Gb by comparison). The single longest step for MAKER is alignment, which is done via BLAST. So the evidence dataset tends to be what can be filtered to get to a reasonable size. Protein alignments take longer as they must be aligned against the 3 translated reading frames of the genome (so minimum 3x longer than DNA2DNA alignment, but in practice much much more). Alt_EST is even worse, as it must translate all 3 reading frames of the genome and all 3 of the data to be aligned (TBLASTX type alignment). So minimum 3x longer than protein alignment or 9x longer than DNA2DNA alignment (but in practice much more).
So the single best thing to do to reduce run time is to use protein evidence where possible instead of alt_EST evidence, or to use ESTs from the same species and limit the use of proteins (ESTs from the same species are aligned as DNA2DNA, so it is very fast). Set all the blast_depth parameters in the maker_bopts.ctl file to 20 or 30. This will help if you have a very deep evidence dataset, by trimming overly deep alignment regions (less exonerate polishing). Also you can try running MAKER on 40 cpus rather than 20 (basically doubling up even though you only have 20). This can work because, even though you gave MAKER 20 cpus to use, all 20 will rarely be at 100% CPU usage simultaneously. So launching 40 threads will give a slight boost in many instances by filling in the gaps when "wait" operations let cpus idle for a fraction of a second. One good thing, though, is that you only pay the price for data generation once. If you ever rerun with slightly modified parameters, MAKER is smart enough to reuse old results, so BLAST won't have to rerun. Thanks, Carson From: Jasmin Zohren Date: Friday, 10 May, 2013 1:07 PM To: Subject: [maker-devel] annotation of birch genome Dear Maker developers, I am a PhD student at Queen Mary University in London working on tree genomics. I recently attended the GMOD conference in Cambridge and it was a pity that no one from the Maker side was there. But the two days were interesting anyway. My current project is about birch which has just been sequenced and I now want to annotate it. Here are the details: - Genome size: 560 Mb - Size of EST file (from a related species): 28 Mb - I am running it on a single node with 20 cores of 512 GB RAM (using "mpiexec -n 20 maker") I've also attached my maker_opts file with the parameters I am using. I assume the maker_bopts and maker_exe file are of minor importance for now. My problem is, that the analysis is taking very long.
It's been running for weeks already and has only processed about 65 % of the scaffolds/contigs. So I was wondering whether you have any suggestions how to speed things up. Especially as I intend to use Maker for other projects, too, and will also come back to the birch annotation once I have mRNA data for it. Many thanks in advance and kind regards, Jasmin ----------------------------- Jasmin Zohren PhD student in the INTERCROSSING ITN Queen Mary University of London intercrossing.wikispaces.com evolve.sbcs.qmul.ac.uk From diana_leduc at eva.mpg.de Fri May 10 11:41:55 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Fri, 10 May 2013 19:41:55 +0200 (CEST) Subject: [maker-devel] Maker consensus In-Reply-To: References: <495984016.225142.1368197090441.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de> Hi Carson, Thank you for the quick answer. I ran gff3_merge to merge all the gff files and this resulted in a gff file, which has these types of fields: scaffold32239 blastx protein_match 22905 34500 174 + . ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; scaffold32239 blastx match_part 22905 23045 174 + . ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 172 218;Gap=M47; In comparison to the dpp_contig test file, I am missing est2genome evidence, most probably because my EST data set is pretty poor. I have blastx and protein2genome evidence though. My goal is to extract the genes that could be annotated on the scaffolds.
In the gff files the hits overlap most of the time. I can visualize this properly in Apollo: for example, one scaffold hits the DSCAML gene in both zebrafinch and chicken, but extracting the coordinates between which this scaffold fits this annotated gene is difficult from the gff. Manually curating the genes is also not an option, since I am trying to do this for a 1.7Gb genome. I hope this explains better what we are after. Thank you once again. Best regards, Diana On May 10, 2013 at 6:13 PM Carson Holt wrote: > I'm sorry, I don't understand question 1. You are missing the resulting > fasta files, correct? Did your resulting GFF3 file have any features of type > "gene"? Did you run fasta_merge after running gff3_merge? > > Could you give me more details on what you are trying to do, so I can take a > stab at question 2 as well. > > Thanks, > Carson > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 10:44 AM > To: < maker-devel at yandell-lab.org > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > >, Janet Kelso < kelso at eva.mpg.de > >, Torsten Schoeneberg < > torsten.schoeneberg at medizin.uni-leipzig.de > > > Subject: [maker-devel] Maker consensus > > > Dear maker developers, > > I am a PhD student working on de novo assembly and annotation of a bird > genome. I used Maker as the annotation pipeline, which ran very well, and I > obtained different annotations with evidence from the Augustus gene predictor, > a small EST dataset from my organism and protein sequences from chicken, turkey > and zebrafinch. I could combine the different gff files from different > scaffolds into one gff file with annotations for the entire genome. > > I now have two questions: > > 1. What could be the reason that I haven't gotten the protein.fasta and > transcript.fasta files? > > 2. How can I obtain a consensus gene list of different evidences from maker?
> What I would actually need is the scaffold, coordinates and annotation (gene > name) according to the 3 other bird species. > > Thank you in advance. > > Best regards, > > Diana Le Duc > > -- > > Max Planck Institute for Evolutionary Anthropology > Department of Evolutionary Genetics > Deutscher Platz 6 > D-04103 Leipzig > > Phone +49 (0)341-3550-554 > www.eva.mpg.de > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 10 11:51:48 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 10 May 2013 13:51:48 -0400 Subject: [maker-devel] Maker consensus In-Reply-To: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: Ok. You just ran the evidence and didn't give a gene predictor. You need to provide an HMM file for SNAP a species for augustus, or for rough annotations you can set protein3genome=1 and est2genome=1. This will try and generate models direct from the alignments. If you provide a gene predictor, then MAKER can talk to it about the evidence alignments so it can make a best gene call for the region. Then there will be gene/mRNA/exon model in the GFF3 file and entires in the proteins.fasta and transcripts.fasta. If you need to train a predictor, you can train SNAP using the maker2zff script and the SNAP documentation or maker GMOD tutorial. If you want to train augustus Jason Stajich wrote an excellent explanation as well as tools in a previous list message. 

list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html
Script is in this github repo - https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl

Thanks,
Carson
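
Illustrative sketch (not from the thread): Carson's earlier question, whether the merged GFF3 contains any features of type "gene", can be checked by tallying column 3 of the file. The Python below is a minimal sketch assuming a standard tab-delimited, nine-column GFF3 file; the names count_feature_types and has_gene_models and the shortened sample lines are made up for illustration and are not part of MAKER.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Tally (source, type) pairs, i.e. columns 2 and 3, of GFF3 feature lines."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # only embedded sequence follows this directive
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[(cols[1], cols[2])] += 1
    return counts


def has_gene_models(counts):
    """True if any source contributed features of type 'gene'."""
    return any(ftype == "gene" for _source, ftype in counts)


# Two evidence lines shaped like those quoted in the thread (attributes shortened).
sample = [
    "##gff-version 3",
    "\t".join(["scaffold32239", "blastx", "protein_match", "22905", "34500",
               "174", "+", ".", "ID=scaffold32239:hit:976144;Name=DSCAML1-2039"]),
    "\t".join(["scaffold32239", "blastx", "match_part", "22905", "23045",
               "174", "+", ".",
               "ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144"]),
]

counts = count_feature_types(sample)
for (source, ftype), n in sorted(counts.items()):
    print(source, ftype, n)
print("gene models present:", has_gene_models(counts))
```

If only evidence types such as protein_match and match_part show up and no gene or mRNA features, that would be consistent with the situation Carson describes, where no gene predictor was supplied. To try it on a real file, pass an open file handle instead of the sample list.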

From carsonhh at gmail.com Fri May 10 12:08:35 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 14:08:35 -0400
Subject: [maker-devel] Duplicated exons
In-Reply-To: <518D2FE9.6080900@ebi.ac.uk>
Message-ID:

2.27 from the website download or the SVN devel version?

Thanks,
Carson

On 13-05-10 1:35 PM, "Michael Nuhn" wrote:

>On 05/10/2013 05:25 PM, Carson Holt wrote:
>> Very odd. Which version of MAKER are you using? Are you using GFF3
>> passthrough in the run that generates the duplication?
>
>I am using version 2.27 of maker. I am not using the passthrough option.
>
>Cheers,
>Michael.
>
>> Thanks,
>> Carson
>>
>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote:
>>
>>> Hello Carson!
>>>
>>> I have been trying to get to the bottom of an error message when
>>> (re)training snap. Snap, or more precisely fathom, was giving me unclear
>>> error messages about misordered and overlapping exons.
>>>
>>> I have looked into the gff files from which these exons originate and
>>> noticed that a lot of exons in that file were duplicated. For example I
>>> have found these:
>>>
>>> LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>> LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>>
>>> and then about four hundred lines later there are these:
>>>
>>> LSalAtl2s75 maker exon 186317 186936 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>> LSalAtl2s75 maker exon 187007 191531 . + . ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1
>>>
>>> which are identical except for the order number after "exon:".
>>>
>>> This seems to have happened to a lot of features in that file.
>>>
>>> How can I avoid this? Or if this is just a rare problem, can I have
>>> maker recompute the gff file without redoing all the computations again?
>>>
>>> Cheers,
>>> Michael.

From carsonhh at gmail.com Fri May 10 12:29:32 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 10 May 2013 14:29:32 -0400
Subject: [maker-devel] Maker consensus
In-Reply-To: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID:

You can use any species Augustus already has. If it doesn't have one for your organism, then you train it yourself. The species folder is located via the AUGUSTUS_CONFIG_PATH environment variable, and is usually .../augustus/config/species

Thanks,
Carson

From: Diana LeDuc
Reply-To: Diana LeDuc
Date: Friday, 10 May, 2013 2:16 PM
To: , Carson Holt
Cc: Torsten Schoeneberg , Gabriel Renaud , Janet Kelso
Subject: Re: [maker-devel] Maker consensus

Hi Carson,

In maker_exe.ctl I would have to provide the path to Augustus. Augustus has a training set for chicken that I would use. Is it possible to specify the species I want to use, or is the only way to train Augustus myself?

Thank you!

Best,
Diana

From mnuhn at ebi.ac.uk Fri May 10 18:29:10 2013
From: mnuhn at ebi.ac.uk (Michael Nuhn)
Date: Sat, 11 May 2013 01:29:10 +0100
Subject: [maker-devel] Duplicated exons
In-Reply-To:
References:
Message-ID: <518D90D6.4080603@ebi.ac.uk>

On 05/10/2013 07:08 PM, Carson Holt wrote:
> 2.27 from the website download or the SVN devel version?

SVN. I checked it out on 19/03/2013.

From dsth at ebi.ac.uk Fri May 10 18:20:42 2013
From: dsth at ebi.ac.uk (Daniel Hughes)
Date: Sat, 11 May 2013 01:20:42 +0100
Subject: [maker-devel] Duplicated exons
Message-ID:

That is odd. I've run that version of MAKER 30-40 times at EBI lately and have never seen it. Is it just one scaffold? I'd be surprised if it's the cause, but have you been playing with the file locking options Carson mentioned a while back? I'd definitely be inclined to re-process it if it's just the one scaffold.

Dan
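
Illustrative sketch (not from the thread): Dan's question of whether the duplication is confined to one scaffold can be approached by grouping exon lines that share scaffold, coordinates, strand and Parent but carry different IDs, as in the four lines Michael reported. The Python below is a minimal sketch assuming tab-delimited, nine-column GFF3; the function names parse_attributes and find_duplicate_exons are hypothetical helpers, not MAKER tools.

```python
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_duplicate_exons(gff3_lines):
    """Return exon groups that share scaffold, start, end, strand and Parent
    but were written under more than one ID."""
    groups = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        # An exon shared by several transcripts lists its parents comma-separated.
        for parent in attrs.get("Parent", "").split(","):
            key = (cols[0], int(cols[3]), int(cols[4]), cols[6], parent)
            groups[key].append(attrs.get("ID", ""))
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


PARENT = "maker-LSalAtl2s75-snap-gene-2.15-mRNA-1"


def exon_line(start, end, number):
    """Build one exon line like those quoted in Michael's report."""
    attrs = "ID={0}:exon:{1};Parent={0}".format(PARENT, number)
    return "\t".join(["LSalAtl2s75", "maker", "exon", str(start), str(end),
                      ".", "+", ".", attrs])


sample = [
    exon_line(186317, 186936, 3),
    exon_line(187007, 191531, 4),
    exon_line(186317, 186936, 1),
    exon_line(187007, 191531, 2),
]

dups = find_duplicate_exons(sample)
for (scaffold, start, end, strand, parent), ids in sorted(dups.items()):
    print(scaffold, start, end, strand, parent, ids)
```

Counting the reported groups per scaffold (the first element of each key) would show whether the problem is limited to a single scaffold or spread across the file.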

From diana_leduc at eva.mpg.de Fri May 10 12:16:34 2013
From: diana_leduc at eva.mpg.de (Diana LeDuc)
Date: Fri, 10 May 2013 20:16:34 +0200 (CEST)
Subject: [maker-devel] Maker consensus
In-Reply-To:
References: <1222330587.225314.1368207715429.JavaMail.open-xchange@oxchange.eva.mpg.de>
Message-ID: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de>

Hi Carson,

In maker_exe.ctl I would have to provide the path to Augustus. Augustus has a training set for chicken that I would use. Is it possible to specify the species I want to use, or is the only way to train Augustus myself?

Thank you!

Best,
Diana

From linzkl007 at hotmail.com Sat May 11 11:28:47 2013
From: linzkl007 at hotmail.com (=?gb2312?B?7OTs5A==?=)
Date: Sun, 12 May 2013 01:28:47 +0800
Subject: [maker-devel] about predictor training
Message-ID:

Hi,

I'm trying to use MAKER to annotate a new genome sequence which I assembled myself. I used TopHat and Cufflinks to align the RNA-seq data we have to the assembly. Based on the MAKER tutorial, I may need three FASTA files (the assembly, ESTs and a protein database) to train SNAP. I may use SwissProt as the protein database. Can I use the GTF result from Cufflinks directly as the ESTs during the training?

Another question: if I want to use Augustus for the ab initio gene prediction, do I need to train it the same way as SNAP? I ask because I saw some posts saying that the ab initio results would be used as evidence to train the predictor. Is there a particular order in which the different predictors should be run?

Thank you so much for your help.

Lin

From kangyangjae at gmail.com Sun May 12 21:53:34 2013
From: kangyangjae at gmail.com (Kang, Yang Jae)
Date: Mon, 13 May 2013 12:53:34 +0900
Subject: [maker-devel] exon numbering bug?
Message-ID: <070c01ce4f8d$73862fc0$5a928f40$@gmail.com>

Hello,

I want to check whether this is a bug or my misunderstanding.

The following is the GFF3 result of the MAKER pipeline. I think the parts marked in red should be mRNA-2.
This type of error was found only at exon scaffold_22 maker mRNA 604856 612126 . + . ID=211342;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2;Parent=211320 scaffold_22 maker exon 604856 605185 0.51 + . ID=211343;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2788;Parent =211342 scaffold_22 maker exon 608362 608456 0.51 + . ID=211344;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2791;Parent =211342 scaffold_22 maker exon 610193 610286 0.51 + . ID=211345;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2792;Parent =211342 scaffold_22 maker exon 610583 610714 0.51 + . ID=211346;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2793;Parent =211342 scaffold_22 maker exon 610838 610942 0.51 + . ID=211347;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2794;Parent =211342 scaffold_22 maker exon 611458 612126 0.51 + . ID=211348;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2795;Parent =211342 scaffold_22 maker five_prime_UTR 604856 604972 . + . ID=211349;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR1;Parent=2113 42 scaffold_22 maker CDS 604973 605185 . + 0 ID=211350;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2905;Parent= 211342 scaffold_22 maker CDS 608362 608456 . + 0 ID=211351;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2906;Parent= 211342 scaffold_22 maker CDS 610193 610286 . + 1 ID=211352;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2907;Parent= 211342 scaffold_22 maker CDS 610583 610714 . + 0 ID=211353;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2908;Parent= 211342 scaffold_22 maker CDS 610838 610942 . + 0 ID=211354;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2909;Parent= 211342 scaffold_22 maker CDS 611458 611661 . + 0 ID=211355;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2910;Parent= 211342 scaffold_22 maker three_prime_UTR 611662 612126 . + . 
ID=211356;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR2;Parent=2113 42 scaffold_22 maker start_codon 604973 604975 . + . ID=211357;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:start1;Parent=21 1342 scaffold_22 maker stop_codon 611659 611661 . + . ID=211358;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:stop2;Parent=211 342 Thank you! Kang, Yang Jae Ph.D. Cropgenomics Lab. College of Agriculture and Life Science Seoul National University Korea -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Sun May 12 22:01:41 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 13 May 2013 00:01:41 -0400 Subject: [maker-devel] exon numbering bug? In-Reply-To: <070c01ce4f8d$73862fc0$5a928f40$@gmail.com> Message-ID: There has been some post processing of the GFF3. It is not an original MAKER result file. I can tell based on the ID's (maker doesn't assign numerical IDs). Most likely it was processed to make exons unique without having dual parentage. Normally if the same exon is found in two transcripts it will have two parents separated by a comma. I imaging that the post processing script duplicated the exon, creating independent IDs and split the parents, but left the Name= tag the same. Since the Name= tag was based off of the first transcript the exon belonged to, it stayed the same. --Carson From: "Kang, Yang Jae" Date: Sunday, 12 May, 2013 11:53 PM To: Subject: [maker-devel] exon numbering bug? Hello I want to check this is bug or my misunderstanding. The following is the gff3 result of maker pipeline. I think those red marks should be mRNA-2. This type of error was found only at exon scaffold_22 maker mRNA 604856 612126 . + . ID=211342;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2;Parent=211320 scaffold_22 maker exon 604856 605185 0.51 + . ID=211343;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2788;Parent =211342 scaffold_22 maker exon 608362 608456 0.51 + . 
ID=211344;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2791;Parent =211342 scaffold_22 maker exon 610193 610286 0.51 + . ID=211345;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2792;Parent =211342 scaffold_22 maker exon 610583 610714 0.51 + . ID=211346;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2793;Parent =211342 scaffold_22 maker exon 610838 610942 0.51 + . ID=211347;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2794;Parent =211342 scaffold_22 maker exon 611458 612126 0.51 + . ID=211348;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-1:exon:2795;Parent =211342 scaffold_22 maker five_prime_UTR 604856 604972 . + . ID=211349;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR1;Parent=2113 42 scaffold_22 maker CDS 604973 605185 . + 0 ID=211350;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2905;Parent= 211342 scaffold_22 maker CDS 608362 608456 . + 0 ID=211351;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2906;Parent= 211342 scaffold_22 maker CDS 610193 610286 . + 1 ID=211352;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2907;Parent= 211342 scaffold_22 maker CDS 610583 610714 . + 0 ID=211353;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2908;Parent= 211342 scaffold_22 maker CDS 610838 610942 . + 0 ID=211354;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2909;Parent= 211342 scaffold_22 maker CDS 611458 611661 . + 0 ID=211355;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:cds:2910;Parent= 211342 scaffold_22 maker three_prime_UTR 611662 612126 . + . ID=211356;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:UTR2;Parent=2113 42 scaffold_22 maker start_codon 604973 604975 . + . ID=211357;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:start1;Parent=21 1342 scaffold_22 maker stop_codon 611659 611661 . + . ID=211358;Name=maker-scaffold_22-augustus-gene-0.532-mRNA-2:stop2;Parent=211 342 Thank you! Kang, Yang Jae Ph.D. Cropgenomics Lab. 
College of Agriculture and Life Science Seoul National University Korea _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 13 08:00:01 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 13 May 2013 10:00:01 -0400 Subject: [maker-devel] about predictor training In-Reply-To: Message-ID: You need to convert the GTF files to GFF3. There is a tophat2gff and cufflinks2gff script that come with MAKER. I recommend only using cufflinks results and ignoring tophat results though as they tend to be a lot more spurious. Jason Stajich wrote an excellent explanation on training Augustus on the list previously - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html He also included scripts to assist with the training - https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2 augustus_gbk.pl Overall the strategy is similar to the one used to train SNAP. Thanks, Carson From: ?? Date: Saturday, 11 May, 2013 1:28 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] about predictor training Hi, I'm trying to use MAKER to annotate the new genome sequence which I assembled by myseft. I used TopHat and Cufflinks to align the sequence based on the RNA-seq we have. Based on the tutorial of MAKER, I may need three fasta format file including assembly data, ESTs and protein database to train the SNAP. I may use SwissProt as the protein database. Can I use the gtf result from Cufflinks directly as an ESTs during the training? Another is, if I want to use Augustus to do the ab initio gene prediction, do I need to do the same way as SNAP? Cause I saw some posts that the result from ab initio would be used as the evidence to train the predictor. 
Is there a particular order in which the different predictors should be run? Thank you so much for your help. Lin From carsonhh at gmail.com Mon May 13 08:01:58 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 13 May 2013 10:01:58 -0400 Subject: [maker-devel] Duplicated exons In-Reply-To: <518D90D6.4080603@ebi.ac.uk> Message-ID: Could you send me your maker opts files, the contig that fails, and the evidence files you use for that contig? Thanks, Carson On 13-05-10 8:29 PM, "Michael Nuhn" wrote: >On 05/10/2013 07:08 PM, Carson Holt wrote: >> 2.27 from the website download or the SVN devel version? > >SVN. I checked it out on 19/03/2013. > >> Thanks, >> Carson >> >> >> On 13-05-10 1:35 PM, "Michael Nuhn" wrote: >> >>> On 05/10/2013 05:25 PM, Carson Holt wrote: >>>> Very odd. Which version of MAKER are you using? Are you using GFF3 >>>> passthrough in the run that generates the duplication? >>> >>> I am using version 2.27 of maker. I am not using the passthrough >>>option. >>> >>> Cheers, >>> Michael. >>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >>>> >>>>> Hello Carson! >>>>> >>>>> I have been trying to get to the bottom of an error message when >>>>> (re)training snap. Snap, or more precisely fathom, was giving me >>>>> unclear >>>>> error messages about misordered and overlapping exons. >>>>> >>>>> I have looked into the gff files from which these exons originate and >>>>> noticed that a lot of exons in that file were duplicated. For >>>>>example I >>>>> have found these: >>>>> >>>>> LSalAtl2s75 maker exon 186317 186936 . + . 
>>>>> >>>>> >>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl >>>>>2s >>>>> 75 >>>>> -snap-gene-2.15-mRNA-1 >>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>> >>>>> >>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl >>>>>2s >>>>> 75 >>>>> -snap-gene-2.15-mRNA-1 >>>>> >>>>> and then about four hundred lines later there are these: >>>>> >>>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>>> >>>>> >>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl >>>>>2s >>>>> 75 >>>>> -snap-gene-2.15-mRNA-1 >>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>> >>>>> >>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl >>>>>2s >>>>> 75 >>>>> -snap-gene-2.15-mRNA-1 >>>>> >>>>> which are identical except for the order number after "exon:". >>>>> >>>>> This seems to have happened to a lot of features in that file. >>>>> >>>>> How can I avoid this? Or if this is just a rare problem, can I have >>>>> maker recompute the gff file without redoing all the computations >>>>> again? >>>>> >>>>> Cheers, >>>>> Michael. >>>>> >>>>> >>>>> _______________________________________________ >>>>> maker-devel mailing list >>>>> maker-devel at box290.bluehost.com >>>>> >>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>>g >>>> >>>> >>> >> >> > From mnuhn at ebi.ac.uk Mon May 13 10:30:36 2013 From: mnuhn at ebi.ac.uk (Michael Nuhn) Date: Mon, 13 May 2013 17:30:36 +0100 Subject: [maker-devel] Duplicated exons In-Reply-To: References: Message-ID: <5191152C.5030103@ebi.ac.uk> Hello Carson! On 05/13/2013 03:01 PM, Carson Holt wrote: > Could you send me your maker opts files, the contig that fails, and the > evidence files you use for that contig. Thanks for offering your help. I worked around the problem this morning by removing all exons from the training set for which I was getting the error. 
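
For anyone wanting to check their own output for the duplication described in this thread, here is a minimal sketch of how such features could be located. It is not part of MAKER: the function name find_duplicate_features is made up for illustration, and it assumes ordinary tab-separated nine-column GFF3 (the archive shows the records with the tabs flattened to spaces). Two exons count as duplicates when they share sequence ID, start, end, strand and Parent, regardless of the number after "exon:".

```python
from collections import defaultdict


def find_duplicate_features(gff_lines, feature_type="exon"):
    """Return {key: [IDs]} for features sharing coordinates and parent.

    Hypothetical helper for illustration; assumes tab-separated GFF3.
    """
    seen = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds key=value pairs separated by ';'.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # set() collapses a parent that is listed twice in one Parent= value.
        parents = sorted(set(attrs.get("Parent", "").split(",")))
        for parent in parents:
            key = (cols[0], int(cols[3]), int(cols[4]), cols[6], parent)
            seen[key].append(attrs.get("ID", ""))
    # Keep only keys that were seen more than once.
    return {key: ids for key, ids in seen.items() if len(ids) > 1}
```

With a file on disk this could be called as find_duplicate_features(open("contig.gff")), where contig.gff stands in for whichever MAKER GFF3 output is being checked. The returned IDs show which copies could be dropped before handing the file to fathom.
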
Now I'm rerunning maker and I can't find any gff files at the moment with this problem. If the problem reappears, I'll send you the files. Cheers, Michael. > Thanks, > Carson > > > > On 13-05-10 8:29 PM, "Michael Nuhn" wrote: > >> On 05/10/2013 07:08 PM, Carson Holt wrote: >>> 2.27 from the website download or the SVN devel version? >> >> SVN. I checked it out on 19/03/2013. >> >>> Thanks, >>> Carson >>> >>> >>> On 13-05-10 1:35 PM, "Michael Nuhn" wrote: >>> >>>> On 05/10/2013 05:25 PM, Carson Holt wrote: >>>>> Very odd. Which version of MAEKR are you using. Are you using GFF3 >>>>> passthrough in the run that generates the duplication? >>>> >>>> I am using version 2.27 of maker. I am not using the passthrough >>>> option. >>>> >>>> Cheers, >>>> Michael. >>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> >>>>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >>>>> >>>>>> Hello Carson! >>>>>> >>>>>> I have been trying to get to the bottom of an error message when >>>>>> (re)training snap. Snap, or more precisely fathom, was giving me >>>>>> unclear >>>>>> error messages about misordered and overlapping exons. >>>>>> >>>>>> I have looked into the gff files from which these exons originate and >>>>>> noticed that a lot of exons in that file were duplicated. For >>>>>> example I >>>>>> have found these: >>>>>> >>>>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>>>> >>>>>> >>>>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalAtl >>>>>> 2s >>>>>> 75 >>>>>> -snap-gene-2.15-mRNA-1 >>>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>>> >>>>>> >>>>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalAtl >>>>>> 2s >>>>>> 75 >>>>>> -snap-gene-2.15-mRNA-1 >>>>>> >>>>>> and then about four hundred lines later there are these: >>>>>> >>>>>> LSalAtl2s75 maker exon 186317 186936 . + . 
>>>>>> >>>>>> >>>>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalAtl >>>>>> 2s >>>>>> 75 >>>>>> -snap-gene-2.15-mRNA-1 >>>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>>> >>>>>> >>>>>> ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalAtl >>>>>> 2s >>>>>> 75 >>>>>> -snap-gene-2.15-mRNA-1 >>>>>> >>>>>> which are identical except for the order number after "exon:". >>>>>> >>>>>> This seems to have happened to a lot of features in that file. >>>>>> >>>>>> How can I avoid this? Or if this is just a rare problem, can I have >>>>>> maker recompute the gff file without redoing all the computations >>>>>> again? >>>>>> >>>>>> Cheers, >>>>>> Michael. >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> maker-devel mailing list >>>>>> maker-devel at box290.bluehost.com >>>>>> >>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>>> g >>>>> >>>>> >>>> >>> >>> >> > > From carsonhh at gmail.com Mon May 13 10:07:13 2013 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 13 May 2013 12:07:13 -0400 Subject: [maker-devel] Duplicated exons In-Reply-To: <5191152C.5030103@ebi.ac.uk> Message-ID: Ok. Thanks, Carson On 13-05-13 12:30 PM, "Michael Nuhn" wrote: >Hello Carson! > >On 05/13/2013 03:01 PM, Carson Holt wrote: >> Could you send me your maker opts files, the contig that fails, and the >> evidence files you use for that contig. > >Thanks for offering your help. > >I worked around the problem this morning by removing all exons from the >training set for which I was getting the error. Now I'm rerunning maker >and I can't find any gff files at the moment with this problem. > >If the problem reappears, I'll send you the files. > >Cheers, >Michael. > >> Thanks, >> Carson >> >> >> >> On 13-05-10 8:29 PM, "Michael Nuhn" wrote: >> >>> On 05/10/2013 07:08 PM, Carson Holt wrote: >>>> 2.27 from the website download or the SVN devel version? >>> >>> SVN. I checked it out on 19/03/2013. 
>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> On 13-05-10 1:35 PM, "Michael Nuhn" wrote: >>>> >>>>> On 05/10/2013 05:25 PM, Carson Holt wrote: >>>>>> Very odd. Which version of MAEKR are you using. Are you using GFF3 >>>>>> passthrough in the run that generates the duplication? >>>>> >>>>> I am using version 2.27 of maker. I am not using the passthrough >>>>> option. >>>>> >>>>> Cheers, >>>>> Michael. >>>>> >>>>>> Thanks, >>>>>> Carson >>>>>> >>>>>> >>>>>> On 13-05-10 8:10 AM, "Michael Nuhn" wrote: >>>>>> >>>>>>> Hello Carson! >>>>>>> >>>>>>> I have been trying to get to the bottom of an error message when >>>>>>> (re)training snap. Snap, or more precisely fathom, was giving me >>>>>>> unclear >>>>>>> error messages about misordered and overlapping exons. >>>>>>> >>>>>>> I have looked into the gff files from which these exons originate >>>>>>>and >>>>>>> noticed that a lot of exons in that file were duplicated. For >>>>>>> example I >>>>>>> have found these: >>>>>>> >>>>>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>>>>> >>>>>>> >>>>>>> >>>>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:3;Parent=maker-LSalA >>>>>>>tl >>>>>>> 2s >>>>>>> 75 >>>>>>> -snap-gene-2.15-mRNA-1 >>>>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>>>> >>>>>>> >>>>>>> >>>>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:4;Parent=maker-LSalA >>>>>>>tl >>>>>>> 2s >>>>>>> 75 >>>>>>> -snap-gene-2.15-mRNA-1 >>>>>>> >>>>>>> and then about four hundred lines later there are these: >>>>>>> >>>>>>> LSalAtl2s75 maker exon 186317 186936 . + . >>>>>>> >>>>>>> >>>>>>> >>>>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:1;Parent=maker-LSalA >>>>>>>tl >>>>>>> 2s >>>>>>> 75 >>>>>>> -snap-gene-2.15-mRNA-1 >>>>>>> LSalAtl2s75 maker exon 187007 191531 . + . >>>>>>> >>>>>>> >>>>>>> >>>>>>>ID=maker-LSalAtl2s75-snap-gene-2.15-mRNA-1:exon:2;Parent=maker-LSalA >>>>>>>tl >>>>>>> 2s >>>>>>> 75 >>>>>>> -snap-gene-2.15-mRNA-1 >>>>>>> >>>>>>> which are identical except for the order number after "exon:". 
>>>>>>> >>>>>>> This seems to have happened to a lot of features in that file. >>>>>>> >>>>>>> How can I avoid this? Or if this is just a rare problem, can I have >>>>>>> maker recompute the gff file without redoing all the computations >>>>>>> again? >>>>>>> >>>>>>> Cheers, >>>>>>> Michael. >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> >>>>>>> >>>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab. >>>>>>>or >>>>>>> g >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > From rob.syme at gmail.com Tue May 14 00:54:18 2013 From: rob.syme at gmail.com (Rob Syme) Date: Tue, 14 May 2013 14:54:18 +0800 Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Message-ID: Hi all I'm trying to get mpi_maker up and running. I've installed the latest version of MPICH from mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz, making sure to "./configure --enable-shared" Everything seems to install without trouble, but running mpiexec -n 1 mpi_maker gives: /usr/bin/perl: symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Does anybody here know how to fix this? Do I need to downgrade to an older version of MPICH? Thanks! Rob Syme PhD Student Curtin University -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 14 05:20:00 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 14 May 2013 07:20:00 -0400 Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy In-Reply-To: Message-ID: You have to use MPICH2, the new MPICH3 is not compatible. MPI version 3 is a completely new protocol implemented in MPICH3, and it breaks MAKER. You can also use OpenMPI with the MAKER version 2.27. 
Thanks, Carson From: Rob Syme Date: Tuesday, 14 May, 2013 2:54 AM To: Subject: [maker-devel] symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Hi all I'm trying to get mpi_maker up and running. I've installed the latest version of MPICH from mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz , making sure to "./configure --enable-shared" Everything seems to install without trouble, but running mpiexec -n 1 mpi_maker gives: /usr/bin/perl: symbol lookup error: /usr/local/lib/libmpich.so.10: undefined symbol: MPIU_Strncpy Does anybody here know how to fix this? Do I need to downgrade to an older version of MPICH? Thanks! Rob Syme PhD Student Curtin University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From heywood at cshl.edu Tue May 14 14:42:33 2013 From: heywood at cshl.edu (Heywood, Todd) Date: Tue, 14 May 2013 20:42:33 +0000 Subject: [maker-devel] MPI MAKER hanging NFS Message-ID: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu> We have been getting hung NFS mounts on some nodes when running MPI MAKER (version 2.27). Processes go into a "D" state and cannot be killed. We end up having to reboot nodes to recover them. We are running MPICH2 version 1.4.1p1 with RHEL 6.3. Questions: (1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung on a sync_page system call under NFS. That *might* imply some locking issues. (2) Has anyone else seen this? (3) The root directory (parent of genome.maker.output directory) has lots of mpi***** files, all of which have the first line "pst0Process::MpiChunk". Is this expected? I'm able to reproducibly hang NFS on some nodes when using at least 4 32-core nodes and 128 running MPI tasks. 
Thanks, Todd Heywood CSHL From Carson.Holt at oicr.on.ca Tue May 14 19:01:00 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 15 May 2013 01:01:00 +0000 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu> Message-ID: No, it does not use ROMIO. The locking may be due to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and prefer large, infrequent writes and frequent reads. MAKER also uses a variant of the File::NFSLock module, which uses hard links to force a flush of the NFS IO cache when asynchronous IO is enabled (described here http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality. Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS-mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS safe and need a local location to work with (anything involving Berkeley DB or SQLite, for example). Make sure you are not setting that to an NFS-mounted scratch location. The mpi**** files are examples of short-lived files that should not be on NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finishes or is killed by standard signals, such as when you hit ^C or use kill -15. Thanks, Carson On 13-05-14 4:42 PM, "Heywood, Todd" wrote: >We have been getting hung NFS mounts on some nodes when running MPI MAKER >(version 2.27). Processes go into a "D" state and cannot be killed. We >end up having to reboot nodes to recover them. We are running MPICH2 >version 1.4.1p1 >with RHEL 6.3. Questions: > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >on a sync_page system call under NFS. That *might* imply some locking >issues. 
> >(2) Has anyone else seen this? > >(3) The root directory (parent of genome.maker.output directory) has lots >of mpi***** files, all of which have the first line >"pst0Process::MpiChunk". Is this expected? > >I'm able to reproducibly hang NFS on some nodes when using at least 4 >32-core nodes and 128 running MPI tasks. > >Thanks, > >Todd Heywood >CSHL > > From eernst at cshl.edu Wed May 15 11:08:08 2013 From: eernst at cshl.edu (Evan Ernst) Date: Wed, 15 May 2013 13:08:08 -0400 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: References: <0ED760096959DE4291A3550A46EC46857189F3A6@EX-HS-MBX05.cshl.edu> Message-ID: Hi Carson, For these runs, -TMP is set to the $TMPDIR environment variable via maker command line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected. In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster nfs mount than the working dir where the mpi**** files are being written. Thanks, Evan On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote: > No it does not use ROMIO. > > The locking may be do to how your NFS is implemented. MAKER does a lot of > small writes. Some NFS implementations do not handle that well and only > like large infrequent writes and frequent reads? > MAKER also uses a variant of the File:::NFSLock module which uses > hardlinks to force a flush of the NFS IO cache when asyncrynous IO is > enabled (described here > http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). > I know that the FhGFS implementation of NFS has broken hard link > functionality. > > > Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS > mounted location. It must be local (/tmp for example). 
This is because > certain types of operations are not always NFS safe and need a local > location to work with (anything involving berkley DB or SQLite for > example). Make sure you are not setting that to an NFS mounted scratch > location. The mpi**** files, are examples of some short lived files that > should not be in NFS. They hold chunks of data from threads that are > processing the genome and are very rapidly created and deleted. They will > be cleaned up automatically when maker finished or killed by standard > signals such as when you hit ^C or use kill 15. > > > Thanks, > Carson > > > > > On 13-05-14 4:42 PM, "Heywood, Todd" wrote: > > >We have been getting hung NFS mounts on some nodes when running MPI MAKER > >(version 2.27). Processes go into a "D" state and cannot be killed. We > >end up having to reboot nodes to recover them. We are running MPICH2 > >version 1.4.1p1 > >with RHEL 6.3. Questions: > > > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung > >on a sync_page system call under NFS. That *might* imply some locking > >issues. > > > >(2) Has anyone else seen this? > > > >(3) The root directory (parent of genome.maker.output directory) has lots > >of mpi***** files, all of which have the first line > >"pst0Process::MpiChunk". Is this expected? > > > >I'm able to reproducibly hang NFS on some nodes when using at least 4 > >32-core nodes and 128 running MPI tasks. > > > >Thanks, > > > >Todd Heywood > >CSHL > > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Carson.Holt at oicr.on.ca Wed May 15 11:15:52 2013 From: Carson.Holt at oicr.on.ca (Carson Holt) Date: Wed, 15 May 2013 17:15:52 +0000 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: Message-ID: The mpi**** files should be generated in the $TMPDIR or TMP= location. If they are appearing in the working directory, then there is a problem. If you are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is launched. You may have to manually specify that it needs to be exported to the other nodes using the mpiexec command line flags. OpenMPI, for example, does not export all environment variables to the other nodes by default. Thanks, Carson From: Evan Ernst Date: Wednesday, 15 May, 2013 1:08 PM To: Carson Holt Cc: "Heywood, Todd", "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] MPI MAKER hanging NFS Hi Carson, For these runs, -TMP is set to the $TMPDIR environment variable via a maker command-line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected. In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster nfs mount than the working dir where the mpi**** files are being written. Thanks, Evan On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote: No, it does not use ROMIO. The locking may be due to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and prefer large, infrequent writes and frequent reads. MAKER also uses a variant of the File::NFSLock module, which uses hard links to force a flush of the NFS IO cache when asynchronous IO is enabled (described here http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality. 
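
The hard-link locking idea mentioned above can be sketched roughly as follows. This is a generic Python illustration of the technique, not MAKER's code and not the File::NFSLock implementation; the names acquire_lock and release_lock are invented for the example. The point is that creating a hard link is atomic on the server, and the link count of the caller's own file tells it whether the link really exists, even if the client got a misleading error back.

```python
import os
import socket
import tempfile


def acquire_lock(lock_path):
    """Try to take lock_path via a hard link; return True on success."""
    directory = os.path.dirname(os.path.abspath(lock_path))
    # Unique file in the same directory, so it is on the same filesystem.
    fd, unique = tempfile.mkstemp(
        prefix="%s.%d." % (socket.gethostname(), os.getpid()), dir=directory
    )
    try:
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        try:
            os.link(unique, lock_path)  # fails if lock_path already exists
        except OSError:
            # Do not trust the error alone; the link count below decides.
            pass
        # Two names for our file means the lock name now points at it.
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)  # the lock name, if created, keeps the file alive


def release_lock(lock_path):
    """Drop the lock by removing its name."""
    os.unlink(lock_path)
```

A caller would typically retry acquire_lock in a loop with a short sleep and call release_lock in a finally block. On a filesystem where hard links do not behave correctly, a scheme like this cannot work, which is consistent with the warning above.
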
Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS-mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS safe and need a local location to work with (anything involving Berkeley DB or SQLite, for example). Make sure you are not setting that to an NFS-mounted scratch location. The mpi**** files are examples of short-lived files that should not be on NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finishes or is killed by standard signals, such as when you hit ^C or use kill -15. Thanks, Carson On 13-05-14 4:42 PM, "Heywood, Todd" wrote: >We have been getting hung NFS mounts on some nodes when running MPI MAKER >(version 2.27). Processes go into a "D" state and cannot be killed. We >end up having to reboot nodes to recover them. We are running MPICH2 >version 1.4.1p1 >with RHEL 6.3. Questions: > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >on a sync_page system call under NFS. That *might* imply some locking >issues. > >(2) Has anyone else seen this? > >(3) The root directory (parent of genome.maker.output directory) has lots >of mpi***** files, all of which have the first line >"pst0Process::MpiChunk". Is this expected? > >I'm able to reproducibly hang NFS on some nodes when using at least 4 >32-core nodes and 128 running MPI tasks. > >Thanks, > >Todd Heywood >CSHL > > From uma at ebi.ac.uk Thu May 16 10:08:43 2013 From: uma at ebi.ac.uk (Uma Maheswari) Date: Thu, 16 May 2013 17:08:43 +0100 Subject: [maker-devel] duplicate exons? 
In-Reply-To: References: Message-ID: <5195048B.9080707@ebi.ac.uk> Hi Carson, When I was trying to load the Maker-2.27 results into Ensembl, I found a few hundred genes with 'duplicate exons'. When I looked in the gff file, I found cases like this, where the exons are not actually duplicated but have two Parents with the same mRNA ID. Could this be a potential alternate transcript attached to the same transcript by mistake? Many thanks Uma 3 maker gene 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179 3 maker mRNA 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_AED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 3 maker exon 524271 524480 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker exon 524538 525182 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker exon 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker CDS 524538 524903 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker CDS 524538 525182 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker CDS 524271 524480 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker five_prime_UTR 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 3 maker five_prime_UTR 524904 525182 . - . 
ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1 From carsonhh at gmail.com Thu May 16 10:13:05 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:13:05 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195048B.9080707@ebi.ac.uk> Message-ID: I've had one other report of this on the devel list, but haven't gotten data to test with. Do you have the run files that produced the duplicate exon? If so, could you send me the theVoid directory for the contig that shows the duplicate, and the maker_opts.ctl file? Thanks, Carson On 13-05-16 12:08 PM, "Uma Maheswari" wrote: >Hi Carson, > >When I was trying to load the Maker-2.27 results into ensembl, I found >that few hundreds of genes with 'duplicate exons' . When I looked in the >gff file, I found cases like this, where the exons are not actually >duplicated but have two Parents with same mRNA ID. This can be a >potential alternate transcript, attached to the same transcript by >mistake? > >Many thanks >Uma > > > > > >3 maker gene 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >-gene-6.179 >3 maker mRNA 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >3 maker exon 524271 524480 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker exon 524538 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >9-mRNA-1 >3 maker exon 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 524903 . 
- 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 525182 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524271 524480 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524904 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu May 16 10:25:36 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:25:36 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195048B.9080707@ebi.ac.uk> Message-ID: I think this also may be a result of using GFF3 pass-through. So if that is the case, could you send me any GFF3 files you gave maker in addition to the other files I asked for. Thanks, Carson On 13-05-16 12:08 PM, "Uma Maheswari" wrote: >Hi Carson, > >When I was trying to load the Maker-2.27 results into ensembl, I found >that few hundreds of genes with 'duplicate exons' . When I looked in the >gff file, I found cases like this, where the exons are not actually >duplicated but have two Parents with same mRNA ID. This can be a >potential alternate transcript, attached to the same transcript by >mistake? > >Many thanks >Uma > > > > > >3 maker gene 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >-gene-6.179 >3 maker mRNA 524271 525467 . - . 
>ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >3 maker exon 524271 524480 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker exon 524538 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >9-mRNA-1 >3 maker exon 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >masked-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 524903 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524538 525182 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker CDS 524271 524480 . - 0 >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >d-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524271 525467 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 >3 maker five_prime_UTR 524904 525182 . - . >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >ustus_masked-3-processed-gene-6.179-mRNA-1 > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dsth at ebi.ac.uk Thu May 16 10:38:35 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Thu, 16 May 2013 17:38:35 +0100 Subject: [maker-devel] duplicate exons? 
In-Reply-To: References: <5195048B.9080707@ebi.ac.uk> Message-ID: hiya, are you using the same instance as michael at ebi as this sounds like the same problem he had last week and he wasn't running pass through. i've run 2.27 here 30+ times here and not seen this? is something very strange corrupted? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2013/5/16 Carson Holt > I think this also may be a result of using GFF3 pass-through. So if that > is the case, could you send me any GFF3 files you gave maker in addition > to the other files I asked for. > > Thanks, > Carson > > > > On 13-05-16 12:08 PM, "Uma Maheswari" wrote: > > >Hi Carson, > > > >When I was trying to load the Maker-2.27 results into ensembl, I found > >that few hundreds of genes with 'duplicate exons' . When I looked in the > >gff file, I found cases like this, where the exons are not actually > >duplicated but have two Parents with same mRNA ID. This can be a > >potential alternate transcript, attached to the same transcript by > >mistake? > > > >Many thanks > >Uma > > > > > > > > > > > >3 maker gene 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed > >-gene-6.179 > >3 maker mRNA 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- > >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A > >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 > >3 maker exon 524271 524480 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker exon 524538 525182 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 > >9-mRNA-1 > >3 maker exon 524271 525467 . - . 
> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 524903 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 525182 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524271 524480 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524904 525182 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > > > > > >_______________________________________________ > >maker-devel mailing list > >maker-devel at box290.bluehost.com > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 16 10:50:50 2013 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 16 May 2013 12:50:50 -0400 Subject: [maker-devel] duplicate exons? In-Reply-To: Message-ID: Yes. Perhaps this is the same issue Michael saw, although the one difference I see from his post is the Parent= attribute. --> Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-proce ssed-gene-6.179-mRNA-1 I have seen duplicate exons from GFF3 pass-through in the past, but if that's not being used I'd be very appreciative of any test dataset you could give me. 
Thanks, Carson From: Daniel Hughes Date: Thursday, 16 May, 2013 12:38 PM To: Carson Holt Cc: Uma Maheswari , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] duplicate exons? hiya, are you using the same instance as michael at ebi as this sounds like the same problem he had last week and he wasn't running pass through. i've run 2.27 here 30+ times here and not seen this? is something very strange corrupted? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ---------------------------------------------------------------------------- --------- dsth at cantab.net dsth at cpan.org 2013/5/16 Carson Holt > I think this also may be a result of using GFF3 pass-through. So if that > is the case, could you send me any GFF3 files you gave maker in addition > to the other files I asked for. > > Thanks, > Carson > > > > On 13-05-16 12:08 PM, "Uma Maheswari" wrote: > >> >Hi Carson, >> > >> >When I was trying to load the Maker-2.27 results into ensembl, I found >> >that few hundreds of genes with 'duplicate exons' . When I looked in the >> >gff file, I found cases like this, where the exons are not actually >> >duplicated but have two Parents with same mRNA ID. This can be a >> >potential alternate transcript, attached to the same transcript by >> >mistake? >> > >> >Many thanks >> >Uma >> > >> > >> > >> > >> > >> >3 maker gene 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >> >-gene-6.179 >> >3 maker mRNA 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >> >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >> >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >> >3 maker exon 524271 524480 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker exon 524538 525182 . - . 
>> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >> >9-mRNA-1 >> >3 maker exon 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 524903 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 525182 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524271 524480 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524904 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> > >> > >> >_______________________________________________ >> >maker-devel mailing list >> >maker-devel at box290.bluehost.com >> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >

From uma at ebi.ac.uk Fri May 17 02:41:56 2013
From: uma at ebi.ac.uk (Uma Maheswari)
Date: Fri, 17 May 2013 09:41:56 +0100
Subject: [maker-devel] duplicate exons?
In-Reply-To:
References:
Message-ID: <5195ED54.4090501@ebi.ac.uk>

Hi Carson,

I checked with Michael; this is different from what he saw. He had entire segments of GFF files duplicated. In this case, just the Parent ID is.
I am preparing the files you asked for, will send them soon thanks Uma On 16/05/13 17:50, Carson Holt wrote: > Yes. Perhaps this is the same issue Michael saw, although the one > difference I see from his post is the Parent= attribute. > > --> > Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1 > > I have seen duplicate exons from GFF3 pass-through in the past, but if > that's not being used I'd be very appreciative of any test dataset you > could give me. > > Thanks, > Carson > > > > > From: Daniel Hughes > > Date: Thursday, 16 May, 2013 12:38 PM > To: Carson Holt > > Cc: Uma Maheswari >, > "maker-devel at yandell-lab.org " > > > Subject: Re: [maker-devel] duplicate exons? > > hiya, are you using the same instance as michael at ebi as this sounds > like the same problem he had last week and he wasn't running pass > through. i've run 2.27 here 30+ times here and not seen this? is > something very strange corrupted? > > dan. > > Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) > ------------------------------------------------------------------------------------- > dsth at cantab.net > dsth at cpan.org > > > 2013/5/16 Carson Holt > > > I think this also may be a result of using GFF3 pass-through. So > if that > is the case, could you send me any GFF3 files you gave maker in > addition > to the other files I asked for. > > Thanks, > Carson > > > > On 13-05-16 12:08 PM, "Uma Maheswari" > wrote: > > >Hi Carson, > > > >When I was trying to load the Maker-2.27 results into ensembl, I found > >that few hundreds of genes with 'duplicate exons' . When I looked > in the > >gff file, I found cases like this, where the exons are not actually > >duplicated but have two Parents with same mRNA ID. This can be a > >potential alternate transcript, attached to the same transcript by > >mistake? > > > >Many thanks > >Uma > > > > > > > > > > > >3 maker gene 524271 525467 . - . 
> >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed > >-gene-6.179 > >3 maker mRNA 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- > >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A > >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 > >3 maker exon 524271 524480 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker exon 524538 525182 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 > >9-mRNA-1 > >3 maker exon 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ > >masked-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 524903 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524538 525182 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker CDS 524271 524480 . - 0 > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske > >d-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524271 525467 . - . > >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > >3 maker five_prime_UTR 524904 525182 . - . 
> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug > >ustus_masked-3-processed-gene-6.179-mRNA-1 > > > > > >_______________________________________________ > >maker-devel mailing list > >maker-devel at box290.bluehost.com > > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luciano.abriata at epfl.ch Fri May 17 03:45:41 2013 From: luciano.abriata at epfl.ch (Luciano Abriata) Date: Fri, 17 May 2013 09:45:41 +0000 Subject: [maker-devel] getting protein sequences from genomes Message-ID: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies). My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are: 1) How can I match proteins predicted for the same gene in two genomes? 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output) maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta Which of these files contains the definite set of predicted protein sequences? Thanks in advance! Luciano -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
From heywood at cshl.edu Fri May 17 07:25:16 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Fri, 17 May 2013 13:25:16 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu>

It appears that a kernel bug caused the NFS hang, at least for limited-scale testing (6 nodes, 192 tasks). I upgraded the kernel from 2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and cannot reproduce the hangs.

As far as TMPDIR goes, I'm not really sure I understand. We use SGE, and the TMPDIR we are referring to is set by SGE within a job to be /tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?

Todd

From: Carson Holt
Date: Wednesday, May 15, 2013 1:15 PM
To: "Ernst, Evan"
Cc: Todd Heywood, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

The mpi**** files should be generated in the $TMPDIR or TMP= location. If they are happening in the working directory, then there is a problem. If you are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is launched. You may have to manually specify that it needs to be exported to the other nodes using the mpiexec command line flags. OpenMPI for example does not export all environmental variables by default to the other nodes.

Thanks,
Carson

From: Evan Ernst
Date: Wednesday, 15 May, 2013 1:08 PM
To: Carson Holt
Cc: "Heywood, Todd", "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

For these runs, -TMP is set to the $TMPDIR environment variable via maker command line argument in the cluster job script to use the local disk on each node. We can see files being generated in those locations on each node, so it seems this is working as expected.

In maker_opts.ctl, I commented out the TMP line.
I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster nfs mount than the working dir where the mpi**** files are being written. Thanks, Evan On Tue, May 14, 2013 at 9:01 PM, Carson Holt > wrote: No it does not use ROMIO. The locking may be do to how your NFS is implemented. MAKER does a lot of small writes. Some NFS implementations do not handle that well and only like large infrequent writes and frequent reads? MAKER also uses a variant of the File:::NFSLock module which uses hardlinks to force a flush of the NFS IO cache when asyncrynous IO is enabled (described here http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). I know that the FhGFS implementation of NFS has broken hard link functionality. Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS mounted location. It must be local (/tmp for example). This is because certain types of operations are not always NFS safe and need a local location to work with (anything involving berkley DB or SQLite for example). Make sure you are not setting that to an NFS mounted scratch location. The mpi**** files, are examples of some short lived files that should not be in NFS. They hold chunks of data from threads that are processing the genome and are very rapidly created and deleted. They will be cleaned up automatically when maker finished or killed by standard signals such as when you hit ^C or use kill 15. Thanks, Carson On 13-05-14 4:42 PM, "Heywood, Todd" > wrote: >We have been getting hung NFS mounts on some nodes when running MPI MAKER >(version 2.27). Processes go into a "D" state and cannot be killed. We >end up having to reboot nodes to recover them. We are running MPICH2 >version 1.4.1p1 >with RHEL 6.3. Questions: > >(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >on a sync_page system call under NFS. That *might* imply some locking >issues. > >(2) Has anyone else seen this? 
> >(3) The root directory (parent of genome.maker.output directory) has lots >of mpi***** files, all of which have the first line >"pst0Process::MpiChunk". Is this expected? > >I'm able to reproducibly hang NFS on some nodes when using at least 4 >32-core nodes and 128 running MPI tasks. > >Thanks, > >Todd Heywood >CSHL > >
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From Carson.Holt at oicr.on.ca Fri May 17 07:40:50 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Fri, 17 May 2013 13:40:50 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu>
Message-ID:

I'm glad you're getting better results.

With respect to environment variables: one common error in MPI execution is that the environment variables will not always be the same on the other nodes, since only the root node is attached to a terminal, so variables set in launch scripts (.bashrc etc.) may not be available on all nodes. Many clusters that are part of the XSEDE network and use SGE, for example, have scripts that wrap mpiexec to guarantee export of all environment variables when using MPI, to avoid just this type of common error. So, like anything, you start with the most common cause of errors and then work toward the less common. Kernel bugs usually rank low on the list :-) But I'm glad it's working for you now.

Thanks,
Carson

On 13-05-17 9:25 AM, "Heywood, Todd" wrote: >It appears that a kernel bug caused the NFS hang, at least for limited >scale testing (6 nodes, 192 tasks). I upgraded the kernel from >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and >cannot reproduce the hangs. > >As far as TMPDIR, I'm not really sure I understand. We use SGE, and the >TMPDIR we are referring to is set by SGE within a job to be >/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?
> >Todd > > > > >From: Carson Holt > >Date: Wednesday, May 15, 2013 1:15 PM >To: "Ernst, Evan" > >Cc: Todd Heywood >, >"maker-devel at yandell-lab.org" >> >Subject: Re: [maker-devel] MPI MAKER hanging NFS > >The mpi**** files should be generated in the $TMPDIR or TMP= location. >If they are happening in the working directory, then there is a problem. >If you are not setting TMP=, perhaps TMPDIR is not being exported when >'mpiexec' is launched. You may have to manually specify that it needs to >be exported to the other nodes using the mpiexec command line flags. >OpenMPI for example does not export all environmental variables by >default to the other nodes. > >Thanks, >Carson > > > >From: Evan Ernst > >Date: Wednesday, 15 May, 2013 1:08 PM >To: Carson Holt > >Cc: "Heywood, Todd" >, >"maker-devel at yandell-lab.org" >> >Subject: Re: [maker-devel] MPI MAKER hanging NFS > >Hi Carson, > >For these runs, -TMP is set to the $TMPDIR environment variable via maker >command line argument in the cluster job script to use the local disk on >each node. We can see files being generated in those locations on each >node, so it seems this is working as expected. > >In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is >relevant, but I'm also setting mpi_blastdb= to consolidate the databases >onto a different, faster nfs mount than the working dir where the mpi**** >files are being written. > >Thanks, >Evan > > > >On Tue, May 14, 2013 at 9:01 PM, Carson Holt >> wrote: >No it does not use ROMIO. > >The locking may be do to how your NFS is implemented. MAKER does a lot of >small writes. Some NFS implementations do not handle that well and only >like large infrequent writes and frequent reads? >MAKER also uses a variant of the File:::NFSLock module which uses >hardlinks to force a flush of the NFS IO cache when asyncrynous IO is >enabled (described here >http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). 
>I know that the FhGFS implementation of NFS has broken hard link >functionality. > > >Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS >mounted location. It must be local (/tmp for example). This is because >certain types of operations are not always NFS safe and need a local >location to work with (anything involving berkley DB or SQLite for >example). Make sure you are not setting that to an NFS mounted scratch >location. The mpi**** files, are examples of some short lived files that >should not be in NFS. They hold chunks of data from threads that are >processing the genome and are very rapidly created and deleted. They will >be cleaned up automatically when maker finished or killed by standard >signals such as when you hit ^C or use kill 15. > > >Thanks, >Carson > > > > >On 13-05-14 4:42 PM, "Heywood, Todd" >> wrote: > >>We have been getting hung NFS mounts on some nodes when running MPI MAKER >>(version 2.27). Processes go into a "D" state and cannot be killed. We >>end up having to reboot nodes to recover them. We are running MPICH2 >>version 1.4.1p1 >>with RHEL 6.3. Questions: >> >>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung >>on a sync_page system call under NFS. That *might* imply some locking >>issues. >> >>(2) Has anyone else seen this? >> >>(3) The root directory (parent of genome.maker.output directory) has lots >>of mpi***** files, all of which have the first line >>"pst0Process::MpiChunk". Is this expected? >> >>I'm able to reproducibly hang NFS on some nodes when using at least 4 >>32-core nodes and 128 running MPI tasks. 
>> >>Thanks, >> >>Todd Heywood >>CSHL >> >> > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >

From barry.moore at genetics.utah.edu Fri May 17 13:02:31 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri, 17 May 2013 13:02:31 -0600
Subject: [maker-devel] getting protein sequences from genomes
In-Reply-To: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>
References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>
Message-ID:

On May 17, 2013, at 3:45 AM, Luciano Abriata wrote:

> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies).
>
> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are:
>
> 1) How can I match proteins predicted for the same gene in two genomes?

blastp, with parameters tweaked to optimize for near-perfect matches.

>
> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output)
>
> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541
>

AED = Annotation Edit Distance, which describes how closely the prediction matches the evidence. This is a distance measure, so 0 is a perfect match and 1 is no overlap.

eAED = Exon-adjusted Annotation Edit Distance. This metric is the same as AED with a couple of exceptions. First, for a protein-coding exon to be counted as overlapping protein evidence, the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA-seq data is used as evidence and both ends of an exon are supported by splice-site-spanning reads, the middle of that exon is counted as supported as well, even if coverage drops off in the interior of the exon.
For the most part AED and eAED will be the same, but eAED tends to work better on many fringe cases.

QI values are as follows, in order:
5' UTR length.
Fraction of splice sites confirmed by an EST alignment.
Fraction of exons that overlap an EST alignment.
Fraction of exons that overlap an EST or protein alignment.
Fraction of splice sites confirmed by an ab initio prediction.
Fraction of exons that overlap an ab initio prediction.
Number of exons in the transcript.
3' UTR length.
Length of the encoded protein.

> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta
>
> Which of these files contains the definite set of predicted protein sequences?

The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for.

> > > > Thanks in advance! > > Luciano > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From ares711122 at gmail.com Sun May 19 22:16:10 2013
From: ares711122 at gmail.com (Hung-Wei Hsu)
Date: Mon, 20 May 2013 12:16:10 +0800
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
Message-ID:

Hi MAKER developers,

I was using MAKER to perform gene prediction and annotation on my contigs. I used Artemis to examine the GFF and found that some CDS with complete structure were absent from the final results. They are indeed predicted and annotated on the reference genome. I'm wondering if they were discarded due to overlapping with another CDS. How can I preserve these CDS?
Thanks a lot in advance. Hung-Wei -------------- next part -------------- An HTML attachment was scrubbed... URL: From eernst at cshl.edu Mon May 20 14:36:38 2013 From: eernst at cshl.edu (Evan Ernst) Date: Mon, 20 May 2013 16:36:38 -0400 Subject: [maker-devel] MPI MAKER hanging NFS In-Reply-To: <561e317e5e8246978eccdf53ed96067b@EX-HS-HT02.cshl.edu> References: <0ED760096959DE4291A3550A46EC4685718A4299@EX-HS-MBX05.cshl.edu> <561e317e5e8246978eccdf53ed96067b@EX-HS-HT02.cshl.edu> Message-ID: Hi Carson, The SGE launch script looks like this (sans SGE args): mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1 Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. Despite this, the mpi*** files are still being created in the working directory. If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case. Thanks, Evan On Fri, May 17, 2013 at 9:40 AM, Carson Holt wrote: > I'm glad your getting better results. > > With respect to environmental variables. One common error in MPI > execution is that the environment variables will not always be the same on > the other nodes since only the root node is attached to a terminal, so > variables in launch scripts (.bashrc etc.) may not be available on all > nodes. Many clusters that are part of the XSEDE network and use SGE for > example have scripts that wrap mpiexec to guarantee export of all > environmental variables when using MPI to avoid just this type of common > error. So like anything, you start with the most common cause of errors > and then work to the less common. 
Kernel bugs usually rank low on the > list :-) But I'm glad it's working for you now. > > Thanks, > Carson > > > > > > On 13-05-17 9:25 AM, "Heywood, Todd" wrote: > > >It appears that a kernel bug caused the NFS hang, at least for limlted > >scale testing (6 nodes, 192 tasks). I upgraded the kernel from > >2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and > >cannot reproduce the hangs. > > > >As far a TMPDIR, I'm not really sure I understand. We use SGE, and the > >TMPDIR we are referring to is set by SGE within a job to be > >/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE? > > > >Todd > > > > > > > > > >From: Carson Holt > > >Date: Wednesday, May 15, 2013 1:15 PM > >To: "Ernst, Evan" > > >Cc: Todd Heywood >, > >"maker-devel at yandell-lab.org" > >> > >Subject: Re: [maker-devel] MPI MAKER hanging NFS > > > >The mpi**** files should be generated in the $TMPDIR or TMP= location. > >If they are happening in the working directory, then there is a problem. > >If you are not setting TMP=, perhaps TMPDIR is not being exported when > >'mpiexec' is launched. You may have to manually specify that it needs to > >be exported to the other nodes using the mpiexec command line flags. > >OpenMPI for example does not export all environmental variables by > >default to the other nodes. > > > >Thanks, > >Carson > > > > > > > >From: Evan Ernst > > >Date: Wednesday, 15 May, 2013 1:08 PM > >To: Carson Holt > > >Cc: "Heywood, Todd" >, > >"maker-devel at yandell-lab.org" > >> > >Subject: Re: [maker-devel] MPI MAKER hanging NFS > > > >Hi Carson, > > > >For these runs, -TMP is set to the $TMPDIR environment variable via maker > >command line argument in the cluster job script to use the local disk on > >each node. We can see files being generated in those locations on each > >node, so it seems this is working as expected. > > > >In maker_opts.ctl, I commented out the TMP line. 
I'm not sure if this is > >relevant, but I'm also setting mpi_blastdb= to consolidate the databases > >onto a different, faster nfs mount than the working dir where the mpi**** > >files are being written. > > > >Thanks, > >Evan > > > > > > > >On Tue, May 14, 2013 at 9:01 PM, Carson Holt > >> wrote: > >No it does not use ROMIO. > > > >The locking may be do to how your NFS is implemented. MAKER does a lot of > >small writes. Some NFS implementations do not handle that well and only > >like large infrequent writes and frequent reads? > >MAKER also uses a variant of the File:::NFSLock module which uses > >hardlinks to force a flush of the NFS IO cache when asyncrynous IO is > >enabled (described here > >http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html). > >I know that the FhGFS implementation of NFS has broken hard link > >functionality. > > > > > >Also make sure you do not set TMP= in the maker_opt.ctl file to an NFS > >mounted location. It must be local (/tmp for example). This is because > >certain types of operations are not always NFS safe and need a local > >location to work with (anything involving berkley DB or SQLite for > >example). Make sure you are not setting that to an NFS mounted scratch > >location. The mpi**** files, are examples of some short lived files that > >should not be in NFS. They hold chunks of data from threads that are > >processing the genome and are very rapidly created and deleted. They will > >be cleaned up automatically when maker finished or killed by standard > >signals such as when you hit ^C or use kill 15. > > > > > >Thanks, > >Carson > > > > > > > > > >On 13-05-14 4:42 PM, "Heywood, Todd" > >> wrote: > > > >>We have been getting hung NFS mounts on some nodes when running MPI MAKER > >>(version 2.27). Processes go into a "D" state and cannot be killed. We > >>end up having to reboot nodes to recover them. We are running MPICH2 > >>version 1.4.1p1 > >>with RHEL 6.3. 
Questions: > >> > >>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung > >>on a sync_page system call under NFS. That *might* imply some locking > >>issues. > >> > >>(2) Has anyone else seen this? > >> > >>(3) The root directory (parent of genome.maker.output directory) has lots > >>of mpi***** files, all of which have the first line > >>"pst0Process::MpiChunk". Is this expected? > >> > >>I'm able to reproducibly hang NFS on some nodes when using at least 4 > >>32-core nodes and 128 running MPI tasks. > >> > >>Thanks, > >> > >>Todd Heywood > >>CSHL > >> > >> > > > > > >_______________________________________________ > >maker-devel mailing list > >maker-devel at box290.bluehost.com > >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-05-20 at 4.14.09 PM.png
Type: image/png
Size: 22634 bytes
Desc: not available
URL:

From carsonhh at gmail.com Mon May 20 17:50:28 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 19:50:28 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID:

Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

From: Evan Ernst
Date: Monday, 20 May, 2013 4:36 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org", "Heywood, Todd"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR
is evaluated to a local directory by the shell of the MPI master node as
intended, so the evaluated path, not the env var reference, is being
passed to the MPI workers.

Despite this, the mpi*** files are still being created in the working
directory.

If I understand correctly, these mpi*** files are meant to be written to
the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg),
which should be equivalent, but this doesn't seem to be the case.

Thanks,
Evan

From eernst at cshl.edu Mon May 20 18:20:22 2013
From: eernst at cshl.edu (Evan Ernst)
Date: Mon, 20 May 2013 20:20:22 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
References:
Message-ID:

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:
> Could you run the following command for me and share the output with me?
>
> mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'
>
> Thanks,
> Carson

From carsonhh at gmail.com Mon May 20 18:38:41 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 20:38:41 -0400
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To: <0ED760096959DE4291A3550A46EC4685A8A130DC@ex-hs-mbx06.cshl.edu>
Message-ID:

It may have just been a random failure. Try launching it again.
Basically, one instance failed to launch hydra_pmi_proxy, which wraps the
command being called via mpiexec. So you get 7 lines of output instead of
the 8 that should be there.
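
A missing rank like this is easier to spot if the lines coming back from the tmpdir check are counted instead of read by eye. What follows is only a sketch under stated assumptions: the function name check_tmpdir_report is made up for illustration and is not part of MAKER or MPICH2, stderr is assumed to be kept out of the pipe, and each rank is assumed to print exactly one line.

```sh
#!/bin/sh
# Hypothetical helper (not part of MAKER or MPICH2): reads one tmpdir path
# per line on stdin and compares the number of non-empty lines with the
# number of MPI ranks that were requested.
check_tmpdir_report() {
    expected=$1
    report=$(cat)
    # Count non-empty lines; "|| true" keeps a zero count from aborting
    # scripts that run under "set -e".
    got=$(printf '%s\n' "$report" | grep -c . || true)
    # Count distinct paths; tr strips the padding some wc versions add.
    distinct=$(printf '%s\n' "$report" | grep . | sort -u | wc -l | tr -d ' ')
    echo "ranks reporting: $got of $expected; distinct tmpdirs: $distinct"
    # Succeed only when every requested rank reported.
    [ "$got" -eq "$expected" ]
}
```

With the command from this thread it could be used as `mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"' 2>/dev/null | check_tmpdir_report 8`. For the seven identical paths Evan pasted, it would report 7 of 8 ranks with one distinct tmpdir and return a non-zero status.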

--Carson

On 13-05-20 8:33 PM, "Heywood, Todd" wrote:

>All starter_with_limit.sh does is set a ulimit for the top process for
>the job, then start it, passing all parameters:
>
>#!/bin/sh
>ulimit -c 0
>exec $*

From heywood at cshl.edu Mon May 20 18:33:32 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Tue, 21 May 2013 00:33:32 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685A8A130DC@ex-hs-mbx06.cshl.edu>

All starter_with_limit.sh does is set a ulimit for the top process for the job, then start it, passing all parameters:

#!/bin/sh
ulimit -c 0
exec $*

From: Evan Ernst
Date: Monday, May 20, 2013 8:20 PM
To: Carson Holt
Cc: Carson Holt, "maker-devel at yandell-lab.org", Todd Heywood
Subject: Re: [maker-devel] MPI MAKER hanging NFS

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:
Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

From heywood at cshl.edu Mon May 20 18:34:48 2013
From: heywood at cshl.edu (Heywood, Todd)
Date: Tue, 21 May 2013 00:34:48 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID: <0ED760096959DE4291A3550A46EC4685A8A130FB@ex-hs-mbx06.cshl.edu>

Actually, line 4 is the exec (one line is commented out):

#!/bin/sh
ulimit -c 0
#ulimit -n 262144
exec $*

From: Evan Ernst
Date: Monday, May 20, 2013 8:20 PM
To: Carson Holt
Cc: Carson Holt, "maker-devel at yandell-lab.org", Todd Heywood
Subject: Re: [maker-devel] MPI MAKER hanging NFS

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory

Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan

On Mon, May 20, 2013 at 7:50 PM, Carson Holt wrote:
Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson

From: Evan Ernst
Date: Monday, 20 May, 2013 4:36 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org", "Heywood, Todd"
Subject: Re: [maker-devel] MPI MAKER hanging NFS

Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR
is evaluated to a local directory by the shell of the MPI master node as
intended, so the evaluated path, not the env var reference, is being
passed to the MPI workers.

Despite this, the mpi*** files are still being created in the working
directory.

If I understand correctly, these mpi*** files are meant to be written to
the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg),
which should be equivalent, but this doesn't seem to be the case.

Thanks,
Evan

On Fri, May 17, 2013 at 9:40 AM, Carson Holt wrote:

I'm glad you're getting better results.

With respect to environment variables: one common error in MPI
execution is that the environment variables will not always be the same on
the other nodes, since only the root node is attached to a terminal, so
variables in launch scripts (.bashrc etc.) may not be available on all
nodes. Many clusters that are part of the XSEDE network and use SGE, for
example, have scripts that wrap mpiexec to guarantee export of all
environment variables when using MPI to avoid just this type of common
error. So, like anything, you start with the most common cause of errors
and then work to the less common. Kernel bugs usually rank low on the
list :-) But I'm glad it's working for you now.

Thanks,
Carson

On 13-05-17 9:25 AM, "Heywood, Todd" wrote:

>It appears that a kernel bug caused the NFS hang, at least for limited
>scale testing (6 nodes, 192 tasks). I upgraded the kernel from
>2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and
>cannot reproduce the hangs.
>
>As far as TMPDIR, I'm not really sure I understand. We use SGE, and the
>TMPDIR we are referring to is set by SGE within a job to be
>/tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?
>
>Todd
>
>
>From: Carson Holt
>Date: Wednesday, May 15, 2013 1:15 PM
>To: "Ernst, Evan"
>Cc: Todd Heywood, "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>The mpi**** files should be generated in the $TMPDIR or TMP= location.
>If they are happening in the working directory, then there is a problem.
>If you are not setting TMP=, perhaps TMPDIR is not being exported when
>'mpiexec' is launched. You may have to manually specify that it needs to
>be exported to the other nodes using the mpiexec command line flags.
>OpenMPI, for example, does not export all environment variables by
>default to the other nodes.
>
>Thanks,
>Carson
>
>
>From: Evan Ernst
>Date: Wednesday, 15 May, 2013 1:08 PM
>To: Carson Holt
>Cc: "Heywood, Todd", "maker-devel at yandell-lab.org"
>Subject: Re: [maker-devel] MPI MAKER hanging NFS
>
>Hi Carson,
>
>For these runs, -TMP is set to the $TMPDIR environment variable via maker
>command line argument in the cluster job script to use the local disk on
>each node. We can see files being generated in those locations on each
>node, so it seems this is working as expected.
>
>In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is
>relevant, but I'm also setting mpi_blastdb= to consolidate the databases
>onto a different, faster nfs mount than the working dir where the mpi****
>files are being written.
>
>Thanks,
>Evan
>
>
>On Tue, May 14, 2013 at 9:01 PM, Carson Holt wrote:
>No, it does not use ROMIO.
>
>The locking may be due to how your NFS is implemented. MAKER does a lot of
>small writes. Some NFS implementations do not handle that well and only
>like large infrequent writes and frequent reads.
>MAKER also uses a variant of the File::NFSLock module, which uses
>hardlinks to force a flush of the NFS IO cache when asynchronous IO is
>enabled (described here
>http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
>I know that the FhGFS implementation of NFS has broken hard link
>functionality.
>
>
>Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS
>mounted location. It must be local (/tmp for example). This is because
>certain types of operations are not always NFS safe and need a local
>location to work with (anything involving Berkeley DB or SQLite for
>example). Make sure you are not setting that to an NFS mounted scratch
>location. The mpi**** files are examples of some short lived files that
>should not be in NFS. They hold chunks of data from threads that are
>processing the genome and are very rapidly created and deleted. They will
>be cleaned up automatically when maker finishes or is killed by standard
>signals such as when you hit ^C or use kill 15.
>
>
>Thanks,
>Carson
>
>
>On 13-05-14 4:42 PM, "Heywood, Todd" wrote:
>
>>We have been getting hung NFS mounts on some nodes when running MPI MAKER
>>(version 2.27). Processes go into a "D" state and cannot be killed. We
>>end up having to reboot nodes to recover them. We are running MPICH2
>>version 1.4.1p1
>>with RHEL 6.3. Questions:
>>
>>(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung
>>on a sync_page system call under NFS. That *might* imply some locking
>>issues.
>>
>>(2) Has anyone else seen this?
>>
>>(3) The root directory (parent of genome.maker.output directory) has lots
>>of mpi***** files, all of which have the first line
>>"pst0Process::MpiChunk". Is this expected?
>>
>>I'm able to reproducibly hang NFS on some nodes when using at least 4
>>32-core nodes and 128 running MPI tasks.
>>
>>Thanks,
>>
>>Todd Heywood
>>CSHL

From Carson.Holt at oicr.on.ca Mon May 20 18:48:32 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Tue, 21 May 2013 00:48:32 +0000
Subject: [maker-devel] MPI MAKER hanging NFS
In-Reply-To:
Message-ID:

Could you use the attached file to replace maker/src/bin/maker and maker/bin/maker? You will have to rerun 'maker/src/Build install' or just edit the shebang line (#!) if perl is located anywhere other than /usr/bin/perl.

In the attached file I explicitly tell it to use the system TMPDIR rather than letting it get set implicitly. See if that stops the mpi***** files from appearing in the working directory. It's always possible that this is just a slight difference in behavior for the version of the File::Temp module that is packaged with your perl.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker
Type: application/octet-stream
Size: 49266 bytes
Desc: maker
URL:

From carsonhh at gmail.com Mon May 20 19:08:51 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 20 May 2013 21:08:51 -0400
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
Message-ID:

With default settings MAKER will only put ab initio predictions that have some sort of evidence support (EST or protein) in the final gene set. The rejected predictions are still in the GFF3 for reference purposes as match/match_part features, but not as gene/mRNA/exon/CDS features. So a lack of evidence might be why it is not there. You can add all rejected models that don't overlap an accepted model by setting keep_preds=1 (this usually brings a lot more into the final gene set than you really want, though, including lots of false positives). But for some organisms like fungi, which have high gene densities, this approach is relatively safe.

Alternatively the gene is missing because it overlaps another gene model that was accepted.
MAKER won't allow overlapping models on the same strand in eukaryotes. The only way to force that kind of overlap is to give MAKER the reference models in model_gff and not let it call its own models (then MAKER is really just aligning evidence and scoring the reference models).

One final note. If there is no evidence supporting the model, and that is why it is rejected, you can also try adding more evidence to the MAKER run, or you can consider the possibility that the gene model in the reference is not real to begin with (i.e. a false positive gene model called during the initial annotation process and not supported by protein or expression data from any source).

Thanks,
Carson

From: Hung-Wei Hsu
Date: Monday, 20 May, 2013 12:16 AM
To:
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?

Hi MAKER developers,

I was exploiting MAKER to perform gene prediction and annotation on my contigs.
I used Artemis to examine gff and found some CDS with complete structure were absent in the final results.
They are really predicted and annotated on the ref genome.
I'm wondering if they were discarded due to overlapping with another CDS.
How can I preserve these CDS?
Thanks a lot in advance.

Hung-Wei

From ares711122 at gmail.com Mon May 20 19:19:20 2013
From: ares711122 at gmail.com (Hung-Wei Hsu)
Date: Tue, 21 May 2013 09:19:20 +0800
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
References:
Message-ID:

Thanks a lot for your help. Your suggestions will be greatly helpful for our analysis. I've tried to add EST sequences to improve gene predictions. The EST sequences I used were CDS sequences of the same organism.
But I got the error below.

substr outside of string at .../TranslationMachine.pm line 162
ERROR: Failed while polishig ESTs
ERROR: Chunk failed at level:2, tier_type:3

What's wrong with my analysis? Are the EST sequences I used wrong?
Thank you.

Hung-Wei

From rob.syme at gmail.com Mon May 20 23:57:19 2013
From: rob.syme at gmail.com (Rob Syme)
Date: Tue, 21 May 2013 13:57:19 +0800
Subject: [maker-devel] Maker-derived CDS GFF3 phase column
Message-ID:

Hi all

By my reading of the GFF3 spec (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker that has odd data in the phase column. For example, see some example Maker output at https://gist.github.com/robsyme/5617399

There are two exons, 5617 <- 5737 and 5793 <- 5953, with phases 0 and 2, respectively. Both exons are on the reverse strand.

From the spec, phase indicates "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon", and for "reverse strand features, phase is counted from the end field".

In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is 5953. The base at the end field is the first base of the translated CDS, so there should be no bases removed "to reach the first base of the next codon". I suggest that this phase should be 0, not 2. There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png.
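The phase arithmetic under discussion can be sketched in a few lines of Python. This is only an illustration of the rule quoted from the GFF3 spec, not MAKER's own code: the function name cds_phases is made up for this example, and it assumes that both features are entirely coding and that the CDS begins at base 5953, as described above.

```python
def cds_phases(segments, strand):
    """Return {(start, end): phase} for the CDS segments of one transcript.

    segments: list of (start, end) tuples, 1-based inclusive, start <= end.
    strand:   "+" or "-".
    Hypothetical helper; assumes the first segment in transcript order
    begins a complete CDS, so its phase is 0.
    """
    # Transcript order: ascending start on "+", descending start on "-".
    ordered = sorted(segments, key=lambda s: s[0], reverse=(strand == "-"))
    phases = {}
    phase = 0
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # Bases left over after the last full codon determine how many
        # bases the next segment must skip to reach a codon start.
        phase = (phase - length) % 3
    return phases


if __name__ == "__main__":
    exons = [(5617, 5737), (5793, 5953)]
    print("transcript order (reverse strand):", cds_phases(exons, "-"))
    print("left to right, ignoring strand:   ", cds_phases(exons, "+"))
```

Walking the two segments in transcript order gives phase 0 for 5793-5953 and phase 1 for 5617-5737. Walking them left to right regardless of strand gives 0 for 5617-5737 and 2 for 5793-5953, which is the pair of values reported in the Maker output.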
The output gff3 is correct if "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon" is measured from the 'left-hand' end of this feature (the start field) rather than the end field.

Has anybody else run into this problem, or am I misreading the gff3 spec?

Rob Syme
PhD Student
Curtin University

From Sean.Li at csiro.au Tue May 21 01:36:37 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Tue, 21 May 2013 07:36:37 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID:

Hi Carson,

We are currently working on the annotation of the Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. Checking the annotation results from maker 2.10, we saw that some loci have a fusion problem: two separate neighbouring genes are likely to be fused together and reported as a single candidate by maker. If we go further and look at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct. We are also using an RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects.

We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide whether to merge models or not. If possible, can you please explain what these two parameters do with the predicted genes, RNA-Seq and protein evidence?

Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing a critical question: how can we effectively reduce the gene fusion problem? 1) setting pred_flank lower than 100? 2) turning correct_est_fusion on? 3) anything else?

Thank you.

With best regards,
Xi (Sean) Li, Ph. D.

Bioinformatics Analyst, Bioinformatics Core,
CSIRO Mathematics, Informatics and Statistics
Phone: +61 2 6216 7138
Address: GPO Box 664, Canberra, ACT 2601

From barry.moore at genetics.utah.edu Tue May 21 17:54:40 2013
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue, 21 May 2013 17:54:40 -0600
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References:
Message-ID: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>

Hi Sean,

I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence, and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of correct_est_fusion. I don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript, and this can help as well.

B

Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------
(801) 585-3543

From onson001 at umn.edu Tue May 21 09:58:43 2013
From: onson001 at umn.edu (Innocent Onsongo)
Date: Tue, 21 May 2013 10:58:43 -0500
Subject: [maker-devel] Maker: Re-annotation
Message-ID:

Maker Development Team,

I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data.
I updated the fields in maker_opts.ctl as below:

genome=CGS01058.fasta #genome sequence file in fasta format
est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT
pred_gff=Augustus.gff3 #ab-initio predictions from
other_gff=Promoters.gff3 #promoter annotations
other_gff=CpG_Islands.gff3 # CpG island annotations

Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file, but when I load the GFF3 file into IGV it does not contain any of the exon definitions in Augustus.gff3. Am I missing something?

Regards,
Getiria

--
Getiria Onsongo, Ph.D.
Informatics Analyst, Research Informatics Support System
Minnesota Supercomputing Institute for Advanced Computational Research
University of Minnesota
Minneapolis, MN 55455
Phone: 612-624-0532

From carsonhh at gmail.com Tue May 21 18:59:09 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 21 May 2013 20:59:09 -0400
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>
Message-ID:

Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR, this tends to fix many cases. Of course avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that together with the correct_est_fusion option in MAKER is probably the way to go. I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).
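As a rough sketch, the two settings discussed in this thread would look something like the following in maker_opts.ctl. The option names come from the messages above; the comments are paraphrases of the thread rather than the wording shipped in the control file, and the value 200 for pred_flank is an assumption about the shipped default that should be checked against your own maker_opts.ctl.

```
pred_flank=200 #flanking sequence passed to the gene predictors; the thread advises against going below 100
correct_est_fusion=1 #clip UTR that appears to come from merged EST/mRNA-seq evidence spanning neighboring genes
```

Both lines already exist in a generated maker_opts.ctl for MAKER 2.27, so the values would be edited in place rather than added as new lines.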
Thanks,
Carson

From Sean.Li at csiro.au Tue May 21 19:23:48 2013
From: Sean.Li at csiro.au (Sean.Li at csiro.au)
Date: Wed, 22 May 2013 01:23:48 +0000
Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27?
In-Reply-To:
References: <8A1FF7BA-AC70-44A7-8C25-5DA130BC9360@genetics.utah.edu>
Message-ID:

Thanks Barry and Carson for your detailed explanations. Now I have a better understanding of "pred_flank".

1. To run maker, we use transcripts produced by the tophat+cufflinks approach instead of de novo Trinity. Will it avoid the possible merging of RNA-Seq reads?

2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs?
Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model? Regards, Sean From: Carson Holt [mailto:carsonhh at gmail.com] Sent: Wednesday, 22 May 2013 10:59 AM To: Barry Moore; Li, Sean (CMIS, Acton) Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27? Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR this tends to fix many cases. Of course avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that together with the correct_est_option option in MAKER is probably the way to go. I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well). Thanks, Carson From: Barry Moore > Date: Tuesday, 21 May, 2013 7:54 PM To: > Cc: > Subject: Re: [maker-devel] Fused gene problem, improvement in the Maker 2.27? Hi Sean, I think you want to be careful with dropping the pred_flank parameter too low. This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor. Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence and if you do not have some intergenic sequence for them to consider first they can't transition to their next state. The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion. 
Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript and this can help as well. B On May 21, 2013, at 1:36 AM, > wrote: Hi Carson, We are currently working on the annotation of Helicoverpa genome project. Maker has been chosen as the preliminary tool for the task. By checking the annotation results by using maker 2.10, we saw some loci have the fusion problem: two separate neighbour genes are likely to be fused together and regarded as a single candidate output by maker. If we go further by looking at the outputs from each individual de novo algorithm, e.g. augustus or snap, the prediction was correct. We are also using RNA-Seq assembly from cufflinks and some protein evidence data from closely related insects. We noticed that the parameters "pred_flank" in maker v2.10 and "correct_est_fusion" in maker v2.27 might be useful for maker to decide when to merge models or not. If possible, can you please explain what these two parameters can do with the predicted genes, RNA-Seq and protein evidence? Also, our current plan is to install maker 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing with an critical question: in which way we could effectively improve the gene fusing problem? 1) setting the pred_flank lower than 100? 2) turn the correct_est_fusion on? 3) anything else? Thank you. With best regards, Xi (Sean) Li, Ph. D. Bioinformatics Analyst, Bioinformatics Core, CSIRO Mathematics, Informatics and Statistics Phone: +61 2 6216 7138 Address: GPO Box 664, Canberra, ACT 2601 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. 
of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL:
From carsonhh at gmail.com Tue May 21 19:37:02 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 May 2013 21:37:02 -0400 Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27? In-Reply-To: Message-ID: 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads? No. Trinity would probably be a better approach to avoid merging. 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model? MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that, it can also add missing exons and extend CDS. The correct_est_fusion, just causes it to clip really long UTR if it looks like it was added due to merged evidence, and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expending the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2 , or more genes loci (you loose the true UTR boundaries). --Carson -------------- next part -------------- An HTML attachment was scrubbed... URL:
From carsonhh at gmail.com Tue May 21 19:39:01 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 May 2013 21:39:01 -0400 Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27? In-Reply-To: Message-ID: One more time, but I fixed a few obvious spelling errors --> 1. To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads? No. Trinity would probably be a better approach to avoid merging. 2. If my understanding is correct, the "correct_est_fusion" parameter needs to be turned off when we don't ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model? MAKER will always try to add UTR if the EST evidence suggests it. Technically it's a little bit more than that, it can also add missing exons and extend CDS.
The correct_est_fusion, just causes it to clip really long UTR if it looks like it was added due to merged evidence, and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expanding the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2 , or more gene loci (you lose the true UTR boundaries). --Carson -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Sean.Li at csiro.au Tue May 21 20:23:26 2013 From: Sean.Li at csiro.au (Sean.Li at csiro.au) Date: Wed, 22 May 2013 02:23:26 +0000 Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27? In-Reply-To: References: Message-ID: Thank you Carson. It has been a very helpful conversation with you! I will pass these information back to our group.
Best regards, Sean -------------- next part -------------- An HTML attachment was scrubbed... URL:
From carsonhh at gmail.com Tue May 21 20:28:46 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 May 2013 22:28:46 -0400 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: Message-ID: The option in trinity is --jaccard_clip --> http://trinityrnaseq.sourceforge.net/#jaccard_clip --Carson From: Innocent Onsongo Date: Tuesday, 21 May, 2013 11:58 AM To: Subject: [maker-devel] Maker: Re-annotation Maker Development Team, I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below genome=CGS01058.fasta #genome sequence file in fasta format est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT pred_gff=Augustus.gff3 #ab-initio predictions from other_gff=Promoters.gff3 #promoter annotations other_gff=CpG_Islands.gff3 # CpG island annotations Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3.
Am I missing something? Regards, Getiria -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 21 20:32:54 2013 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 21 May 2013 22:32:54 -0400 Subject: [maker-devel] Maker-derived CDS GFF3 phase column In-Reply-To: Message-ID: It looks like the phase was calculated from the wrong strand orientation. I believe I have corrected this now. I'm checking a few more things, but I'll have 2.28 as the latest release likely tomorrow with the cumulative bug fixes since the last release. Thanks, Carson From: Rob Syme Date: Tuesday, 21 May, 2013 1:57 AM To: Subject: [maker-devel] Maker-derived CDS GFF3 phase column Hi all By my reading of the GFF3 spec (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker that has odd data in the phase column. For example, see some example Maker output at https://gist.github.com/robsyme/5617399 There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, respectively. Both exons are in the reverse strand. >From the spec, phase indicates "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon", and for "reverse strand features, phase is counted from the end field". In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is 5953. The base at the end field is the first base of the translated CDS, so there should be no bases removed "to reach the first base of the next codon". I suggest that this phase should be 0, not 2. 
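[Editorial sketch, not part of the original message.] The phase rule quoted from the spec can be checked with a few lines of Python. This is only an illustration and not MAKER's own code: the function name cds_phases is hypothetical, and it assumes the two coordinate pairs from the example are the complete set of CDS segments of one transcript, with translation starting at the first base in transcript order.

```python
def cds_phases(segments, strand):
    """Return {(start, end): phase} for the CDS segments of one transcript.

    segments: list of (start, end) pairs, 1-based inclusive, start <= end.
    strand:   "+" or "-".
    """
    # Walk the segments in the direction of translation: ascending start on
    # the forward strand, descending start on the reverse strand.
    ordered = sorted(segments, key=lambda s: s[0], reverse=(strand == "-"))
    phases = {}
    consumed = 0  # coding bases seen before the current segment
    for start, end in ordered:
        # Bases to skip to reach the first base of the next complete codon.
        phases[(start, end)] = (3 - consumed % 3) % 3
        consumed += end - start + 1
    return phases


example = [(5617, 5737), (5793, 5953)]
print(cds_phases(example, "-"))  # reverse strand, as in the example
print(cds_phases(example, "+"))  # same segments treated as forward strand
```

Under those assumptions the reverse-strand call gives phase 0 for 5793-5953 and phase 1 for 5617-5737, while treating the same segments as forward-strand gives 0 for 5617-5737 and 2 for 5793-5953, which are the values seen in the reported output.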
There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. The output gff3 is correct if "the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon" is measured from the 'left-hand' end of this feature (the start field) rather than the end field. Has anybody else ran into this problem or am I misreading the gff3 spec? Rob Syme PhD Student Curtin University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL:
From barry.utah at gmail.com Tue May 21 21:37:30 2013 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 21 May 2013 21:37:30 -0600 Subject: [maker-devel] Fused gene problem, improvement in the Maker 2.27? In-Reply-To: References: Message-ID: <2024BE21-4293-4E9D-BE13-92774C7BC96D@gmail.com> Sean, The Trinity option to manage fusion transcripts is --jaccard_clip and is described here: http://trinityrnaseq.sourceforge.net/#jaccard_clip Trinity has also added functionality to use a hybrid reference-guided/de-novo assembly approach which you might also consider: http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html B Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From barry.utah at gmail.com Tue May 21 21:43:47 2013 From: barry.utah at gmail.com (Barry Moore) Date: Tue, 21 May 2013 21:43:47 -0600 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: Hi Getiria, Does the MAKER produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl? B On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: > Maker Development Team, > > I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below > > genome=CGS01058.fasta #genome sequence file in fasta format > est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT > pred_gff=Augustus.gff3 #ab-initio predictions from > other_gff=Promoters.gff3 #promoter annotations > other_gff=CpG_Islands.gff3 # CpG island annotations > > Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3. Am I missing something? > > Regards, > Getiria > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. 
of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.syme at gmail.com Tue May 21 22:04:04 2013 From: rob.syme at gmail.com (Rob Syme) Date: Wed, 22 May 2013 12:04:04 +0800 Subject: [maker-devel] Maker-derived CDS GFF3 phase column In-Reply-To: References: Message-ID: Fantastic. I thought that might have been the problem. Looking forward to 2.28. Thanks! Rob On Wed, May 22, 2013 at 10:32 AM, Carson Holt wrote: > It looks like the phase was calculated from the wrong strand orientation. > I believe I have corrected this now. I'm checking a few more things, but > I'll have 2.28 as the latest release likely tomorrow with the cumulative > bug fixes since the last release. > > Thanks, > Carson > > > > From: Rob Syme > Date: Tuesday, 21 May, 2013 1:57 AM > To: > Subject: [maker-devel] Maker-derived CDS GFF3 phase column > > Hi all > > By my reading of the GFF3 spec ( > http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from > Maker that has odd data in the phase column. > > For example, see some example Maker output at > https://gist.github.com/robsyme/5617399 > > There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, > respectively. Both exons are in the reverse strand. > > From the spec, phase indicates "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next > codon", and for "reverse strand features, phase is counted from the end > field". > > In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) > is 5953. > The base at the end field is the first base of the translated CDS, so > there should be no bases removed "to reach the first base of the next > codon". I suggest that this phase should be 0, not 2. > > There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. 
> > The output gff3 is correct if "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next > codon" is measured from the 'left-hand' end of this feature (the start > field) rather than the end field. > > Has anybody else ran into this problem or am I misreading the gff3 spec? > > Rob Syme > PhD Student > Curtin University > > > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From onson001 at umn.edu Wed May 22 06:50:26 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 22 May 2013 07:50:26 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: One other thing, I ran MAKER with the RM_off flag (maker -f -RM_off -q) the input sequences had already been masked. On Wed, May 22, 2013 at 7:47 AM, Innocent Onsongo wrote: > No. The MAKER produced GFF3 file does not contain any annotations. I even > tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it > will pass annotations from the Augustus produced GFF file into the final > annotation but that didn't work. I have attached the maker_opts.ctl file > I used together with the first 100 lines of the GFF files it's using. I > also include the GFF file produced by MAKER (CGS01058First100.gff) > > > > > On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > >> Hi Getiria, >> >> Does the MAKER produced GFF3 file contain any annotations at all? Can >> you send the first ~100 lines each of the MAKER produced GFF3 file and of >> the GFF3 files that you passed via maker_opts.ctl? >> >> B >> >> On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: >> >> Maker Development Team, >> >> I am trying to use Maker for re-annotation using gene predictions from >> Augustus. 
We had previously used Augustus for gene prediction but now want >> to combine these annotations with some EST data. I updated >> fields maker_opts.ctl as below >> >> genome=CGS01058.fasta #genome sequence file in fasta format >> est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT >> pred_gff=Augustus.gff3 #ab-initio predictions from >> other_gff=Promoters.gff3 #promoter annotations >> other_gff=CpG_Islands.gff3 # CpG island annotations >> >> Maker runs to completion and according to the log file annotation was >> successful. However, it also gives a "Segmentation fault (core dumped)" >> message. It does produce a GFF3 file but when I load the GFF3 file into IGV >> and look it does not contain any of the exon definitions in Augustus.gff3. >> Am I missing something? >> >> Regards, >> Getiria >> >> -- >> Getiria Onsongo, Ph.D. >> Informatics Analyst, Research Informatics Support System >> Minnesota Supercomputing Institute for Advanced Computational Research >> University of Minnesota >> Minneapolis, MN 55455 >> Phone: 612-624-0532 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> Barry Moore >> Research Scientist >> Dept. of Human Genetics >> University of Utah >> Salt Lake City, UT 84112 >> -------------------------------------------- >> (801) 585-3543 >> >> >> >> >> > > > -- > Getiria Onsongo, Ph.D. > Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From onson001 at umn.edu Wed May 22 06:47:30 2013 From: onson001 at umn.edu (Innocent Onsongo) Date: Wed, 22 May 2013 07:47:30 -0500 Subject: [maker-devel] Maker: Re-annotation In-Reply-To: References: Message-ID: No. The MAKER produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will pass annotations from the Augustus produced GFF file into the final annotation but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff) On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote: > Hi Getiria, > > Does the MAKER produced GFF3 file contain any annotations at all? Can you > send the first ~100 lines each of the MAKER produced GFF3 file and of the > GFF3 files that you passed via maker_opts.ctl? > > B > > On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: > > Maker Development Team, > > I am trying to use Maker for re-annotation using gene predictions from > Augustus. We had previously used Augustus for gene prediction but now want > to combine these annotations with some EST data. I updated > fields maker_opts.ctl as below > > genome=CGS01058.fasta #genome sequence file in fasta format > est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT > pred_gff=Augustus.gff3 #ab-initio predictions from > other_gff=Promoters.gff3 #promoter annotations > other_gff=CpG_Islands.gff3 # CpG island annotations > > Maker runs to completion and according to the log file annotation was > successful. However, it also gives a "Segmentation fault (core dumped)" > message. It does produce a GFF3 file but when I load the GFF3 file into IGV > and look it does not contain any of the exon definitions in Augustus.gff3. > Am I missing something? > > Regards, > Getiria > > -- > Getiria Onsongo, Ph.D. 
> Informatics Analyst, Research Informatics Support System > Minnesota Supercomputing Institute for Advanced Computational Research > University of Minnesota > Minneapolis, MN 55455 > Phone: 612-624-0532 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: AugustusFirst100.gff3 Type: application/octet-stream Size: 9703 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CGS01058First100.gff Type: application/octet-stream Size: 5665 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: CpG_IslandsFirst100.gff3 Type: application/octet-stream Size: 1964 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: EST2ScaffoldFirst100.gff3 Type: application/octet-stream Size: 9901 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4579 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PromotersFirst100.gff3 Type: application/octet-stream Size: 113 bytes Desc: not available URL:

From Carson.Holt at oicr.on.ca Wed May 22 08:03:14 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Wed, 22 May 2013 14:03:14 +0000
Subject: [maker-devel] Maker: Re-annotation
In-Reply-To:
Message-ID:

Are you using MAKER version 2.10? I ask because there is an issue with other_gff in that version that has since been fixed. So if other_gff does not pass through for you, you will need to upgrade to 2.28 (coincidentally, the release date is later today).

For the Augustus GFF3 file, the format is a little unusual, which is what is causing the problem. The features are mRNA features not attached to genes. Rather than build the expected 3-level gene/mRNA/exon structure for these, it is simpler to convert them to the 2-level match/match_part structure: change each 'mRNA' tag to 'match' and each 'exon' tag to 'match_part'. Rename the GFF3 file when you're done so that MAKER is forced to rebuild the GFF3 database when you run it again.

Thanks,
Carson

From: Innocent Onsongo
Date: Wednesday, 22 May, 2013 8:47 AM
To: Barry Moore
Cc:
Subject: Re: [maker-devel] Maker: Re-annotation

No. The MAKER produced GFF3 file does not contain any annotations. I even tried setting the keep_preds parameter to 1 (keep_preds=1) to see if it will pass annotations from the Augustus produced GFF file into the final annotation but that didn't work. I have attached the maker_opts.ctl file I used together with the first 100 lines of the GFF files it's using. I also include the GFF file produced by MAKER (CGS01058First100.gff)

On Tue, May 21, 2013 at 10:43 PM, Barry Moore wrote:

Hi Getiria,

Does the MAKER produced GFF3 file contain any annotations at all? Can you send the first ~100 lines each of the MAKER produced GFF3 file and of the GFF3 files that you passed via maker_opts.ctl?
B On May 21, 2013, at 9:58 AM, Innocent Onsongo wrote: Maker Development Team, I am trying to use Maker for re-annotation using gene predictions from Augustus. We had previously used Augustus for gene prediction but now want to combine these annotations with some EST data. I updated fields maker_opts.ctl as below genome=CGS01058.fasta #genome sequence file in fasta format est_gff=EST2Scaffold.gff3 # ESTs mapped to CGS01058.fasta using BLAT pred_gff=Augustus.gff3 #ab-initio predictions from other_gff=Promoters.gff3 #promoter annotations other_gff=CpG_Islands.gff3 # CpG island annotations Maker runs to completion and according to the log file annotation was successful. However, it also gives a "Segmentation fault (core dumped)" message. It does produce a GFF3 file but when I load the GFF3 file into IGV and look it does not contain any of the exon definitions in Augustus.gff3. Am I missing something? Regards, Getiria -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -- Getiria Onsongo, Ph.D. Informatics Analyst, Research Informatics Support System Minnesota Supercomputing Institute for Advanced Computational Research University of Minnesota Minneapolis, MN 55455 Phone: 612-624-0532 -------------- next part -------------- An HTML attachment was scrubbed... 
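The tag conversion Carson describes in this thread (mRNA to match, exon to match_part) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_match_features` and the example rows are hypothetical, the input is assumed to be tab-delimited nine-column GFF3, and any feature types other than mRNA and exon are passed through unchanged because the thread does not say how they should be handled.

```python
def to_match_features(lines):
    """Rewrite GFF3 column 3: 'mRNA' -> 'match', 'exon' -> 'match_part'.

    Comment lines, blank lines, and rows that do not have nine
    tab-separated columns are returned unchanged.
    """
    rename = {"mRNA": "match", "exon": "match_part"}
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            converted.append(line)
            continue
        cols = line.split("\t")
        if len(cols) == 9:
            # Column 3 (index 2) holds the feature type.
            cols[2] = rename.get(cols[2], cols[2])
        converted.append("\t".join(cols))
    return converted


# Hypothetical example rows, not taken from the attached files.
example = [
    "##gff-version 3",
    "scaffold1\tAUGUSTUS\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1",
    "scaffold1\tAUGUSTUS\texon\t100\t400\t.\t+\t.\tParent=g1.t1",
    "scaffold1\tAUGUSTUS\texon\t600\t900\t.\t+\t.\tParent=g1.t1",
]
converted = to_match_features(example)
print("\n".join(converted))
```

As noted in the message above, the converted output would be written to a new file name so that the GFF3 database is rebuilt on the next run.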
From carsonhh at gmail.com Wed May 22 10:38:50 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 22 May 2013 12:38:50 -0400
Subject: [maker-devel] Why are some complete gene predictions not present in the final results?
In-Reply-To:
Message-ID:

I've released 2.28 on the website. This is one of the bugs that was fixed. It happens under a very specific set of circumstances. You need to run maker with the -a command line flag to get it to recalculate upstream variables after upgrading. Alternatively, you can just give maker your old GFF3 file (make all other options blank except for the *_pass= options), and maker will just rebuild it.

Thanks,
Carson

From: Hung-Wei Hsu
Date: Monday, 20 May, 2013 9:19 PM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] Why are some complete gene predictions not present in the final results?

Thanks a lot for your help. Your suggestions will be greatly helpful for our analysis.

I've tried to add EST sequences to improve gene predictions. The EST sequences I used were CDS sequences of the same organism. But I got the error below.

substr outside of string at .../TranslationMachine.pm line 162
ERROR: Failed while polishig ESTs
ERROR: Chunk failed at level:2, tier_type:3

What's wrong with my analysis? Are the EST sequences I used wrong?

Thank you.
Hung-Wei

2013/5/21 Carson Holt
> On default settings MAKER will only put ab initio predictions that have some
> sort of evidence support (EST or protein) in the final gene set. The rejected
> predictions are still in the GFF3 for reference purposes as match/match_part
> features, but not as gene/mRNA/exon/CDS features. So a lack of evidence might
> be why it is not there. You can add all rejected models that don't overlap an
> accepted model by setting keep_preds=1 (this usually brings a lot more into
> the final gene set than you really want, though, including lots of false
> positives). But for some organisms like fungi, which have high gene densities,
> this approach is relatively safe.
>
> Alternatively the gene is missing because it overlaps another gene model that
> was accepted. MAKER won't allow overlapping models on the same strand in
> eukaryotes. The only way to force that kind of overlap is to give MAKER the
> reference models in model_gff and not let it call its own models (then maker
> is really just aligning evidence and scoring the reference models).
>
> One final note. If there is no evidence supporting the model, and that is why
> it is rejected, you can also try adding more evidence to the maker run or you
> can consider the possibility that the gene model in the reference is not real
> to begin with (i.e. a false positive gene model called during the initial
> annotation process and not supported by protein or expression data from any
> source).
>
> Thanks,
> Carson
>
> From: Hung-Wei Hsu
> Date: Monday, 20 May, 2013 12:16 AM
> To:
> Subject: [maker-devel] Why are some complete gene predictions not present in
> the final results?
>
> Hi MAKER developers,
>
> I was using MAKER to perform gene prediction and annotation on my contigs.
> I used Artemis to examine the gff and found that some CDS with complete
> structure were absent from the final results.
> They are really predicted and annotated on the ref genome.
> I'm wondering if they were discarded due to overlapping with another CDS.
> How can I preserve these CDS?
> Thanks a lot in advance.
>
> Hung-Wei

From carsonhh at gmail.com Wed May 22 10:39:53 2013
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 22 May 2013 12:39:53 -0400
Subject: [maker-devel] Maker-derived CDS GFF3 phase column
In-Reply-To:
Message-ID:

Ok. It's available for download.
--Carson From: Rob Syme Date: Wednesday, 22 May, 2013 12:04 AM To: Carson Holt Cc: Subject: Re: [maker-devel] Maker-derived CDS GFF3 phase column Fantastic. I thought that might have been the problem. Looking forward to 2.28. Thanks! Rob On Wed, May 22, 2013 at 10:32 AM, Carson Holt wrote: > It looks like the phase was calculated from the wrong strand orientation. I > believe I have corrected this now. I'm checking a few more things, but I'll > have 2.28 as the latest release likely tomorrow with the cumulative bug fixes > since the last release. > > Thanks, > Carson > > > > From: Rob Syme > Date: Tuesday, 21 May, 2013 1:57 AM > To: > Subject: [maker-devel] Maker-derived CDS GFF3 phase column > > Hi all > > By my reading of the GFF3 spec > (http://sequenceontology.org/resources/gff3.html), I'm getting gff3 from Maker > that has odd data in the phase column. > > For example, see some example Maker output at > https://gist.github.com/robsyme/5617399 > > There are two exons, 5617 <- 5737 and 5793 <- 5953 with phases 0 and 2, > respectively. Both exons are in the reverse strand. > > From the spec, phase indicates "the number of bases that should be removed > from the beginning of this feature to reach the first base of the next codon", > and for "reverse strand features, phase is counted from the end field". > > In the case of the 3' exon (5793 <- 5953), the end field (the 5th column) is > 5953. > The base at the end field is the first base of the translated CDS, so there > should be no bases removed "to reach the first base of the next codon". I > suggest that this phase should be 0, not 2. > > There is an illustration of the feature at http://i.imgur.com/DKLxnSf.png. > > The output gff3 is correct if "the number of bases that should be removed from > the beginning of this feature to reach the first base of the next codon" is > measured from the 'left-hand' end of this feature (the start field) rather > than the end field. 
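The phase rule quoted above can be checked with a short Python sketch. It is an illustration, not MAKER's implementation: the function name `cds_phases` is hypothetical, and it assumes that both features from the example are entirely coding and that the CDS begins at the first base of the 5'-most segment (phase 0 there). Under those assumptions, walking the segments in transcription order gives phase 0 for 5793-5953 on the reverse strand, while walking the same coordinates left to right reproduces the 0 and 2 values reported in the thread.

```python
def cds_phases(cds_segments, strand):
    """Return [(start, end, phase)] in transcription order.

    cds_segments: list of (start, end) pairs, 1-based inclusive.
    strand: '+' or '-'. On '-', the 5'-most segment is the one with
    the highest coordinates, so phase is counted from the end field.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    result = []
    phase = 0  # assumption: the CDS starts on a codon boundary
    for start, end in ordered:
        result.append((start, end, phase))
        length = end - start + 1
        # Bases left over after skipping `phase` bases and reading whole
        # codons decide how many bases the next segment must skip.
        phase = (3 - (length - phase) % 3) % 3
    return result


segments = [(5617, 5737), (5793, 5953)]
reverse = cds_phases(segments, "-")
forward = cds_phases(segments, "+")
print(reverse)  # [(5793, 5953, 0), (5617, 5737, 1)]
print(forward)  # [(5617, 5737, 0), (5793, 5953, 2)]
```

The two segments total 282 bases (161 + 121), a multiple of three, which is consistent with the assumption that they form a complete reading frame.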
> > Has anybody else ran into this problem or am I misreading the gff3 spec? > > Rob Syme > PhD Student > Curtin University > > > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry.utah at gmail.com Thu May 23 10:40:23 2013 From: barry.utah at gmail.com (Barry Moore) Date: Thu, 23 May 2013 10:40:23 -0600 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch>, <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> Message-ID: <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Hi Liciano, If I understand correctly you are including translations of SNAP and Augustus predictions as well as the predictions. If so, you don't want to do that. An overlapping protein evidence is sufficient to promote a prediction to an annotation, so by providing the protein translation of the prediction along with the prediction you will guarantee that every prediction will become an annotation and that means you lose the benefit of evidence supervised annotation that MAKER provides. Include the proteins from the D mel reference and if you want to cast a broader net include proteins from other dipterans or even Uniprot - just depend on how aggressive you want to try to be in capturing new annotations. B On May 23, 2013, at 8:41 AM, Luciano Abriata wrote: > Thanks for your reply! > > One more question, can you think of any tips to get the best possible predictions of protein sequences? > > I am asking because I am getting a few proteins that are too big to be real and don't exist if I blast them, plus a few others which don't start with Methionine... 
So far I am including transcripts and translations from flybase, and snap and augustus with their available trainings for flies. Do you see any possible source of error in that? > > Thanks again, > > Luciano > > De: Barry Moore [barry.moore at genetics.utah.edu] > Enviado el: viernes, 17 de mayo de 2013 09:02 p.m. > Para: Luciano Abriata > Cc: maker-devel at yandell-lab.org > Asunto: Re: [maker-devel] getting protein sequences from genomes > > > On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: > >> Hello, I am trying to use Maker to annotate genomes from different individuals of a population (D. melanogaster flies). >> >> My ultimate goal is to get, for each gene, the amino acid sequences of the coded proteins as they are expressed from each genome. My questions are: >> >> 1) How can I match proteins predicted for the same gene in two genomes? > > blastp tweaked with parameters to optimize near perfect match > >> >> 2) What is the meaning of all the data in a line such as the following one (taken from the protein.fasta output) >> >> maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 >> > > AED = Annotation edit distance describes how closely the prediction matches the evidence. This is a distance measure and thus 0 is a perfect match and 1 is no overlap. > > eAED = Exon adjusted annotation edit distance: This metric is the same as AED with a couple of exceptions. For a protein coding exon to be counted as overlapping protein evidence the reading frame must be the same in the coding exon and the protein evidence. Second, when mRNA Seq data is used as evidence and both ends of an exon are supported with splice site spanning reads, the middle of that exon is counted as supported as well even if coverage drops off in the interior of the exon.. For the most part AED and eAED will always be the same, but eAED tends to work better on many fringe cases. 
> > QI values are as follows: > > 5' UTR Length > Fraction of splice sites confirmed by EST alignment. > Fraction of exons that overlap and EST alignment. > Fraction of exons that overlap EST or protein alignment. > Fraction of splice sites confirmed by an ab initio prediction. > Fraction of exons that overlap an ab intitio prediction. > Number of exons in the transcript. > 3' UTR length. > Length of encoded protein. > > >> 3) If I include snap and augustus to improve protein predictions, I get several protein.fasta files: augustus_masked.proteins.fasta , snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and proteins.fasta >> >> Which of these files contains the definite set of predicted protein sequences? > > The proteins.fasta file is the final set of proteins for all genes that MAKER created annotations for. > >> >> >> >> Thanks in advance! >> >> Luciano >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 -------------- next part -------------- An HTML attachment was scrubbed... 
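The header fields Barry explains in this thread (AED, eAED, and the nine pipe-separated QI values) can be pulled apart with a small Python sketch. The function name `parse_maker_header` and the dictionary key names are made up for illustration; only the field order comes from the list in the message, and the header string is the example quoted in the thread.

```python
# Key names are illustrative labels for the nine QI positions listed above.
QI_FIELDS = [
    "utr5_length",
    "frac_splice_sites_est",
    "frac_exons_est",
    "frac_exons_est_or_protein",
    "frac_splice_sites_ab_initio",
    "frac_exons_ab_initio",
    "num_exons",
    "utr3_length",
    "protein_length",
]


def parse_maker_header(header):
    """Split a MAKER FASTA header into id, AED, eAED, and QI values."""
    tokens = header.lstrip(">").split()
    record = {"id": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            continue  # e.g. the bare word 'protein'
        if key in ("AED", "eAED"):
            record[key] = float(value)
        elif key == "QI":
            numbers = [float(v) for v in value.split("|")]
            record["QI"] = dict(zip(QI_FIELDS, numbers))
    return record


header = (
    "maker-2L-augustus-gene-0.19-mRNA-1 protein "
    "AED:0.0322873164323667 eAED:0.0322873164323667 "
    "QI:2|1|0.66|1|1|1|3|208|541"
)
record = parse_maker_header(header)
print(record["id"], record["AED"], record["QI"]["num_exons"])
```

For the example header this reads as a 2-base 5' UTR, three exons, a 208-base 3' UTR, and a 541-residue protein.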
URL: From dsth at ebi.ac.uk Thu May 23 10:48:05 2013 From: dsth at ebi.ac.uk (Daniel Hughes) Date: Thu, 23 May 2013 17:48:05 +0100 Subject: [maker-devel] getting protein sequences from genomes In-Reply-To: <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> References: <18790D2A402432409BCC7E00F2AE8926ACE666@rexma.intranet.epfl.ch> <18790D2A402432409BCC7E00F2AE8926AD4807@REXMF.intranet.epfl.ch> <98C45AF6-8F3E-4C06-B283-56AD9C07DD2C@genetics.utah.edu> Message-ID: would gene annotation by projection using synteny/WGA not be more appropriate? either way what's wrong with running one of the standard orthology predictions tools or just basic best reciprocal blast? dan. Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) ------------------------------------------------------------------------------------- dsth at cantab.net dsth at cpan.org 2013/5/23 Barry Moore > Hi Liciano, > > If I understand correctly you are including translations of SNAP and > Augustus predictions as well as the predictions. If so, you don't want to > do that. An overlapping protein evidence is sufficient to promote a > prediction to an annotation, so by providing the protein translation of the > prediction along with the prediction you will guarantee that every > prediction will become an annotation and that means you lose the benefit of > evidence supervised annotation that MAKER provides. Include the proteins > from the D mel reference and if you want to cast a broader net include > proteins from other dipterans or even Uniprot - just depend on how > aggressive you want to try to be in capturing new annotations. > > B > > On May 23, 2013, at 8:41 AM, Luciano Abriata wrote: > > Thanks for your reply! > > One more question, can you think of any tips to get the best possible > predictions of protein sequences? > > I am asking because I am getting a few proteins that are too big to be > real and don't exist if I blast them, plus a few others which don't start > with Methionine... 
So far I am including transcripts and translations from > flybase, and snap and augustus with their available trainings for flies. Do > you see any possible source of error in that? > > Thanks again, > > Luciano > > ------------------------------ > *De:* Barry Moore [barry.moore at genetics.utah.edu] > *Enviado el:* viernes, 17 de mayo de 2013 09:02 p.m. > *Para:* Luciano Abriata > *Cc:* maker-devel at yandell-lab.org > *Asunto:* Re: [maker-devel] getting protein sequences from genomes > > > On May 17, 2013, at 3:45 AM, Luciano Abriata wrote: > > Hello, I am trying to use Maker to annotate genomes from different > individuals of a population (D. melanogaster flies). > > My ultimate goal is to get, for each gene, the amino acid sequences of the > coded proteins as they are expressed from each genome. My questions are: > > 1) How can I match proteins predicted for the same gene in two genomes? > > > blastp tweaked with parameters to optimize near perfect match > > > 2) What is the meaning of all the data in a line such as the following one > (taken from the protein.fasta output) > > maker-2L-augustus-gene-0.19-mRNA-1 protein AED:0.0322873164323667 > eAED:0.0322873164323667 QI:2|1|0.66|1|1|1|3|208|541 > > > AED = Annotation edit distance describes how closely the prediction > matches the evidence. This is a distance measure and thus 0 is a perfect > match and 1 is no overlap. > > eAED = Exon adjusted annotation edit distance: This metric is the same as > AED with a couple of exceptions. For a protein coding exon to be counted > as overlapping protein evidence the reading frame must be the same in the > coding exon and the protein evidence. Second, when mRNA Seq data is used > as evidence and both ends of an exon are supported with splice site > spanning reads, the middle of that exon is counted as supported as well > even if coverage drops off in the interior of the exon.. 
For the most part > AED and eAED will always be the same, but eAED tends to work better on many > fringe cases. > > QI values are as follows: > > > 1. 5' UTR Length > 2. Fraction of splice sites confirmed by EST alignment. > 3. Fraction of exons that overlap and EST alignment. > 4. Fraction of exons that overlap EST or protein alignment. > 5. Fraction of splice sites confirmed by an ab initio prediction. > 6. Fraction of exons that overlap an ab intitio prediction. > 7. Number of exons in the transcript. > 8. 3' UTR length. > 9. Length of encoded protein. > > > > 3) If I include snap and augustus to improve protein predictions, I get > several protein.fasta files: augustus_masked.proteins.fasta , > snap_masked.proteins.fasta , non_overlapping_ab_initio.proteins.fasta , and > proteins.fasta > > Which of these files contains the definite set of predicted protein > sequences? > > > The proteins.fasta file is the final set of proteins for all genes that > MAKER created annotations for. > > > > > Thanks in advance! > > Luciano > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > > Barry Moore > Research Scientist > Dept. of Human Genetics > University of Utah > Salt Lake City, UT 84112 > -------------------------------------------- > (801) 585-3543 > > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bob_Freeman at hms.harvard.edu Thu May 23 14:17:00 2013 From: Bob_Freeman at hms.harvard.edu (Freeman, Robert M.) 
Date: Thu, 23 May 2013 16:17:00 -0400 Subject: [maker-devel] Advice on params for ciliates Message-ID: <9D9882BB-3A26-45D6-A5B0-9B18F9BF5C31@hms.harvard.edu> Dear MAKER community, Am embarking on updating models for a ciliate (taxa Ciliophora) and was wondering if folks had recommendations for MAKER parameters. Thanks, Bob ----------------------------------------------------- Bob Freeman, Ph.D. Acorn Worm Informatics, Kirschner lab Dept of Systems Biology, Alpert 524 Harvard Medical School 200 Longwood Avenue Boston, MA 02115 617/432.2294, vox "Sorry I'm late. Oh, God, that sounded insincere. I'm late." -- Karen Walker, from Will and Grace -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Fri May 24 07:10:15 2013 From: daniel.standage at gmail.com (Daniel Standage) Date: Fri, 24 May 2013 09:10:15 -0400 Subject: [maker-devel] Using maker with precomputed transcript / protein alignments Message-ID: Greetings! I have some precomputed transcript and protein alignments that I would like to use with Maker. I have converted them into GFF3 format (see attached examples) and provided them to their corresponding entries (est_gff, altest_gff, protein_gff) in the maker_opts.ctl file. Unfortunately, Maker seems to be getting caught up on processing these GFF3 files. I've tried running Maker 2.10 as well as the development version (checked out a few months ago--svn server isn't responding so I can't give a precise revision number), and in both cases Maker hangs while trying to create the GFF3 database. These are the last lines I see in STDERR when * --debug* is set. STATUS: Setting up database for any GFF3 input... Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 587. I can't find any documentation specifying any explicit requirements for the alignment-containing GFF3 input files. 
Maker output uses the pretty canonical *expressed_sequence_match*, *protein_match*, and *match_part* features for encoding alignments, and I have used this convention with my input (see attached examples). I have also double-checked that my examples are valid GFF3, so my guess is that Maker has additional constraints/expectations for certain fields in the GFF3 files (score column? required attributes?). Is this correct, and if so would you be able to point me toward any related documentation I may have missed? Many thanks. -- Daniel S. Standage Ph.D. Candidate Bioinformatics and Computational Biology Program Department of Genetics, Development, and Cell Biology Iowa State University -------------- next part -------------- A non-text attachment was scrubbed... Name: prot-example.gff3 Type: application/octet-stream Size: 1080 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: trans-example.gff3 Type: application/octet-stream Size: 1306 bytes Desc: not available URL: From guoyunfei1989 at gmail.com Fri May 24 10:15:19 2013 From: guoyunfei1989 at gmail.com (Yunfei Guo) Date: Fri, 24 May 2013 09:15:19 -0700 Subject: [maker-devel] ./FINISHED/FINISHED.gff Message-ID: Hi Carson, When I tried to merge all gff files, I got this error: ERROR: The file './FINISHED/FINISHED.gff' does not exist and I found something like below in master_datastore_index.log. Is this caused by the duplicate scaffold? C12919781 GapCloser-Nigro-Min1k_datastore/28/79/C12919781/ FINISHED FINISHED scaffold138015 GapCloser-Nigro-Min1k_datastore/F7/0C/scaffold138015/ FINISHED FASTA lines for C12919781 and scaffold138015 >C12919781 36.0 >C12919781 36.0 CGTAAATGCATCCGCGTATAAATGCGACAGTAAGAGTTAATGATGCAGTATAAAAAGCAAGAAAAAGCGTTTATGGTGGGAGGCGGAGGCATCCAACTAACACCAGACTGTTAACCCGGAGACCAGTGGTCGACACCGTCG(skip...) 
>scaffold138015 35.1 ATATGCATATGCATATGCATATGCATATGCATATGCATATATAGACATGTAGATATAGACATCAATCATACACGTAACCCATCATTCGTATTATTAAATCACATTTTGTGACTTTGCCCATCTGTCTTTAAAGGGACAATGTGTATG(skip...) maker 2.27 Thanks, Yunfei From carsonhh at gmail.com Fri May 24 10:22:05 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 24 May 2013 12:22:05 -0400 Subject: [maker-devel] ./FINISHED/FINISHED.gff In-Reply-To: Message-ID: Sometimes the master_datastore_index.log gets munged by MPI (processes print at the same time). You can rebuild it by running a single instance of maker with the -dsindex flag. It only takes about 60 seconds to rebuild. Example: cd maker -dsindex --Carson From: Yunfei Guo Date: Friday, 24 May, 2013 12:15 PM To: Subject: [maker-devel] ./FINISHED/FINISHED.gff Hi Carson, When I tried to merge all gff files, I got this error: ERROR: The file './FINISHED/FINISHED.gff' does not exist and I found something like below in master_datastore_index.log. Is this caused by the duplicate scaffold? C12919781 GapCloser-Nigro-Min1k_datastore/28/79/C12919781/ FINISHED FINISHED scaffold138015 GapCloser-Nigro-Min1k_datastore/F7/0C/scaffold138015/ FINISHED FASTA lines for C12919781 and scaffold138015 >C12919781 36.0 >C12919781 36.0 CGTAAATGCATCCGCGTATAAATGCGACAGTAAGAGTTAATGATGCAGTATAAAAAGCAAGAAAAAGCGTTTATGG TGGGAGGCGGAGGCATCCAACTAACACCAGACTGTTAACCCGGAGACCAGTGGTCGACACCGTCG(skip...) >scaffold138015 35.1 ATATGCATATGCATATGCATATGCATATGCATATGCATATATAGACATGTAGATATAGACATCAATCATACACGTA ACCCATCATTCGTATTATTAAATCACATTTTGTGACTTTGCCCATCTGTCTTTAAAGGGACAATGTGTATG(skip ...) maker 2.27 Thanks, Yunfei _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org 
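Before rebuilding the index with -dsindex, it may help to see which lines of master_datastore_index.log were munged. The following is a minimal Python sketch, not part of MAKER. It assumes that each well-formed index line has exactly three tab-separated fields (sequence ID, datastore directory, status); that layout is inferred from the excerpt in this thread, and the function name `find_munged_lines` is hypothetical.

```python
def find_munged_lines(lines):
    """Return (line_number, text) for index lines that look malformed.

    Assumption (not taken from MAKER documentation): a well-formed line
    has exactly three non-empty, tab-separated fields.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text:
            # Blank lines are ignored rather than reported.
            continue
        fields = text.split("\t")
        if len(fields) != 3 or any(not field.strip() for field in fields):
            bad.append((number, text))
    return bad
```

To use it on a real file, open the log and pass the file handle, for example `find_munged_lines(open(path))`, where `path` points at the index log inside the *.maker.output directory. Any reported line is only a hint that concurrent writes collided; rebuilding with -dsindex remains the fix suggested in the reply above.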
From carsonhh at gmail.com Fri May 24 14:06:51 2013 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 24 May 2013 16:06:51 -0400 Subject: [maker-devel] Using maker with precomputed transcript / protein alignments In-Reply-To: Message-ID: I'm glad it's working. I think I'll add a check for the '/' characters in the base name as I think having it be a directory will get me in trouble somewhere with hidden bugs. Thanks, Carson From: Daniel Standage Date: Friday, 24 May, 2013 4:00 PM To: Carson Holt Subject: Re: [maker-devel] Using maker with precomputed transcript / protein alignments Oh wow, you are going to LOVE this. I kept on messing around with things to see if I could tease out any patterns, and eventually it hit me. In my working directory, I have an outputs directory, which is intended to contain output directories from various different maker runs. However, since my submission scripts launch Maker from the working directory, I use -base outputs/blahblahblah as the base parameter. So when it tries to create output files using the base name (the SQLite3 db just happens to be the first), it tries to create outputs/blahblahblah/outputs/blahblahblah.db, and of course that internal outputs directory doesn't exist. Every time I've had problems, I've been using a basename with a / character (relative directory path). Every time I haven't had problems, it was because the / wasn't there. Since the base parameter determines the name of the output directory, I assumed I could also use it to specify a nested output directory. So it looks like I just need to be more careful that the basenames I use don't contain / characters or any other special UNIX characters. Of course, this could be made explicit in the usage statement, or you could add something like this right after parsing the command line arguments. 
if($OPT{"out_name"} =~ m/\//) { printf(STDERR "base '%s' invalid: basenames containing relative directory paths cause errors; please provide a simple string instead\n", $OPT{"out_name"}); exit_maker(0); } Alternatively, you could handle things like I had originally expected: if I provide path/to/mybase as my base parameter, maker would create the path/to/mybase directory initially, but then in the creation of subsequent files it would simply use mybase. I don't imagine this would be too extensive of a change, but I understand Maker has a huge codebase. Anyway, just some suggestions, take them for what they're worth. Thanks for your help! -- Daniel S. Standage Ph.D. Candidate Bioinformatics and Computational Biology Program Department of Genetics, Development, and Cell Biology Iowa State University On Fri, May 24, 2013 at 3:29 PM, Carson Holt wrote: > NFS is weird. It's hard to say why it was freezing the first few times, and did > not appear to freeze on your very last try. I definitely want to know if it > starts to freeze again, or if stack traces show a consistent point where it > freezes. If it keeps happening, I can try making the database in the local > /tmp and then just copying it to the current working directory once it's > populated to get around any weird NFS issues. But before going through all > the effort to do that, I'd like to know that it's not some other weird bug > related to the perl you're using or other modules that are installed. Top > candidates on the list would be modules such as forks, forks::shared, DBI, or > DBD::SQLite. Try reinstalling those. > > Thanks, > Carson > > > From: Daniel Standage > Date: Friday, 24 May, 2013 3:19 PM > > To: Carson Holt > Subject: Re: [maker-devel] Using maker with precomputed transcript / protein > alignments > > I admit I killed these last few runs too quickly, I guess I was getting > impatient, especially since waiting hours or days hasn't made a difference > before. Either way, that was sloppy on my part. 
> > However, I always specify the base parameter, whether or not I'm running > multiple maker jobs from the same directory. And if I ever restarted a job, I > have always removed the original output directory entirely before > relaunching--precisely to avoid the types of mistakes you mention arising from > residual files. > > -- > Daniel S. Standage > Ph.D. Candidate > Bioinformatics and Computational Biology Program > Department of Genetics, Development, and Cell Biology > Iowa State University > > > On Fri, May 24, 2013 at 3:10 PM, Carson Holt wrote: >> Correct, if you use the -base parameter you should get a different output >> directory. And if you have never used that base before, and it still >> freezes, then there is a problem. You do need to give it a little more time >> before killing it, as the stack trace in both cases showed that it was less >> than 25% finished reading the input GFF3 files and even less than that in the >> first case (so give it about 5x as long before giving up). >> >> It might just be that the NFS mount is slow. Or because of how weird the >> error is, other options include reinstalling perl and all modules. The >> weirdest bugs are often broken perl or inadvertently using modules from >> different perl versions via the PERL5LIB environment variable (this is very >> common and can cause very wacky behavior). Another option is verifying all >> software for the lustre NFS mount is up to date. Lastly there was an odd NFS >> bug that came up on the e-mail list last week that was fixed by a kernel >> upgrade. >> >> --Carson >> >> >> >> From: Daniel Standage >> Date: Friday, 24 May, 2013 3:01 PM >> >> To: Carson Holt >> Subject: Re: [maker-devel] Using maker with precomputed transcript / protein >> alignments >> >> The file locks are created only in the output directory, no? 
So there is a >> problem if I have multiple maker runs launched from the same directory, but >> writing to different output directories (as specified by different base >> parameters)? >> >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Bioinformatics and Computational Biology Program >> Department of Genetics, Development, and Cell Biology >> Iowa State University >> >> >> On Fri, May 24, 2013 at 2:57 PM, Carson Holt wrote: >>> To clarify, that means you need to use a different working directory. Can >>> be a subdirectory of your original. >>> >>> --Carson >>> >>> >>> From: Carson Holt >>> Date: Friday, 24 May, 2013 2:56 PM >>> To: Daniel Standage >>> >>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>> protein alignments >>> >>> Both stack traces show different locations in the code and file being read. >>> So it appears it was not frozen, just interrupted by control-C. >>> >>> If you restart make sure you do so in a completely new directory from the >>> original run. This is because I wonder if there is a failed job that still >>> has active processes and is holding onto file locks in that directory. >>> >>> --Carson >>> >>> >>> From: Daniel Standage >>> Date: Friday, 24 May, 2013 2:50 PM >>> To: Carson Holt >>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>> protein alignments >>> >>> Deleted output directory and re-ran. Stack trace looks pretty similar. >>> >>> >>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line >>> 607. >>> SIGINT received >>> at /N/u/dstandag/Mason/local/src/PerlLibs/lib64/perl5/forks/signals.pm >>> line 97, <$IN >>>> > line 243676. 
>>> forks::signals::__ANON__('INT') called at /usr/lib64/perl5/DBI.pm >>> line 1590 >>> eval {...} called at /usr/lib64/perl5/DBI.pm line 1590 >>> DBD::_::db::do('DBI::db=HASH(0x4987228)', 'INSERT INTO est_gff >>> (seqid, source, parent, start, end, line)...') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 493 >>> GFFDB::_add_to_db('GFFDB=HASH(0x49727a0)', >>> 'DBI::db=HASH(0x49871e0)', 'est_gff', 'HASH(0x49877e0)') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 432 >>> GFFDB::_add_type('GFFDB=HASH(0x49727a0)', >>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...', >>> 'est_gff') called at >>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 324 >>> GFFDB::add_est('GFFDB=HASH(0x49727a0)', >>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...') called >>> at /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 57 >>> GFFDB::new('GFFDB', 'HASH(0x489c488)') called at >>> /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 608 >>> >>> >>> -- >>> Daniel S. Standage >>> Ph.D. Candidate >>> Bioinformatics and Computational Biology Program >>> Department of Genetics, Development, and Cell Biology >>> Iowa State University >>> >>> >>> On Fri, May 24, 2013 at 2:45 PM, Carson Holt wrote: >>>> Could you run it again, so I can see if the stack trace is the same each >>>> time. >>>> >>>> --Carson >>>> >>>> >>>> From: Daniel Standage >>>> Date: Friday, 24 May, 2013 2:39 PM >>>> >>>> To: Carson Holt >>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>> protein alignments >>>> >>>> Restarted in the original NFS-mounted directory, never saw the .db file, >>>> and got this as the stack trace upon termination. >>>> >>>> STATUS: Setting up database for any GFF3 input... >>>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker >>>> line 607. 
>>>> SIGINT received >>>> at /N/u/dstandag/Mason/local/src/PerlLibs/lib64/perl5/forks/signals.pm >>>> line 97, <$IN> line 170294. >>>> forks::signals::__ANON__('INT') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 475 >>>> eval {...} called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 475 >>>> GFFDB::_parse_line('GFFDB=HASH(0x4e5c730)', 'SCALAR(0x4e714b8)', >>>> 'est_gff') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 431 >>>> GFFDB::_add_type('GFFDB=HASH(0x4e5c730)', >>>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...', >>>> 'est_gff') called at >>>> /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 324 >>>> GFFDB::add_est('GFFDB=HASH(0x4e5c730)', >>>> '/N/dc/scratch/dstandag/PdomGenomic/Annotation/annot-v0.41/inp...') called >>>> at /N/hd01/dstandag/Mason/local/src/maker-dev/bin/../lib/GFFDB.pm line 57 >>>> GFFDB::new('GFFDB', 'HASH(0x4d86488)') called at >>>> /N/u/dstandag/Mason/local/src/maker-dev/bin/maker line 608 >>>> >>>> >>>> -- >>>> Daniel S. Standage >>>> Ph.D. Candidate >>>> Bioinformatics and Computational Biology Program >>>> Department of Genetics, Development, and Cell Biology >>>> Iowa State University >>>> >>>> >>>> On Fri, May 24, 2013 at 2:25 PM, Carson Holt wrote: >>>>> Start a new job in a new directory from the original job (NFS mount). Use >>>>> the new maker executable I sent. If it still freezes, hit control-C to >>>>> get a stack trace. >>>>> >>>>> --Carson >>>>> >>>>> >>>>> From: Daniel Standage >>>>> Date: Friday, 24 May, 2013 2:21 PM >>>>> >>>>> To: Carson Holt >>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>> protein alignments >>>>> >>>>> The job from several hours ago is still running with no changes. 
>>>>> >>>>> I just relaunched the job with a locally mounted working directory: I >>>>> could see the .db file almost immediately, and it took less than 5 minutes >>>>> to successfully build the SQLite3 db and proceed to the next steps of the >>>>> pipeline. Any ideas? >>>>> >>>>> -- >>>>> Daniel S. Standage >>>>> Ph.D. Candidate >>>>> Bioinformatics and Computational Biology Program >>>>> Department of Genetics, Development, and Cell Biology >>>>> Iowa State University >>>>> >>>>> >>>>> On Fri, May 24, 2013 at 2:01 PM, Carson Holt wrote: >>>>>> The NFS mount appears to be configured correctly. >>>>>> >>>>>> Here is what the maker.output directory should look like while the >>>>>> database is being generated. >>>>>> >>>>>> drwxr-xr-x 10 cholt staff 340 24 May 13:51 . >>>>>> drwxr-xr-x 10 cholt staff 340 24 May 13:50 .. >>>>>> -rw------x 1 cholt staff 85 24 May 13:50 >>>>>> .NFSLock.gi_lock.NFSLock >>>>>> -rw------- 1 cholt staff 52 24 May 13:50 >>>>>> .NFSLock.pdom-annot-v0.41-1.db.NFSLock >>>>>> -rw-r--r-- 1 cholt staff 1413 24 May 13:50 maker_bopts.log >>>>>> -rw-r--r-- 1 cholt staff 1666 24 May 13:50 maker_exe.log >>>>>> -rw-r--r-- 1 cholt staff 4610 24 May 13:50 maker_opts.log >>>>>> drwxr-xr-x 4 cholt staff 136 24 May 13:50 mpi_blastdb >>>>>> -rw-r--r-- 1 cholt staff 29326336 24 May 13:51 pdom-annot-v0.41-1.db >>>>>> -rw-r--r-- 1 cholt staff 6704 24 May 13:51 >>>>>> pdom-annot-v0.41-1.db-journal >>>>>> >>>>>> >>>>>> Could you watch while maker is running to see if this file is created --> >>>>>> .NFSLock.pdom-annot-v0.41-1.db.NFSLock >>>>>> You must use ls with the -a flag to see it or it will be hidden. >>>>>> >>>>>> Just keep letting it run until that file shows up. Shortly after it shows >>>>>> up, this one should appear --> pdom-annot-v0.41-1.db-journal >>>>>> >>>>>> Also could you try running MAKER once with the working directory being >>>>>> locally mounted (/tmp for example). 
>>>>>> >>>>>> --Carson >>>>>> >>>>>> >>>>>> >>>>>> From: Daniel Standage >>>>>> Date: Friday, 24 May, 2013 1:36 PM >>>>>> >>>>>> To: Carson Holt >>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>> protein alignments >>>>>> >>>>>> Here is the output. >>>>>> >>>>>> [dstandag at mason annot-v0.41] ls -al >>>>>> outputs/pdom-annot-v0.41-1.maker.output/ >>>>>> total 32 >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 13:34 . >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 12:39 .. >>>>>> -rw-r--r-- 1 dstandag biol 1413 May 24 12:39 maker_bopts.log >>>>>> -rw-r--r-- 1 dstandag biol 1355 May 24 12:39 maker_exe.log >>>>>> -rw-r--r-- 1 dstandag biol 4883 May 24 12:39 maker_opts.log >>>>>> drwxr-xr-x 3 dstandag biol 4096 May 24 12:39 mpi_blastdb >>>>>> -rw------x 1 dstandag biol 70 May 24 13:34 .NFSLock.gi_lock.NFSLock >>>>>> [dstandag at mason annot-v0.41] df outputs/pdom-annot-v0.41-1.maker.output/ >>>>>> Filesystem 1K-blocks Used Available Use% Mounted on >>>>>> dc-mds01.uits.indiana.edu:/dc >>>>>> 1144318908992 928977247792 203869022296 83% /N/dc >>>>>> [dstandag at mason annot-v0.41] mount >>>>>> login_x86_64 on / type tmpfs (rw) >>>>>> proc on /proc type proc (rw) >>>>>> sysfs on /sys type sysfs (rw) >>>>>> devpts on /dev/pts type devpts (rw,gid=5,mode=620) >>>>>> tmpfs on /dev/shm type tmpfs (rw) >>>>>> tmpfs on /var/tmp type tmpfs (rw,size=10m) >>>>>> /dev/sdb2 on /tmp type ext4 (rw,relatime,barrier=1,data=ordered) >>>>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) >>>>>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) >>>>>> AFS on /afs type afs (rw) >>>>>> bl-nas1:/vol/hd00 on /N/hd00 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> bl-nas1:/vol/hd01 on /N/hd01 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. 
>>>>>> 165.226.129) >>>>>> bl-nas2:/vol/hd02 on /N/hd02 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas2:/vol/hd03 on /N/hd03 type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas1:/vol/hdln on /N/u type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> bl-nas2:/vol/soft on /N/soft type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.130) >>>>>> bl-nas1:/vol/logs on /N/logs type nfs >>>>>> (rw,nosuid,tcp,rsize=32768,wsize=32768,timeo=600,retrans=2,intr,addr=149. >>>>>> 165.226.129) >>>>>> none on /dev/cpuset type cpuset (rw) >>>>>> dc-mds01.uits.indiana.edu:/dc on /N/dc type lustre (rw,localflock) >>>>>> 149.165.235.173:/mds-wan/client on /N/dcwan type lustre (rw,localflock) >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Daniel S. Standage >>>>>> Ph.D. Candidate >>>>>> Bioinformatics and Computational Biology Program >>>>>> Department of Genetics, Development, and Cell Biology >>>>>> Iowa State University >>>>>> >>>>>> >>>>>> On Fri, May 24, 2013 at 1:29 PM, Carson Holt wrote: >>>>>>> They load fine for me. It is an SQLite database. I know that SQLite >>>>>>> can freeze on NFS if it's not configured properly. >>>>>>> >>>>>>> Could you send me the output from these 3 commands? >>>>>>> >>>>>>> ls -al >>>>>>> df >>>>>>> mount >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 1:13 PM >>>>>>> >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I deleted the entire output directory before relaunching. No .db files >>>>>>> are even created, only the mpi_blastdb directory with the genomic >>>>>>> sequence data and corresponding index, before it hangs. 
>>>>>>> >>>>>>> The GFF3 files are attached. >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 12:57 PM, Carson Holt >>>>>>> wrote: >>>>>>> Did you delete any *.db files in the maker.output directory first? If >>>>>>> not, do that, and check on the rerun if that file is growing in size. It >>>>>>> is a database to hold the GFF3 file entries. Its final size should be >>>>>>> ~ 2x the size of the combined GFF3 files. If it is growing, then it is >>>>>>> not really frozen (you just need to give it more time). If it is not >>>>>>> growing, send me your GFF3 files and I can try and duplicate the error. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 12:50 PM >>>>>>> >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I installed BioPerl-1.6.901, rebuilt Maker, and re-launched the job. >>>>>>> After running for 10-15 minutes, it seems to be hanging in the same >>>>>>> place as before. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 11:38 AM, Carson Holt >>>>>>> wrote: >>>>>>> That is the CPAN version and the last stable release on bioperl.org. >>>>>>> Older versions as well as the bio-perl live >>>>>>> version will cause MAKER to fail. They both have issues with the Fasta >>>>>>> indexing module that maker uses. 
>>>>>>> >>>>>>> http://search.cpan.org/CPAN/authors/id/C/CJ/CJFIELDS/BioPerl-1.6.901.tar.gz >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 11:34 AM >>>>>>> To: Carson Holt >>>>>>> Subject: Re: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> I'm not sure if a rebuild of Maker was necessary, but I tried running it >>>>>>> just to be safe. It's complaining about Bio::Root::Version dependency >>>>>>> not being met. Looking at the Build.PL file, it requires >>>>>>> Bio::Root::Version version 1.006901. Is there really such a version, or >>>>>>> should this be changed to 1.006 or 1.006001? >>>>>>> >>>>>>> For now I'll change it to 1.006001 (the installed version) and proceed >>>>>>> with another test. >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> >>>>>>> >>>>>>> On Fri, May 24, 2013 at 9:45 AM, Carson Holt wrote: >>>>>>> Could you run this command in the maker devel base directory. >>>>>>> >>>>>>> svn switch --relocate svn://* >>>>>>> ************ >>>>>>> svn://* *************** >>>>>>> >>>>>>> Then do 'svn update', and then tell me what happens. Make sure to >>>>>>> delete any *.db files in the *.maker.output/ directory before >>>>>>> retrying. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> >>>>>>> From: Daniel Standage >>>>>>> Date: Friday, 24 May, 2013 9:10 AM >>>>>>> To: Maker Mailing List >>>>>>> Subject: [maker-devel] Using maker with precomputed transcript / >>>>>>> protein alignments >>>>>>> >>>>>>> Greetings! >>>>>>> >>>>>>> I have some precomputed transcript and protein alignments that I would >>>>>>> like to use with Maker. 
I have converted them into GFF3 format (see >>>>>>> attached examples) and provided them to their corresponding entries >>>>>>> (est_gff, altest_gff, protein_gff) in the maker_opts.ctl file. >>>>>>> >>>>>>> Unfortunately, Maker seems to be getting caught up on processing these >>>>>>> GFF3 files. I've tried running Maker 2.10 as well as the development >>>>>>> version (checked out a few months ago--svn server isn't responding so I >>>>>>> can't give a precise revision number), and in both cases Maker hangs >>>>>>> while trying to create the GFF3 database. These are the last lines I see >>>>>>> in STDERR when --debug is set. >>>>>>> >>>>>>> STATUS: Setting up database for any GFF3 input... >>>>>>> Calling GFFDB::new at /N/u/dstandag/Mason/local/src/maker-dev/bin/maker >>>>>>> line 587. >>>>>>> >>>>>>> I can't find any documentation specifying any explicit requirements for >>>>>>> the alignment-containing GFF3 input files. Maker output uses the pretty >>>>>>> canonical expressed_sequence_match, protein_match, and match_part >>>>>>> features for encoding alignments, and I have used this convention with >>>>>>> my input (see attached examples). I have also double-checked that my >>>>>>> examples are valid GFF3, so my guess is that Maker has additional >>>>>>> constraints/expectations for certain fields in the GFF3 files (score >>>>>>> column? required attributes?). Is this correct, and if so would you be >>>>>>> able to point me toward any related documentation I may have missed? >>>>>>> >>>>>>> Many thanks. >>>>>>> >>>>>>> -- >>>>>>> Daniel S. Standage >>>>>>> Ph.D. 
Candidate >>>>>>> Bioinformatics and Computational Biology Program >>>>>>> Department of Genetics, Development, and Cell Biology >>>>>>> Iowa State University >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From rob.syme at gmail.com Sun May 26 20:26:58 2013 From: rob.syme at gmail.com (Rob Syme) Date: Mon, 27 May 2013 10:26:58 +0800 Subject: [maker-devel] Can map2assembly be run outside the maker pipeline? Message-ID: Hi all, I'm looking to move existing transcripts from one genome assembly to another, keeping the transcript names if possible. Running map2assembly seems to require MPI (stderr example below). Is it possible to run map2assembly outside of the Maker pipeline and without MPI? Stderr head: INFO: All repeat masking options will be skipped. A data structure will be created for you at: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_datastore To access files for individual sequences use the datastore index: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_master_datastore_index.log Can't call method "get_Seq_by_id" on an undefined value at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 226, line 1. FATAL ERROR ERROR: Failed in tier preparation WARNING: You must always set a rank before running MpiTiers FATAL: argument `q_def` does not exist in MpiTier object at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 86, line 1. 
Process::MpiChunk::_initialize_vars('Process::MpiChunk=HASH(0x332dac8)', 'HASH(0x332db88)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 47 Process::MpiChunk::new('Process::MpiChunk', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 413 Process::MpiChunk::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f49498)', 'HASH(0x332d728)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 4165 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x2f35e88)', 'load', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 316 Process::MpiChunk::_loader('Process::MpiChunk=HASH(0x2f35e88)', 'HASH(0x2ef85a8)', 0, 0, 'Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 364 Process::MpiTiers::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f411a0)', 'HASH(0x2f491c8)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 375 Process::MpiTiers::_load_chunks('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 185 Process::MpiTiers::next_chunk('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 816 Process::MpiTiers::_handler('Process::MpiTiers=HASH(0x79f3d0)', 'Error::Simple=HASH(0x2f35c18)', 'Failed in tier preparation') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 78 Process::MpiTiers::__ANON__('Error::Simple=HASH(0x2f35c18)', 'SCALAR(0x1179c30)') called at /path/to/maker/bin/../lib/Error.pm line 339 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 329 Error::subs::run_clauses('HASH(0x2f36230)', 'Can\'t call method "get_Seq_by_id" on an undefined value at /...', undef, 'ARRAY(0x117a1e8)') called at 
/path/to/maker/bin/../lib/Error.pm line 426 Error::subs::try('CODE(0x2f28898)', 'HASH(0x2f36230)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 79 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 56 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x79f508)', 0, 'Process::MpiChunk') called at ./map2assembly line 205 -------------- next part -------------- An HTML attachment was scrubbed... URL: From uma at ebi.ac.uk Tue May 28 06:00:54 2013 From: uma at ebi.ac.uk (Uma Maheswari) Date: Tue, 28 May 2013 13:00:54 +0100 Subject: [maker-devel] duplicate exons? In-Reply-To: <5195ED54.4090501@ebi.ac.uk> References: <5195ED54.4090501@ebi.ac.uk> Message-ID: <51A49C76.3060801@ebi.ac.uk> Thanks Carson, 2.28 with -a command line flag fixed this problem. Uma On 17/05/13 09:41, Uma Maheswari wrote: > Hi Carson, > > I checked with Michael, this is different from what he saw, he had > entire segements of gff files duplicated, In this case, just Parent id > is. > I am preparing the files you asked for, will send them soon > > thanks > Uma > > > On 16/05/13 17:50, Carson Holt wrote: >> Yes. Perhaps this is the same issue Michael saw, although the one >> difference I see from his post is the Parent= attribute. >> >> --> >> Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1 >> >> I have seen duplicate exons from GFF3 pass-through in the past, but >> if that's not being used I'd be very appreciative of any test dataset >> you could give me. >> >> Thanks, >> Carson >> >> >> >> >> From: Daniel Hughes > >> Date: Thursday, 16 May, 2013 12:38 PM >> To: Carson Holt > >> Cc: Uma Maheswari >, >> "maker-devel at yandell-lab.org " >> > >> Subject: Re: [maker-devel] duplicate exons? >> >> hiya, are you using the same instance as michael at ebi as this >> sounds like the same problem he had last week and he wasn't running >> pass through. 
i've run 2.27 here 30+ times here and not seen this? is >> something very strange corrupted? >> >> dan. >> >> Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge) >> ------------------------------------------------------------------------------------- >> dsth at cantab.net >> dsth at cpan.org >> >> >> 2013/5/16 Carson Holt > >> >> I think this also may be a result of using GFF3 pass-through. So >> if that >> is the case, could you send me any GFF3 files you gave maker in >> addition >> to the other files I asked for. >> >> Thanks, >> Carson >> >> >> >> On 13-05-16 12:08 PM, "Uma Maheswari" > > wrote: >> >> >Hi Carson, >> > >> >When I was trying to load the Maker-2.27 results into ensembl, I >> found >> >that few hundreds of genes with 'duplicate exons' . When I looked >> in the >> >gff file, I found cases like this, where the exons are not actually >> >duplicated but have two Parents with same mRNA ID. This can be a >> >potential alternate transcript, attached to the same transcript by >> >mistake? >> > >> >Many thanks >> >Uma >> > >> > >> > >> > >> > >> >3 maker gene 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed >> >-gene-6.179 >> >3 maker mRNA 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3- >> >processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_A >> >ED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406 >> >3 maker exon 524271 524480 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker exon 524538 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.17 >> >9-mRNA-1 >> >3 maker exon 524271 525467 . - . 
>> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_ >> >masked-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 524903 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524538 525182 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker CDS 524271 524480 . - 0 >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_maske >> >d-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524271 525467 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> >3 maker five_prime_UTR 524904 525182 . - . >> >ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=aug >> >ustus_masked-3-processed-gene-6.179-mRNA-1 >> > >> > >> >_______________________________________________ >> >maker-devel mailing list >> >maker-devel at box290.bluehost.com >> >> >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jjin01 at mail.rockefeller.edu Tue May 28 19:37:58 2013 From: jjin01 at mail.rockefeller.edu (Jingjing Jin) Date: Wed, 29 May 2013 01:37:58 +0000 Subject: [maker-devel] maker running error Message-ID: Dear all, When I try to run maker on my datasets, there is an error like this: #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. 
#--------- command -------------# Widget::blastn: /usr/local/bin/blastall -p blastn -d /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blastn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn #-------------------------------# [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options ERROR: BLASTN failed FATAL ERROR ERROR: Failed while doing blastn of ESTs!! ERROR: Chunk failed at level 8 !! FAILED CONTIG:C4345703 Could anyone give me some suggestions? Thanks! Jingjing -------------- next part -------------- An HTML attachment was scrubbed... URL: From myandell at genetics.utah.edu Tue May 28 19:58:51 2013 From: myandell at genetics.utah.edu (Mark Yandell) Date: Wed, 29 May 2013 01:58:51 +0000 Subject: [maker-devel] maker running error In-Reply-To: References: Message-ID: <558EECF8-8B9C-4C5D-9968-439D421C315F@genetics.utah.edu> Hi Jingjing, looks like your fasta files have problems. Have you checked to see if they are formatted correctly? 
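A quick way to act on that question is to scan the input FASTA files before rerunning. The following is only a minimal sketch, not part of MAKER or BLAST: the function name `check_fasta` and the set of accepted characters (IUPAC nucleotide codes) are assumptions made for illustration, and a protein file would need a different alphabet. It reports duplicate IDs, empty records, stray characters, and non-Unix line endings, which are common formatting problems but not necessarily the cause of the error above.

```python
import io

# Assumed alphabet: IUPAC nucleotide codes. Adjust for protein FASTA files.
VALID = set("ACGTUNRYKMSWBDHV")


def check_fasta(handle):
    """Return a list of (line_number, message) problems found in a FASTA stream."""
    problems = []
    seen_ids = set()
    header_line = None  # line number of the current record's header
    seq_len = 0         # residues seen so far for the current record

    for lineno, raw in enumerate(handle, start=1):
        if "\r" in raw:
            problems.append((lineno, "carriage return (non-Unix line ending)"))
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header_line is not None and seq_len == 0:
                problems.append((header_line, "record has no sequence"))
            header_line, seq_len = lineno, 0
            fields = line[1:].split()
            if not fields:
                problems.append((lineno, "header has no sequence ID"))
            elif fields[0] in seen_ids:
                problems.append((lineno, "duplicate sequence ID " + fields[0]))
            else:
                seen_ids.add(fields[0])
        else:
            if header_line is None:
                problems.append((lineno, "sequence data before first header"))
                continue
            bad = set(line.upper()) - VALID
            if bad:
                problems.append(
                    (lineno, "unexpected characters: " + "".join(sorted(bad)))
                )
            seq_len += len(line)

    # Close the final record.
    if header_line is not None and seq_len == 0:
        problems.append((header_line, "record has no sequence"))
    return problems


if __name__ == "__main__":
    demo = io.StringIO(">a\nACGT\n>a\nAC!T\n>c\n")
    for lineno, message in check_fasta(demo):
        print(f"line {lineno}: {message}")
```

To check a real file, open it with `open(path, newline="")` so that Windows line endings are not silently translated before the check sees them.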
cheers, --mark On May 28, 2013, at 7:37 PM, Jingjing Jin wrote: Dear all, When I try to run maker on my datasets, there is an error like this: #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. #--------- command -------------# Widget::blastx: /usr/local/bin/blastall -p blastx -d /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta.repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner #-------------------------------# [blastall] FATAL ERROR: search cannot proceed due to errors in all contexts/frames of query sequences deleted:0 hits running blast search. 
#--------- command -------------# Widget::blastn: /usr/local/bin/blastall -p blastn -d /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapclose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blastn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn #-------------------------------# [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options ERROR: BLASTN failed FATAL ERROR ERROR: Failed while doing blastn of ESTs!! ERROR: Chunk failed at level 8 !! FAILED CONTIG:C4345703 Could anyone give me some suggestions? Thanks! Jingjing _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 29 06:45:30 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 29 May 2013 08:45:30 -0400 Subject: [maker-devel] Can map2assembly be run outside the maker pipeline? In-Reply-To: Message-ID: It's not an MPI requirement, just an execution error. I've attached a fixed version of that script. Really it is just a wrapper that runs maker with a few parameter changes. You can do the exact same thing by removing all repeat mask options, setting est2genome=1, and then adding est_forward=1 to the maker_opts.ctl file.
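The manual route described above (blank the repeat-masking options, set est2genome=1, add est_forward=1) could be scripted roughly as follows. This is a sketch under stated assumptions, not the attached map2assembly script: the helper `set_ctl_options` is hypothetical, the control file is assumed to use `key=value #comment` lines, and the repeat-masking key names (`model_org`, `rmlib`, `repeat_protein`, `rm_gff`) are assumed from a default maker_opts.ctl, so check them against your own file.

```python
# Assumed option names; only est2genome and est_forward come from the message above.
MAP2ASSEMBLY_OVERRIDES = {
    "model_org": "",       # assumed repeat-masking option, blanked
    "rmlib": "",           # assumed repeat-masking option, blanked
    "repeat_protein": "",  # assumed repeat-masking option, blanked
    "rm_gff": "",          # assumed repeat-masking option, blanked
    "est2genome": "1",
    "est_forward": "1",
}


def set_ctl_options(ctl_text, overrides):
    """Return ctl_text with each key in overrides set to its new value.

    Trailing `#comment` text on a changed line is preserved, and keys that
    do not appear in the file are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in ctl_text.splitlines():
        body, hash_mark, comment = line.partition("#")
        key, equals, _old_value = body.partition("=")
        key = key.strip()
        if equals and key in remaining:
            new = f"{key}={remaining.pop(key)}"
            line = f"{new} #{comment}" if hash_mark else new
        out.append(line)
    # Options missing from the file (est_forward, for example) go at the end.
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "#-----Repeat Masking\n"
        "model_org=all #select a model organism\n"
        "est2genome=0 #infer gene predictions directly from ESTs\n"
    )
    print(set_ctl_options(sample, MAP2ASSEMBLY_OVERRIDES), end="")
```

Writing the result to a copy of maker_opts.ctl, rather than editing the file in place, keeps the original settings available for a normal annotation run.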
Thanks, Carson From: Rob Syme Date: Sunday, 26 May, 2013 10:26 PM To: Subject: [maker-devel] Can map2assembly be run outside the maker pipeline? Hi all I'm looking to move existing transcripts from one genome assembly to another, keeping the transcript names if possible. Running map2assembly seems to require MPI (stderr example below). Is it possible to run map2assembly outside of the Maker pipeline and without MPI? Stderr head: INFO: All repeat masking options will be skipped. A data structure will be created for you at: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_datastore To access files for individual sequences use the datastore index: /path/to/maker/bin/SN15v2_scaffolds.maker.output/SN15v2_scaffolds_master_datastore_index.log Can't call method "get_Seq_by_id" on an undefined value at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 226, line 1. FATAL ERROR ERROR: Failed in tier preparation WARNING: You must always set a rank before running MpiTiers FATAL: argument `q_def` does not exist in MpiTier object at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 86, line 1. 
Process::MpiChunk::_initialize_vars('Process::MpiChunk=HASH(0x332dac8)', 'HASH(0x332db88)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 47 Process::MpiChunk::new('Process::MpiChunk', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 413 Process::MpiChunk::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f49498)', 'HASH(0x332d728)') called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 4165 Process::MpiChunk::_go('Process::MpiChunk=HASH(0x2f35e88)', 'load', 'HASH(0x2ef85a8)', 0, 0) called at /path/to/maker/bin/../lib/Process/MpiChunk.pm line 316 Process::MpiChunk::_loader('Process::MpiChunk=HASH(0x2f35e88)', 'HASH(0x2ef85a8)', 0, 0, 'Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 364 Process::MpiTiers::__ANON__() called at /path/to/maker/bin/../lib/Error.pm line 415 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 407 Error::subs::try('CODE(0x2f411a0)', 'HASH(0x2f491c8)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 375 Process::MpiTiers::_load_chunks('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 185 Process::MpiTiers::next_chunk('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 816 Process::MpiTiers::_handler('Process::MpiTiers=HASH(0x79f3d0)', 'Error::Simple=HASH(0x2f35c18)', 'Failed in tier preparation') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 78 Process::MpiTiers::__ANON__('Error::Simple=HASH(0x2f35c18)', 'SCALAR(0x1179c30)') called at /path/to/maker/bin/../lib/Error.pm line 339 eval {...} called at /path/to/maker/bin/../lib/Error.pm line 329 Error::subs::run_clauses('HASH(0x2f36230)', 'Can\'t call method "get_Seq_by_id" on an undefined value at /...', undef, 'ARRAY(0x117a1e8)') called at 
/path/to/maker/bin/../lib/Error.pm line 426 Error::subs::try('CODE(0x2f28898)', 'HASH(0x2f36230)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 79 Process::MpiTiers::_prepare('Process::MpiTiers=HASH(0x79f3d0)') called at /path/to/maker/bin/../lib/Process/MpiTiers.pm line 56 Process::MpiTiers::new('Process::MpiTiers', 'HASH(0x79f508)', 0, 'Process::MpiChunk') called at ./map2assembly line 205 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: map2assembly Type: application/octet-stream Size: 6413 bytes Desc: not available URL: From carsonhh at gmail.com Wed May 29 06:49:39 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 29 May 2013 08:49:39 -0400 Subject: [maker-devel] maker running error In-Reply-To: <558EECF8-8B9C-4C5D-9968-439D421C315F@genetics.utah.edu> Message-ID: Yes, most likely an input fasta error. If that is not the case, some versions of BLAST have version-specific failures that are fixed by upgrading BLAST. For example, I see you are using blastall, which is from the older NCBI BLAST as opposed to the newer BLAST+. --Carson From: Mark Yandell Date: Tuesday, 28 May, 2013 9:58 PM To: Jingjing Jin Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] maker running error Hi Jingjing, looks like your fasta files have problems. Have you checked to see if they are formatted correctly?
cheers, --mark On May 28, 2013, at 7:37 PM, Jingjing Jin wrote: > Dear all, > > When I try to run maker on my datasets, there is an error like this: > > #--------- command -------------# > Widget::blastx: > /usr/local/bin/blastall -p blastx -d > /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.5 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y > 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta. > repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.5.repeatrunner > #-------------------------------# > [blastall] FATAL ERROR: search cannot proceed due to errors in all > contexts/frames of query sequences > deleted:0 hits > running blast search. > #--------- command -------------# > Widget::blastx: > /usr/local/bin/blastall -p blastx -d > /tmp/maker_W3xpXQ/te_proteins%2Efasta.mpi.10.6 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-06 -z 300 -Y > 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.te_proteins%2Efasta. > repeatrunner.temp_dir/te_proteins%2Efasta.mpi.10.6.repeatrunner > #-------------------------------# > [blastall] FATAL ERROR: search cannot proceed due to errors in all > contexts/frames of query sequences > deleted:0 hits > running blast search. 
> > #--------- command -------------# > Widget::blastn: > /usr/local/bin/blastall -p blastn -d > /tmp/maker_W3xpXQ/all_ref%2Efasta.mpi.10.0 -i > /tmp/maker_W3xpXQ/rank0/C4345703.0 -b 100000 -v 100000 -e 1e-10 -E 3 -W 15 -r > 1 -q -3 -G 3 -z 1000 -Y 500000000 -a 1 -U -F T -I T -o > /data/project/oil_palm/evolution/TS1/maker/TS1_gapclose.maker.output/TS1_gapcl > ose_datastore/CE/E6/C4345703//theVoid.C4345703/C4345703.0.all_ref%2Efasta.blas > tn.temp_dir/all_ref%2Efasta.mpi.10.0.blastn > #-------------------------------# > [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul > parameters due to an invalid query sequence or its translation. Please verify > the query sequence(s) and/or filtering options > [blastall] WARNING: C4345703: Could not calculate ungapped Karlin-Altschul > parameters due to an invalid query sequence or its translation. Please verify > the query sequence(s) and/or filtering options > ERROR: BLASTN failed > > FATAL ERROR > ERROR: Failed while doing blastn of ESTs!! > > ERROR: Chunk failed at level 8 > !! > FAILED CONTIG:C4345703 > > > Could anyone give me some suggestions? > > Thanks! > > Jingjing > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 29 06:54:30 2013 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 29 May 2013 08:54:30 -0400 Subject: [maker-devel] Maker consensus In-Reply-To: <1539398593.274033.1369743400254.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: Yes. That's ok, but you would get better performance by installing MPI and using that. 
Alternatively just start maker several times in the same directory without splitting the input fasta. You can usually start about 10-15 concurrent maker processes safely, but would still get better performance with MPI. --Carson From: Diana LeDuc Reply-To: Diana LeDuc Date: Tuesday, 28 May, 2013 8:16 AM To: , Carson Holt Cc: Gabriel Renaud , Janet Kelso Subject: Re: [maker-devel] Maker consensus Hi Carson, I have now restarted maker with specification of augustus path and species. I am trying to run it separately on each scaffold just to parallelise the process and speed it up. It happens that some of the scaffolds which run ok in the complete datatset now fail. Do you have any idea why this happens? Is it ok to have a separate directory for each of the scaffolds and run maker in each of them? Thank you for the help. Best regards, Diana On May 10, 2013 at 8:29 PM Carson Holt wrote: > > You can use any species augustus already has. If it doesn't then you train > it yourself. The species folder is pointed to by the AUGUSTUS_CONFIG_PATH > environmental variable, and is usually ?/augusts/config/species > > > > Thanks, > > Carson > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de> > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> > Date: Friday, 10 May, 2013 2:16 PM > To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com> > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>, > Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < kelso at eva.mpg.de> > Subject: Re: [maker-devel] Maker consensus > > > > > > Hi Carson, > > > > In maker_exe.ctl I would have to provide the path to augustus. Augustus has a > training set for chicken that I would use. Is it possible to specify the > species i want to use, or the only way is training Augustus myself? > > > > Thank you! > > > > Best, > > > > Diana > > On May 10, 2013 at 7:51 PM Carson Holt < carsonhh at gmail.com> wrote: > > >> >> Ok. 
You just ran the evidence and didn't give a gene predictor. You need >> to provide an HMM file for SNAP, a species for augustus, or for rough >> annotations you can set protein2genome=1 and est2genome=1. This will try and >> generate models direct from the alignments. >> >> >> >> If you provide a gene predictor, then MAKER can talk to it about the >> evidence alignments so it can make a best gene call for the region. Then >> there will be gene/mRNA/exon models in the GFF3 file and entries in the >> proteins.fasta and transcripts.fasta. If you need to train a predictor, you >> can train SNAP using the maker2zff script and the SNAP documentation or maker >> GMOD tutorial. If you want to train augustus, Jason Stajich wrote an >> excellent explanation as well as tools in a previous list message. >> >> >> >> >> list msg - http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html >> >> Script is in this github repo - >> >> https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl >> >> >> >> Thanks, >> >> Carson >> >> >> >> >> >> >> >> From: Diana LeDuc < diana_leduc at eva.mpg.de> >> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> >> Date: Friday, 10 May, 2013 1:41 PM >> To: < maker-devel at yandell-lab.org>, Carson Holt < carsonhh at gmail.com> >> Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de>, >> Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < >> kelso at eva.mpg.de> >> Subject: Re: [maker-devel] Maker consensus >> >> >> >> >> >> Hi Carson, >> >> >> >> Thank you for the quick answer. >> >> I ran gff3_merge to merge all the gff files and this resulted in a gff file, >> which has these types of fields: >> >> scaffold32239 blastx protein_match 22905 34500 174 + . >> ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; >> scaffold32239 blastx match_part 22905 23045 174 + . 
>> ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG0000 >> 0000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT0000 >> 0000219|DSCAML1-2039 172 218;Gap=M47; >> >> In comparison to the dpp_contig test file, I am missing est2genome evidence, >> most probably because my est data set is pretty poor. I have blastx and >> protein2genome evidence though. >> >> >> >> My goal is to extract the genes that could be annotated on the scaffolds. In >> the gff files the hits overlap most of the times, I can visualize this >> properly in apollo: for example one scaffold hits DSCAML gene in both >> zebrafinch and chicken, but extracting the coordinates between which this >> scaffold fits this annotated gene is difficult from the gff. Manually >> curating the genes is also not an option, since I am trying to do this for a >> 1.7Gb genome. >> >> >> >> I hope this explains better what we are after. >> >> >> >> Thank you once again. >> >> >> >> Best regards, >> >> >> >> Diana >> On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com> wrote: >> >> >>> >>> I'm sorry I don?t' understand question 1. You are you missing resulting >>> fasta files, correct? Did your resulting GFF3 file have any features of >>> type "gene"? Did you run fasta_merge after running gff3_merge? >>> >>> >>> >>> Could you give me more details on what you are trying to do, so I can take >>> a stab at question 2 as well. 
>>> >>> >>> >>> Thanks, >>> >>> Carson >>> >>> >>> >>> >>> >>> >>> >>> From: Diana LeDuc < diana_leduc at eva.mpg.de> >>> Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de> >>> Date: Friday, 10 May, 2013 10:44 AM >>> To: < maker-devel at yandell-lab.org> >>> Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de>, Janet Kelso < >>> kelso at eva.mpg.de>, Torsten Schoeneberg < >>> torsten.schoeneberg at medizin.uni-leipzig.de> >>> Subject: [maker-devel] Maker consensus >>> >>> >>> >>> >>> >>> >>> >>> Dear maker developers, >>> >>> >>> I am a phD student working on de novo assembly and annotation of a bird >>> genome. I used Maker as annotation pipeline, which ran very well, and I >>> obtained different annotations with evidence from Augustus gene predictor, >>> small EST dataset from my organism and protein sequences from chicken, >>> turkey and zebrafinch. I could combine the different gff files from >>> different scaffolds into one gff file with annotations for the entire >>> genome. >>> >>> >>> I now have two questions: >>> >>> >>> 1. What could be the reason that I haven't gotten the protein.fasta and >>> trancript.fasta files >>> >>> >>> 2. How can I obtain a consensus gene list of different evidences from maker? >>> What I would actually need is the scaffold, coordinates and annotation (gene >>> name) according to the 3 other bird species. >>> Thank you in advance. >>> >>> >>> >>> Best regards, >>> >>> >>> >>> Diana Le Duc >>> >>> >>> >>> -- >>> >>> Max Planck Institute for Evolutionary Anthropology >>> Department of Evolutionary Genetics >>> Deutscher Platz 6 >>> D-04103 Leipzig >>> >>> Phone +49 (0)341-3550-554 >>> www.eva.mpg.de >>> >>> >>> _______________________________________________ maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> >> >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From diana_leduc at eva.mpg.de Tue May 28 06:16:40 2013 From: diana_leduc at eva.mpg.de (Diana LeDuc) Date: Tue, 28 May 2013 14:16:40 +0200 (CEST) Subject: [maker-devel] Maker consensus In-Reply-To: References: <1607622610.225353.1368209794909.JavaMail.open-xchange@oxchange.eva.mpg.de> Message-ID: <1539398593.274033.1369743400254.JavaMail.open-xchange@oxchange.eva.mpg.de> Hi Carson, I have now restarted maker with specification of augustus path and species. I am trying to run it separately on each scaffold just to parallelise the process and speed it up. It happens that some of the scaffolds which run ok in the complete datatset now fail. Do you have any idea why this happens? Is it ok to have a separate directory for each of the scaffolds and run maker in each of them? Thank you for the help. Best regards, Diana On May 10, 2013 at 8:29 PM Carson Holt wrote: > You can use any species augustus already has. If it doesn't then you train > it yourself. The species folder is pointed to by the AUGUSTUS_CONFIG_PATH > environmental variable, and is usually ?/augusts/config/species > > Thanks, > Carson > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > Date: Friday, 10 May, 2013 2:16 PM > To: < maker-devel at yandell-lab.org >, > Carson Holt < carsonhh at gmail.com > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > >, Gabriel Renaud < > gabriel_renaud at eva.mpg.de >, Janet Kelso < > kelso at eva.mpg.de > > Subject: Re: [maker-devel] Maker consensus > > Hi Carson, > > In maker_exe.ctl I would have to provide the path to augustus. Augustus has a > training set for chicken that I would use. Is it possible to specify the > species i want to use, or the only way is training Augustus myself? > > Thank you! > > Best, > > Diana > On May 10, 2013 at 7:51 PM Carson Holt < carsonhh at gmail.com > > wrote: > > > > Ok. You just ran the evidence and didn't give a gene predictor. 
You > > > need to provide an HMM file for SNAP, a species for augustus, or for > > > rough annotations you can set protein2genome=1 and est2genome=1. This > > > will try and generate models direct from the alignments. > > > > If you provide a gene predictor, then MAKER can talk to it about the > > evidence alignments so it can make a best gene call for the region. Then > > there will be gene/mRNA/exon models in the GFF3 file and entries in the > > proteins.fasta and transcripts.fasta. If you need to train a predictor, you > > can train SNAP using the maker2zff script and the SNAP documentation or > > maker GMOD tutorial. If you want to train augustus, Jason Stajich wrote an > > excellent explanation as well as tools in a previous list message. > > > > list msg - > > http://brie4.cshl.edu/pipermail/gmod-help/2012-June/001724.html > > > > Script is in this github repo - > > > > https://github.com/hyphaltip/genome-scripts/blob/master/gene_prediction/zff2augustus_gbk.pl > > > > > > Thanks, > > Carson > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > Date: Friday, 10 May, 2013 1:41 PM > > To: < maker-devel at yandell-lab.org >, > > Carson Holt < carsonhh at gmail.com > > > Cc: Torsten Schoeneberg < torsten.schoeneberg at medizin.uni-leipzig.de > > >, Gabriel Renaud < > > gabriel_renaud at eva.mpg.de >, Janet Kelso > > < kelso at eva.mpg.de > > > Subject: Re: [maker-devel] Maker consensus > > > > Hi Carson, > > > > Thank you for the quick answer. > > I ran gff3_merge to merge all the gff files and this resulted in a gff > > file, which has these types of fields: > > scaffold32239 blastx protein_match 22905 34500 174 + . > > ID=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039; > > scaffold32239 blastx match_part 22905 23045 174 + . 
> > ID=scaffold32239:hsp:2806529;Parent=scaffold32239:hit:976144;Name=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039;Target=ENSTGUG00000000198|ENSTGUT00000000219|DSCAML1-2039 > > 172 218;Gap=M47; > > In comparison to the dpp_contig test file, I am missing est2genome > > evidence, most probably because my est data set is pretty poor. I have > > blastx and protein2genome evidence though. > > > > My goal is to extract the genes that could be annotated on the scaffolds. > > In the gff files the hits overlap most of the times, I can visualize this > > properly in apollo: for example one scaffold hits DSCAML gene in both > > zebrafinch and chicken, but extracting the coordinates between which this > > scaffold fits this annotated gene is difficult from the gff. Manually > > curating the genes is also not an option, since I am trying to do this for a > > 1.7Gb genome. > > > > I hope this explains better what we are after. > > > > Thank you once again. > > > > Best regards, > > > > Diana > > On May 10, 2013 at 6:13 PM Carson Holt < carsonhh at gmail.com > > > wrote: > > > > > > > I'm sorry I don?t' understand question 1. You are you missing > > > > > resulting fasta files, correct? Did your resulting GFF3 file have > > > > > any features of type "gene"? Did you run fasta_merge after running > > > > > gff3_merge? > > > > > > Could you give me more details on what you are trying to do, so I can > > > take a stab at question 2 as well. 
> > > > > > Thanks, > > > Carson > > > > > > > > > > > > From: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > > > Reply-To: Diana LeDuc < diana_leduc at eva.mpg.de > > > > > > > Date: Friday, 10 May, 2013 10:44 AM > > > To: < maker-devel at yandell-lab.org > > > > > > > Cc: Gabriel Renaud < gabriel_renaud at eva.mpg.de > > > >, Janet Kelso < kelso at eva.mpg.de > > > >, Torsten Schoeneberg < > > > torsten.schoeneberg at medizin.uni-leipzig.de > > > > > > > Subject: [maker-devel] Maker consensus > > > > > > > > > Dear maker developers, > > > > > > I am a phD student working on de novo assembly and annotation of a bird > > > genome. I used Maker as annotation pipeline, which ran very well, and I > > > obtained different annotations with evidence from Augustus gene predictor, > > > small EST dataset from my organism and protein sequences from chicken, > > > turkey and zebrafinch. I could combine the different gff files from > > > different scaffolds into one gff file with annotations for the entire > > > genome. > > > > > > I now have two questions: > > > > > > 1. What could be the reason that I haven't gotten the protein.fasta and > > > trancript.fasta files > > > > > > 2. How can I obtain a consensus gene list of different evidences from > > > maker? What I would actually need is the scaffold, coordinates and > > > annotation (gene name) according to the 3 other bird species. > > > > > > Thank you in advance. 
> > >
> > > Best regards,
> > >
> > > Diana Le Duc
> > >
> > > --
> > >
> > > Max Planck Institute for Evolutionary Anthropology
> > > Department of Evolutionary Genetics
> > > Deutscher Platz 6
> > > D-04103 Leipzig
> > >
> > > Phone +49 (0)341-3550-554
> > > www.eva.mpg.de

From gaganjot.kaur at sickkids.ca Wed May 29 13:34:19 2013
From: gaganjot.kaur at sickkids.ca (Gaganjot Kaur)
Date: Wed, 29 May 2013 19:34:19 +0000
Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs
Message-ID: <5A46EF8CDF7C4F46AED4F14FC3AE17645F2B65@SKMBXX01.sickkids.ca>

Hi Maker community,

I have been trying to annotate a fungal genome using maker. As I do not
have ESTs from the same species, I have been using proteins and ESTs from
two related species. Maker finishes successfully for all the scaffolds
except two. These two scaffolds are around 2 megabases each. I am running
maker-2.27, using mpiexec to run over multiple compute nodes. Please see
the error log below.

Error from first scaffold:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Sequence with id BL_ORD_ID:1562 no longer exists in database...alignment skipped
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472
STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888
STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114
STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95
STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676
STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685
STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858
STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335
STACK: main::node_thread /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:1381
STACK: threads::new /home/gkaur/perl5/lib/perl5/x86_64-linux-thread-multi/forks.pm:799
STACK: /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:864
-----------------------------------------------------------
--> rank=1, hostname=cn-r56
--> rank=1, hostname=cn-r56
--> rank=1, hostname=cn-r56
--> rank=1, hostname=cn-r56
ERROR: Failed while doing tblastx of alt-ESTs
ERROR: Chunk failed at level:4, tier_type:2
FAILED CONTIG:scaffold_52379

ERROR: Chunk failed at level:5, tier_type:0
FAILED CONTIG:scaffold_52379

Error from second scaffold:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline Sequence with id BL_ORD_ID:3014 no longer exists in database...alignment skipped
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/Root/Root.pm:472
STACK: Bio::SearchIO::blast::next_result /home/gkaur/tools/CentOS6/perl/5.14.2-usethreads/lib/site_perl/5.14.2/Bio/SearchIO/blast.pm:1888
STACK: Widget::tblastx::keepers /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:114
STACK: Widget::tblastx::parse /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Widget/tblastx.pm:95
STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2676
STACK: GI::tblastx_as_chunks /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/GI.pm:2685
STACK: Process::MpiChunk::_go /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:1858
STACK: Process::MpiChunk::run /home/gkaur/softwares/maker/maker-2.27_with_new_openmpi/bin/../lib/Process/MpiChunk.pm:335
STACK: /home/softwares/maker/maker-2.27_with_new_openmpi/bin/maker:926
-----------------------------------------------------------
--> rank=1, hostname=cn-r12
--> rank=1, hostname=cn-r12
--> rank=1, hostname=cn-r12
--> rank=1, hostname=cn-r12
ERROR: Failed while doing tblastx of alt-ESTs
ERROR: Chunk failed at level:4, tier_type:2
FAILED CONTIG:scaffold_52359

ERROR: Chunk failed at level:5, tier_type:0
FAILED CONTIG:scaffold_52359

The errors seem to come from the alt-EST data that I have been using. I
have tried running maker more than once over these two scaffolds and the
same error appears each time. I have no idea what is going wrong here.
Your help in understanding and resolving the error will be greatly
appreciated.

Thanks in advance,
Gagan

- - - - - - - - - - - - - - - - -
Gaganjot Kaur
Bioinformatics Analyst
The Centre for Applied Genomics (TCAG)
The Hospital for Sick Children
MaRS Building - East Tower
101 College St., Room 14-701
Toronto, ON M5G 1L7

From Carson.Holt at oicr.on.ca Wed May 29 21:10:52 2013
From: Carson.Holt at oicr.on.ca (Carson Holt)
Date: Thu, 30 May 2013 03:10:52 +0000
Subject: [maker-devel] Maker error: failed while doing tblastx of alt-ESTs
In-Reply-To: <5A46EF8CDF7C4F46AED4F14FC3AE17645F2B65@SKMBXX01.sickkids.ca>
Message-ID:

This is a parsing error coming from BioPerl. Could you run maker with the
--debug flag and redirect STDERR to a file? You can kill it after a few
seconds; I really just want to see the version of your BioPerl
installation. Also, what version of BLAST are you running?

--Carson
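
The two version strings asked for above can also be collected without starting a MAKER run. The following is only a sketch under stated assumptions: that perl is on the PATH, that the BioPerl installation exposes its version through the Bio::Root::Version module, and that BLAST is the BLAST+ flavour, whose programs accept -version. None of this is confirmed in the thread, and a legacy blastall installation would need a different command. The helper names (probe, collect_versions) are made up for illustration.

```python
import shutil
import subprocess

# Assumed commands; adjust them to the local installation.
VERSION_COMMANDS = {
    # Bio::Root::Version ships with BioPerl and carries the distribution version.
    "bioperl": ["perl", "-MBio::Root::Version", "-e",
                "print $Bio::Root::Version::VERSION"],
    # BLAST+ programs accept -version (assumption: BLAST+ rather than legacy BLAST).
    "tblastx": ["tblastx", "-version"],
}


def probe(command, timeout=30):
    """Return the first non-empty output line of `command`, or None on failure."""
    if shutil.which(command[0]) is None:
        return None  # executable not found
    try:
        done = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if done.returncode != 0:
        return None  # e.g. perl could not load Bio::Root::Version
    for line in (done.stdout + "\n" + done.stderr).splitlines():
        if line.strip():
            return line.strip()
    return None


def collect_versions(commands=VERSION_COMMANDS):
    """Map each tool name to its reported version line (None if unavailable)."""
    return {name: probe(cmd) for name, cmd in commands.items()}


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not found'}")
```

Run on the same compute node and in the same environment that mpiexec uses, since a different perl or BLAST may be first on the PATH there.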