From carsonhh at gmail.com Thu Sep 1 10:57:58 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 1 Sep 2016 09:57:58 -0600
Subject: [maker-devel] (no subject)
In-Reply-To: <57c83eeca5e7100000f39653@polymail.io>
References: <57c83eeca5e7100000f39653@polymail.io>
Message-ID: <0729B502-61AD-44C1-BE67-F3D561E11B2B@gmail.com>

-n 1000 is probably too high for mpich3. Its communication manager is not that robust. You can go that high with OpenMPI or MVAPICH2, but I've found that MPICH3 tops out at 100-200. Just submit multiple jobs at the lower count.

--Carson

> On Sep 1, 2016, at 8:47 AM, Mark Ebbert wrote:
>
> Thanks Carson! The help message only printed once, so everything seemed fine. I deleted all of the lock files with the following command: "find . -name *.NFSLock* -exec rm {} \;"
>
> I restarted the job and got the following segfault:
>
> "Module mpi/mpich-3.1.4_intel-15.0.3 requires compiler_intel/15.0.3. Loading it now.
> Module compiler_intel/15.0.3 requires mkl/11.2.0. Loading it now.
> mpdboot_m7-1-2 (handle_mpd_output 1000): from mpd on m7-1-2, invalid port info:
> mpd_uncaught_except_tb handling:
> : list index out of range
> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 264 pin_Uni_num
>     if list.index(list[i]) == i:
> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1449 pin_Cpuinfo
>     info['cache1'] = pin_Uni_num(info['cache1_id'], info['lcpu'])
> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1658 run
>     self.CpuInfo = pin_Cpuinfo(self.PinCase,self.Arch)
> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 3676
>     mpd.run()
> /var/spool/slurmd/job11326444/slurm_script: line 27: 29365 Segmentation fault mpiexec -n 1000 maker"
>
> Any ideas?
>
> Mark T. W. Ebbert
>
> On Tue, Aug 30, 2016 at 10:54 AM Carson Holt wrote:
> Run 'maker -help' with mpiexec.
>
> Example:
> mpiexec -n 10 maker -help
>
> If the MPI communication ring is working correctly, then it will print the help message only once (from the root process). If it is not working, it will print the help message 10 times because each of the 10 MPI processes will think it is the root process. It is a simple test that can identify if it is an MPI issue or not.
>
> If it is not an MPI issue, you can just search for the NFSLock files using find and delete them.
>
> --Carson
>
>> On Aug 30, 2016, at 10:10 AM, Mark Ebbert wrote:
>>
>> Good day everyone!
>>
>> I'm getting the error stating: "WARNING: Multiple MAKER processes have been started in the same directory." Everything I've seen mentions version issues with MPICH. The difference in my situation is that my initial run ran just fine, but died because of the cluster time constraints. We're only allowed 3 days.
>>
>> There are a bunch of .NFSLock files in the output directory. I'm guessing Maker wasn't able to clear the locks when the jobs died? Can I safely delete those lock files? What's the best way to handle this going forward since I can only run jobs for 3 days at a time?
>>
>> Thanks!
>>
>> Mark T. W. Ebbert
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Thu Sep 1 11:03:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 1 Sep 2016 10:03:21 -0600
Subject: [maker-devel] (no subject)
In-Reply-To: <57c8503d99faf40000c7acab@polymail.io>
References: <0729B502-61AD-44C1-BE67-F3D561E11B2B@gmail.com> <57c8503d99faf40000c7acab@polymail.io>
Message-ID: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>

MAKER will use locks to divide up work between simultaneously running jobs. So submitting five 200-CPU jobs will give you the same throughput and will be more stable. The jobs will probably move through the queue faster as well.

--Carson

> On Sep 1, 2016, at 10:00 AM, Mark Ebbert wrote:
>
> Bummer. It worked at 720 the first time. Thanks again!
>
> Mark T. W. Ebbert

From eaalvarado at cpp.edu Thu Sep 1 14:44:18 2016
From: eaalvarado at cpp.edu (Emilio A. Alvarado Ortiz)
Date: Thu, 1 Sep 2016 19:44:18 +0000
Subject: [maker-devel] MAKER Exonerate Error

Hello,

I am currently running MAKER version 2.31.8 using MPI but I keep getting the following error when running Exonerate:

Maker command used: mpiexec -mca btl ^openib -n 16 maker

** (process:18773): WARNING **: Compiled with assertion checking - will run slowly
**
ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA)
sh: line 1: 18771 Aborted /usr/bin/exonerate -q /media/raid/tmp/maker_DYrlgS/9/gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.for.21933-23968
** (process:18775): WARNING **: Compiled with assertion checking - will run slowly
.9.fasta -t /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.9.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.p.exonerate
**

Attached is the error log and the maker_opts.ctl file. Do you know a workaround for this problem? I would really appreciate your help.

Regards,

Emilio A. Ortiz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MAKER.error.log
Type: application/octet-stream
Size: 7891 bytes
Desc: MAKER.error.log
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4722 bytes
Desc: maker_opts.ctl

From carsonhh at gmail.com Fri Sep 2 16:08:50 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 2 Sep 2016 15:08:50 -0600
Subject: [maker-devel] MAKER Exonerate Error
Message-ID: <38035C41-1DE1-4512-B92F-AC60C182BBE8@gmail.com>

This is coming from exonerate. You may need to reinstall it from source rather than using the precompiled binaries.

--Carson

From carsonhh at gmail.com Fri Sep 2 16:11:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 2 Sep 2016 15:11:19 -0600
Subject: [maker-devel] maker2.31.8 _ failure when processing repeats
In-Reply-To: <1F3B92BC-1717-4CB5-A26D-6F2126667E53@fas.harvard.edu>
References: <76B77664-2EA9-45AA-A1C1-1B5124DC0025@fas.harvard.edu> <01FEE9E8-69C7-4E42-9E8C-07E029BB01A5@gmail.com> <1F3B92BC-1717-4CB5-A26D-6F2126667E53@fas.harvard.edu>
Message-ID: <33D1488A-E060-4760-AA99-6AB9B71EFADE@gmail.com>

It will use both. It shouldn't hurt setting both. It has more to do with expected attributes in column 8 (rm_gff is more forgiving).

--Carson

> On Aug 30, 2016, at 2:31 PM, Lassance, Jean-Marc wrote:
>
> Let me clarify one thing: the first pass was performed with Maker running RepeatMasker internally, which is why I decided to use them in the second pass, as well as the data from the independent run of RepeatMasker.
>
> From reading earlier posts, I gathered that Maker would use first the evidence from the rm_gff, and then from maker_gff if rm_pass=1 is activated, but that having both would not hurt. Correct?
>
> JM
>
>> On Aug 30, 2016, at 4:12 PM, Carson Holt wrote:
>>
>> Also make sure you pass the data in using rm_gff and not maker_gff if the repeats were not MAKER generated.
>>
>> --Carson
>>
>>> On Aug 30, 2016, at 10:16 AM, Daniel Ence wrote:
>>>
>>> Hi Jean-Marc, so the first question I have is whether maker is still annotating repeats, even though you're providing the rm_gff file. Are you providing a file or parameter for repeat masker in the maker_opts.ctl file?
>>>
>>> And secondly, what about the scaffold that is failing? How long is it, what is the percent N's in the sequence there, and how much of it was masked in the rm_gff file?
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>>
>>>> On Aug 30, 2016, at 7:19 AM, Lassance, Jean-Marc wrote:
>>>>
>>>> Hi.
>>>>
>>>> I am using Maker2.31.8 to annotate a mammalian genome (with OpenMPI, Linux server).
>>>>
>>>> Basically, after running Maker a first time to generate a training set for SNAP, I am running it a second time with SNAP and Augustus enabled. Because we ran RepeatMasker independently, I am providing the gff3 like so:
>>>>
>>>> rm_gff=myanimal.repeatmasker.out.gff3
>>>>
>>>> #-----Re-annotation Using MAKER Derived GFF3
>>>> maker_gff=myanimal.all.maker.pass1.gff
>>>> rm_pass=1
>>>>
>>>> Things seem to progress nicely (the vast majority of the scaffolds "finish"), but one of the scaffolds keeps failing (I have attempted to restart after erasing the entire content of the output folder). This is the message that I could associate with this error:
>>>>
>>>> Died at /n/sw/fasrcsw/apps/MPI/gcc/4.8.2-fasrc01/openmpi/1.10.0-fasrc01/maker/2.31.8-fasrc01/bin/../perl/lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>> --> rank=26, hostname=holy2a11102.rc.fas.harvard.edu
>>>> ERROR: Failed while processing all repeats
>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>> FAILED CONTIG:scaffold00013
>>>>
>>>> I wonder if you have an idea of what could be wrong here.
>>>>
>>>> Thanks for your help,
>>>>
>>>> Jean-Marc
>>>>
>>>> ------------------
>>>> Jean-Marc Lassance, PhD
>>>>
>>>> Harvard University
>>>> Department of Organismic and Evolutionary Biology
>>>> Department of Molecular and Cellular Biology
>>>> Museum of Comparative Zoology
>>>>
>>>> 26, Oxford Street
>>>> Cambridge MA 02138
>>>> USA
>>>>
>>>> email: lassance at fas.harvard.edu
>>>> twitter: @lassancejm

From sullis02 at nyu.edu Tue Sep 6 14:37:51 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 6 Sep 2016 15:37:51 -0400
Subject: [maker-devel] antisense RNA in training set?

I have a set of assembled transcripts from a stranded RNA-seq run that I want to use for gene finder training in a MAKER run on a 'new' organism. I've noticed, though, that some of my assembled transcripts actually appear to be antisense RNAs. Should I include these in the training set?

From carsonhh at gmail.com Tue Sep 6 15:09:11 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 6 Sep 2016 14:09:11 -0600
Subject: [maker-devel] antisense RNA in training set?
Message-ID: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com>

MAKER does not require input evidence to be on the correct strand because it performs splice-aware alignments via Exonerate against both strands (reverse transcription for the second alignment happens internally). Exonerate should always map spliced alignments to the right strand because it is not possible to get correct splicing on the opposite strand (splice sites are a stranded feature). The only alignments that are ambiguous are single exon alignments. They are ignored by default, but when not ignored they are stranded to the sequence with the longest canonical ORF.

--Carson

From eaalvarado at cpp.edu Tue Sep 6 16:00:59 2016
From: eaalvarado at cpp.edu (Emilio A. Alvarado Ortiz)
Date: Tue, 6 Sep 2016 21:00:59 +0000
Subject: [maker-devel] MAKER mpi install Error

Hello,

I am trying to install MAKER with MPI on a Scientific Linux machine but I keep getting the following error:

[stilllab at lettucelab src]$ ./Build clean
Cleaning up build files
[stilllab at lettucelab src]$ ./Build install
Configuring MAKER with MPI support
Had problems bootstrapping Inline module 'Parallel::Application::MPI'

Can't load '/home/stilllab/Documents/maker/src/blib/lib/auto/Parallel/Application/MPI/MPI.so' for module Parallel::Application::MPI: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/stilllab/.linuxbrew/lib/libmpi.so.12) at /usr/lib64/perl5/DynaLoader.pm line 200.
 at /usr/local/share/perl5/Inline.pm line 533.

 at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 236.
 at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 256.
    Parallel::Application::MPI::_bind("/home/stilllab/mpich3/bin/mpicc", "/home/stilllab/mpich3/include", "blib", "") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 277
    MAKER::Build::ACTION_build(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010
    Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "build") called at /usr/local/share/perl5/Module/Build/Base.pm line 1993
    Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0), "build") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 469
    MAKER::Build::ACTION_install(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010
    Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "install") called at /usr/local/share/perl5/Module/Build/Base.pm line 1998
    Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0)) called at ./Build line 62

Do you know a workaround for this problem? Thank you for your help.

Regards,

Emilio A. Ortiz

From sullis02 at nyu.edu Wed Sep 7 09:39:11 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Wed, 7 Sep 2016 10:39:11 -0400
Subject: [maker-devel] antisense RNA in training set?
In-Reply-To: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com>
References: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com>

My organism's genome is predicted to have extremely few introns. Does that mean I should change the default alignment behavior for single exons?

--
Dr. Steven Sullivan
Center for Genomics & Systems Biology
New York University
12 Waverly Place
New York, NY 10003

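For reference, the single-exon behavior asked about above is set in maker_opts.ctl. The fragment below is a minimal sketch, assuming the option names and defaults of the stock MAKER 2.31 control file; the values are illustrative, and this thread does not settle whether enabling the option is advisable for an intron-poor genome.

```
#-----maker_opts.ctl fragment (illustrative; compare against the comments in your own generated file)
single_exon=1     #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no (stock default is 0)
single_length=250 #min length required for single exon ESTs if single_exon is enabled
```

Single-exon alignments carry no splice sites, so, as noted earlier in the thread, they are stranded to the sequence with the longest canonical ORF rather than by splicing.
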
From sullis02 at nyu.edu Wed Sep 7 16:04:56 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Wed, 7 Sep 2016 17:04:56 -0400
Subject: [maker-devel] General question about RNA evidence

The MAKER documentation I can access (wiki tutorials) seems somewhat out of date as regards RNA evidence, as it focuses a lot on ESTs, whereas today RNA-seq data would likely be more common.

So a general question I have is, for a new eukaryotic organism with no models, is it better to use assembled RNA-seq reads (i.e., putative transcripts generated by Trinity) as input to 1) ab initio predictors and as 2) MAKER alignment evidence, or is it better to use the reads themselves, unassembled?

From carsonhh at gmail.com Wed Sep 7 17:19:43 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Sep 2016 16:19:43 -0600
Subject: [maker-devel] MAKER mpi install Error

The error is with your OpenMPI install. It says that GLIBC does not match for /home/stilllab/.linuxbrew/lib/libmpi.so.12

You may need to reinstall. Perhaps manually. If you are using a homebrew package manager, there may be version mismatches with your system.

--Carson

From carsonhh at gmail.com Wed Sep 7 17:31:18 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Sep 2016 16:31:18 -0600
Subject: [maker-devel] General question about RNA evidence

You need to assemble the reads using something like Trinity. The assembled results can be aligned to the proper strand with much greater specificity using splice-aware alignments. Use the jaccard index options when running Trinity.

--Carson

From me.mark at gmail.com Wed Sep 14 13:11:46 2016
From: me.mark at gmail.com (Mark Ebbert)
Date: Wed, 14 Sep 2016 11:11:46 -0700
Subject: [maker-devel] (no subject)
In-Reply-To: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>
References: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>
Message-ID: <57d6c562d78dd70000998701@polymail.io>

Hey Carson!

I'm getting a new issue. I think I need to recompile Maker with MPICH instead of openmpi. I'm getting the following errors when I try to run "mpiexec -n 10 maker -help". I tried running "./Build clean" followed by "./Build install" after updating LD_PRELOAD with the path to MPICH, but I'm still getting the error. I was also trying to access Maker documentation at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php to review detailed installation instructions (I think it's there), but the website is down.

I appreciate your help.

"Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xa0a5d620, rank=0x7ffd20bb8d9c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x51f83620, rank=0x7ffc6023b7fc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8b342620, rank=0x7ffde14f02fc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xf8f24620, rank=0x7ffe71c9a5bc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8c074620, rank=0x7ffc70e50b6c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xdac15620, rank=0x7ffc67bf0e2c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xbb65620, rank=0x7ffc17a1d1bc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x2aa3b620, rank=0x7fff551201dc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xd2453620, rank=0x7fffaebe21cc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb24e8620, rank=0x7ffdd838bbfc) failed
PMPI_Comm_rank(68).: Invalid communicator

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 2462 RUNNING AT m7int02
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
==================================================================================="

Mark T. W. Ebbert

From carsonhh at gmail.com Wed Sep 14 13:15:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Sep 2016 12:15:19 -0600
Subject: [maker-devel] (no subject)
In-Reply-To: <57d6c562d78dd70000998701@polymail.io>
References: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com> <57d6c562d78dd70000998701@polymail.io>
Message-ID: <0C568590-0A33-46DD-95FD-271D9A8E0009@gmail.com>

Unset LD_PRELOAD. It really is only an OpenMPI issue, and may affect MPICH2 in a bad way. Also do './Build realclean' (a bit more thorough) in the source directory, then remove the .../maker/perl directory before reinstalling. That will force reinstall of all missing perl dependencies and the perl/MPI bindings.

--Carson

> On Sep 14, 2016, at 12:11 PM, Mark Ebbert wrote:
>
> Hey Carson!
>
> I'm getting a new issue. I think I need to recompile Maker with MPICH instead of openmpi. I'm getting the following errors when I try to run "mpiexec -n 10 maker -help". I tried running "./Build clean" followed by "./Build install" after updating LD_PRELOAD with the path to MPICH, but I'm still getting the error. I was also trying to access Maker documentation at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php to review detailed installation instructions (I think it's there), but the website is down.
>
> I appreciate your help.
>
> "Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xa0a5d620, rank=0x7ffd20bb8d9c) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x51f83620, rank=0x7ffc6023b7fc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8b342620, rank=0x7ffde14f02fc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xf8f24620, rank=0x7ffe71c9a5bc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8c074620, rank=0x7ffc70e50b6c) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xdac15620, rank=0x7ffc67bf0e2c) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xbb65620, rank=0x7ffc17a1d1bc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x2aa3b620, rank=0x7fff551201dc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xd2453620, rank=0x7fffaebe21cc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb24e8620, rank=0x7ffdd838bbfc) failed
> PMPI_Comm_rank(68).: Invalid communicator
>
=================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 2462 RUNNING AT m7int02 > = EXIT CODE: 1 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > ===================================================================================" > > Mark T. W. Ebbert > > On Thu, Sep 01, 2016 at 10:03 AM Carson Holt <> wrote: > MAKER will use locks to divide up work between simultaneously running jobs. So submitting five 200 CPU jobs, will give you the same throughput, and will be more stable. The jobs will probably move through the queue faster as well. > > ?Carson > > > >> On Sep 1, 2016, at 10:00 AM, Mark Ebbert > wrote: >> >> >> Bummer. It worked at 720 the first time. Thanks again! >> >> Mark T. W. Ebbert >> >> On Thu, Sep 01, 2016 at 9:57 AM Carson Holt <> wrote: >> -n 1000 is probably too high for mpich3. It?s communication manager is not that robust. You can go that high with OpenMPI or MVAPICH2, but I?ve found that MPICH3 tops out at 100-200. Just submit multiple jobs at the lower count. >> >> ?Carson >> >> >> >>> On Sep 1, 2016, at 8:47 AM, Mark Ebbert > wrote: >>> >>> >>> Thanks Carson! The help message only printed once, so everything seemed fine. I deleted all of the lock files with the following command: ?find . -name *.NFSLock* -exec rm {} \;? >>> >>> I restarted the job and got the following segfault: >>> >>> ?Module mpi/mpich-3.1.4_intel-15.0.3 requires compiler_intel/15.0.3. Loading it now. >>> Module compiler_intel/15.0.3 requires mkl/11.2.0. Loading it now. 
>>> mpdboot_m7-1-2 (handle_mpd_output 1000): from mpd on m7-1-2, invalid port info: >>> mpd_uncaught_except_tb handling: >>> : list index out of range >>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 264 pin_Uni_num >>> if list.index(list[i]) == i: >>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1449 pin_Cpuinfo >>> info['cache1'] = pin_Uni_num(info['cache1_id'], info['lcpu']) >>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1658 run >>> self.CpuInfo = pin_Cpuinfo(self.PinCase,self.Arch) >>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 3676 >>> mpd.run() >>> /var/spool/slurmd/job11326444/slurm_script: line 27: 29365 Segmentation fault mpiexec -n 1000 maker? >>> >>> Any ideas? >>> >>> Mark T. W. Ebbert >>> >>> On Tue, Aug 30, 2016 at 10:54 AM Carson Holt <> wrote: >>> Run 'maker -help? with mpiexec. >>> >>> Example: >>> mpiexec -n 10 maker -help >>> >>> If the MPI communication ring is working correctly, then it will print the help message only once (from the root process). If it is not working, it will print the help message 10 time because each of the 10 MPI processes will think they are the root process. It is a simple test that can identify if it is an MPI issue or not. >>> >>> If it is not an MPI issue, you can just search for the NFSLock files using find and delete them,. >>> >>> ?Carson >>> >>> >>>> On Aug 30, 2016, at 10:10 AM, Mark Ebbert > wrote: >>>> >>>> >>>> Good day everyone! >>>> >>>> I?m getting the error stating: ?WARNING: Multiple MAKER processes have been started in the same directory.? Everything I?ve seen mentions version issues with MPICH. The difference in my situation is that my initial run ran just fine, but died because of the cluster time constraints. We?re only allowed 3 days. >>>> >>>> There are a bunch of .NFSLock files in the output directory. I?m guessing Maker wasn?t able to clear the locks when the jobs died? 
Can I safely delete those lock files? What?s the best way to handle this going forward since I can only run jobs for 3 days at a time? >>>> >>>> Thanks! >>>> >>>> Mark T. W. Ebbert >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Mon Sep 19 03:30:06 2016 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez_Campos?=) Date: Mon, 19 Sep 2016 18:30:06 +1000 Subject: [maker-devel] questions about post-processing of annotations Message-ID: Hi Carson, I'm trying to go through the post processing step in the tutorial (GMOD2014) but I think something is not right with the functional annotation as no new information is added to the *.putative_function.* files when I run the maker_functional_gff or the maker_functional_fasta. All the fasta headings remain unchanged and the gff files don't show any change. I'm using Maker 2.31.6 by the way. Because there are no examples showing what I should expect I'm a bit lost. These are my files prior to the functional annotation. FRL.all.iprscan.renamed.tsv > FRL.all.maker.proteins.blastout.sprot.renamed.tsv > FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.transcripts.renamed.fasta > FRL.all.maker.trnascan.transcripts.renamed.fasta > FRL.all.renamed.gff > FRL.map > And this, an example of the command I'm using maker_functional_fasta uniprot_sprot.fasta FRL.all.maker.proteins.blastout.sprot.renamed.tsv FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.proteins.renamed.putative_function.fasta Thank you in advance. Xabi PS: the tutorial mentions to use the "standard" IPRS output but by default it gives xml, gff3 and tsv files. Which one should I use? 
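One quick check when the putative-function output looks unchanged is whether the query IDs in the BLAST report still match the IDs in the FASTA file. The following is only a sketch using standard Unix tools and is not part of MAKER; the function name shared_ids is made up for illustration, and it assumes a tab-delimited report with the query ID in column 1 and FASTA IDs that end at the first whitespace.

```shell
# Hypothetical helper (not a MAKER script): count the sequence IDs shared by a
# tab-delimited BLAST report (query ID in column 1) and a FASTA file.
shared_ids() {
    report="$1"
    fasta="$2"
    tmpdir=$(mktemp -d) || return 1
    # Unique query IDs from column 1 of the report
    cut -f1 "$report" | sort -u > "$tmpdir/blast_ids"
    # Unique FASTA IDs: drop the leading '>' and anything after the first whitespace
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' | sort -u > "$tmpdir/fasta_ids"
    # comm -12 prints only the lines present in both sorted lists
    comm -12 "$tmpdir/blast_ids" "$tmpdir/fasta_ids" | wc -l | tr -d ' '
    rm -rf "$tmpdir"
}
```

For example, `shared_ids FRL.all.maker.proteins.blastout.sprot.renamed.tsv FRL.all.maker.proteins.renamed.fasta` prints the number of shared IDs; a count of 0 would suggest that the two files use different names.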
--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Mon Sep 19 17:07:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:07:59 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID:

maker_functional_fasta reads the results of a BLAST report. It must be a tab-delimited BLAST report (-outfmt 6 under BLAST+) with UniProt as the database and the MAKER FASTA file as the query. If you renamed the transcripts in the FASTA before running maker_functional_fasta, the results in the BLAST report will no longer match (because they have new names). Use the map_data_ids script to fix the names in the BLAST report if you did that.

Thanks,
Carson

From xvazquezc at gmail.com Mon Sep 19 17:27:03 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Sep 2016 08:27:03 +1000
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID:

Yes, my BLAST output is -outfmt 6 using UniProt/Swiss-Prot as the database. I used the MAKER protein FASTA file as the query (should I do the same with the transcripts?).

According to the tutorial, the steps are:

maker_map_ids
map_gff_ids
map_fasta_ids (for MAKER proteins and transcripts)
map_data_ids (for BLAST and IPRS output)

and then the maker_functional_* steps.

So, the rename steps come before maker_functional. Are you saying it should be the other way around?

--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Mon Sep 19 17:43:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:43:19 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>

You just have to make sure the BLAST job that produced the report was run after renaming. If you ran it before, then the names in the report will not match the renamed FASTA. The BLAST job should be blastp (protein to protein). You can check by just looking at the report.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From xvazquezc at gmail.com Mon Sep 19 17:46:24 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Sep 2016 08:46:24 +1000
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
References: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
Message-ID:

I see. I ran blastp before starting the "post-processing of annotations" step. I guess I should do the same with IPRS.

By the way, can you confirm whether I need to BLAST the maker.transcripts.fasta?

Thanks a lot

--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Mon Sep 19 17:50:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:50:40 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
Message-ID:

No, it is the protein-to-protein BLAST you need (which is where cross-species homology occurs). Since the proteins come from the transcripts, doing a translated transcript BLAST (blastx in this case) would be redundant, as well as less accurate because artifacts of a six-frame translation reduce the significance of the alignment.

--Carson

From sullis02 at nyu.edu Mon Sep 19 23:21:18 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 20 Sep 2016 00:21:18 -0400
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
Message-ID:

I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline.

As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.
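For orientation, both kinds of input described above are set in the MAKER control file. A minimal sketch, assuming a maker_opts.ctl as generated by `maker -CTL` and a made-up helper name show_evidence_settings, that lists only the evidence and gene-finder settings of a run:

```shell
# Hypothetical helper: print the evidence- and predictor-related assignments
# from a MAKER control file, with the trailing '#' comments stripped.
show_evidence_settings() {
    grep -E '^(est|protein|snaphmm|augustus_species|est2genome|protein2genome)=' "$1" \
        | sed 's/[[:space:]]*#.*$//'
}
```

Running `show_evidence_settings maker_opts.ctl` shows at a glance which evidence files (est=, protein=) and which trained predictors (snaphmm=, augustus_species=) a given round uses.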
The gene finders require a prior training step, and the training sub-protocol in Campbell et al. 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exists for a newly sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round.

But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species had already been deposited in public databases (GenBank, Swiss-Prot) by various labs over the years, to accompany various publications.

Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself?

Or am I misunderstanding the workflow?

From carsonhh at gmail.com Mon Sep 19 23:34:31 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 22:34:31 -0600
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To:
References:
Message-ID: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>

The training does not so much involve the sequence as the structure (i.e. intron/exon, start, stop, etc.). You could use the deposited evidence as input to the iterative process described, but not directly. This is because you have the sequence but not the structure.

What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because BLAST alignments are not splice-aware), then identify correct open reading frames with start and stop codons. The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.).
All of which is not directly available from the protein or transcript data. But once it?s been polished against the reference, the structure can be discovered. After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hint. As a result the second round of predictions are better than the first, and can be used as training to improve the HMM. ?Carson > On Sep 19, 2016, at 10:21 PM, Steven Sullivan wrote: > > I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline > > As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel. > > The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exist for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round. > > But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications. > > Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself? > > Or am I misunderstanding the workflow? 

From dence at genetics.utah.edu  Mon Sep 19 23:45:02 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 20 Sep 2016 04:45:02 +0000
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
Message-ID: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>

Just chiming in with my own perspective on the question. The gold-standard genes can be used as input for training the gene predictors and also as evidence for the genome annotation. Presumably, you'll have much more evidence than the gold-standard genes for the annotation, so it won't be circular. As Carson said, the gene predictors are using the structure of the alignments of the input, rather than the sequence itself. The other source for input for gene predictors, in the case of a true bootstrap where you have no gold-standard, would be to use alignments generated by a program, like BUSCO or CEGMA, that identifies conserved orthologs in the genome.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Sep 19, 2016, at 10:34 PM, Carson Holt wrote:
>
> The training does not involve so much the sequence, rather the structure (i.e. intron, exon, start, stop, etc.). You could use the evidence deposited as input to the iterative process described, but not directly. This is because you have the sequence but not the structure.
>
> What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because blast alignments are not splice aware), then identify correct open reading frames with start and stop codons. The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.). All of which is not directly available from the protein or transcript data. But once it's been polished against the reference, the structure can be discovered.
>
> After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hint. As a result the second round of predictions are better than the first, and can be used as training to improve the HMM.
>
> --Carson
>
>> On Sep 19, 2016, at 10:21 PM, Steven Sullivan wrote:
>>
>> I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline.
>>
>> As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.
>>
>> The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exists for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round.
>>
>> But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications.
>>
>> Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself?
>>
>> Or am I misunderstanding the workflow?

From cjfields at illinois.edu  Tue Sep 20 14:17:24 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 20 Sep 2016 19:17:24 +0000
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
 <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

I can add that BUSCO did work well as a first-pass bootstrap (with the added convenience of running Augustus for generating an initial model).

chris

> On Sep 19, 2016, at 11:45 PM, Daniel Ence wrote:
>
> Just chiming in with my own perspective on the question. The gold-standard genes can be used as input for training the gene predictors and also as evidence for the genome annotation. Presumably, you'll have much more evidence than the gold-standard genes for the annotation, so it won't be circular. As Carson said, the gene predictors are using the structure of the alignments of the input, rather than the sequence itself. The other source for input for gene predictors, in the case of a true bootstrap where you have no gold-standard, would be to use alignments generated by a program, like BUSCO or CEGMA, that identifies conserved orthologs in the genome.

From sullis02 at nyu.edu  Tue Sep 20 14:28:20 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 20 Sep 2016 15:28:20 -0400
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
 <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

Thanks!
So, I think for training the gene predictors, I'll try to identify any sequences in my gold-standard set that have structural information, i.e. genes for which the genomic sequence was cloned, and use those. But I doubt there's enough of those to train e.g. Augustus, so I'll probably have to use the bootstrap method as well. Is there a way to combine both?

For the BLAST-based annotation, if I use entire Uniprot/Swissprot or Genbank FASTA sets as protein homology evidence, my gold standards are already included in those. I gather from these replies that that's not a problem.

However, there *are* public database sequences (predicted genes from an older annotation of this species) that I *do* want to exclude from evidence. (Because we want to run MAKER as if this genome was 'new', never before annotated.) Can I use something like the -negative_gilist option in blastp, to omit previous genome project predictions from consideration? (An option that only works with Genbank sequences, I think.) Or do I have to create a custom version of the large public database?

From carsonhh at gmail.com  Tue Sep 20 15:15:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Sep 2016 14:15:21 -0600
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To:
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
 <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

You would need to create a custom database without the sequences you wish to exclude.

--Carson

> On Sep 20, 2016, at 1:28 PM, Steven Sullivan wrote:
>
> Thanks! So, I think for training the gene predictors, I'll try to identify any sequences in my gold-standard set that have structural information, i.e. genes for which the genomic sequence was cloned, and use those. But I doubt there's enough of those to train e.g. Augustus, so I'll probably have to use the bootstrap method as well. Is there a way to combine both?
>
> For the BLAST-based annotation, if I use entire Uniprot/Swissprot or Genbank FASTA sets as protein homology evidence, my gold standards are already included in those. I gather from these replies that that's not a problem.
>
> However, there *are* public database sequences (predicted genes from an older annotation of this species) that I *do* want to exclude from evidence. (Because we want to run MAKER as if this genome was 'new', never before annotated.) Can I use something like the -negative_gilist option in blastp, to omit previous genome project predictions from consideration? (An option that only works with Genbank sequences, I think.) Or do I have to create a custom version of the large public database?

From psh65 at cornell.edu  Tue Sep 20 15:33:42 2016
From: psh65 at cornell.edu (Prashant S Hosmani)
Date: Tue, 20 Sep 2016 20:33:42 +0000
Subject: [maker-devel] mapping cDNA to updated genome
In-Reply-To: <9FBCB1C4-C319-4933-8741-53DAFCB82458@gmail.com>
References: <646B795A-1B04-4300-94C7-BEBEF0B37323@gmail.com>
 <9FBCB1C4-C319-4933-8741-53DAFCB82458@gmail.com>
Message-ID: <55D0187E-8C48-40DA-91BE-6370D46D041F@cornell.edu>

Hi Mike and Carson,

Thank you for your help. I used a masked genome for aligning the cDNAs. And yes, this was due to multiply aligning cDNAs. I guess you could also filter genes according to the alignment score from the GFF.

I used GMAP (http://research-pub.gene.com/gmap/) to align the cDNAs onto the updated genome. GMAP has parameters to filter based on alignment scores and can also choose the best path per cDNA.

Regards,
Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA

On Aug 31, 2016, at 12:12 PM, Carson Holt wrote:

Also if you have multiple alignments of the same cDNA, you can use the score column of the mRNA feature to see which aligns best. If they have the same score, you will have to disambiguate manually or just remove all copies.

--Carson

On Aug 31, 2016, at 10:10 AM, Michael Campbell wrote:

Hi Prashant,

I'm almost positive that the additional genes are coming from multiply aligning cDNAs. Did you repeat mask your genome before mapping things forward? Another thought: what kind of whole genome duplications has your plant been through? It may be that the multiple alignments are to pseudogenes in some stage of decay. If that is the case it would probably be safe to keep the gene from the longest/best aligned cDNA.

Thanks,
Mike

On Aug 31, 2016, at 10:35 AM, Prashant S Hosmani wrote:

Hi All,

I am working on updating a plant genome annotation. I would like to map genes from the previous annotation to a new genome build. There is a protocol about this in Campbell et al 2014, Current Protocols in Bioinformatics (Basic Protocol 4 - Mapping annotations to a new assembly). I followed that protocol exactly, setting est_forward=1. But in the output I'm getting a large number of genes. My input cDNA fasta contains ~35K genes and after mapping there are ~58K genes. I'm using maker version 3.0.

There are few changes in the genome and I'm not expecting many changes in the mapping of previous genes. Please let me know if there are any other parameters to control mapping of ESTs. I was hoping to get a similar number of genes mapped onto the new assembly with very few changes.

Thank you for your help in advance.

Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA
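
A rough sketch of the filtering suggested in the thread above (keep the best-scoring mRNA alignment for each cDNA, and set aside ties to check by hand or drop). This is only an illustration, not part of MAKER. It assumes the GFF3 is tab-delimited with the score in column 6, and that the `Name` attribute of each mRNA feature identifies the source cDNA; that attribute choice is an assumption and may need changing for a given output file. The function and parameter names are made up for the example.

```python
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 attributes string (key=value;key=value) into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def best_mrna_per_cdna(gff_lines, key_attr="Name"):
    """Group mRNA features by source cDNA and pick the top scorer.

    key_attr is the attribute ASSUMED to name the source cDNA.
    Returns (best, tied):
      best maps cDNA id -> ID of its single top-scoring mRNA
      tied maps cDNA id -> sorted IDs of mRNAs sharing the top score
    """
    hits = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows, no more features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # keep only scored mRNA features
        if len(cols) != 9 or cols[2] != "mRNA" or cols[5] == ".":
            continue
        attrs = parse_attributes(cols[8])
        if key_attr not in attrs or "ID" not in attrs:
            continue
        hits[attrs[key_attr]].append((float(cols[5]), attrs["ID"]))

    best, tied = {}, {}
    for cdna, scored in hits.items():
        top = max(score for score, _ in scored)
        winners = sorted(mrna for score, mrna in scored if score == top)
        if len(winners) == 1:
            best[cdna] = winners[0]
        else:
            tied[cdna] = winners  # disambiguate manually or remove
    return best, tied
```

It could be called as `best, tied = best_mrna_per_cdna(open("mapped.gff"))`, where the file name is a placeholder. The IDs in `best` are the mRNAs to keep; everything in `tied` is left for manual review.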

From sullis02 at nyu.edu  Thu Sep 22 11:27:53 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Thu, 22 Sep 2016 12:27:53 -0400
Subject: [maker-devel] should EST evidence be cleaned, assembled?
Message-ID:

Do EST sequences (as opposed to RNA Seq data) need to be cleaned (e.g., vector sequence trimmed, Ns removed) and assembled (combined into longer 'EST contigs' where possible) before use as MAKER alignment evidence?

--
Dr. Steven Sullivan
Center for Genomics & Systems Biology
New York University
12 Waverly Place
New York, NY 10003

From carsonhh at gmail.com  Mon Sep 26 10:28:41 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Sep 2016 09:28:41 -0600
Subject: [maker-devel] should EST evidence be cleaned, assembled?
In-Reply-To:
References:
Message-ID:

You will want to trim the vector or any sequence not representative of the transcript or else it will not align well. The sequences will be aligned directly against the assembly.

--Carson

> On Sep 22, 2016, at 10:27 AM, Steven Sullivan wrote:
>
> Do EST sequences (as opposed to RNA Seq data) need to be cleaned (e.g., vector sequence trimmed, Ns removed) and assembled (combined into longer 'EST contigs' where possible) before use as MAKER alignment evidence?
>
>
> --
> Dr.
> Steven Sullivan
> Center for Genomics & Systems Biology
> New York University
> 12 Waverly Place
> New York, NY 10003

From eaalvarado at cpp.edu  Thu Sep  1 13:44:18 2016
From: eaalvarado at cpp.edu (Emilio A. Alvarado Ortiz)
Date: Thu, 1 Sep 2016 19:44:18 +0000
Subject: [maker-devel] MAKER Exonerate Error
Message-ID:

Hello,

I am currently running MAKER version 2.31.8 using MPI but I keep getting the following error when running Exonerate:

Maker command used: mpiexec -mca btl ^openib -n 16 maker

** (process:18773): WARNING **: Compiled with assertion checking - will run slowly
**
ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA)
sh: line 1: 18771 Aborted /usr/bin/exonerate -q /media/raid/tmp/maker_DYrlgS/9/gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.for.21933-23968
** (process:18775): WARNING **: Compiled with assertion checking - will run slowly
.9.fasta -t /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.9.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.p.exonerate
**

Attached is the Error log and the maker_opts.ctl file. Do you know a workaround this problem? I would really appreciate your help.

Regards,
Emilio A. Ortiz

From carsonhh at gmail.com  Fri Sep  2 15:08:50 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 2 Sep 2016 15:08:50 -0600
Subject: [maker-devel] MAKER Exonerate Error
In-Reply-To:
References:
Message-ID: <38035C41-1DE1-4512-B92F-AC60C182BBE8@gmail.com>

This is coming from exonerate. You may need to reinstall it from source rather than using the precompiled binaries.

--Carson

> On Sep 1, 2016, at 1:44 PM, Emilio A. Alvarado Ortiz wrote:
>
> Hello,
>
> I am currently running MAKER version 2.31.8 using MPI but I keep getting the following error when running Exonerate:
>
> Maker command used: mpiexec -mca btl ^openib -n 16 maker
>
> ** (process:18773): WARNING **: Compiled with assertion checking - will run slowly
> **
> ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA)
> sh: line 1: 18771 Aborted /usr/bin/exonerate -q /media/raid/tmp/maker_DYrlgS/9/gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.for.21933-23968
> ** (process:18775): WARNING **: Compiled with assertion checking - will run slowly
> .9.fasta -t /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.9.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.p.exonerate
> **
>
> Attached is the Error log and the maker_opts.ctl file. Do you know a workaround this problem? I would really appreciate your help.
>
> Regards,
>
> Emilio A. Ortiz

From carsonhh at gmail.com  Fri Sep  2 15:11:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 2 Sep 2016 15:11:19 -0600
Subject: [maker-devel] maker2.31.8 _ failure when processing repeats
In-Reply-To: <1F3B92BC-1717-4CB5-A26D-6F2126667E53@fas.harvard.edu>
References: <76B77664-2EA9-45AA-A1C1-1B5124DC0025@fas.harvard.edu>
 <01FEE9E8-69C7-4E42-9E8C-07E029BB01A5@gmail.com>
 <1F3B92BC-1717-4CB5-A26D-6F2126667E53@fas.harvard.edu>
Message-ID: <33D1488A-E060-4760-AA99-6AB9B71EFADE@gmail.com>

It will use both. It shouldn't hurt setting both. It has more to do with expected attributes in column 8 (rm_gff is more forgiving).

--Carson

> On Aug 30, 2016, at 2:31 PM, Lassance, Jean-Marc wrote:
>
> Let me clarify one thing: the first pass was performed with Maker running repeatMasker internally, which is why I decided to use them in the second pass, as well as the data from the independent run of RepeatMasker.
>
> From reading earlier posts, I gathered that Maker would use first the evidence from the rm_gff, and then from maker_gff if rm_pass=1 is activated, but that having both would not hurt. Correct?
>
> JM
>
>> On Aug 30, 2016, at 4:12 PM, Carson Holt wrote:
>>
>> Also make sure you pass the data in using rm_gff and not maker_gff if the repeats were not MAKER generated.
>>
>> --Carson
>>
>>> On Aug 30, 2016, at 10:16 AM, Daniel Ence wrote:
>>>
>>> Hi Jean-Marc, so the first question I have is whether maker is still annotating repeats, even though you're providing the rm_gff file. Are you providing a file or parameter for repeat masker in the maker_opts.ctl file?
>>>
>>> And secondly, what about the scaffold that is failing? How long is it, what is the percent N's in the sequence there, and how much of it was masked in the rm_gff file?
>>>
>>> Thanks,
>>> Daniel
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>>
>>>> On Aug 30, 2016, at 7:19 AM, Lassance, Jean-Marc wrote:
>>>>
>>>> Hi.
>>>>
>>>> I am using Maker2.31.8 to annotate a mammalian genome (with OpenMPI, Linux server).
>>>>
>>>> Basically, after running Maker a first time to generate a training set for SNAP, I am running it a second time with SNAP and Augustus enabled. Because we ran RepeatMasker independently, I am providing the gff3 like so:
>>>>
>>>> rm_gff=myanimal.repeatmasker.out.gff3
>>>>
>>>> #-----Re-annotation Using MAKER Derived GFF3
>>>> maker_gff=myanimal.all.maker.pass1.gff
>>>> rm_pass=1
>>>>
>>>> Things seem to progress nicely (the vast majority of the scaffolds "finish"), but one of the scaffolds keeps failing (I have attempted to restart after erasing the entire content of the output folder). This is the message that I could associate with this error:
>>>>
>>>> Died at /n/sw/fasrcsw/apps/MPI/gcc/4.8.2-fasrc01/openmpi/1.10.0-fasrc01/maker/2.31.8-fasrc01/bin/../perl/lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>> --> rank=26, hostname=holy2a11102.rc.fas.harvard.edu
>>>> ERROR: Failed while processing all repeats
>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>> FAILED CONTIG:scaffold00013
>>>>
>>>> I wonder if you have an idea of what could be wrong here.
>>>>
>>>> Thanks for your help,
>>>>
>>>> Jean-Marc
>>>>
>>>> ------------------
>>>> Jean-Marc Lassance, PhD
>>>>
>>>> Harvard University
>>>> Department of Organismic and Evolutionary Biology
>>>> Department of Molecular and Cellular Biology
>>>> Museum of Comparative Zoology
>>>>
>>>> 26, Oxford Street
>>>> Cambridge MA 02138
>>>> USA
>>>>
>>>> email: lassance at fas.harvard.edu
>>>> twitter: @lassancejm
>
> ------------------
> Jean-Marc Lassance, PhD
>
> Harvard University
> Department of Organismic and Evolutionary Biology
> Department of Molecular and Cellular Biology
> Museum of Comparative Zoology
>
> 26, Oxford Street
> Cambridge MA 02138
> USA
>
> email: lassance at fas.harvard.edu
> twitter: @lassancejm

From sullis02 at nyu.edu  Tue Sep  6 13:37:51 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 6 Sep 2016 15:37:51 -0400
Subject: [maker-devel] antisense RNA in training set?
Message-ID:

I have a set of assembled transcripts from a stranded RNA seq run that I want to use for gene finder training in a MAKER run on a 'new' organism.

I've noticed though that some of my assembled transcripts actually appear to be antisense RNAs.

Should I include these in the training set?

From carsonhh at gmail.com  Tue Sep  6 14:09:11 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 6 Sep 2016 14:09:11 -0600
Subject: [maker-devel] antisense RNA in training set?
In-Reply-To: References: Message-ID: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com> MAKER does not require input evidence to be on the correct strand because it performs splice aware alignments via Exonerate against both strands (reverse transcription for the second alignment happens internally). Exonerate should always map spliced alignments to the right strand because it is not be possible to get correct splicing on the opposite strand (splice sites are a stranded feature). The only alignments that are ambiguous are single exon alignments. They are ignored by default, but when not ignored they are stranded to the sequence with the longest canonical ORF. ?Carson > On Sep 6, 2016, at 1:37 PM, Steven Sullivan wrote: > > I have a set of assembled transcripts from a stranded RNA seq run that I want to use for gene finder training in a MAKER run on 'new' organism. > > I've noticed though that some of my assembled transcripts actually appear to be antisense RNAs. > > Should I include these in the training set? > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From eaalvarado at cpp.edu Tue Sep 6 15:00:59 2016 From: eaalvarado at cpp.edu (Emilio A. 
Alvarado Ortiz)
Date: Tue, 6 Sep 2016 21:00:59 +0000
Subject: [maker-devel] MAKER mpi install Error
Message-ID: 

Hello,

I am trying to install MAKER with MPI on a Scientific Linux machine but I keep getting the following error:

[stilllab at lettucelab src]$ ./Build clean
Cleaning up build files
[stilllab at lettucelab src]$ ./Build install
Configuring MAKER with MPI support
Had problems bootstrapping Inline module 'Parallel::Application::MPI'

Can't load '/home/stilllab/Documents/maker/src/blib/lib/auto/Parallel/Application/MPI/MPI.so' for module Parallel::Application::MPI: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/stilllab/.linuxbrew/lib/libmpi.so.12) at /usr/lib64/perl5/DynaLoader.pm line 200.
 at /usr/local/share/perl5/Inline.pm line 533.

 at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 236.
 at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 256.
    Parallel::Application::MPI::_bind("/home/stilllab/mpich3/bin/mpicc", "/home/stilllab/mpich3/include", "blib", "") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 277
    MAKER::Build::ACTION_build(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010
    Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "build") called at /usr/local/share/perl5/Module/Build/Base.pm line 1993
    Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0), "build") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 469
    MAKER::Build::ACTION_install(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010
    Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "install") called at /usr/local/share/perl5/Module/Build/Base.pm line 1998
    Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0)) called at ./Build line 62

Do you know a workaround for this problem? Thank you for your help.

Regards,

Emilio A. Ortiz
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sullis02 at nyu.edu Wed Sep 7 08:39:11 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Wed, 7 Sep 2016 10:39:11 -0400
Subject: [maker-devel] antisense RNA in training set?
In-Reply-To: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com>
References: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com>
Message-ID: 

My organism's genome is predicted to have extremely few introns. Does that mean I should change the default alignment behavior for single exons?

-- 
Dr. Steven Sullivan
Center for Genomics & Systems Biology
New York University
12 Waverly Place
New York, NY 10003
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sullis02 at nyu.edu Wed Sep 7 15:04:56 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Wed, 7 Sep 2016 17:04:56 -0400
Subject: [maker-devel] General question about RNA evidence
Message-ID: 

The MAKER documentation I can access (wiki tutorials) seems somewhat out of date as regards RNA evidence, as it focuses a lot on ESTs, whereas today RNA seq data would likely be more common.

So a general question I have is, for a new eukaryotic organism with no models, is it better to use assembled RNA seq reads (i.e., putative transcripts generated by Trinity) as input to 1) ab initio predictors and as 2) MAKER alignment evidence, or is it better to use the reads themselves, unassembled?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Wed Sep 7 16:19:43 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Sep 2016 16:19:43 -0600
Subject: [maker-devel] MAKER mpi install Error
In-Reply-To: 
References: 
Message-ID: 

The error is with your OpenMPI install. It says that the GLIBC version does not match for /home/stilllab/.linuxbrew/lib/libmpi.so.12

You may need to reinstall, perhaps manually. If you are using a Homebrew package manager, there may be version mismatches with your system.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Wed Sep 7 16:31:18 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Sep 2016 16:31:18 -0600
Subject: [maker-devel] General question about RNA evidence
In-Reply-To: 
References: 
Message-ID: 

You need to assemble the reads using something like Trinity. The assembled results can be aligned to the proper strand with much greater specificity using splice-aware alignments. Use the jaccard index options when running Trinity.

--Carson

From me.mark at gmail.com Wed Sep 14 12:11:46 2016
From: me.mark at gmail.com (Mark Ebbert)
Date: Wed, 14 Sep 2016 11:11:46 -0700
Subject: [maker-devel] (no subject)
In-Reply-To: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>
References: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>
Message-ID: <57d6c562d78dd70000998701@polymail.io>

Hey Carson!

I'm getting a new issue. I think I need to recompile Maker with MPICH instead of OpenMPI. I'm getting the following errors when I try to run "mpiexec -n 10 maker -help". I tried running "./Build clean" followed by "./Build install" after updating LD_PRELOAD with the path to MPICH, but I'm still getting the error. I was also trying to access the Maker documentation at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php to review detailed installation instructions (I think it's there), but the website is down.

I appreciate your help.

Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xa0a5d620, rank=0x7ffd20bb8d9c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x51f83620, rank=0x7ffc6023b7fc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8b342620, rank=0x7ffde14f02fc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xf8f24620, rank=0x7ffe71c9a5bc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8c074620, rank=0x7ffc70e50b6c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xdac15620, rank=0x7ffc67bf0e2c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xbb65620, rank=0x7ffc17a1d1bc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x2aa3b620, rank=0x7fff551201dc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xd2453620, rank=0x7fffaebe21cc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb24e8620, rank=0x7ffdd838bbfc) failed
PMPI_Comm_rank(68).: Invalid communicator

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 2462 RUNNING AT m7int02
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

Mark T. W. Ebbert
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Wed Sep 14 12:15:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Sep 2016 12:15:19 -0600
Subject: [maker-devel] (no subject)
In-Reply-To: <57d6c562d78dd70000998701@polymail.io>
References: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com> <57d6c562d78dd70000998701@polymail.io>
Message-ID: <0C568590-0A33-46DD-95FD-271D9A8E0009@gmail.com>

Unset LD_PRELOAD. It really is only an OpenMPI issue, and setting it may affect MPICH2 in a bad way. Also do './Build realclean' (a bit more thorough) in the source directory, then remove the .../maker/perl directory before reinstalling. That will force a reinstall of all missing Perl dependencies and the Perl/MPI bindings.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xvazquezc at gmail.com Mon Sep 19 02:30:06 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Mon, 19 Sep 2016 18:30:06 +1000
Subject: [maker-devel] questions about post-processing of annotations
Message-ID: 

Hi Carson,

I'm trying to go through the post-processing step in the tutorial (GMOD2014) but I think something is not right with the functional annotation, as no new information is added to the *.putative_function.* files when I run maker_functional_gff or maker_functional_fasta. All the FASTA headers remain unchanged and the GFF files don't show any change. I'm using Maker 2.31.6 by the way.

Because there are no examples showing what I should expect I'm a bit lost.

These are my files prior to the functional annotation:

FRL.all.iprscan.renamed.tsv
FRL.all.maker.proteins.blastout.sprot.renamed.tsv
FRL.all.maker.proteins.renamed.fasta
FRL.all.maker.transcripts.renamed.fasta
FRL.all.maker.trnascan.transcripts.renamed.fasta
FRL.all.renamed.gff
FRL.map

And this is an example of the command I'm using:

maker_functional_fasta uniprot_sprot.fasta FRL.all.maker.proteins.blastout.sprot.renamed.tsv FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.proteins.renamed.putative_function.fasta

Thank you in advance.

Xabi

PS: the tutorial says to use the "standard" IPRS output, but by default it gives xml, gff3 and tsv files. Which one should I use?
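
One quick sanity check for a situation like this is to see whether the query IDs in the tabular BLAST report appear at all among the FASTA headers. The following is only a minimal Python sketch, not part of MAKER: it assumes the report is tab-delimited with the query ID in the first column, and that a sequence ID is the first whitespace-delimited word of a FASTA header. The function names and the toy IDs are made up for illustration.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def blast_query_ids(lines):
    """Collect query IDs (column 1) from tab-delimited BLAST report lines."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def shared_ids(fasta_lines, blast_lines):
    """IDs present both as FASTA headers and as BLAST query IDs."""
    return fasta_ids(fasta_lines) & blast_query_ids(blast_lines)


# Toy data with hypothetical IDs: the FASTA was renamed, the report was not.
fasta = [">FRL_000001-RA", "MKT", ">FRL_000002-RA", "MSA"]
blast = [
    "old-gene-0.1-mRNA-1\tsp|X00000|FAKE_PROT\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200"
]

print(len(shared_ids(fasta, blast)))  # prints 0
```

If the count of shared IDs is zero (or far below the number of query IDs in the report), the two files were probably produced under different naming schemes.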

-- 
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Mon Sep 19 16:07:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:07:59 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To: 
References: 
Message-ID: 

maker_functional_fasta reads the results of a BLAST report. It must be a tab-delimited BLAST report (-outfmt 6 under BLAST+) with UniProt as the database and the MAKER FASTA file as the query. If you renamed the transcripts in the FASTA before running maker_functional_fasta, the results in the BLAST report will no longer match (because they have new names). Use the map_data_ids script to fix the names in the BLAST report if you did that.

Thanks,
Carson

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xvazquezc at gmail.com Mon Sep 19 16:27:03 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Sep 2016 08:27:03 +1000
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To: 
References: 
Message-ID: 

Yes, my BLAST output is -outfmt 6 using UniProt/Swiss-Prot as the database. I used the MAKER protein FASTA file as the query (should I do the same with the transcripts?).

According to the tutorial the steps are:

maker_map_ids
map_gff_ids
map_fasta_ids (for MAKER proteins and transcripts)
map_data_ids (for BLAST and IPRS output)

and then the maker_functional_* steps.

So, the rename steps come before maker_functional. Are you saying it should be the other way around?

-- 
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Mon Sep 19 16:43:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:43:19 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To: 
References: 
Message-ID: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>

You just have to make sure you ran the BLAST job after renaming. If you ran it before, then the names in the report will not match the renamed FASTA. The BLAST job should be blastp (protein to protein). You can check by just looking at the report.

--Carson

-------------- next part --------------
An HTML attachment was scrubbed...
From xvazquezc at gmail.com Mon Sep 19 16:46:24 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Sep 2016 08:46:24 +1000
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
References: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
Message-ID:

I see. I ran blastp before starting the "post-processing of annotations" step. I guess I should do the same with IPRS.

By the way, can you confirm if I need to blast the maker.transcripts.fasta?

Thanks a lot

On 20 September 2016 at 08:43, Carson Holt wrote:

> You just have to make sure you ran the blast job report after renaming. If
> you ran it before then the names in the report will not match the renamed
> fasta. The blast job should be blastp (protein to protein). You can check
> by just looking at the report.
>
> --Carson
>
> On Sep 19, 2016, at 4:27 PM, Xabier Vázquez Campos wrote:
>
> Yes, my blast output is -outfmt 6 using the Uniprot/Swissprot as database.
> I used the maker protein fasta file as query (should I do the same with the
> transcripts?).
> According to the tutorial the steps are:
>
> maker_map_ids
> map_gff_ids
> map_fasta_ids (for maker protein and transcripts)
> map_data_ids (for blast and iprs output)
>
> and then the maker_functional_* steps.
>
> So, the rename steps are before the maker_functional. Are you saying it should be the other way around?
>
> On 20 September 2016 at 08:07, Carson Holt wrote:
>
>> maker_functional_fasta reads the results of a blast report. It must be a
>> tab-delimited blast report (-outfmt 6 under BLAST+) with UniProt as the
>> database and the maker fasta file as the query. If you renamed the
>> transcripts in the fasta before running maker_functional_fasta, the
>> results in the blast report will no longer match (because they have new
>> names). Use the map_data_ids script to fix names in the blast report if you
>> did that.
>>
>> Thanks,
>> Carson
>>
>> On Sep 19, 2016, at 2:30 AM, Xabier Vázquez Campos wrote:
>>
>> Hi Carson,
>>
>> I'm trying to go through the post processing step in the tutorial
>> (GMOD2014) but I think something is not right with the functional
>> annotation as no new information is added to the *.putative_function.*
>> files when I run the maker_functional_gff or the maker_functional_fasta.
>> All the fasta headings remain unchanged and the gff files don't show any
>> change. I'm using Maker 2.31.6 by the way.
>>
>> Because there are no examples showing what I should expect I'm a bit lost.
>>
>> These are my files prior to the functional annotation.
>>
>> FRL.all.iprscan.renamed.tsv
>>> FRL.all.maker.proteins.blastout.sprot.renamed.tsv
>>> FRL.all.maker.proteins.renamed.fasta
>>> FRL.all.maker.transcripts.renamed.fasta
>>> FRL.all.maker.trnascan.transcripts.renamed.fasta
>>> FRL.all.renamed.gff
>>> FRL.map
>>>
>>
>> And this, an example of the command I'm using
>>
>> maker_functional_fasta uniprot_sprot.fasta FRL.all.maker.proteins.blastout.sprot.renamed.tsv
>> FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.proteins.renamed.putative_function.fasta
>>
>>
>> Thank you in advance.
>>
>> Xabi
>>
>>
>> PS: the tutorial mentions to use the "standard" IPRS output but by
>> default it gives xml, gff3 and tsv files. Which one should I use?
>>
>> --
>> Xabier Vázquez-Campos, *PhD*
>> *Research Associate*
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>>
>
> --
> Xabier Vázquez-Campos, *PhD*
> *Research Associate*
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Mon Sep 19 16:50:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:50:40 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
Message-ID:

No, it is the protein-to-protein blast you need (which is where cross-species homology occurs). Since the proteins come from the transcripts, doing a transcript translation blast (blastx in this case) would be redundant as well as less accurate, because artifacts of a six-frame translation reduce the significance of the alignment.

--Carson

> On Sep 19, 2016, at 4:46 PM, Xabier Vázquez Campos wrote:
>
> I see. I ran blastp before starting the "post-processing of annotations" step. I guess I should do the same with IPRS.
>
> By the way, can you confirm if I need to blast the maker.transcripts.fasta?
>
> Thanks a lot
>
> On 20 September 2016 at 08:43, Carson Holt wrote:
> You just have to make sure you ran the blast job report after renaming. If you ran it before then the names in the report will not match the renamed fasta. The blast job should be blastp (protein to protein). You can check by just looking at the report.
>
> --Carson
>
>> On Sep 19, 2016, at 4:27 PM, Xabier Vázquez Campos wrote:
>>
>> Yes, my blast output is -outfmt 6 using the Uniprot/Swissprot as database. I used the maker protein fasta file as query (should I do the same with the transcripts?).
>> According to the tutorial the steps are:
>> maker_map_ids
>> map_gff_ids
>> map_fasta_ids (for maker protein and transcripts)
>> map_data_ids (for blast and iprs output)
>> and then the maker_functional_* steps.
>>
>> So, the rename steps are before the maker_functional. Are you saying it should be the other way around?
>>
>> On 20 September 2016 at 08:07, Carson Holt wrote:
>> maker_functional_fasta reads the results of a blast report. It must be a tab-delimited blast report (-outfmt 6 under BLAST+) with UniProt as the database and the maker fasta file as the query. If you renamed the transcripts in the fasta before running maker_functional_fasta, the results in the blast report will no longer match (because they have new names). Use the map_data_ids script to fix names in the blast report if you did that.
>>
>> Thanks,
>> Carson
>>
>>> On Sep 19, 2016, at 2:30 AM, Xabier Vázquez Campos wrote:
>>>
>>> Hi Carson,
>>>
>>> I'm trying to go through the post processing step in the tutorial (GMOD2014) but I think something is not right with the functional annotation as no new information is added to the *.putative_function.* files when I run the maker_functional_gff or the maker_functional_fasta. All the fasta headings remain unchanged and the gff files don't show any change. I'm using Maker 2.31.6 by the way.
>>>
>>> Because there are no examples showing what I should expect I'm a bit lost.
>>>
>>> These are my files prior to the functional annotation.
>>>
>>> FRL.all.iprscan.renamed.tsv
>>> FRL.all.maker.proteins.blastout.sprot.renamed.tsv
>>> FRL.all.maker.proteins.renamed.fasta
>>> FRL.all.maker.transcripts.renamed.fasta
>>> FRL.all.maker.trnascan.transcripts.renamed.fasta
>>> FRL.all.renamed.gff
>>> FRL.map
>>>
>>> And this, an example of the command I'm using
>>>
>>> maker_functional_fasta uniprot_sprot.fasta FRL.all.maker.proteins.blastout.sprot.renamed.tsv FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.proteins.renamed.putative_function.fasta
>>>
>>> Thank you in advance.
>>>
>>> Xabi
>>>
>>>
>>> PS: the tutorial mentions to use the "standard" IPRS output but by default it gives xml, gff3 and tsv files. Which one should I use?
>>>
>>> --
>>> Xabier Vázquez-Campos, PhD
>>> Research Associate
>>> Water Research Centre
>>> School of Civil and Environmental Engineering
>>> The University of New South Wales
>>> Sydney NSW 2052 AUSTRALIA
>>
>> --
>> Xabier Vázquez-Campos, PhD
>> Research Associate
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

From sullis02 at nyu.edu Mon Sep 19 22:21:18 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 20 Sep 2016 00:21:18 -0400
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
Message-ID:

I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline.

As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.

The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exists for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round.

But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications.

Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself?

Or am I misunderstanding the workflow?

From carsonhh at gmail.com Mon Sep 19 22:34:31 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 22:34:31 -0600
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To:
References:
Message-ID: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>

The training does not involve so much the sequence, rather the structure (i.e. intron, exon, start, stop etc.). You could use the evidence deposited as input to the iterative process described, but not directly. This is because you have the sequence but not the structure.

What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because blast alignments are not splice aware), then identify correct open reading frames with start and stop codons. The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.).
None of that is directly available from the protein or transcript data. But once it's been polished against the reference, the structure can be discovered.

After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hint. As a result the second round of predictions is better than the first, and can be used as training to improve the HMM.

--Carson

> On Sep 19, 2016, at 10:21 PM, Steven Sullivan wrote:
>
> I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline
>
> As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.
>
> The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exist for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round.
>
> But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications.
>
> Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself?
>
> Or am I misunderstanding the workflow?

From dence at genetics.utah.edu Mon Sep 19 22:45:02 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 20 Sep 2016 04:45:02 +0000
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
Message-ID: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>

Just chiming in with my own perspective on the question. The gold-standard genes can be used as input for training the gene predictors and also as evidence for the genome annotation. Presumably, you'll have much more evidence than the gold-standard genes for the annotation, so it won't be circular. As Carson said, the gene predictors are using the structure of the alignments of the input, rather than the sequence itself. The other source of input for gene predictors, in the case of a true bootstrap where you have no gold standard, would be to use alignments generated by a program, like BUSCO or CEGMA, that identifies conserved orthologs in the genome.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

> On Sep 19, 2016, at 10:34 PM, Carson Holt wrote:
>
> The training does not involve so much the sequence, rather the structure (i.e. intron exon, start, stop etc.). You could use the evidence deposited as input to the iterative process described, but not directly. This is because you have the sequence but not the structure.
>
> What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because blast alignments are not splice aware), then identify correct open reading frames with start and stop codons.
> The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.). All of which is not directly available from the protein or transcript data. But once it's been polished against the reference, the structure can be discovered.
>
> After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hint. As a result the second round of predictions are better than the first, and can be used as training to improve the HMM.
>
> --Carson
>
>> On Sep 19, 2016, at 10:21 PM, Steven Sullivan wrote:
>>
>> I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline
>>
>> As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.
>>
>> The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exist for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round.
>>
>> But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications.
>>
>> Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself?
>>
>> Or am I misunderstanding the workflow?

From cjfields at illinois.edu Tue Sep 20 13:17:24 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 20 Sep 2016 19:17:24 +0000
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

I can add that BUSCO did work well as a first-pass bootstrap (with the added convenience of running Augustus for generating an initial model).

chris

> On Sep 19, 2016, at 11:45 PM, Daniel Ence wrote:
>
> Just chiming in with my own perspective on the question. The gold-standard genes can be used as input for training the gene predictors and also as evidence for the genome annotation. Presumably, you'll have much more evidence than the gold-standard genes for the annotation, so it won't be circular. As Carson said, the gene predictors are using the structure of the alignments of the input, rather than the sequence itself. The other source for input for gene predictors, in the case of a true bootstrap where you have no gold-standard, would be to use alignment generated by a program, like BUSCO or CEGMA, that identifies conserved orthologs in the genome.
>
> ~Daniel
>
> Daniel Ence
> Graduate Student
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
>> On Sep 19, 2016, at 10:34 PM, Carson Holt wrote:
>>
>> The training does not involve so much the sequence, rather the structure (i.e. intron exon, start, stop etc.). You could use the evidence deposited as input to the iterative process described, but not directly. This is because you have the sequence but not the structure.
>>
>> What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because blast alignments are not splice aware), then identify correct open reading frames with start and stop codons. The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.). All of which is not directly available from the protein or transcript data. But once it's been polished against the reference, the structure can be discovered.
>>
>> After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hint. As a result the second round of predictions are better than the first, and can be used as training to improve the HMM.
>>
>> --Carson
>>
>>> On Sep 19, 2016, at 10:21 PM, Steven Sullivan wrote:
>>>
>>> I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline
>>>
>>> As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.
>>>
>>> The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exist for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round.
>>>
>>> But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications.
>>>
>>> Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself?
>>>
>>> Or am I misunderstanding the workflow?

From sullis02 at nyu.edu Tue Sep 20 13:28:20 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 20 Sep 2016 15:28:20 -0400
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

Thanks!
So, I think for training the gene predictors, I'll try to identify any sequences in my gold-standard set that have structural information (i.e. genes for which the genomic sequence was cloned) and use those. But I doubt there's enough of those to train e.g. Augustus, so I'll probably have to use the bootstrap method as well. Is there a way to combine both?

For the BLAST-based annotation, if I use entire Uniprot/Swissprot or Genbank FASTA sets as protein homology evidence, my gold standards are already included in those. I gather from these replies that that's not a problem.

However, there *are* public database sequences (predicted genes from an older annotation of this species) that I *do* want to exclude from evidence. (Because we want to run MAKER as if this genome was 'new', never before annotated.) Can I use something like the -negative_gilist option in blastp, to omit previous genome project predictions from consideration? (An option that only works with Genbank sequences, I think.) Or do I have to create a custom version of the large public database?

From carsonhh at gmail.com Tue Sep 20 14:15:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Sep 2016 14:15:21 -0600
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To:
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

You would need to create a custom database without the sequences you wish to exclude.

--Carson

> On Sep 20, 2016, at 1:28 PM, Steven Sullivan wrote:
>
> Thanks! So, I think for training the gene predictors, I'll try to identify any sequences in my gold-standard set that have structural in information...i.e. genes for which the genomic sequence was cloned....and use those. But I doubt there's enough of those to train e.g.
> Augustus, so I'll probably have to use the bootstrap method as well . Is there a way to combine both?
>
> For the BLAST-based annotation, if I use entire Uniprot/Swissprot or Genbank FASTA sets as protein homology evidence , my gold standards are already included in those. I gather from these replies that that's not a problem.
>
> However, there *are* public database sequences (predicted genes from an older annotation of this species) that I *do* want to exclude from evidence. (Because we want to run MAKER as if this genome was 'new', never before annotated.) Can I use something like the -negative_gilist option in blastp , to omit previous genome project predictions from consideration? (An option that only works with Genbank sequences, I think) . Or do I have to create a custom version of the large public database?

From psh65 at cornell.edu Tue Sep 20 14:33:42 2016
From: psh65 at cornell.edu (Prashant S Hosmani)
Date: Tue, 20 Sep 2016 20:33:42 +0000
Subject: [maker-devel] mapping cDNA to updated genome
In-Reply-To: <9FBCB1C4-C319-4933-8741-53DAFCB82458@gmail.com>
References: <646B795A-1B04-4300-94C7-BEBEF0B37323@gmail.com> <9FBCB1C4-C319-4933-8741-53DAFCB82458@gmail.com>
Message-ID: <55D0187E-8C48-40DA-91BE-6370D46D041F@cornell.edu>

Hi Mike and Carson,

Thank you for your help. I used a masked genome for aligning cDNAs. And yes, this was due to multiply aligning cDNAs. I guess you could also filter genes based on the alignment score from the GFF.

I used GMAP (http://research-pub.gene.com/gmap/) to align cDNA on to the updated genome. GMAP has parameters to filter based on alignment scores and can also choose the best path per cDNA.

Regards,
Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA

On Aug 31, 2016, at 12:12 PM, Carson Holt wrote:

Also if you have multiple alignments of the same cDNA, you can use the score column of the mRNA feature to see which aligns best.
If they have the same score, you will have to disambiguate manually or just remove all copies.

--Carson

On Aug 31, 2016, at 10:10 AM, Michael Campbell wrote:

Hi Prashant,

I'm almost positive that the additional genes are coming from multiply aligning cDNAs. Did you repeat mask your genome before mapping things forward? Another thought: what kind of whole genome duplications has your plant been through? It may be that the multiple alignments are to pseudogenes in some stage of decay. If that is the case it would probably be safe to keep the gene from the longest/best aligned cDNA.

Thanks,
Mike

On Aug 31, 2016, at 10:35 AM, Prashant S Hosmani wrote:

Hi All,

I am working on updating a plant genome annotation. I would like to map genes from the previous annotation to a new genome build. There is a protocol about this in Campbell et al 2014, Current Protocols in Bioinformatics (Basic Protocol 4 - Mapping annotations to a new assembly). I followed that protocol exactly, setting est_forward=1. But in the output I'm getting a large number of genes. My input cDNA fasta contains ~35K genes and after mapping there are ~58K genes. I'm using maker version 3.0.

There are few changes in the genome and I'm not expecting many changes in the mapping of previous genes. Please let me know if there are any other parameters to control mapping of ESTs. I was hoping to get a similar number of genes mapped on to the new assembly with very few changes.

Thank you for your help in advance.

Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA
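
The advice in this thread (keep the best-scoring alignment per cDNA, and set aside ties) can be sketched in Python. This is only an illustration under stated assumptions: the input is GFF3, the score sits in column 6 of each mRNA feature, and a Name attribute identifies the source cDNA. That attribute choice and the function name are hypothetical, so check how your own GFF labels the mapped cDNAs before relying on it.

```python
from collections import defaultdict


def best_alignments(gff_lines, key_attr="Name"):
    """Map each cDNA name to the ID of its single best-scoring mRNA feature.

    cDNAs whose top score is shared by several features are left out,
    so they can be disambiguated manually or dropped.
    """
    by_name = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA" or cols[5] == ".":
            continue  # only scored mRNA features are considered
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if key_attr in attrs and "ID" in attrs:
            by_name[attrs[key_attr]].append((float(cols[5]), attrs["ID"]))

    best = {}
    for name, hits in by_name.items():
        top = max(score for score, _ in hits)
        winners = [fid for score, fid in hits if score == top]
        if len(winners) == 1:  # ties are deliberately excluded
            best[name] = winners[0]
    return best


if __name__ == "__main__":
    # Toy records (made up): cdnaA has a clear winner, cdnaB is tied.
    gff = [
        "chr1\test2genome\tmRNA\t100\t900\t95.0\t+\t.\tID=m1;Name=cdnaA\n",
        "chr2\test2genome\tmRNA\t500\t1300\t80.0\t+\t.\tID=m2;Name=cdnaA\n",
        "chr1\test2genome\tmRNA\t2000\t2500\t70.0\t-\t.\tID=m3;Name=cdnaB\n",
        "chr3\test2genome\tmRNA\t10\t510\t70.0\t-\t.\tID=m4;Name=cdnaB\n",
    ]
    print(best_alignments(gff))
```

Leaving ties out of the result, rather than picking one arbitrarily, mirrors the suggestion to resolve them by hand or remove all copies.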
From sullis02 at nyu.edu Thu Sep 22 10:27:53 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Thu, 22 Sep 2016 12:27:53 -0400
Subject: [maker-devel] should EST evidence be cleaned, assembled?
Message-ID:

Do EST sequences (as opposed to RNA Seq data) need to be cleaned (e.g., vector sequence trimmed, Ns removed) and assembled (combined into longer 'EST contigs' where possible) before use as MAKER alignment evidence?

--
Dr. Steven Sullivan
Center for Genomics & Systems Biology
New York University
12 Waverly Place
New York, NY 10003

From carsonhh at gmail.com Mon Sep 26 09:28:41 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Sep 2016 09:28:41 -0600
Subject: [maker-devel] should EST evidence be cleaned, assembled?
In-Reply-To:
References:
Message-ID:

You will want to trim the vector or any sequence not representative of the transcript or else it will not align well. The sequences will be aligned directly against the assembly.

--Carson

> On Sep 22, 2016, at 10:27 AM, Steven Sullivan wrote:
>
> Do EST sequences (as opposed to RNA Seq data) need to be cleaned (e.g., vector sequence trimmed, Ns removed) and assembled (combined into longer 'EST contigs' where possible) before use as MAKER alignment evidence?
>
>
> --
> Dr.
Steven Sullivan > Center for Genomics & Systems Biology > New York University > 12 Waverly Place > New York, NY 10003 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Sep 1 09:57:58 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 1 Sep 2016 09:57:58 -0600 Subject: [maker-devel] (no subject) In-Reply-To: <57c83eeca5e7100000f39653@polymail.io> References: <57c83eeca5e7100000f39653@polymail.io> Message-ID: <0729B502-61AD-44C1-BE67-F3D561E11B2B@gmail.com> -n 1000 is probably too high for mpich3. It?s communication manager is not that robust. You can go that high with OpenMPI or MVAPICH2, but I?ve found that MPICH3 tops out at 100-200. Just submit multiple jobs at the lower count. ?Carson > On Sep 1, 2016, at 8:47 AM, Mark Ebbert wrote: > > > Thanks Carson! The help message only printed once, so everything seemed fine. I deleted all of the lock files with the following command: ?find . -name *.NFSLock* -exec rm {} \;? > > I restarted the job and got the following segfault: > > ?Module mpi/mpich-3.1.4_intel-15.0.3 requires compiler_intel/15.0.3. Loading it now. > Module compiler_intel/15.0.3 requires mkl/11.2.0. Loading it now. 
> mpdboot_m7-1-2 (handle_mpd_output 1000): from mpd on m7-1-2, invalid port info: > mpd_uncaught_except_tb handling: > : list index out of range > /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 264 pin_Uni_num > if list.index(list[i]) == i: > /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1449 pin_Cpuinfo > info['cache1'] = pin_Uni_num(info['cache1_id'], info['lcpu']) > /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1658 run > self.CpuInfo = pin_Cpuinfo(self.PinCase,self.Arch) > /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 3676 > mpd.run() > /var/spool/slurmd/job11326444/slurm_script: line 27: 29365 Segmentation fault mpiexec -n 1000 maker? > > Any ideas? > > Mark T. W. Ebbert > > On Tue, Aug 30, 2016 at 10:54 AM Carson Holt <> wrote: > Run 'maker -help? with mpiexec. > > Example: > mpiexec -n 10 maker -help > > If the MPI communication ring is working correctly, then it will print the help message only once (from the root process). If it is not working, it will print the help message 10 time because each of the 10 MPI processes will think they are the root process. It is a simple test that can identify if it is an MPI issue or not. > > If it is not an MPI issue, you can just search for the NFSLock files using find and delete them,. > > ?Carson > > >> On Aug 30, 2016, at 10:10 AM, Mark Ebbert > wrote: >> >> >> Good day everyone! >> >> I?m getting the error stating: ?WARNING: Multiple MAKER processes have been started in the same directory.? Everything I?ve seen mentions version issues with MPICH. The difference in my situation is that my initial run ran just fine, but died because of the cluster time constraints. We?re only allowed 3 days. >> >> There are a bunch of .NFSLock files in the output directory. I?m guessing Maker wasn?t able to clear the locks when the jobs died? Can I safely delete those lock files? 
>> What's the best way to handle this going forward since I can only run jobs for 3 days at a time?
>>
>> Thanks!
>>
>> Mark T. W. Ebbert
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Thu Sep 1 10:03:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 1 Sep 2016 10:03:21 -0600
Subject: [maker-devel] (no subject)
In-Reply-To: <57c8503d99faf40000c7acab@polymail.io>
References: <0729B502-61AD-44C1-BE67-F3D561E11B2B@gmail.com> <57c8503d99faf40000c7acab@polymail.io>
Message-ID: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>

MAKER will use locks to divide up work between simultaneously running jobs. So submitting five 200-CPU jobs will give you the same throughput and will be more stable. The jobs will probably move through the queue faster as well.

--Carson

> On Sep 1, 2016, at 10:00 AM, Mark Ebbert wrote:
>
> Bummer. It worked at 720 the first time. Thanks again!
>
> Mark T. W. Ebbert
>
> On Thu, Sep 01, 2016 at 9:57 AM Carson Holt wrote:
> -n 1000 is probably too high for MPICH3. Its communication manager is not that robust. You can go that high with OpenMPI or MVAPICH2, but I've found that MPICH3 tops out at 100-200. Just submit multiple jobs at the lower count.
>
> --Carson
>
>
>> On Sep 1, 2016, at 8:47 AM, Mark Ebbert wrote:
>>
>> Thanks Carson! The help message only printed once, so everything seemed fine. I deleted all of the lock files with the following command: "find . -name *.NFSLock* -exec rm {} \;"
>>
>> I restarted the job and got the following segfault:
>>
>> "Module mpi/mpich-3.1.4_intel-15.0.3 requires compiler_intel/15.0.3. Loading it now.
>> Module compiler_intel/15.0.3 requires mkl/11.2.0. Loading it now.
>> mpdboot_m7-1-2 (handle_mpd_output 1000): from mpd on m7-1-2, invalid port info:
>> mpd_uncaught_except_tb handling:
>> : list index out of range
>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 264 pin_Uni_num
>>     if list.index(list[i]) == i:
>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1449 pin_Cpuinfo
>>     info['cache1'] = pin_Uni_num(info['cache1_id'], info['lcpu'])
>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1658 run
>>     self.CpuInfo = pin_Cpuinfo(self.PinCase,self.Arch)
>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 3676
>>     mpd.run()
>> /var/spool/slurmd/job11326444/slurm_script: line 27: 29365 Segmentation fault mpiexec -n 1000 maker"
>>
>> Any ideas?
>>
>> Mark T. W. Ebbert
>>
>> On Tue, Aug 30, 2016 at 10:54 AM Carson Holt wrote:
>> Run 'maker -help' with mpiexec.
>>
>> Example:
>> mpiexec -n 10 maker -help
>>
>> If the MPI communication ring is working correctly, then it will print the help message only once (from the root process). If it is not working, it will print the help message 10 times because each of the 10 MPI processes will think it is the root process. It is a simple test that can identify whether it is an MPI issue or not.
>>
>> If it is not an MPI issue, you can just search for the NFSLock files using find and delete them.
>>
>> --Carson
>>
>>
>>> On Aug 30, 2016, at 10:10 AM, Mark Ebbert wrote:
>>>
>>> Good day everyone!
>>>
>>> I'm getting the error stating: "WARNING: Multiple MAKER processes have been started in the same directory." Everything I've seen mentions version issues with MPICH. The difference in my situation is that my initial run ran just fine, but died because of the cluster time constraints. We're only allowed 3 days.
>>>
>>> There are a bunch of .NFSLock files in the output directory. I'm guessing Maker wasn't able to clear the locks when the jobs died? Can I safely delete those lock files?
>>> What's the best way to handle this going forward since I can only run jobs for 3 days at a time?
>>>
>>> Thanks!
>>>
>>> Mark T. W. Ebbert
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eaalvarado at cpp.edu Thu Sep 1 13:44:18 2016
From: eaalvarado at cpp.edu (Emilio A. Alvarado Ortiz)
Date: Thu, 1 Sep 2016 19:44:18 +0000
Subject: [maker-devel] MAKER Exonerate Error
Message-ID:

Hello,

I am currently running MAKER version 2.31.8 using MPI but I keep getting the following error when running Exonerate:

Maker command used: mpiexec -mca btl ^openib -n 16 maker

** (process:18773): WARNING **: Compiled with assertion checking - will run slowly
**
ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA)
sh: line 1: 18771 Aborted /usr/bin/exonerate -q /media/raid/tmp/maker_DYrlgS/9/gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.for.21933-23968
** (process:18775): WARNING **: Compiled with assertion checking - will run slowly
.9.fasta -t /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.9.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.p.exonerate
**

Attached is the Error log and the maker_opts.ctl file. Do you know a workaround for this problem? I would really appreciate your help.

Regards,

Emilio A. Ortiz
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 659 bytes
Desc: image001.png
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MAKER.error.log
Type: application/octet-stream
Size: 7891 bytes
Desc: MAKER.error.log
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 4722 bytes
Desc: maker_opts.ctl
URL:

From carsonhh at gmail.com Fri Sep 2 15:08:50 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 2 Sep 2016 15:08:50 -0600
Subject: [maker-devel] MAKER Exonerate Error
In-Reply-To:
References:
Message-ID: <38035C41-1DE1-4512-B92F-AC60C182BBE8@gmail.com>

This is coming from exonerate. You may need to reinstall it from source rather than using the precompiled binaries.

--Carson

> On Sep 1, 2016, at 1:44 PM, Emilio A. Alvarado Ortiz wrote:
>
> Hello,
>
> I am currently running MAKER version 2.31.8 using MPI but I keep getting the following error when running Exonerate:
>
> Maker command used: mpiexec -mca btl ^openib -n 16 maker
>
> ** (process:18773): WARNING **: Compiled with assertion checking - will run slowly
> **
> ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA)
> sh: line 1: 18771 Aborted /usr/bin/exonerate -q /media/raid/tmp/maker_DYrlgS/9/gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.for.21933-23968
> ** (process:18775): WARNING **: Compiled with assertion checking - will run slowly
> .9.fasta -t /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.9.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.p.exonerate
> **
>
> Attached is the Error log and the maker_opts.ctl file. Do you know a workaround for this problem? I would really appreciate your help.
>
> Regards,
>
> Emilio A. Ortiz
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri Sep 2 15:11:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 2 Sep 2016 15:11:19 -0600
Subject: [maker-devel] maker2.31.8 _ failure when processing repeats
In-Reply-To: <1F3B92BC-1717-4CB5-A26D-6F2126667E53@fas.harvard.edu>
References: <76B77664-2EA9-45AA-A1C1-1B5124DC0025@fas.harvard.edu> <01FEE9E8-69C7-4E42-9E8C-07E029BB01A5@gmail.com> <1F3B92BC-1717-4CB5-A26D-6F2126667E53@fas.harvard.edu>
Message-ID: <33D1488A-E060-4760-AA99-6AB9B71EFADE@gmail.com>

It will use both. It shouldn't hurt setting both. It has more to do with expected attributes in column 8 (rm_gff is more forgiving).

--Carson

> On Aug 30, 2016, at 2:31 PM, Lassance, Jean-Marc wrote:
>
> Let me clarify one thing: the first pass was performed with Maker running RepeatMasker internally, which is why I decided to use them in the second pass, as well as the data from the independent run of RepeatMasker.
>
> From reading earlier posts, I gathered that Maker would use first the evidence from the rm_gff, and then from maker_gff if rm_pass=1 is activated, but that having both would not hurt. Correct?
>
>
> JM
>
>
>> On Aug 30, 2016, at 4:12 PM, Carson Holt wrote:
>>
>> Also make sure you pass the data in using rm_gff and not maker_gff if the repeats were not MAKER generated.
>>
>> --Carson
>>
>>
>>> On Aug 30, 2016, at 10:16 AM, Daniel Ence wrote:
>>>
>>> Hi Jean-Marc, so the first question I have is whether maker is still annotating repeats, even though you're providing the rm_gff file. Are you providing a file or parameter for repeat masker in the maker_opts.ctl file?
>>>
>>> And secondly, what about the scaffold that is failing?
>>> How long is it, what is the percentage of N's in the sequence there, and how much of it was masked in the rm_gff file?
>>>
>>> Thanks,
>>> Daniel
>>>
>>>
>>> Daniel Ence
>>> Graduate Student
>>> Eccles Institute of Human Genetics
>>> University of Utah
>>> 15 North 2030 East, Room 2100
>>> Salt Lake City, UT 84112-5330
>>>
>>>> On Aug 30, 2016, at 7:19 AM, Lassance, Jean-Marc wrote:
>>>>
>>>> Hi.
>>>>
>>>> I am using Maker2.31.8 to annotate a mammalian genome (with OpenMPI, Linux server).
>>>>
>>>> Basically, after running Maker a first time to generate a training set for SNAP, I am running it a second time with SNAP and Augustus enabled. Because we ran RepeatMasker independently, I am providing the gff3 like so:
>>>>
>>>> rm_gff=myanimal.repeatmasker.out.gff3
>>>>
>>>> #-----Re-annotation Using MAKER Derived GFF3
>>>> maker_gff=myanimal.all.maker.pass1.gff
>>>> rm_pass=1
>>>>
>>>> Things seem to progress nicely (the vast majority of the scaffolds "finish"), but one of the scaffolds keeps failing (I have attempted to restart after erasing the entire content of the output folder). This is the message that I could associate with this error:
>>>>
>>>> Died at /n/sw/fasrcsw/apps/MPI/gcc/4.8.2-fasrc01/openmpi/1.10.0-fasrc01/maker/2.31.8-fasrc01/bin/../perl/lib/Bio/Search/Hit/PhatHit/Base.pm line 188.
>>>> --> rank=26, hostname=holy2a11102.rc.fas.harvard.edu
>>>> ERROR: Failed while processing all repeats
>>>> ERROR: Chunk failed at level:3, tier_type:1
>>>> FAILED CONTIG:scaffold00013
>>>>
>>>> I wonder if you have an idea of what could be wrong here.
>>>>
>>>> Thanks for your help,
>>>>
>>>>
>>>> Jean-Marc
>>>>
>>>> ------------------
>>>> Jean-Marc Lassance, PhD
>>>>
>>>> Harvard University
>>>> Department of Organismic and Evolutionary Biology
>>>> Department of Molecular and Cellular Biology
>>>> Museum of Comparative Zoology
>>>>
>>>> 26, Oxford Street
>>>> Cambridge MA 02138
>>>> USA
>>>>
>>>> email: lassance at fas.harvard.edu
>>>> twitter: @lassancejm
>>>>
>>>> _______________________________________________
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>> _______________________________________________
>>> maker-devel mailing list
>>> maker-devel at box290.bluehost.com
>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
> ------------------
> Jean-Marc Lassance, PhD
>
> Harvard University
> Department of Organismic and Evolutionary Biology
> Department of Molecular and Cellular Biology
> Museum of Comparative Zoology
>
> 26, Oxford Street
> Cambridge MA 02138
> USA
>
> email: lassance at fas.harvard.edu
> twitter: @lassancejm
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sullis02 at nyu.edu Tue Sep 6 13:37:51 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 6 Sep 2016 15:37:51 -0400
Subject: [maker-devel] antisense RNA in training set?
Message-ID:

I have a set of assembled transcripts from a stranded RNA seq run that I want to use for gene finder training in a MAKER run on 'new' organism.

I've noticed though that some of my assembled transcripts actually appear to be antisense RNAs.

Should I include these in the training set?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Tue Sep 6 14:09:11 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 6 Sep 2016 14:09:11 -0600
Subject: [maker-devel] antisense RNA in training set?
In-Reply-To:
References:
Message-ID: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com>

MAKER does not require input evidence to be on the correct strand because it performs splice aware alignments via Exonerate against both strands (reverse transcription for the second alignment happens internally). Exonerate should always map spliced alignments to the right strand because it is not possible to get correct splicing on the opposite strand (splice sites are a stranded feature). The only alignments that are ambiguous are single exon alignments. They are ignored by default, but when not ignored they are stranded to the sequence with the longest canonical ORF.

--Carson

> On Sep 6, 2016, at 1:37 PM, Steven Sullivan wrote:
>
> I have a set of assembled transcripts from a stranded RNA seq run that I want to use for gene finder training in a MAKER run on 'new' organism.
>
> I've noticed though that some of my assembled transcripts actually appear to be antisense RNAs.
>
> Should I include these in the training set?
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From eaalvarado at cpp.edu Tue Sep 6 15:00:59 2016
From: eaalvarado at cpp.edu (Emilio A. Alvarado Ortiz)
Date: Tue, 6 Sep 2016 21:00:59 +0000
Subject: [maker-devel] MAKER mpi install Error
Message-ID:

Hello,

I am trying to install MAKER with MPI on a Scientific Linux machine but I keep getting the following error:

[stilllab at lettucelab src]$ ./Build clean
Cleaning up build files
[stilllab at lettucelab src]$ ./Build install
Configuring MAKER with MPI support
Had problems bootstrapping Inline module 'Parallel::Application::MPI'

Can't load '/home/stilllab/Documents/maker/src/blib/lib/auto/Parallel/Application/MPI/MPI.so' for module Parallel::Application::MPI: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/stilllab/.linuxbrew/lib/libmpi.so.12) at /usr/lib64/perl5/DynaLoader.pm line 200.
 at /usr/local/share/perl5/Inline.pm line 533.

 at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 236.
 at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 256.
    Parallel::Application::MPI::_bind("/home/stilllab/mpich3/bin/mpicc", "/home/stilllab/mpich3/include", "blib", "") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 277
    MAKER::Build::ACTION_build(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010
    Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "build") called at /usr/local/share/perl5/Module/Build/Base.pm line 1993
    Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0), "build") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 469
    MAKER::Build::ACTION_install(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010
    Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "install") called at /usr/local/share/perl5/Module/Build/Base.pm line 1998
    Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0)) called at ./Build line 62

Do you know a workaround for this problem? Thank you for your help.

Regards,

Emilio A.
Ortiz
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sullis02 at nyu.edu Wed Sep 7 08:39:11 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Wed, 7 Sep 2016 10:39:11 -0400
Subject: [maker-devel] antisense RNA in training set?
In-Reply-To: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com>
References: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com>
Message-ID:

My organism's genome is predicted to have extremely few introns. Does that mean I should change the default alignment behavior for single exons?

On Tue, Sep 6, 2016 at 4:09 PM, Carson Holt wrote:

> MAKER does not require input evidence to be on the correct strand because
> it performs splice aware alignments via Exonerate against both strands
> (reverse transcription for the second alignment happens internally).
> Exonerate should always map spliced alignments to the right strand because
> it is not possible to get correct splicing on the opposite strand
> (splice sites are a stranded feature). The only alignments that are
> ambiguous are single exon alignments. They are ignored by default, but when
> not ignored they are stranded to the sequence with the longest canonical
> ORF.
>
> --Carson
>
>
> > On Sep 6, 2016, at 1:37 PM, Steven Sullivan wrote:
> >
> > I have a set of assembled transcripts from a stranded RNA seq run that I
> want to use for gene finder training in a MAKER run on 'new' organism.
> >
> > I've noticed though that some of my assembled transcripts actually
> appear to be antisense RNAs.
> >
> > Should I include these in the training set?
> >
> >
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

--
Dr. Steven Sullivan
Center for Genomics & Systems Biology
New York University
12 Waverly Place
New York, NY 10003
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sullis02 at nyu.edu Wed Sep 7 15:04:56 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Wed, 7 Sep 2016 17:04:56 -0400
Subject: [maker-devel] General question about RNA evidence
Message-ID:

The MAKER documentation I can access (wiki tutorials) seems somewhat out of date as regards RNA evidence, as it focuses a lot on ESTs, whereas today RNA seq data would likely be more common.

So a general question I have is, for a new eukaryotic organism with no models, is it better to use assembled RNA seq reads (i.e., putative transcripts generated by Trinity) as input to 1) ab initio predictors and as 2) MAKER alignment evidence, or is it better to use the reads themselves, unassembled?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Sep 7 16:19:43 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Sep 2016 16:19:43 -0600
Subject: [maker-devel] MAKER mpi install Error
In-Reply-To:
References:
Message-ID:

The error is with your OpenMPI install. It says that GLIBC does not match for /home/stilllab/.linuxbrew/lib/libmpi.so.12

You may need to reinstall. Perhaps manually. If you are using a homebrew package manager, there may be version mismatches with your system.

--Carson

> On Sep 6, 2016, at 3:00 PM, Emilio A. Alvarado Ortiz wrote:
>
> Hello,
>
> I am trying to install MAKER with MPI on a Scientific Linux machine but I keep getting the following error:
>
> [stilllab at lettucelab src]$ ./Build clean
> Cleaning up build files
> [stilllab at lettucelab src]$ ./Build install
> Configuring MAKER with MPI support
> Had problems bootstrapping Inline module 'Parallel::Application::MPI'
>
> Can't load '/home/stilllab/Documents/maker/src/blib/lib/auto/Parallel/Application/MPI/MPI.so' for module Parallel::Application::MPI: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/stilllab/.linuxbrew/lib/libmpi.so.12) at /usr/lib64/perl5/DynaLoader.pm line 200.
>  at /usr/local/share/perl5/Inline.pm line 533.
>
>  at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 236.
>  at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 256.
>     Parallel::Application::MPI::_bind("/home/stilllab/mpich3/bin/mpicc", "/home/stilllab/mpich3/include", "blib", "") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 277
>     MAKER::Build::ACTION_build(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010
>     Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "build") called at /usr/local/share/perl5/Module/Build/Base.pm line 1993
>     Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0), "build") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 469
>     MAKER::Build::ACTION_install(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010
>     Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "install") called at /usr/local/share/perl5/Module/Build/Base.pm line 1998
>     Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0)) called at ./Build line 62
>
> Do you know a workaround for this problem? Thank you for your help.
>
> Regards,
>
> Emilio A. Ortiz
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Sep 7 16:31:18 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Sep 2016 16:31:18 -0600
Subject: [maker-devel] General question about RNA evidence
In-Reply-To:
References:
Message-ID:

You need to assemble the reads using something like Trinity. The assembled results can be aligned to the proper strand with much greater specificity using splice aware alignments.
Use the jaccard index options when running Trinity.

--Carson

> On Sep 7, 2016, at 3:04 PM, Steven Sullivan wrote:
>
> The MAKER documentation I can access (wiki tutorials) seems somewhat out of date as regards RNA evidence, as it focuses a lot on ESTs, whereas today RNA seq data would likely be more common.
>
> So a general question I have is, for a new eukaryotic organism with no models, is it better to use assembled RNA seq reads (i.e., putative transcripts generated by Trinity) as input to 1) ab initio predictors and as 2) MAKER alignment evidence, or is it better to use the reads themselves, unassembled?
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From me.mark at gmail.com Wed Sep 14 12:11:46 2016
From: me.mark at gmail.com (Mark Ebbert)
Date: Wed, 14 Sep 2016 11:11:46 -0700
Subject: [maker-devel] (no subject)
In-Reply-To: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>
References: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>
Message-ID: <57d6c562d78dd70000998701@polymail.io>

Hey Carson!

I'm getting a new issue. I think I need to recompile Maker with MPICH instead of OpenMPI. I'm getting the following errors when I try to run "mpiexec -n 10 maker -help". I tried running "./Build clean" followed by "./Build install" after updating LD_PRELOAD with the path to MPICH, but I'm still getting the error. I was also trying to access the Maker documentation at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php to review detailed installation instructions (I think it's there), but the website is down.

I appreciate your help.
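
The "mpiexec -n 10 maker -help" check used in this thread (help printed once means the ring is working; help printed once per process means every process believes it is the root) can be scored mechanically. The following is only a sketch under stated assumptions: the banner marker "Usage:" is a guess at a line that occurs once per copy of the help text, and both function names are hypothetical, not part of MAKER.

```python
def count_help_banners(output, marker="Usage:"):
    """Count how many copies of the help text appear in captured output.

    `marker` is an assumed string that starts one line of each copy of the
    help message; adjust it to whatever your MAKER build actually prints.
    """
    return sum(1 for line in output.splitlines()
               if line.strip().startswith(marker))


def ring_status(output, nprocs, marker="Usage:"):
    """Classify the captured output of `mpiexec -n <nprocs> maker -help`."""
    copies = count_help_banners(output, marker)
    if copies == 1:
        # Only the root process printed the help message.
        return "ok"
    if copies == nprocs:
        # Each process printed it, so each one thinks it is the root.
        return "every process thinks it is root"
    # Anything else (crashes, partial output) needs a human to look at it.
    return "inconclusive"
```

Capture the output with something like "mpiexec -n 10 maker -help > help.out 2>&1" and pass the file contents together with the process count. Output such as the error dump that follows falls into the "inconclusive" branch, because no help text is printed at all.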
"Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xa0a5d620, rank=0x7ffd20bb8d9c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x51f83620, rank=0x7ffc6023b7fc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8b342620, rank=0x7ffde14f02fc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xf8f24620, rank=0x7ffe71c9a5bc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8c074620, rank=0x7ffc70e50b6c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xdac15620, rank=0x7ffc67bf0e2c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xbb65620, rank=0x7ffc17a1d1bc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x2aa3b620, rank=0x7fff551201dc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xd2453620, rank=0x7fffaebe21cc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb24e8620, rank=0x7ffdd838bbfc) failed
PMPI_Comm_rank(68).: Invalid communicator

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 2462 RUNNING AT m7int02
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
==================================================================================="

Mark T. W. Ebbert

On Thu, Sep 01, 2016 at 10:03 AM Carson Holt wrote:

MAKER will use locks to divide up work between simultaneously running jobs. So submitting five 200-CPU jobs will give you the same throughput and will be more stable. The jobs will probably move through the queue faster as well.

--Carson

On Sep 1, 2016, at 10:00 AM, Mark Ebbert <me.mark at gmail.com> wrote:

Bummer. It worked at 720 the first time. Thanks again!

Mark T. W. Ebbert

On Thu, Sep 01, 2016 at 9:57 AM Carson Holt wrote:

-n 1000 is probably too high for MPICH3. Its communication manager is not that robust. You can go that high with OpenMPI or MVAPICH2, but I've found that MPICH3 tops out at 100-200. Just submit multiple jobs at the lower count.

--Carson

On Sep 1, 2016, at 8:47 AM, Mark Ebbert <me.mark at gmail.com> wrote:

Thanks Carson! The help message only printed once, so everything seemed fine. I deleted all of the lock files with the following command: "find . -name *.NFSLock* -exec rm {} \;"

I restarted the job and got the following segfault:

"Module mpi/mpich-3.1.4_intel-15.0.3 requires compiler_intel/15.0.3. Loading it now.
Module compiler_intel/15.0.3 requires mkl/11.2.0. Loading it now.
mpdboot_m7-1-2 (handle_mpd_output 1000): from mpd on m7-1-2, invalid port info:
mpd_uncaught_except_tb handling:
: list index out of range
/apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 264 pin_Uni_num
    if list.index(list[i]) == i:
/apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1449 pin_Cpuinfo
    info['cache1'] = pin_Uni_num(info['cache1_id'], info['lcpu'])
/apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1658 run
    self.CpuInfo = pin_Cpuinfo(self.PinCase,self.Arch)
/apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 3676
    mpd.run()
/var/spool/slurmd/job11326444/slurm_script: line 27: 29365 Segmentation fault mpiexec -n 1000 maker"

Any ideas?

Mark T. W. Ebbert

On Tue, Aug 30, 2016 at 10:54 AM Carson Holt wrote:

Run 'maker -help' with mpiexec.

Example:
mpiexec -n 10 maker -help

If the MPI communication ring is working correctly, then it will print the help message only once (from the root process). If it is not working, it will print the help message 10 times because each of the 10 MPI processes will think it is the root process. It is a simple test that can identify whether it is an MPI issue or not.

If it is not an MPI issue, you can just search for the NFSLock files using find and delete them.

--Carson

On Aug 30, 2016, at 10:10 AM, Mark Ebbert <me.mark at gmail.com> wrote:

Good day everyone!

I'm getting the error stating: "WARNING: Multiple MAKER processes have been started in the same directory." Everything I've seen mentions version issues with MPICH. The difference in my situation is that my initial run ran just fine, but died because of the cluster time constraints. We're only allowed 3 days.

There are a bunch of .NFSLock files in the output directory. I'm guessing Maker wasn't able to clear the locks when the jobs died? Can I safely delete those lock files? What's the best way to handle this going forward since I can only run jobs for 3 days at a time?

Thanks!

Mark T. W. Ebbert
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Wed Sep 14 12:15:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Sep 2016 12:15:19 -0600
Subject: [maker-devel] (no subject)
In-Reply-To: <57d6c562d78dd70000998701@polymail.io>
References: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com> <57d6c562d78dd70000998701@polymail.io>
Message-ID: <0C568590-0A33-46DD-95FD-271D9A8E0009@gmail.com>

Unset LD_PRELOAD. It really is only an OpenMPI issue, and may affect MPICH2 in a bad way. Also do './Build realclean' (a bit more thorough) in the source directory, then remove the .../maker/perl directory before reinstalling. That will force reinstall of all missing perl dependencies and the perl/MPI bindings.

--Carson

> On Sep 14, 2016, at 12:11 PM, Mark Ebbert wrote:
>
> Hey Carson!
>
> I'm getting a new issue. I think I need to recompile Maker with MPICH instead of OpenMPI. I'm getting the following errors when I try to run "mpiexec -n 10 maker -help". I tried running "./Build clean" followed by "./Build install" after updating LD_PRELOAD with the path to MPICH, but I'm still getting the error. I was also trying to access the Maker documentation at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php to review detailed installation instructions (I think it's there), but the website is down.
>
> I appreciate your help.
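
The clean-rebuild advice in this reply (unset LD_PRELOAD, run './Build realclean' in the source directory, remove the maker/perl directory, reinstall) can be laid out as an explicit plan before anything is run. This is a minimal sketch, not MAKER's own tooling: the install root passed in is a placeholder, the function name is hypothetical, and the "perl Build.PL" step is an assumption (the reply does not mention it; it is included because realclean normally removes the generated Build script).

```python
import os
import shlex


def rebuild_plan(maker_root, environ=None):
    """Return (env, steps) for a clean rebuild of MAKER's Perl/MPI bindings.

    maker_root is a placeholder for your own install path. Nothing is
    executed here; the caller decides whether to run or just print the steps.
    """
    env = dict(os.environ if environ is None else environ)
    env.pop("LD_PRELOAD", None)  # "Unset LD_PRELOAD"

    src = os.path.join(maker_root, "src")
    steps = [
        # (working directory, argv)
        (src, ["./Build", "realclean"]),
        (maker_root, ["rm", "-rf", os.path.join(maker_root, "perl")]),
        (src, ["perl", "Build.PL"]),  # assumption: regenerate ./Build
        (src, ["./Build", "install"]),
    ]
    return env, steps


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    _env, plan = rebuild_plan("/path/to/maker")
    for cwd, argv in plan:
        print(f"(cd {shlex.quote(cwd)} && {shlex.join(argv)})")
```

Returning the plan rather than running it keeps the destructive step (removing the perl directory) visible for review; each step can then be run with subprocess.run(argv, cwd=cwd, env=env, check=True) once the paths look right.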
>
> "Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xa0a5d620, rank=0x7ffd20bb8d9c) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x51f83620, rank=0x7ffc6023b7fc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8b342620, rank=0x7ffde14f02fc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xf8f24620, rank=0x7ffe71c9a5bc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8c074620, rank=0x7ffc70e50b6c) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xdac15620, rank=0x7ffc67bf0e2c) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xbb65620, rank=0x7ffc17a1d1bc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x2aa3b620, rank=0x7fff551201dc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xd2453620, rank=0x7fffaebe21cc) failed
> PMPI_Comm_rank(68).: Invalid communicator
> Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
> PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb24e8620, rank=0x7ffdd838bbfc) failed
> PMPI_Comm_rank(68).: Invalid communicator
>
=================================================================================== > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES > = PID 2462 RUNNING AT m7int02 > = EXIT CODE: 1 > = CLEANING UP REMAINING PROCESSES > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES > ===================================================================================" > > Mark T. W. Ebbert > > On Thu, Sep 01, 2016 at 10:03 AM Carson Holt <> wrote: > MAKER will use locks to divide up work between simultaneously running jobs. So submitting five 200 CPU jobs, will give you the same throughput, and will be more stable. The jobs will probably move through the queue faster as well. > > ?Carson > > > >> On Sep 1, 2016, at 10:00 AM, Mark Ebbert > wrote: >> >> >> Bummer. It worked at 720 the first time. Thanks again! >> >> Mark T. W. Ebbert >> >> On Thu, Sep 01, 2016 at 9:57 AM Carson Holt <> wrote: >> -n 1000 is probably too high for mpich3. It?s communication manager is not that robust. You can go that high with OpenMPI or MVAPICH2, but I?ve found that MPICH3 tops out at 100-200. Just submit multiple jobs at the lower count. >> >> ?Carson >> >> >> >>> On Sep 1, 2016, at 8:47 AM, Mark Ebbert > wrote: >>> >>> >>> Thanks Carson! The help message only printed once, so everything seemed fine. I deleted all of the lock files with the following command: ?find . -name *.NFSLock* -exec rm {} \;? >>> >>> I restarted the job and got the following segfault: >>> >>> ?Module mpi/mpich-3.1.4_intel-15.0.3 requires compiler_intel/15.0.3. Loading it now. >>> Module compiler_intel/15.0.3 requires mkl/11.2.0. Loading it now. 
>>> mpdboot_m7-1-2 (handle_mpd_output 1000): from mpd on m7-1-2, invalid port info: >>> mpd_uncaught_except_tb handling: >>> : list index out of range >>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 264 pin_Uni_num >>> if list.index(list[i]) == i: >>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1449 pin_Cpuinfo >>> info['cache1'] = pin_Uni_num(info['cache1_id'], info['lcpu']) >>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1658 run >>> self.CpuInfo = pin_Cpuinfo(self.PinCase,self.Arch) >>> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 3676 >>> mpd.run() >>> /var/spool/slurmd/job11326444/slurm_script: line 27: 29365 Segmentation fault mpiexec -n 1000 maker? >>> >>> Any ideas? >>> >>> Mark T. W. Ebbert >>> >>> On Tue, Aug 30, 2016 at 10:54 AM Carson Holt <> wrote: >>> Run 'maker -help? with mpiexec. >>> >>> Example: >>> mpiexec -n 10 maker -help >>> >>> If the MPI communication ring is working correctly, then it will print the help message only once (from the root process). If it is not working, it will print the help message 10 time because each of the 10 MPI processes will think they are the root process. It is a simple test that can identify if it is an MPI issue or not. >>> >>> If it is not an MPI issue, you can just search for the NFSLock files using find and delete them,. >>> >>> ?Carson >>> >>> >>>> On Aug 30, 2016, at 10:10 AM, Mark Ebbert > wrote: >>>> >>>> >>>> Good day everyone! >>>> >>>> I?m getting the error stating: ?WARNING: Multiple MAKER processes have been started in the same directory.? Everything I?ve seen mentions version issues with MPICH. The difference in my situation is that my initial run ran just fine, but died because of the cluster time constraints. We?re only allowed 3 days. >>>> >>>> There are a bunch of .NFSLock files in the output directory. I?m guessing Maker wasn?t able to clear the locks when the jobs died? 
Can I safely delete those lock files? What?s the best way to handle this going forward since I can only run jobs for 3 days at a time? >>>> >>>> Thanks! >>>> >>>> Mark T. W. Ebbert >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From xvazquezc at gmail.com Mon Sep 19 02:30:06 2016 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez_Campos?=) Date: Mon, 19 Sep 2016 18:30:06 +1000 Subject: [maker-devel] questions about post-processing of annotations Message-ID: Hi Carson, I'm trying to go through the post processing step in the tutorial (GMOD2014) but I think something is not right with the functional annotation as no new information is added to the *.putative_function.* files when I run the maker_functional_gff or the maker_functional_fasta. All the fasta headings remain unchanged and the gff files don't show any change. I'm using Maker 2.31.6 by the way. Because there are no examples showing what I should expect I'm a bit lost. These are my files prior to the functional annotation. FRL.all.iprscan.renamed.tsv > FRL.all.maker.proteins.blastout.sprot.renamed.tsv > FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.transcripts.renamed.fasta > FRL.all.maker.trnascan.transcripts.renamed.fasta > FRL.all.renamed.gff > FRL.map > And this, an example of the command I'm using maker_functional_fasta uniprot_sprot.fasta FRL.all.maker.proteins.blastout.sprot.renamed.tsv FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.proteins.renamed.putative_function.fasta Thank you in advance. Xabi PS: the tutorial mentions to use the "standard" IPRS output but by default it gives xml, gff3 and tsv files. Which one should I use? 
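One possible cause of the symptom described above (headers coming back unchanged) is that the query IDs in the tabular BLAST report no longer match the IDs in the FASTA file. The following is a minimal Python sketch of a sanity check for that, not part of MAKER itself. It assumes that an ID is the first whitespace-delimited word of a FASTA header and the first column of a tab-delimited report; the function names and the `check_ids.py` script name are made up for illustration.

```python
import sys


def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>') in FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def blast_query_ids(lines):
    """Return the set of query IDs (column 1) in tab-delimited BLAST lines."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (e.g. from commented tabular output).
        if not line or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def id_overlap(fasta_lines, blast_lines):
    """Split BLAST query IDs into (found in FASTA, missing from FASTA)."""
    fasta = fasta_ids(fasta_lines)
    queries = blast_query_ids(blast_lines)
    return queries & fasta, queries - fasta


# Hypothetical usage: python check_ids.py proteins.fasta blast_report.tsv
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fasta_handle, open(sys.argv[2]) as blast_handle:
        matched, unmatched = id_overlap(fasta_handle, blast_handle)
    print(f"{len(matched)} BLAST query IDs match a FASTA header")
    print(f"{len(unmatched)} BLAST query IDs have no matching FASTA header")
```

If most query IDs come back unmatched, the report and the FASTA file were probably produced on different sides of a renaming step.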
--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Sep 19 16:07:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:07:59 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID:

maker_functional_fasta reads the results of a BLAST report. It must be a tab-delimited BLAST report (-outfmt 6 under BLAST+) with UniProt as the database and the MAKER FASTA file as the query. If you renamed the transcripts in the FASTA before running maker_functional_fasta, the results in the BLAST report will no longer match (because they have new names). Use the map_data_ids script to fix the names in the BLAST report if you did that.

Thanks,
Carson

From xvazquezc at gmail.com Mon Sep 19 16:27:03 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Sep 2016 08:27:03 +1000
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID:

Yes, my BLAST output is -outfmt 6 using UniProt/Swiss-Prot as the database. I used the MAKER protein FASTA file as the query (should I do the same with the transcripts?).

According to the tutorial the steps are:

maker_map_ids
map_gff_ids
map_fasta_ids (for maker protein and transcripts)
map_data_ids (for blast and iprs output)

and then the maker_functional_* steps.

So, the rename steps come before maker_functional. Are you saying it should be the other way around?

--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Sep 19 16:43:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:43:19 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>

You just have to make sure you ran the BLAST job after renaming. If you ran it before, then the names in the report will not match the renamed FASTA. The BLAST job should be blastp (protein to protein). You can check by just looking at the report.

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From xvazquezc at gmail.com Mon Sep 19 16:46:24 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Sep 2016 08:46:24 +1000
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
References: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
Message-ID:

I see. I ran blastp before starting the "post-processing of annotations" step. I guess I should do the same with IPRS.

By the way, can you confirm whether I need to blast the maker.transcripts.fasta?

Thanks a lot

--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Mon Sep 19 16:50:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:50:40 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
Message-ID:

No, it is the protein-to-protein BLAST you need (which is where cross-species homology occurs). Since the proteins come from the transcripts, doing a translated transcript BLAST (blastx in this case) would be redundant. It would also be less accurate, because artifacts of a six-frame translation reduce the significance of the alignment.

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sullis02 at nyu.edu Mon Sep 19 22:21:18 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 20 Sep 2016 00:21:18 -0400
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
Message-ID:

I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline.

As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.
The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exist for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round. But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications. Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself? Or am I misunderstanding the workflow? -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Sep 19 22:34:31 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 19 Sep 2016 22:34:31 -0600 Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders In-Reply-To: References: Message-ID: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> The training does not involve so much the sequence, rather the structure (i.e. intron exon, start, stop etc.). You could use the evidence deposited as input to the iterative process described, but not directly. This is because you have the sequence but not the structure. What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because blast alignments are not splice aware), then identify correct open reading frames with start and stop codons. The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.). 
All of which is not directly available from the protein or transcript data. But once it?s been polished against the reference, the structure can be discovered. After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hint. As a result the second round of predictions are better than the first, and can be used as training to improve the HMM. ?Carson > On Sep 19, 2016, at 10:21 PM, Steven Sullivan wrote: > > I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline > > As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel. > > The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exist for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round. > > But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications. > > Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself? > > Or am I misunderstanding the workflow? 
> > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From dence at genetics.utah.edu Mon Sep 19 22:45:02 2016 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 20 Sep 2016 04:45:02 +0000 Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders In-Reply-To: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> Message-ID: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu> Just chiming in with my own perspective on the question. The gold-standard genes can be used as input for training the gene predictors and also as evidence for the genome annotation. Presumably, you?ll have much more evidence than the gold-standard genes for the annotation, so it won?t be circular. As Carson said, the gene predictors are using the structure of the alignments of the input, rather than the sequence itself. The other source for input for gene predictors, in the case of a true bootstrap where you have no gold-standard, would be to use alignment generated by a program, like BUSCO or CEGMA, that identifies conserved orthologs in the genome. ~Daniel Daniel Ence Graduate Student Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 > On Sep 19, 2016, at 10:34 PM, Carson Holt wrote: > > The training does not involve so much the sequence, rather the structure (i.e. intron exon, start, stop etc.). You could use the evidence deposited as input to the iterative process described, but not directly. This is because you have the sequence but not the structure. > > What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because blast alignments are not splice aware), then identify correct open reading frames with start and stop codons. 
The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.). All of which is not directly available from the protein or transcript data. But once it?s been polished against the reference, the structure can be discovered. > > After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hint. As a result the second round of predictions are better than the first, and can be used as training to improve the HMM. > > ?Carson > > > >> On Sep 19, 2016, at 10:21 PM, Steven Sullivan wrote: >> >> I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline >> >> As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel. >> >> The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exist for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round. >> >> But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications. 
>> >> Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself? >> >> Or am I misunderstanding the workflow? >> >> >> >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From cjfields at illinois.edu Tue Sep 20 13:17:24 2016 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 20 Sep 2016 19:17:24 +0000 Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders In-Reply-To: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu> References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu> Message-ID: I can add that BUSCO did work well as a first-pass bootstrap (with the added convenience of running Augustus for generating an initial model). chris > On Sep 19, 2016, at 11:45 PM, Daniel Ence wrote: > > Just chiming in with my own perspective on the question. The gold-standard genes can be used as input for training the gene predictors and also as evidence for the genome annotation. Presumably, you?ll have much more evidence than the gold-standard genes for the annotation, so it won?t be circular. As Carson said, the gene predictors are using the structure of the alignments of the input, rather than the sequence itself. The other source for input for gene predictors, in the case of a true bootstrap where you have no gold-standard, would be to use alignment generated by a program, like BUSCO or CEGMA, that identifies conserved orthologs in the genome. 
> > ~Daniel > > > > > Daniel Ence > Graduate Student > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > >> On Sep 19, 2016, at 10:34 PM, Carson Holt wrote: >> >> The training involves not so much the sequence as the structure (i.e. intron, exon, start, stop, etc.). You could use the evidence deposited as input to the iterative process described, but not directly. This is because you have the sequence but not the structure. >> >> What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because BLAST alignments are not splice-aware), then identify correct open reading frames with start and stop codons. The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.). None of this is directly available from the protein or transcript data, but once it's been polished against the reference, the structure can be discovered. >> >> After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hints. As a result, the second round of predictions is better than the first and can be used as training to improve the HMM. >> >> --Carson >> >> >> >>> On Sep 19, 2016, at 10:21 PM, Steven Sullivan wrote: >>> >>> I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline >>> >>> As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.
>>> >>> The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exist for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round. >>> >>> But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species have already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications. >>> >>> Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself? >>> >>> Or am I misunderstanding the workflow? >>> >>> >>> >>> >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From sullis02 at nyu.edu Tue Sep 20 13:28:20 2016 From: sullis02 at nyu.edu (Steven Sullivan) Date: Tue, 20 Sep 2016 15:28:20 -0400 Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders In-Reply-To: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu> References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu> Message-ID: Thanks! 
So, I think for training the gene predictors, I'll try to identify any sequences in my gold-standard set that have structural information, i.e. genes for which the genomic sequence was cloned, and use those. But I doubt there's enough of those to train e.g. Augustus, so I'll probably have to use the bootstrap method as well. Is there a way to combine both? For the BLAST-based annotation, if I use the entire Uniprot/Swissprot or Genbank FASTA sets as protein homology evidence, my gold standards are already included in those. I gather from these replies that that's not a problem. However, there *are* public database sequences (predicted genes from an older annotation of this species) that I *do* want to exclude from evidence. (Because we want to run MAKER as if this genome was 'new', never before annotated.) Can I use something like the -negative_gilist option in blastp to omit previous genome project predictions from consideration? (An option that only works with Genbank sequences, I think.) Or do I have to create a custom version of the large public database? -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Sep 20 14:15:21 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 Sep 2016 14:15:21 -0600 Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders In-Reply-To: References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu> Message-ID: You would need to create a custom database without the sequences you wish to exclude. --Carson > On Sep 20, 2016, at 1:28 PM, Steven Sullivan wrote: > > Thanks! So, I think for training the gene predictors, I'll try to identify any sequences in my gold-standard set that have structural information, i.e. genes for which the genomic sequence was cloned, and use those. But I doubt there's enough of those to train e.g.
Augustus, so I'll probably have to use the bootstrap method as well. Is there a way to combine both? > > For the BLAST-based annotation, if I use the entire Uniprot/Swissprot or Genbank FASTA sets as protein homology evidence, my gold standards are already included in those. I gather from these replies that that's not a problem. > > However, there *are* public database sequences (predicted genes from an older annotation of this species) that I *do* want to exclude from evidence. (Because we want to run MAKER as if this genome was 'new', never before annotated.) Can I use something like the -negative_gilist option in blastp to omit previous genome project predictions from consideration? (An option that only works with Genbank sequences, I think.) Or do I have to create a custom version of the large public database? > > > > > From psh65 at cornell.edu Tue Sep 20 14:33:42 2016 From: psh65 at cornell.edu (Prashant S Hosmani) Date: Tue, 20 Sep 2016 20:33:42 +0000 Subject: [maker-devel] mapping cDNA to updated genome In-Reply-To: <9FBCB1C4-C319-4933-8741-53DAFCB82458@gmail.com> References: <646B795A-1B04-4300-94C7-BEBEF0B37323@gmail.com> <9FBCB1C4-C319-4933-8741-53DAFCB82458@gmail.com> Message-ID: <55D0187E-8C48-40DA-91BE-6370D46D041F@cornell.edu> Hi Mike and Carson, Thank you for your help. I used the masked genome for aligning cDNAs. And yes, this was due to multiply aligning cDNAs. I guess you could also filter genes according to the alignment score in the GFF. I used GMAP (http://research-pub.gene.com/gmap/) to align the cDNAs onto the updated genome. GMAP has parameters to filter based on alignment scores and can also choose the best path per cDNA. Regards, Prashant Prashant Hosmani Sol Genomics Network Boyce Thompson Institute, Ithaca, NY, USA On Aug 31, 2016, at 12:12 PM, Carson Holt wrote: Also if you have multiple alignments of the same cDNA, you can use the score column of the mRNA feature to see which aligns best.
If they have the same score, you will have to disambiguate manually or just remove all copies. --Carson On Aug 31, 2016, at 10:10 AM, Michael Campbell wrote: Hi Prashant, I'm almost positive that the additional genes are coming from multiply aligning cDNAs. Did you repeat mask your genome before mapping things forward? Another thought: what kind of whole genome duplications has your plant been through? It may be that the multiple alignments are to pseudogenes in some stage of decay. If that is the case it would probably be safe to keep the gene from the longest/best aligned cDNA. Thanks, Mike On Aug 31, 2016, at 10:35 AM, Prashant S Hosmani wrote: Hi All, I am working on updating a plant genome annotation. I would like to map genes from the previous annotation to a new genome build. There is a protocol about this in Campbell et al 2014, Current Protocols in Bioinformatics (Basic Protocol 4 - Mapping annotations to a new assembly). I followed that protocol exactly, setting est_forward=1. But in the output I'm getting a large number of genes. My input cDNA fasta contains ~35K genes and after mapping there are ~58K genes. I'm using maker version 3.0. There are few changes in the genome and I'm not expecting many changes in the mapping of previous genes. Please let me know if there are any other parameters to control mapping of ESTs. I was hoping to get a similar number of genes mapped onto the new assembly with very few changes. Thank you for your help in advance. Prashant Prashant Hosmani Sol Genomics Network Boyce Thompson Institute, Ithaca, NY, USA _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sullis02 at nyu.edu Thu Sep 22 10:27:53 2016 From: sullis02 at nyu.edu (Steven Sullivan) Date: Thu, 22 Sep 2016 12:27:53 -0400 Subject: [maker-devel] should EST evidence be cleaned, assembled? Message-ID: Do EST sequences (as opposed to RNA Seq data) need to be cleaned (e.g., vector sequence trimmed, Ns removed) and assembled (combined into longer 'EST contigs' where possible) before use as MAKER alignment evidence? -- Dr. Steven Sullivan Center for Genomics & Systems Biology New York University 12 Waverly Place New York, NY 10003 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon Sep 26 09:28:41 2016 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 26 Sep 2016 09:28:41 -0600 Subject: [maker-devel] should EST evidence be cleaned, assembled? In-Reply-To: References: Message-ID: You will want to trim the vector or any sequence not representative of the transcript or else it will not align well. The sequences will be aligned directly against the assembly. --Carson > On Sep 22, 2016, at 10:27 AM, Steven Sullivan wrote: > > Do EST sequences (as opposed to RNA Seq data) need to be cleaned (e.g., vector sequence trimmed, Ns removed) and assembled (combined into longer 'EST contigs' where possible) before use as MAKER alignment evidence? > > > -- > Dr.
Steven Sullivan > Center for Genomics & Systems Biology > New York University > 12 Waverly Place > New York, NY 10003 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Thu Sep 1 10:03:21 2016 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 1 Sep 2016 10:03:21 -0600 Subject: [maker-devel] (no subject) In-Reply-To: <57c8503d99faf40000c7acab@polymail.io> References: <0729B502-61AD-44C1-BE67-F3D561E11B2B@gmail.com> <57c8503d99faf40000c7acab@polymail.io> Message-ID: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com> MAKER will use locks to divide up work between simultaneously running jobs. So submitting five 200-CPU jobs will give you the same throughput and will be more stable. The jobs will probably move through the queue faster as well. --Carson > On Sep 1, 2016, at 10:00 AM, Mark Ebbert wrote: > > > Bummer. It worked at 720 the first time. Thanks again! > > Mark T. W. Ebbert > > On Thu, Sep 01, 2016 at 9:57 AM Carson Holt wrote: > -n 1000 is probably too high for mpich3. Its communication manager is not that robust. You can go that high with OpenMPI or MVAPICH2, but I've found that MPICH3 tops out at 100-200. Just submit multiple jobs at the lower count. > > --Carson > > > >> On Sep 1, 2016, at 8:47 AM, Mark Ebbert wrote: >> >> >> Thanks Carson! The help message only printed once, so everything seemed fine. I deleted all of the lock files with the following command: "find . -name *.NFSLock* -exec rm {} \;" >> >> I restarted the job and got the following segfault: >> >> "Module mpi/mpich-3.1.4_intel-15.0.3 requires compiler_intel/15.0.3. Loading it now. >> Module compiler_intel/15.0.3 requires mkl/11.2.0. Loading it now.
>> mpdboot_m7-1-2 (handle_mpd_output 1000): from mpd on m7-1-2, invalid port info: >> mpd_uncaught_except_tb handling: >> : list index out of range >> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 264 pin_Uni_num >> if list.index(list[i]) == i: >> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1449 pin_Cpuinfo >> info['cache1'] = pin_Uni_num(info['cache1_id'], info['lcpu']) >> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 1658 run >> self.CpuInfo = pin_Cpuinfo(self.PinCase,self.Arch) >> /apps/intel_parallel_studio_xe/2015_update3/mpirt/bin/intel64/mpd.py 3676 >> mpd.run() >> /var/spool/slurmd/job11326444/slurm_script: line 27: 29365 Segmentation fault mpiexec -n 1000 maker" >> >> Any ideas? >> >> Mark T. W. Ebbert >> >> On Tue, Aug 30, 2016 at 10:54 AM Carson Holt wrote: >> Run 'maker -help' with mpiexec. >> >> Example: >> mpiexec -n 10 maker -help >> >> If the MPI communication ring is working correctly, then it will print the help message only once (from the root process). If it is not working, it will print the help message 10 times because each of the 10 MPI processes will think it is the root process. It is a simple test that can identify if it is an MPI issue or not. >> >> If it is not an MPI issue, you can just search for the NFSLock files using find and delete them. >> >> --Carson >> >> >>> On Aug 30, 2016, at 10:10 AM, Mark Ebbert wrote: >>> >>> >>> Good day everyone! >>> >>> I'm getting the error stating: "WARNING: Multiple MAKER processes have been started in the same directory." Everything I've seen mentions version issues with MPICH. The difference in my situation is that my initial run ran just fine, but died because of the cluster time constraints. We're only allowed 3 days. >>> >>> There are a bunch of .NFSLock files in the output directory. I'm guessing Maker wasn't able to clear the locks when the jobs died? Can I safely delete those lock files?
What's the best way to handle this going forward since I can only run jobs for 3 days at a time? >>> >>> Thanks! >>> >>> Mark T. W. Ebbert >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From eaalvarado at cpp.edu Thu Sep 1 13:44:18 2016 From: eaalvarado at cpp.edu (Emilio A. Alvarado Ortiz) Date: Thu, 1 Sep 2016 19:44:18 +0000 Subject: [maker-devel] MAKER Exonerate Error Message-ID: Hello, I am currently running MAKER version 2.31.8 using MPI but I keep getting the following error when running Exonerate: Maker command used: mpiexec -mca btl ^openib -n 16 maker ** (process:18773): WARNING **: Compiled with assertion checking - will run slowly ** ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA) sh: line 1: 18771 Aborted /usr/bin/exonerate -q /media/raid/tmp/maker_DYrlgS/9/gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.for.21933-23968 ** (process:18775): WARNING **: Compiled with assertion checking - will run slowly .9.fasta -t /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.9.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.p.exonerate ** Attached is the Error log and the maker_opts.ctl file. Do you know a workaround for this problem? I would really appreciate your help. Regards, Emilio A. Ortiz -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: MAKER.error.log Type: application/octet-stream Size: 7891 bytes Desc: MAKER.error.log URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: maker_opts.ctl Type: application/octet-stream Size: 4722 bytes Desc: maker_opts.ctl URL: From carsonhh at gmail.com Fri Sep 2 15:08:50 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 2 Sep 2016 15:08:50 -0600 Subject: [maker-devel] MAKER Exonerate Error In-Reply-To: References: Message-ID: <38035C41-1DE1-4512-B92F-AC60C182BBE8@gmail.com> This is coming from exonerate. You may need to reinstall it from source rtather than using the precompiled binaries. ?Carson > On Sep 1, 2016, at 1:44 PM, Emilio A. Alvarado Ortiz wrote: > > Hello, > > I am currently running MAKER version 2.31.8 using MPI but I keep getting the following error when running Exonerate: > > Maker command used: mpiexec -mca btl ^openib -n 16 maker > > ** (process:18773): WARNING **: Compiled with assertion checking - will run slowly > ** > ERROR:protein2genome.c:25:Protein2Genome_Data_create: assertion failed: (target->alphabet->type == Alphabet_Type_DNA) > sh: line 1: 18771 Aborted /usr/bin/exonerate -q /media/raid/tmp/maker_DYrlgS/9/gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.for.21933-23968 > ** (process:18775): WARNING **: Compiled with assertion checking - will run slowly > .9.fasta -t /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.9.fasta -Q protein -T dna -m protein2genome --softmasktarget --percent 20 --showcigar > /media/raid/tmp/maker_DYrlgS/9/13225915.21933-23968.gi%7C565342117%7Cref%7CXP_006338208%2E1%7C.p.exonerate > ** > > Attached is the Error log and the maker_opts.ctl file. Do you know a workaround this problem? I would really appreciate your help. > > > > Regards, > > Emilio A. 
Ortiz > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Sep 2 15:11:19 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 2 Sep 2016 15:11:19 -0600 Subject: [maker-devel] maker2.31.8 _ failure when processing repeats In-Reply-To: <1F3B92BC-1717-4CB5-A26D-6F2126667E53@fas.harvard.edu> References: <76B77664-2EA9-45AA-A1C1-1B5124DC0025@fas.harvard.edu> <01FEE9E8-69C7-4E42-9E8C-07E029BB01A5@gmail.com> <1F3B92BC-1717-4CB5-A26D-6F2126667E53@fas.harvard.edu> Message-ID: <33D1488A-E060-4760-AA99-6AB9B71EFADE@gmail.com> It will use both. It shouldn't hurt to set both. The difference has more to do with the expected attributes in column 9 (rm_gff is more forgiving). --Carson > On Aug 30, 2016, at 2:31 PM, Lassance, Jean-Marc wrote: > > Let me clarify one thing: the first pass was performed with Maker running RepeatMasker internally, which is why I decided to use them in the second pass, as well as the data from the independent run of RepeatMasker. > > From reading earlier posts, I gathered that Maker would first use the evidence from the rm_gff, and then from maker_gff if rm_pass=1 is activated, but that having both would not hurt. Correct? > > > JM > > > >> On Aug 30, 2016, at 4:12 PM, Carson Holt wrote: >> >> Also make sure you pass the data in using rm_gff and not maker_gff if the repeats were not MAKER generated. >> >> --Carson >> >> >>> On Aug 30, 2016, at 10:16 AM, Daniel Ence wrote: >>> >>> Hi Jean-Marc, so the first question I have is whether maker is still annotating repeats, even though you're providing the rm_gff file. Are you providing a file or parameter for repeat masker in the maker_opts.ctl file? >>> >>> And secondly, what about the scaffold that is failing?
How long is it, what is the percentage of Ns in the sequence there, and how much of it was masked in the rm_gff file? >>> >>> Thanks, >>> Daniel >>> >>> >>> Daniel Ence >>> Graduate Student >>> Eccles Institute of Human Genetics >>> University of Utah >>> 15 North 2030 East, Room 2100 >>> Salt Lake City, UT 84112-5330 >>> >>>> On Aug 30, 2016, at 7:19 AM, Lassance, Jean-Marc wrote: >>>> >>>> Hi. >>>> >>>> I am using Maker2.31.8 to annotate a mammalian genome (with OpenMPI, Linux server). >>>> >>>> Basically, after running Maker a first time to generate a training set for SNAP, I am running it a second time with SNAP and Augustus enabled. Because we ran RepeatMasker independently, I am providing the gff3 like so: >>>> >>>> rm_gff=myanimal.repeatmasker.out.gff3 >>>> >>>> #-----Re-annotation Using MAKER Derived GFF3 >>>> maker_gff=myanimal.all.maker.pass1.gff >>>> rm_pass=1 >>>> >>>> Things seem to progress nicely (the vast majority of the scaffolds "finish"), but one of the scaffolds keeps failing (I have attempted to restart after erasing the entire content of the output folder). This is the message that I could associate with this error: >>>> >>>> Died at /n/sw/fasrcsw/apps/MPI/gcc/4.8.2-fasrc01/openmpi/1.10.0-fasrc01/maker/2.31.8-fasrc01/bin/../perl/lib/Bio/Search/Hit/PhatHit/Base.pm line 188. >>>> --> rank=26, hostname=holy2a11102.rc.fas.harvard.edu >>>> ERROR: Failed while processing all repeats >>>> ERROR: Chunk failed at level:3, tier_type:1 >>>> FAILED CONTIG:scaffold00013 >>>> >>>> I wonder if you have an idea of what could be wrong here. >>>> >>>> Thanks for your help, >>>> >>>> >>>> Jean-Marc >>>> >>>> ------------------
>>>> Jean-Marc Lassance, PhD >>>> >>>> Harvard University >>>> Department of Organismic and Evolutionary Biology >>>> Department of Molecular and Cellular Biology >>>> Museum of Comparative Zoology >>>> >>>> 26, Oxford Street >>>> Cambridge MA 02138 >>>> USA >>>> >>>> email: lassance at fas.harvard.edu >>>> twitter: @lassancejm >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > ------------------ > Jean-Marc Lassance, PhD > > Harvard University > Department of Organismic and Evolutionary Biology > Department of Molecular and Cellular Biology > Museum of Comparative Zoology > > 26, Oxford Street > Cambridge MA 02138 > USA > > email: lassance at fas.harvard.edu > twitter: @lassancejm > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sullis02 at nyu.edu Tue Sep 6 13:37:51 2016 From: sullis02 at nyu.edu (Steven Sullivan) Date: Tue, 6 Sep 2016 15:37:51 -0400 Subject: [maker-devel] antisense RNA in training set? Message-ID: I have a set of assembled transcripts from a stranded RNA seq run that I want to use for gene finder training in a MAKER run on a 'new' organism. I've noticed though that some of my assembled transcripts actually appear to be antisense RNAs. Should I include these in the training set? -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue Sep 6 14:09:11 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 6 Sep 2016 14:09:11 -0600 Subject: [maker-devel] antisense RNA in training set?
In-Reply-To: References: Message-ID: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com> MAKER does not require input evidence to be on the correct strand because it performs splice-aware alignments via Exonerate against both strands (reverse complementation for the second alignment happens internally). Exonerate should always map spliced alignments to the right strand because it is not possible to get correct splicing on the opposite strand (splice sites are a stranded feature). The only alignments that are ambiguous are single exon alignments. They are ignored by default, but when not ignored they are stranded to the sequence with the longest canonical ORF. --Carson > On Sep 6, 2016, at 1:37 PM, Steven Sullivan wrote: > > I have a set of assembled transcripts from a stranded RNA seq run that I want to use for gene finder training in a MAKER run on a 'new' organism. > > I've noticed though that some of my assembled transcripts actually appear to be antisense RNAs. > > Should I include these in the training set? > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From eaalvarado at cpp.edu Tue Sep 6 15:00:59 2016 From: eaalvarado at cpp.edu (Emilio A.
Alvarado Ortiz) Date: Tue, 6 Sep 2016 21:00:59 +0000 Subject: [maker-devel] MAKER mpi install Error Message-ID: Hello, I am trying to install MAKER with Mpi on a Scientific Linux machine but I keep getting the following error: [stilllab at lettucelab src]$ ./Build clean Cleaning up build files [stilllab at lettucelab src]$ ./Build install Configuring MAKER with MPI support Had problems bootstrapping Inline module 'Parallel::Application::MPI' Can't load '/home/stilllab/Documents/maker/src/blib/lib/auto/Parallel/Application/MPI/MPI.so' for module Parallel::Application::MPI: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/stilllab/.linuxbrew/lib/libmpi.so.12) at /usr/lib64/perl5/DynaLoader.pm line 200. at /usr/local/share/perl5/Inline.pm line 533. at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 236. at /home/stilllab/Documents/maker/src/../perl/lib/Parallel/Application/MPI.pm line 256. Parallel::Application::MPI::_bind("/home/stilllab/mpich3/bin/mpicc", "/home/stilllab/mpich3/include", "blib", "") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 277 MAKER::Build::ACTION_build(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010 Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "build") called at /usr/local/share/perl5/Module/Build/Base.pm line 1993 Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0), "build") called at /home/stilllab/Documents/maker/src/inc/lib/MAKER/Build.pm line 469 MAKER::Build::ACTION_install(MAKER::Build=HASH(0x1f4faa0)) called at /usr/local/share/perl5/Module/Build/Base.pm line 2010 Module::Build::Base::_call_action(MAKER::Build=HASH(0x1f4faa0), "install") called at /usr/local/share/perl5/Module/Build/Base.pm line 1998 Module::Build::Base::dispatch(MAKER::Build=HASH(0x1f4faa0)) called at ./Build line 62 Do you know a workaround this problem? Thank you for your help. Regards, Emilio A. 
Ortiz -------------- next part -------------- An HTML attachment was scrubbed... URL: From sullis02 at nyu.edu Wed Sep 7 08:39:11 2016 From: sullis02 at nyu.edu (Steven Sullivan) Date: Wed, 7 Sep 2016 10:39:11 -0400 Subject: [maker-devel] antisense RNA in training set? In-Reply-To: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com> References: <3C4368E1-605E-4C65-88B2-9CF57E1CAA15@gmail.com> Message-ID: My organism's genome is predicted to have extremely few introns. Does that mean I should change the default alignment behavior for single exons? On Tue, Sep 6, 2016 at 4:09 PM, Carson Holt wrote: > MAKER does not require input evidence to be on the correct strand because > it performs splice-aware alignments via Exonerate against both strands > (reverse complementation for the second alignment happens internally). > Exonerate should always map spliced alignments to the right strand because > it is not possible to get correct splicing on the opposite strand > (splice sites are a stranded feature). The only alignments that are > ambiguous are single exon alignments. They are ignored by default, but when > not ignored they are stranded to the sequence with the longest canonical > ORF. > > --Carson > > > > > On Sep 6, 2016, at 1:37 PM, Steven Sullivan wrote: > > > > I have a set of assembled transcripts from a stranded RNA seq run that I > want to use for gene finder training in a MAKER run on a 'new' organism. > > > > I've noticed though that some of my assembled transcripts actually > appear to be antisense RNAs. > > > > Should I include these in the training set? > > > > > > > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -- Dr. Steven Sullivan Center for Genomics & Systems Biology New York University 12 Waverly Place New York, NY 10003 -------------- next part -------------- An HTML attachment was scrubbed...
From sullis02 at nyu.edu Wed Sep 7 15:04:56 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Wed, 7 Sep 2016 17:04:56 -0400
Subject: [maker-devel] General question about RNA evidence
Message-ID:

The MAKER documentation I can access (wiki tutorials) seems somewhat out of date as regards RNA evidence, as it focuses a lot on ESTs, whereas today RNA-seq data would likely be more common.

So a general question I have is, for a new eukaryotic organism with no models, is it better to use assembled RNA-seq reads (i.e., putative transcripts generated by Trinity) as input to 1) ab initio predictors and as 2) MAKER alignment evidence, or is it better to use the reads themselves, unassembled?

From carsonhh at gmail.com Wed Sep 7 16:19:43 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Sep 2016 16:19:43 -0600
Subject: [maker-devel] MAKER mpi install Error
In-Reply-To:
References:
Message-ID:

The error is with your OpenMPI install. It says that the GLIBC version does not match for /home/stilllab/.linuxbrew/lib/libmpi.so.12 (that library requires GLIBC_2.14, which the system /lib64/libc.so.6 does not provide).

You may need to reinstall, perhaps manually. If you are using a Homebrew package manager, there may be version mismatches with your system.

--Carson

From carsonhh at gmail.com Wed Sep 7 16:31:18 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 7 Sep 2016 16:31:18 -0600
Subject: [maker-devel] General question about RNA evidence
In-Reply-To:
References:
Message-ID:

You need to assemble the reads using something like Trinity. The assembled results can be aligned to the proper strand with much greater specificity using splice-aware alignments. Use the jaccard index options when running Trinity.

--Carson

From me.mark at gmail.com Wed Sep 14 12:11:46 2016
From: me.mark at gmail.com (Mark Ebbert)
Date: Wed, 14 Sep 2016 11:11:46 -0700
Subject: [maker-devel] (no subject)
In-Reply-To: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>
References: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com>
Message-ID: <57d6c562d78dd70000998701@polymail.io>

Hey Carson!

I'm getting a new issue. I think I need to recompile Maker with MPICH instead of openmpi. I'm getting the following errors when I try to run "mpiexec -n 10 maker -help". I tried running "./Build clean" followed by "./Build install" after updating LD_PRELOAD with the path to MPICH, but I'm still getting the error. I was also trying to access the Maker documentation at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php to review detailed installation instructions (I think it's there), but the website is down.

I appreciate your help.

"Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xa0a5d620, rank=0x7ffd20bb8d9c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x51f83620, rank=0x7ffc6023b7fc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8b342620, rank=0x7ffde14f02fc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xf8f24620, rank=0x7ffe71c9a5bc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x8c074620, rank=0x7ffc70e50b6c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xdac15620, rank=0x7ffc67bf0e2c) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xbb65620, rank=0x7ffc17a1d1bc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0x2aa3b620, rank=0x7fff551201dc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xd2453620, rank=0x7fffaebe21cc) failed
PMPI_Comm_rank(68).: Invalid communicator
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(110): MPI_Comm_rank(comm=0xb24e8620, rank=0x7ffdd838bbfc) failed
PMPI_Comm_rank(68).: Invalid communicator

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 2462 RUNNING AT m7int02
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
==================================================================================="

Mark T. W. Ebbert

From carsonhh at gmail.com Wed Sep 14 12:15:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 Sep 2016 12:15:19 -0600
Subject: [maker-devel] (no subject)
In-Reply-To: <57d6c562d78dd70000998701@polymail.io>
References: <544D887E-7BEB-4E3F-B3B7-A62AF7F27899@gmail.com> <57d6c562d78dd70000998701@polymail.io>
Message-ID: <0C568590-0A33-46DD-95FD-271D9A8E0009@gmail.com>

Unset LD_PRELOAD. It is really only an OpenMPI issue, and may affect MPICH2 in a bad way. Also do './Build realclean' (a bit more thorough) in the source directory, then remove the .../maker/perl directory before reinstalling. That will force a reinstall of all missing Perl dependencies and the Perl/MPI bindings.

--Carson

From xvazquezc at gmail.com Mon Sep 19 02:30:06 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Mon, 19 Sep 2016 18:30:06 +1000
Subject: [maker-devel] questions about post-processing of annotations
Message-ID:

Hi Carson,

I'm trying to go through the post-processing step in the tutorial (GMOD2014), but I think something is not right with the functional annotation, as no new information is added to the *.putative_function.* files when I run maker_functional_gff or maker_functional_fasta. All the FASTA headers remain unchanged and the GFF files don't show any change. I'm using Maker 2.31.6, by the way.

Because there are no examples showing what I should expect, I'm a bit lost.

These are my files prior to the functional annotation:

FRL.all.iprscan.renamed.tsv
FRL.all.maker.proteins.blastout.sprot.renamed.tsv
FRL.all.maker.proteins.renamed.fasta
FRL.all.maker.transcripts.renamed.fasta
FRL.all.maker.trnascan.transcripts.renamed.fasta
FRL.all.renamed.gff
FRL.map

And this is an example of the command I'm using:

maker_functional_fasta uniprot_sprot.fasta FRL.all.maker.proteins.blastout.sprot.renamed.tsv FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.proteins.renamed.putative_function.fasta

Thank you in advance.

Xabi

PS: the tutorial says to use the "standard" IPRS output, but by default it gives xml, gff3 and tsv files. Which one should I use?

-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Mon Sep 19 16:07:59 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:07:59 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID:

maker_functional_fasta reads the results of a BLAST report. It must be a tab-delimited BLAST report (-outfmt 6 under BLAST+) with UniProt as the database and the MAKER FASTA file as the query. If you renamed the transcripts in the FASTA before running maker_functional_fasta, the results in the BLAST report will no longer match (because they have new names). Use the map_data_ids script to fix names in the BLAST report if you did that.

Thanks,
Carson

From xvazquezc at gmail.com Mon Sep 19 16:27:03 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Tue, 20 Sep 2016 08:27:03 +1000
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID:

Yes, my BLAST output is -outfmt 6 using UniProt/Swiss-Prot as the database. I used the MAKER protein FASTA file as the query (should I do the same with the transcripts?).

According to the tutorial the steps are:

maker_map_ids
map_gff_ids
map_fasta_ids (for maker protein and transcripts)
map_data_ids (for blast and iprs output)

and then the maker_functional_* steps.

So, the rename steps come before the maker_functional steps. Are you saying it should be the other way around?

-- 
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Mon Sep 19 16:43:19 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:43:19 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References:
Message-ID: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>

You just have to make sure you ran the BLAST job that produced the report after renaming. If you ran it before, then the names in the report will not match the renamed FASTA. The BLAST job should be blastp (protein to protein). You can check by just looking at the report.

--Carson
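
The "check by just looking at the report" advice above can also be done mechanically. The following is a minimal sketch, not part of MAKER or BLAST+: the function name missing_blast_ids is hypothetical, and it assumes only that the report is tab-delimited (-outfmt 6, query ID in column 1) and that the FASTA record ID is the first token after '>'.

```shell
#!/bin/sh
# Sketch: list query IDs from a tabular BLAST report (-outfmt 6) that do not
# appear as record IDs in a FASTA file. An empty result means every query
# name in the report matches a record in the FASTA.
missing_blast_ids() {
    blast_tsv=$1   # e.g. FRL.all.maker.proteins.blastout.sprot.renamed.tsv
    fasta=$2       # e.g. FRL.all.maker.proteins.renamed.fasta
    tmpdir=$(mktemp -d) || return 1

    # Column 1 of an -outfmt 6 report is the query ID.
    cut -f1 "$blast_tsv" | LC_ALL=C sort -u > "$tmpdir/blast_ids"

    # A FASTA record ID is the first whitespace-delimited token after '>'.
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$fasta" | LC_ALL=C sort -u > "$tmpdir/fasta_ids"

    # comm -23 keeps lines found only in the first (BLAST) list.
    LC_ALL=C comm -23 "$tmpdir/blast_ids" "$tmpdir/fasta_ids"

    rm -rf "$tmpdir"
}
```

Running `missing_blast_ids report.tsv proteins.fasta` before maker_functional_fasta would print any report IDs that still carry the old names; if it prints anything, the report and the FASTA are out of sync and one of them needs remapping or regenerating.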
URL: From xvazquezc at gmail.com Mon Sep 19 16:46:24 2016 From: xvazquezc at gmail.com (=?UTF-8?Q?Xabier_V=C3=A1zquez_Campos?=) Date: Tue, 20 Sep 2016 08:46:24 +1000 Subject: [maker-devel] questions about post-processing of annotations In-Reply-To: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com> References: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com> Message-ID: I see. I ran blastp before starting the "post-processing of annotations" step. I guess I should do the same with IPRS. By the way, can you confirm If I need to blast the maker.transcripts.fasta? Thanks a lot On 20 September 2016 at 08:43, Carson Holt wrote: > You just have to make sure you ran the blast job report after renaming. If > you ran it before then the names in the report will not match the renamed > fasta. The blast job should be blastp (protein to protein). You can check > by just looking at the report. > > ?Carson > > > On Sep 19, 2016, at 4:27 PM, Xabier V?zquez Campos > wrote: > > Yes, my blast output is -outfmt 6 using the Uniprot/Swissprot as database. > I used the maker protein fasta file as query (should I do the same with the > transcripts?). > According to the tutorial the steps are: > > maker_map_ids > map_gff_ids > map_fasta_ids (for maker protein and transcripts) > map_data_ids (for blast and iprs output) > > and then the maker_functional_* steps. > > So, the rename steps are before the maker_functional. Are you saying it should be the other way around? > > > > > > On 20 September 2016 at 08:07, Carson Holt wrote: > >> maker_functional_fasta reads the results of a blast report. It must be a >> tab delimted blast report (-outfmt 6 under BLAST+) with unpirot as the >> database and the maker fasta file as the query. If you renamed the >> transcripts in the fasta before running maker_functional_fasta, the >> results in the blast report will no longer match (because they have new >> names). Use the map_data_ids script to fix names in the blast report if you >> did that. 
>> >> Thanks, >> Carson >> >> On Sep 19, 2016, at 2:30 AM, Xabier V?zquez Campos >> wrote: >> >> Hi Carson, >> >> I'm trying to go through the post processing step in the tutorial >> (GMOD2014) but I think something is not right with the functional >> annotation as no new information is added to the *.putative_function.* >> files when I run the maker_functional_gff or the maker_functional_fasta. >> All the fasta headings remain unchanged and the gff files don't show any >> change. I'm using Maker 2.31.6 by the way. >> >> Because there are no examples showing what I should expect I'm a bit lost. >> >> These are my files prior to the functional annotation. >> >> FRL.all.iprscan.renamed.tsv >>> FRL.all.maker.proteins.blastout.sprot.renamed.tsv >>> FRL.all.maker.proteins.renamed.fasta >>> FRL.all.maker.transcripts.renamed.fasta >>> FRL.all.maker.trnascan.transcripts.renamed.fasta >>> FRL.all.renamed.gff >>> FRL.map >>> >> >> And this, an example of the command I'm using >> >> maker_functional_fasta uniprot_sprot.fasta FRL.all.maker.proteins.blastout.sprot.renamed.tsv >> FRL.all.maker.proteins.renamed.fasta > FRL.all.maker.proteins.renamed.putative_function.fasta >> >> >> Thank you in advance. >> >> Xabi >> >> >> PS: the tutorial mentions to use the "standard" IPRS output but by >> default it gives xml, gff3 and tsv files. Which one should I use? 
>>
>> --
>> Xabier Vázquez-Campos, *PhD*
>> *Research Associate*
>> Water Research Centre
>> School of Civil and Environmental Engineering
>> The University of New South Wales
>> Sydney NSW 2052 AUSTRALIA
>
> --
> Xabier Vázquez-Campos, *PhD*
> *Research Associate*
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From carsonhh at gmail.com Mon Sep 19 16:50:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 16:50:40 -0600
Subject: [maker-devel] questions about post-processing of annotations
In-Reply-To:
References: <7EC9517E-5284-4872-BD3D-E313A7F6E09A@gmail.com>
Message-ID:

No, it is the protein-to-protein blast you need (which is where cross-species homology occurs). Since the proteins come from the transcripts, a translated transcript search (blastx in this case) would be redundant, and also less accurate, because artifacts of the six-frame translation reduce the significance of the alignment.

--Carson

From sullis02 at nyu.edu Mon Sep 19 22:21:18 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 20 Sep 2016 00:21:18 -0400
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
Message-ID:

I'm confused about the use(s) of gene sequence evidence in the MAKER de novo annotation pipeline.

As I understand it, MAKER combines 1) its own BLAST alignments of user-supplied RNA ('EST evidence') and protein ('protein homology evidence') sequences to the genome assembly, with 2) models suggested by trained ab initio gene finders that run in parallel.
The gene finders require a prior training step, and the training sub-protocol in Campbell et al 2014 (Curr. Prot. Bioinf.) assumes that no 'gold standard' gene annotation exists for a newly-sequenced genome. Therefore it describes an iterative/bootstrap process whereby initial MAKER output becomes the gene finder training input for e.g. SNAP, whose output is then used in the next MAKER round.

But in my case, even before the genome was sequenced, a few hundred individual high-quality DNA/protein gene sequences for my species had already been deposited in public databases (Genbank, Swissprot) by various labs over the years, to accompany various publications.

Should these be used to train gene finders prior to a MAKER run, and *also* as user-supplied 'protein homology evidence' to MAKER itself?

Or am I misunderstanding the workflow?

From carsonhh at gmail.com Mon Sep 19 22:34:31 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 19 Sep 2016 22:34:31 -0600
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To:
References:
Message-ID: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>

The training involves not so much the sequence as the structure (i.e. intron, exon, start, stop, etc.). You could use the deposited evidence as input to the iterative process described, but not directly. This is because you have the sequence but not the structure.

What MAKER does with the est2genome/protein2genome options is to align the evidence to the reference, polish for correct splicing (because blast alignments are not splice aware), then identify correct open reading frames with start and stop codons. The result is an intron/exon structure. The HMM for the predictor then builds probability models for moving from intron to exon states (which includes info such as leading sequence before the start codons, average intron lengths, etc.). None of this is directly available from the protein or transcript data. But once the evidence has been polished against the reference, the structure can be discovered.

After initial training (i.e. the bootstrap run), MAKER provides hints in the form of probability bonuses when evidence alignments suggest UTR, CDS, intron, or exon. Then when the predictors run, they perform better than they would without the hints. As a result the second round of predictions is better than the first, and can be used as training to improve the HMM.

--Carson

From dence at genetics.utah.edu Mon Sep 19 22:45:02 2016
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 20 Sep 2016 04:45:02 +0000
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com>
Message-ID: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>

Just chiming in with my own perspective on the question. The gold-standard genes can be used as input for training the gene predictors and also as evidence for the genome annotation. Presumably, you'll have much more evidence than the gold-standard genes for the annotation, so it won't be circular. As Carson said, the gene predictors are using the structure of the alignments of the input, rather than the sequence itself. The other source of input for gene predictors, in the case of a true bootstrap where you have no gold standard, would be to use alignments generated by a program, like BUSCO or CEGMA, that identifies conserved orthologs in the genome.

~Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From cjfields at illinois.edu Tue Sep 20 13:17:24 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 20 Sep 2016 19:17:24 +0000
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

I can add that BUSCO did work well as a first-pass bootstrap (with the added convenience of running Augustus for generating an initial model).

chris

From sullis02 at nyu.edu Tue Sep 20 13:28:20 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Tue, 20 Sep 2016 15:28:20 -0400
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To: <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

Thanks!
So, I think for training the gene predictors, I'll try to identify any sequences in my gold-standard set that have structural information (i.e. genes for which the genomic sequence was cloned) and use those. But I doubt there are enough of those to train e.g. Augustus, so I'll probably have to use the bootstrap method as well. Is there a way to combine both?

For the BLAST-based annotation, if I use the entire Uniprot/Swissprot or Genbank FASTA sets as protein homology evidence, my gold standards are already included in those. I gather from these replies that that's not a problem.

However, there *are* public database sequences (predicted genes from an older annotation of this species) that I *do* want to exclude from evidence, because we want to run MAKER as if this genome were 'new', never before annotated. Can I use something like the -negative_gilist option in blastp to omit previous genome project predictions from consideration? (That option only works with Genbank sequences, I think.) Or do I have to create a custom version of the large public database?

From carsonhh at gmail.com Tue Sep 20 14:15:21 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 Sep 2016 14:15:21 -0600
Subject: [maker-devel] evidence for MAKER vs evidence to train gene finders
In-Reply-To:
References: <96AEFFD4-E97A-4241-82AF-E283DFF6DB20@gmail.com> <5504084F-07AE-4FCF-97BE-EF7F5EF4D371@genetics.utah.edu>
Message-ID:

You would need to create a custom database without the sequences you wish to exclude.

--Carson

From psh65 at cornell.edu Tue Sep 20 14:33:42 2016
From: psh65 at cornell.edu (Prashant S Hosmani)
Date: Tue, 20 Sep 2016 20:33:42 +0000
Subject: [maker-devel] mapping cDNA to updated genome
In-Reply-To: <9FBCB1C4-C319-4933-8741-53DAFCB82458@gmail.com>
References: <646B795A-1B04-4300-94C7-BEBEF0B37323@gmail.com> <9FBCB1C4-C319-4933-8741-53DAFCB82458@gmail.com>
Message-ID: <55D0187E-8C48-40DA-91BE-6370D46D041F@cornell.edu>

Hi Mike and Carson,

Thank you for your help. I used a masked genome for aligning the cDNAs. And yes, this was due to multiply aligning cDNAs. I guess you could also filter genes based on the alignment score in the GFF.

I used GMAP (http://research-pub.gene.com/gmap/) to align the cDNAs to the updated genome. GMAP has parameters to filter on alignment score and can also choose the best path per cDNA.

Regards,
Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA

On Aug 31, 2016, at 12:12 PM, Carson Holt wrote:

Also if you have multiple alignments of the same cDNA, you can use the score column of the mRNA feature to see which aligns best.
If they have the same score, you will have to disambiguate manually or just remove all copies.

--Carson

On Aug 31, 2016, at 10:10 AM, Michael Campbell wrote:

Hi Prashant,

I'm almost positive that the additional genes are coming from multiply aligning cDNAs. Did you repeat mask your genome before mapping things forward? Another thought: what kind of whole genome duplications has your plant been through? It may be that the multiple alignments are to pseudogenes in some stage of decay. If that is the case, it would probably be safe to keep the gene from the longest/best aligned cDNA.

Thanks,
Mike

On Aug 31, 2016, at 10:35 AM, Prashant S Hosmani wrote:

Hi All,

I am working on updating a plant genome annotation. I would like to map genes from the previous annotation to a new genome build. There is a protocol for this in Campbell et al 2014, Current Protocols in Bioinformatics (basic protocol 4 - Mapping annotations to a new assembly). I followed that protocol exactly, setting est_forward=1. But in the output I'm getting a large number of genes. My input cDNA fasta contains ~35K genes and after mapping there are ~58K genes. I'm using maker version 3.0. There are few changes in the genome and I'm not expecting many changes in the mapping of the previous genes. Please let me know if there are any other parameters to control the mapping of ESTs. I was hoping to get a similar number of genes mapped onto the new assembly with very few changes.

Thank you for your help in advance.

Prashant

Prashant Hosmani
Sol Genomics Network
Boyce Thompson Institute, Ithaca, NY, USA

From sullis02 at nyu.edu Thu Sep 22 10:27:53 2016
From: sullis02 at nyu.edu (Steven Sullivan)
Date: Thu, 22 Sep 2016 12:27:53 -0400
Subject: [maker-devel] should EST evidence be cleaned, assembled?
Message-ID:

Do EST sequences (as opposed to RNA Seq data) need to be cleaned (e.g., vector sequence trimmed, Ns removed) and assembled (combined into longer 'EST contigs' where possible) before use as MAKER alignment evidence?

--
Dr. Steven Sullivan
Center for Genomics & Systems Biology
New York University
12 Waverly Place
New York, NY 10003

From carsonhh at gmail.com Mon Sep 26 09:28:41 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 26 Sep 2016 09:28:41 -0600
Subject: [maker-devel] should EST evidence be cleaned, assembled?
In-Reply-To:
References:
Message-ID:

You will want to trim the vector or any sequence not representative of the transcript, or else it will not align well. The sequences will be aligned directly against the assembly.

--Carson
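
The advice above, to trim anything that is not real transcript sequence before handing ESTs to MAKER, could be sketched roughly as follows. This is only a minimal illustration and not part of MAKER: the function names and the 50 bp length cutoff are assumptions made for the example, and it handles only runs of N at the read ends. Actual vector contamination would still need to be screened with a dedicated tool against a vector database.

```python
import re

# Hypothetical helper names; the min_len default is an illustrative
# assumption, not a MAKER setting.
TERMINAL_N = re.compile(r"^[Nn]+|[Nn]+$")


def trim_terminal_ns(seq):
    """Remove runs of N from both ends of a sequence; internal Ns are kept."""
    return TERMINAL_N.sub("", seq)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_ests(records, min_len=50):
    """Trim terminal Ns and drop reads shorter than min_len after trimming."""
    for header, seq in records:
        trimmed = trim_terminal_ns(seq)
        if len(trimmed) >= min_len:
            yield header, trimmed
```

To use it, one would pass an open FASTA file handle to read_fasta, feed the records through clean_ests, and write the surviving records back out as the EST evidence file.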