From munholl at uwindsor.ca Tue May 10 13:18:09 2016
From: munholl at uwindsor.ca (Seth Munholland)
Date: Tue, 10 May 2016 14:18:09 -0400
Subject: [maker-devel] MAKER seg faulting

Hello Everyone,

For reasons unknown, my MAKER (2.31.8 on Ubuntu 14.04) runs keep seg faulting. I've changed the dataset I'm running MAKER on by parsing out smaller sections of the larger assembly, and I still seg fault on sections that the larger assembly moved past without issue.

The only commonality I see is that every time it seg faults, it appears to have just finished a tblastx. Any suggestions for how I can debug and correct this issue?

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755

From carsonhh at gmail.com Tue May 10 18:02:30 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 10 May 2016 17:02:30 -0600
Subject: [maker-devel] MAKER seg faulting
Message-ID: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com>

So MAKER is written in Perl, and Perl can't really seg fault (it doesn't give developers that kind of low-level access to memory). However, if you are using MPI, then it could be causing a seg fault, or one of the programs MAKER is calling could be seg faulting (like BLAST).

So if you are using MPI, let me know which flavor and I can make suggestions (for example, MVAPICH2 is incompatible with programs that do system calls, and OpenMPI may require a special setting for LD_PRELOAD to work properly with shared libraries). If you're not using MPI, then you will need to look at the installed programs MAKER is calling (e.g. BLAST, Exonerate) and reinstall them, update them, or roll back a version.

--Carson

From munholl at uwindsor.ca Wed May 11 10:35:05 2016
From: munholl at uwindsor.ca (Seth Munholland)
Date: Wed, 11 May 2016 11:35:05 -0400
Subject: [maker-devel] MAKER seg faulting
In-Reply-To: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com>
References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com>

Hi Carson,

I am not using MPI. Given the association with tblastx, I suspect my C++ install of BLAST is what's seg faulting.

Thanks!

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave.
N9B 3P4
T: (519) 253-3000 Ext: 4755

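Since the thread above points at an external program rather than MAKER itself, one way to confirm which tool is crashing is to rerun the suspect command by hand and check whether it was killed by SIGSEGV. The Python sketch below only illustrates that idea; the helper names died_from_segfault and report are made up for this example and are not part of MAKER or BLAST.

```python
import signal
import subprocess
import sys


def died_from_segfault(cmd):
    """Run cmd (a list of arguments) and report whether SIGSEGV killed it.

    On POSIX systems, subprocess reports death-by-signal as a negative
    return code, so a seg fault shows up as -signal.SIGSEGV (i.e. -11).
    """
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == -signal.SIGSEGV


def report(cmd):
    """Print a one-line verdict for cmd and return True if it seg faulted."""
    crashed = died_from_segfault(cmd)
    label = "SEGFAULT: " if crashed else "ok: "
    print(label + " ".join(cmd), file=sys.stderr)
    return crashed
```

A hypothetical call would be report(["tblastx", "-query", "chunk.fasta", "-db", "est_db", "-out", "out.txt"]), where the file and database names are placeholders for whatever MAKER was processing when it stopped. If the tool crashes on its own, reinstalling or rolling it back (as suggested above) is the next step.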
From platycerus at gmail.com Thu May 12 16:49:51 2016
From: platycerus at gmail.com (Ray Cui)
Date: Thu, 12 May 2016 23:49:51 +0200
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2

Dear Yugui,

I had the same problem with OpenMPI. I think it is not compatible with MAKER. I now use MPICH, which works.

Ray

On May 12, 2016 11:32 PM, "Yugui Wang" wrote:
> Hi.
>
> Segmentation fault of MAKER with OpenMPI on CentOS 7.2.
> Both MAKER 2.31.8 and 3.00.0 beta have the same error.
>
> $ mpirun -mca btl ^openib -n 4 maker
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 39507 on node T620 exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> $ file core.39505
> core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make
> $ gdb /usr/bin/perl core.39505
> (gdb) where
> #0 0x00007f0e4a7d2060 in ?? ()
> #1
> #2 0x00007f0e4a7d2060 in ?? ()
> #3
> #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from
> /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so
> #5 0x00007f0e63ec8eda in opal_progress () from
> /usr/lib64/openmpi/lib/libopen-pal.so.13
> #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from
> /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so
> #7 0x00007f0e65b0dc06 in PMPI_Probe () from
> /usr/lib64/openmpi/lib/libmpi.so
> #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf@entry=0x4146b30,
> source=source@entry=-1, tag=tag@entry=1111) at MPI.xs:56
> #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv
> (my_perl=, cv=) at MPI.c:391
> #10 0x00007f0e657ce39f in Perl_pp_entersub () from
> /usr/lib64/perl5/CORE/libperl.so
> #11 0x00007f0e657c6b16 in Perl_runops_standard () from
> /usr/lib64/perl5/CORE/libperl.so
> #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
> #13 0x0000000000400d99 in main ()
> $ echo $LD_PRELOAD
> /usr/lib64/openmpi/lib/libmpi.so:
> $ echo $OMPI_MCA_mpi_warn_on_fork
> 0
> $ rpm -qa openmpi
> openmpi-1.10.0-10.el7.x86_64
> $ uname -a
> Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
> $ ulimit -a
> core file size (blocks, -c) unlimited
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 1029973
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 102400
> cpu time (seconds, -t) unlimited
> max user processes (-u) 4096
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> $ mpiexec --version
> mpiexec (OpenRTE) 1.10.0
>
> Report bugs to http://www.open-mpi.org/community/help/
> $
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From xvazquezc at gmail.com Thu May 12 19:31:55 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 13 May 2016 10:31:55 +1000
Subject: [maker-devel] BUSCO

Check this thread:
https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ

On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] wrote:
> Hello,
>
> Are there any guidelines for using BUSCO to help train MAKER? CEGMA has
> been discontinued but I used to use the cegma2zff.pl steps to use those
> proteins as a training step. BUSCO seems to train Augustus but I'm not sure
> what file to pass from BUSCO to MAKER for this to be properly utilized. I
> didn't see anything specific about this in the archives.
> -----
> Ian Misner, Ph.D.
> Computational Genomics Specialist
> Contractor, Medical Science and Computing, Inc.
> Bioinformatics and Computational Biosciences Branch (BCBB)
> NIH/NIAID/OD/OSMO/OCICB
> 5601 Fishers Lane, Room 4A59
> Rockville, MD 20892
> Office: 301-761-6208
> Mobile: 301-704-0151
> Email: ian.misner at nih.gov
> Web: BCBB Home Page
> Twitter: @NIAIDBioIT
>
> Disclaimer: The information in this e-mail and any of its attachments is
> confidential and may contain sensitive information. It should not be used
> by anyone who is not the original intended recipient. If you have received
> this e-mail in error please inform the sender and delete it from your
> mailbox or any other storage devices. National Institute of Allergy and
> Infectious Diseases shall not accept liability for any statements made that
> are sender's own and not expressly made on behalf of the NIAID by one of
> its representatives.

-- 
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Thu May 12 19:37:03 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 13 May 2016 10:37:03 +1000
Subject: [maker-devel] Reformat maker gff3
In-Reply-To: <1460516670248.1644@uq.edu.au>
References: <1460516670248.1644@uq.edu.au>

Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too.

On 13 April 2016 at 13:05, Jenny Lee wrote:
> Hi all,
>
> I would like to update my MAKER GFF3 file to contain only the genes I've
> decided to keep: all MAKER genes, plus a subset of ab initio genes (those
> that have InterProScan hits). I would also like to exclude the repeat
> information and retain only the CDS, gene, exon and mRNA features, like
> the format we usually see in published data.
>
> I've been trying to do this manually and it gets messy. Any ideas?
>
> Thanks a lot.
>
> Regards,
> Jenny Lee

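The grep/cut suggestion above can also be written as a short script, which tends to be less messy once IDs are involved. The Python sketch below is only one possible approach and rests on assumptions not stated in the thread: that MAKER's own gene models carry "maker" in column 2, that the ab initio models to keep are already present as gene/mRNA/exon/CDS features, and that keep_ids lists both the gene and mRNA IDs of those models. The function names are invented for this example.

```python
KEEP_TYPES = {"gene", "mRNA", "exon", "CDS"}


def parse_attributes(field):
    """Turn a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def filter_gff3(lines, keep_ids, maker_source="maker"):
    """Yield the version header plus gene/mRNA/exon/CDS lines to keep.

    lines        -- iterable of GFF3 lines
    keep_ids     -- gene and mRNA IDs of the non-MAKER models to retain
    maker_source -- column-2 value assumed to mark MAKER's own gene models
    """
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if line.startswith("##gff-version"):
            yield line
            continue
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in KEEP_TYPES:
            continue  # drops repeats, match features, and malformed lines
        attrs = parse_attributes(cols[8])
        ids = {attrs.get("ID")} | set(attrs.get("Parent", "").split(","))
        if cols[1] == maker_source or ids & keep_ids:
            yield line
```

With a list of IDs taken from the InterProScan results, the output of filter_gff3(open("genome.all.gff"), keep_ids) could be written straight to a new file; the input file name here is a placeholder.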
From panos.ioannidis at gmail.com Fri May 13 02:56:58 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 13 May 2016 09:56:58 +0200
Subject: [maker-devel] BUSCO

Hello Ian,

Xabier is right. You have to run BUSCO with the --long switch and then, in the maker_opts.ctl file, point the augustus_species variable to your trained species (i.e. the name you pass with the -o/-a parameter).

So, in Xabier's example, your maker_opts.ctl file should contain the following line:

augustus_species=Genus_species

Felipe, Rob, is there something else that I'm missing? The truth is that I haven't run this recently, and there might be differences in newer BUSCO versions.

Panos

Panos Ioannidis, PhD
Postdoctoral researcher
Computational Evolutionary Genomics Group
University of Geneva

From fdolze at students.uni-mainz.de Fri May 13 05:14:57 2016
From: fdolze at students.uni-mainz.de (Dolze, Florian)
Date: Fri, 13 May 2016 12:14:57 +0200
Subject: [maker-devel] BUSCO
Message-ID: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de>

On a somewhat related note, is there an advantage to using BUSCO to train Augustus instead of the Augustus webtraining service? Does anybody know how the two compare?

From robert.waterhouse at gmail.com Fri May 13 03:28:00 2016
From: robert.waterhouse at gmail.com (Robert Waterhouse)
Date: Fri, 13 May 2016 10:28:00 +0200
Subject: [maker-devel] BUSCO

I think in the Augustus 'species' directory there should be a new folder named according to your BUSCO run, and in that folder should be the trained parameters for your new species, so from MAKER I guess you can point to these trained parameters.

Rob

Dr Robert Waterhouse
SIB maître assistant
www.rmwaterhouse.org
A maturing understanding of the composition of the insect gene repertoire, COIS 2015
BUSCO: assessing genome assembly and annotation completeness, Bioinformatics 2015

From robert.waterhouse at gmail.com Fri May 13 07:54:40 2016
From: robert.waterhouse at gmail.com (Robert Waterhouse)
Date: Fri, 13 May 2016 14:54:40 +0200
Subject: [maker-devel] BUSCO
In-Reply-To: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de>
References: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de>

I would guess that the main 'advantage' of using BUSCO to train Augustus is that one will probably run BUSCO on one's genome anyway before starting MAKER, so there will already be a useful set of trained parameters ready to use. I guess the 'advantage' of using the Augustus webtraining service is that one could give it much more starting data (if indeed this is available, e.g. cDNAs). Indeed, if there was enough time and it made a substantial difference, one might even use the BUSCO gene model output as the 'Training gene structure file' for the Augustus webtraining service.
I don't believe that anyone has done a comparison of how different the trained parameters end up being.

Rob

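The two steps described in the BUSCO thread above (a trained parameter folder appears under Augustus' 'species' directory, and maker_opts.ctl then points augustus_species at it) can be checked with a few lines of Python. This is a sketch under assumptions, not a MAKER or BUSCO feature: it assumes the species folder lives under $AUGUSTUS_CONFIG_PATH/species, uses the placeholder name Genus_species from the thread, and the two function names are invented for this example.

```python
import os


def species_is_trained(name, config_path=None):
    """Return True if <config_path>/species/<name> exists.

    config_path defaults to the AUGUSTUS_CONFIG_PATH environment variable,
    which Augustus uses to locate its species parameter folders.
    """
    config_path = config_path or os.environ.get("AUGUSTUS_CONFIG_PATH", "")
    return os.path.isdir(os.path.join(config_path, "species", name))


def set_augustus_species(ctl_text, name):
    """Return maker_opts.ctl text with augustus_species set to name.

    A trailing '#comment' on that line is preserved; other lines are
    passed through unchanged.
    """
    out = []
    for line in ctl_text.splitlines(keepends=True):
        if line.startswith("augustus_species="):
            _, sep, comment = line.partition("#")
            newline = "\n" if line.endswith("\n") else ""
            tail = " #" + comment.rstrip("\n") if sep else ""
            line = "augustus_species=" + name + tail + newline
        out.append(line)
    return "".join(out)
```

Checking species_is_trained("Genus_species") before a MAKER run catches the case where BUSCO wrote its retraining output somewhere other than the Augustus configuration directory that MAKER's Augustus will read.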
From cjfields at illinois.edu Fri May 13 11:25:56 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 13 May 2016 16:25:56 +0000
Subject: [maker-devel] BUSCO

Our group has mainly used the BUSCO model in the "bootstrap" run for MAKER; we then retrain Augustus and SNAP using a filtered data set from that run for new rounds of MAKER.

Also, one personal observation: we have found some genome assemblies where BUSCO performs poorly compared to CEGMA (e.g. BUSCO reports a poor overall percentage of SCOs present, while CEGMA reports much higher numbers). We're still delving into this, but in those cases we avoid using the BUSCO model for obvious reasons.

chris

_______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -- Xabier V?zquez-Campos, PhD Research Associate Water Research Centre School of Civil and Environmental Engineering The University of New South Wales Sydney NSW 2052 AUSTRALIA _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Fri May 13 10:34:00 2016 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Fri, 13 May 2016 11:34:00 -0400 Subject: [maker-devel] Reformat maker gff3 In-Reply-To: References: <1460516670248.1644@uq.edu.au> Message-ID: <777D8DFF-CB99-4F03-A4CF-8E52F0E4526A@gmail.com> I?ve attached a protocols paper that walks through what you are trying to do. Let me know if it helps. Mike > On May 12, 2016, at 8:37 PM, Xabier V?zquez Campos wrote: > > Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too > > On 13 April 2016 at 13:05, Jenny Lee > wrote: > Hi all, > > I would like to update my maker gff3 file to only contain the genes I've decided to keep - all maker genes, a subset of abinitio genes (which have interproscan hits). I would like to also exclude the repeats information and only retain the CDS, gene, exon and mRNA - like the format we usually see in published data. > > I've been trying to do this manually and it gets messy. Any ideas? > > Thanks a lot. 
>
> Regards,
> Jenny Lee
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bi0411 (1).pdf
Type: application/pdf
Size: 484328 bytes
Desc: not available
URL:

From carsonhh at gmail.com Fri May 13 11:32:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 10:32:40 -0600
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

With OpenMPI, you must set LD_PRELOAD for libmpi.so and sometimes the "-mca btl" parameter. Details can be found in the …/maker/INSTALL file.

Also, we have found a recent issue with MAKER and Intel-compiled OpenMPI on CentOS systems. To get around that issue, compile OpenMPI with gcc instead of the Intel compiler, or alternatively install a separate Perl manually without pthread support (i.e. pthreads disabled during the configure step).

--Carson

> On May 12, 2016, at 3:49 PM, Ray Cui wrote:
>
> Dear Yugui
>
> I had the same problem with openmpi. I think it is not compatible with Maker. I now use mpich, which works.
>
> Ray
>
> On May 12, 2016 11:32 PM, "Yugui Wang" wrote:
> Hi.
>
> Segmentation fault of MAKER with openmpi on CentOS 7.2.
> Both MAKER 2.31.8 and 3.00.0 beta have the same error.
>
> $ mpirun -mca btl ^openib -n 4 maker
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 39507 on node T620 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> $ file core.39505
> core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make
> $ gdb /usr/bin/perl core.39505
> (gdb) where
> #0 0x00007f0e4a7d2060 in ?? ()
> #1
> #2 0x00007f0e4a7d2060 in ?? ()
> #3
> #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so
> #5 0x00007f0e63ec8eda in opal_progress () from /usr/lib64/openmpi/lib/libopen-pal.so.13
> #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so
> #7 0x00007f0e65b0dc06 in PMPI_Probe () from /usr/lib64/openmpi/lib/libmpi.so
> #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf at entry=0x4146b30, source=source at entry=-1, tag=tag at entry=1111) at MPI.xs:56
> #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv (my_perl=, cv=) at MPI.c:391
> #10 0x00007f0e657ce39f in Perl_pp_entersub () from /usr/lib64/perl5/CORE/libperl.so
> #11 0x00007f0e657c6b16 in Perl_runops_standard () from /usr/lib64/perl5/CORE/libperl.so
> #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
> #13 0x0000000000400d99 in main ()
> $ echo $LD_PRELOAD
> /usr/lib64/openmpi/lib/libmpi.so:
> $ echo $OMPI_MCA_mpi_warn_on_fork
> 0
> $ rpm -qa openmpi
> openmpi-1.10.0-10.el7.x86_64
> $ uname -a
> Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> $ ulimit -a
> core file size (blocks, -c) unlimited
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks,
-f) unlimited > pending signals (-i) 1029973 > max locked memory (kbytes, -l) 64 > max memory size (kbytes, -m) unlimited > open files (-n) 1024 > pipe size (512 bytes, -p) 8 > POSIX message queues (bytes, -q) 819200 > real-time priority (-r) 0 > stack size (kbytes, -s) 102400 > cpu time (seconds, -t) unlimited > max user processes (-u) 4096 > virtual memory (kbytes, -v) unlimited > file locks (-x) unlimited > $ mpiexec --version > mpiexec (OpenRTE) 1.10.0 > > Report bugs to http://www.open-mpi.org/community/help/ > $ > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.p.price at gmail.com Fri May 13 11:35:21 2016 From: dave.p.price at gmail.com (David Price) Date: Fri, 13 May 2016 10:35:21 -0600 Subject: [maker-devel] maker-devel Digest, Vol 96, Issue 10 In-Reply-To: References: Message-ID: would it be possible to get digest mode set up properly? I have it selected but I get emails for each individual message. Thanks On Fri, May 13, 2016 at 10:27 AM, wrote: > Send maker-devel mailing list submissions to > maker-devel at yandell-lab.org > > To subscribe or unsubscribe via the World Wide Web, visit > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > or, via email, send a message with subject or body 'help' to > maker-devel-request at yandell-lab.org > > You can reach the person managing the list at > maker-devel-owner at yandell-lab.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of maker-devel digest..." > > > Today's Topics: > > 1. 
Re: Reformat maker gff3 (Michael Campbell)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 13 May 2016 11:34:00 -0400
> From: Michael Campbell
> To: Xabier Vázquez Campos
> Cc: Jenny Lee, "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Reformat maker gff3
> Message-ID: <777D8DFF-CB99-4F03-A4CF-8E52F0E4526A at gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I've attached a protocols paper that walks through what you are trying to do. Let me know if it helps.
> Mike
>
> > On May 12, 2016, at 8:37 PM, Xabier Vázquez Campos wrote:
> >
> > Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too
> >
> > On 13 April 2016 at 13:05, Jenny Lee <h.lee12 at uq.edu.au> wrote:
> > Hi all,
> >
> > I would like to update my maker gff3 file to only contain the genes I've decided to keep - all maker genes, a subset of abinitio genes (which have interproscan hits). I would like to also exclude the repeats information and only retain the CDS, gene, exon and mRNA - like the format we usually see in published data.
> >
> > I've been trying to do this manually and it gets messy. Any ideas?
> >
> > Thanks a lot.
> >
> > Regards,
> > Jenny Lee
> >
> > _______________________________________________
> > maker-devel mailing list
> > maker-devel at box290.bluehost.com
> > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
> >
> > --
> > Xabier Vázquez-Campos, PhD
> > Research Associate
> > Water Research Centre
> > School of Civil and Environmental Engineering
> > The University of New South Wales
> > Sydney NSW 2052 AUSTRALIA
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20160513/f0f3e46b/attachment.html>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: bi0411 (1).pdf
> Type: application/pdf
> Size: 484328 bytes
> Desc: not available
> URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20160513/f0f3e46b/attachment.pdf>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://box290.bluehost.com/pipermail/maker-devel_yandell-lab.org/attachments/20160513/f0f3e46b/attachment-0001.html>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> ------------------------------
>
> End of maker-devel Digest, Vol 96, Issue 10
> *******************************************

From carsonhh at gmail.com Fri May 13 11:46:38 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 10:46:38 -0600
Subject: [maker-devel] maker-devel Digest, Vol 96, Issue 10
In-Reply-To:
References:
Message-ID: <9ABEE8DB-6316-4CF1-BC46-0DB2C188BC44@gmail.com>

I toggled your digest option off and back on in case that is the issue. The Mailman docs say that on busy days the digest option may decide to send out more than one digest, so that could be the issue too. The company providing our mailing list was having issues the last few weeks, so we weren't able to approve most posts until yesterday. As a result, there was an explosion of approved posts that may have triggered more than one digest per day yesterday and today.

--Carson

> On May 13, 2016, at 10:35 AM, David Price wrote:
>
> would it be possible to get digest mode set up properly?
> I have it selected but I get emails for each individual message.
>
> Thanks

-------------- next part --------------
An HTML attachment was scrubbed...
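On the "Reformat maker gff3" question earlier in this archive (keep only gene, mRNA, exon and CDS rows and drop the repeat annotations), the grep/cut idea can be written out as a small script. This is only a minimal sketch, not the workflow from the attached protocols paper: the function name `filter_gff3` and its parameters are made up for illustration, it assumes a tab-separated nine-column GFF3 file, and it does not handle choosing a subset of ab initio genes by ID.

```python
from typing import Iterable, Iterator, Optional, Set

# Feature types to retain, taken from the question (gene, mRNA, exon, CDS).
KEEP_TYPES = {"gene", "mRNA", "exon", "CDS"}


def filter_gff3(lines: Iterable[str],
                keep_types: Set[str] = KEEP_TYPES,
                keep_sources: Optional[Set[str]] = None) -> Iterator[str]:
    """Yield GFF3 comment lines plus features whose type (column 3) is wanted.

    keep_sources (hypothetical parameter) optionally restricts column 2,
    e.g. {"maker"}, which also drops rows written by other sources.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # stop before any embedded sequence section
        if line.startswith("#"):
            yield line  # keep directives such as ##gff-version 3
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip blank or malformed rows
        source, ftype = cols[1], cols[2]
        if ftype not in keep_types:
            continue
        if keep_sources is not None and source not in keep_sources:
            continue
        yield line
```

A typical call would read the merged GFF3 with `open(...)`, pass the file object to `filter_gff3(handle, keep_sources={"maker"})`, and write each yielded line to a new file. Filtering by a list of gene IDs would need an extra pass that follows the Parent attributes.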
From platycerus at gmail.com Fri May 13 12:08:19 2016
From: platycerus at gmail.com (Ray Cui)
Date: Fri, 13 May 2016 19:08:19 +0200
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

Hello,

I had segfaults even if I set LD_PRELOAD and used gcc for OpenMPI (dealing with MAKER 3 beta, though). It works fine with MPICH, so I stopped looking into this.

Ray

From carsonhh at gmail.com Fri May 13 12:16:49 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 11:16:49 -0600
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

It's possible it was set wrong, as there may be more than one libmpi.so on the system. It also has to be set before compiling and every time you run. The next issue is that some systems (like Ubuntu) will often have extra mpicc, libmpi.so, and mpiexec files that don't match the OpenMPI you are trying to use. Tracking down those mismatches before compiling and ensuring that they don't revert with your bashrc/bash_profile can be complicated. In these cases you may also have to specify LD_PRELOAD with the -x parameter of the OpenMPI mpiexec command. You often have to specify the "-mca btl" parameter explained in the INSTALL file as well.

--Carson

From bmoore at genetics.utah.edu Sun May 15 19:57:37 2016
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 16 May 2016 00:57:37 +0000
Subject: [maker-devel] Fwd: Issue with make, no prediction after gff3_merge
References:
Message-ID: <2B3935BF-1995-4250-8694-92FA0C36A729@genetics.utah.edu>

Hi David,

First of all, apologies for the delay in addressing your e-mail. Our mailing list provider (an external ISP) has stopped supporting the Mailman software that runs the maker-devel list, and the software has been unresponsive to our attempts to add new users or moderate messages. We will handle this message directly through e-mail for now. We have requested a new MAKER mailing list through our University IT department, and that request is pending approval. The new mailing list should get our user support back to normal very soon.

Can you share a few lines of the GFF files that you passed to est_gff?

Thanks

Barry

Begin forwarded message:

From: "LOPEZ, David"
Subject: Issue with make, no prediction after gff3_merge
Date: May 3, 2016 at 1:27:03 AM PDT
To: "maker-devel-owner at yandell-lab.org"

Dear all,

I am still waiting for my registration on the maker-devel list, hence I am sending my question by mail, but I will transfer it to the discussion group when possible.

I am a commercially licensed user of MAKER and I currently face some issues running MAKER 3.00.0 on a PBS cluster with an OpenMPI 1.10.2 implementation (which runs great most of the time, but that is not the issue discussed here).
After successfully testing the datasets provided in the package (dpp and pyu), I moved to my own assembly (140 000 scaffolds, ~14GB, eukaryotic, premasked). I have already made some RNA-seq mappings (GFF) as well as cDNA and proteome sets from a reference genome.

It appears to me that only the FASTA evidence is used, and not the GFF evidence, when I look at the predictions in IGV. In the GFF from gff3_merge, I have blastx, protein2genome and maker predictions as well as est_gff:stringtie, but no est_gff:somethingelse from the cDNA and ESTs I fed to MAKER.

Another issue, potentially linked to this problem, is that I wasn't able to use tags in my GFF evidence: MAKER fails to run, reporting that mygff3evidencefile.gff:mylabel was not found, which means it doesn't interpret the colon correctly.

I have attached my MAKER opts files.

Thanks in advance for your help.

Best regards,
David.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_evm.ctl
Type: application/octet-stream
Size: 911 bytes
Desc: maker_evm.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1601 bytes
Desc: maker_exe.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5236 bytes
Desc: maker_opts.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl Type: application/octet-stream Size: 1508 bytes Desc: maker_bopts.ctl URL: From clements at galaxyproject.org Mon May 16 12:20:35 2016 From: clements at galaxyproject.org (Dave Clements) Date: Mon, 16 May 2016 10:20:35 -0700 Subject: [maker-devel] GMOD 2016 Meeting early reg ends May 21; Galaxy Conference Deadlines Message-ID: Hello all, *GMOD will be holding a community meeting on June 30th and July 1st in Bloomington, Indiana, United States.* GMOD Meetings are a mix of user and developer presentations, and are a great place to find out what is happening in the project, what's coming up, and what others are doing. *Early bird registration ends May 21, this Saturday.* *For those who would like to present a talk or poster, the meeting registration form includes a section for submitting the presentation title and abstract.* If you have any suggestions or requests for the meeting, please contact the GMOD help desk . *GCC2016* The GMOD Meeting is immediately after the 2016 Galaxy Community Conference (GCC2016) , also in Bloomington (and sharing housing and venue). If you are interested in Galaxy, *GCC2016 has a number of deadlines this Friday, May 20*. See below. Galaxy is a part of the GMOD project and there are several presentations at GCC2016 that cover the GMOD integration: - Moving data from the warehouse to the workbench: a bridge to Galaxy from the Tripal community genome database software platform, talk presented by Margaret Staton - Apollo: Collaborative Manual Annotation for Genomic Sequencing Projects , talk presented by Nathan Dunn (Apollo will have a poster and demo) - Hardwood Genomics Database (HGD): a web portal and database resource for hardwood tree genomic and genetic research, poster presented by Ming Chen and Margaret Staton (posters are not online yet) More posters and demos are in the works. 
Thanks, and hope to see you in Bloomington, Dave C ---------- Forwarded message ---------- From: Dave Clements Date: Mon, May 16, 2016 at 9:09 AM Subject: GCC2016 Deadlines this Friday & Conference schedule To: Galaxy Announcements List , Galaxy Dev List Hello all, This is just a reminder that* there are some key deadlines this Friday, May 20:* - Early registration ends . After Friday registration rates go up by over 40%. - Poster abstracts are due. - Demo abstracts are due. These are new this year and can complement a poster abstract or stand on their own. If you are wondering what's happening at GCC2016, the training and conference schedules are now online, featuring 21 accepted talks and 31 training sessions . And, thanks to Jetstream IU's newest National Science Foundation-funded project (and in which Galaxy is a partner), and the National Center for Genome Analysis Support at IU are sponsoring an opening reception on Monday evening at the IU Cyberinfrastructure Building. The first ever GCC opening reception will feature local wine/beer, morsels from local eateries, and demonstrations of the 15 million+ pixel IQ-Wall, IU's Data Center, Science on a Sphere, and other IU-centric IT. Hope to see you there, Dave C -- http://galaxyproject.org/ http://getgalaxy.org/ http://usegalaxy.org/ https://wiki.galaxyproject.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From maker-devel at yandell-lab.org Mon May 16 20:30:23 2016 From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org) Date: Tue, 17 May 2016 08:30:23 +0700 Subject: [maker-devel] Your .pdf document is attached Message-ID: <201605171983886.0A606AA3@m6888933.yandell-lab.org> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 21010F.zip
Type: application/x-zip-compressed
Size: 2960 bytes
Desc: not available
URL:

From carson.holt at genetics.utah.edu Wed May 18 13:33:39 2016
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 18 May 2016 18:33:39 +0000
Subject: [maker-devel] [maker] transcripts doesn't provide any help
In-Reply-To:
References: <4B761165-F33F-4BCA-8DE0-B2AF0A6AE771@gmail.com> <3D78FFC2-B0FE-4DCC-A079-0B99CFB6C735@gmail.com> <44AADB65-67F1-4D25-A7C6-C7EE93B9E80B@gmail.com> <43234558-4A22-4B90-A209-CA7FEEF230CF@gmail.com> <9B9328D2-8F16-47C1-8873-F7821637E7FB@gmail.com> <4ACF958F-C6DF-469A-81CE-BF5854D7B8A2@gmail.com>
Message-ID: <573DB6D8-E773-45F5-9F53-DCAA20913EFF@genetics.utah.edu>

Yes. Use top to check CPU usage. If it's not 100% for the machine (or 6400% for all processes: 64 CPUs * 100%), then we can look at whether you are launching the command correctly or have other issues.

--Carson

On May 18, 2016, at 7:16 AM, Michael Campbell wrote:

Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4G plant genome, I don't think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get a better training than you would with CEGMA output alone.

I've cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I'm also cc'ing the devel list. There may be others in the community who can comment on the run times.

Thanks,
Mike

On May 17, 2016, at 10:10 PM, Pei-Ying Huang wrote:

Hi Mike,

My plant genome is about 4Gb, 93789 scaffolds. When I run MAKER using MPI on a server with 64 cores, only 1% of the genome is annotated. Is this normal?
I read a post that said it takes about 6 days on 16 processors to finish one round on a ~150,000 scaffold, ~2Gb vertebrate genome with protein evidence. Based on that post, I expected to get the result in no more than two weeks. However, it seems it will take me more than three months.

Also, I want to get a set of training parameters for Augustus. Right now I use CEGMA to produce a .gff file, then convert it to augustus.gff with cegma2gff, then autotrain with Augustus. Here is my command:

    autoAugTrain.pl --genome=GULI.genome.removeAllN.fa --trainingset=augustus.gff --species=A_autoAugTrain_1 &> log

But I saw someone else's method, quoted below, so I wonder if I am doing it wrong?

"We get the genome.gff3 training set from the output of a first-pass run of MAKER using:
1. EST data
2. Proteins from related species
3. a SNAP model trained using CEGMA
4. a GeneMark model (obtained by running GeneMark.ES on the draft genome)
5. Running maker2zff on the output of MAKER, and converting that to GFF3
Once done, we run MAKER a second time using the Augustus model and more stringent settings."

Thank you.
Pei-Ying

2016-05-18 9:16 GMT+08:00 Michael Campbell:

Hi Pei-Ying,

One of the first places to start with RNA-seq quality control is a tool called FastQC; it will produce a number of graphics that can help identify problematic files. There are a number of tools for quality trimming reads; Trimmomatic and the FASTX tools are popular ones. I would only redo the sequencing if you are convinced that the original sequencing is bad.

Mike

On May 16, 2016, at 8:42 PM, Pei-Ying Huang wrote:

Hi Mike,

As you said, the reason I only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq. If the problem is due to RNA-seq data quality, how could I identify the RNA-seq data with bad quality and trim them out?
If the problem is due to the expression profiles of the tissues used for mRNA-seq, should we try to extract RNA from the plant again and redo the sequencing? Thank you. Pei-Ying 2016-05-09 22:18 GMT+08:00 Michael Campbell: I did finish running the test I planned. What I noticed is that there is protein evidence for about 1,000 genes on that scaffold and transcript evidence for only one gene. The reason you only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq. What you described is what I would do, followed by training Augustus. The exception is if est2genome=1 and protein2genome=0 doesn't generate enough gene models to train the gene finders; then I would set est2genome=1 and protein2genome=1 for the first round instead. Thanks, Mike On May 8, 2016, at 10:08 AM, Pei-Ying Huang wrote: Have you finished all of the tests? How would you suggest I run my data? My plan is to get an ab initio model by setting est2genome=1 and protein2genome=0, then train with a SNAP model with est2genome=0 and protein2genome=0, then train a second SNAP model with est2genome=0 and protein2genome=0. Thank you. 2016-05-07 0:30 GMT+08:00 Michael Campbell: So far in the tests that I've done, I get the same first exon as 5 prime UTR and part of the last exon in 3 prime UTR for that gene. Mike On May 5, 2016, at 10:18 PM, Pei-Ying Huang wrote: Hi Mike, I found one five_prime_UTR feature, but it is the only one shown on scaff0001. Does that mean there are no more five_prime_UTR features on this scaffold, or that MAKER didn't find the others? Thank you. GULI.scaff0001 maker gene 3190189 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426 GULI.scaff0001 maker mRNA 3190189 3192302 1262 - .
ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;_AED=0.27;_eAED=0.27;_QI=335|0.83|0.71|1|0|0|7|0|308 GULI.scaff0001 maker exon 3190189 3190216 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:6;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3190331 3190656 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:5;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3190818 3190955 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:4;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3191233 3191510 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:3;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3191634 3191666 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:2;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3191755 3191848 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3191938 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:0;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker five_prime_UTR 3191968 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:five_prime_utr;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3191938 3191967 . 
- 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3191755 3191848 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3191634 3191666 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3191233 3191510 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3190818 3190955 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3190331 3190656 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3190189 3190216 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 Pei-Ying 2016-05-06 8:31 GMT+08:00 Pei-Ying Huang: Hi Mike, Any clue about the problem? Or is my reasoning wrong? I judge whether the transcript data help in MAKER by checking whether est2genome appears in column 2 of the MAKER output GFF file. Thank you. Pei-Ying 2016-05-05 1:22 GMT+08:00 Pei-Ying Huang: Hi Mike, The attached file is the folder I use to run MAKER. Thank you. guliRN_L1_v1_mike.tar.gz Pei-Ying 2016-05-04 22:54 GMT+08:00 Michael Campbell: Hi Pei-Ying, If the sample data didn't produce est2genome lines, then it may be that exonerate is not being called.
Could you send me the maker_exe.ctl file? Your maker_opts.ctl file looks fine. If you have a small test set for your data, like a small scaffold that you know has some StringTie hits on it, you could send it to me and I can see if I can figure it out from here, if that would be helpful. Thanks, Mike On May 4, 2016, at 12:33 AM, Pei-Ying Huang wrote: Hi Mike, basic_protocol_1.tar.gz: I ran the sample data following Basic Protocol 1 in the attached protocol paper, which uses the Drosophila data bundled with MAKER. I still can't find est2genome in column 2 of the GFF file, and there is no five_prime_UTR or three_prime_UTR in column 3. I use StringTie to align paired-end reads to the genome, then use cufflinks2gff3 to generate the .gff file for MAKER input. Because I have three conditions (root, stem, leaf), I got Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff as MAKER inputs. Should I merge Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff into strtie_merge.gff before giving them to MAKER? When I try to use cufflinks2gff3 to convert strtie_merge.gtf to strtie_merge.gff, it shows the error messages below. /home/pyh/bin/maker/bin/cufflinks2gff3 strtie_merge.gtf > strtie_merge.gff Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221531. Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221532. Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221533. Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221534. Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221535. Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221536. maker1.log
maker_opts.log less A_guli_1.all.gff GULI.scaff0001 maker gene 1750118 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37 GULI.scaff0001 maker mRNA 1750118 1755997 5292 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;_AED=0.37;_eAED=0.37;_QI=0|0|0|1|0|0|7|0|1764 GULI.scaff0001 maker exon 1750118 1750214 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:21;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1750304 1750815 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:20;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1750896 1751717 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:19;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1751849 1752373 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:18;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1752515 1753488 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:17;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1753554 1754406 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:16;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1754489 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:15;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1754489 1755997 .
- 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1753554 1754406 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1752515 1753488 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1751849 1752373 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1750896 1751717 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1750304 1750815 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1750118 1750214 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 Thank you. Pei-Ying 2016-04-14 21:09 GMT+08:00 Michael Campbell: It is strange for transcripts from the species of interest to not align or help. That FASTA entry looks okay. Did you save the error output from MAKER? If you did, could you send it to me along with the MAKER control files? There may be some clues in there. It would also be good if you could run MAKER on the sample data from Drosophila in the /data folder in MAKER. This way we can see if it is your data or your install of MAKER. Basic Protocol 1 in the attached protocol paper uses the Drosophila data bundled with MAKER. Aligning with HISAT2 and using Cufflinks to make transcripts should work.
StringTie seems to have higher specificity than Cufflinks, and the cufflinks2gff3 script works on StringTie output as well. You could also do a de novo assembly of the reads yourself using Trinity, which has worked well for me in the past. Protein evidence alone will give a reasonable annotation. The transcript data will help in annotating UTRs and species-specific genes. The attached protocol paper also addresses your quality question to an extent. -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.campbell1 at gmail.com Wed May 18 08:16:05 2016 From: michael.s.campbell1 at gmail.com (Michael Campbell) Date: Wed, 18 May 2016 09:16:05 -0400 Subject: [maker-devel] [maker] transcripts doesn't provide any help In-Reply-To: References: <4B761165-F33F-4BCA-8DE0-B2AF0A6AE771@gmail.com> <3D78FFC2-B0FE-4DCC-A079-0B99CFB6C735@gmail.com> <44AADB65-67F1-4D25-A7C6-C7EE93B9E80B@gmail.com> <43234558-4A22-4B90-A209-CA7FEEF230CF@gmail.com> <9B9328D2-8F16-47C1-8873-F7821637E7FB@gmail.com> <4ACF958F-C6DF-469A-81CE-BF5854D7B8A2@gmail.com> Message-ID: Hi Pei-Ying, The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it? For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don't think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get a better training than you would with CEGMA output alone. I've cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I'm also cc'ing the devlist. There may be others in the community that can comment on the run times.
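A minimal sketch of the arithmetic behind the core-usage question, assuming top is in its default mode where one fully busy core shows as 100% for a process, so a 64-core machine tops out at 6400% summed over processes. The function name and the sample readings are hypothetical, for illustration only; the %CPU values would have to be read off top by hand or collected some other way.

```python
def cpu_utilization(per_process_cpu_percent, n_cores):
    """Return the fraction of total CPU capacity in use.

    Assumes each value is a per-process %CPU reading where one
    fully busy core counts as 100, so total capacity is
    n_cores * 100.
    """
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    capacity = n_cores * 100.0
    return sum(per_process_cpu_percent) / capacity


if __name__ == "__main__":
    # Hypothetical readings: 64 MAKER processes, each near 100% CPU.
    busy = [99.0] * 64
    print(f"{cpu_utilization(busy, 64):.0%} of 64 cores in use")

    # Hypothetical readings: only one process is doing any work.
    mostly_idle = [100.0] + [0.0] * 63
    print(f"{cpu_utilization(mostly_idle, 64):.1%} of 64 cores in use")
```

A result far below 1.0 on a dedicated 64-core run would suggest the job is not spreading across the cores it was given, which is worth ruling out before comparing run times.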
Thanks, Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 24 12:08:52 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 24 May 2016 11:08:52 -0600 Subject: [maker-devel] Single exon in GFF file.
In-Reply-To: References: Message-ID: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com> single_exon=0 does not mean that single-exon genes will not be called. It means that single-exon ESTs will not be used as evidence support (issues related to single-exon ESTs are well known, so it is best to exclude them). You will still get single-exon genes from the predictors and single-exon protein alignments from your protein evidence. Every genome is expected to contain a number of single-exon genes (the most conserved genes across species in fact tend to be single exon; there is evolutionary selection that favors single-exon structure in essential genes). What you will want to do is look at your contigs in a browser. Depending on the structure of the genes you see and the genes around them, you may conclude that you have insufficient repeat masking (which results in repeats being called as genes). Or you may realize that the contigs in question are prokaryotic (i.e. assembly contamination), which must be resolved upstream of MAKER. Or they are real genes. Remember, every genome is expected to contain single-exon genes. -Carson > On May 24, 2016, at 10:58 AM, Won C Yim wrote: > > Dear MAKER team, > > We have been using MAKER to generate our plant genome annotations. > > Even though I set "single_exon=0", there are a lot of single-exon genes based on Eval 2.2.8. > > Is there any way to discard single-exon genes? > > Regards, > > Won > > -- > Yim, Won Cheol > MS330/Department of Biochemistry & Molecular Biology > 1664 N. Virginia Street > University of Nevada, Reno > > email: wyim at unr.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From munholl at uwindsor.ca Tue May 24 12:11:55 2016 From: munholl at uwindsor.ca (Seth Munholland) Date: Tue, 24 May 2016 13:11:55 -0400 Subject: [maker-devel] MAKER seg faulting In-Reply-To: References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com> Message-ID: Hi Carson, Just an update, that was indeed my issue.
Thanks for your help! Seth Munholland, B.Sc. Department of Biological Sciences Rm. 304 Biology Building University of Windsor 401 Sunset Ave. N9B 3P4 T: (519) 253-3000 Ext: 4755 On Wed, May 11, 2016 at 11:35 AM, Seth Munholland wrote: > Hi Carson, > > I am not using an MPI. Given the association to tblastx I suspect my c++ > install of BLAST is what's seg faulting. Thanks! > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 > > On Tue, May 10, 2016 at 7:02 PM, Carson Holt wrote: > >> So MAKER is written in Perl, and Perl can?t really seg fault (it doesn?t >> give developers that kind of low level access to memory). However if you >> are using MPI, then it could be causing a seg fault, or one of the programs >> MAKER is calling could be seg faulting (like BLAST). >> >> So if you are using MPI, let me know which flavor and I can make >> suggestions (for example MVAPICH2 is incompatible with programs that do >> system calls, and OpenMPI may require special setting for LD_PRELOAD to >> work properly with shared libraries). If your not using MPI, then you will >> need to look at the installed programs MAKER is calling and reinstall them, >> update them, or roll back a version (i.e. BLAST, Exonerate, etc.) >> >> ?Carson >> >> >> >> On May 10, 2016, at 12:18 PM, Seth Munholland >> wrote: >> >> Hello Everyone, >> >> For reasons unknown my MAKER (2.31.8 on Ubuntu 14.04) runs keep seg >> faulting. I've changed the the dataset I'm running MAKER on, by parsing >> out smaller sections of the larger assembly, and I still seg fault on >> sections that the larger assembly moved past without issue. >> >> The only commonality I see is every tme it seg faults it appears to have >> jsut finished a tblastx. Any suggestions for how I can debug and correct >> this issue? >> >> Seth Munholland, B.Sc. >> Department of Biological Sciences >> Rm. 
304 Biology Building >> University of Windsor >> 401 Sunset Ave. N9B 3P4 >> T: (519) 253-3000 Ext: 4755 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 24 12:12:42 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 24 May 2016 11:12:42 -0600 Subject: [maker-devel] MAKER seg faulting In-Reply-To: References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com> Message-ID: Great to know it?s working for you now. ?Carson > On May 24, 2016, at 11:11 AM, Seth Munholland wrote: > > Hi Carson, > > Just an update, that was indeed my issue. Thanks for your help! > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 <> > On Wed, May 11, 2016 at 11:35 AM, Seth Munholland > wrote: > Hi Carson, > > I am not using an MPI. Given the association to tblastx I suspect my c++ install of BLAST is what's seg faulting. Thanks! > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 <> > On Tue, May 10, 2016 at 7:02 PM, Carson Holt > wrote: > So MAKER is written in Perl, and Perl can?t really seg fault (it doesn?t give developers that kind of low level access to memory). However if you are using MPI, then it could be causing a seg fault, or one of the programs MAKER is calling could be seg faulting (like BLAST). > > So if you are using MPI, let me know which flavor and I can make suggestions (for example MVAPICH2 is incompatible with programs that do system calls, and OpenMPI may require special setting for LD_PRELOAD to work properly with shared libraries). 
If you're not using MPI, then you will need to look at the installed programs MAKER is calling and reinstall them, update them, or roll back a version (e.g. BLAST or Exonerate).
>
> --Carson
>
>> On May 10, 2016, at 12:18 PM, Seth Munholland wrote:
>>
>> Hello Everyone,
>>
>> For reasons unknown my MAKER (2.31.8 on Ubuntu 14.04) runs keep seg faulting. I've changed the dataset I'm running MAKER on, by parsing out smaller sections of the larger assembly, and I still seg fault on sections that the larger assembly moved past without issue.
>>
>> The only commonality I see is every time it seg faults it appears to have just finished a tblastx. Any suggestions for how I can debug and correct this issue?
>>
>> Seth Munholland, B.Sc.
>> Department of Biological Sciences
>> Rm. 304 Biology Building
>> University of Windsor
>> 401 Sunset Ave. N9B 3P4
>> T: (519) 253-3000 Ext: 4755
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue May 24 12:14:56 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 May 2016 11:14:56 -0600
Subject: [maker-devel] Single exon in GFF file.
In-Reply-To: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>
References: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>
Message-ID: <5DEE67B4-F022-479E-A5C5-97F76FD6601D@gmail.com>

As a side note, many of the newer plant genomes I've worked on have had entire yeast and bacterial genomes sequenced into their assemblies (even as their own separate contigs). It is a common issue that is easily identified by just looking at a few of the more gene-dense contigs in a browser like Apollo.

--Carson

> On May 24, 2016, at 11:08 AM, Carson Holt wrote:
>
> Single_exon=0 does not mean not to call single exon genes.
It means not to use single exon ESTs as evidence support (as issues related to single exon ESTs are well known, so it is best to exclude them). You will still get single exon genes from the predictors and single exon protein alignments from your protein evidence. Every genome is expected to contain a number of single exon genes (the most conserved genes across species in fact tend to be single exon - there is evolutionary selection that favors single exon structure in essential genes).
>
> What you will want to do is look at your contigs in a browser. Depending on the structure of the genes you see and the genes around them, you may conclude that you have insufficient repeat masking (results in repeats being called as genes). Or you may realize that the contigs in question are prokaryotic (i.e. assembly contamination), which must be resolved upstream of MAKER. Or they are real genes. Remember every genome is expected to contain single exon genes.
>
> --Carson
>
>> On May 24, 2016, at 10:58 AM, Won C Yim wrote:
>>
>> Dear MAKER team,
>>
>> We have been using MAKER to generate our plant genome annotations.
>>
>> Even though I set "single_exon=0", there are a lot of single exon genes based on Eval 2.2.8.
>>
>> Is there any way to discard single exon genes?
>>
>> Regards,
>>
>> Won
>>
>> --
>> Yim, Won Cheol
>> MS330/Department of Biochemistry & Molecular Biology
>> 1664 N. Virginia Street
>> University of Nevada, Reno
>>
>> email: wyim at unr.edu

From wyim at unr.edu Tue May 24 11:58:41 2016
From: wyim at unr.edu (Won C Yim)
Date: Tue, 24 May 2016 16:58:41 +0000
Subject: [maker-devel] Single exon in GFF file.
Message-ID:

Dear MAKER team,

We have been using MAKER to generate our plant genome annotations.

Even though I set "single_exon=0", there are a lot of single exon genes based on Eval 2.2.8.

Is there any way to discard single exon genes?
Regards,

Won

--
Yim, Won Cheol
MS330/Department of Biochemistry & Molecular Biology
1664 N. Virginia Street
University of Nevada, Reno

email: wyim at unr.edu

From debarryj at gmail.com Thu May 26 14:54:37 2016
From: debarryj at gmail.com (Jeremy DeBarry)
Date: Thu, 26 May 2016 12:54:37 -0700
Subject: [maker-devel] MAKER (v2.31.6) incorrect strand for tRNA-scan (v.1.3.1) predicted exon
Message-ID:

Greetings,

My group has run MAKER on a small genome. One of the annotated tRNAs has an intron. The two exons are annotated on different strands. The gene and first exon are on the + strand and the second exon is on the - strand.

I looked over the archives and found previous reports, but it appears they apply to earlier versions of MAKER.

My instinct is to 'manually' correct the strand information for the - strand exon, but I wanted to investigate the issue further first.

Do you have any insight?

Much appreciated,
Jeremy

--
Dr. Jeremy DeBarry PhD
MaHPIC Data Coordinator
Kissinger Research Group
The University of Georgia
:::
Email: debarryj at gmail.com
Tel: +1.912.269.0484
Skype ID: jdebarry
:::
Nihil Sine Labore!:::Nec Aspera Terrent!:::Boutez-en-Avant!

From carsonhh at gmail.com Fri May 27 09:15:47 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 May 2016 08:15:47 -0600
Subject: [maker-devel] MAKER (v2.31.6) incorrect strand for tRNA-scan (v.1.3.1) predicted exon
In-Reply-To:
References:
Message-ID:

Make sure you're using the current version of MAKER. That same thread mentions it was fixed once they updated from 2.31.3. The current version is 2.31.8.

--Carson

> On May 26, 2016, at 1:54 PM, Jeremy DeBarry wrote:
>
> Greetings,
> My group has run MAKER on a small genome. One of the annotated tRNAs has an intron. The two exons are annotated on different strands.
The gene and first exon are on the + strand and the second exon is on the - strand. > > I looked over the archives and found previous reports , but it appears they apply to earlier versions of MAKER. > > My instinct is to 'manually' correct the strand information for the - strand exon, but I wanted to investigate the issue further first. > > Do you have any insight? > > Much appreciated, > Jeremy > > -- > Dr. Jeremy DeBarry PhD > MaHPIC Data Coordinator > Kissinger Research Group > > The University of Georgia > ::: > Email: debarryj at gmail.com > Tel: +1.912.269.0484 > Skype ID: jdebarry > ::: > Nihil Sine Labore!:::Nec Aspera Terrent!:::Boutez-en-Avant! > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From philipp.bayer at uwa.edu.au Tue May 31 01:37:46 2016 From: philipp.bayer at uwa.edu.au (Philipp Bayer) Date: Tue, 31 May 2016 14:37:46 +0800 Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations Message-ID: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au> Hello, I have a minor question about the way MAKER joins annotations from different chunks when using MPI. Let's say I have a longer gene that bridges two chunks, so the jobs annotating both chunks separately would return two incomplete genes, one without a stop codon, one without a start codon. I assume MAKER would then join those two into a single gene, right? Is this behaviour influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl? 
Thank you

Philipp Bayer

From carsonhh at gmail.com Tue May 31 10:51:34 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 31 May 2016 09:51:34 -0600
Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations
In-Reply-To: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>
References: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>
Message-ID: <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com>

Annotations never actually cross chunk boundaries, because the boundaries are not fixed. It's much more complicated than that, but basically we know from the alignment scoring model the maximum distance at which an HSP can occur and still be included in the alignment. This means that I know precisely whether there is a chance that an alignment may include another part when it occurs near the edge of a BLASTed sequence. When there is a chance, the sequence gets extended and everything is realigned (de novo) using the extended sequence, which can include an entire neighboring chunk. This is a very fast operation, since only the known hits are aligned rather than the whole database. So think of it more like a dynamic window than a fixed boundary. Results are then sorted and serialized to disk. Also, the initial BLAST is done with very permissive parameters and overlapping sequence boundaries, so extremely low scoring partial alignments are enough to trigger an extension and realignment (we know beforehand the minimum sequence length needed to generate a given alignment score, and can extrapolate the maximum theoretical score given a yet-to-be-generated extension).

The serialized alignments then get clustered across the entire length of the contig (not just within a chunk), and clusters are annotated one at a time. Think of it like a linear walk down the contig through the serialized features, clustering as you go. Every time alignments stop being added to a cluster and that cluster ends, it can be annotated as a self-contained unit. This is why shared storage is required for MAKER. So MAKER never joins the genes, as they were never called in a way where they could be split in the first place.

The split_hit parameter affects clustering as well as the alignment model for how far away an HSP can be and still be considered part of the same alignment (long unpolished alignments with gaps longer than this will be broken into two separate alignments). pred_flank also affects clustering slightly, but its primary effect is the generation of flanking sequence around current cluster boundaries (clusters include all alignments as well as ab initio predictions, so it is added to those existing boundaries).

The reason you may get models without a start or stop codon is that HMMs in predictors like SNAP and Augustus pick the highest likelihood path regardless, not because of a chunk split. Also, all ab initio calls are part of the cluster, so it is never trimmed in a way that a cluster boundary ever falls partway across one of those models.

--Carson

> On May 31, 2016, at 12:37 AM, Philipp Bayer wrote:
>
> Hello,
>
> I have a minor question about the way MAKER joins annotations from
> different chunks when using MPI.
>
> Let's say I have a longer gene that bridges two chunks, so the jobs
> annotating both chunks separately would return two incomplete genes, one
> without a stop codon, one without a start codon. I assume MAKER would
> then join those two into a single gene, right? Is this behaviour
> influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl?
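
The "linear walk" clustering described in Carson's reply can be illustrated with a small sketch. This is not MAKER's actual code: it assumes features are plain (start, end) intervals on one contig, and the names `cluster_features` and `max_gap` are made up for illustration (`max_gap` only loosely stands in for the distance settings the reply mentions). The point it shows is that a cluster is emitted as soon as no further feature can join it, so each cluster can be handled as a self-contained unit.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) on a single contig


def cluster_features(features: Iterable[Interval],
                     max_gap: int = 0) -> Iterator[List[Interval]]:
    """Walk the intervals in coordinate order and yield each cluster
    as soon as it is closed (hypothetical helper, for illustration)."""
    cluster: List[Interval] = []
    cluster_end: Optional[int] = None
    for start, end in sorted(features):
        # A feature starting beyond the current cluster end (plus the
        # allowed gap) closes the cluster; nothing later can join it
        # because the features are visited in sorted order.
        if cluster and start > cluster_end + max_gap:
            yield cluster
            cluster, cluster_end = [], None
        cluster.append((start, end))
        cluster_end = end if cluster_end is None else max(cluster_end, end)
    if cluster:
        yield cluster


if __name__ == "__main__":
    hits = [(100, 500), (450, 900), (5000, 5600), (5550, 6100), (880, 1200)]
    for c in cluster_features(hits, max_gap=200):
        print(c)
    # [(100, 500), (450, 900), (880, 1200)]
    # [(5000, 5600), (5550, 6100)]
```

Because the walk only needs the sorted, serialized features, where a chunk boundary happened to fall in the input sequence plays no role in where a cluster begins or ends.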
> > Thank you > > Philipp Bayer > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From philipp.bayer at uwa.edu.au Tue May 31 20:57:10 2016 From: philipp.bayer at uwa.edu.au (Philipp Bayer) Date: Wed, 1 Jun 2016 09:57:10 +0800 Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations In-Reply-To: <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com> References: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au> <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com> Message-ID: Hello, thank you very much for your detailed answer! Looks like I had misinterpreted some details of the program, this is very helpful, thank you! Cheers Philipp On 31.05.2016 23:51, Carson Holt wrote: > Annotations never actually cross a chunk boundaries because the boundaries are not fixed. It?s much more complicated than that, but basically we know from the alignment scoring model the maximum distance an HSP can occur and still be included in the alignment. This means that I know precisely whether there is a chance that an alignment may include another part when it occurs near the edge of a blasted sequence. When there is a chance, the sequence gets extended and everything will be realigned (de novo) using the extended sequence which can include an entire neighboring chunk. This is a very fast operation since it?s just the known hits being aligned rather than the whole database. So think of it more like a dynamic window rather than a fixed boundary. Results are then sorted and serialized to disk. 
Also the initial BLAST is done with very permissive parameters and overlapping sequence boundaries, so extremely low scoring partial alignments are enough to trigger an extension and realignment (we know before hand the minimum sequence length needed to generate a given alignment score and can extrapolate maximum theoretical score given a yet to be generated extension). > > The serialized alignments then get clustered across the entire length of the contig (not just within a chunk), and clusters are annotated one at a time. Think of it like a linear walk down the contig through the serialized features, clustering as you go. Every time alignments stop being added to a cluster and that cluster ends, it can be annotated as a self contained unit. This is why shared storage is required for MAKER. So MAKER never joins the genes, as they were never called in a way where they could be split in the first place. > > The split_hit parameter affect clustering as well as the alignment model for how far away an HSP can be and still be conceded part of the same alignment (long unpolished alignments with gaps longer than this will be broken into two separate alignments). pred_flank also affects clusteing slightly, but it?s primary effect is the generation of flanking sequence around current cluster boundaries (clusters include all alignments as well as ab initio predictions, so it is added to those existing boundaries). > > The reason you may get models without a start or stop codon, is because HMMs in predictors like snap and augustus pick the highest likelihood path regardless, not because of a chunk split. Also all ab initio calls are part of the cluster, so it is never trimmed in a way that a cluster boundary ever falls part way across one of those models. > > ?Carson > >> On May 31, 2016, at 12:37 AM, Philipp Bayer wrote: >> >> Hello, >> >> I have a minor question about the way MAKER joins annotations from >> different chunks when using MPI. 
>> >> Let's say I have a longer gene that bridges two chunks, so the jobs >> annotating both chunks separately would return two incomplete genes, one >> without a stop codon, one without a start codon. I assume MAKER would >> then join those two into a single gene, right? Is this behaviour >> influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl? >> >> Thank you >> >> Philipp Bayer >> >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
From platycerus at gmail.com Thu May 12 15:49:51 2016 From: platycerus at gmail.com (Ray Cui) Date: Thu, 12 May 2016 23:49:51 +0200 Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2 In-Reply-To: References: Message-ID: Dear Yugui I had the same problem with openmpi. I think it is not compatible with Maker. I now use mpich, which works. Ray On May 12, 2016 11:32 PM, "Yugui Wang" wrote: > Hi. > > Segmentation fault of MKAER with openmpi on CentOS 7.2. > Both MAKER 2.31.8 and 3.00.0 beta have the same error. > > $ mpirun -mca btl ^openib -n 4 maker > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > -------------------------------------------------------------------------- > mpirun noticed that process rank 2 with PID 39507 on node T620 exited > on signal 11 (Segmentation fault). > -------------------------------------------------------------------------- > $ file core.39505 > core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV), > SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make > $ gdb /usr/bin/perl core.39505 > (gdb) where > #0 0x00007f0e4a7d2060 in ??
() > #1 > #2 0x00007f0e4a7d2060 in ?? () > #3 > #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from > /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so > #5 0x00007f0e63ec8eda in opal_progress () from > /usr/lib64/openmpi/lib/libopen-pal.so.13 > #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from > /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so > #7 0x00007f0e65b0dc06 in PMPI_Probe () from > /usr/lib64/openmpi/lib/libmpi.so > #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf at entry=0x4146b30, > source=source at entry=-1, tag=tag at entry=1111) at MPI.xs:56 > #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv > (my_perl=, cv=) at MPI.c:391 > #10 0x00007f0e657ce39f in Perl_pp_entersub () from > /usr/lib64/perl5/CORE/libperl.so > #11 0x00007f0e657c6b16 in Perl_runops_standard () from > /usr/lib64/perl5/CORE/libperl.so > #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so > #13 0x0000000000400d99 in main () > $ echo $LD_PRELOAD > /usr/lib64/openmpi/lib/libmpi.so: > $ echo $OMPI_MCA_mpi_warn_on_fork > 0 > $ rpm -qa openmpi > openmpi-1.10.0-10.el7.x86_64 > $ uname -a > Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC > 2016 x86_64 x86_64 x86_64 GNU/Linux > $ ulimit -a > core file size (blocks, -c) unlimited > data seg size (kbytes, -d) unlimited > scheduling priority (-e) 0 > file size (blocks, -f) unlimited > pending signals (-i) 1029973 > max locked memory (kbytes, -l) 64 > max memory size (kbytes, -m) unlimited > open files (-n) 1024 > pipe size (512 bytes, -p) 8 > POSIX message queues (bytes, -q) 819200 > real-time priority (-r) 0 > stack size (kbytes, -s) 102400 > cpu time (seconds, -t) unlimited > max user processes (-u) 4096 > virtual memory (kbytes, -v) unlimited > file locks (-x) unlimited > $ mpiexec --version > mpiexec (OpenRTE) 1.10.0 > > Report bugs to http://www.open-mpi.org/community/help/ > $ > > _______________________________________________ > maker-devel mailing list > maker-devel at 
box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From xvazquezc at gmail.com Thu May 12 18:31:55 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 13 May 2016 10:31:55 +1000
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

Check this thread: https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ

On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] wrote:

> Hello,
>
> Are there any guidelines for using BUSCO to help train MAKER? CEGMA has
> been discontinued but I used to use the cegma2zff.pl steps to use those
> proteins as a training step. BUSCO seems to train Augustus but I'm not sure
> what file to pass from BUSCO to MAKER for this to be properly utilized. I
> didn't see anything specific about this in the archives.
> -----
>
> *Ian Misner, Ph.D.*
>
> Computational Genomics Specialist
>
> Contractor, Medical Science and Computing, Inc.
>
> Bioinformatics and Computational Biosciences Branch (BCBB)
>
> NIH/NIAID/OD/OSMO/OCICB
>
> 5601 Fishers Lane, Room 4A59
>
> Rockville, MD 20892
>
> Office: 301-761-6208
>
> Mobile: 301-704-0151
>
> Email: ian.misner at nih.gov
>
> Web: BCBB Home Page
>
> Twitter: @NIAIDBioIT
>
>
>
> Disclaimer: The information in this e-mail and any of its attachments is
> confidential and may contain sensitive information. It should not be used
> by anyone who is not the original intended recipient. If you have received
> this e-mail in error please inform the sender and delete it from your
> mailbox or any other storage devices. National Institute of Allergy and
> Infectious Diseases shall not accept liability for any statements made that
> are sender's own and not expressly made on behalf of the NIAID by one of
> its representatives.
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Thu May 12 18:37:03 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 13 May 2016 10:37:03 +1000
Subject: [maker-devel] Reformat maker gff3
In-Reply-To: <1460516670248.1644@uq.edu.au>
References: <1460516670248.1644@uq.edu.au>
Message-ID:

Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too.

On 13 April 2016 at 13:05, Jenny Lee wrote:

> Hi all,
>
>
> I would like to update my maker gff3 file to only contain the genes I've
> decided to keep - all maker genes, a subset of abinitio genes (which
> have interproscan hits). I would like to also exclude the repeats
> information and only retain the CDS, gene, exon and mRNA - like the
> format we usually see in published data.
>
>
> I've been trying to do this manually and it gets messy. Any ideas?
>
>
> Thanks a lot.
>
>
> Regards,
>
> Jenny Lee
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
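
As an alternative to chaining grep and cut for the GFF3 question in this thread, the same filtering can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the input is standard nine-column, tab-separated GFF3; the models to keep can be recognised by their source column (column 2, for example `maker`); and any embedded `##FASTA` section can be dropped. The function name `filter_gff3` and the file names in the usage comment are hypothetical. Selecting a subset of ab initio genes (for instance those with InterProScan hits) would additionally need a list of IDs, which is not handled here.

```python
from typing import Collection, Iterable, Iterator

# Feature types (column 3) that the question asks to retain.
KEEP_TYPES = frozenset({"gene", "mRNA", "exon", "CDS"})


def filter_gff3(lines: Iterable[str],
                keep_sources: Collection[str],
                keep_types: Collection[str] = KEEP_TYPES) -> Iterator[str]:
    """Yield the version pragma plus feature rows whose source and type
    are both wanted (hypothetical helper, for illustration)."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature rows
        if line.startswith("#"):
            if line.startswith("##gff-version"):
                yield line
            continue  # drop other comments and pragmas
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip blank or malformed rows
        source, feature_type = cols[1], cols[2]
        if source in keep_sources and feature_type in keep_types:
            yield line


# Possible usage (file names are placeholders):
# with open("all.gff") as src, open("genes_only.gff3", "w") as dst:
#     dst.writelines(filter_gff3(src, keep_sources={"maker"}))
```

Filtering on whole columns rather than on a plain text match avoids accidentally keeping rows that merely mention "gene" or "exon" somewhere in the attributes column.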
From panos.ioannidis at gmail.com Fri May 13 01:56:58 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 13 May 2016 09:56:58 +0200
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

Hello Ian,

Xabier is right. You have to run BUSCO with the --long switch and then, in the maker_opts.ctl file, you should point the augustus_species variable to your trained species (i.e. the name you pass with the -o/-a parameter).

So, in Xabier's example your maker_opts.ctl file should contain the following line:

    augustus_species=Genus_species

Felipe, Rob, is there something else that I'm missing? Truth is that I haven't run this recently and there might be differences in newer BUSCO versions.

Panos

Panos Ioannidis, PhD
Postdoctoral researcher
Computational Evolutionary Genomics Group
University of Geneva

On Fri, May 13, 2016 at 2:31 AM, Xabier Vázquez Campos wrote:

> Check this thread:
> https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ
>
> On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] wrote:
>
>> Hello,
>>
>> Are there any guidelines for using BUSCO to help train MAKER? CEGMA has
>> been discontinued but I used to use the cegma2zff.pl steps to use those
>> proteins as a training step. BUSCO seems to train Augustus but I'm not sure
>> what file to pass from BUSCO to MAKER for this to be properly utilized. I
>> didn't see anything specific about this in the archives.
>> -----
>>
>> *Ian Misner, Ph.D.*
>>
>> Computational Genomics Specialist
>>
>> Contractor, Medical Science and Computing, Inc.
>> >> Bioinformatics and Computational Biosciences Branch (BCBB) >> >> NIH/NIAID/OD/OSMO/OCICB >> >> 5601 Fishers Lane, Room 4A59 >> >> Rockville, MD 20892 >> >> Office: 301-761-6208 >> >> Mobile: 301-704-0151 >> >> Email: ian.misner at nih.gov >> >> Web: BCBB Home Page >> >> Twitter: @NIAIDBioIT >> >> >> >> Disclaimer: The information in this e-mail and any of its attachments is >> confidential and may contain sensitive information. It should not be used >> by anyone who is not the original intended recipient. If you have received >> this e-mail in error please inform the sender and delete it from your >> mailbox or any other storage devices. National Institute of Allergy and >> Infectious Diseases shall not accept liability for any statements made that >> are sender's own and not expressly made on behalf of the NIAID by one of >> its representatives. >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdolze at students.uni-mainz.de Fri May 13 04:14:57 2016 From: fdolze at students.uni-mainz.de (Dolze, Florian) Date: Fri, 13 May 2016 12:14:57 +0200 Subject: [maker-devel] BUSCO In-Reply-To: References: Message-ID: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de> On a somewhat related note, is there an advantage of using BUSCO to train Augustus instead of the provided Augustus webtraining service? Does anybody know how those 2 compare? 
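
For the maker_opts.ctl change discussed in this thread (pointing augustus_species at the BUSCO-trained species name), a small script can make the edit reproducible. This is a minimal sketch, assuming the control file uses `key=value #comment` lines as in a MAKER-generated maker_opts.ctl; the helper name `set_ctl_option` is hypothetical, and `Genus_species` is just the placeholder name used in the thread.

```python
def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Return ctl_text with `key` set to `value`, keeping any trailing
    '#comment' on that line (hypothetical helper, for illustration)."""
    out = []
    found = False
    for line in ctl_text.splitlines(keepends=True):
        if line.startswith(key + "="):
            newline = "\n" if line.endswith("\n") else ""
            # Split off an existing trailing comment so it can be kept.
            _, hash_mark, comment = line.rstrip("\n").partition("#")
            line = f"{key}={value}"
            if hash_mark:
                line += f" #{comment}"
            line += newline
            found = True
        out.append(line)
    if not found:
        raise KeyError(f"option {key!r} not found in control file text")
    return "".join(out)


if __name__ == "__main__":
    ctl = ("snaphmm= #SNAP HMM file\n"
           "augustus_species= #Augustus gene prediction species model\n")
    print(set_ctl_option(ctl, "augustus_species", "Genus_species"), end="")
```

The trained parameters themselves still have to be present in the Augustus species directory under that name; the script only changes which name MAKER is told to use.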

From robert.waterhouse at gmail.com Fri May 13 02:28:00 2016
From: robert.waterhouse at gmail.com (Robert Waterhouse)
Date: Fri, 13 May 2016 10:28:00 +0200
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

I think in the Augustus 'species' directory there should be a new folder named according to your BUSCO run, and in that folder should be the trained parameters for your new species, so from MAKER I guess you can point to these trained parameters.

Rob

\\ Dr Robert Waterhouse
O0o-- SIB maître assistant
"" www.rmwaterhouse.org
A maturing understanding of the composition of the insect gene repertoire COIS 2015
BUSCO: assessing genome assembly and annotation completeness Bioinformatics 2015

From robert.waterhouse at gmail.com Fri May 13 06:54:40 2016
From: robert.waterhouse at gmail.com (Robert Waterhouse)
Date: Fri, 13 May 2016 14:54:40 +0200
Subject: [maker-devel] BUSCO
In-Reply-To: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de>
References: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de>
Message-ID:

I would guess that the main 'advantage' of using BUSCO to train Augustus is that one will probably run BUSCO on one's genome anyway before starting MAKER, so there will already be a useful set of trained parameters ready to use.

I guess the 'advantage' of using the Augustus webtraining service is that one could give it much more starting data (if indeed this is available, e.g. cDNAs). Indeed if there was enough time and it made a substantial difference one might even use the BUSCO gene model output as the 'Training gene structure file' for Augustus webtraining service.

I don't believe that anyone has done a comparison on how different the trained parameters end up being.

Rob

\\ Dr Robert Waterhouse
O0o-- SIB maître assistant
"" www.rmwaterhouse.org
A maturing understanding of the composition of the insect gene repertoire COIS 2015
BUSCO: assessing genome assembly and annotation completeness Bioinformatics 2015
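
[Editorial sketch for the BUSCO thread above] The two steps described in this thread (find the BUSCO-trained parameter folder under the Augustus 'species' directory, then point augustus_species at it in maker_opts.ctl) could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names trained_species_dir and set_ctl_option are hypothetical, "Genus_species" is the placeholder name used in the thread, and the folder name that a given BUSCO version actually creates may differ, so check the species directory by hand first. AUGUSTUS_CONFIG_PATH is the environment variable Augustus uses to locate its config directory.

```python
import os
import re
from typing import Optional


def trained_species_dir(species: str, config_path: Optional[str] = None) -> str:
    """Return the folder where Augustus would look for `species` parameters."""
    # Assumption: parameters live in $AUGUSTUS_CONFIG_PATH/species/<species>.
    base = config_path or os.environ["AUGUSTUS_CONFIG_PATH"]
    return os.path.join(base, "species", species)


def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Set `key=value` in control-file text, keeping any trailing #comment."""
    pattern = re.compile(
        rf"^({re.escape(key)}=)[^#\n]*?([ \t]*#.*)?$", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(2) or ''}", ctl_text
    )
    if count == 0:
        raise KeyError(f"{key}= not found in control file text")
    return new_text
```

A caller would read maker_opts.ctl, confirm with os.path.isdir(trained_species_dir(name)) that the trained folder exists, and write back the text returned by set_ctl_option(text, "augustus_species", name). Editing the text rather than regenerating the file leaves every other option untouched.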
From cjfields at illinois.edu Fri May 13 10:25:56 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 13 May 2016 16:25:56 +0000
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

Our group has mainly used the BUSCO model in the "bootstrap" run for MAKER, and then retrained Augustus and SNAP using a filtered data set from that run for new rounds of MAKER.

Also, one personal observation: we have found some genome assemblies where BUSCO performs poorly compared to CEGMA (e.g. BUSCO reports poor overall percent of SCO present, while CEGMA reports much higher numbers). We're still delving into this, but in those cases we avoid using the BUSCO model for obvious reasons.

chris

From michael.s.campbell1 at gmail.com Fri May 13 09:34:00 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 13 May 2016 11:34:00 -0400
Subject: [maker-devel] Reformat maker gff3
In-Reply-To:
References: <1460516670248.1644@uq.edu.au>
Message-ID: <777D8DFF-CB99-4F03-A4CF-8E52F0E4526A@gmail.com>

I've attached a protocols paper that walks through what you are trying to do. Let me know if it helps.

Mike

> On May 12, 2016, at 8:37 PM, Xabier Vázquez Campos wrote:
>
> Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too
>
> On 13 April 2016 at 13:05, Jenny Lee wrote:
> Hi all,
>
> I would like to update my maker gff3 file to only contain the genes I've decided to keep - all maker genes, a subset of abinitio genes (which have interproscan hits). I would like to also exclude the repeats information and only retain the CDS, gene, exon and mRNA - like the format we usually see in published data.
>
> I've been trying to do this manually and it gets messy. Any ideas?
>
> Thanks a lot.
>
> Regards,
> Jenny Lee
>
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bi0411 (1).pdf
Type: application/pdf
Size: 484328 bytes
Desc: not available
URL:

From carsonhh at gmail.com Fri May 13 10:32:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 10:32:40 -0600
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

With OpenMPI, you must set LD_PRELOAD for libmpi.so and sometimes the "-mca btl" parameter. Details can be found in the .../maker/INSTALL file.

Also, we have found a recent issue with MAKER and Intel-compiled OpenMPI on CentOS systems. To get around that issue, compile OpenMPI with gcc instead of the Intel compiler, or alternatively manually install a separate Perl installation without pthread support (i.e. pthreads disabled during the configure step).

--Carson

> On May 12, 2016, at 3:49 PM, Ray Cui wrote:
>
> Dear Yugui
>
> I had the same problem with openmpi. I think it is not compatible with Maker. I now use mpich, which works.
>
> Ray
>
> On May 12, 2016 11:32 PM, "Yugui Wang" wrote:
> Hi.
>
> Segmentation fault of MAKER with openmpi on CentOS 7.2.
> Both MAKER 2.31.8 and 3.00.0 beta have the same error.
>
> $ mpirun -mca btl ^openib -n 4 maker
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 39507 on node T620 exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> $ file core.39505
> core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make
> $ gdb /usr/bin/perl core.39505
> (gdb) where
> #0 0x00007f0e4a7d2060 in ?? ()
> #1
> #2 0x00007f0e4a7d2060 in ?? ()
> #3
> #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from
> /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so
> #5 0x00007f0e63ec8eda in opal_progress () from
> /usr/lib64/openmpi/lib/libopen-pal.so.13
> #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from
> /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so
> #7 0x00007f0e65b0dc06 in PMPI_Probe () from /usr/lib64/openmpi/lib/libmpi.so
> #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf@entry=0x4146b30,
> source=source@entry=-1, tag=tag@entry=1111) at MPI.xs:56
> #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv
> (my_perl=, cv=) at MPI.c:391
> #10 0x00007f0e657ce39f in Perl_pp_entersub () from
> /usr/lib64/perl5/CORE/libperl.so
> #11 0x00007f0e657c6b16 in Perl_runops_standard () from
> /usr/lib64/perl5/CORE/libperl.so
> #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
> #13 0x0000000000400d99 in main ()
> $ echo $LD_PRELOAD
> /usr/lib64/openmpi/lib/libmpi.so:
> $ echo $OMPI_MCA_mpi_warn_on_fork
> 0
> $ rpm -qa openmpi
> openmpi-1.10.0-10.el7.x86_64
> $ uname -a
> Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
> $ ulimit -a
> core file size (blocks, -c) unlimited
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 1029973
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 102400
> cpu time (seconds, -t) unlimited
> max user processes (-u) 4096
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> $ mpiexec --version
> mpiexec (OpenRTE) 1.10.0
>
> Report bugs to http://www.open-mpi.org/community/help/
> $

From dave.p.price at gmail.com Fri May 13 10:35:21 2016
From: dave.p.price at gmail.com (David Price)
Date: Fri, 13 May 2016 10:35:21 -0600
Subject: [maker-devel] maker-devel Digest, Vol 96, Issue 10
In-Reply-To:
References:
Message-ID:

Would it be possible to get digest mode set up properly? I have it selected but I get emails for each individual message.

Thanks

From carsonhh at gmail.com Fri May 13 10:46:38 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 10:46:38 -0600
Subject: [maker-devel] maker-devel Digest, Vol 96, Issue 10
In-Reply-To:
References:
Message-ID: <9ABEE8DB-6316-4CF1-BC46-0DB2C188BC44@gmail.com>

I toggled your digest option off and back on in case that is the issue. The Mailman docs say that on busy days the digest option may decide to send out more than one digest, so that could be the issue too.

The company providing our mailing list was having issues the last few weeks, so we weren't able to approve most posts until yesterday. As a result, there was an explosion of approved posts that may have triggered the digest to be more than 1 per day yesterday and today.

--Carson
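
[Editorial sketch for the "Reformat maker gff3" thread] The grep/cut suggestion from that thread amounts to filtering on two GFF3 columns: the source (column 2) and the feature type (column 3). A minimal version is sketched below, assuming that the final MAKER models carry "maker" in the source column and that gene, mRNA, exon and CDS are the wanted types. The function name filter_gff3 is hypothetical, and the sketch does not select the subset of ab initio genes with InterProScan hits, which needs a separate list of IDs.

```python
from typing import Iterable, Iterator, Optional

# Feature types named in the thread as the ones to retain.
KEEP_TYPES = frozenset({"gene", "mRNA", "exon", "CDS"})


def filter_gff3(
    lines: Iterable[str],
    keep_types: frozenset = KEEP_TYPES,
    source: Optional[str] = "maker",
) -> Iterator[str]:
    """Yield comment lines plus features of the wanted types and source."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features to filter
        if line.startswith("#"):
            yield line  # keep pragmas such as ##gff-version
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        # Assumption: column 2 is "maker" for the final MAKER gene models.
        if cols[2] in keep_types and (source is None or cols[1] == source):
            yield line
```

Passing source=None keeps the chosen feature types from every source, which is closer to a plain grep on column 3.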
From platycerus at gmail.com Fri May 13 11:08:19 2016
From: platycerus at gmail.com (Ray Cui)
Date: Fri, 13 May 2016 19:08:19 +0200
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

Hello,

I had segfaults even if I set LD_PRELOAD and used gcc for OpenMPI (dealing with Maker 3 beta though). It works fine with MPICH so I stopped looking into this.

Ray

On Fri, May 13, 2016 at 6:32 PM, Carson Holt wrote:

> With OpenMPI, you must set LD_PRELOAD for libmpi.so and sometimes the
> "-mca btl" parameter. Details can be found in the .../maker/INSTALL file.
>
> Also we have found a recent issue with maker and intel compiled OpenMPI on
> CentOS systems. To get around that issue, compile OpenMPI with gcc instead
> of the intel compiler, or alternatively manually install a separate perl
> installation without pthread support (i.e. pthreads disabled during the
> configure step).
>
> --Carson
>
> On May 12, 2016, at 3:49 PM, Ray Cui wrote:
>
> Dear Yugui
>
> I had the same problem with openmpi. I think it is not compatible with
> Maker. I now use mpich, which works.
>
> Ray
> On May 12, 2016 11:32 PM, "Yugui Wang" wrote:
>
>> Hi.
>>
>> Segmentation fault of MAKER with openmpi on CentOS 7.2.
>> Both MAKER 2.31.8 and 3.00.0 beta have the same error.
>>
>> $ mpirun -mca btl ^openib -n 4 maker
>> STATUS: Parsing control files...
>> STATUS: Processing and indexing input FASTA files...
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 2 with PID 39507 on node T620 exited
>> on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>> $ file core.39505
>> core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
>> SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make
>> $ gdb /usr/bin/perl core.39505
>> (gdb) where
>> #0 0x00007f0e4a7d2060 in ?? ()
>> #1
>> #2 0x00007f0e4a7d2060 in ?? ()
>> #3
>> #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from
>> /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so
>> #5 0x00007f0e63ec8eda in opal_progress () from
>> /usr/lib64/openmpi/lib/libopen-pal.so.13
>> #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from
>> /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so
>> #7 0x00007f0e65b0dc06 in PMPI_Probe () from
>> /usr/lib64/openmpi/lib/libmpi.so
>> #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf@entry=0x4146b30,
>> source=source@entry=-1, tag=tag@entry=1111) at MPI.xs:56
>> #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv
>> (my_perl=, cv=) at MPI.c:391
>> #10 0x00007f0e657ce39f in Perl_pp_entersub () from
>> /usr/lib64/perl5/CORE/libperl.so
>> #11 0x00007f0e657c6b16 in Perl_runops_standard () from
>> /usr/lib64/perl5/CORE/libperl.so
>> #12 0x00007f0e65763925 in perl_run () from
>> /usr/lib64/perl5/CORE/libperl.so
>> #13 0x0000000000400d99 in main ()
>> $ echo $LD_PRELOAD
>> /usr/lib64/openmpi/lib/libmpi.so:
>> $ echo $OMPI_MCA_mpi_warn_on_fork
>> 0
>> $ rpm -qa openmpi
>> openmpi-1.10.0-10.el7.x86_64
>> $ uname -a
>> Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>> $ ulimit -a
>> core file size (blocks, -c) unlimited
>> data seg size (kbytes, -d) unlimited
>> scheduling priority (-e) 0
>> file size (blocks, -f) unlimited
>> pending signals (-i) 1029973
>> max locked memory (kbytes, -l) 64
>> max memory size (kbytes, -m) unlimited
>> open files (-n) 1024
>> pipe size (512 bytes, -p) 8
>> POSIX message queues (bytes, -q) 819200
>> real-time priority (-r) 0
>> stack size (kbytes, -s) 102400
>> cpu time (seconds, -t) unlimited
>> max user processes (-u) 4096
>> virtual memory (kbytes, -v) unlimited
>> file locks (-x) unlimited
>> $ mpiexec --version
>> mpiexec (OpenRTE) 1.10.0
>>
>> Report bugs to http://www.open-mpi.org/community/help/
>> $
>>
>> _______________________________________________
>> maker-devel mailing list
>>
maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carsonhh at gmail.com Fri May 13 11:16:49 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 11:16:49 -0600
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

It's possible it was set wrong, as there may be more than one libmpi.so on the system. It also has to be set before compiling and every time you run. The next issue is that some systems (like Ubuntu) will often have extra mpicc, libmpi.so, and mpiexec files that don't match the OpenMPI you are trying to use. Tracking down those mismatches before compiling and ensuring that they don't revert with your bashrc/bash_profile can be complicated. In these cases you may also have to specify LD_PRELOAD with the -x parameter of the OpenMPI mpiexec command. You often have to specify the "-mca btl" parameter explained in the INSTALL file as well.

--Carson

> On May 13, 2016, at 11:08 AM, Ray Cui wrote:
>
> Hello,
>
> I had segfaults even if I set LD_PRELOAD and used gcc for OpenMPI (dealing with Maker 3 beta though).
> It works fine with MpiCH so I stopped looking into this.
>
> Ray

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bmoore at genetics.utah.edu Sun May 15 18:57:37 2016
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 16 May 2016 00:57:37 +0000
Subject: [maker-devel] Fwd: Issue with make, no prediction after gff3_merge
References:
Message-ID: <2B3935BF-1995-4250-8694-92FA0C36A729@genetics.utah.edu>

Hi David,

First of all, apologies for the delay in addressing your e-mail. Our mailing list provider (an external ISP) has stopped supporting the MailMan software that runs the maker-devel list, and the software has been unresponsive to our attempts to add new users or moderate messages. We will handle this message directly through e-mail for now. We have requested a new maker mailing list through our University IT department, and that request is pending approval. The new mailing list should get our user support back to normal very soon.

Can you share a few lines of the GFF files that you passed to est_gff?

Thanks,
Barry

Begin forwarded message:

From: "LOPEZ, David"
Subject: Issue with make, no prediction after gff3_merge
Date: May 3, 2016 at 1:27:03 AM PDT
To: "maker-devel-owner at yandell-lab.org"

Dear all,

I am still waiting for my registration on the maker-devel list, so I am sending my question by e-mail, but I will transfer it to the discussion group when possible.

I am a commercially licensed user of MAKER and I currently face some issues running MAKER 3.00.0 on a PBS cluster with an OpenMPI 1.10.2 implementation (which runs great most of the time, but that is not the issue discussed here).
After successfully testing the datasets provided in the package (dpp and pyu), I moved to my own assembly (140 000 scaffolds, ~14 GB, eukaryotic, premasked). I have already made some RNA-seq mappings (GFF) as well as cDNA and proteome from a reference genome.

Looking at the predictions in IGV, it appears to me that only the FASTA evidence is used and not the GFF. In the GFF from gff3_merge, I have blastx, protein2genome and maker predictions as well as est_gff:stringtie, but no est_gff:somethingelse from my cDNA and EST fed to MAKER.

Another issue, potentially linked to this problem, is that I wasn't able to use tags in my GFF evidence: MAKER fails to run, reporting that mygff3evidencefile.gff:mylabel was not found, which means it doesn't interpret the colon correctly.

I have attached my maker opts files.

Thanks in advance for your help.

Best regards,
David.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_evm.ctl
Type: application/octet-stream
Size: 911 bytes
Desc: maker_evm.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1601 bytes
Desc: maker_exe.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5236 bytes
Desc: maker_opts.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl Type: application/octet-stream Size: 1508 bytes Desc: maker_bopts.ctl URL: From clements at galaxyproject.org Mon May 16 11:20:35 2016 From: clements at galaxyproject.org (Dave Clements) Date: Mon, 16 May 2016 10:20:35 -0700 Subject: [maker-devel] GMOD 2016 Meeting early reg ends May 21; Galaxy Conference Deadlines Message-ID: Hello all, *GMOD will be holding a community meeting on June 30th and July 1st in Bloomington, Indiana, United States.* GMOD Meetings are a mix of user and developer presentations, and are a great place to find out what is happening in the project, what's coming up, and what others are doing. *Early bird registration ends May 21, this Saturday.* *For those who would like to present a talk or poster, the meeting registration form includes a section for submitting the presentation title and abstract.* If you have any suggestions or requests for the meeting, please contact the GMOD help desk . *GCC2016* The GMOD Meeting is immediately after the 2016 Galaxy Community Conference (GCC2016) , also in Bloomington (and sharing housing and venue). If you are interested in Galaxy, *GCC2016 has a number of deadlines this Friday, May 20*. See below. Galaxy is a part of the GMOD project and there are several presentations at GCC2016 that cover the GMOD integration: - Moving data from the warehouse to the workbench: a bridge to Galaxy from the Tripal community genome database software platform, talk presented by Margaret Staton - Apollo: Collaborative Manual Annotation for Genomic Sequencing Projects , talk presented by Nathan Dunn (Apollo will have a poster and demo) - Hardwood Genomics Database (HGD): a web portal and database resource for hardwood tree genomic and genetic research, poster presented by Ming Chen and Margaret Staton (posters are not online yet) More posters and demos are in the works. 
Thanks, and hope to see you in Bloomington, Dave C ---------- Forwarded message ---------- From: Dave Clements Date: Mon, May 16, 2016 at 9:09 AM Subject: GCC2016 Deadlines this Friday & Conference schedule To: Galaxy Announcements List , Galaxy Dev List Hello all, This is just a reminder that* there are some key deadlines this Friday, May 20:* - Early registration ends . After Friday registration rates go up by over 40%. - Poster abstracts are due. - Demo abstracts are due. These are new this year and can complement a poster abstract or stand on their own. If you are wondering what's happening at GCC2016, the training and conference schedules are now online, featuring 21 accepted talks and 31 training sessions . And, thanks to Jetstream IU's newest National Science Foundation-funded project (and in which Galaxy is a partner), and the National Center for Genome Analysis Support at IU are sponsoring an opening reception on Monday evening at the IU Cyberinfrastructure Building. The first ever GCC opening reception will feature local wine/beer, morsels from local eateries, and demonstrations of the 15 million+ pixel IQ-Wall, IU's Data Center, Science on a Sphere, and other IU-centric IT. Hope to see you there, Dave C -- http://galaxyproject.org/ http://getgalaxy.org/ http://usegalaxy.org/ https://wiki.galaxyproject.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From maker-devel at yandell-lab.org Mon May 16 19:30:23 2016 From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org) Date: Tue, 17 May 2016 08:30:23 +0700 Subject: [maker-devel] Your .pdf document is attached Message-ID: <201605171983886.0A606AA3@m6888933.yandell-lab.org> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 21010F.zip
Type: application/x-zip-compressed
Size: 2960 bytes
Desc: not available
URL:

From carson.holt at genetics.utah.edu Wed May 18 12:33:39 2016
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 18 May 2016 18:33:39 +0000
Subject: [maker-devel] [maker] transcripts doesn't provide any help
In-Reply-To:
References: <4B761165-F33F-4BCA-8DE0-B2AF0A6AE771@gmail.com> <3D78FFC2-B0FE-4DCC-A079-0B99CFB6C735@gmail.com> <44AADB65-67F1-4D25-A7C6-C7EE93B9E80B@gmail.com> <43234558-4A22-4B90-A209-CA7FEEF230CF@gmail.com> <9B9328D2-8F16-47C1-8873-F7821637E7FB@gmail.com> <4ACF958F-C6DF-469A-81CE-BF5854D7B8A2@gmail.com>
Message-ID: <573DB6D8-E773-45F5-9F53-DCAA20913EFF@genetics.utah.edu>

Yes. Use top to check CPU usage. If it's not 100% for the machine (or 6400% summed over all processes: 64 CPUs * 100%), then we can look at whether you are launching the command correctly or have other issues.

--Carson

On May 18, 2016, at 7:16 AM, Michael Campbell wrote:

Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don't think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get better training than you would with CEGMA output alone.

I've cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I'm also cc'ing the dev list. There may be others in the community who can comment on the run times.

Thanks,
Mike

On May 17, 2016, at 10:10 PM, Pei-Ying Huang wrote:

Hi Mike,

My plant genome is about 4 Gb, 93789 scaffolds. When I run MAKER using MPI on a server with 64 cores, only 1% of the genome is annotated. Is this normal?
I read a post saying that it takes about 6 days on 16 processors to finish one round on a ~150,000 scaffold, ~2 Gb vertebrate genome with protein evidence. Based on that post, I expected to get the result in no more than two weeks. However, it seems it will take me more than three months.

Also, I want to get a training parameter set from Augustus. Right now I use CEGMA to produce a .gff file, then convert it to augustus.gff with cegma2gff. Then I autotrain with Augustus; here is my command:

    autoAugTrain.pl --genome=GULI.genome.removeAllN.fa --trainingset=augustus.gff --species=A_autoAugTrain_1 &> log

But I saw someone's method below, so I wonder if I am doing it wrong?

"We get the genome.gff3 training set from the output of a first-pass run of MAKER using:
1. EST data
2. Proteins from related species
3. a SNAP model trained using CEGMA
4. a GeneMark model (obtained by running GeneMark.ES on the draft genome)
5. Running maker2zff on the output of MAKER, and converting that to GFF3
Once done, we run MAKER a second time using the Augustus model and more stringent settings."

Thank you.
Pei-Ying

2016-05-18 9:16 GMT+08:00 Michael Campbell:

Hi Pei-Ying,

One of the first places to start with RNA-seq quality control is a tool called FastQC; it will produce a number of graphics that can help identify problematic files. There are a number of tools for quality trimming reads; Trimmomatic and the FASTX-Toolkit are popular ones.

I would only redo the sequencing if you are convinced that the original sequencing is bad.

Mike

On May 16, 2016, at 8:42 PM, Pei-Ying Huang wrote:

Hi Mike,

As you said, the reason I only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.

If the problem is due to RNA-seq data quality, how could I identify the RNA-seq data with bad quality and trim them out?
If the problem is due to the expression profiles of the tissues used for mRNA-seq, should we try to extract RNA from the plant again and redo the sequencing?
Thank you.

Pei-Ying

2016-05-09 22:18 GMT+08:00 Michael Campbell:

I did finish running the test I planned. What I noticed is that there is protein evidence for about 1,000 genes on that scaffold and transcript evidence for only one gene. The reason you only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.

What you described is what I would do, followed by training Augustus, unless est2genome=1 and protein2genome=0 doesn't generate enough gene models to train the gene finders. In that case I would set est2genome=1 and protein2genome=1 for the first round instead.

Thanks,
Mike

On May 8, 2016, at 10:08 AM, Pei-Ying Huang wrote:

Have you done all of the tests?
How would you suggest I run my data?

To get the ab initio model by setting est2genome = 1 and protein2genome = 0,
then training a SNAP model with est2genome = 0 and protein2genome = 0,
then training a second SNAP model with est2genome = 0 and protein2genome = 0.

Thank you.

2016-05-07 0:30 GMT+08:00 Michael Campbell:

So far in the tests that I've done I get the same first exon as 5 prime UTR and part of the last exon in 3 prime UTR for that gene.

Mike

On May 5, 2016, at 10:18 PM, Pei-Ying Huang wrote:

Hi Mike,

I found one five_prime_UTR feature, but it is the only one shown on scaff0001.
Does it mean there are no more five_prime_UTR features on this scaffold, or that MAKER doesn't find others?
Thank you.

GULI.scaff0001 maker gene 3190189 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426
GULI.scaff0001 maker mRNA 3190189 3192302 1262 - .
ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;_AED=0.27;_eAED=0.27;_QI=335|0.83|0.71|1|0|0|7|0|308 GULI.scaff0001 maker exon 3190189 3190216 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:6;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3190331 3190656 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:5;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3190818 3190955 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:4;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3191233 3191510 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:3;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3191634 3191666 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:2;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3191755 3191848 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker exon 3191938 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:0;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker five_prime_UTR 3191968 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:five_prime_utr;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3191938 3191967 . 
- 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3191755 3191848 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3191634 3191666 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3191233 3191510 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3190818 3190955 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3190331 3190656 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 GULI.scaff0001 maker CDS 3190189 3190216 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 Pei-Ying 2016-05-06 8:31 GMT+08:00 Pei-Ying Huang >: Hi Mike, Any clue about the problems? Or my thought is wrong. I judge the transcript data help or not in maker by checking if est2genome shown in the column 2 in maker output gff file. Thank you. Pei-Ying 2016-05-05 1:22 GMT+08:00 Pei-Ying Huang >: Hi Mike, Attached file is the folder I use to run maker. Thank you. ? [https://ssl.gstatic.com/docs/doclist/images/icon_10_generic_list.png] guliRN_L1_v1_mike.tar.gz[X] ? Pei-Ying 2016-05-04 22:54 GMT+08:00 Michael Campbell >: Hi Pei-Ying, If the sample data didn?t produce est2genome lines when using the sample data then it may be that exonerate is not being called. 
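The check discussed in this thread (does any line of the MAKER output carry est2genome in column 2?) can be done mechanically. What follows is only a minimal sketch, not part of MAKER: the function name count_sources is made up for illustration, and it assumes a tab-separated, nine-column GFF3 file with an optional trailing ##FASTA section.

```python
from collections import Counter


def count_sources(gff_lines):
    """Tally column 2 (the source) of every feature line in a GFF3 stream.

    Hypothetical helper for illustration. It assumes tab-separated,
    nine-column GFF3 and stops at an optional ##FASTA section.
    """
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 9:
            counts[columns[1]] += 1
    return counts
```

Called as count_sources(open("A_guli_1.all.gff")), a count of zero for est2genome next to non-zero counts for maker and protein2genome would match the symptom described here. A Counter returns 0 for sources that never occur, so no special casing is needed.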
Could you send me the maker_exe.ctl file? Your maker_opts.ctl file looks fine. If you have a small test set for your data, like a small scaffold that you know has some StringTie hits on it, you could send it to me and I can see if I can figure it out from here, if that would be helpful.

Thanks,
Mike

On May 4, 2016, at 12:33 AM, Pei-Ying Huang wrote:

Hi Mike,

basic_protocol_1.tar.gz: I ran the sample data following Basic Protocol 1 in the attached protocol paper, which uses the drosophila data bundled with MAKER. I still can't find est2genome in column 2 of the GFF file, and there is no five_prime_UTR or three_prime_UTR in column 3.

I use StringTie to align paired-end reads to the genome, then use cufflinks2gff3 to generate the .gff file for MAKER input. Since I have three conditions (root, stem, leaf), I got Root_strtie.gff, Stem_strtie.gff, R_strtie.gff as MAKER inputs. Should I merge Root_strtie.gff, Stem_strtie.gff, R_strtie.gff into strtie_merge.gff before giving them to MAKER?

When I try to use cufflinks2gff3 to convert strtie_merge.gtf to strtie_merge.gff, it shows the error message below.

/home/pyh/bin/maker/bin/cufflinks2gff3 strtie_merge.gtf > strtie_merge.gff
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221531.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221532.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221533.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221534.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221535.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221536.

[attachment: maker1.log]
[https://ssl.gstatic.com/docs/doclist/images/icon_10_generic_list.png] maker_opts.log[X] ? less A_guli_1.all.gff GULI.scaff0001 maker gene 1750118 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37 GULI.scaff0001 maker mRNA 1750118 1755997 5292 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;_AED=0.37;_eAED=0.37;_QI=0|0|0|1|0|0|7|0|1764 GULI.scaff0001 maker exon 1750118 1750214 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:21;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1750304 1750815 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:20;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1750896 1751717 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:19;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1751849 1752373 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:18;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1752515 1753488 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:17;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1753554 1754406 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:16;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker exon 1754489 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:15;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1754489 1755997 . 
- 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1753554 1754406 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1752515 1753488 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1751849 1752373 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1750896 1751717 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1750304 1750815 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 GULI.scaff0001 maker CDS 1750118 1750214 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 Thank you. Pei-Ying 2016-04-14 21:09 GMT+08:00 Michael Campbell >: It is strange for transcripts from the species of interest to not align or help. That FASTA entry looks okay. Did you save the error output from MAKER? if you did could you send it to me along with the MAKER control files? There may be some clues in there. It would also be good if you could run MAKER on the sample data from drosophila in the /data folder in MAKER. This way we can see if it is your data or your install of MAKER. Basic protocol 1 in the attached protocol paper uses the drosophila data bundled with MAKER. Aligning with hisat2 and using cufflinks to make transcripts should work. 
StringTie seems to have higher specificity than Cufflinks, and the cufflinks2gff3 script works on StringTie output as well. You could also do a de novo assembly of the reads yourself using Trinity, which has worked well for me in the past.

Protein evidence alone will give a reasonable annotation. The transcript data will help in annotating UTRs and species-specific genes. The attached protocol paper also addresses your quality question to an extent.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From michael.s.campbell1 at gmail.com Wed May 18 07:16:05 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 18 May 2016 09:16:05 -0400
Subject: [maker-devel] [maker] transcripts doesn't provide any help
In-Reply-To:
References: <4B761165-F33F-4BCA-8DE0-B2AF0A6AE771@gmail.com> <3D78FFC2-B0FE-4DCC-A079-0B99CFB6C735@gmail.com> <44AADB65-67F1-4D25-A7C6-C7EE93B9E80B@gmail.com> <43234558-4A22-4B90-A209-CA7FEEF230CF@gmail.com> <9B9328D2-8F16-47C1-8873-F7821637E7FB@gmail.com> <4ACF958F-C6DF-469A-81CE-BF5854D7B8A2@gmail.com>
Message-ID:

Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don't think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get better training than you would with CEGMA output alone.

I've cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I'm also cc'ing the dev list. There may be others in the community who can comment on the run times.
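The core-usage question above comes down to simple arithmetic: a fully loaded machine shows roughly 100% per core, so 64 cores correspond to about 6400% summed over all processes. The sketch below is an illustration only, not something shipped with MAKER. The function names are made up, and the input is assumed to be text in the two-column layout printed by `ps -eo pcpu,args`. Note that ps reports CPU use averaged over each process's lifetime, so the result is a rough guide next to a live view in top.

```python
def total_cpu_percent(ps_output):
    """Sum the %CPU column of text laid out like `ps -eo pcpu,args` output.

    Hypothetical helper for illustration. Lines whose first field is not
    a number (such as the header) are skipped.
    """
    total = 0.0
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            total += float(fields[0])
        except ValueError:
            continue  # header line such as "%CPU COMMAND"
    return total


def utilization(ps_output, n_cores):
    """Fraction of the machine in use; 1.0 means n_cores * 100%."""
    return total_cpu_percent(ps_output) / (n_cores * 100.0)
```

A value near 1.0 on the 64-core server would mean all cores are busy. A value near 1/64 would suggest that only a single process is doing work and that the MPI launch deserves a closer look.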
Thanks,
Mike

> On May 17, 2016, at 10:10 PM, Pei-Ying Huang wrote:
>
> Hi Mike,
>
> My plant genome is about 4 Gb, 93789 scaffolds. When I run MAKER using MPI on a server with 64 cores, only 1% of the genome is annotated.
> Is this normal? I read a post saying that it takes about 6 days on 16 processors to finish one round on a ~150,000 scaffold, ~2 Gb vertebrate genome with protein evidence.
> Based on that post, I expected to get the result in no more than two weeks. However, it seems it will take me more than three months.
>
> Also, I want to get a training parameter set from Augustus. Right now I use CEGMA to produce a .gff file, then convert it to augustus.gff with cegma2gff.
> Then I autotrain with Augustus; here is my command:
> autoAugTrain.pl --genome=GULI.genome.removeAllN.fa --trainingset=augustus.gff --species=A_autoAugTrain_1 &> log
>
>
> But I saw someone's method below, so I wonder if I am doing it wrong?
>
> "We get the genome.gff3 training set from the output of a first-pass run of MAKER using:
> 1. EST data
> 2. Proteins from related species
> 3. a SNAP model trained using CEGMA
> 4. a GeneMark model (obtained by running GeneMark.ES on the draft genome)
> 5. Running maker2zff on the output of MAKER, and converting that to GFF3
> Once done, we run MAKER a second time using the Augustus model and more stringent settings."
>
> Thank you.
> Pei-Ying
>
>
>
>
>
> 2016-05-18 9:16 GMT+08:00 Michael Campbell:
> Hi Pei-Ying,
>
> One of the first places to start with RNA-seq quality control is a tool called FastQC; it will produce a number of graphics that can help identify problematic files. There are a number of tools for quality trimming reads; Trimmomatic and the FASTX-Toolkit are popular ones.
>
> I would only redo the sequencing if you are convinced that the original sequencing is bad.
>
> Mike
>
>> On May 16, 2016, at 8:42 PM, Pei-Ying Huang wrote:
>>
>> Hi Mike,
>>
>> As you said, the reason I only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.
>>
>> If the problem is due to RNA-seq data quality, how could I identify the RNA-seq data with bad quality and trim them out?
>> If the problem is due to the expression profiles of the tissues used for mRNA-seq, should we try to extract RNA from the plant again and redo the sequencing?
>> Thank you.
>>
>> Pei-Ying
>>
>> 2016-05-09 22:18 GMT+08:00 Michael Campbell:
>> I did finish running the test I planned. What I noticed is that there is protein evidence for about 1,000 genes on that scaffold and transcript evidence for only one gene. The reason you only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.
>>
>> What you described is what I would do, followed by training Augustus, unless est2genome=1 and protein2genome=0 doesn't generate enough gene models to train the gene finders. In that case I would set est2genome=1 and protein2genome=1 for the first round instead.
>>
>> Thanks,
>> Mike
>>> On May 8, 2016, at 10:08 AM, Pei-Ying Huang wrote:
>>>
>>> Have you done all of the tests?
>>> How would you suggest I run my data?
>>>
>>> My plan is to get initial gene models by setting est2genome=1 and protein2genome=0,
>>> then train a SNAP model and rerun with est2genome=0 and protein2genome=0,
>>> then train a second SNAP model, again with est2genome=0 and protein2genome=0.
>>>
>>> Thank you.
>>>
>>> 2016-05-07 0:30 GMT+08:00 Michael Campbell:
>>> So far, in the tests that I've done, I get the same first exon as 5' UTR and part of the last exon as 3' UTR for that gene.
>>> Mike >>>> On May 5, 2016, at 10:18 PM, Pei-Ying Huang > wrote: >>>> >>>> Hi Mike, >>>> >>>> I found one five_prime_UTP evidence, but only this one shown in the scaff0001. >>>> Does it mean no more five_prime_UTP on this scaffold or maker doesn't find others? >>>> Thank you. >>>> >>>> GULI.scaff0001 maker gene 3190189 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426 >>>> GULI.scaff0001 maker mRNA 3190189 3192302 1262 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;_AED=0.27;_eAED=0.27;_QI=335|0.83|0.71|1|0|0|7|0|308 >>>> GULI.scaff0001 maker exon 3190189 3190216 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:6;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker exon 3190331 3190656 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:5;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker exon 3190818 3190955 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:4;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker exon 3191233 3191510 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:3;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker exon 3191634 3191666 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:2;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker exon 3191755 3191848 . - . 
ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker exon 3191938 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:0;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker five_prime_UTR 3191968 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:five_prime_utr;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker CDS 3191938 3191967 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker CDS 3191755 3191848 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker CDS 3191634 3191666 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker CDS 3191233 3191510 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker CDS 3190818 3190955 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker CDS 3190331 3190656 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1 >>>> GULI.scaff0001 maker CDS 3190189 3190216 . 
- 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>>
>>>> Pei-Ying
>>>>
>>>> 2016-05-06 8:31 GMT+08:00 Pei-Ying Huang:
>>>> Hi Mike,
>>>>
>>>> Any clue about the problem?
>>>> Or is my reasoning wrong? I judge whether the transcript data helps in MAKER by checking whether est2genome is shown in column 2 of the MAKER output GFF file.
>>>> Thank you.
>>>>
>>>> Pei-Ying
>>>>
>>>> 2016-05-05 1:22 GMT+08:00 Pei-Ying Huang:
>>>> Hi Mike,
>>>>
>>>> The attached file is the folder I use to run MAKER. Thank you.
>>>>
>>>> guliRN_L1_v1_mike.tar.gz
>>>>
>>>> Pei-Ying
>>>>
>>>> 2016-05-04 22:54 GMT+08:00 Michael Campbell:
>>>> Hi Pei-Ying,
>>>>
>>>> If the sample data didn't produce est2genome lines, then it may be that exonerate is not being called. Could you send me the maker_exe.ctl file?
>>>>
>>>> Your maker_opts.ctl file looks fine.
>>>>
>>>> If you have a small test set for your data, like a small scaffold that you know has some StringTie hits on it, you could send it to me and I can see if I can figure it out from here, if that would be helpful.
>>>>
>>>> Thanks,
>>>> Mike
>>>>> On May 4, 2016, at 12:33 AM, Pei-Ying Huang wrote:
>>>>>
>>>>> Hi Mike,
>>>>>
>>>>> basic_protocol_1.tar.gz: I ran the sample data following Basic Protocol 1 in the attached protocol paper, which uses the Drosophila data bundled with MAKER.
>>>>>
>>>>> I still can't find est2genome in column 2 of the GFF file, and there is no five_prime_UTR or three_prime_UTR in column 3.
>>>>> I use StringTie to align paired-end reads to the genome, then use cufflinks2gff3 to generate the .gff file for MAKER input.
>>>>> Since I have three conditions (root, stem, leaf), I got Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff as MAKER inputs.
>>>>>
>>>>> Should I merge Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff into strtie_merge.gff before giving them to MAKER?
>>>>> When I try to use cufflinks to convert strtie_merge.gtf to strtie_merge.gff, shows the error message below. >>>>> >>>>> /home/pyh/bin/maker/bin/cufflinks2gff3 strtie_merge.gtf > strtie_merge.gff >>>>> >>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221531. >>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221532. >>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221533. >>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221534. >>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221535. >>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221536. >>>>> ? >>>>> ?maker1.log ?? >>>>> ?maker_opts.log ? >>>>> less A_guli_1.all.gff >>>>> GULI.scaff0001 maker gene 1750118 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37 >>>>> GULI.scaff0001 maker mRNA 1750118 1755997 5292 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;_AED=0.37;_eAED=0.37;_QI=0|0|0|1|0|0|7|0|1764 >>>>> GULI.scaff0001 maker exon 1750118 1750214 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:21;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker exon 1750304 1750815 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:20;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker exon 1750896 1751717 . - . 
ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:19;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker exon 1751849 1752373 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:18;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker exon 1752515 1753488 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:17;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker exon 1753554 1754406 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:16;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker exon 1754489 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:15;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker CDS 1754489 1755997 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker CDS 1753554 1754406 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker CDS 1752515 1753488 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker CDS 1751849 1752373 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker CDS 1750896 1751717 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker CDS 1750304 1750815 . 
- 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> GULI.scaff0001 maker CDS 1750118 1750214 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1 >>>>> >>>>> Thank you. >>>>> Pei-Ying >>>>> >>>>> >>>>> >>>>> >>>>> 2016-04-14 21:09 GMT+08:00 Michael Campbell >: >>>>> It is strange for transcripts from the species of interest to not align or help. That FASTA entry looks okay. Did you save the error output from MAKER? if you did could you send it to me along with the MAKER control files? There may be some clues in there. >>>>> >>>>> It would also be good if you could run MAKER on the sample data from drosophila in the /data folder in MAKER. This way we can see if it is your data or your install of MAKER. Basic protocol 1 in the attached protocol paper uses the drosophila data bundled with MAKER. >>>>> >>>>> Aligning with hisat2 and using cufflinks to make transcripts should work. Stringtie seems to have higher specificity than cufflinks and the cufflinks2gff script works on stringtie output as well. You could also do a denovo assembly of the reads yourself using trinity, which has worked well for me in the past. >>>>> >>>>> Protein evidence only will give a reasonable annotation. The transcript data will help in annotating UTRs and species specific genes. >>>>> >>>>> The attached protocol paper also addresses your quality question to an extent. >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 24 11:08:52 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 24 May 2016 11:08:52 -0600 Subject: [maker-devel] Single exon in GFF file. 
In-Reply-To:
References:
Message-ID: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>

single_exon=0 does not mean that single-exon genes will not be called. It means that single-exon ESTs are not used as evidence support (the issues with single-exon ESTs are well known, so it is best to exclude them). You will still get single-exon genes from the predictors and single-exon protein alignments from your protein evidence. Every genome is expected to contain a number of single-exon genes (the most conserved genes across species in fact tend to be single exon; there is evolutionary selection that favors single-exon structure in essential genes).

What you will want to do is look at your contigs in a browser. Depending on the structure of the genes you see and the genes around them, you may conclude that you have insufficient repeat masking (which results in repeats being called as genes). Or you may realize that the contigs in question are prokaryotic (i.e. assembly contamination), which must be resolved upstream of MAKER. Or they are real genes. Remember, every genome is expected to contain single-exon genes.

--Carson

> On May 24, 2016, at 10:58 AM, Won C Yim wrote:
>
> Dear MAKER team,
>
> We have been using MAKER to generate our plant genome annotations.
>
> Even though I set "single_exon=0", there are a lot of single-exon genes based on Eval 2.2.8.
>
> Is there any way to discard single-exon genes?
>
> Regards,
>
> Won
>
> --
> Yim, Won Cheol
> MS330/Department of Biochemistry & Molecular Biology
> 1664 N. Virginia Street
> University of Nevada, Reno
>
> email: wyim at unr.edu

From munholl at uwindsor.ca Tue May 24 11:11:55 2016
From: munholl at uwindsor.ca (Seth Munholland)
Date: Tue, 24 May 2016 13:11:55 -0400
Subject: [maker-devel] MAKER seg faulting
In-Reply-To:
References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com>
Message-ID:

Hi Carson,

Just an update, that was indeed my issue.
Thanks for your help! Seth Munholland, B.Sc. Department of Biological Sciences Rm. 304 Biology Building University of Windsor 401 Sunset Ave. N9B 3P4 T: (519) 253-3000 Ext: 4755 On Wed, May 11, 2016 at 11:35 AM, Seth Munholland wrote: > Hi Carson, > > I am not using an MPI. Given the association to tblastx I suspect my c++ > install of BLAST is what's seg faulting. Thanks! > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 > > On Tue, May 10, 2016 at 7:02 PM, Carson Holt wrote: > >> So MAKER is written in Perl, and Perl can?t really seg fault (it doesn?t >> give developers that kind of low level access to memory). However if you >> are using MPI, then it could be causing a seg fault, or one of the programs >> MAKER is calling could be seg faulting (like BLAST). >> >> So if you are using MPI, let me know which flavor and I can make >> suggestions (for example MVAPICH2 is incompatible with programs that do >> system calls, and OpenMPI may require special setting for LD_PRELOAD to >> work properly with shared libraries). If your not using MPI, then you will >> need to look at the installed programs MAKER is calling and reinstall them, >> update them, or roll back a version (i.e. BLAST, Exonerate, etc.) >> >> ?Carson >> >> >> >> On May 10, 2016, at 12:18 PM, Seth Munholland >> wrote: >> >> Hello Everyone, >> >> For reasons unknown my MAKER (2.31.8 on Ubuntu 14.04) runs keep seg >> faulting. I've changed the the dataset I'm running MAKER on, by parsing >> out smaller sections of the larger assembly, and I still seg fault on >> sections that the larger assembly moved past without issue. >> >> The only commonality I see is every tme it seg faults it appears to have >> jsut finished a tblastx. Any suggestions for how I can debug and correct >> this issue? >> >> Seth Munholland, B.Sc. >> Department of Biological Sciences >> Rm. 
304 Biology Building >> University of Windsor >> 401 Sunset Ave. N9B 3P4 >> T: (519) 253-3000 Ext: 4755 >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 24 11:12:42 2016 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 24 May 2016 11:12:42 -0600 Subject: [maker-devel] MAKER seg faulting In-Reply-To: References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com> Message-ID: Great to know it?s working for you now. ?Carson > On May 24, 2016, at 11:11 AM, Seth Munholland wrote: > > Hi Carson, > > Just an update, that was indeed my issue. Thanks for your help! > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 <> > On Wed, May 11, 2016 at 11:35 AM, Seth Munholland > wrote: > Hi Carson, > > I am not using an MPI. Given the association to tblastx I suspect my c++ install of BLAST is what's seg faulting. Thanks! > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 <> > On Tue, May 10, 2016 at 7:02 PM, Carson Holt > wrote: > So MAKER is written in Perl, and Perl can?t really seg fault (it doesn?t give developers that kind of low level access to memory). However if you are using MPI, then it could be causing a seg fault, or one of the programs MAKER is calling could be seg faulting (like BLAST). > > So if you are using MPI, let me know which flavor and I can make suggestions (for example MVAPICH2 is incompatible with programs that do system calls, and OpenMPI may require special setting for LD_PRELOAD to work properly with shared libraries). 
If you're not using MPI, then you will need to look at the installed programs MAKER is calling and reinstall them, update them, or roll back a version (e.g. BLAST, Exonerate).
>
> --Carson
>
>> On May 10, 2016, at 12:18 PM, Seth Munholland wrote:
>>
>> Hello Everyone,
>>
>> For reasons unknown my MAKER (2.31.8 on Ubuntu 14.04) runs keep seg faulting. I've changed the dataset I'm running MAKER on by parsing out smaller sections of the larger assembly, and I still seg fault on sections that the larger assembly moved past without issue.
>>
>> The only commonality I see is that every time it seg faults it appears to have just finished a tblastx. Any suggestions for how I can debug and correct this issue?
>>
>> Seth Munholland, B.Sc.
>> Department of Biological Sciences
>> Rm. 304 Biology Building
>> University of Windsor
>> 401 Sunset Ave. N9B 3P4
>> T: (519) 253-3000 Ext: 4755
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Tue May 24 11:14:56 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 May 2016 11:14:56 -0600
Subject: [maker-devel] Single exon in GFF file.
In-Reply-To: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>
References: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>
Message-ID: <5DEE67B4-F022-479E-A5C5-97F76FD6601D@gmail.com>

As a side note, many of the newer plant genomes I've worked on have had entire yeast and bacterial genomes sequenced into their assemblies (even as their own separate contigs). It is a common issue that is easily identified by just looking at a few of the more gene-dense contigs in a browser like Apollo.

--Carson

> On May 24, 2016, at 11:08 AM, Carson Holt wrote:
>
> single_exon=0 does not mean that single-exon genes will not be called.
It means that single-exon ESTs are not used as evidence support (the issues with single-exon ESTs are well known, so it is best to exclude them). You will still get single-exon genes from the predictors and single-exon protein alignments from your protein evidence. Every genome is expected to contain a number of single-exon genes (the most conserved genes across species in fact tend to be single exon; there is evolutionary selection that favors single-exon structure in essential genes).
>
> What you will want to do is look at your contigs in a browser. Depending on the structure of the genes you see and the genes around them, you may conclude that you have insufficient repeat masking (which results in repeats being called as genes). Or you may realize that the contigs in question are prokaryotic (i.e. assembly contamination), which must be resolved upstream of MAKER. Or they are real genes. Remember, every genome is expected to contain single-exon genes.
>
> --Carson
>
>> On May 24, 2016, at 10:58 AM, Won C Yim wrote:
>>
>> Dear MAKER team,
>>
>> We have been using MAKER to generate our plant genome annotations.
>>
>> Even though I set "single_exon=0", there are a lot of single-exon genes based on Eval 2.2.8.
>>
>> Is there any way to discard single-exon genes?
>>
>> Regards,
>>
>> Won
>>
>> --
>> Yim, Won Cheol
>> MS330/Department of Biochemistry & Molecular Biology
>> 1664 N. Virginia Street
>> University of Nevada, Reno
>>
>> email: wyim at unr.edu

From wyim at unr.edu Tue May 24 10:58:41 2016
From: wyim at unr.edu (Won C Yim)
Date: Tue, 24 May 2016 16:58:41 +0000
Subject: [maker-devel] Single exon in GFF file.
Message-ID:

Dear MAKER team,

We have been using MAKER to generate our plant genome annotations.

Even though I set "single_exon=0", there are a lot of single-exon genes based on Eval 2.2.8.

Is there any way to discard single-exon genes?
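For the single-exon question in this thread, listing the single-exon transcripts so they can be inspected in a browser is probably more useful than discarding them outright. The following is a minimal sketch, not part of MAKER: it assumes a standard nine-column GFF3 file in which exon features carry a `Parent` attribute, and the helper name `single_exon_transcripts` and the sample rows are hypothetical.

```python
from collections import defaultdict


def single_exon_transcripts(gff_lines):
    """Return IDs of transcripts that have exactly one exon feature."""
    exon_counts = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # also skips any embedded FASTA sequence lines
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exon_counts[parent] += 1
    return sorted(t for t, n in exon_counts.items() if n == 1)


# Hypothetical sample: t1 has two exons, t2 has one.
sample = [
    "ctg1\tmaker\texon\t100\t200\t.\t+\t.\tID=t1:exon:0;Parent=t1\n",
    "ctg1\tmaker\texon\t300\t400\t.\t+\t.\tID=t1:exon:1;Parent=t1\n",
    "ctg1\tmaker\texon\t900\t1500\t.\t-\t.\tID=t2:exon:0;Parent=t2\n",
]
single = single_exon_transcripts(sample)
print(single)
```

The resulting IDs can be grouped by contig to spot contigs that consist almost entirely of single-exon models, which is where insufficient repeat masking or contamination would be worth checking first.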
Regards,

Won

--
Yim, Won Cheol
MS330/Department of Biochemistry & Molecular Biology
1664 N. Virginia Street
University of Nevada, Reno

email: wyim at unr.edu

From debarryj at gmail.com Thu May 26 13:54:37 2016
From: debarryj at gmail.com (Jeremy DeBarry)
Date: Thu, 26 May 2016 12:54:37 -0700
Subject: [maker-devel] MAKER (v2.31.6) incorrect strand for tRNA-scan (v.1.3.1) predicted exon
Message-ID:

Greetings,

My group has run MAKER on a small genome. One of the annotated tRNAs has an intron. The two exons are annotated on different strands. The gene and first exon are on the + strand and the second exon is on the - strand.

I looked over the archives and found previous reports, but it appears they apply to earlier versions of MAKER.

My instinct is to 'manually' correct the strand information for the - strand exon, but I wanted to investigate the issue further first.

Do you have any insight?

Much appreciated,
Jeremy

--
Dr. Jeremy DeBarry PhD
MaHPIC Data Coordinator
Kissinger Research Group
The University of Georgia
:::
Email: debarryj at gmail.com
Tel: +1.912.269.0484
Skype ID: jdebarry
:::
Nihil Sine Labore!:::Nec Aspera Terrent!:::Boutez-en-Avant!

From carsonhh at gmail.com Fri May 27 08:15:47 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 May 2016 08:15:47 -0600
Subject: [maker-devel] MAKER (v2.31.6) incorrect strand for tRNA-scan (v.1.3.1) predicted exon
In-Reply-To:
References:
Message-ID:

Make sure you're using the current version of MAKER. That same thread mentions it was fixed once they updated from 2.31.3. The current version is 2.31.8.

--Carson

> On May 26, 2016, at 1:54 PM, Jeremy DeBarry wrote:
>
> Greetings,
> My group has run MAKER on a small genome. One of the annotated tRNAs has an intron. The two exons are annotated on different strands.
The gene and first exon are on the + strand and the second exon is on the - strand. > > I looked over the archives and found previous reports , but it appears they apply to earlier versions of MAKER. > > My instinct is to 'manually' correct the strand information for the - strand exon, but I wanted to investigate the issue further first. > > Do you have any insight? > > Much appreciated, > Jeremy > > -- > Dr. Jeremy DeBarry PhD > MaHPIC Data Coordinator > Kissinger Research Group > > The University of Georgia > ::: > Email: debarryj at gmail.com > Tel: +1.912.269.0484 > Skype ID: jdebarry > ::: > Nihil Sine Labore!:::Nec Aspera Terrent!:::Boutez-en-Avant! > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From philipp.bayer at uwa.edu.au Tue May 31 00:37:46 2016 From: philipp.bayer at uwa.edu.au (Philipp Bayer) Date: Tue, 31 May 2016 14:37:46 +0800 Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations Message-ID: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au> Hello, I have a minor question about the way MAKER joins annotations from different chunks when using MPI. Let's say I have a longer gene that bridges two chunks, so the jobs annotating both chunks separately would return two incomplete genes, one without a stop codon, one without a start codon. I assume MAKER would then join those two into a single gene, right? Is this behaviour influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl? 
Thank you

Philipp Bayer

From carsonhh at gmail.com Tue May 31 09:51:34 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 31 May 2016 09:51:34 -0600
Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations
In-Reply-To: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>
References: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>
Message-ID: <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com>

Annotations never actually cross chunk boundaries, because the boundaries are not fixed. It's much more complicated than that, but basically we know from the alignment scoring model the maximum distance at which an HSP can occur and still be included in the alignment. This means that I know precisely whether there is a chance that an alignment may include another part when it occurs near the edge of a blasted sequence. When there is a chance, the sequence gets extended and everything is realigned (de novo) using the extended sequence, which can include an entire neighboring chunk. This is a very fast operation, since only the known hits are aligned rather than the whole database. So think of it more like a dynamic window than a fixed boundary. Results are then sorted and serialized to disk. Also, the initial BLAST is done with very permissive parameters and overlapping sequence boundaries, so extremely low-scoring partial alignments are enough to trigger an extension and realignment (we know beforehand the minimum sequence length needed to generate a given alignment score and can extrapolate the maximum theoretical score given a yet-to-be-generated extension).

The serialized alignments then get clustered across the entire length of the contig (not just within a chunk), and clusters are annotated one at a time. Think of it like a linear walk down the contig through the serialized features, clustering as you go. Every time alignments stop being added to a cluster and that cluster ends, it can be annotated as a self-contained unit.
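The "linear walk" idea can be pictured with a toy example. This is only an illustration of single-pass interval clustering over a whole contig, not MAKER's actual code; the function `cluster_intervals`, its `flank` argument, and the coordinates are invented here.

```python
def cluster_intervals(features, flank=0):
    """Group (start, end) features into clusters in one left-to-right pass.

    A feature joins the current cluster if it starts within `flank` bases
    of the cluster's right edge; otherwise the current cluster is closed.
    """
    clusters = []
    current = []
    right_edge = None
    for start, end in sorted(features):
        if current and start > right_edge + flank:
            clusters.append(current)  # cluster ended: it is now self-contained
            current = []
        current.append((start, end))
        # Reset the edge for a new cluster, otherwise extend it.
        right_edge = end if len(current) == 1 else max(right_edge, end)
    if current:
        clusters.append(current)
    return clusters


# Hypothetical alignment coordinates on one contig.
hits = [(100, 250), (200, 400), (390, 600), (5000, 5200), (5150, 5600)]
clusters = cluster_intervals(hits, flank=50)
print(clusters)
```

Because the walk runs over the sorted features of the whole contig, where a processing chunk happened to begin or end plays no role in deciding which features end up in the same cluster.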
This is why shared storage is required for MAKER. So MAKER never joins the genes, as they were never called in a way where they could be split in the first place.

The split_hit parameter affects clustering as well as the alignment model for how far away an HSP can be and still be considered part of the same alignment (long unpolished alignments with gaps longer than this will be broken into two separate alignments). pred_flank also affects clustering slightly, but its primary effect is the generation of flanking sequence around current cluster boundaries (clusters include all alignments as well as ab initio predictions, so it is added to those existing boundaries).

If you get models without a start or stop codon, it is because the HMMs in predictors like SNAP and Augustus pick the highest-likelihood path regardless, not because of a chunk split. Also, all ab initio calls are part of the cluster, so it is never trimmed in a way that a cluster boundary falls partway across one of those models.

--Carson

> On May 31, 2016, at 12:37 AM, Philipp Bayer wrote:
>
> Hello,
>
> I have a minor question about the way MAKER joins annotations from
> different chunks when using MPI.
>
> Let's say I have a longer gene that bridges two chunks, so the jobs
> annotating both chunks separately would return two incomplete genes, one
> without a stop codon, one without a start codon. I assume MAKER would
> then join those two into a single gene, right? Is this behaviour
> influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl?
> > Thank you > > Philipp Bayer > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From philipp.bayer at uwa.edu.au Tue May 31 19:57:10 2016 From: philipp.bayer at uwa.edu.au (Philipp Bayer) Date: Wed, 1 Jun 2016 09:57:10 +0800 Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations In-Reply-To: <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com> References: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au> <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com> Message-ID: Hello, thank you very much for your detailed answer! Looks like I had misinterpreted some details of the program, this is very helpful, thank you! Cheers Philipp On 31.05.2016 23:51, Carson Holt wrote: > Annotations never actually cross a chunk boundaries because the boundaries are not fixed. It?s much more complicated than that, but basically we know from the alignment scoring model the maximum distance an HSP can occur and still be included in the alignment. This means that I know precisely whether there is a chance that an alignment may include another part when it occurs near the edge of a blasted sequence. When there is a chance, the sequence gets extended and everything will be realigned (de novo) using the extended sequence which can include an entire neighboring chunk. This is a very fast operation since it?s just the known hits being aligned rather than the whole database. So think of it more like a dynamic window rather than a fixed boundary. Results are then sorted and serialized to disk. 
Also the initial BLAST is done with very permissive parameters and overlapping sequence boundaries, so extremely low scoring partial alignments are enough to trigger an extension and realignment (we know before hand the minimum sequence length needed to generate a given alignment score and can extrapolate maximum theoretical score given a yet to be generated extension). > > The serialized alignments then get clustered across the entire length of the contig (not just within a chunk), and clusters are annotated one at a time. Think of it like a linear walk down the contig through the serialized features, clustering as you go. Every time alignments stop being added to a cluster and that cluster ends, it can be annotated as a self contained unit. This is why shared storage is required for MAKER. So MAKER never joins the genes, as they were never called in a way where they could be split in the first place. > > The split_hit parameter affect clustering as well as the alignment model for how far away an HSP can be and still be conceded part of the same alignment (long unpolished alignments with gaps longer than this will be broken into two separate alignments). pred_flank also affects clusteing slightly, but it?s primary effect is the generation of flanking sequence around current cluster boundaries (clusters include all alignments as well as ab initio predictions, so it is added to those existing boundaries). > > The reason you may get models without a start or stop codon, is because HMMs in predictors like snap and augustus pick the highest likelihood path regardless, not because of a chunk split. Also all ab initio calls are part of the cluster, so it is never trimmed in a way that a cluster boundary ever falls part way across one of those models. > > ?Carson > >> On May 31, 2016, at 12:37 AM, Philipp Bayer wrote: >> >> Hello, >> >> I have a minor question about the way MAKER joins annotations from >> different chunks when using MPI. 
>>
>> Let's say I have a longer gene that bridges two chunks, so the jobs
>> annotating both chunks separately would return two incomplete genes, one
>> without a stop codon, one without a start codon. I assume MAKER would
>> then join those two into a single gene, right? Is this behaviour
>> influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl?
>>
>> Thank you
>>
>> Philipp Bayer
>>
>>
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From platycerus at gmail.com Thu May 12 15:49:51 2016
From: platycerus at gmail.com (Ray Cui)
Date: Thu, 12 May 2016 23:49:51 +0200
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

Dear Yugui

I had the same problem with openmpi. I think it is not compatible with Maker. I now use mpich, which works.

Ray

On May 12, 2016 11:32 PM, "Yugui Wang" wrote:
> Hi.
>
> Segmentation fault of MKAER with openmpi on CentOS 7.2.
> Both MAKER 2.31.8 and 3.00.0 beta have the same error.
>
> $ mpirun -mca btl ^openib -n 4 maker
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 39507 on node T620 exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> $ file core.39505
> core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make
> $ gdb /usr/bin/perl core.39505
> (gdb) where
> #0 0x00007f0e4a7d2060 in ??
() > #1 > #2 0x00007f0e4a7d2060 in ?? () > #3 > #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from > /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so > #5 0x00007f0e63ec8eda in opal_progress () from > /usr/lib64/openmpi/lib/libopen-pal.so.13 > #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from > /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so > #7 0x00007f0e65b0dc06 in PMPI_Probe () from > /usr/lib64/openmpi/lib/libmpi.so > #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf at entry=0x4146b30, > source=source at entry=-1, tag=tag at entry=1111) at MPI.xs:56 > #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv > (my_perl=, cv=) at MPI.c:391 > #10 0x00007f0e657ce39f in Perl_pp_entersub () from > /usr/lib64/perl5/CORE/libperl.so > #11 0x00007f0e657c6b16 in Perl_runops_standard () from > /usr/lib64/perl5/CORE/libperl.so > #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so > #13 0x0000000000400d99 in main () > $ echo $LD_PRELOAD > /usr/lib64/openmpi/lib/libmpi.so: > $ echo $OMPI_MCA_mpi_warn_on_fork > 0 > $ rpm -qa openmpi > openmpi-1.10.0-10.el7.x86_64 > $ uname -a > Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC > 2016 x86_64 x86_64 x86_64 GNU/Linux > $ ulimit -a > core file size (blocks, -c) unlimited > data seg size (kbytes, -d) unlimited > scheduling priority (-e) 0 > file size (blocks, -f) unlimited > pending signals (-i) 1029973 > max locked memory (kbytes, -l) 64 > max memory size (kbytes, -m) unlimited > open files (-n) 1024 > pipe size (512 bytes, -p) 8 > POSIX message queues (bytes, -q) 819200 > real-time priority (-r) 0 > stack size (kbytes, -s) 102400 > cpu time (seconds, -t) unlimited > max user processes (-u) 4096 > virtual memory (kbytes, -v) unlimited > file locks (-x) unlimited > $ mpiexec --version > mpiexec (OpenRTE) 1.10.0 > > Report bugs to http://www.open-mpi.org/community/help/ > $ > > _______________________________________________ > maker-devel mailing list > maker-devel at 
box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>

From xvazquezc at gmail.com Thu May 12 18:31:55 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 13 May 2016 10:31:55 +1000
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

Check this thread:
https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ

On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] wrote:
> Hello,
>
> Are there any guidelines for using BUSCO to help train MAKER? CEGMA has
> been discontinued but I used to use the cegma2zff.pl steps to use those
> proteins as a training step. BUSCO seems to train Augustus but I'm not sure
> what file to pass from BUSCO to MAKER for this to be properly utilized. I
> didn't see anything specific about this in the archives.
> -----
>
> *Ian Misner, Ph.D.*
>
> Computational Genomics Specialist
>
> Contractor, Medical Science and Computing, Inc.
>
> Bioinformatics and Computational Biosciences Branch (BCBB)
>
> NIH/NIAID/OD/OSMO/OCICB
>
> 5601 Fishers Lane, Room 4A59
>
> Rockville, MD 20892
>
> Office: 301-761-6208
>
> Mobile: 301-704-0151
>
> Email: ian.misner at nih.gov
>
> Web: BCBB Home Page
>
> Twitter: @NIAIDBioIT
>
>
>
> Disclaimer: The information in this e-mail and any of its attachments is
> confidential and may contain sensitive information. It should not be used
> by anyone who is not the original intended recipient. If you have received
> this e-mail in error please inform the sender and delete it from your
> mailbox or any other storage devices. National Institute of Allergy and
> Infectious Diseases shall not accept liability for any statements made that
> are sender's own and not expressly made on behalf of the NIAID by one of
> its representatives.
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Thu May 12 18:37:03 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 13 May 2016 10:37:03 +1000
Subject: [maker-devel] Reformat maker gff3
In-Reply-To: <1460516670248.1644@uq.edu.au>
References: <1460516670248.1644@uq.edu.au>
Message-ID:

Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too.

On 13 April 2016 at 13:05, Jenny Lee wrote:
> Hi all,
>
>
> I would like to update my maker gff3 file to only contain the genes I've
> decided to keep - all maker genes, a subset of abinitio genes (which
> have interproscan hits). I would like to also exclude the repeats
> information and only retain the CDS, gene, exon and mRNA - like the
> format we usually see in published data.
>
>
> I've been trying to do this manually and it gets messy. Any ideas?
>
>
> Thanks a lot.
>
>
> Regards,
>
> Jenny Lee
>
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
-------------- next part --------------
An HTML attachment was scrubbed...
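
The grep/cut suggestion in the "Reformat maker gff3" thread can also be written as a small script. What follows is only a minimal sketch under stated assumptions: it keeps gene, mRNA, exon and CDS features and drops everything else, including repeats. It assumes the gene models to keep carry "maker" in the GFF3 source column (column 2); the name filter_gff3 and the two constants are made up for illustration, and selecting a subset of ab initio genes by ID would need an extra step that is not shown.

```python
# Feature types to retain (column 3 of a GFF3 line).
KEEP_TYPES = {"gene", "mRNA", "exon", "CDS"}
# Assumption: the gene models to keep have this value in the source column (column 2).
KEEP_SOURCES = {"maker"}


def filter_gff3(lines, keep_types=KEEP_TYPES, keep_sources=KEEP_SOURCES):
    """Yield the GFF3 lines whose source and feature type are both in the given sets."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: nothing after it is a feature line
        if line.startswith("##gff-version"):
            yield line  # keep the version pragma so the output is still GFF3
            continue
        if not line or line.startswith("#"):
            continue  # drop blank lines and other comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip anything that is not a nine-column feature line
        if cols[1] in keep_sources and cols[2] in keep_types:
            yield line
```

To run it over a file, iterate over the open file handle and print each yielded line. Extending KEEP_SOURCES would let other sources through, at the cost of also keeping any ab initio models that were meant to be dropped.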
URL: From panos.ioannidis at gmail.com Fri May 13 01:56:58 2016 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Fri, 13 May 2016 09:56:58 +0200 Subject: [maker-devel] BUSCO In-Reply-To: References: Message-ID: Hello Ian, Xabier is right. You have to run BUSCO with the --long switch and then, in the maker_opts.ctl file, you should point the augustus_species variable to your trained species (i.e. the name you pass with the -o/-a parameter). So, in Xabier's example your maker_opts.ctl file should contain the following line: augustus_species=Genus_species Felipe, Rob, is there something else that I'm missing? Truth is that I haven't run this recently and there might be differences in newer BUSCO versions. Panos Panos Ioannidis, PhD Postdoctoral researcher Computational Evolutionary Genomics Group University of Geneva On Fri, May 13, 2016 at 2:31 AM, Xabier V?zquez Campos wrote: > Check this thread: > https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ > > On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] > wrote: > >> Hello, >> >> Are there any guidelines for using BUSCO to help train MAKER? CEGMA has >> been discontinued but I used to use the cegma2zff.pl steps to use those >> proteins as a training step. BUSCO seems to train Augustus but I'm not sure >> what file to pass from BUSCO to MAKER for this to be properly utilized. I >> didn't see anything specific about this in the archives. >> ----- >> >> *Ian Misner, Ph.D.* >> >> Computational Genomics Specialist >> >> Contractor, Medical Science and Computing, Inc. 
>> >> Bioinformatics and Computational Biosciences Branch (BCBB) >> >> NIH/NIAID/OD/OSMO/OCICB >> >> 5601 Fishers Lane, Room 4A59 >> >> Rockville, MD 20892 >> >> Office: 301-761-6208 >> >> Mobile: 301-704-0151 >> >> Email: ian.misner at nih.gov >> >> Web: BCBB Home Page >> >> Twitter: @NIAIDBioIT >> >> >> >> Disclaimer: The information in this e-mail and any of its attachments is >> confidential and may contain sensitive information. It should not be used >> by anyone who is not the original intended recipient. If you have received >> this e-mail in error please inform the sender and delete it from your >> mailbox or any other storage devices. National Institute of Allergy and >> Infectious Diseases shall not accept liability for any statements made that >> are sender's own and not expressly made on behalf of the NIAID by one of >> its representatives. >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > -- > Xabier V?zquez-Campos, *PhD* > *Research Associate* > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdolze at students.uni-mainz.de Fri May 13 04:14:57 2016 From: fdolze at students.uni-mainz.de (Dolze, Florian) Date: Fri, 13 May 2016 12:14:57 +0200 Subject: [maker-devel] BUSCO In-Reply-To: References: Message-ID: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de> On a somewhat related note, is there an advantage of using BUSCO to train Augustus instead of the provided Augustus webtraining service? Does anybody know how those 2 compare? 
Am 13.05.2016 um 09:56 schrieb Panos Ioannidis: > Hello Ian, > > Xabier is right. You have to run BUSCO with the --long switch and > then, in the maker_opts.ctl file, you should point the > augustus_species variable to your trained species (i.e. the name you > pass with the -o/-a parameter). > > So, in Xabier's example your maker_opts.ctl file should contain the > following line: > > augustus_species=Genus_species > > Felipe, Rob, is there something else that I'm missing? Truth is that I > haven't run this recently and there might be differences in newer > BUSCO versions. > > Panos > > > Panos Ioannidis, PhD > Postdoctoral researcher > Computational Evolutionary Genomics Group > University of Geneva > > On Fri, May 13, 2016 at 2:31 AM, Xabier V?zquez Campos > > wrote: > > Check this thread: > https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ > > > On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] > > wrote: > > Hello, > > Are there any guidelines for using BUSCO to help train MAKER? > CEGMA has been discontinued but I used to use the cegma2zff.pl > steps to use those proteins as a > training step. BUSCO seems to train Augustus but I'm not sure > what file to pass from BUSCO to MAKER for this to be properly > utilized. I didn't see anything specific about this in the > archives. > ----- > > *Ian Misner, Ph.D.* > > Computational Genomics Specialist > > Contractor, Medical Science and Computing, Inc. > > Bioinformatics and Computational Biosciences Branch (BCBB) > > NIH/NIAID/OD/OSMO/OCICB > > 5601 Fishers Lane, Room 4A59 > > Rockville, MD 20892 > > Office: 301-761-6208 > > Mobile: 301-704-0151 > > Email: ian.misner at nih.gov > > Web: BCBB Home Page > > Twitter: @NIAIDBioIT > > > > Disclaimer: The information in this e-mail and any of its > attachments is confidential and may contain sensitive > information. It should not be used by anyone who is not the > original intended recipient. 
If you have received this e-mail > in error please inform the sender and delete it from your > mailbox or any other storage devices. National Institute of > Allergy and Infectious Diseases shall not accept liability for > any statements made that are sender's own and not expressly > made on behalf of the NIAID by one of its representatives. > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > -- > Xabier V?zquez-Campos, /PhD/ > /Research Associate/ > Water Research Centre > School of Civil and Environmental Engineering > The University of New South Wales > Sydney NSW 2052 AUSTRALIA > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > > _______________________________________________ > maker-devel mailing list > maker-devel at yandell-lab.org > http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.waterhouse at gmail.com Fri May 13 02:28:00 2016 From: robert.waterhouse at gmail.com (Robert Waterhouse) Date: Fri, 13 May 2016 10:28:00 +0200 Subject: [maker-devel] BUSCO In-Reply-To: References: Message-ID: I think in the Augustus 'species' directory there should be a new folder named according to your BUSCO run, and in that folder should be the trained parameters for your new species, so from MAKER I guess you can point to these trained parameters. Rob \\ Dr Robert Waterhouse O0o-- SIB ma?tre assistant "" www.rmwaterhouse.org A maturing understanding of the composition of the insect gene repertoire COIS 2015 BUSCO: assessing genome assembly and annotation completeness Bioinformatics 2015 On 13 May 2016 at 09:56, Panos Ioannidis wrote: > Hello Ian, > > Xabier is right. 
You have to run BUSCO with the --long switch and then, > in the maker_opts.ctl file, you should point the augustus_species variable > to your trained species (i.e. the name you pass with the -o/-a parameter). > > So, in Xabier's example your maker_opts.ctl file should contain the > following line: > > augustus_species=Genus_species > > Felipe, Rob, is there something else that I'm missing? Truth is that I > haven't run this recently and there might be differences in newer BUSCO > versions. > > Panos > > > Panos Ioannidis, PhD > Postdoctoral researcher > Computational Evolutionary Genomics Group > University of Geneva > > On Fri, May 13, 2016 at 2:31 AM, Xabier V?zquez Campos < > xvazquezc at gmail.com> wrote: > >> Check this thread: >> https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ >> >> On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] < >> ian.misner at nih.gov> wrote: >> >>> Hello, >>> >>> Are there any guidelines for using BUSCO to help train MAKER? CEGMA has >>> been discontinued but I used to use the cegma2zff.pl steps to use those >>> proteins as a training step. BUSCO seems to train Augustus but I'm not sure >>> what file to pass from BUSCO to MAKER for this to be properly utilized. I >>> didn't see anything specific about this in the archives. >>> ----- >>> >>> *Ian Misner, Ph.D.* >>> >>> Computational Genomics Specialist >>> >>> Contractor, Medical Science and Computing, Inc. >>> >>> Bioinformatics and Computational Biosciences Branch (BCBB) >>> >>> NIH/NIAID/OD/OSMO/OCICB >>> >>> 5601 Fishers Lane, Room 4A59 >>> >>> Rockville, MD 20892 >>> >>> Office: 301-761-6208 >>> >>> Mobile: 301-704-0151 >>> >>> Email: ian.misner at nih.gov >>> >>> Web: BCBB Home Page >>> >>> Twitter: @NIAIDBioIT >>> >>> >>> >>> Disclaimer: The information in this e-mail and any of its attachments is >>> confidential and may contain sensitive information. It should not be used >>> by anyone who is not the original intended recipient. 
If you have received >>> this e-mail in error please inform the sender and delete it from your >>> mailbox or any other storage devices. National Institute of Allergy and >>> Infectious Diseases shall not accept liability for any statements made that >>> are sender's own and not expressly made on behalf of the NIAID by one of >>> its representatives. >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> >> >> -- >> Xabier V?zquez-Campos, *PhD* >> *Research Associate* >> Water Research Centre >> School of Civil and Environmental Engineering >> The University of New South Wales >> Sydney NSW 2052 AUSTRALIA >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.waterhouse at gmail.com Fri May 13 06:54:40 2016 From: robert.waterhouse at gmail.com (Robert Waterhouse) Date: Fri, 13 May 2016 14:54:40 +0200 Subject: [maker-devel] BUSCO In-Reply-To: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de> References: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de> Message-ID: I would guess that the main 'advantage' of using BUSCO to train Augustus is that one will probably run BUSCO on one's genome anyway before starting MAKER, so there will already be a useful set of trained parameters ready to use. I guess the 'advantage' of using the Augustus webtraining service is that one could give it much more starting data (if indeed this is available, e.g. cDNAs). Indeed if there was enough time and it made a substantial difference one might even use the BUSCO gene model output as the 'Training gene structure file' for Augustus webtraining service. 
I don't believe that anyone has done a comparison on how different the trained parameters end up being. Rob \\ Dr Robert Waterhouse O0o-- SIB ma?tre assistant "" www.rmwaterhouse.org A maturing understanding of the composition of the insect gene repertoire COIS 2015 BUSCO: assessing genome assembly and annotation completeness Bioinformatics 2015 On 13 May 2016 at 12:14, Dolze, Florian wrote: > > On a somewhat related note, is there an advantage of using BUSCO to train > Augustus instead of the provided Augustus webtraining service? Does anybody > know how those 2 compare? > > > > Am 13.05.2016 um 09:56 schrieb Panos Ioannidis: > > Hello Ian, > > Xabier is right. You have to run BUSCO with the --long switch and then, > in the maker_opts.ctl file, you should point the augustus_species variable > to your trained species (i.e. the name you pass with the -o/-a parameter). > > So, in Xabier's example your maker_opts.ctl file should contain the > following line: > > augustus_species=Genus_species > > Felipe, Rob, is there something else that I'm missing? Truth is that I > haven't run this recently and there might be differences in newer BUSCO > versions. > > Panos > > > Panos Ioannidis, PhD > Postdoctoral researcher > Computational Evolutionary Genomics Group > University of Geneva > > On Fri, May 13, 2016 at 2:31 AM, Xabier V?zquez Campos < > xvazquezc at gmail.com> wrote: > >> Check this thread: >> https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ >> >> On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] < >> ian.misner at nih.gov> wrote: >> >>> Hello, >>> >>> Are there any guidelines for using BUSCO to help train MAKER? CEGMA has >>> been discontinued but I used to use the cegma2zff.pl steps to use those >>> proteins as a training step. BUSCO seems to train Augustus but I'm not sure >>> what file to pass from BUSCO to MAKER for this to be properly utilized. I >>> didn't see anything specific about this in the archives. 
>>> ----- >>> >>> *Ian Misner, Ph.D.* >>> >>> Computational Genomics Specialist >>> >>> Contractor, Medical Science and Computing, Inc. >>> >>> Bioinformatics and Computational Biosciences Branch (BCBB) >>> >>> NIH/NIAID/OD/OSMO/OCICB >>> >>> 5601 Fishers Lane, Room 4A59 >>> >>> Rockville, MD 20892 >>> >>> Office: 301-761-6208 >>> >>> Mobile: 301-704-0151 >>> >>> Email: ian.misner at nih.gov >>> >>> Web: BCBB Home Page >>> >>> Twitter: @NIAIDBioIT >>> >>> >>> >>> Disclaimer: The information in this e-mail and any of its attachments is >>> confidential and may contain sensitive information. It should not be used >>> by anyone who is not the original intended recipient. If you have received >>> this e-mail in error please inform the sender and delete it from your >>> mailbox or any other storage devices. National Institute of Allergy and >>> Infectious Diseases shall not accept liability for any statements made that >>> are sender's own and not expressly made on behalf of the NIAID by one of >>> its representatives. >>> >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> >> >> >> -- >> Xabier V?zquez-Campos, *PhD* >> *Research Associate* >> Water Research Centre >> School of Civil and Environmental Engineering >> The University of New South Wales >> Sydney NSW 2052 AUSTRALIA >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> > > > _______________________________________________ > maker-devel mailing listmaker-devel at yandell-lab.orghttp://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... 
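
For the BUSCO-trained Augustus parameters discussed in this thread, a quick sanity check before editing maker_opts.ctl might look like the sketch below. It assumes the trained species folder sits in the "species" directory under AUGUSTUS_CONFIG_PATH; whether BUSCO places it there may depend on the BUSCO version, as noted above. The helper names species_dir and maker_opts_line are hypothetical and not part of MAKER, BUSCO or Augustus.

```python
import os


def species_dir(species, config_path=None):
    """Return the folder where Augustus is assumed to look for this species' parameters."""
    # Fall back to the AUGUSTUS_CONFIG_PATH environment variable when no path is given.
    config_path = config_path or os.environ.get("AUGUSTUS_CONFIG_PATH", "")
    return os.path.join(config_path, "species", species)


def maker_opts_line(species, config_path=None):
    """Return the maker_opts.ctl line for a trained species; raise if its folder is missing."""
    path = species_dir(species, config_path)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"no trained Augustus parameters found at {path}")
    return f"augustus_species={species}"
```

Calling maker_opts_line("Genus_species") with the name used in the BUSCO run either returns the line to put into maker_opts.ctl or fails early, which is cheaper than finding out partway through a MAKER run.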
URL:
From cjfields at illinois.edu Fri May 13 10:25:56 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 13 May 2016 16:25:56 +0000
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

Our group has mainly used the BUSCO model in the "bootstrap" run for MAKER, and then retrained Augustus and SNAP using a filtered data set from that run for new rounds of MAKER.

Also, one personal observation: we have found some genome assemblies where BUSCO performs poorly compared to CEGMA (e.g. BUSCO reports a poor overall percentage of SCOs present, while CEGMA reports much higher numbers). We're still delving into this, but in those cases we avoid using the BUSCO model for obvious reasons.

chris

On May 13, 2016, at 3:28 AM, Robert Waterhouse wrote:

I think in the Augustus 'species' directory there should be a new folder named according to your BUSCO run, and in that folder should be the trained parameters for your new species, so from MAKER I guess you can point to these trained parameters.

Rob

\\ Dr Robert Waterhouse
O0o-- SIB maître assistant
"" www.rmwaterhouse.org
A maturing understanding of the composition of the insect gene repertoire COIS 2015
BUSCO: assessing genome assembly and annotation completeness Bioinformatics 2015

On 13 May 2016 at 09:56, Panos Ioannidis wrote:

Hello Ian,

Xabier is right. You have to run BUSCO with the --long switch and then, in the maker_opts.ctl file, you should point the augustus_species variable to your trained species (i.e. the name you pass with the -o/-a parameter).

So, in Xabier's example your maker_opts.ctl file should contain the following line:

augustus_species=Genus_species

Felipe, Rob, is there something else that I'm missing? Truth is that I haven't run this recently and there might be differences in newer BUSCO versions.
Panos Panos Ioannidis, PhD Postdoctoral researcher Computational Evolutionary Genomics Group University of Geneva On Fri, May 13, 2016 at 2:31 AM, Xabier V?zquez Campos > wrote: Check this thread: https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] > wrote: Hello, Are there any guidelines for using BUSCO to help train MAKER? CEGMA has been discontinued but I used to use the cegma2zff.pl steps to use those proteins as a training step. BUSCO seems to train Augustus but I'm not sure what file to pass from BUSCO to MAKER for this to be properly utilized. I didn't see anything specific about this in the archives. ----- Ian Misner, Ph.D. Computational Genomics Specialist Contractor, Medical Science and Computing, Inc. Bioinformatics and Computational Biosciences Branch (BCBB) NIH/NIAID/OD/OSMO/OCICB 5601 Fishers Lane, Room 4A59 Rockville, MD 20892 Office: 301-761-6208 Mobile: 301-704-0151 Email: ian.misner at nih.gov Web: BCBB Home Page Twitter: @NIAIDBioIT Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives. 
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Xabier Vázquez-Campos, PhD
Research Associate
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From michael.s.campbell1 at gmail.com Fri May 13 09:34:00 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 13 May 2016 11:34:00 -0400
Subject: [maker-devel] Reformat maker gff3
In-Reply-To:
References: <1460516670248.1644@uq.edu.au>
Message-ID: <777D8DFF-CB99-4F03-A4CF-8E52F0E4526A@gmail.com>

I've attached a protocols paper that walks through what you are trying to do. Let me know if it helps.

Mike

> On May 12, 2016, at 8:37 PM, Xabier Vázquez Campos wrote:
>
> Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too
>
> On 13 April 2016 at 13:05, Jenny Lee wrote:
> Hi all,
>
> I would like to update my maker gff3 file to only contain the genes I've decided to keep - all maker genes, a subset of abinitio genes (which have interproscan hits). I would like to also exclude the repeats information and only retain the CDS, gene, exon and mRNA - like the format we usually see in published data.
>
> I've been trying to do this manually and it gets messy. Any ideas?
>
> Thanks a lot.
>
> Regards,
> Jenny Lee
>
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
>
>
>
> --
> Xabier Vázquez-Campos, PhD
> Research Associate
> Water Research Centre
> School of Civil and Environmental Engineering
> The University of New South Wales
> Sydney NSW 2052 AUSTRALIA
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bi0411 (1).pdf
Type: application/pdf
Size: 484329 bytes
Desc: not available
URL:

From carsonhh at gmail.com Fri May 13 10:32:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 10:32:40 -0600
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

With OpenMPI, you must set LD_PRELOAD for libmpi.so and sometimes the "-mca btl" parameter. Details can be found in the .../maker/INSTALL file.

Also, we have found a recent issue with MAKER and Intel-compiled OpenMPI on CentOS systems. To get around that issue, compile OpenMPI with gcc instead of the Intel compiler, or alternatively install a separate Perl manually without pthread support (i.e. pthreads disabled during the configure step).

--Carson

> On May 12, 2016, at 3:49 PM, Ray Cui wrote:
>
> Dear Yugui
>
> I had the same problem with openmpi. I think it is not compatible with Maker. I now use mpich, which works.
>
> Ray
>
> On May 12, 2016 11:32 PM, "Yugui Wang" wrote:
> Hi.
>
> Segmentation fault of MKAER with openmpi on CentOS 7.2.
> Both MAKER 2.31.8 and 3.00.0 beta have the same error.
> > $ mpirun -mca btl ^openib -n 4 maker > STATUS: Parsing control files... > STATUS: Processing and indexing input FASTA files... > -------------------------------------------------------------------------- > mpirun noticed that process rank 2 with PID 39507 on node T620 exited > on signal 11 (Segmentation fault). > -------------------------------------------------------------------------- > $ file core.39505 > core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV), > SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make > $ gdb /usr/bin/perl core.39505 > (gdb) where > #0 0x00007f0e4a7d2060 in ?? () > #1 > #2 0x00007f0e4a7d2060 in ?? () > #3 > #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from > /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so > #5 0x00007f0e63ec8eda in opal_progress () from > /usr/lib64/openmpi/lib/libopen-pal.so.13 > #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from > /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so > #7 0x00007f0e65b0dc06 in PMPI_Probe () from /usr/lib64/openmpi/lib/libmpi.so > #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf at entry=0x4146b30, > source=source at entry=-1, tag=tag at entry=1111) at MPI.xs:56 > #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv > (my_perl=, cv=) at MPI.c:391 > #10 0x00007f0e657ce39f in Perl_pp_entersub () from > /usr/lib64/perl5/CORE/libperl.so > #11 0x00007f0e657c6b16 in Perl_runops_standard () from > /usr/lib64/perl5/CORE/libperl.so > #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so > #13 0x0000000000400d99 in main () > $ echo $LD_PRELOAD > /usr/lib64/openmpi/lib/libmpi.so: > $ echo $OMPI_MCA_mpi_warn_on_fork > 0 > $ rpm -qa openmpi > openmpi-1.10.0-10.el7.x86_64 > $ uname -a > Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC > 2016 x86_64 x86_64 x86_64 GNU/Linux > $ ulimit -a > core file size (blocks, -c) unlimited > data seg size (kbytes, -d) unlimited > scheduling priority (-e) 0 > file size (blocks, 
-f) unlimited > pending signals (-i) 1029973 > max locked memory (kbytes, -l) 64 > max memory size (kbytes, -m) unlimited > open files (-n) 1024 > pipe size (512 bytes, -p) 8 > POSIX message queues (bytes, -q) 819200 > real-time priority (-r) 0 > stack size (kbytes, -s) 102400 > cpu time (seconds, -t) unlimited > max user processes (-u) 4096 > virtual memory (kbytes, -v) unlimited > file locks (-x) unlimited > $ mpiexec --version > mpiexec (OpenRTE) 1.10.0 > > Report bugs to http://www.open-mpi.org/community/help/ > $ > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.p.price at gmail.com Fri May 13 10:35:21 2016 From: dave.p.price at gmail.com (David Price) Date: Fri, 13 May 2016 10:35:21 -0600 Subject: [maker-devel] maker-devel Digest, Vol 96, Issue 10 In-Reply-To: References: Message-ID: would it be possible to get digest mode set up properly? I have it selected but I get emails for each individual message. Thanks On Fri, May 13, 2016 at 10:27 AM, wrote: > Send maker-devel mailing list submissions to > maker-devel at yandell-lab.org > > To subscribe or unsubscribe via the World Wide Web, visit > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > or, via email, send a message with subject or body 'help' to > maker-devel-request at yandell-lab.org > > You can reach the person managing the list at > maker-devel-owner at yandell-lab.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of maker-devel digest..." > > > Today's Topics: > > 1. 
Re: Reformat maker gff3 (Michael Campbell)
>
> ------------------------------
>
> End of maker-devel Digest, Vol 96, Issue 10
> *******************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Fri May 13 10:46:38 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 10:46:38 -0600
Subject: [maker-devel] maker-devel Digest, Vol 96, Issue 10
In-Reply-To: 
References: 
Message-ID: <9ABEE8DB-6316-4CF1-BC46-0DB2C188BC44@gmail.com>

I toggled your digest option off and back on in case that is the issue. The Mailman docs say that on busy days the digest option may decide to send out more than one digest, so that could be the issue too. The company providing our mailing list was having issues the last few weeks, so we weren't able to approve most posts until yesterday. As a result, there was an explosion of approved posts that may have triggered more than one digest per day yesterday and today.

--Carson

> On May 13, 2016, at 10:35 AM, David Price wrote:
>
> would it be possible to get digest mode set up properly?
> I have it selected but I get emails for each individual message.
>
> Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From platycerus at gmail.com Fri May 13 11:08:19 2016
From: platycerus at gmail.com (Ray Cui)
Date: Fri, 13 May 2016 19:08:19 +0200
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To: 
References: 
Message-ID: 

Hello,

I had segfaults even when I set LD_PRELOAD and used gcc for OpenMPI (dealing with MAKER 3 beta, though).
It works fine with MPICH, so I stopped looking into this.

Ray
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From carsonhh at gmail.com Fri May 13 11:16:49 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 11:16:49 -0600
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To: 
References: 
Message-ID: 

It's possible it was set wrong, as there may be more than one libmpi.so on the system. It also has to be set before compiling and every time you run. The next issue is that some systems (like Ubuntu) will often have extra mpicc, libmpi.so, and mpiexec files that don't match the OpenMPI you are trying to use. Tracking down those mismatches before compiling and ensuring that they don't revert with your bashrc/bash_profile can be complicated. In these cases you may also have to specify LD_PRELOAD with the -x parameter of the OpenMPI mpiexec command. You often have to specify the "-mca btl" parameter explained in the INSTALL file as well.

--Carson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bmoore at genetics.utah.edu Sun May 15 18:57:37 2016
From: bmoore at genetics.utah.edu (Barry Moore)
Date: Mon, 16 May 2016 00:57:37 +0000
Subject: [maker-devel] Fwd: Issue with make, no prediction after gff3_merge
References: 
Message-ID: <2B3935BF-1995-4250-8694-92FA0C36A729@genetics.utah.edu>

Hi David,

First of all, apologies for the delay in addressing your e-mail. Our mailing list provider (an external ISP) has stopped supporting the Mailman software that runs the maker-devel list, and the software has been unresponsive to our attempts to add new users or moderate messages. We will handle this message directly through e-mail for now. We have requested a new MAKER mailing list through our University IT department, and that request is pending approval. The new mailing list should get our user support back to normal very soon.

Can you share a few lines of the GFF files that you passed to est_gff?

Thanks
Barry

Begin forwarded message:

From: "LOPEZ, David"
Subject: Issue with make, no prediction after gff3_merge
Date: May 3, 2016 at 1:27:03 AM PDT
To: "maker-devel-owner at yandell-lab.org"

Dear all,

I am still waiting for my registration on the maker-devel list, so I am sending my question as an e-mail, but I will transfer it to the discussion group when possible.

I am a commercially licensed user of MAKER and I currently face some issues running MAKER 3.00.0 on a PBS cluster with an OpenMPI 1.10.2 implementation (which runs great most of the time, but that is not the issue discussed here).
After successfully testing the datasets provided in the package (dpp and pyu), I moved to my own assembly (140 000 scaffolds, ~14GB, eukaryotic, premasked). I have already made some RNA-seq mappings (gff), as well as cDNA and proteome sets from the reference genome.

To me it appears that only the fasta evidence is used, but not the gff, when I look at the predictions in IGV. In the gff from gff3_merge, I have blastx, protein2genome and maker predictions as well as est_gff:stringtie, but no est_gff:somethingelse from the cDNA and EST evidence I fed to MAKER.

Another issue, potentially linked to this problem, is that I wasn't able to use tags in my gff evidence: MAKER fails to run, reporting that mygff3evidencefile.gff:mylabel was not found, which suggests it doesn't interpret the ':' separator correctly.

I have attached my maker opts files.

Thanks in advance for your help.

Best regards,
David.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_evm.ctl
Type: application/octet-stream
Size: 911 bytes
Desc: maker_evm.ctl
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1601 bytes
Desc: maker_exe.ctl
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5236 bytes
Desc: maker_opts.ctl
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1508 bytes
Desc: maker_bopts.ctl
URL: 

From clements at galaxyproject.org Mon May 16 11:20:35 2016
From: clements at galaxyproject.org (Dave Clements)
Date: Mon, 16 May 2016 10:20:35 -0700
Subject: [maker-devel] GMOD 2016 Meeting early reg ends May 21; Galaxy Conference Deadlines
Message-ID: 

Hello all,

*GMOD will be holding a community meeting on June 30th and July 1st in Bloomington, Indiana, United States.* GMOD Meetings are a mix of user and developer presentations, and are a great place to find out what is happening in the project, what's coming up, and what others are doing.

*Early bird registration ends May 21, this Saturday.*

*For those who would like to present a talk or poster, the meeting registration form includes a section for submitting the presentation title and abstract.* If you have any suggestions or requests for the meeting, please contact the GMOD help desk.

*GCC2016*

The GMOD Meeting is immediately after the 2016 Galaxy Community Conference (GCC2016), also in Bloomington (and sharing housing and venue). If you are interested in Galaxy, *GCC2016 has a number of deadlines this Friday, May 20*. See below.

Galaxy is a part of the GMOD project and there are several presentations at GCC2016 that cover the GMOD integration:

- Moving data from the warehouse to the workbench: a bridge to Galaxy from the Tripal community genome database software platform, talk presented by Margaret Staton
- Apollo: Collaborative Manual Annotation for Genomic Sequencing Projects, talk presented by Nathan Dunn (Apollo will have a poster and demo)
- Hardwood Genomics Database (HGD): a web portal and database resource for hardwood tree genomic and genetic research, poster presented by Ming Chen and Margaret Staton (posters are not online yet)

More posters and demos are in the works.

Thanks, and hope to see you in Bloomington,

Dave C

---------- Forwarded message ----------
From: Dave Clements
Date: Mon, May 16, 2016 at 9:09 AM
Subject: GCC2016 Deadlines this Friday & Conference schedule
To: Galaxy Announcements List, Galaxy Dev List

Hello all,

This is just a reminder that *there are some key deadlines this Friday, May 20:*

- Early registration ends. After Friday registration rates go up by over 40%.
- Poster abstracts are due.
- Demo abstracts are due. These are new this year and can complement a poster abstract or stand on their own.

If you are wondering what's happening at GCC2016, the training and conference schedules are now online, featuring 21 accepted talks and 31 training sessions.

And, thanks to Jetstream IU's newest National Science Foundation-funded project (and in which Galaxy is a partner), and the National Center for Genome Analysis Support at IU are sponsoring an opening reception on Monday evening at the IU Cyberinfrastructure Building. The first ever GCC opening reception will feature local wine/beer, morsels from local eateries, and demonstrations of the 15 million+ pixel IQ-Wall, IU's Data Center, Science on a Sphere, and other IU-centric IT.

Hope to see you there,

Dave C

--
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
https://wiki.galaxyproject.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From maker-devel at yandell-lab.org Mon May 16 19:30:23 2016
From: maker-devel at yandell-lab.org (maker-devel at yandell-lab.org)
Date: Tue, 17 May 2016 08:30:23 +0700
Subject: [maker-devel] Your .pdf document is attached
Message-ID: <201605171983886.0A606AA3@m6888933.yandell-lab.org>

An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 21010F.zip
Type: application/x-zip-compressed
Size: 2960 bytes
Desc: not available
URL: 

From carson.holt at genetics.utah.edu Wed May 18 12:33:39 2016
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 18 May 2016 18:33:39 +0000
Subject: [maker-devel] [maker] transcripts doesn't provide any help
In-Reply-To: 
References: <4B761165-F33F-4BCA-8DE0-B2AF0A6AE771@gmail.com> <3D78FFC2-B0FE-4DCC-A079-0B99CFB6C735@gmail.com> <44AADB65-67F1-4D25-A7C6-C7EE93B9E80B@gmail.com> <43234558-4A22-4B90-A209-CA7FEEF230CF@gmail.com> <9B9328D2-8F16-47C1-8873-F7821637E7FB@gmail.com> <4ACF958F-C6DF-469A-81CE-BF5854D7B8A2@gmail.com>
Message-ID: <573DB6D8-E773-45F5-9F53-DCAA20913EFF@genetics.utah.edu>

Yes. Use top to check CPU usage. If it's not 100% for the machine (or 6400% summed over all processes: 64 CPUs * 100%), then we can look at whether you are launching the command correctly or have other issues.

--Carson

On May 18, 2016, at 7:16 AM, Michael Campbell wrote:

Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that yours is a 4 Gb plant genome, I don't think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get a better training than you would with CEGMA output alone.

I've cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I'm also cc'ing the dev list. There may be others in the community who can comment on the run times.

Thanks,
Mike

On May 17, 2016, at 10:10 PM, Pei-Ying Huang wrote:

Hi Mike,

My plant genome is about 4 Gb, in 93789 scaffolds. When I run MAKER using MPI on a server with 64 cores, only 1% of the genome is annotated. Is that normal? I read a post saying that it takes about 6 days on 16 processors to finish one round on a ~150,000-scaffold, ~2 Gb vertebrate genome with protein evidence. Based on that post, I expected to get the result in no more than two weeks. However, it seems it will take me more than three months.

I also want to get a trained parameter set from Augustus. For now I use CEGMA to produce a .gff file, convert it to augustus.gff with cegma2gff, and then autotrain with Augustus. Here is my command:

autoAugTrain.pl --genome=GULI.genome.removeAllN.fa --trainingset=augustus.gff --species=A_autoAugTrain_1 &> log

But I saw someone's method below, so I wonder if I am doing it wrong?

"We get the genome.gff3 training set from the output of a first-pass run of MAKER using:
1. EST data
2. Proteins from related species
3. a SNAP model trained using CEGMA
4. a GeneMark model (obtained by running GeneMark.ES on the draft genome)
5. Running maker2zff on the output of MAKER, and converting that to GFF3
Once done, we run MAKER a second time using the Augustus model and more stringent settings."

Thank you.
Pei-Ying

2016-05-18 9:16 GMT+08:00 Michael Campbell:

Hi Pei-Ying,

One of the first places to start with RNA-seq quality control is a tool called FastQC; it will produce a number of graphics that can help identify problematic files. There are a number of tools for quality-trimming reads; Trimmomatic and the FASTX tools are popular ones. I would only redo the sequencing if you are convinced that the original sequencing is bad.

Mike

On May 16, 2016, at 8:42 PM, Pei-Ying Huang wrote:

Hi Mike,

As you said, the reason I only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.

If the problem is due to RNA-seq data quality, how could I identify the RNA-seq data with bad quality and trim them out?
If the problem is due to the expression profiles of the tissues used for mRNA-seq, should we try to extract RNA from the plant again and redo the sequencing?

Thank you.
Pei-Ying

2016-05-09 22:18 GMT+08:00 Michael Campbell:

I did finish running the test I planned. What I noticed is that there is protein evidence for about 1,000 genes on that scaffold and transcript evidence for only one gene. The reason you only get one gene with transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.

What you described is what I would do, followed by training Augustus, unless est2genome=1 and protein2genome=0 doesn't generate enough gene models to train the gene finders. In that case I would set est2genome=1 and protein2genome=1 for the first round instead.

Thanks,
Mike

On May 8, 2016, at 10:08 AM, Pei-Ying Huang wrote:

Have you done all of the tests? How would you suggest I run my data?

My plan: get an ab initio model by setting est2genome=1 and protein2genome=0, then train a SNAP model with est2genome=0 and protein2genome=0, then train a second SNAP model with est2genome=0 and protein2genome=0.

Thank you.

2016-05-07 0:30 GMT+08:00 Michael Campbell:

So far, in the tests that I've done, I get the same first exon as 5-prime UTR and part of the last exon in 3-prime UTR for that gene.

Mike

On May 5, 2016, at 10:18 PM, Pei-Ying Huang wrote:

Hi Mike,

I found one five_prime_UTR feature, but it is the only one shown on scaff0001. Does that mean there are no more five_prime_UTR features on this scaffold, or that MAKER doesn't find the others?
Thank you.

GULI.scaff0001 maker gene 3190189 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426
GULI.scaff0001 maker mRNA 3190189 3192302 1262 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;_AED=0.27;_eAED=0.27;_QI=335|0.83|0.71|1|0|0|7|0|308
GULI.scaff0001 maker exon 3190189 3190216 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:6;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3190331 3190656 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:5;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3190818 3190955 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:4;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191233 3191510 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:3;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191634 3191666 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:2;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191755 3191848 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191938 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:0;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker five_prime_UTR 3191968 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:five_prime_utr;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191938 3191967 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191755 3191848 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191634 3191666 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191233 3191510 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190818 3190955 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190331 3190656 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190189 3190216 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1

Pei-Ying

2016-05-06 8:31 GMT+08:00 Pei-Ying Huang:

Hi Mike,

Any clue about the problem? Or is my reasoning wrong? I judge whether the transcript data helped MAKER by checking whether est2genome appears in column 2 of the MAKER output GFF file.
Thank you.

Pei-Ying

2016-05-05 1:22 GMT+08:00 Pei-Ying Huang:

Hi Mike,

The attached file is the folder I use to run MAKER. Thank you.

Attachment: guliRN_L1_v1_mike.tar.gz

Pei-Ying

2016-05-04 22:54 GMT+08:00 Michael Campbell:

Hi Pei-Ying,

If the run on the sample data didn't produce est2genome lines, then it may be that exonerate is not being called.
Could you send me the maker_exe.ctl file?

Your maker_opts.ctl file looks fine.

If you have a small test set for your data, like a small scaffold that you know has some StringTie hits on it, you could send it to me and I can see if I can figure it out from here, if that would be helpful.

Thanks,
Mike

On May 4, 2016, at 12:33 AM, Pei-Ying Huang wrote:

Hi Mike,

basic_protocol_1.tar.gz: I ran the sample data following Basic Protocol 1 in the attached protocol paper, which uses the Drosophila data bundled with MAKER.

I still can't find est2genome in column 2 of the GFF file, and there is no five_prime_UTR or three_prime_UTR in column 3.
I use StringTie to align paired-end reads to the genome, then use cufflinks2gff3 to generate the .gff file for MAKER input.
Since I have three conditions (root, stem, leaf), I got Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff as MAKER inputs.

Should I merge Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff into strtie_merge.gff before giving them to MAKER?
When I try to use cufflinks2gff3 to convert strtie_merge.gtf to strtie_merge.gff, it shows the error messages below.

/home/pyh/bin/maker/bin/cufflinks2gff3 strtie_merge.gtf > strtie_merge.gff

Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221531.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221532.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221533.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221534.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221535.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221536.

Attachment: maker1.log
Attachment: maker_opts.log

less A_guli_1.all.gff

GULI.scaff0001 maker gene 1750118 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37
GULI.scaff0001 maker mRNA 1750118 1755997 5292 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;_AED=0.37;_eAED=0.37;_QI=0|0|0|1|0|0|7|0|1764
GULI.scaff0001 maker exon 1750118 1750214 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:21;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1750304 1750815 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:20;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1750896 1751717 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:19;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1751849 1752373 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:18;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1752515 1753488 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:17;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1753554 1754406 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:16;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1754489 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:15;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1754489 1755997 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1753554 1754406 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1752515 1753488 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1751849 1752373 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1750896 1751717 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1750304 1750815 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1750118 1750214 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1

Thank you.
Pei-Ying

2016-04-14 21:09 GMT+08:00 Michael Campbell:

It is strange for transcripts from the species of interest to not align or help. That FASTA entry looks okay. Did you save the error output from MAKER? If you did, could you send it to me along with the MAKER control files? There may be some clues in there.

It would also be good if you could run MAKER on the sample data from Drosophila in the /data folder in MAKER. This way we can see if it is your data or your install of MAKER. Basic Protocol 1 in the attached protocol paper uses the Drosophila data bundled with MAKER.

Aligning with HISAT2 and using Cufflinks to make transcripts should work.
StringTie seems to have higher specificity than Cufflinks, and the cufflinks2gff3 script works on StringTie output as well. You could also do a de novo assembly of the reads yourself using Trinity, which has worked well for me in the past.

Protein evidence alone will give a reasonable annotation. The transcript data will help in annotating UTRs and species-specific genes.

The attached protocol paper also addresses your quality question to an extent.

From michael.s.campbell1 at gmail.com Wed May 18 07:16:05 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 18 May 2016 09:16:05 -0400
Subject: [maker-devel] [maker] transcripts doesn't provide any help
In-Reply-To: 
References: <4B761165-F33F-4BCA-8DE0-B2AF0A6AE771@gmail.com> <3D78FFC2-B0FE-4DCC-A079-0B99CFB6C735@gmail.com> <44AADB65-67F1-4D25-A7C6-C7EE93B9E80B@gmail.com> <43234558-4A22-4B90-A209-CA7FEEF230CF@gmail.com> <9B9328D2-8F16-47C1-8873-F7821637E7FB@gmail.com> <4ACF958F-C6DF-469A-81CE-BF5854D7B8A2@gmail.com>
Message-ID: 

Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don't think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get a better training set than you would with CEGMA output alone.

I've cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I'm also cc'ing the dev list. There may be others in the community who can comment on the run times.
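
To make the "is MAKER using all of the cores" check concrete, here is a minimal sketch of the arithmetic behind reading top. It assumes you note the %CPU value of each MAKER-related process by hand; the function names are hypothetical and are not part of MAKER or top.

```python
# Hypothetical helper, not part of MAKER: turn per-process %CPU readings
# (e.g. copied by hand from the %CPU column of top) into a fraction of the
# machine's capacity, where each core contributes at most 100%.


def cpu_ceiling(n_cores):
    """Largest summed %CPU the machine can show: 100% per core."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return 100.0 * n_cores


def fraction_of_machine_used(per_process_cpu, n_cores):
    """Summed per-process %CPU divided by the ceiling for n_cores."""
    return sum(per_process_cpu) / cpu_ceiling(n_cores)


if __name__ == "__main__":
    # Made-up readings: 16 fully busy processes on a 64-core server.
    readings = [100.0] * 16
    print(cpu_ceiling(64))                         # 6400.0
    print(fraction_of_machine_used(readings, 64))  # 0.25
```

A fraction well below 1.0 on a 64-core server would suggest the run is not using every core it was given, which is the situation the question above is trying to rule out.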

Thanks,
Mike

From carsonhh at gmail.com Tue May 24 11:08:52 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 May 2016 11:08:52 -0600
Subject: [maker-devel] Single exon in GFF file.
In-Reply-To: 
References: 
Message-ID: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>

Setting single_exon=0 does not mean that single-exon genes will not be called. It means that single-exon ESTs will not be used as evidence support (issues related to single-exon ESTs are well known, so it is best to exclude them). You will still get single-exon genes from the predictors and single-exon protein alignments from your protein evidence. Every genome is expected to contain a number of single-exon genes (the most conserved genes across species in fact tend to be single exon; there is evolutionary selection that favors single-exon structure in essential genes).

What you will want to do is look at your contigs in a browser. Depending on the structure of the genes you see and the genes around them, you may conclude that you have insufficient repeat masking (which results in repeats being called as genes). Or you may realize that the contigs in question are prokaryotic (i.e. assembly contamination), which must be resolved upstream of MAKER. Or they are real genes. Remember, every genome is expected to contain single-exon genes.

--Carson

> On May 24, 2016, at 10:58 AM, Won C Yim wrote:
>
> Dear MAKER team,
>
> We have been using MAKER to generate our plant genome annotations.
>
> Even though I set "single_exon=0", there are a lot of single-exon genes based on Eval 2.2.8.
>
> Is there any way to discard single-exon genes?
>
> Regards,
>
> Won
>
> --
> Yim, Won Cheol
> MS330/Department of Biochemistry & Molecular Biology
> 1664 N. Virginia Street
> University of Nevada, Reno
>
> email: wyim at unr.edu

From munholl at uwindsor.ca Tue May 24 11:11:55 2016
From: munholl at uwindsor.ca (Seth Munholland)
Date: Tue, 24 May 2016 13:11:55 -0400
Subject: [maker-devel] MAKER seg faulting
In-Reply-To: 
References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com>
Message-ID: 

Hi Carson,

Just an update, that was indeed my issue.
Thanks for your help!

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755

From carsonhh at gmail.com Tue May 24 11:12:42 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 May 2016 11:12:42 -0600
Subject: [maker-devel] MAKER seg faulting
In-Reply-To:
References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com>
Message-ID:

Great to know it's working for you now.

--Carson

From carsonhh at gmail.com Tue May 24 11:14:56 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 May 2016 11:14:56 -0600
Subject: [maker-devel] Single exon in GFF file.
In-Reply-To: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>
References: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>
Message-ID: <5DEE67B4-F022-479E-A5C5-97F76FD6601D@gmail.com>

As a side note, many of the newer plant genomes I've worked on have had entire yeast and bacterial genomes sequenced into their assemblies (even as their own separate contigs). It is a common issue that is easily identified by just looking at a few of the more gene-dense contigs in a browser like Apollo.

--Carson

From wyim at unr.edu Tue May 24 10:58:41 2016
From: wyim at unr.edu (Won C Yim)
Date: Tue, 24 May 2016 16:58:41 +0000
Subject: [maker-devel] Single exon in GFF file.
Message-ID:

Dear MAKER team,

We have been using MAKER to generate our plant genome annotations.

Even though I set "single_exon=0", there are a lot of single exon genes based on Eval 2.2.8.

Is there any way to discard single exon genes?
Regards,

Won

--
Yim, Won Cheol
MS330/Department of Biochemistry & Molecular Biology
1664 N. Virginia Street
University of Nevada, Reno

email: wyim at unr.edu

From debarryj at gmail.com Thu May 26 13:54:37 2016
From: debarryj at gmail.com (Jeremy DeBarry)
Date: Thu, 26 May 2016 12:54:37 -0700
Subject: [maker-devel] MAKER (v2.31.6) incorrect strand for tRNA-scan (v.1.3.1) predicted exon
Message-ID:

Greetings,

My group has run MAKER on a small genome. One of the annotated tRNAs has an intron, and its two exons are annotated on different strands: the gene and first exon are on the + strand and the second exon is on the - strand.

I looked over the archives and found previous reports, but it appears they apply to earlier versions of MAKER.

My instinct is to 'manually' correct the strand information for the - strand exon, but I wanted to investigate the issue further first.

Do you have any insight?

Much appreciated,
Jeremy

--
Dr. Jeremy DeBarry PhD
MaHPIC Data Coordinator
Kissinger Research Group

The University of Georgia
:::
Email: debarryj at gmail.com
Tel: +1.912.269.0484
Skype ID: jdebarry
:::
Nihil Sine Labore!:::Nec Aspera Terrent!:::Boutez-en-Avant!

From carsonhh at gmail.com Fri May 27 08:15:47 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 May 2016 08:15:47 -0600
Subject: [maker-devel] MAKER (v2.31.6) incorrect strand for tRNA-scan (v.1.3.1) predicted exon
In-Reply-To:
References:
Message-ID:

Make sure you're using the current version of MAKER. That same thread mentions that the problem was fixed once they updated from 2.31.3. The current version is 2.31.8.

--Carson

From philipp.bayer at uwa.edu.au Tue May 31 00:37:46 2016
From: philipp.bayer at uwa.edu.au (Philipp Bayer)
Date: Tue, 31 May 2016 14:37:46 +0800
Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations
Message-ID: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>

Hello,

I have a minor question about the way MAKER joins annotations from different chunks when using MPI.

Let's say I have a longer gene that bridges two chunks, so the jobs annotating both chunks separately would return two incomplete genes, one without a stop codon, one without a start codon. I assume MAKER would then join those two into a single gene, right? Is this behaviour influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl?
Thank you

Philipp Bayer

From carsonhh at gmail.com Tue May 31 09:51:34 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 31 May 2016 09:51:34 -0600
Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations
In-Reply-To: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>
References: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>
Message-ID: <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com>

Annotations never actually cross a chunk boundary because the boundaries are not fixed. It's much more complicated than that, but basically we know from the alignment scoring model the maximum distance at which an HSP can occur and still be included in the alignment. This means that I know precisely whether there is a chance that an alignment may include another part when it occurs near the edge of a BLASTed sequence. When there is a chance, the sequence gets extended and everything is realigned (de novo) using the extended sequence, which can include an entire neighboring chunk. This is a very fast operation since only the known hits are aligned rather than the whole database. So think of it more like a dynamic window rather than a fixed boundary. Results are then sorted and serialized to disk. Also, the initial BLAST is done with very permissive parameters and overlapping sequence boundaries, so extremely low scoring partial alignments are enough to trigger an extension and realignment (we know beforehand the minimum sequence length needed to generate a given alignment score and can extrapolate the maximum theoretical score given a yet-to-be-generated extension).

The serialized alignments then get clustered across the entire length of the contig (not just within a chunk), and clusters are annotated one at a time. Think of it like a linear walk down the contig through the serialized features, clustering as you go. Every time alignments stop being added to a cluster and that cluster ends, it can be annotated as a self-contained unit.
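
The linear walk described above can be illustrated with a small sketch. This is not MAKER's code (MAKER is written in Perl); it only assumes that features are (start, end) intervals already sorted by start coordinate over the whole contig, and `walk_clusters` and `max_gap` are hypothetical names introduced here for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Interval = Tuple[int, int]  # (start, end) on one contig; illustrative only


def walk_clusters(features: Iterable[Interval], max_gap: int = 0) -> Iterator[List[Interval]]:
    """Yield clusters of overlapping (or nearly adjacent) intervals.

    `features` must be sorted by start coordinate. `max_gap` is a
    hypothetical tolerance for this sketch, not a MAKER option.
    """
    cluster: List[Interval] = []
    cluster_end = 0
    for start, end in features:
        if cluster and start > cluster_end + max_gap:
            # Nothing further can join this cluster, so it is complete
            # and can be handed off as a self-contained unit.
            yield cluster
            cluster = []
        # Track the rightmost coordinate covered by the open cluster.
        cluster_end = end if not cluster else max(cluster_end, end)
        cluster.append((start, end))
    if cluster:
        yield cluster


# Features come from the whole contig, so no chunk boundary appears here.
hits = [(100, 500), (450, 900), (5000, 5400), (5300, 6000), (20000, 20100)]
clusters = list(walk_clusters(hits))
```

Because the walk runs over features from the entire contig, a cluster is closed only when no later feature can reach it, which is the sense in which nothing is ever split at a chunk edge.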
This is why shared storage is required for MAKER. So MAKER never joins the genes, as they were never called in a way where they could be split in the first place.

The split_hit parameter affects clustering as well as the alignment model for how far away an HSP can be and still be considered part of the same alignment (long unpolished alignments with gaps longer than this will be broken into two separate alignments). pred_flank also affects clustering slightly, but its primary effect is the generation of flanking sequence around the current cluster boundaries (clusters include all alignments as well as ab initio predictions, so the flank is added to those existing boundaries).

The reason you may get models without a start or stop codon is that the HMMs in predictors like SNAP and Augustus pick the highest likelihood path regardless, not because of a chunk split. Also, all ab initio calls are part of the cluster, so it is never trimmed in a way that a cluster boundary falls part way across one of those models.

--Carson

From philipp.bayer at uwa.edu.au Tue May 31 19:57:10 2016
From: philipp.bayer at uwa.edu.au (Philipp Bayer)
Date: Wed, 1 Jun 2016 09:57:10 +0800
Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations
In-Reply-To: <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com>
References: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au> <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com>
Message-ID:

Hello,

Thank you very much for your detailed answer! It looks like I had misinterpreted some details of the program. This is very helpful, thank you!

Cheers

Philipp

From platycerus at gmail.com Thu May 12 15:49:51 2016
From: platycerus at gmail.com (Ray Cui)
Date: Thu, 12 May 2016 23:49:51 +0200
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

Dear Yugui,

I had the same problem with openmpi. I think it is not compatible with MAKER. I now use mpich, which works.

Ray

On May 12, 2016 11:32 PM, "Yugui Wang" wrote:

> Hi.
>
> Segmentation fault of MAKER with openmpi on CentOS 7.2.
> Both MAKER 2.31.8 and 3.00.0 beta have the same error.
>
> $ mpirun -mca btl ^openib -n 4 maker
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 39507 on node T620 exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> $ file core.39505
> core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make
> $ gdb /usr/bin/perl core.39505
> (gdb) where
> #0 0x00007f0e4a7d2060 in ??
()
> #1
> #2 0x00007f0e4a7d2060 in ?? ()
> #3
> #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so
> #5 0x00007f0e63ec8eda in opal_progress () from /usr/lib64/openmpi/lib/libopen-pal.so.13
> #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so
> #7 0x00007f0e65b0dc06 in PMPI_Probe () from /usr/lib64/openmpi/lib/libmpi.so
> #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf@entry=0x4146b30, source=source@entry=-1, tag=tag@entry=1111) at MPI.xs:56
> #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv (my_perl=, cv=) at MPI.c:391
> #10 0x00007f0e657ce39f in Perl_pp_entersub () from /usr/lib64/perl5/CORE/libperl.so
> #11 0x00007f0e657c6b16 in Perl_runops_standard () from /usr/lib64/perl5/CORE/libperl.so
> #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
> #13 0x0000000000400d99 in main ()
> $ echo $LD_PRELOAD
> /usr/lib64/openmpi/lib/libmpi.so:
> $ echo $OMPI_MCA_mpi_warn_on_fork
> 0
> $ rpm -qa openmpi
> openmpi-1.10.0-10.el7.x86_64
> $ uname -a
> Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> $ ulimit -a
> core file size (blocks, -c) unlimited
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 1029973
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 102400
> cpu time (seconds, -t) unlimited
> max user processes (-u) 4096
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> $ mpiexec --version
> mpiexec (OpenRTE) 1.10.0
>
> Report bugs to http://www.open-mpi.org/community/help/
> $
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From xvazquezc at gmail.com Thu May 12 18:31:55 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 13 May 2016 10:31:55 +1000
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

Check this thread: https://groups.google.com/forum/#!topic/maker-devel/vp8R06VVQGQ

On 26 April 2016 at 02:20, Misner, Ian (NIH/NIAID) [C] wrote:

> Hello,
>
> Are there any guidelines for using BUSCO to help train MAKER? CEGMA has
> been discontinued but I used to use the cegma2zff.pl steps to use those
> proteins as a training step. BUSCO seems to train Augustus but I'm not sure
> what file to pass from BUSCO to MAKER for this to be properly utilized. I
> didn't see anything specific about this in the archives.
> -----
>
> *Ian Misner, Ph.D.*
> Computational Genomics Specialist
> Contractor, Medical Science and Computing, Inc.
> Bioinformatics and Computational Biosciences Branch (BCBB)
> NIH/NIAID/OD/OSMO/OCICB
> 5601 Fishers Lane, Room 4A59
> Rockville, MD 20892
> Office: 301-761-6208
> Mobile: 301-704-0151
> Email: ian.misner at nih.gov
> Web: BCBB Home Page
> Twitter: @NIAIDBioIT
>
> Disclaimer: The information in this e-mail and any of its attachments is
> confidential and may contain sensitive information. It should not be used
> by anyone who is not the original intended recipient. If you have received
> this e-mail in error please inform the sender and delete it from your
> mailbox or any other storage devices. National Institute of Allergy and
> Infectious Diseases shall not accept liability for any statements made that
> are sender's own and not expressly made on behalf of the NIAID by one of
> its representatives.
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA

From xvazquezc at gmail.com Thu May 12 18:37:03 2016
From: xvazquezc at gmail.com (Xabier Vázquez Campos)
Date: Fri, 13 May 2016 10:37:03 +1000
Subject: [maker-devel] Reformat maker gff3
In-Reply-To: <1460516670248.1644@uq.edu.au>
References: <1460516670248.1644@uq.edu.au>
Message-ID:

Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too.

On 13 April 2016 at 13:05, Jenny Lee wrote:

> Hi all,
>
> I would like to update my maker gff3 file to only contain the genes I've
> decided to keep - all maker genes, a subset of abinitio genes (which
> have interproscan hits). I would like to also exclude the repeats
> information and only retain the CDS, gene, exon and mRNA - like the
> format we usually see in published data.
>
> I've been trying to do this manually and it gets messy. Any ideas?
>
> Thanks a lot.
>
> Regards,
>
> Jenny Lee
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

--
Xabier Vázquez-Campos, *PhD*
*Research Associate*
Water Research Centre
School of Civil and Environmental Engineering
The University of New South Wales
Sydney NSW 2052 AUSTRALIA
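
The filtering suggested in the "Reformat maker gff3" reply above can also be sketched in a few lines of Python, which may be easier to extend than a grep/cut pipeline. This is only a sketch under stated assumptions: the input is standard 9-column, tab-separated GFF3 with the feature type in column 3; `filter_gff3` and `KEEP_TYPES` are hypothetical names; and the sample rows (including the source names in column 2) are made up for illustration. It does not handle selecting a subset of gene IDs.

```python
import io
from typing import Iterable, Iterator

# Feature types to retain, as listed in the question (an assumption of this sketch).
KEEP_TYPES = frozenset({"gene", "mRNA", "exon", "CDS"})


def filter_gff3(lines: Iterable[str], keep_types=KEEP_TYPES) -> Iterator[str]:
    """Yield GFF3 comment/directive lines and rows whose type (column 3) is kept."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section, if present, ends the feature table
        if line.startswith("#"):
            yield line  # keep directives such as ##gff-version
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] in keep_types:
            yield line


# Made-up example input; real MAKER output will have many more feature types.
example = io.StringIO(
    "##gff-version 3\n"
    "contig1\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n"
    "contig1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1\n"
    "contig1\trepeatmasker\tmatch\t1200\t1300\t.\t+\t.\tID=r1\n"
    "contig1\tmaker\tCDS\t150\t800\t.\t+\t0\tID=g1-cds;Parent=g1-mRNA-1\n"
)
kept = list(filter_gff3(example))
```

To use it on a real file, the same function can be fed an open file handle and its output written line by line to a new file.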
From panos.ioannidis at gmail.com Fri May 13 01:56:58 2016
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Fri, 13 May 2016 09:56:58 +0200
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

Hello Ian,

Xabier is right. You have to run BUSCO with the --long switch and then, in the maker_opts.ctl file, you should point the augustus_species variable to your trained species (i.e. the name you pass with the -o/-a parameter).

So, in Xabier's example your maker_opts.ctl file should contain the following line:

augustus_species=Genus_species

Felipe, Rob, is there something else that I'm missing? Truth is that I haven't run this recently and there might be differences in newer BUSCO versions.

Panos

Panos Ioannidis, PhD
Postdoctoral researcher
Computational Evolutionary Genomics Group
University of Geneva

From fdolze at students.uni-mainz.de Fri May 13 04:14:57 2016
From: fdolze at students.uni-mainz.de (Dolze, Florian)
Date: Fri, 13 May 2016 12:14:57 +0200
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de>

On a somewhat related note, is there an advantage to using BUSCO to train Augustus instead of the provided Augustus webtraining service? Does anybody know how the two compare?
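
The BUSCO-to-MAKER hand-off discussed in this thread (a trained species folder under the Augustus 'species' directory, and an augustus_species line in maker_opts.ctl) can be sanity-checked with a short script. This is a sketch under assumptions, not part of MAKER or BUSCO: `read_ctl_value` and `species_is_trained` are hypothetical helpers, the control file is assumed to use `key=value` lines with optional `#` comments, and the demonstration builds a throwaway directory layout instead of touching a real installation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_ctl_value(ctl_path: Path, key: str) -> Optional[str]:
    """Return the value of `key` from a key=value control file, or None if absent."""
    for line in ctl_path.read_text().splitlines():
        setting = line.split("#", 1)[0].strip()  # drop any trailing comment
        if setting.startswith(key + "="):
            return setting.split("=", 1)[1].strip()
    return None


def species_is_trained(config_dir: Path, species: str) -> bool:
    """True if <config_dir>/species/<species> exists as a directory."""
    return (config_dir / "species" / species).is_dir()


# Self-contained demonstration with a temporary, made-up layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config" / "species" / "Genus_species").mkdir(parents=True)
    ctl = root / "maker_opts.ctl"
    ctl.write_text("augustus_species=Genus_species #Augustus gene prediction species model\n")

    species = read_ctl_value(ctl, "augustus_species")
    trained = species is not None and species_is_trained(root / "config", species)
```

On a real system the configuration directory would typically be the one Augustus itself uses (commonly exposed through the AUGUSTUS_CONFIG_PATH environment variable), but that location is an assumption to verify locally.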
From robert.waterhouse at gmail.com Fri May 13 02:28:00 2016
From: robert.waterhouse at gmail.com (Robert Waterhouse)
Date: Fri, 13 May 2016 10:28:00 +0200
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

I think in the Augustus 'species' directory there should be a new folder named according to your BUSCO run, and in that folder should be the trained parameters for your new species, so from MAKER I guess you can point to these trained parameters.

Rob

Dr Robert Waterhouse
SIB maître assistant
www.rmwaterhouse.org
A maturing understanding of the composition of the insect gene repertoire, COIS 2015
BUSCO: assessing genome assembly and annotation completeness, Bioinformatics 2015

From robert.waterhouse at gmail.com Fri May 13 06:54:40 2016
From: robert.waterhouse at gmail.com (Robert Waterhouse)
Date: Fri, 13 May 2016 14:54:40 +0200
Subject: [maker-devel] BUSCO
In-Reply-To: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de>
References: <309ba52c-a99e-9397-f6c2-485a68628571@students.uni-mainz.de>
Message-ID:

I would guess that the main 'advantage' of using BUSCO to train Augustus is that one will probably run BUSCO on one's genome anyway before starting MAKER, so there will already be a useful set of trained parameters ready to use. I guess the 'advantage' of using the Augustus webtraining service is that one could give it much more starting data (if indeed this is available, e.g. cDNAs). Indeed, if there was enough time and it made a substantial difference, one might even use the BUSCO gene model output as the 'Training gene structure file' for the Augustus webtraining service. I don't believe that anyone has done a comparison of how different the trained parameters end up being.

Rob
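Archive note: Robert Waterhouse's point about the Augustus 'species' directory suggests a quick check before starting MAKER. The sketch below assumes that Augustus looks up parameters under the directory named by the AUGUSTUS_CONFIG_PATH environment variable, and that the folder name equals the value given to augustus_species. The helper name has_augustus_species, the scratch directory, and the species name are all hypothetical; the folder name that a given BUSCO version actually writes is not stated in the thread and should be checked by hand.

```shell
# Succeeds if a species folder exists under <config_dir>/species.
# Usage: has_augustus_species NAME [CONFIG_DIR]; CONFIG_DIR defaults to $AUGUSTUS_CONFIG_PATH.
has_augustus_species() {
    config_dir="${2:-${AUGUSTUS_CONFIG_PATH:-}}"
    [ -n "$config_dir" ] && [ -d "$config_dir/species/$1" ]
}

# Demo on a scratch directory that mimics the Augustus config layout.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/species/Genus_species"

if has_augustus_species Genus_species "$demo_dir"; then
    echo "found parameters for Genus_species"
else
    echo "no parameters for Genus_species"
fi
```

Calling the helper with only a species name checks the real Augustus installation, provided AUGUSTUS_CONFIG_PATH is set in the environment.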
From cjfields at illinois.edu Fri May 13 10:25:56 2016
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 13 May 2016 16:25:56 +0000
Subject: [maker-devel] BUSCO
In-Reply-To:
References:
Message-ID:

Our group has mainly used the BUSCO model in the "bootstrap" run for MAKER, then retrained Augustus and SNAP using a filtered data set from that run for new rounds of MAKER.

Also, one personal observation: we have found some genome assemblies where BUSCO performs poorly compared to CEGMA (e.g. BUSCO reports a poor overall percentage of single-copy orthologs (SCO) present, while CEGMA reports much higher numbers). We're still delving into this, but in those cases we avoid using the BUSCO model for obvious reasons.

chris
From michael.s.campbell1 at gmail.com Fri May 13 09:34:00 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Fri, 13 May 2016 11:34:00 -0400
Subject: [maker-devel] Reformat maker gff3
In-Reply-To:
References: <1460516670248.1644@uq.edu.au>
Message-ID: <777D8DFF-CB99-4F03-A4CF-8E52F0E4526A@gmail.com>

I've attached a protocols paper that walks through what you are trying to do. Let me know if it helps.

Mike

> On May 12, 2016, at 8:37 PM, Xabier Vázquez Campos wrote:
>
> Can't you filter the file content with the 'grep' command? If you need to extract columns, use 'cut' too.
>
> On 13 April 2016 at 13:05, Jenny Lee wrote:
> Hi all,
>
> I would like to update my maker gff3 file to only contain the genes I've decided to keep - all maker genes, a subset of abinitio genes (which have interproscan hits). I would like to also exclude the repeats information and only retain the CDS, gene, exon and mRNA - like the format we usually see in published data.
>
> I've been trying to do this manually and it gets messy. Any ideas?
>
> Thanks a lot.
>
> Regards,
> Jenny Lee

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bi0411 (1).pdf
Type: application/pdf
Size: 484329 bytes
Desc: not available

From carsonhh at gmail.com Fri May 13 10:32:40 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 10:32:40 -0600
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

With OpenMPI, you must set LD_PRELOAD for libmpi.so and sometimes the '-mca btl' parameter. Details can be found in the .../maker/INSTALL file.

Also, we have found a recent issue with MAKER and Intel-compiled OpenMPI on CentOS systems. To get around that issue, compile OpenMPI with gcc instead of the Intel compiler, or alternatively manually install a separate Perl installation without pthread support (i.e. pthreads disabled during the configure step).

--Carson

> On May 12, 2016, at 3:49 PM, Ray Cui wrote:
>
> Dear Yugui
>
> I had the same problem with openmpi. I think it is not compatible with Maker. I now use mpich, which works.
>
> Ray
>
> On May 12, 2016 11:32 PM, "Yugui Wang" wrote:
> Hi.
>
> Segmentation fault of MKAER with openmpi on CentOS 7.2.
> Both MAKER 2.31.8 and 3.00.0 beta have the same error.
>
> $ mpirun -mca btl ^openib -n 4 maker
> STATUS: Parsing control files...
> STATUS: Processing and indexing input FASTA files...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 2 with PID 39507 on node T620 exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> $ file core.39505
> core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV),
> SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make
> $ gdb /usr/bin/perl core.39505
> (gdb) where
> #0 0x00007f0e4a7d2060 in ?? ()
> #1
> #2 0x00007f0e4a7d2060 in ?? ()
> #3
> #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so
> #5 0x00007f0e63ec8eda in opal_progress () from /usr/lib64/openmpi/lib/libopen-pal.so.13
> #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so
> #7 0x00007f0e65b0dc06 in PMPI_Probe () from /usr/lib64/openmpi/lib/libmpi.so
> #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf@entry=0x4146b30, source=source@entry=-1, tag=tag@entry=1111) at MPI.xs:56
> #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv (my_perl=, cv=) at MPI.c:391
> #10 0x00007f0e657ce39f in Perl_pp_entersub () from /usr/lib64/perl5/CORE/libperl.so
> #11 0x00007f0e657c6b16 in Perl_runops_standard () from /usr/lib64/perl5/CORE/libperl.so
> #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
> #13 0x0000000000400d99 in main ()
> $ echo $LD_PRELOAD
> /usr/lib64/openmpi/lib/libmpi.so:
> $ echo $OMPI_MCA_mpi_warn_on_fork
> 0
> $ rpm -qa openmpi
> openmpi-1.10.0-10.el7.x86_64
> $ uname -a
> Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> $ ulimit -a
> core file size (blocks, -c) unlimited
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 1029973
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 102400
> cpu time (seconds, -t) unlimited
> max user processes (-u) 4096
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> $ mpiexec --version
> mpiexec (OpenRTE) 1.10.0
>
> Report bugs to http://www.open-mpi.org/community/help/
> $

From dave.p.price at gmail.com Fri May 13 10:35:21 2016
From: dave.p.price at gmail.com (David Price)
Date: Fri, 13 May 2016 10:35:21 -0600
Subject: [maker-devel] maker-devel Digest, Vol 96, Issue 10
In-Reply-To:
References:
Message-ID:

Would it be possible to get digest mode set up properly? I have it selected but I get emails for each individual message.

Thanks

From carsonhh at gmail.com Fri May 13 10:46:38 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 13 May 2016 10:46:38 -0600
Subject: [maker-devel] maker-devel Digest, Vol 96, Issue 10
In-Reply-To:
References:
Message-ID: <9ABEE8DB-6316-4CF1-BC46-0DB2C188BC44@gmail.com>

I toggled your digest option off and back on in case that is the issue. The Mailman docs say that on busy days the digest option may decide to send out more than one digest, so that could be the issue too. The company providing our mailing list was having issues the last few weeks, so we weren't able to approve most posts until yesterday. As a result, there was an explosion of approved posts that may have triggered more than one digest per day yesterday and today.

--Carson
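Archive note on the OpenMPI segfault thread: the environment that Carson Holt describes (LD_PRELOAD pointing at libmpi.so, plus an '-mca btl' setting) can be sketched as a dry run. The library path, the OMPI_MCA_mpi_warn_on_fork value, the btl exclusion and the process count are copied from Yugui Wang's report, not from the MAKER INSTALL file, so treat them as one site's values. The run helper is hypothetical and only prints the command; as the thread shows, these settings did not prevent the crash on that system.

```shell
# Path reported in the thread; it will differ between systems.
libmpi="/usr/lib64/openmpi/lib/libmpi.so"

# Dry-run helper (hypothetical): prints the command instead of executing it.
# Replace the body with "$@" to actually launch the job.
run() { echo "$@"; }

# Pass the settings through env so they apply only to this mpirun invocation.
cmd_line="$(run env LD_PRELOAD="$libmpi" OMPI_MCA_mpi_warn_on_fork=0 \
    mpirun -mca btl '^openib' -n 4 maker)"

echo "$cmd_line"
```

Scoping the variables with env, rather than exporting them, keeps LD_PRELOAD from affecting unrelated commands in the same shell session.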
From platycerus at gmail.com Fri May 13 11:08:19 2016
From: platycerus at gmail.com (Ray Cui)
Date: Fri, 13 May 2016 19:08:19 +0200
Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2
In-Reply-To:
References:
Message-ID:

Hello,

I had segfaults even when I set LD_PRELOAD and used gcc for OpenMPI (dealing with MAKER 3 beta though). It works fine with MPICH, so I stopped looking into this.

Ray

>> _______________________________________________
>> maker-devel mailing list
>>
maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 13 11:16:49 2016 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 13 May 2016 11:16:49 -0600 Subject: [maker-devel] Segmentation fault of MKAER with openmpi on CentOS 7.2 In-Reply-To: References: Message-ID: It?s possible it was set wrong as there may be more than one libmpi.so on the system. It also has to be set before compiling and every time you run. The next issue is that some systems (like ubuntu) will often have extra mpicc, libmpi.so, and mpiexec files that don?t match the OpenMPI you are trying to use. Tracking down those mismatches before compiling and ensuring that they don?t revert with your bashrc/bash_profile can be complicated. In these cases you may also have to additionally specify LD_PRELOAD with the -x parameter of the OpenMPI mpiexec command. You often have to specify the ?-mca btl? parameter explained in the INSTALL file as well. ?Carson > On May 13, 2016, at 11:08 AM, Ray Cui wrote: > > Hello, > > I had segfaults even if I set LD_PRELOAD and used gcc for OpenMPI (dealing with Maker 3 beta though). > It works fine with MpiCH so I stopped looking into this. > > Ray > > On Fri, May 13, 2016 at 6:32 PM, Carson Holt > wrote: > With OpenMPI, you must set LD_PRELOAD for libmpi.so and sometimes the ?-mca btl paramter?. Details can be found in the ?/maker/INSTALL file. > > Also we have found a recent issue with maker and intel compiled OpenMPI on CentOS systems. To get around that issue, compile OpenMPI with gcc instead of the intel compiler, or alternatively manually install a separate perl installation without pthread support (i.e. 
pthreads disabled during the configure step). > > ?Carson > > > >> On May 12, 2016, at 3:49 PM, Ray Cui > wrote: >> >> Dear Yugui >> >> I had the same problem with openmpi. I think it is not compatible with Maker. I now use mpich, which works. >> >> Ray >> >> On May 12, 2016 11:32 PM, "Yugui Wang" > wrote: >> Hi. >> >> Segmentation fault of MKAER with openmpi on CentOS 7.2. >> Both MAKER 2.31.8 and 3.00.0 beta have the same error. >> >> $ mpirun -mca btl ^openib -n 4 maker >> STATUS: Parsing control files... >> STATUS: Processing and indexing input FASTA files... >> -------------------------------------------------------------------------- >> mpirun noticed that process rank 2 with PID 39507 on node T620 exited >> on signal 11 (Segmentation fault). >> -------------------------------------------------------------------------- >> $ file core.39505 >> core.39505: ELF 64-bit LSB core file x86-64, version 1 (SYSV), >> SVR4-style, from '/usr/bin/perl /bio/hpc-bio/maker-3.00.0/bin/make >> $ gdb /usr/bin/perl core.39505 >> (gdb) where >> #0 0x00007f0e4a7d2060 in ?? () >> #1 >> #2 0x00007f0e4a7d2060 in ?? 
() >> #3 >> #4 0x00007f0e4bdfba50 in mca_btl_vader_component_progress () from >> /usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so >> #5 0x00007f0e63ec8eda in opal_progress () from >> /usr/lib64/openmpi/lib/libopen-pal.so.13 >> #6 0x00007f0e4a191ac5 in mca_pml_ob1_probe () from >> /usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so >> #7 0x00007f0e65b0dc06 in PMPI_Probe () from /usr/lib64/openmpi/lib/libmpi.so >> #8 0x00007f0e59007020 in C_MPI_Recv (buf=buf at entry=0x4146b30, >> source=source at entry=-1, tag=tag at entry=1111) at MPI.xs:56 >> #9 0x00007f0e590071e3 in XS_Parallel__Application__MPI_C_MPI_Recv >> (my_perl=, cv=) at MPI.c:391 >> #10 0x00007f0e657ce39f in Perl_pp_entersub () from >> /usr/lib64/perl5/CORE/libperl.so >> #11 0x00007f0e657c6b16 in Perl_runops_standard () from >> /usr/lib64/perl5/CORE/libperl.so >> #12 0x00007f0e65763925 in perl_run () from /usr/lib64/perl5/CORE/libperl.so >> #13 0x0000000000400d99 in main () >> $ echo $LD_PRELOAD >> /usr/lib64/openmpi/lib/libmpi.so: >> $ echo $OMPI_MCA_mpi_warn_on_fork >> 0 >> $ rpm -qa openmpi >> openmpi-1.10.0-10.el7.x86_64 >> $ uname -a >> Linux T620 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC >> 2016 x86_64 x86_64 x86_64 GNU/Linux >> $ ulimit -a >> core file size (blocks, -c) unlimited >> data seg size (kbytes, -d) unlimited >> scheduling priority (-e) 0 >> file size (blocks, -f) unlimited >> pending signals (-i) 1029973 >> max locked memory (kbytes, -l) 64 >> max memory size (kbytes, -m) unlimited >> open files (-n) 1024 >> pipe size (512 bytes, -p) 8 >> POSIX message queues (bytes, -q) 819200 >> real-time priority (-r) 0 >> stack size (kbytes, -s) 102400 >> cpu time (seconds, -t) unlimited >> max user processes (-u) 4096 >> virtual memory (kbytes, -v) unlimited >> file locks (-x) unlimited >> $ mpiexec --version >> mpiexec (OpenRTE) 1.10.0 >> >> Report bugs to http://www.open-mpi.org/community/help/ >> $ >> >> _______________________________________________ >> maker-devel mailing list >> 
maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bmoore at genetics.utah.edu Sun May 15 18:57:37 2016 From: bmoore at genetics.utah.edu (Barry Moore) Date: Mon, 16 May 2016 00:57:37 +0000 Subject: [maker-devel] Fwd: Issue with make, no prediction after gff3_merge References: Message-ID: <2B3935BF-1995-4250-8694-92FA0C36A729@genetics.utah.edu> Hi David, First of all apologies for the delay in addressing your e-mail, our mailing list software (provided by an external ISP) has stopped supporting the MailMan software that is running the maker-devel list and the software has been unresponsive to our attempts to add new users or moderate messages. We will handle this message directly through e-mail for now. We have requested a new maker mailing list through our University IT department and that is request pending approval. The new mailing list should get our experience should get our user support back to normal very soon. Can you share a few lines of the GFF files that you passed to est_gff? Thanks Barry Begin forwarded message: From: "LOPEZ, David" > Subject: Issue with make, no prediction after gff3_merge Date: May 3, 2016 at 1:27:03 AM PDT To: "maker-devel-owner at yandell-lab.org" > Dear all, I am still waiting for my registration at maker-devell list hence I send my question as a mail but I will transfer it to the discussion group when possible. I am a commercial licenced user of Maker and I currently currently face some issues running Maker3.00.0 on a PBS cluster with an openMPI 1.10.2 implementation (Which runs great most of the time, but that is not the issue discussed here). 
After successfully testing the datasets provided in the package (dpp and pyu), I moved to my own assembly (140 000 scaffolds, ~14GB, eukaryotic, premasked). I have already made some RNA-seq mappings (GFF) as well as cDNA and proteome from a reference genome.

When I look at the predictions in IGV, it appears to me that only the FASTA evidence is used, not the GFF. In the GFF from gff3_merge, I have blastx, protein2genome and maker predictions as well as est_gff:stringtie, but no est_gff:somethingelse from the cDNA and EST that I fed to MAKER.

Another issue, potentially linked to this problem, is that I wasn't able to use tags in my GFF evidence: MAKER fails to run, reporting that mygff3evidencefile.gff:mylabel was not found, which means it does not interpret the colon correctly.

I have attached my MAKER opts files.

Thanks in advance for your help.

Best regards,
David.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_evm.ctl
Type: application/octet-stream
Size: 911 bytes
Desc: maker_evm.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_exe.ctl
Type: application/octet-stream
Size: 1601 bytes
Desc: maker_exe.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_opts.ctl
Type: application/octet-stream
Size: 5236 bytes
Desc: maker_opts.ctl
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maker_bopts.ctl
Type: application/octet-stream
Size: 1508 bytes
Desc: maker_bopts.ctl
URL:
From clements at galaxyproject.org Mon May 16 11:20:35 2016
From: clements at galaxyproject.org (Dave Clements)
Date: Mon, 16 May 2016 10:20:35 -0700
Subject: [maker-devel] GMOD 2016 Meeting early reg ends May 21; Galaxy Conference Deadlines
Message-ID:

Hello all,

*GMOD will be holding a community meeting on June 30th and July 1st in Bloomington, Indiana, United States.* GMOD Meetings are a mix of user and developer presentations, and are a great place to find out what is happening in the project, what's coming up, and what others are doing.

*Early bird registration ends May 21, this Saturday.*

*For those who would like to present a talk or poster, the meeting registration form includes a section for submitting the presentation title and abstract.*

If you have any suggestions or requests for the meeting, please contact the GMOD help desk.

*GCC2016*

The GMOD Meeting is immediately after the 2016 Galaxy Community Conference (GCC2016), also in Bloomington (and sharing housing and venue). If you are interested in Galaxy, *GCC2016 has a number of deadlines this Friday, May 20*. See below.

Galaxy is a part of the GMOD project and there are several presentations at GCC2016 that cover the GMOD integration:

- Moving data from the warehouse to the workbench: a bridge to Galaxy from the Tripal community genome database software platform, talk presented by Margaret Staton
- Apollo: Collaborative Manual Annotation for Genomic Sequencing Projects, talk presented by Nathan Dunn (Apollo will have a poster and demo)
- Hardwood Genomics Database (HGD): a web portal and database resource for hardwood tree genomic and genetic research, poster presented by Ming Chen and Margaret Staton (posters are not online yet)

More posters and demos are in the works.

Thanks, and hope to see you in Bloomington,

Dave C

---------- Forwarded message ----------
From: Dave Clements
Date: Mon, May 16, 2016 at 9:09 AM
Subject: GCC2016 Deadlines this Friday & Conference schedule
To: Galaxy Announcements List, Galaxy Dev List

Hello all,

This is just a reminder that *there are some key deadlines this Friday, May 20:*

- Early registration ends. After Friday, registration rates go up by over 40%.
- Poster abstracts are due.
- Demo abstracts are due. These are new this year and can complement a poster abstract or stand on their own.

If you are wondering what's happening at GCC2016, the training and conference schedules are now online, featuring 21 accepted talks and 31 training sessions.

Jetstream, IU's newest National Science Foundation-funded project (in which Galaxy is a partner), and the National Center for Genome Analysis Support at IU are sponsoring an opening reception on Monday evening at the IU Cyberinfrastructure Building. The first ever GCC opening reception will feature local wine/beer, morsels from local eateries, and demonstrations of the 15 million+ pixel IQ-Wall, IU's Data Center, Science on a Sphere, and other IU-centric IT.

Hope to see you there,

Dave C

--
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
https://wiki.galaxyproject.org/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carson.holt at genetics.utah.edu Wed May 18 12:33:39 2016
From: carson.holt at genetics.utah.edu (Carson Holt)
Date: Wed, 18 May 2016 18:33:39 +0000
Subject: [maker-devel] [maker] transcripts doesn't provide any help
In-Reply-To:
References: <4B761165-F33F-4BCA-8DE0-B2AF0A6AE771@gmail.com> <3D78FFC2-B0FE-4DCC-A079-0B99CFB6C735@gmail.com> <44AADB65-67F1-4D25-A7C6-C7EE93B9E80B@gmail.com> <43234558-4A22-4B90-A209-CA7FEEF230CF@gmail.com> <9B9328D2-8F16-47C1-8873-F7821637E7FB@gmail.com> <4ACF958F-C6DF-469A-81CE-BF5854D7B8A2@gmail.com>
Message-ID: <573DB6D8-E773-45F5-9F53-DCAA20913EFF@genetics.utah.edu>

Yes. Use top to check CPU usage. If it's not 100% for the machine (or 6400% for all processes: 64 CPUs * 100%), then we can look at whether you are launching the command correctly or have other issues.

--Carson

On May 18, 2016, at 7:16 AM, Michael Campbell wrote:

Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don't think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get a better training than you would with CEGMA output alone.

I've cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I'm also cc'ing the dev list. There may be others in the community who can comment on the run times.

Thanks,
Mike

On May 17, 2016, at 10:10 PM, Pei-Ying Huang wrote:

Hi mike,

My plant genome is about 4Gb, 93789 scaffolds. When I run maker using MPI on a server with 64 cores, only 1% of genome is annotated.
Is it the normal condition?
I read a post saying that it takes about 6 days on 16 processors to finish one round on a ~150,000 scaffold, ~2Gb vertebrate genome with protein evidence.
Based on that post, I expected to get the result in no more than two weeks. However, it seems it will take me more than three months.

Also, I want to get a training parameter set with Augustus. For now I use CEGMA to produce a .gff file, then convert it to augustus.gff with cegma2gff.
Then I autotrain with Augustus; here is my command:
autoAugTrain.pl --genome=GULI.genome.removeAllN.fa --trainingset=augustus.gff --species=A_autoAugTrain_1 &> log

But I saw someone's method below, so I wonder if I am doing it wrong?

"We get the genome.gff3 training set from the output of a first-pass run of MAKER using:
1. EST data
2. Proteins from related species
3. a SNAP model trained using CEGMA
4. a GeneMark model (obtained by running GeneMark.ES on the draft genome)
5. Running maker2zff on the output of MAKER, and converting that to GFF3
Once done, we run MAKER a second time using the Augustus model and more stringent settings."

Thank you.
Pei-Ying

2016-05-18 9:16 GMT+08:00 Michael Campbell:
Hi Pei-Ying,

One of the first places to start with RNA-seq quality control is a tool called FastQC; it will produce a number of graphics that can help identify problematic files. There are a number of tools for quality-trimming reads; Trimmomatic and the FASTX-Toolkit are popular ones.

I would only redo the sequencing if you are convinced that the original sequencing is bad.

Mike

On May 16, 2016, at 8:42 PM, Pei-Ying Huang wrote:

Hi mike,

As you said, the reason I only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.

If the problem is due to RNA-seq data quality, how could I identify the RNA-seq data with bad quality and trim them out?
If the problem is due to the expression profiles of the tissues used for mRNA-seq, should we try to extract RNA from the plant again and redo the sequencing?
Thank you.

Pei-Ying

2016-05-09 22:18 GMT+08:00 Michael Campbell:
I did finish running the test I planned. What I noticed is that there is protein evidence for about 1,000 genes on that scaffold and transcript evidence for only one gene. The reason you only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq.

What you described is what I would do, followed by training Augustus, unless est2genome=1 and protein2genome=0 doesn't generate enough gene models to train the gene finders. In that case I would set est2genome=1 and protein2genome=1 for the first round instead.

Thanks,
Mike

On May 8, 2016, at 10:08 AM, Pei-Ying Huang wrote:

Have you done all of the tests?
How would you suggest I run my data?

To get an ab initio model by setting est2genome=1 and protein2genome=0,
then training with the SNAP model with est2genome=0 and protein2genome=0,
then training a second SNAP model with est2genome=0 and protein2genome=0?

Thank you.

2016-05-07 0:30 GMT+08:00 Michael Campbell:
So far in the tests that I've done I get the same first exon as 5 prime UTR and part of the last exon in 3 prime UTR for that gene.

Mike

On May 5, 2016, at 10:18 PM, Pei-Ying Huang wrote:

Hi Mike,

I found one five_prime_UTR feature, but it is the only one shown on scaff0001.
Does it mean there are no more five_prime_UTR features on this scaffold, or that MAKER doesn't find the others?
Thank you.

GULI.scaff0001 maker gene 3190189 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426
GULI.scaff0001 maker mRNA 3190189 3192302 1262 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;_AED=0.27;_eAED=0.27;_QI=335|0.83|0.71|1|0|0|7|0|308
GULI.scaff0001 maker exon 3190189 3190216 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:6;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3190331 3190656 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:5;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3190818 3190955 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:4;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191233 3191510 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:3;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191634 3191666 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:2;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191755 3191848 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker exon 3191938 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:0;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker five_prime_UTR 3191968 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:five_prime_utr;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191938 3191967 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191755 3191848 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191634 3191666 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3191233 3191510 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190818 3190955 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190331 3190656 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
GULI.scaff0001 maker CDS 3190189 3190216 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1

Pei-Ying

2016-05-06 8:31 GMT+08:00 Pei-Ying Huang:
Hi Mike,

Any clue about the problems? Or is my reasoning wrong?
I judge whether the transcript data help MAKER by checking whether est2genome is shown in column 2 of the MAKER output GFF file.

Thank you.
Pei-Ying

2016-05-05 1:22 GMT+08:00 Pei-Ying Huang:
Hi Mike,

The attached file is the folder I use to run MAKER.
Thank you.

[Attachment: guliRN_L1_v1_mike.tar.gz]

Pei-Ying

2016-05-04 22:54 GMT+08:00 Michael Campbell:
Hi Pei-Ying,

If the sample data didn't produce est2genome lines, then it may be that exonerate is not being called.
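
The column 2 check discussed above can be done by hand with a pager, but a short script makes it harder to miss a source. What follows is only a minimal sketch in Python, not part of MAKER: the function name `count_gff_sources` is made up for illustration, the `maker` line is abbreviated from this thread, and the `est2genome` line is invented to show what a transcript alignment might look like.

```python
from collections import Counter


def count_gff_sources(lines):
    """Count how often each source (column 2) appears among GFF3 feature lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # MAKER GFF3 files may end with an embedded FASTA section; stop there.
        if line == "##FASTA":
            break
        # Skip blank lines and "#" comments/directives such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        # GFF3 is tab-delimited; fall back to whitespace for text pasted from e-mail.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts


# Illustrative input only: the est2genome line is hypothetical.
sample = [
    "##gff-version 3",
    "GULI.scaff0001\tmaker\tgene\t3190189\t3192302\t.\t-\t.\tID=gene-30.426",
    "GULI.scaff0001\test2genome\texpressed_sequence_match\t3191938\t3192302\t.\t-\t.\tID=hypothetical_hit",
]

counts = count_gff_sources(sample)
print(counts["maker"], counts["est2genome"], counts["protein2genome"])
```

On a real run, the same function could be fed an open file handle for the merged GFF3 output. A count of zero for est2genome would be consistent with what Pei-Ying reports, though it would not by itself say whether exonerate failed to run or the transcripts simply did not align.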
Could you send me the maker_exe.ctl file? Your maker_opts.ctl file looks fine. If you have a small test set for your data, like a small scaffold that you know has some StringTie hits on it, you could send it to me if you want, and I can see if I can figure it out from here, if that would be helpful.

Thanks,
Mike

On May 4, 2016, at 12:33 AM, Pei-Ying Huang wrote:

Hi Mike,

basic_protocol_1.tar.gz: I ran the sample data following Basic Protocol 1 in the attached protocol paper, which uses the drosophila data bundled with MAKER.
I still can't find est2genome in column 2 of the GFF file, and there is no five_prime_UTR or three_prime_UTR in column 3.

I use StringTie to align paired-end reads to the genome, then use cufflinks2gff3 to generate the .gff file for MAKER input.
Since I have three conditions (root, stem, leaf), I got Root_strtie.gff,Stem_strtie.gff, R_strtie.gff as MAKER inputs.
Should I merge Root_strtie.gff,Stem_strtie.gff, R_strtie.gff into strtie_merge.gff before input to MAKER?

When I try to use cufflinks2gff3 to convert strtie_merge.gtf to strtie_merge.gff, it shows the error messages below.
/home/pyh/bin/maker/bin/cufflinks2gff3 strtie_merge.gtf > strtie_merge.gff
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221531.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221532.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221533.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221534.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221535.
Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221536.

[Attachment: maker1.log]
[Attachment: maker_opts.log]

less A_guli_1.all.gff
GULI.scaff0001 maker gene 1750118 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37
GULI.scaff0001 maker mRNA 1750118 1755997 5292 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;_AED=0.37;_eAED=0.37;_QI=0|0|0|1|0|0|7|0|1764
GULI.scaff0001 maker exon 1750118 1750214 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:21;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1750304 1750815 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:20;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1750896 1751717 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:19;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1751849 1752373 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:18;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1752515 1753488 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:17;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1753554 1754406 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:16;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker exon 1754489 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:15;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1754489 1755997 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1753554 1754406 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1752515 1753488 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1751849 1752373 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1750896 1751717 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1750304 1750815 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
GULI.scaff0001 maker CDS 1750118 1750214 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1

Thank you.
Pei-Ying

2016-04-14 21:09 GMT+08:00 Michael Campbell:
It is strange for transcripts from the species of interest to not align or help. That FASTA entry looks okay. Did you save the error output from MAKER? If you did, could you send it to me along with the MAKER control files? There may be some clues in there. It would also be good if you could run MAKER on the sample data from drosophila in the /data folder in MAKER. This way we can see whether it is your data or your install of MAKER. Basic Protocol 1 in the attached protocol paper uses the drosophila data bundled with MAKER.

Aligning with hisat2 and using cufflinks to make transcripts should work.
StringTie seems to have higher specificity than cufflinks, and the cufflinks2gff3 script works on StringTie output as well. You could also do a de novo assembly of the reads yourself using Trinity, which has worked well for me in the past.

Protein evidence only will give a reasonable annotation. The transcript data will help in annotating UTRs and species-specific genes. The attached protocol paper also addresses your quality question to an extent.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From michael.s.campbell1 at gmail.com Wed May 18 07:16:05 2016
From: michael.s.campbell1 at gmail.com (Michael Campbell)
Date: Wed, 18 May 2016 09:16:05 -0400
Subject: [maker-devel] [maker] transcripts doesn't provide any help
In-Reply-To:
References: <4B761165-F33F-4BCA-8DE0-B2AF0A6AE771@gmail.com> <3D78FFC2-B0FE-4DCC-A079-0B99CFB6C735@gmail.com> <44AADB65-67F1-4D25-A7C6-C7EE93B9E80B@gmail.com> <43234558-4A22-4B90-A209-CA7FEEF230CF@gmail.com> <9B9328D2-8F16-47C1-8873-F7821637E7FB@gmail.com> <4ACF958F-C6DF-469A-81CE-BF5854D7B8A2@gmail.com>
Message-ID:

Hi Pei-Ying,

The time it takes to run MAKER is hard to guess because it depends on the size of the genome and the amount of evidence you give it. However, there may be more going on. Can you tell if MAKER is using all of the cores that you gave it?

For training Augustus, there are several options. Using the CEGMA output is a common method. Given that your genome is a 4 Gb plant genome, I don't think GeneMark will perform well. If you used the steps you mentioned below but left GeneMark out, you may get a better training than you would with CEGMA output alone.

I've cc'd Carson Holt; he has much more experience with the MPI aspects of MAKER and may have some additional insights. I'm also cc'ing the dev list. There may be others in the community who can comment on the run times.
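
One rough way to answer the question of whether all cores are in use, without watching top interactively, is to sum the %CPU column of a process listing. The sketch below is an illustration under stated assumptions, not a MAKER utility: it expects text in the two-column layout produced by `ps -eo pcpu,args`, the function names and the sample listing are invented, and matching on the keyword "maker" is a simplification, since the BLAST or exonerate child processes that MAKER launches may not carry that word in their command lines.

```python
def total_cpu_percent(ps_output, keyword="maker"):
    """Sum the %CPU column for processes whose command line contains keyword."""
    total = 0.0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # first field is %CPU, the rest is the command
        if len(parts) != 2:
            continue
        pcpu, command = parts
        try:
            value = float(pcpu)
        except ValueError:
            continue  # header line such as "%CPU COMMAND"
        if keyword in command:
            total += value
    return total


def core_utilization(ps_output, n_cores, keyword="maker"):
    """Fraction of the machine in use, where 1.0 means n_cores * 100% CPU."""
    return total_cpu_percent(ps_output, keyword) / (100.0 * n_cores)


# Invented listing in the layout of `ps -eo pcpu,args`; paths are placeholders.
sample = """%CPU COMMAND
99.0 /usr/bin/perl /path/to/maker
98.5 /usr/bin/perl /path/to/maker
 0.3 sshd: user
"""

print(total_cpu_percent(sample))
print(core_utilization(sample, n_cores=64))
```

With 64 cores, a fully loaded machine would approach a total of 6400%, so a utilization far below 1.0 would suggest that the MPI launch is not spreading work across the cores.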

Thanks,
Mike

> On May 17, 2016, at 10:10 PM, Pei-Ying Huang wrote:
>
> Hi mike,
>
> My plant genome is about 4Gb, 93789 scaffolds. When I run maker using MPI on a server with 64 cores, only 1% of genome is annotated.
> Is it the normal condition? I read a post saying that it takes about 6 days on 16 processors to finish one round on a ~150,000 scaffold, ~2Gb vertebrate genome with protein evidence.
> Based on that post, I expected to get the result in no more than two weeks. However, it seems it will take me more than three months.
>
> Also, I want to get a training parameter set with Augustus. For now I use CEGMA to produce a .gff file, then convert it to augustus.gff with cegma2gff.
> Then I autotrain with Augustus; here is my command:
> autoAugTrain.pl --genome=GULI.genome.removeAllN.fa --trainingset=augustus.gff --species=A_autoAugTrain_1 &> log
>
>
> But I saw someone's method below, so I wonder if I am doing it wrong?
>
> "We get the genome.gff3 training set from the output of a first-pass run of MAKER using:
> 1. EST data
> 2. Proteins from related species
> 3. a SNAP model trained using CEGMA
> 4. a GeneMark model (obtained by running GeneMark.ES on the draft genome)
> 5. Running maker2zff on the output of MAKER, and converting that to GFF3
> Once done, we run MAKER a second time using the Augustus model and more stringent settings."
>
> Thank you.
> Pei-Ying
>
>
> 2016-05-18 9:16 GMT+08:00 Michael Campbell:
> Hi Pei-Ying,
>
> One of the first places to start with RNA-seq quality control is a tool called FastQC; it will produce a number of graphics that can help identify problematic files. There are a number of tools for quality-trimming reads; Trimmomatic and the FASTX-Toolkit are popular ones.
>
> I would only redo the sequencing if you are convinced that the original sequencing is bad.
> > Mike > > >> On May 16, 2016, at 8:42 PM, Pei-Ying Huang > wrote: >> >> Hi mike, >> >> As you said the reason I only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq. >> >> If the problem is due to RNA-seq data quality, how could I identify the RNA-seq data with bad quality and trim them out? >> If the problem is due to expression profiles of the tissues used for mRNA-seq, should we try to extract RNA from the plant again and redo the sequencing? >> Thank you. >> >> Pei-Ying >> >> 2016-05-09 22:18 GMT+08:00 Michael Campbell >: >> I did finish running the test I planned. What I noticed is that there is protein evidence for about 1,000 genes on that scaffold and transcript evidence for only one gene. The reason you only get one gene with the transcript evidence is independent of MAKER and could be RNA-seq data quality or the expression profiles of the tissues used for mRNA-seq. >> >> What you described is what I would do. Followed by training augustus. Unless est2genome=1 and prtein2genome=0 doesn?t generate enough gene models to train the gene finders. Then I would set est2genome=1 and protein2genome=1 for the first round instead. >> >> Thanks, >> Mike >>> On May 8, 2016, at 10:08 AM, Pei-Ying Huang > wrote: >>> >>> Have you done all of the test? >>> What would you suggest me to run my data? >>> >>> To get ab initio model by setting the est2genome =1 and protein2genome = 0, >>> then training with sanp model with est2genome = 0 and protein2genome = 0, >>> training second snap model with est2genome = 0 and protein2genome = 0. >>> >>> Thank you. >>> >>> 2016-05-07 0:30 GMT+08:00 Michael Campbell >: >>> So far in the tests that I?ve done I get the same first exon as 5 prime UTR and part of the last exon in 3 prime UTR for that gene. 
>>> Mike
>>>> On May 5, 2016, at 10:18 PM, Pei-Ying Huang wrote:
>>>>
>>>> Hi Mike,
>>>>
>>>> I found one five_prime_UTR feature, but it is the only one on scaff0001.
>>>> Does that mean there are no more five_prime_UTR features on this scaffold, or that MAKER doesn't find the others?
>>>> Thank you.
>>>>
>>>> GULI.scaff0001 maker gene 3190189 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426
>>>> GULI.scaff0001 maker mRNA 3190189 3192302 1262 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1;_AED=0.27;_eAED=0.27;_QI=335|0.83|0.71|1|0|0|7|0|308
>>>> GULI.scaff0001 maker exon 3190189 3190216 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:6;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker exon 3190331 3190656 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:5;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker exon 3190818 3190955 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:4;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker exon 3191233 3191510 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:3;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker exon 3191634 3191666 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:2;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker exon 3191755 3191848 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker exon 3191938 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:exon:0;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker five_prime_UTR 3191968 3192302 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:five_prime_utr;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker CDS 3191938 3191967 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker CDS 3191755 3191848 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker CDS 3191634 3191666 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker CDS 3191233 3191510 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker CDS 3190818 3190955 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker CDS 3190331 3190656 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>> GULI.scaff0001 maker CDS 3190189 3190216 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-30.426-mRNA-1
>>>>
>>>> Pei-Ying
>>>>
>>>> 2016-05-06 8:31 GMT+08:00 Pei-Ying Huang:
>>>> Hi Mike,
>>>>
>>>> Any clue about the problems?
>>>> Or is my reasoning wrong? I judge whether the transcript data helped MAKER by checking whether est2genome appears in column 2 of the MAKER output GFF file.
>>>> Thank you.
>>>>
>>>> Pei-Ying
>>>>
>>>> 2016-05-05 1:22 GMT+08:00 Pei-Ying Huang:
>>>> Hi Mike,
>>>>
>>>> The attached file is the folder I use to run MAKER. Thank you.
>>>>
>>>> guliRN_L1_v1_mike.tar.gz
>>>> Pei-Ying
>>>>
>>>> 2016-05-04 22:54 GMT+08:00 Michael Campbell:
>>>> Hi Pei-Ying,
>>>>
>>>> If the run on the sample data didn't produce est2genome lines, then it may be that exonerate is not being called. Could you send me the maker_exe.ctl file?
>>>>
>>>> Your maker_opts.ctl file looks fine.
>>>>
>>>> If you have a small test set for your data, like a small scaffold that you know has some StringTie hits on it, you could send it to me and I can see if I can figure it out from here, if that would be helpful.
>>>>
>>>> Thanks,
>>>> Mike
>>>>> On May 4, 2016, at 12:33 AM, Pei-Ying Huang wrote:
>>>>>
>>>>> Hi Mike,
>>>>>
>>>>> basic_protocol_1.tar.gz: I ran the sample data following Basic Protocol 1 in the attached protocol paper, which uses the Drosophila data bundled with MAKER.
>>>>>
>>>>> I still can't find est2genome in column 2 of the GFF file, and there is no five_prime_UTR or three_prime_UTR in column 3.
>>>>> I use StringTie to align paired-end reads to the genome, then use cufflinks2gff3 to generate the .gff file for MAKER input.
>>>>> Since I have three conditions (root, stem, leaf), I got Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff as MAKER inputs.
>>>>>
>>>>> Should I merge Root_strtie.gff, Stem_strtie.gff, and R_strtie.gff into strtie_merge.gff before giving them to MAKER?
>>>>> When I try to use cufflinks2gff3 to convert strtie_merge.gtf to strtie_merge.gff, it shows the error messages below.
>>>>>
>>>>> /home/pyh/bin/maker/bin/cufflinks2gff3 strtie_merge.gtf > strtie_merge.gff
>>>>>
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221531.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221532.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221533.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221534.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221535.
>>>>> Use of uninitialized value $score in join or string at /home/pyh/bin/maker/bin/cufflinks2gff3 line 94, line 221536.
>>>>>
>>>>> maker1.log
>>>>> maker_opts.log
>>>>>
>>>>> less A_guli_1.all.gff
>>>>> GULI.scaff0001 maker gene 1750118 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37
>>>>> GULI.scaff0001 maker mRNA 1750118 1755997 5292 - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37;Name=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1;_AED=0.37;_eAED=0.37;_QI=0|0|0|1|0|0|7|0|1764
>>>>> GULI.scaff0001 maker exon 1750118 1750214 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:21;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker exon 1750304 1750815 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:20;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker exon 1750896 1751717 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:19;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker exon 1751849 1752373 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:18;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker exon 1752515 1753488 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:17;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker exon 1753554 1754406 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:16;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker exon 1754489 1755997 . - . ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:exon:15;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker CDS 1754489 1755997 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker CDS 1753554 1754406 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker CDS 1752515 1753488 . - 2 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker CDS 1751849 1752373 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker CDS 1750896 1751717 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker CDS 1750304 1750815 . - 0 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>> GULI.scaff0001 maker CDS 1750118 1750214 . - 1 ID=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1:cds;Parent=maker-GULI.scaff0001-exonerate_protein2genome-gene-17.37-mRNA-1
>>>>>
>>>>> Thank you.
>>>>> Pei-Ying
>>>>>
>>>>> 2016-04-14 21:09 GMT+08:00 Michael Campbell:
>>>>> It is strange for transcripts from the species of interest not to align or help. That FASTA entry looks okay. Did you save the error output from MAKER? If you did, could you send it to me along with the MAKER control files? There may be some clues in there.
>>>>>
>>>>> It would also be good if you could run MAKER on the Drosophila sample data in the /data folder in MAKER. This way we can see whether the problem is your data or your install of MAKER. Basic Protocol 1 in the attached protocol paper uses the Drosophila data bundled with MAKER.
>>>>>
>>>>> Aligning with HISAT2 and using Cufflinks to make transcripts should work. StringTie seems to have higher specificity than Cufflinks, and the cufflinks2gff3 script works on StringTie output as well. You could also do a de novo assembly of the reads yourself using Trinity, which has worked well for me in the past.
>>>>>
>>>>> Protein evidence alone will give a reasonable annotation. The transcript data will help in annotating UTRs and species-specific genes.
>>>>>
>>>>> The attached protocol paper also addresses your quality question to an extent.

From carsonhh at gmail.com Tue May 24 11:08:52 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 May 2016 11:08:52 -0600
Subject: [maker-devel] Single exon in GFF file.
In-Reply-To:
References:
Message-ID: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>

Setting single_exon=0 does not tell MAKER to stop calling single-exon genes. It tells MAKER not to use single-exon ESTs as evidence support (the problems with single-exon ESTs are well known, so it is best to exclude them). You will still get single-exon genes from the predictors and single-exon protein alignments from your protein evidence. Every genome is expected to contain a number of single-exon genes (in fact, the most conserved genes across species tend to be single exon; evolutionary selection favors single-exon structure in essential genes).

What you will want to do is look at your contigs in a browser. Depending on the structure of the genes you see and the genes around them, you may conclude that you have insufficient repeat masking (which results in repeats being called as genes). Or you may realize that the contigs in question are prokaryotic (i.e. assembly contamination), which must be resolved upstream of MAKER. Or they are real genes. Remember, every genome is expected to contain single-exon genes.

--Carson

From munholl at uwindsor.ca Tue May 24 11:11:55 2016
From: munholl at uwindsor.ca (Seth Munholland)
Date: Tue, 24 May 2016 13:11:55 -0400
Subject: [maker-devel] MAKER seg faulting
In-Reply-To:
References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com>
Message-ID:

Hi Carson,

Just an update, that was indeed my issue.
Thanks for your help!

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755

From carsonhh at gmail.com Tue May 24 11:12:42 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 May 2016 11:12:42 -0600
Subject: [maker-devel] MAKER seg faulting
In-Reply-To:
References: <68E5831C-37AA-4DBB-9604-EE3F09FD4B39@gmail.com>
Message-ID:

Great to know it's working for you now.

--Carson

From carsonhh at gmail.com Tue May 24 11:14:56 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 24 May 2016 11:14:56 -0600
Subject: [maker-devel] Single exon in GFF file.
In-Reply-To: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>
References: <01B068D1-A9C4-4B69-A6C5-AC06A1534846@gmail.com>
Message-ID: <5DEE67B4-F022-479E-A5C5-97F76FD6601D@gmail.com>

As a side note, many of the newer plant genomes I've worked on have had entire yeast and bacterial genomes sequenced into their assemblies (even as their own separate contigs). It is a common issue that is easily identified by looking at a few of the more gene-dense contigs in a browser like Apollo.

--Carson

From wyim at unr.edu Tue May 24 10:58:41 2016
From: wyim at unr.edu (Won C Yim)
Date: Tue, 24 May 2016 16:58:41 +0000
Subject: [maker-devel] Single exon in GFF file.
Message-ID:

Dear MAKER team,

We have been using MAKER to generate our plant genome annotations.

Even though I set "single_exon=0", there are a lot of single-exon genes according to Eval 2.2.8.

Is there any way to discard single-exon genes?
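
Before deciding whether single-exon models point to weak repeat masking, contamination, or real genes, it can help to list which transcripts have only one exon and which contigs they sit on. The following is a minimal Python sketch, not a MAKER utility. It assumes a tab-delimited GFF3 file in which every `exon` row names its transcript in the `Parent` attribute; the function names and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def exon_counts(gff_lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Count exon rows per (contig, transcript ID) in GFF3 text."""
    counts: Counter = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # anything after this marker is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                counts[(cols[0], parent)] += 1
    return dict(counts)


def single_exon_transcripts(gff_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return sorted (contig, transcript ID) pairs that have exactly one exon."""
    return sorted(key for key, n in exon_counts(gff_lines).items() if n == 1)


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    sample = [
        "##gff-version 3",
        "ctg1\tmaker\texon\t100\t200\t.\t+\t.\tID=t1:exon:0;Parent=t1",
        "ctg1\tmaker\texon\t300\t400\t.\t+\t.\tID=t1:exon:1;Parent=t1",
        "ctg2\tmaker\texon\t50\t900\t.\t-\t.\tID=t2:exon:0;Parent=t2",
    ]
    print(single_exon_transcripts(sample))
```

Tallying the result per contig gives a quick shortlist of gene-dense, single-exon-heavy contigs to open in a browser first.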
Regards,

Won

--
Yim, Won Cheol
MS330/Department of Biochemistry & Molecular Biology
1664 N. Virginia Street
University of Nevada, Reno

email: wyim at unr.edu

From debarryj at gmail.com Thu May 26 13:54:37 2016
From: debarryj at gmail.com (Jeremy DeBarry)
Date: Thu, 26 May 2016 12:54:37 -0700
Subject: [maker-devel] MAKER (v2.31.6) incorrect strand for tRNA-scan (v.1.3.1) predicted exon
Message-ID:

Greetings,

My group has run MAKER on a small genome. One of the annotated tRNAs has an intron. The two exons are annotated on different strands. The gene and first exon are on the + strand and the second exon is on the - strand.

I looked over the archives and found previous reports, but it appears they apply to earlier versions of MAKER.

My instinct is to 'manually' correct the strand information for the - strand exon, but I wanted to investigate the issue further first.

Do you have any insight?

Much appreciated,
Jeremy

--
Dr. Jeremy DeBarry PhD
MaHPIC Data Coordinator
Kissinger Research Group

The University of Georgia
:::
Email: debarryj at gmail.com
Tel: +1.912.269.0484
Skype ID: jdebarry
:::
Nihil Sine Labore!:::Nec Aspera Terrent!:::Boutez-en-Avant!

From carsonhh at gmail.com Fri May 27 08:15:47 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 27 May 2016 08:15:47 -0600
Subject: [maker-devel] MAKER (v2.31.6) incorrect strand for tRNA-scan (v.1.3.1) predicted exon
In-Reply-To:
References:
Message-ID:

Make sure you're using the current version of MAKER. That same thread mentions it was fixed once they updated from 2.31.3. The current version is 2.31.8.

--Carson

From philipp.bayer at uwa.edu.au Tue May 31 00:37:46 2016
From: philipp.bayer at uwa.edu.au (Philipp Bayer)
Date: Tue, 31 May 2016 14:37:46 +0800
Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations
Message-ID: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>

Hello,

I have a minor question about the way MAKER joins annotations from different chunks when using MPI.

Let's say I have a longer gene that bridges two chunks, so the jobs annotating both chunks separately would return two incomplete genes, one without a stop codon, one without a start codon. I assume MAKER would then join those two into a single gene, right? Is this behaviour influenced by the "split_hit" or "pred_flank" parameters in maker_opts.ctl?
Thank you

Philipp Bayer

From carsonhh at gmail.com Tue May 31 09:51:34 2016
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 31 May 2016 09:51:34 -0600
Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations
In-Reply-To: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>
References: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au>
Message-ID: <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com>

Annotations never actually cross a chunk boundary, because the boundaries are not fixed. It's much more complicated than that, but basically we know from the alignment scoring model the maximum distance at which an HSP can occur and still be included in the alignment. This means that I know precisely whether there is a chance that an alignment may include another part when it occurs near the edge of a BLASTed sequence. When there is a chance, the sequence gets extended and everything is realigned (de novo) using the extended sequence, which can include an entire neighboring chunk. This is a very fast operation since only the known hits are being aligned rather than the whole database. So think of it more as a dynamic window than as a fixed boundary. Results are then sorted and serialized to disk. Also, the initial BLAST is done with very permissive parameters and overlapping sequence boundaries, so extremely low-scoring partial alignments are enough to trigger an extension and realignment (we know beforehand the minimum sequence length needed to generate a given alignment score and can extrapolate the maximum theoretical score given a yet-to-be-generated extension).

The serialized alignments then get clustered across the entire length of the contig (not just within a chunk), and clusters are annotated one at a time. Think of it as a linear walk down the contig through the serialized features, clustering as you go. Every time alignments stop being added to a cluster and that cluster ends, it can be annotated as a self-contained unit.
This is why shared storage is required for MAKER. So MAKER never joins the genes, as they were never called in a way where they could be split in the first place.

The split_hit parameter affects clustering as well as the alignment model for how far away an HSP can be and still be considered part of the same alignment (long unpolished alignments with gaps longer than this will be broken into two separate alignments). pred_flank also affects clustering slightly, but its primary effect is the generation of flanking sequence around current cluster boundaries (clusters include all alignments as well as ab initio predictions, so it is added to those existing boundaries).

The reason you may get models without a start or stop codon is that HMMs in predictors like SNAP and Augustus pick the highest-likelihood path regardless, not because of a chunk split. Also, all ab initio calls are part of the cluster, so it is never trimmed in a way that lets a cluster boundary fall partway across one of those models.

--Carson

From philipp.bayer at uwa.edu.au Tue May 31 19:57:10 2016
From: philipp.bayer at uwa.edu.au (Philipp Bayer)
Date: Wed, 1 Jun 2016 09:57:10 +0800
Subject: [maker-devel] Question about MAKER chunks and "neighbouring" annotations
In-Reply-To: <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com>
References: <706ecaa0-59c6-9ad7-0af1-4039a1610e73@uwa.edu.au> <9FFBEABC-A24F-4CF6-8A5C-207E03729DEA@gmail.com>
Message-ID:

Hello,

thank you very much for your detailed answer! It looks like I had misinterpreted some details of the program. This is very helpful, thank you!

Cheers

Philipp
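
The linear walk Carson describes earlier in this thread (sort the alignments along the contig, keep adding to the current cluster while hits still reach it, and close the cluster once nothing more joins) can be illustrated with a short Python sketch. This is not MAKER's code, which is written in Perl. The function name `cluster_hits`, the `(start, end)` tuples, and the `flank` padding (only a loose stand-in for the role `pred_flank` plays) are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Hit = Tuple[int, int]  # (start, end), inclusive coordinates on one contig


def cluster_hits(hits: Iterable[Hit], flank: int = 0) -> List[List[Hit]]:
    """Group hits into clusters with a single left-to-right pass.

    A hit joins the current cluster when its start lies within `flank`
    bases of the cluster's right edge; otherwise the current cluster is
    closed (and could be annotated on its own) and a new one begins.
    """
    clusters: List[List[Hit]] = []
    current: List[Hit] = []
    right_edge: Optional[int] = None

    for start, end in sorted(hits):
        if current and right_edge is not None and start <= right_edge + flank:
            current.append((start, end))
            right_edge = max(right_edge, end)
        else:
            if current:
                clusters.append(current)
            current = [(start, end)]
            right_edge = end
    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Hypothetical alignment coordinates, for illustration only.
    hits = [(100, 250), (200, 400), (1000, 1200), (390, 500), (1150, 1300)]
    for cluster in cluster_hits(hits, flank=50):
        span_start = cluster[0][0]
        span_end = max(end for _, end in cluster)
        print(span_start, span_end, len(cluster))
```

Because each cluster is closed only after the walk has moved past it, no cluster in this sketch depends on where the input happened to be chunked, which is the property the explanation relies on.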