From carsonhh at gmail.com Tue Sep 8 11:12:59 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 8 Sep 2015 10:12:59 -0600
Subject: [maker-devel] AED scores from MAKER pipeline - deterministic or not?
In-Reply-To:
References: <81B272E9-3439-49C6-96E0-674B03F87569@gmail.com>
 <43C687F7-4B40-4E4F-B255-E1D2B9D6D4DC@gmail.com>
Message-ID:

Hi Chia-Yi,

I'm glad to see you found a way around the issue you were seeing. Another solution may be to split your input genome into several separate jobs and run each one separately.

Just out of curiosity, could you send me the results of these two commands?

df -h /tmp
df -h

A GFFDB.pm lock failure generally means either that your working directory is network mounted and MAKER can't detect it, or that /tmp is tmpfs. Both can cause SQLite failures.

Thanks,
Carson

> On Sep 8, 2015, at 9:46 AM, Cheng, Chia-Yi wrote:
>
> Hi Carson,
>
> Thank you for the suggestions. For my previous runs, I've been setting the TMP to a non-NFS location and used 4 or 8 CPUs for MPI. In the MPI log file there is a consistent error,
>
> DBD::SQLite::db selectcol_arrayref failed: database is locked at maker-2.31.8/bin/../lib/GFFDB.pm line 525.
>
> which may be associated with the IO error you pointed out. This is likely caused by the MPI setting in our institute. Therefore, my teammate Vivek suggested running without MPI. It took about a day to run, compared to ~6 hours when using MPI. Yet it did not create any error and the AED from two runs were identical. The command for the successful runs was,
>
> maker -R -quiet -TMP /tmp -fix_nucleotides
>
> It looks like this approach has resolved the issue. Please feel free to post this update to the Google group. Again, thank you for your help.
>
> Best,
> Chia-Yi
>
> From: Carson Holt
> Date: Friday, September 4, 2015 at 2:43 PM
> To: Cheng Chia-Yi
> Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not?
>
> Hi Chia-Yi,
>
> I think I found the issue based on the data difference between the GFF3 files. MAKER uses a number of intermediate files to store data as it progresses (will be in regional chunks). It looks like you had an IO error in one of the runs and one of these files was likely empty (note attached image with circled region where all EST/mRNA data just drops out - only happens in one of the files). It didn't kill the job (NFS errors rarely do - it's one of their optimizations, they always return success and assume it will complete eventually). You can run again with the MAKER -a option to rebuild the data output.
>
> Make sure your TMP= environment variable is not pointing to an NFS mounted location (that would exacerbate issues). You also may need to scale back the number of CPUs you are running using MPI in order to reduce the IO burden.
>
> Thanks,
> Carson
>
>> On Sep 4, 2015, at 9:06 AM, Cheng, Chia-Yi wrote:
>>
>> Hi Carson,
>>
>> Thank you for clearing that up. The two MAKER generated GFF files can now be downloaded from iPlant:
>>
>> http://de.iplantcollaborative.org/dl/d/0C9CBD8F-9B6E-40F1-A2FA-4F7AC7AAE4B5/Chr1.gff.20150831
>> http://de.iplantcollaborative.org/dl/d/4C73FD9D-BE7E-4937-84D5-1D7F32196B67/Chr1.gff.repeat_20150831
>>
>> The control files for these two runs and a list of 818 models with different AED scores are attached to this email.
>>
>> Please let me know if you need any other information. Thank you so much for your help.
>>
>> Best,
>> Chia-Yi
>>
>> From: Carson Holt
>> Date: Thursday, September 3, 2015 at 6:40 PM
>> To: Cheng Chia-Yi
>> Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not?
>>
>> Hi Chia-Yi,
>>
>> What I really need are the MAKER produced GFF3 outputs from both runs (the individual contig files with the fasta at the end). Just Chr1 is sufficient.
>>
>> Thanks,
>> Carson
>>
>>> On Aug 31, 2015, at 10:20 AM, Cheng, Chia-Yi wrote:
>>>
>>> Hi Carson,
>>>
>>> Please find the 1142 gene models with different AED from both runs. Due to the size, please download the annotated GFF3 and fasta files from iPlant,
>>>
>>> http://de.iplantcollaborative.org/dl/d/2C1901E6-7F52-4264-9CB7-AB72CEF6BD67/TAIR10.protein_coding_loci_27415.gff
>>> http://de.iplantcollaborative.org/dl/d/44A6AD38-E408-4DB7-AC32-6689D3D1AC7A/TAIR10.protein_coding_loci_27415.fasta
>>>
>>> The single_exon= was set to zero in both sets. The two runs used identical control files, which are also attached. I thought single_exon= only mattered for generating annotation and didn't realize it would also affect AED calculation.
>>>
>>> Thank you.
>>>
>>> Chia-Yi
>>>
>>> From: Carson Holt
>>> Date: Monday, August 31, 2015 at 11:08 AM
>>> To: Cheng Chia-Yi
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not?
>>>
>>> I would have to see the actual GFF3 files (full file including fasta at the end). Give me both GFF3 files and the coordinates of the gene in question. My first guess is that you had the single_exon= filter set to different values on each run. The gene in question is an unspliced single exon gene (based on the QI), your primary piece of evidence appears to be a single exon EST, and the only value that changes in the QI is the exon overlap. Single exon evidence will be ignored by default for the AED calculation unless you have single_exon set to 1.
>>>
>>> Thanks,
>>> Carson
>>>
>>>> On Aug 31, 2015, at 8:47 AM, Cheng, Chia-Yi wrote:
>>>>
>>>> Hello MAKER team,
>>>>
>>>> We at JCVI have been using MAKER (2.31.8) to calculate the AED of Arabidopsis gene models. We provided the annotation set as "model_gff" with evidence files in "protein_gff" and "est_gff". All the other settings were default. One issue I've noticed was that the AED scores did not seem to be deterministic. When I compare the AED scores from two runs using identical control files, ~1,000 (out of 35,385) gene models had different AED scores. The difference between two sets of AED scores could range from 0.01 to 1.00.
>>>>
>>>> I looked into several gene models with larger differences, i.e. AED = 0.00 in run 1 and AED = 1.00 in run 2, and noticed a disagreement in the QI:
>>>>
>>>> Run 1: _AED=0.00;_eAED=-0.00;_QI=0|-1|0|1|-1|0|1|0|344
>>>> Run 2: _AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|0|344
>>>>
>>>> The discrepancy in the 4th column seemed to suggest the evidence file was not used properly in run 2. I'm not sure what may have caused this, as both runs used the same input. A snapshot of the evidence files is pasted at the end of the email in case it is needed.
>>>>
>>>> Please let me know if more info is needed. Any help is appreciated. Thank you.
>>>>
>>>> Chia-Yi
>>>>
>>>>
>>>> RNA-seq evidence file:
>>>> Chr1	assembler-aerial2_pasa	cDNA_match	3624	5927	.	+	.	ID=aerial2_align_161343;Target=asmbl_1 1082 1234 +%2Casmbl_1 692 1081 +%2Casmbl_1 572 691 +%2Casmbl_1 1 290 +%2Casmbl_1 291 571 +%2Casmbl_1 1235 1723 +
>>>> Chr1	assembler-aerial2_pasa	match_part	3624	3913	.	+	.	ID=aerial2_align_161343-1;Parent=aerial2_align_161343
>>>> Chr1	assembler-aerial2_pasa	match_part	3996	4276	.	+	.	ID=aerial2_align_161343-2;Parent=aerial2_align_161343
>>>>
>>>> EST evidence file:
>>>> Chr1	est2genome	expressed_sequence_match	5470	5899	2150	-	.	ID=Chr1:hit:213:3.2.0.0;Name=gi|19829901|gb|AV795918|RAFL08-19-M04
>>>> Chr1	est2genome	match_part	5470	5899	2150	-	.	ID=Chr1:hsp:500:3.2.0.0;Parent=Chr1:hit:213:3.2.0.0;Target=gi|19829901|gb|AV795918|RAFL08-19-M04 2 431 +;Gap=M430
>>>>
>>>> Protein evidence file:
>>>> Chr1	protein2genome	protein_match	3760	5284	727	+	.	ID=Chr1:hit:202:3.10.0.0;Name=UniRef90_M4EWW1
>>>> Chr1	protein2genome	match_part	3760	3913	727	+	.	ID=Chr1:hsp:488:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1 1 50;Gap=M31 D1 M19 F1
>>>> Chr1	protein2genome	match_part	3996	4276	727	+	.	ID=Chr1:hsp:489:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1 51 144;Gap=R1 M23 D1 M28 D1 M36 I2 M5
>>>>
>>>> _______________________________________________
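[Editorial sketch] The run-to-run comparison described in the original report (the same gene model receiving a different AED in two runs) could be reproduced roughly as follows. This is a minimal Python sketch and not the poster's actual method. It assumes that the score is stored as an _AED= attribute in column 9 of the GFF3, as in the quoted _AED=0.00;_eAED=-0.00;_QI=... lines, and that each scored feature also carries an ID= attribute. The names read_aed and diff_aed are hypothetical, and the sample feature line in the demo is made up for illustration.

```python
import re

# _AED must follow the start of the attribute string or a ';' so that
# the neighbouring _eAED attribute is never matched by mistake.
AED_RE = re.compile(r"(?:^|;)_AED=([^;]+)")
ID_RE = re.compile(r"(?:^|;)ID=([^;]+)")


def read_aed(gff_lines):
    """Return {feature ID: AED as float} for lines carrying an _AED attribute."""
    scores = {}
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        aed = AED_RE.search(cols[8])
        fid = ID_RE.search(cols[8])
        if aed and fid:
            scores[fid.group(1)] = float(aed.group(1))
    return scores


def diff_aed(run1, run2, tol=1e-9):
    """List (ID, AED in run 1, AED in run 2) for models whose scores differ."""
    return sorted(
        (fid, run1[fid], run2[fid])
        for fid in run1.keys() & run2.keys()
        if abs(run1[fid] - run2[fid]) > tol
    )


if __name__ == "__main__":
    # Made-up feature line; only the attribute layout follows the thread.
    attrs = "ID=model1;_AED={};_QI=0|-1|0|1|-1|0|1|0|344"
    line = "Chr1\tmaker\tmRNA\t1\t344\t.\t+\t.\t" + attrs + "\n"
    run1 = read_aed([line.format("0.00")])
    run2 = read_aed([line.format("1.00")])
    print(diff_aed(run1, run2))  # prints [('model1', 0.0, 1.0)]
```

With real data one would pass open file handles for the two per-contig GFF3 files to read_aed. A non-empty result from identical control files would point to a problem outside the AED calculation itself, which is what the thread eventually concluded.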
>>>> maker-devel mailing list
>>>> maker-devel at box290.bluehost.com
>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>> <1142_models.diff_AED.gff>
>>
>> <818.diff_AED.20150831>

From myandell at genetics.utah.edu Tue Sep 8 11:13:32 2015
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 8 Sep 2015 16:13:32 +0000
Subject: [maker-devel] AED scores from MAKER pipeline - deterministic or not?
In-Reply-To:
References: <81B272E9-3439-49C6-96E0-674B03F87569@gmail.com>
 <43C687F7-4B40-4E4F-B255-E1D2B9D6D4DC@gmail.com>
Message-ID: <7A60AB257EFF2B48B1F4C814817EA053E37D97AD@mxb1.hg.genetics.utah.edu>

awesome detective work everybody!

Mark Yandell
Professor of Human Genetics
H.A. & Edna Benning Presidential Endowed Chair
Co-director USTAR Center for Genetic Discovery
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330
ph:801-587-7707

From cjfields at illinois.edu Tue Sep 15 11:39:22 2015
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 15 Sep 2015 16:39:22 +0000
Subject: [maker-devel] Profiling MAKER
Message-ID: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu>

We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we're seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm).

The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated.

Thanks,

chris

Chris Fields
Technical Lead in Genome Informatics
High Performance Computing in Biology
University of Illinois at Urbana-Champaign
Roy J. Carver Biotechnology Center / W.M. Keck Center
Carl R. Woese Institute for Genomic Biology

From carsonhh at gmail.com Wed Sep 16 12:22:05 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Sep 2015 11:22:05 -0600
Subject: [maker-devel] Profiling MAKER
In-Reply-To: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu>
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu>
Message-ID: <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com>

Sorry for the slow reply. I'm out of the lab right now and will be for the next two weeks.

MAKER uses MPI for parallelization. So it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system.

With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2 GB of RAM. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads - two of the threads will use close to 0% CPU).

MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I've tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory.

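[Editorial sketch] The advice about the temporary directory can be checked before a run is started. The following minimal Python sketch looks up which mounted filesystem holds a given directory and flags the types this archive warns about (NFS here, tmpfs in the earlier AED thread). It is not part of MAKER. It assumes a Linux system that exposes /proc/mounts, and the names fs_type_for and RISKY_FS are hypothetical. Mount points containing spaces (escaped as \040 in /proc/mounts) are not handled.

```python
import os
import sys

# Filesystem types the thread warns about for MAKER's temporary directory.
RISKY_FS = {"nfs", "nfs4", "tmpfs"}


def fs_type_for(path, mounts_text):
    """Return (mount point, fs type) of the deepest mount containing path.

    mounts_text is the content of /proc/mounts. The match is purely
    lexical, so callers should resolve symlinks first.
    """
    path = os.path.normpath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Longest matching mount point wins; later entries win ties,
        # because a later mount shadows an earlier one at the same point.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    tmp_dir = os.path.realpath(sys.argv[1] if len(sys.argv) > 1 else "/tmp")
    with open("/proc/mounts") as handle:
        mount_point, fs_type = fs_type_for(tmp_dir, handle.read())
    verdict = "avoid for MAKER temp files" if fs_type in RISKY_FS else "looks local"
    print(f"{tmp_dir}: {fs_type} mounted at {mount_point} ({verdict})")
```

Run with the directory you intend to pass to MAKER as its temporary location. The same information is available from df -h, which is what was asked for in the earlier thread. The script only makes the check repeatable across cluster nodes.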
Also the Berkley DB implementation that sits behind the fasta indexes for sequence access don?t always work well with in memory scratch. You should always try and set /tmp to a physical drive if possible. You will get several Gb of files in /tmp. ?Carson > On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote: > > We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we?re seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm). > > The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appereciated. > > Thanks, > > chris > > Chris Fields > Technical Lead in Genome Informatics > High Performance Computing in Biology > University of Illinois at Urbana-Champaign > Roy J. Carver Biotechnology Center / W.M. Keck Center > Carl R. Woese Institute for Genomic Biology > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Thu Sep 17 21:05:11 2015 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 18 Sep 2015 02:05:11 +0000 Subject: [maker-devel] Profiling MAKER In-Reply-To: <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> Message-ID: <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu> Carson, Thanks! Will pass this on to the folks at NCSA, that should help quite a bit. 
Yeah, I kinda think it would be nice to come up with an alternative indexing scheme for fasta indexing, at least add some more flexibility (I?m guessing this is BioPerl still?). chris On Sep 16, 2015, at 12:22 PM, Carson Holt > wrote: Sorry for the slow reply. I?m out of the lab right now and will be for the next two weeks. MAKER uses MPI for parallelization. So it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system. With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2Gb. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads - two of the threads will use close to 0% CPU). MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I?ve tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory. Also the Berkley DB implementation that sits behind the fasta indexes for sequence access don?t always work well with in memory scratch. You should always try and set /tmp to a physical drive if possible. You will get several Gb of files in /tmp. ?Carson On Sep 15, 2015, at 10:39 AM, Fields, Christopher J > wrote: We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we?re seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm). 
The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appereciated. Thanks, chris Chris Fields Technical Lead in Genome Informatics High Performance Computing in Biology University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center / W.M. Keck Center Carl R. Woese Institute for Genomic Biology _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason.stajich at gmail.com Thu Sep 17 21:25:49 2015 From: jason.stajich at gmail.com (Jason Stajich) Date: Fri, 18 Sep 2015 02:25:49 +0000 Subject: [maker-devel] Profiling MAKER In-Reply-To: <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu> References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu> Message-ID: What about cdbfasta -- wonder if perl Api to this indexing is possible -- or could be NCBI blast index since that also is a dependency in maker. On Thu, Sep 17, 2015 at 7:05 PM Fields, Christopher J wrote: > Carson, > > Thanks! Will pass this on to the folks at NCSA, that should help quite a > bit. > > Yeah, I kinda think it would be nice to come up with an alternative > indexing scheme for fasta indexing, at least add some more flexibility (I?m > guessing this is BioPerl still?). > > chris > > > On Sep 16, 2015, at 12:22 PM, Carson Holt wrote: > > Sorry for the slow reply. I?m out of the lab right now and will be for > the next two weeks. > > MAKER uses MPI for parallelization. So it is optimized for distributed > non-shared memory systems, but should still work fine on a shared memory > system. 
> > With MPI, you specify the number of processes to start using the -n flag > for mpiexec. Each MAKER process will need about 2Gb. It could be more or > less depending on the amount of evidence it has to hold in RAM (i.e. deep > evidence alignments use more memory). By default each MAKER process will > use a single CPU (even though it will start 3 threads - two of the threads > will use close to 0% CPU). > > MAKER will use a lot of IO. Each process will write/read independently of > the others, so the more processes you start, the more simultaneous IO you > will have. I?ve tried to put most very heavy IO operations in /tmp or > whatever temporary directory you specify. It is important that you never > specify an NFS location for your temporary directory. The rest of the IO > will occur in the working directory. > > Also the Berkley DB implementation that sits behind the fasta indexes for > sequence access don?t always work well with in memory scratch. You should > always try and set /tmp to a physical drive if possible. You will get > several Gb of files in /tmp. > > ?Carson > > > On Sep 15, 2015, at 10:39 AM, Fields, Christopher J > wrote: > > We have a group locally (at NCSA) who is interested in profiling MAKER > with various performance analysis tools. They would like to know CPU, RAM, > I/O patterns and usage. In particular, we?re seeing some odd performance > problems on a local system which uses a large shared memory cache for > storing temp/scratch data (/dev/shm). > > The question is: are there any particular pain points users and developers > know of or could point us to that we can start focusing on? Any help would > be greatly appereciated. > > Thanks, > > chris > > *Chris Fields* > *Technical Lead in Genome Informatics* > *High Performance Computing in Biology* > University of Illinois at Urbana-Champaign > Roy J. Carver Biotechnology Center / W.M. Keck Center > Carl R. 
Woese Institute for Genomic Biology > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Thu Sep 17 21:50:09 2015 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 18 Sep 2015 02:50:09 +0000 Subject: [maker-devel] Profiling MAKER In-Reply-To: References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu> Message-ID: Possibly. Might be also feasible to use faidx via samtools API (if we?re intent on that path, there is Bio::DB::Sam, where I added a branch with samtools 1.2 support so could possibly tap into faidx at the XS level). chris On Sep 17, 2015, at 9:25 PM, Jason Stajich > wrote: What about cdbfasta -- wonder if perl Api to this indexing is possible -- or could be NCBI blast index since that also is a dependency in maker. On Thu, Sep 17, 2015 at 7:05 PM Fields, Christopher J > wrote: Carson, Thanks! Will pass this on to the folks at NCSA, that should help quite a bit. Yeah, I kinda think it would be nice to come up with an alternative indexing scheme for fasta indexing, at least add some more flexibility (I?m guessing this is BioPerl still?). chris On Sep 16, 2015, at 12:22 PM, Carson Holt > wrote: Sorry for the slow reply. I?m out of the lab right now and will be for the next two weeks. MAKER uses MPI for parallelization. So it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system. 
With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2Gb. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads - two of the threads will use close to 0% CPU). MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I?ve tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory. Also the Berkley DB implementation that sits behind the fasta indexes for sequence access don?t always work well with in memory scratch. You should always try and set /tmp to a physical drive if possible. You will get several Gb of files in /tmp. ?Carson On Sep 15, 2015, at 10:39 AM, Fields, Christopher J > wrote: We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we?re seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm). The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appereciated. Thanks, chris Chris Fields Technical Lead in Genome Informatics High Performance Computing in Biology University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center / W.M. Keck Center Carl R. 
Woese Institute for Genomic Biology _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri Sep 18 10:12:14 2015 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 18 Sep 2015 09:12:14 -0600 Subject: [maker-devel] Profiling MAKER In-Reply-To: References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>

Message-ID:

Yes. Still BioPerl. You're right, I probably need to switch indexing schemes. I've actually made a faidx implementation, but I don't particularly like it. An NCBI index API might be more ideal.

--Carson

> On Sep 17, 2015, at 8:50 PM, Fields, Christopher J wrote:
>
> Possibly. It might also be feasible to use faidx via the samtools API (if we're intent on that path, there is Bio::DB::Sam, where I added a branch with samtools 1.2 support, so we could possibly tap into faidx at the XS level).
>
> chris
>
>> On Sep 17, 2015, at 9:25 PM, Jason Stajich wrote:
>>
>> What about cdbfasta? I wonder if a Perl API to this indexing is possible. Or it could be the NCBI BLAST index, since that is also a dependency in MAKER.
>>
>> On Thu, Sep 17, 2015 at 7:05 PM Fields, Christopher J wrote:
>> Carson,
>>
>> Thanks! Will pass this on to the folks at NCSA, that should help quite a bit.
>>
>> Yeah, I kinda think it would be nice to come up with an alternative scheme for fasta indexing, or at least add some more flexibility (I'm guessing this is BioPerl still?).
>>
>> chris
>>
>>> On Sep 16, 2015, at 12:22 PM, Carson Holt wrote:
>>>
>>> Sorry for the slow reply. I'm out of the lab right now and will be for the next two weeks.
>>>
>>> MAKER uses MPI for parallelization. So it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system.
>>>
>>> With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2 GB of RAM. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads, two of the threads will use close to 0% CPU).
>>>
>>> MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I've tried to put most of the very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory.
>>>
>>> Also, the Berkeley DB implementation that sits behind the fasta indexes for sequence access doesn't always work well with in-memory scratch. You should always try to set /tmp to a physical drive if possible. You will get several GB of files in /tmp.
>>>
>>> --Carson
>>>
>>>
>>>> On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote:
>>>>
>>>> We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, and I/O patterns and usage. In particular, we're seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm).
>>>>
>>>> The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated.
>>>>
>>>> Thanks,
>>>>
>>>> chris
>>>>
>>>> Chris Fields
>>>> Technical Lead in Genome Informatics
>>>> High Performance Computing in Biology
>>>> University of Illinois at Urbana-Champaign
>>>> Roy J. Carver Biotechnology Center / W.M. Keck Center
>>>> Carl R. Woese Institute for Genomic Biology
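The sizing advice in the thread above (roughly 2 GB of RAM per MAKER process, and a temporary directory that is neither NFS nor in-memory scratch) can be turned into a quick pre-flight check. What follows is only a minimal sketch and not part of MAKER: the function names, the `reserve_gb` headroom, and the exact list of filesystem types treated as risky are assumptions made for illustration, and the mount table is expected in the Linux /proc/mounts layout.

```python
import posixpath

# Filesystem types the thread warns about for the temporary directory:
# network mounts and memory-backed scratch. This list is an assumption.
RISKY_FS_TYPES = {"nfs", "nfs4", "tmpfs", "ramfs"}


def max_processes(total_ram_gb, per_process_gb=2.0, reserve_gb=1.0):
    """Rough count of MAKER processes that fit in RAM.

    per_process_gb follows the "about 2 GB" figure from the thread;
    reserve_gb is hypothetical headroom left for the operating system.
    """
    usable_gb = total_ram_gb - reserve_gb
    return max(1, int(usable_gb // per_process_gb))


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    pairs = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            pairs.append((fields[1], fields[2]))
    return pairs


def fs_type_for(path, mounts):
    """Filesystem type of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best_point, best_type = "", None
    for point, fs_type in mounts:
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            if len(point) > len(best_point):
                best_point, best_type = point, fs_type
    return best_type


def is_risky_tmp(path, mounts):
    """True if path sits on a network or memory-backed filesystem."""
    return fs_type_for(path, mounts) in RISKY_FS_TYPES


# Example with a made-up mount table (not taken from any real system).
sample = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "tmpfs /dev/shm tmpfs rw 0 0\n"
    "server:/export /home nfs4 rw 0 0\n"
)
mounts = parse_mounts(sample)
print(max_processes(16))
print(is_risky_tmp("/dev/shm/maker", mounts))
print(is_risky_tmp("/scratch/maker", mounts))
```

On a real Linux machine the mount table could be read from /proc/mounts, and the result of `max_processes` would be one reasonable upper bound for the `-n` value passed to mpiexec; deep evidence alignments may still push the per-process memory above the 2 GB estimate.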
From parulk at caltech.edu Mon Sep 21 19:48:49 2015
From: parulk at caltech.edu (Parul Kudtarkar)
Date: Mon, 21 Sep 2015 17:48:49 -0700
Subject: [maker-devel] isoforms
Message-ID: <8ba6705d2b7a117292ecc417796a1192.squirrel@webmail.caltech.edu>

Hi,

Is there any parameter to be used while running the MAKER2 pipeline to filter out weak isoforms?

Thanks,
Parul
--
Scientific Programmer
Center for Computational Regulatory Genomics
Beckman Institute,
Biology and Biological Engineering
California Institute of Technology
http://www.echinobase.org/

From mike.thon at gmail.com Wed Sep 23 02:45:26 2015
From: mike.thon at gmail.com (Michael Thon)
Date: Wed, 23 Sep 2015 09:45:26 +0200
Subject: [maker-devel] some problem with MPI
Message-ID: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com>

Hi -

I'm installing MAKER and I can't get it to run with MPI. I'm using Ubuntu Linux and the openmpi packages from the Linux package manager. When I ran perl Build.pl I made sure that the paths were correct. Running Build install gave me these errors:

./Build install
Configuring MAKER with MPI support
Installing MAKER...
Configuring MAKER with MPI support
Subroutine dl_load_flags redefined at (eval 125) line 8.
Subroutine Parallel::Application::MPI::C_MPI_ANY_SOURCE redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_ANY_TAG redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_SUCCESS redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Init redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Finalize redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Comm_rank redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Comm_size redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Send redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::C_MPI_Recv redefined at (eval 125) line 9.
Subroutine Parallel::Application::MPI::_comment redefined at (eval 125) line 9.
Installing /home/mike/maker/maker/src/../perl/lib/MAKER/ConfigData.pm
Installing /home/mike/maker/maker/src/../perl/lib/auto/Parallel/Application/MPI/MPI.inl
Installing /home/mike/maker/maker/src/../perl/man/MAKER::ConfigData.3pm
Skip /home/mike/maker/maker/src/../perl/config-x86_64-linux-gnu-thread-multi-5.018002 (unchanged)

Here are the errors I get when trying to run maker. Maker seems to work fine if I run it without mpi. Any suggestions are welcome.
Thanks

mpiexec -n 2 /home/mike/maker/maker/bin/maker -nodatastore >out
[odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
[odie:28576] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[odie:28576] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

ompi_mpi_init: orte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
[odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
[odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
[odie:28575] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[odie:28575] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.
This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

ompi_mpi_init: orte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------

From anjuli.meiser at gmail.com Wed Sep 23 03:08:58 2015
From: anjuli.meiser at gmail.com (Anjuli Meiser)
Date: Wed, 23 Sep 2015 10:08:58 +0200
Subject: [maker-devel] maker gene prediction and overlapping genes
Message-ID: <56025E1A.30606@gmail.com>

Hello,

I am using Maker in two rounds for gene prediction in fungal genomes.

In the first round I run maker with the HMMs obtained from GeneMark and with snap trained on hints from CEGMA, and I include RNA evidence through a tophat gff. Then I convert the maker results to new snap HMMs and augustus HMMs and run maker in a second round. I also rescue rejected gene models (maker standard build) by running interproscan.

I observed that around 10-15% of the genes overlap in some way. That includes short genes predicted to lie completely within the boundaries of larger genes, and also genes that overlap only partially (mostly on opposite strands).

Do you have a suggestion for how to deal with this? Did I miss some settings in maker to reduce these or at least filter out the shorter genes that are lying within other genes?

Thank you very much in advance for any help in this matter!
Best wishes,
Anjuli

From dence at genetics.utah.edu Thu Sep 24 13:02:04 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 24 Sep 2015 18:02:04 +0000
Subject: [maker-devel] maker gene prediction and overlapping genes
In-Reply-To: <56025E1A.30606@gmail.com>
References: <56025E1A.30606@gmail.com>
Message-ID: <2B2C21C9-2EBE-41C0-A3C7-AC9E5606CFF2@genetics.utah.edu>

Hi Anjuli,

The approach that you outlined sounds pretty reasonable, and I'm not certain I understand the problem with your results. Do the short genes that lie completely inside other genes fall within introns? Or do you mean that you have overlapping predictions?

A common observation in compact fungal genomes is that maker can produce gene models that fuse several adjacent genes together. Could that be what you're observing? There's actually an option in maker to deal with that issue; it's the "correct_est_fusion" setting in the opts control file.

Let me know whether that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From mike.thon at gmail.com Thu Sep 24 23:23:56 2015
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 25 Sep 2015 06:23:56 +0200
Subject: [maker-devel] maker gene prediction and overlapping genes
In-Reply-To: <2B2C21C9-2EBE-41C0-A3C7-AC9E5606CFF2@genetics.utah.edu>
References: <56025E1A.30606@gmail.com> <2B2C21C9-2EBE-41C0-A3C7-AC9E5606CFF2@genetics.utah.edu>
Message-ID: <6E52F513-5005-46AF-8320-BA84D523A57E@gmail.com>

Hi all -

We've been having the same problem. In every case I've examined manually, the overlapping gene models have overlapping CDSs and are on opposite strands. In most cases it's easy to see which is the correct model because one has protein or EST/RNA-Seq evidence and the other does not. Most times one model is from Augustus and the other is from genemark, but not always. I found one case in which both gene models were from augustus and maker promoted both of them.

I count 121 overlaps in our annotation (it's a fungal genome). We're about to just go in and remove them manually, but I want to see if there is any way to fix my configuration of maker first.

Mike

From janna.lynn.fierst at gmail.com Fri Sep 25 06:20:23 2015
From: janna.lynn.fierst at gmail.com (Janna Fierst)
Date: Fri, 25 Sep 2015 06:20:23 -0500
Subject: [maker-devel] maker gene prediction and overlapping genes
In-Reply-To: <6E52F513-5005-46AF-8320-BA84D523A57E@gmail.com>
References: <56025E1A.30606@gmail.com> <2B2C21C9-2EBE-41C0-A3C7-AC9E5606CFF2@genetics.utah.edu> <6E52F513-5005-46AF-8320-BA84D523A57E@gmail.com>
Message-ID:

We had this problem with a nematode genome, also with very dense genes. We partially addressed it by assembling the RNA-Seq with Trinity and clipping the 5'/3' UTRs, then running with correct_est_fusion.

--
Janna L. Fierst
Assistant Professor
Department of Biological Sciences
The University of Alabama
Tuscaloosa, AL 35847
Office: SEC 1339
Phone: 205-248-1830

From carsonhh at gmail.com Mon Sep 28 10:42:04 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Sep 2015 09:42:04 -0600
Subject: [maker-devel] isoforms
In-Reply-To: <8ba6705d2b7a117292ecc417796a1192.squirrel@webmail.caltech.edu>
References: <8ba6705d2b7a117292ecc417796a1192.squirrel@webmail.caltech.edu>
Message-ID:

Sorry for the slow reply. I've been away this last week.

There is no parameter for isoform strength per se. The ability to call isoforms is strictly determined by the strength of the evidence you have. Basically, the gene predictors are run iteratively with a single piece of EST evidence being primary and the remaining evidence being secondary, and then the gene predictor can make any changes it deems appropriate. Most of the time the exact same model comes back, but if a particular piece of evidence suggests a novel splice site then a new model can be produced based off of that hint. However, if your EST/mRNA-seq evidence has a lot of noise or contamination, then you may be feeding in a lot of bad hints. These may get ignored since they would generate unworkable ORFs, but not always.
There is unfortunately no good way to automatically distinguish a good hint from a bad hint. However if you run MAKER?s results through EVM (Evidence Modeler) you can manually assign weights you deem appropriate to each evidence source. EVM can then modify models based on these weights. ?Carson > On Sep 21, 2015, at 6:48 PM, Parul Kudtarkar wrote: > > Hi, > > Is there any parameter to be used while running MAKER2 pipeline to filter > out weak isoforms? > > Thanks, > Parul > -- > Scientific Programmer > Center for Computational Regulatory Genomics > Beckman Institute, > Biology and Biological Engineering > California Institute of Technology > http://www.echinobase.org/ > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Sep 28 10:46:15 2015 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 28 Sep 2015 09:46:15 -0600 Subject: [maker-devel] some problem with MPI In-Reply-To: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com> References: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com> Message-ID: <1F3AE9C8-9578-48B1-86C7-91A824B9F5CA@gmail.com> Sorry for the slow reply. I?ve been away for the last week. I?ve found that using Ubuntu?s apt-get doesn?t always set up OpenMPI and MPICH2 correctly for shared libraries. You may have to do a manual install. Also if using OpenMPI, make sure to set LD_PRELOAD environmental variable to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile. (i.e. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so). --Carson > On Sep 23, 2015, at 1:45 AM, Michael Thon wrote: > > Hi - > > I'm installing MAKER and I can't get it to run with MPI. I'm using Ubuntu linux and the openmpi packages from the linux package manager. 
when I ran perl Build.pl I made sure that the paths were correct. Running Build install gave me these errors: > > ./Build install > Configuring MAKER with MPI support > Installing MAKER... > Configuring MAKER with MPI support > Subroutine dl_load_flags redefined at (eval 125) line 8. > Subroutine Parallel::Application::MPI::C_MPI_ANY_SOURCE redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_ANY_TAG redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_SUCCESS redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Init redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Finalize redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Comm_rank redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Comm_size redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Send redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Recv redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::_comment redefined at (eval 125) line 9. > Installing /home/mike/maker/maker/src/../perl/lib/MAKER/ConfigData.pm > Installing /home/mike/maker/maker/src/../perl/lib/auto/Parallel/Application/MPI/MPI.inl > Installing /home/mike/maker/maker/src/../perl/man/MAKER::ConfigData.3pm > Skip /home/mike/maker/maker/src/../perl/config-x86_64-linux-gnu-thread-multi-5.018002 (unchanged) > > > Here are the errors I get when trying to run maker. Maker seems to work fine if I run it without mpi. Any suggestions are welcome. > Thanks > > mpiexec -n 2 /home/mike/maker/maker/bin/maker -nodatastore >out > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? 
(ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > -------------------------------------------------------------------------- > It looks like opal_init failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during opal_init; some of which are due to configuration or > environment problems. This failure appears to be an internal failure; > here's some additional information (which may only be relevant to an > Open MPI developer): > > opal_shmem_base_select failed > --> Returned value -1 instead of OPAL_SUCCESS > -------------------------------------------------------------------------- > [odie:28576] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79 > *** An error occurred in MPI_Init > *** on a NULL communicator > *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort > [odie:28576] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! 
> -------------------------------------------------------------------------- > It looks like MPI_INIT failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during MPI_INIT; some of which are due to configuration or environment > problems. This failure appears to be an internal failure; here's some > additional information (which may only be relevant to an Open MPI > developer): > > ompi_mpi_init: orte_init failed > --> Returned "Error" (-1) instead of "Success" (0) > -------------------------------------------------------------------------- > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > -------------------------------------------------------------------------- > It looks like opal_init failed for some reason; your parallel process is > likely to abort. 
There are many reasons that a parallel process can > fail during opal_init; some of which are due to configuration or > environment problems. This failure appears to be an internal failure; > here's some additional information (which may only be relevant to an > Open MPI developer): > > opal_shmem_base_select failed > --> Returned value -1 instead of OPAL_SUCCESS > -------------------------------------------------------------------------- > [odie:28575] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79 > *** An error occurred in MPI_Init > *** on a NULL communicator > *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort > [odie:28575] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! > -------------------------------------------------------------------------- > It looks like MPI_INIT failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during MPI_INIT; some of which are due to configuration or environment > problems. This failure appears to be an internal failure; here's some > additional information (which may only be relevant to an Open MPI > developer): > > ompi_mpi_init: orte_init failed > --> Returned "Error" (-1) instead of "Success" (0) > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpiexec noticed that the job aborted, but has no info as to the process > that caused that situation. 
> -------------------------------------------------------------------------- > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Sep 28 10:51:02 2015 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 28 Sep 2015 09:51:02 -0600 Subject: [maker-devel] maker gene prediction and overlapping genes In-Reply-To: <56025E1A.30606@gmail.com> References: <56025E1A.30606@gmail.com> Message-ID: <21E487A6-AED6-4364-9AA1-412AF4177C10@gmail.com> Basically you have evidence spuriously aligning to both strands. This means either your repeat masking is insufficient or your EST/mRNA-seq evidence is noisy and generating a lot of false alignments. You may need to turn off single_exon if you rare using it, or reassemble any short read evidence to try and improve the specificity of the alignments. I believe some of the previous responses to your post suggested methods to do this with trinity. ?Carson > On Sep 23, 2015, at 2:08 AM, Anjuli Meiser wrote: > > Hello, > > I am using Maker in two rounds for gene prediction in fungal genomes. > > In the first round I'm running maker with the HMMs gained from GeneMark and snap with hints from CEGMA and include RNA evidence through a tophat gff. Then I convert the maker results to new snap HMMs and augustus HMMs and run maker in a second round. I also rescue rejected gene models (maker standard build) by running interproscan. > > I observed that I get around 10-15% of genes that are overlapping in some way. That includes short genes predicted to lie completely within the boundaries of larger genes and also normally overlapping (mostly on opposite strands). > > Do you have a suggesting how to deal with this? Did I miss some settings in maker to reduce these or at least filter out the shorter genes that are lying within other genes? 
>
> Thank you very much in advance for any help in this matter!
>
> Best wishes,
> Anjuli

From mike.thon at gmail.com Tue Sep 29 21:59:08 2015
From: mike.thon at gmail.com (Michael Thon)
Date: Wed, 30 Sep 2015 04:59:08 +0200
Subject: [maker-devel] some problem with MPI
In-Reply-To: <1F3AE9C8-9578-48B1-86C7-91A824B9F5CA@gmail.com>
References: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com> <1F3AE9C8-9578-48B1-86C7-91A824B9F5CA@gmail.com>
Message-ID: <64B18ED3-1603-47C7-B3CB-72124E87CE84@gmail.com>

Apparently my system (Ubuntu 14.04) has both mpiexec and mpiexec.openmpi executables. mpiexec.openmpi works with MAKER.

-Mike

> On Sep 28, 2015, at 5:46 PM, Carson Holt wrote:
>
> Sorry for the slow reply. I've been away for the last week.
>
> I've found that using Ubuntu's apt-get doesn't always set up OpenMPI and MPICH2 correctly for shared libraries. You may have to do a manual install.
>
> Also if using OpenMPI, make sure to set the LD_PRELOAD environment variable to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g., export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).
>
> --Carson
>
>
>> On Sep 23, 2015, at 1:45 AM, Michael Thon wrote:
>>
>> Hi -
>>
>> I'm installing MAKER and I can't get it to run with MPI. I'm using Ubuntu Linux and the openmpi packages from the Linux package manager. When I ran perl Build.pl I made sure that the paths were correct. Running Build install gave me these errors:
>>
>> ./Build install
>> Configuring MAKER with MPI support
>> Installing MAKER...
>> Configuring MAKER with MPI support
>> Subroutine dl_load_flags redefined at (eval 125) line 8.
From carsonhh at gmail.com Tue Sep 29 22:26:13 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 29 Sep 2015 21:26:13 -0600
Subject: [maker-devel] some problem with MPI
In-Reply-To: <64B18ED3-1603-47C7-B3CB-72124E87CE84@gmail.com>
References: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com> <1F3AE9C8-9578-48B1-86C7-91A824B9F5CA@gmail.com> <64B18ED3-1603-47C7-B3CB-72124E87CE84@gmail.com>
Message-ID: <347A6A6B-20B2-43DC-AB30-CE34698C85D1@gmail.com>

Good to know.

Thanks,
Carson

> On Sep 29, 2015, at 8:59 PM, Michael Thon wrote:
>
> Apparently my system (Ubuntu 14.04) has both mpiexec and mpiexec.openmpi executables. mpiexec.openmpi works with MAKER.
>
> -Mike
>
>
>> On Sep 28, 2015, at 5:46 PM, Carson Holt wrote:
>>
>> Sorry for the slow reply. I've been away for the last week.
>>
>> I've found that using Ubuntu's apt-get doesn't always set up OpenMPI and MPICH2 correctly for shared libraries. You may have to do a manual install.
>>
>> Also if using OpenMPI, make sure to set the LD_PRELOAD environment variable to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g., export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).
>>
>> --Carson
>>
>>
>>> On Sep 23, 2015, at 1:45 AM, Michael Thon wrote:
>>>
>>> Hi -
>>>
>>> I'm installing MAKER and I can't get it to run with MPI. I'm using Ubuntu Linux and the openmpi packages from the Linux package manager. When I ran perl Build.pl I made sure that the paths were correct. Running Build install gave me these errors:
>>>
>>> ./Build install
>>> Configuring MAKER with MPI support
>>> Installing MAKER...
From mike.thon at gmail.com Wed Sep 30 09:51:26 2015
From: mike.thon at gmail.com (Michael Thon)
Date: Wed, 30 Sep 2015 16:51:26 +0200
Subject: [maker-devel] maker gene prediction and overlapping genes
In-Reply-To: <21E487A6-AED6-4364-9AA1-412AF4177C10@gmail.com>
References: <56025E1A.30606@gmail.com> <21E487A6-AED6-4364-9AA1-412AF4177C10@gmail.com>
Message-ID:

In my case I did find two overlapping gene predictions on opposite strands from different ab initio gene predictors where neither model has EST or protein support. Most of the cases, though, are ones where one model has support but the other does not, so we will probably fix them manually.

Thanks for your help

> On Sep 28, 2015, at 5:51 PM, Carson Holt wrote:
>
> Basically you have evidence spuriously aligning to both strands. This means either your repeat masking is insufficient or your EST/mRNA-seq evidence is noisy and generating a lot of false alignments. You may need to turn off single_exon if you are using it, or reassemble any short-read evidence to try to improve the specificity of the alignments. I believe some of the previous responses to your post suggested methods to do this with Trinity.
>
> --Carson
>
>
>> On Sep 23, 2015, at 2:08 AM, Anjuli Meiser wrote:
>>
>> Hello,
>>
>> I am using Maker in two rounds for gene prediction in fungal genomes.
>>
>> In the first round I'm running maker with the HMMs gained from GeneMark and snap with hints from CEGMA and include RNA evidence through a tophat gff. Then I convert the maker results to new snap HMMs and augustus HMMs and run maker in a second round. I also rescue rejected gene models (maker standard build) by running interproscan.
>>
>> I observed that I get around 10-15% of genes that are overlapping in some way. That includes short genes predicted to lie completely within the boundaries of larger genes and also normally overlapping (mostly on opposite strands).
>>
>> Do you have a suggestion for how to deal with this? Did I miss some settings in maker to reduce these or at least filter out the shorter genes that are lying within other genes?
>>
>> Thank you very much in advance for any help in this matter!
>>
>> Best wishes,
>> Anjuli

From mike.thon at gmail.com Wed Sep 30 09:54:01 2015
From: mike.thon at gmail.com (Michael Thon)
Date: Wed, 30 Sep 2015 16:54:01 +0200
Subject: [maker-devel] repeats
Message-ID: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com>

Hi all - the other issue that we're having with MAKER is with repeats. We have an annotation of repeats done by a colleague using REPET. I'm passing the annotation in using the rm_gff option and leaving all the other repeat masking options turned off. I found at least one case where a CDS of a final gene model overlaps with a repeat annotation. Does this indicate some problem with my input file or with MAKER?

Thanks
Mike

From carsonhh at gmail.com Wed Sep 30 10:43:42 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Sep 2015 09:43:42 -0600
Subject: [maker-devel] repeats
In-Reply-To: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com>
References: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com>
Message-ID: <25FC758C-0197-4641-A025-6AD17F8AD31C@gmail.com>

MAKER's standard repeat masking protocol is to use RepeatMasker to identify repeats, then RepeatRunner to extend masking for diverged repeats. Complex repeats will be hard masked and simple repeats will be soft masked (anything coming from GFF3 will be hard masked). BLAST then runs to identify evidence alignments against the masked genome assembly. Exonerate is then allowed to polish the BLAST alignments with any applied masking removed (this is because we already have an alignment outside of the masked region, so removing masking keeps it from interfering with the polishing).

It is possible that REPET is not capturing the full repeat, which would allow partial alignment outside of masked regions that can then be polished back into masked regions, or you have mRNA-seq evidence where the repeat has been assembled into the transcript sequence (so the repeat gets polished back in). If that is the case you may want to consider letting RepeatMasker and RepeatRunner run alongside the supplied repeat GFF3. Alternatively you could try hard masking the genome assembly before ever giving it to MAKER (so REPET masked regions can never be unmasked), but that might cause issues with some polishing steps.

Also if your ab initio predictors are calling genes on opposite strands, and one predictor seems to perform particularly poorly, you may want to drop it from your analysis. I find that I have to do this with GeneMark sometimes.

Thanks,
Carson

> On Sep 30, 2015, at 8:54 AM, Michael Thon wrote:
>
> Hi all - the other issue that we're having with MAKER is with repeats.
> We have an annotation of repeats done by a colleague using REPET. I'm passing the annotation in using the rm_gff option and leaving all the other repeat masking options turned off. I found at least one case where a CDS of a final gene model overlaps with a repeat annotation. Does this indicate some problem with my input file or with MAKER?
>
> Thanks
> Mike

From mike.thon at gmail.com Wed Sep 30 11:03:09 2015
From: mike.thon at gmail.com (Michael Thon)
Date: Wed, 30 Sep 2015 18:03:09 +0200
Subject: [maker-devel] repeats
In-Reply-To: <25FC758C-0197-4641-A025-6AD17F8AD31C@gmail.com>
References: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com> <25FC758C-0197-4641-A025-6AD17F8AD31C@gmail.com>
Message-ID: <0CE38FFE-A2B9-4BC0-83E1-A0E9F5ECFD54@gmail.com>

Hi Carson -

> On Sep 30, 2015, at 5:43 PM, Carson Holt wrote:
>
> MAKER's standard repeat masking protocol is to use RepeatMasker to identify repeats, then RepeatRunner to extend masking for diverged repeats. Complex repeats will be hard masked and simple repeats will be soft masked (anything coming from GFF3 will be hard masked). BLAST then runs to identify evidence alignments against the masked genome assembly. Exonerate is then allowed to polish the BLAST alignments with any applied masking removed (this is because we already have an alignment outside of the masked region, so removing masking keeps it from interfering with the polishing).
>
> It is possible that REPET is not capturing the full repeat, which would allow partial alignment outside of masked regions that can then be polished back into masked regions, or you have mRNA-seq evidence where the repeat has been assembled into the transcript sequence (so the repeat gets polished back in).
> If that is the case you may want to consider letting RepeatMasker and RepeatRunner run alongside the supplied repeat GFF3. Alternatively you could try hard masking the genome assembly before ever giving it to MAKER (so REPET masked regions can never be unmasked), but that might cause issues with some polishing steps.
>

Yes, I suspect our Cufflinks analysis was either run on the unmasked genome or with a different version of the repeats, so that probably explains it.

> Also if your ab initio predictors are calling genes on opposite strands, and one predictor seems to perform particularly poorly, you may want to drop it from your analysis. I find that I have to do this with GeneMark sometimes.
>

Yes, I had considered that, and in fact we already dropped GeneMark. A lot of the erroneous genes appear to come from SNAP, but if I drop that too I'm only left with Augustus, which was trained on a different species. We tried training Augustus but we never got results that we thought were better than the existing models. Looks like our SNAP training has issues too. For now I think we'll fix the problems manually and in the future work on our training procedures.

Thanks for your help.

> Thanks,
> Carson
>
>
>> On Sep 30, 2015, at 8:54 AM, Michael Thon wrote:
>>
>> Hi all - the other issue that we're having with MAKER is with repeats. We have an annotation of repeats done by a colleague using REPET. I'm passing the annotation in using the rm_gff option and leaving all the other repeat masking options turned off. I found at least one case where a CDS of a final gene model overlaps with a repeat annotation. Does this indicate some problem with my input file or with MAKER?
>>
>> Thanks
>> Mike

From carsonhh at gmail.com Wed Sep 30 11:12:39 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Sep 2015 10:12:39 -0600
Subject: [maker-devel] repeats
In-Reply-To: <0CE38FFE-A2B9-4BC0-83E1-A0E9F5ECFD54@gmail.com>
References: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com> <25FC758C-0197-4641-A025-6AD17F8AD31C@gmail.com> <0CE38FFE-A2B9-4BC0-83E1-A0E9F5ECFD54@gmail.com>
Message-ID: <6B4D8397-43E9-4033-8765-18F6D9A5E12E@gmail.com>

You may also want to consider using Trinity to assemble the mRNA-seq evidence rather than using Cufflinks models in GFF3 format. Cufflinks gives better sensitivity, but I find that the specificity of Trinity gives better annotations overall. Also, if your organism is a fungus, then Trinity's jaccard clip option helps to resolve some issues related to transcript merging from overlapping UTRs in fungi. It might help with some repeat issues as well.

--Carson

> On Sep 30, 2015, at 10:03 AM, Michael Thon wrote:
>
> Hi Carson -
>> On Sep 30, 2015, at 5:43 PM, Carson Holt wrote:
>>
>> MAKER's standard repeat masking protocol is to use RepeatMasker to identify repeats, then RepeatRunner to extend masking for diverged repeats. Complex repeats will be hard masked and simple repeats will be soft masked (anything coming from GFF3 will be hard masked). BLAST then runs to identify evidence alignments against the masked genome assembly. Exonerate is then allowed to polish the BLAST alignments with any applied masking removed (this is because we already have an alignment outside of the masked region, so removing masking keeps it from interfering with the polishing).
From jom2042 at qatar-med.cornell.edu Wed Sep 30 10:48:04 2015
From: jom2042 at qatar-med.cornell.edu (Joel Malek)
Date: Wed, 30 Sep 2015 15:48:04 +0000
Subject: [maker-devel] amazon instance for maker?
Message-ID: <7B53984A-3856-4EA7-B688-A2B57993BA82@qatar-med.cornell.edu>

Hello Yandell Lab - I am interested in trying out the Maker annotation pipeline. I was wondering if you had an Amazon image already available with everything installed that I could replicate. Thanks for any information!
Joel

Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received.

From carsonhh at gmail.com Wed Sep 30 12:22:09 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Sep 2015 11:22:09 -0600
Subject: [maker-devel] amazon instance for maker?
In-Reply-To: <7B53984A-3856-4EA7-B688-A2B57993BA82@qatar-med.cornell.edu>
References: <7B53984A-3856-4EA7-B688-A2B57993BA82@qatar-med.cornell.edu>
Message-ID: <7FA5792C-8D7B-4DC4-93E1-28E06D65DE83@gmail.com>

Here is a blog post for an implementation of MAKER in the cloud that works with multiple instances via MPI -> http://efish.zoology.msu.edu/running-maker-genome-annotation-on-starcluster/

--Carson

> On Sep 30, 2015, at 9:48 AM, Joel Malek wrote:
>
> Hello Yandell Lab - I am interested in trying out the Maker annotation pipeline.
I was wondering if you had an Amazon image already available with everything installed that I could replicate. Thanks for any information! > Joel From ole.toerresen at gmail.com Wed Sep 30 14:00:23 2015 From: ole.toerresen at gmail.com (Ole Kristian Tørresen) Date: Wed, 30 Sep 2015 21:00:23 +0200 Subject: [maker-devel] The origin of te_proteins.fasta Message-ID: Hi, the file te_proteins.fasta is distributed with MAKER and is suggested as a way to find more divergent transposable elements by searching at the protein level instead of at the nucleotide level. I've been unable to find any information about its creation, or whether it has been kept current. There is a file of mobile-element-derived proteins distributed with RepBase, called RepeatPeps.lib, which seems to contain the same amount of sequence (about 9.4 Mbp in both) but half the number of sequences (10500 vs 25000). Does anyone know how these two files compare? Could I use RepeatPeps.lib instead, or combine them (with some clustering maybe?)? Thank you. Sincerely, Ole Kristian Tørresen
From carsonhh at gmail.com Wed Sep 30 14:18:07 2015 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 30 Sep 2015 13:18:07 -0600 Subject: [maker-devel] The origin of te_proteins.fasta In-Reply-To: References: Message-ID: <01D987FB-3709-4DD3-B55D-0A00F9EFC2FB@gmail.com> It's from a tool called RepeatRunner. Here is the paper -> https://publications.mpi-cbg.de/Smith_2007_5404.pdf Post RepeatRunner development, RepeatMasker also started checking against repeats to get better performance. So nowadays it may be somewhat redundant with what RepeatMasker will do, but it does add a little. It's not updated regularly, but since RepBase started adding proteins that should not be an issue. In addition to a number of protein repeats, te_proteins also contains a number of low complexity entries from NCBI's NR database that tend to align falsely, and with great frequency, to many genomes. All te_protein matches generate soft masking in the genome whereas RepeatMasker results will be hard masked. --Carson > On Sep 30, 2015, at 1:00 PM, Ole Kristian Tørresen wrote: > > Hi, > the file te_proteins.fasta is distributed with MAKER and is suggested as a way to find more divergent transposable elements by searching at the protein level instead of at the nucleotide level. I've been unable to find any information about its creation, or whether it has been kept current. There is a file of mobile-element-derived proteins distributed with RepBase, called RepeatPeps.lib, which seems to contain the same amount of sequence (about 9.4 Mbp in both) but half the number of sequences (10500 vs 25000). > > Does anyone know how these two files compare? Could I use RepeatPeps.lib instead, or combine them (with some clustering maybe?)? > > Thank you.
> > Sincerely, > Ole Kristian Tørresen From carsonhh at gmail.com Wed Sep 30 14:23:44 2015 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 30 Sep 2015 13:23:44 -0600 Subject: [maker-devel] The origin of te_proteins.fasta In-Reply-To: <01D987FB-3709-4DD3-B55D-0A00F9EFC2FB@gmail.com> References: <01D987FB-3709-4DD3-B55D-0A00F9EFC2FB@gmail.com> Message-ID: <5292984A-461E-46CF-8CB8-0D038942AC1D@gmail.com> Sorry. Meant to say -> "RepeatMasker also started checking against protein repeats to get better performance" --Carson > On Sep 30, 2015, at 1:18 PM, Carson Holt wrote: > > It's from a tool called RepeatRunner. Here is the paper -> https://publications.mpi-cbg.de/Smith_2007_5404.pdf > > Post RepeatRunner development, RepeatMasker also started checking against repeats to get better performance. So nowadays it may be somewhat redundant with what RepeatMasker will do, but it does add a little. It's not updated regularly, but since RepBase started adding proteins that should not be an issue. > > In addition to a number of protein repeats, te_proteins also contains a number of low complexity entries from NCBI's NR database that tend to align falsely, and with great frequency, to many genomes. All te_protein matches generate soft masking in the genome whereas RepeatMasker results will be hard masked. > > --Carson > > >> On Sep 30, 2015, at 1:00 PM, Ole Kristian Tørresen wrote: >> >> Hi, >> the file te_proteins.fasta is distributed with MAKER and is suggested as a way to find more divergent transposable elements by searching at the protein level instead of at the nucleotide level. I've been unable to find any information about its creation, or whether it has been kept current.
There is a file of mobile-element-derived proteins distributed with RepBase, called RepeatPeps.lib, which seems to contain the same amount of sequence (about 9.4 Mbp in both) but half the number of sequences (10500 vs 25000). >> >> Does anyone know how these two files compare? Could I use RepeatPeps.lib instead, or combine them (with some clustering maybe?)? >> >> Thank you. >> >> Sincerely, >> Ole Kristian Tørresen From carsonhh at gmail.com Tue Sep 8 10:12:59 2015 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 8 Sep 2015 10:12:59 -0600 Subject: [maker-devel] AED scores from MAKER pipeline - deterministic or not? In-Reply-To: References: <81B272E9-3439-49C6-96E0-674B03F87569@gmail.com>

<43C687F7-4B40-4E4F-B255-E1D2B9D6D4DC@gmail.com> Message-ID: Hi Chia-Yi, I?m glad to see you found a way around the issue you were seeing. Another solution may be to split up your input genome into several separate jobs, and run each one separately. Just out of curiosity could you send me the results of these two commands? df -h /tmp df -h A GFFDB.pm lock failure generally means either your working directory is network mounted and MAKER can?t detect it or that /tmp is tmpfs both of which can cause SQLite failures. Thanks, Carson > On Sep 8, 2015, at 9:46 AM, Cheng, Chia-Yi wrote: > > Hi Carson, > > Thank you for the suggestions. For my previous runs, I?ve been setting the TMP to a non-NFS position and used 4 or 8 CPUs for MPI. In the MPI log file there is a consistent error, DBD::SQLite::db selectcol_arrayref failed: database is locked at maker-2.31.8/bin/../lib/GFFDB.pm line 525./, which may associate with the IO error you pointed out. This is likely caused by the MPI setting in our institute. Therefore, my team mate Vivek suggested to run on non-MPI. It took about a day to run, compared to ~6 hours when using MPI. Yet it did not create any error and the AED from two runs were identical. The command for the successful runs was, maker -R -quiet -TMP /tmp -fix_nucleotides > > It looks like this approach has resolved the issue. Please feel free to post this update to the Google group. Again, thank you for your help. > > Best, > Chia-Yi > > > From: Carson Holt > > Date: Friday, September 4, 2015 at 2:43 PM > To: Cheng Chia-Yi > > Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? > > Hi Chia-Yi, > > I think I found the issue based off the data difference between the GFF3 files. MAKER uses a number of intermediate files to store data as it progresses (will be in regional chunks). 
It looks like you had an IO error in one of the runs and one of these files was likely empty (note attached image with circled region where all EST/mRNA data just drops out - only happens in one of the files). It didn?t kill the job (NFS errors rarely do - it?s one of their optimizations, they always return success and assume it will complete eventually). You can run again with MAKER -a options to rebuild the data output. > > Make sure your TMP= environment variable is not pointing to an NFS mounted location (that would exacerbate issues). You also may need to scale back the number of CPUs you are running using MPI in order to reduce the IO burden. > > Thanks, > Carson > > > >> On Sep 4, 2015, at 9:06 AM, Cheng, Chia-Yi > wrote: >> >> Hi Carson, >> >> Thank you for clarifying it up. The two MAKER generated GFF files could be downloaded from iPlant now, >> >> http://de.iplantcollaborative.org/dl/d/0C9CBD8F-9B6E-40F1-A2FA-4F7AC7AAE4B5/Chr1.gff.20150831 >> http://de.iplantcollaborative.org/dl/d/4C73FD9D-BE7E-4937-84D5-1D7F32196B67/Chr1.gff.repeat_20150831 >> >> The control files for these two runs and the a list of 818 models with different AED scores are attached to this email. >> >> Please let me know if you need any other information. Thank you so much for your help. >> >> Best, >> Chia-Yi >> >> >> >> From: Carson Holt > >> Date: Thursday, September 3, 2015 at 6:40 PM >> To: Cheng Chia-Yi > >> Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? >> >> Hi Chia-Yi, >> >> What I really need are the MAKER produced GFF3 outputs from both runs (the individual contig files with the fasta at the end). Just Chr1 is sufficient. >> >> Thanks, >> Carson >> >> >>> On Aug 31, 2015, at 10:20 AM, Cheng, Chia-Yi > wrote: >>> >>> Hi Carson, >>> >>> Please find the 1142 gene models with different AED from both runs. 
Due to the size, please download the annotated GFF3 and fasta files from iPlant, >>> http://de.iplantcollaborative.org/dl/d/2C1901E6-7F52-4264-9CB7-AB72CEF6BD67/TAIR10.protein_coding_loci_27415.gff >>> http://de.iplantcollaborative.org/dl/d/44A6AD38-E408-4DB7-AC32-6689D3D1AC7A/TAIR10.protein_coding_loci_27415.fasta >>> >>> The single_exon= was set to zero in both sets. The two runs have used identical control files which were also attached. I thought single_exon= only mattered for generating annotation and didn?t realize it would also affect AED calculation. >>> >>> Thank you. >>> >>> Chia-Yi >>> >>> From: Carson Holt > >>> Date: Monday, August 31, 2015 at 11:08 AM >>> To: Cheng Chia-Yi > >>> Cc: "maker-devel at yandell-lab.org " > >>> Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? >>> >>> I would have to see the actual GFF3 files (full file including fast at the end). Give me both GFF3 files and the coordinates of the gene in question. My first guess is that you had the single_exon= filter set to different values on each run. The gene in question is an unspliced single exon gene (based on the QI), your primary piece of evidence appears to be a single exon EST, and the only value that changes in the QI is the exon overlap. Single exon evidence will be ignored by default for the AED calculation unless you have single_exon set to 1. >>> >>> Thanks, >>> Carson >>> >>>> On Aug 31, 2015, at 8:47 AM, Cheng, Chia-Yi > wrote: >>>> >>>> Hello MAKER team, >>>> >>>> We at JCVI have been using MAKER (2.31.8) to calculate the AED of Arabidopsis gene models. We provided the annotation set as ?model_gff? with evidence file in ?protein_gff? and ?est_gff?. All the other settings were default. One issue I?ve noticed was that the AED scores did not seem to be deterministic. When I compare the AED scores from two runs using identical control files, ~1,000 (out of 35,385) gene models had different AED scores. 
The difference between two sets of AED scores could range from 0.01 to 1.00. >>>> >>>> I looked into several gene models with lager difference, i.e. AED = 0.00 in run 1 and AED = 1.00 in run 2, and noticed a disagreement in the QI: >>>> >>>> Run 1: _AED=0.00;_eAED=-0.00;_QI=0|-1|0|1|-1|0|1|0|344 >>>> Run 2: _AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|0|344 >>>> >>>> The discrepancy in the 4th column seemed to suggest the evidence file was not used properly in run 2. I?m not sure what may have caused as both runs have used the same input. A snapshot of the evidence files are pasted in the end of the email in case needed. >>>> >>>> Please let me know if more info is needed. Any help is appreciated. Thank you. >>>> >>>> Chia-Yi >>>> >>>> >>>> RNA-seq evidence file: >>>> Chr1 assembler-aerial2_pasacDNA_match36245927.+.ID=aerial2_align_161343;Target=asmbl_1 1082 1234 +%2Casmbl_1 692 1081 +%2Casmbl_1 572 691 +%2Casmbl_1 1 290 +%2Casmbl_1 291 571 +%2Casmbl_1 1235 1723 + >>>> Chr1 assembler-aerial2_pasamatch_part36243913.+.ID=aerial2_align_161343-1;Parent=aerial2_align_161343 >>>> Chr1 assembler-aerial2_pasamatch_part39964276.+.ID=aerial2_align_161343-2;Parent=aerial2_align_161343 >>>> >>>> EST evidence file: >>>> Chr1 est2genomeexpressed_sequence_match547058992150-.ID=Chr1:hit:213:3.2.0.0;Name=gi|19829901|gb|AV795918|RAFL08-19-M04 >>>> Chr1 est2genomematch_part547058992150-.ID=Chr1:hsp:500:3.2.0.0;Parent=Chr1:hit:213:3.2.0.0;Target=gi|19829901|gb|AV795918|RAFL08-19-M04 2 431 +;Gap=M430 >>>> >>>> Protein evidence file: >>>> Chr1 protein2genomeprotein_match37605284727+.ID=Chr1:hit:202:3.10.0.0;Name=UniRef90_M4EWW1 >>>> Chr1 protein2genomematch_part37603913727+.ID=Chr1:hsp:488:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1 1 50;Gap=M31 D1 M19 F1 >>>> Chr1 protein2genomematch_part39964276727+.ID=Chr1:hsp:489:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1 51 144;Gap=R1 M23 D1 M28 D1 M36 I2 M5 >>>> >>>> _______________________________________________ 
>>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>> >>> <1142_models.diff_AED.gff> >> >> <818.diff_AED.20150831> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From myandell at genetics.utah.edu Tue Sep 8 10:13:32 2015 From: myandell at genetics.utah.edu (Mark Yandell) Date: Tue, 8 Sep 2015 16:13:32 +0000 Subject: [maker-devel] AED scores from MAKER pipeline - deterministic or not? In-Reply-To: References: <81B272E9-3439-49C6-96E0-674B03F87569@gmail.com>

<43C687F7-4B40-4E4F-B255-E1D2B9D6D4DC@gmail.com> , Message-ID: <7A60AB257EFF2B48B1F4C814817EA053E37D97AD@mxb1.hg.genetics.utah.edu> awesome detective work everybody! Mark Yandell Professor of Human Genetics H.A. & Edna Benning Presidential Endowed Chair Co-director USTAR Center for Genetic Discovery Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com] Sent: Tuesday, September 08, 2015 10:12 AM To: Cheng, Chia-Yi Cc: maker-devel Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? Hi Chia-Yi, I?m glad to see you found a way around the issue you were seeing. Another solution may be to split up your input genome into several separate jobs, and run each one separately. Just out of curiosity could you send me the results of these two commands? df -h /tmp df -h A GFFDB.pm lock failure generally means either your working directory is network mounted and MAKER can?t detect it or that /tmp is tmpfs both of which can cause SQLite failures. Thanks, Carson On Sep 8, 2015, at 9:46 AM, Cheng, Chia-Yi > wrote: Hi Carson, Thank you for the suggestions. For my previous runs, I?ve been setting the TMP to a non-NFS position and used 4 or 8 CPUs for MPI. In the MPI log file there is a consistent error, DBD::SQLite::db selectcol_arrayref failed: database is locked at maker-2.31.8/bin/../lib/GFFDB.pm line 525./, which may associate with the IO error you pointed out. This is likely caused by the MPI setting in our institute. Therefore, my team mate Vivek suggested to run on non-MPI. It took about a day to run, compared to ~6 hours when using MPI. Yet it did not create any error and the AED from two runs were identical. 
The command for the successful runs was, maker -R -quiet -TMP /tmp -fix_nucleotides It looks like this approach has resolved the issue. Please feel free to post this update to the Google group. Again, thank you for your help. Best, Chia-Yi From: Carson Holt > Date: Friday, September 4, 2015 at 2:43 PM To: Cheng Chia-Yi > Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? Hi Chia-Yi, I think I found the issue based off the data difference between the GFF3 files. MAKER uses a number of intermediate files to store data as it progresses (will be in regional chunks). It looks like you had an IO error in one of the runs and one of these files was likely empty (note attached image with circled region where all EST/mRNA data just drops out - only happens in one of the files). It didn?t kill the job (NFS errors rarely do - it?s one of their optimizations, they always return success and assume it will complete eventually). You can run again with MAKER -a options to rebuild the data output. Make sure your TMP= environment variable is not pointing to an NFS mounted location (that would exacerbate issues). You also may need to scale back the number of CPUs you are running using MPI in order to reduce the IO burden. Thanks, Carson On Sep 4, 2015, at 9:06 AM, Cheng, Chia-Yi > wrote: Hi Carson, Thank you for clarifying it up. The two MAKER generated GFF files could be downloaded from iPlant now, http://de.iplantcollaborative.org/dl/d/0C9CBD8F-9B6E-40F1-A2FA-4F7AC7AAE4B5/Chr1.gff.20150831 http://de.iplantcollaborative.org/dl/d/4C73FD9D-BE7E-4937-84D5-1D7F32196B67/Chr1.gff.repeat_20150831 The control files for these two runs and the a list of 818 models with different AED scores are attached to this email. Please let me know if you need any other information. Thank you so much for your help. 
Best, Chia-Yi From: Carson Holt > Date: Thursday, September 3, 2015 at 6:40 PM To: Cheng Chia-Yi > Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? Hi Chia-Yi, What I really need are the MAKER produced GFF3 outputs from both runs (the individual contig files with the fasta at the end). Just Chr1 is sufficient. Thanks, Carson On Aug 31, 2015, at 10:20 AM, Cheng, Chia-Yi > wrote: Hi Carson, Please find the 1142 gene models with different AED from both runs. Due to the size, please download the annotated GFF3 and fasta files from iPlant, http://de.iplantcollaborative.org/dl/d/2C1901E6-7F52-4264-9CB7-AB72CEF6BD67/TAIR10.protein_coding_loci_27415.gff http://de.iplantcollaborative.org/dl/d/44A6AD38-E408-4DB7-AC32-6689D3D1AC7A/TAIR10.protein_coding_loci_27415.fasta The single_exon= was set to zero in both sets. The two runs have used identical control files which were also attached. I thought single_exon= only mattered for generating annotation and didn?t realize it would also affect AED calculation. Thank you. Chia-Yi From: Carson Holt > Date: Monday, August 31, 2015 at 11:08 AM To: Cheng Chia-Yi > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? I would have to see the actual GFF3 files (full file including fast at the end). Give me both GFF3 files and the coordinates of the gene in question. My first guess is that you had the single_exon= filter set to different values on each run. The gene in question is an unspliced single exon gene (based on the QI), your primary piece of evidence appears to be a single exon EST, and the only value that changes in the QI is the exon overlap. Single exon evidence will be ignored by default for the AED calculation unless you have single_exon set to 1. Thanks, Carson On Aug 31, 2015, at 8:47 AM, Cheng, Chia-Yi > wrote: Hello MAKER team, We at JCVI have been using MAKER (2.31.8) to calculate the AED of Arabidopsis gene models. 
We provided the annotation set as ?model_gff? with evidence file in ?protein_gff? and ?est_gff?. All the other settings were default. One issue I?ve noticed was that the AED scores did not seem to be deterministic. When I compare the AED scores from two runs using identical control files, ~1,000 (out of 35,385) gene models had different AED scores. The difference between two sets of AED scores could range from 0.01 to 1.00. I looked into several gene models with lager difference, i.e. AED = 0.00 in run 1 and AED = 1.00 in run 2, and noticed a disagreement in the QI: Run 1: _AED=0.00;_eAED=-0.00;_QI=0|-1|0|1|-1|0|1|0|344 Run 2: _AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|0|344 The discrepancy in the 4th column seemed to suggest the evidence file was not used properly in run 2. I?m not sure what may have caused as both runs have used the same input. A snapshot of the evidence files are pasted in the end of the email in case needed. Please let me know if more info is needed. Any help is appreciated. Thank you. 
Chia-Yi RNA-seq evidence file: Chr1 assembler-aerial2_pasacDNA_match36245927.+.ID=aerial2_align_161343;Target=asmbl_1 1082 1234 +%2Casmbl_1 692 1081 +%2Casmbl_1 572 691 +%2Casmbl_1 1 290 +%2Casmbl_1 291 571 +%2Casmbl_1 1235 1723 + Chr1 assembler-aerial2_pasamatch_part36243913.+.ID=aerial2_align_161343-1;Parent=aerial2_align_161343 Chr1 assembler-aerial2_pasamatch_part39964276.+.ID=aerial2_align_161343-2;Parent=aerial2_align_161343 EST evidence file: Chr1 est2genomeexpressed_sequence_match547058992150-.ID=Chr1:hit:213:3.2.0.0;Name=gi|19829901|gb|AV795918|RAFL08-19-M04 Chr1 est2genomematch_part547058992150-.ID=Chr1:hsp:500:3.2.0.0;Parent=Chr1:hit:213:3.2.0.0;Target=gi|19829901|gb|AV795918|RAFL08-19-M04 2 431 +;Gap=M430 Protein evidence file: Chr1 protein2genomeprotein_match37605284727+.ID=Chr1:hit:202:3.10.0.0;Name=UniRef90_M4EWW1 Chr1 protein2genomematch_part37603913727+.ID=Chr1:hsp:488:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1 1 50;Gap=M31 D1 M19 F1 Chr1 protein2genomematch_part39964276727+.ID=Chr1:hsp:489:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1 51 144;Gap=R1 M23 D1 M28 D1 M36 I2 M5 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org <1142_models.diff_AED.gff> <818.diff_AED.20150831> From cjfields at illinois.edu Tue Sep 15 10:39:22 2015 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 15 Sep 2015 16:39:22 +0000 Subject: [maker-devel] Profiling MAKER Message-ID: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we?re seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm). 
The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated. Thanks, chris Chris Fields Technical Lead in Genome Informatics High Performance Computing in Biology University of Illinois at Urbana-Champaign Roy J. Carver Biotechnology Center / W.M. Keck Center Carl R. Woese Institute for Genomic Biology From carsonhh at gmail.com Wed Sep 16 11:22:05 2015 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 16 Sep 2015 11:22:05 -0600 Subject: [maker-devel] Profiling MAKER In-Reply-To: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> Message-ID: <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> Sorry for the slow reply. I'm out of the lab right now and will be for the next two weeks. MAKER uses MPI for parallelization, so it is optimized for distributed non-shared memory systems, but it should still work fine on a shared memory system. With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2 GB of RAM. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads - two of the threads will use close to 0% CPU). MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I've tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory.
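The advice about keeping the temporary directory off NFS (and, as noted elsewhere in this thread, off tmpfs) can be checked before a run. What follows is only a minimal sketch, assuming a Linux host where /proc/mounts is readable. The helper names fstype_for and is_risky_tmp, and the set of "risky" filesystem types, are hypothetical illustrations and are not part of MAKER.

```python
import os
import sys

# Filesystem types this thread warns about for MAKER's temporary directory.
# The exact set is an assumption made for illustration.
RISKY_FSTYPES = {"nfs", "nfs4", "tmpfs"}


def fstype_for(path, mounts_text):
    """Return the filesystem type of the deepest mount point containing path.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mount point, fstype, options, dump, pass.
    Symlinks are not resolved and escaped spaces (\\040) are not decoded.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type


def is_risky_tmp(path, mounts_text):
    """True if path sits on a filesystem type listed in RISKY_FSTYPES."""
    return fstype_for(path, mounts_text) in RISKY_FSTYPES


if __name__ == "__main__":
    candidate = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    try:
        with open("/proc/mounts") as handle:
            mounts = handle.read()
    except OSError:
        print("/proc/mounts is not readable on this system")
    else:
        print(candidate, fstype_for(candidate, mounts),
              "risky" if is_risky_tmp(candidate, mounts) else "ok")
```

Passing the directory you intend to use as the temporary location prints its filesystem type and a rough verdict. The df -h /tmp command mentioned elsewhere in this thread is the manual way to inspect the same mount.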
Also the Berkeley DB implementation that sits behind the FASTA indexes for sequence access doesn't always work well with in-memory scratch. You should always try to set /tmp to a physical drive if possible. You will get several GB of files in /tmp. --Carson > On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote: > > We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we're seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm). > > The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated. > > Thanks, > > chris > > Chris Fields > Technical Lead in Genome Informatics > High Performance Computing in Biology > University of Illinois at Urbana-Champaign > Roy J. Carver Biotechnology Center / W.M. Keck Center > Carl R. Woese Institute for Genomic Biology From cjfields at illinois.edu Thu Sep 17 20:05:11 2015 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 18 Sep 2015 02:05:11 +0000 Subject: [maker-devel] Profiling MAKER In-Reply-To: <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> Message-ID: <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu> Carson, Thanks! Will pass this on to the folks at NCSA; that should help quite a bit.
Yeah, I kinda think it would be nice to come up with an alternative indexing scheme for fasta indexing, at least add some more flexibility (I'm guessing this is BioPerl still?). chris On Sep 16, 2015, at 12:22 PM, Carson Holt wrote: Sorry for the slow reply. I'm out of the lab right now and will be for the next two weeks. MAKER uses MPI for parallelization, so it is optimized for distributed non-shared memory systems, but it should still work fine on a shared memory system. With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2 GB of RAM. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads - two of the threads will use close to 0% CPU). MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I've tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory. Also the Berkeley DB implementation that sits behind the FASTA indexes for sequence access doesn't always work well with in-memory scratch. You should always try to set /tmp to a physical drive if possible. You will get several GB of files in /tmp. --Carson On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote: We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we're seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm).
The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated.

Thanks,

chris

Chris Fields
Technical Lead in Genome Informatics
High Performance Computing in Biology
University of Illinois at Urbana-Champaign
Roy J. Carver Biotechnology Center / W.M. Keck Center
Carl R. Woese Institute for Genomic Biology

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From jason.stajich at gmail.com Thu Sep 17 20:25:49 2015
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 18 Sep 2015 02:25:49 +0000
Subject: [maker-devel] Profiling MAKER
In-Reply-To: <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>
Message-ID:

What about cdbfasta? I wonder if a Perl API to this indexing is possible. Or it could be the NCBI BLAST index, since that is also a dependency in MAKER.

From cjfields at illinois.edu Thu Sep 17 20:50:09 2015
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 18 Sep 2015 02:50:09 +0000
Subject: [maker-devel] Profiling MAKER
In-Reply-To:
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>
Message-ID:

Possibly. It might also be feasible to use faidx via the samtools API (if we're intent on that path, there is Bio::DB::Sam, where I added a branch with samtools 1.2 support, so we could possibly tap into faidx at the XS level).

chris

From carsonhh at gmail.com Fri Sep 18 09:12:14 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 18 Sep 2015 09:12:14 -0600
Subject: [maker-devel] Profiling MAKER
In-Reply-To:
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>

Message-ID:

Yes. Still BioPerl. You're right, I probably need to switch indexing schemes. I've actually made a faidx implementation, but don't particularly like it. An NCBI index API might be more ideal.

--Carson

> On Sep 17, 2015, at 8:50 PM, Fields, Christopher J wrote:
>
> Possibly. It might also be feasible to use faidx via the samtools API (if we're intent on that path, there is Bio::DB::Sam, where I added a branch with samtools 1.2 support, so we could possibly tap into faidx at the XS level).
>
> chris
>
>> On Sep 17, 2015, at 9:25 PM, Jason Stajich wrote:
>>
>> What about cdbfasta? I wonder if a Perl API to this indexing is possible. Or it could be the NCBI BLAST index, since that is also a dependency in MAKER.
>>
>> On Thu, Sep 17, 2015 at 7:05 PM Fields, Christopher J wrote:
>> Carson,
>>
>> Thanks! Will pass this on to the folks at NCSA, that should help quite a bit.
>>
>> Yeah, I kinda think it would be nice to come up with an alternative indexing scheme for fasta indexing, at least add some more flexibility (I'm guessing this is BioPerl still?).
>>
>> chris
>>
>>> On Sep 16, 2015, at 12:22 PM, Carson Holt wrote:
>>>
>>> Sorry for the slow reply. I'm out of the lab right now and will be for the next two weeks.
>>>
>>> MAKER uses MPI for parallelization. So it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system.
>>>
>>> With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2 GB. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads, two of the threads will use close to 0% CPU).
>>>
>>> MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I've tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory.
>>>
>>> Also, the Berkeley DB implementation that sits behind the fasta indexes for sequence access doesn't always work well with in-memory scratch. You should always try to set /tmp to a physical drive if possible. You will get several GB of files in /tmp.
>>>
>>> --Carson
>>>
>>>> On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote:
>>>>
>>>> We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we're seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm).
>>>>
>>>> The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated.
>>>>
>>>> Thanks,
>>>>
>>>> chris
>>>>
>>>> Chris Fields
>>>> Technical Lead in Genome Informatics
>>>> High Performance Computing in Biology
>>>> University of Illinois at Urbana-Champaign
>>>> Roy J. Carver Biotechnology Center / W.M. Keck Center
>>>> Carl R. Woese Institute for Genomic Biology

-------------- next part --------------
An HTML attachment was scrubbed...
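
[Editor's sketch, not part of the original messages.] The sizing and temporary-directory advice in this thread can be roughed out as a small pre-flight check. This is only a sketch under stated assumptions: it takes the "about 2 GB per MAKER process" figure at face value, relies on GNU coreutils `stat` to report the filesystem type, and uses made-up helper names (`maker_procs`, `fs_type`, `risky_tmp_fs`) and a hypothetical 16-core, 24 GB node. It only prints an `mpiexec` command line; it does not launch MAKER.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, the 2 GB default, and the node size are assumptions.

# Print how many MAKER processes fit: min(cores, mem_gb / per_proc_gb), at least 1.
maker_procs() {
    local cores=$1 mem_gb=$2 per_proc_gb=${3:-2}
    local by_mem=$(( mem_gb / per_proc_gb ))
    local n=$(( by_mem < cores ? by_mem : cores ))
    if (( n < 1 )); then n=1; fi
    echo "$n"
}

# Print the filesystem type of a directory (GNU stat), or "unknown" on failure.
fs_type() {
    stat -f -c %T "$1" 2>/dev/null || echo unknown
}

# Succeed (exit 0) if the filesystem type is network or memory backed.
risky_tmp_fs() {
    case "$1" in
        nfs*|tmpfs|ramfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: hypothetical node with 16 cores and 24 GB of RAM.
tmpdir=${TMPDIR:-/tmp}
if risky_tmp_fs "$(fs_type "$tmpdir")"; then
    echo "warning: $tmpdir is NFS or memory backed; choose a local disk" >&2
fi
n=$(maker_procs 16 24)
echo "mpiexec -n $n maker -TMP $tmpdir"
```

Memory rather than core count is the limit in this example, so fewer processes than cores are requested. On a node where evidence alignments are deep, the per-process figure passed as the third argument would presumably need to be raised.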
URL: From parulk at caltech.edu Mon Sep 21 18:48:49 2015 From: parulk at caltech.edu (Parul Kudtarkar) Date: Mon, 21 Sep 2015 17:48:49 -0700 Subject: [maker-devel] isoforms Message-ID: <8ba6705d2b7a117292ecc417796a1192.squirrel@webmail.caltech.edu> Hi, Is there any parameter to be used while running MAKER2 pipeline to filter out weak isoforms? Thanks, Parul -- Scientific Programmer Center for Computational Regulatory Genomics Beckman Institute, Biology and Biological Engineering California Institute of Technology http://www.echinobase.org/ From mike.thon at gmail.com Wed Sep 23 01:45:26 2015 From: mike.thon at gmail.com (Michael Thon) Date: Wed, 23 Sep 2015 09:45:26 +0200 Subject: [maker-devel] some problem with MPI Message-ID: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com> Hi - I'm installing MAKER and I can't get it to run with MPI. I'm using Ubuntu linux and the openmpi packages from the linux package manager. when I ran perl Build.pl I made sure that the paths were correct. Running Build install gave me these errors: ./Build install Configuring MAKER with MPI support Installing MAKER... Configuring MAKER with MPI support Subroutine dl_load_flags redefined at (eval 125) line 8. Subroutine Parallel::Application::MPI::C_MPI_ANY_SOURCE redefined at (eval 125) line 9. Subroutine Parallel::Application::MPI::C_MPI_ANY_TAG redefined at (eval 125) line 9. Subroutine Parallel::Application::MPI::C_MPI_SUCCESS redefined at (eval 125) line 9. Subroutine Parallel::Application::MPI::C_MPI_Init redefined at (eval 125) line 9. Subroutine Parallel::Application::MPI::C_MPI_Finalize redefined at (eval 125) line 9. Subroutine Parallel::Application::MPI::C_MPI_Comm_rank redefined at (eval 125) line 9. Subroutine Parallel::Application::MPI::C_MPI_Comm_size redefined at (eval 125) line 9. Subroutine Parallel::Application::MPI::C_MPI_Send redefined at (eval 125) line 9. Subroutine Parallel::Application::MPI::C_MPI_Recv redefined at (eval 125) line 9. 
Subroutine Parallel::Application::MPI::_comment redefined at (eval 125) line 9. Installing /home/mike/maker/maker/src/../perl/lib/MAKER/ConfigData.pm Installing /home/mike/maker/maker/src/../perl/lib/auto/Parallel/Application/MPI/MPI.inl Installing /home/mike/maker/maker/src/../perl/man/MAKER::ConfigData.3pm Skip /home/mike/maker/maker/src/../perl/config-x86_64-linux-gnu-thread-multi-5.018002 (unchanged) Here are the errors I get when trying to run maker. Maker seems to work fine if I run it without mpi. Any suggestions are welcome. Thanks mpiexec -n 2 /home/mike/maker/maker/bin/maker -nodatastore >out [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) -------------------------------------------------------------------------- It looks like opal_init failed for some reason; your parallel process is likely to abort. 
There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): opal_shmem_base_select failed --> Returned value -1 instead of OPAL_SUCCESS -------------------------------------------------------------------------- [odie:28576] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79 *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort [odie:28576] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! -------------------------------------------------------------------------- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): ompi_mpi_init: orte_init failed --> Returned "Error" (-1) instead of "Success" (0) -------------------------------------------------------------------------- [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? 
(ignored) [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) -------------------------------------------------------------------------- It looks like opal_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during opal_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): opal_shmem_base_select failed --> Returned value -1 instead of OPAL_SUCCESS -------------------------------------------------------------------------- [odie:28575] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79 *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort [odie:28575] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! -------------------------------------------------------------------------- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. 
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

ompi_mpi_init: orte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that the job aborted, but has no info as to the process that caused that situation.
--------------------------------------------------------------------------

From anjuli.meiser at gmail.com Wed Sep 23 02:08:58 2015
From: anjuli.meiser at gmail.com (Anjuli Meiser)
Date: Wed, 23 Sep 2015 10:08:58 +0200
Subject: [maker-devel] maker gene prediction and overlapping genes
Message-ID: <56025E1A.30606@gmail.com>

Hello,

I am using Maker in two rounds for gene prediction in fungal genomes.

In the first round I'm running maker with the HMMs gained from GeneMark and snap with hints from CEGMA and include RNA evidence through a tophat gff. Then I convert the maker results to new snap HMMs and augustus HMMs and run maker in a second round. I also rescue rejected gene models (maker standard build) by running interproscan.

I observed that I get around 10-15% of genes that are overlapping in some way. That includes short genes predicted to lie completely within the boundaries of larger genes and also genes that partially overlap (mostly on opposite strands).

Do you have a suggestion for how to deal with this? Did I miss some settings in maker to reduce these or at least filter out the shorter genes that are lying within other genes?

Thank you very much in advance for any help in this matter!
Best wishes,
Anjuli

From dence at genetics.utah.edu Thu Sep 24 12:02:04 2015
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 24 Sep 2015 18:02:04 +0000
Subject: [maker-devel] maker gene prediction and overlapping genes
In-Reply-To: <56025E1A.30606@gmail.com>
References: <56025E1A.30606@gmail.com>
Message-ID: <2B2C21C9-2EBE-41C0-A3C7-AC9E5606CFF2@genetics.utah.edu>

Hi Anjuli,

The approach that you outlined sounds pretty reasonable, and I'm not certain I understand the problem with your results. Are the short genes that lie completely within other genes located in the introns? Or do you mean that you have overlapping predictions?

A common observation in compact fungal genomes is that maker can produce gene models that fuse several adjacent genes together. Could that be what you're observing? There's actually an option in maker to deal with that issue; it's the "correct_est_fusion" setting in the opts control file.

Let me know whether that helps,
Daniel

Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From mike.thon at gmail.com Thu Sep 24 22:23:56 2015
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 25 Sep 2015 06:23:56 +0200
Subject: [maker-devel] maker gene prediction and overlapping genes
In-Reply-To: <2B2C21C9-2EBE-41C0-A3C7-AC9E5606CFF2@genetics.utah.edu>
References: <56025E1A.30606@gmail.com> <2B2C21C9-2EBE-41C0-A3C7-AC9E5606CFF2@genetics.utah.edu>
Message-ID: <6E52F513-5005-46AF-8320-BA84D523A57E@gmail.com>

Hi all -

We've been having the same problem. In every case I've examined manually, the overlapping gene models have overlapping CDSs and are on opposite strands. In most cases it's easy to see which is the correct model because one has protein or EST/RNA-Seq evidence and the other does not. Most times one model is from Augustus and the other is from GeneMark, but not always. I found one in which both gene models were from Augustus and maker promoted both of them.

I count 121 overlaps in our annotation (it's a fungal genome). We're about to just go in and remove them manually, but I want to see if there is any way to fix my configuration of maker first.

Mike

From janna.lynn.fierst at gmail.com Fri Sep 25 05:20:23 2015
From: janna.lynn.fierst at gmail.com (Janna Fierst)
Date: Fri, 25 Sep 2015 06:20:23 -0500
Subject: [maker-devel] maker gene prediction and overlapping genes
In-Reply-To: <6E52F513-5005-46AF-8320-BA84D523A57E@gmail.com>
References: <56025E1A.30606@gmail.com> <2B2C21C9-2EBE-41C0-A3C7-AC9E5606CFF2@genetics.utah.edu> <6E52F513-5005-46AF-8320-BA84D523A57E@gmail.com>
Message-ID:

We had this problem with a nematode genome, also with very dense genes. We partially addressed it by assembling the RNA-Seq with Trinity and clipping the 5'/3' UTRs, then running with correct_est_fusion.

--
Janna L. Fierst
Assistant Professor
Department of Biological Sciences
The University of Alabama
Tuscaloosa, AL 35847
Office: SEC 1339
Phone: 205-248-1830

From carsonhh at gmail.com Mon Sep 28 09:42:04 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Sep 2015 09:42:04 -0600
Subject: [maker-devel] isoforms
In-Reply-To: <8ba6705d2b7a117292ecc417796a1192.squirrel@webmail.caltech.edu>
References: <8ba6705d2b7a117292ecc417796a1192.squirrel@webmail.caltech.edu>
Message-ID:

Sorry for the slow reply; I've been away this last week.

There is no parameter for isoform strength per se. The ability to call isoforms is strictly determined by the strength of evidence you have. Basically, the gene predictors are run iteratively with a single piece of EST evidence being primary and the remaining evidence being secondary, and then the gene predictor can make any changes it deems appropriate. Most of the time the exact same model comes back, but if a particular piece of evidence suggests a novel splice site, then a new model can be produced based off of that hint. However, if your EST/mRNA-seq evidence has a lot of noise or contamination, then you may be feeding in a lot of bad hints. These may get ignored since they would generate unworkable ORFs, but not always.
There is unfortunately no good way to automatically distinguish a good hint from a bad hint. However, if you run MAKER's results through EVM (EVidenceModeler), you can manually assign weights you deem appropriate to each evidence source. EVM can then modify models based on these weights.

--Carson

From carsonhh at gmail.com Mon Sep 28 09:46:15 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 28 Sep 2015 09:46:15 -0600
Subject: [maker-devel] some problem with MPI
In-Reply-To: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com>
References: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com>
Message-ID: <1F3AE9C8-9578-48B1-86C7-91A824B9F5CA@gmail.com>

Sorry for the slow reply. I've been away for the last week.

I've found that using Ubuntu's apt-get doesn't always set up OpenMPI and MPICH2 correctly for shared libraries. You may have to do a manual install. Also, if using OpenMPI, make sure to set the LD_PRELOAD environment variable to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so).

--Carson

> On Sep 23, 2015, at 1:45 AM, Michael Thon wrote:
>
> Hi -
>
> I'm installing MAKER and I can't get it to run with MPI. I'm using Ubuntu linux and the openmpi packages from the linux package manager.
when I ran perl Build.pl I made sure that the paths were correct. Running Build install gave me these errors: > > ./Build install > Configuring MAKER with MPI support > Installing MAKER... > Configuring MAKER with MPI support > Subroutine dl_load_flags redefined at (eval 125) line 8. > Subroutine Parallel::Application::MPI::C_MPI_ANY_SOURCE redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_ANY_TAG redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_SUCCESS redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Init redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Finalize redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Comm_rank redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Comm_size redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Send redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::C_MPI_Recv redefined at (eval 125) line 9. > Subroutine Parallel::Application::MPI::_comment redefined at (eval 125) line 9. > Installing /home/mike/maker/maker/src/../perl/lib/MAKER/ConfigData.pm > Installing /home/mike/maker/maker/src/../perl/lib/auto/Parallel/Application/MPI/MPI.inl > Installing /home/mike/maker/maker/src/../perl/man/MAKER::ConfigData.3pm > Skip /home/mike/maker/maker/src/../perl/config-x86_64-linux-gnu-thread-multi-5.018002 (unchanged) > > > Here are the errors I get when trying to run maker. Maker seems to work fine if I run it without mpi. Any suggestions are welcome. > Thanks > > mpiexec -n 2 /home/mike/maker/maker/bin/maker -nodatastore >out > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? 
(ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28576] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > -------------------------------------------------------------------------- > It looks like opal_init failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during opal_init; some of which are due to configuration or > environment problems. This failure appears to be an internal failure; > here's some additional information (which may only be relevant to an > Open MPI developer): > > opal_shmem_base_select failed > --> Returned value -1 instead of OPAL_SUCCESS > -------------------------------------------------------------------------- > [odie:28576] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79 > *** An error occurred in MPI_Init > *** on a NULL communicator > *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort > [odie:28576] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! 
> -------------------------------------------------------------------------- > It looks like MPI_INIT failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during MPI_INIT; some of which are due to configuration or environment > problems. This failure appears to be an internal failure; here's some > additional information (which may only be relevant to an Open MPI > developer): > > ompi_mpi_init: orte_init failed > --> Returned "Error" (-1) instead of "Success" (0) > -------------------------------------------------------------------------- > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_paffinity_hwloc: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_carto_file: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_mmap: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_posix: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > [odie:28575] mca: base: component_find: unable to open /usr/lib/openmpi/lib/openmpi/mca_shmem_sysv: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored) > -------------------------------------------------------------------------- > It looks like opal_init failed for some reason; your parallel process is > likely to abort. 
There are many reasons that a parallel process can > fail during opal_init; some of which are due to configuration or > environment problems. This failure appears to be an internal failure; > here's some additional information (which may only be relevant to an > Open MPI developer): > > opal_shmem_base_select failed > --> Returned value -1 instead of OPAL_SUCCESS > -------------------------------------------------------------------------- > [odie:28575] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79 > *** An error occurred in MPI_Init > *** on a NULL communicator > *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort > [odie:28575] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! > -------------------------------------------------------------------------- > It looks like MPI_INIT failed for some reason; your parallel process is > likely to abort. There are many reasons that a parallel process can > fail during MPI_INIT; some of which are due to configuration or environment > problems. This failure appears to be an internal failure; here's some > additional information (which may only be relevant to an Open MPI > developer): > > ompi_mpi_init: orte_init failed > --> Returned "Error" (-1) instead of "Success" (0) > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpiexec noticed that the job aborted, but has no info as to the process > that caused that situation. 
> -------------------------------------------------------------------------- > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Mon Sep 28 09:51:02 2015 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 28 Sep 2015 09:51:02 -0600 Subject: [maker-devel] maker gene prediction and overlapping genes In-Reply-To: <56025E1A.30606@gmail.com> References: <56025E1A.30606@gmail.com> Message-ID: <21E487A6-AED6-4364-9AA1-412AF4177C10@gmail.com> Basically you have evidence spuriously aligning to both strands. This means either your repeat masking is insufficient or your EST/mRNA-seq evidence is noisy and generating a lot of false alignments. You may need to turn off single_exon if you are using it, or reassemble any short read evidence to try and improve the specificity of the alignments. I believe some of the previous responses to your post suggested methods to do this with Trinity. --Carson > On Sep 23, 2015, at 2:08 AM, Anjuli Meiser wrote: > > Hello, > > I am using Maker in two rounds for gene prediction in fungal genomes. > > In the first round I'm running maker with the HMMs gained from GeneMark and snap with hints from CEGMA and include RNA evidence through a tophat gff. Then I convert the maker results to new snap HMMs and augustus HMMs and run maker in a second round. I also rescue rejected gene models (maker standard build) by running interproscan. > > I observed that I get around 10-15% of genes that are overlapping in some way. That includes short genes predicted to lie completely within the boundaries of larger genes and also normally overlapping (mostly on opposite strands). > > Do you have a suggesting how to deal with this? Did I miss some settings in maker to reduce these or at least filter out the shorter genes that are lying within other genes? 
> > Thank you very much in advance for any help in this matter! > > Best wishes, > Anjuli > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mike.thon at gmail.com Tue Sep 29 20:59:08 2015 From: mike.thon at gmail.com (Michael Thon) Date: Wed, 30 Sep 2015 04:59:08 +0200 Subject: [maker-devel] some problem with MPI In-Reply-To: <1F3AE9C8-9578-48B1-86C7-91A824B9F5CA@gmail.com> References: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com> <1F3AE9C8-9578-48B1-86C7-91A824B9F5CA@gmail.com> Message-ID: <64B18ED3-1603-47C7-B3CB-72124E87CE84@gmail.com> Apparently my system (Ubuntu 14.04) has both mpiexec and mpiexec.openmpi executables. mpiexec.openmpi works with MAKER. -Mike > On Sep 28, 2015, at 5:46 PM, Carson Holt wrote: > > Sorry for the slow reply. I've been away for the last week. > > I've found that using Ubuntu's apt-get doesn't always set up OpenMPI and MPICH2 correctly for shared libraries. You may have to do a manual install. > > Also, if using OpenMPI, make sure to set the LD_PRELOAD environment variable to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so). > > --Carson > > >> On Sep 23, 2015, at 1:45 AM, Michael Thon wrote: >> >> Hi - >> >> I'm installing MAKER and I can't get it to run with MPI. I'm using Ubuntu linux and the openmpi packages from the linux package manager. when I ran perl Build.pl I made sure that the paths were correct. Running Build install gave me these errors: >> >> ./Build install >> Configuring MAKER with MPI support >> Installing MAKER... >> Configuring MAKER with MPI support >> Subroutine dl_load_flags redefined at (eval 125) line 8. 
>> -------------------------------------------------------------------------- >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > From carsonhh at gmail.com Tue Sep 29 21:26:13 2015 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 29 Sep 2015 21:26:13 -0600 Subject: [maker-devel] some problem with MPI In-Reply-To: <64B18ED3-1603-47C7-B3CB-72124E87CE84@gmail.com> References: <3CE18A9F-00C2-4A29-B32C-F6A9222AC8EB@gmail.com> <1F3AE9C8-9578-48B1-86C7-91A824B9F5CA@gmail.com> <64B18ED3-1603-47C7-B3CB-72124E87CE84@gmail.com> Message-ID: <347A6A6B-20B2-43DC-AB30-CE34698C85D1@gmail.com> Good to know. Thanks, Carson > On Sep 29, 2015, at 8:59 PM, Michael Thon wrote: > > Apparently my system (Ubuntu 14.04) has both mpiexec and mpiexec.openmpi executables. mpiexec.openmpi works with MAKER. > > -Mike > > >> On Sep 28, 2015, at 5:46 PM, Carson Holt wrote: >> >> Sorry for the slow reply. I've been away for the last week. >> >> I've found that using Ubuntu's apt-get doesn't always set up OpenMPI and MPICH2 correctly for shared libraries. You may have to do a manual install. >> >> Also, if using OpenMPI, make sure to set the LD_PRELOAD environment variable to the location of libmpi.so before even trying to install MAKER. It must also be set before running MAKER (or any program that uses OpenMPI's shared libraries), so it's best just to add it to your ~/.bash_profile (e.g. export LD_PRELOAD=/usr/local/openmpi/lib/libmpi.so). >> >> --Carson >> >> >>> On Sep 23, 2015, at 1:45 AM, Michael Thon wrote: >>> >>> Hi - >>> >>> I'm installing MAKER and I can't get it to run with MPI. I'm using Ubuntu linux and the openmpi packages from the linux package manager. when I ran perl Build.pl I made sure that the paths were correct. Running Build install gave me these errors: >>> >>> ./Build install >>> Configuring MAKER with MPI support >>> Installing MAKER... 
>>> -------------------------------------------------------------------------- >>> _______________________________________________ >>> maker-devel mailing list >>> maker-devel at box290.bluehost.com >>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > From mike.thon at gmail.com Wed Sep 30 08:51:26 2015 From: mike.thon at gmail.com (Michael Thon) Date: Wed, 30 Sep 2015 16:51:26 +0200 Subject: [maker-devel] maker gene prediction and overlapping genes In-Reply-To: <21E487A6-AED6-4364-9AA1-412AF4177C10@gmail.com> References: <56025E1A.30606@gmail.com> <21E487A6-AED6-4364-9AA1-412AF4177C10@gmail.com> Message-ID: In my case I did find two overlapping gene predictions on opposite strands from different ab initio gene predictors where neither model has EST or protein support. Most of the cases, though, are ones where one model has support but not the other, so we will probably fix them manually. Thanks for your help. > On Sep 28, 2015, at 5:51 PM, Carson Holt wrote: > > Basically you have evidence spuriously aligning to both strands. This means either your repeat masking is insufficient or your EST/mRNA-seq evidence is noisy and generating a lot of false alignments. You may need to turn off single_exon if you are using it, or reassemble any short read evidence to try and improve the specificity of the alignments. I believe some of the previous responses to your post suggested methods to do this with Trinity. > > --Carson > > >> On Sep 23, 2015, at 2:08 AM, Anjuli Meiser wrote: >> >> Hello, >> >> I am using Maker in two rounds for gene prediction in fungal genomes. >> >> In the first round I'm running maker with the HMMs gained from GeneMark and snap with hints from CEGMA and include RNA evidence through a tophat gff. Then I convert the maker results to new snap HMMs and augustus HMMs and run maker in a second round. I also rescue rejected gene models (maker standard build) by running interproscan. 
>> >> I observed that I get around 10-15% of genes that are overlapping in some way. That includes short genes predicted to lie completely within the boundaries of larger genes and also normally overlapping (mostly on opposite strands). >> >> Do you have a suggesting how to deal with this? Did I miss some settings in maker to reduce these or at least filter out the shorter genes that are lying within other genes? >> >> Thank you very much in advance for any help in this matter! >> >> Best wishes, >> Anjuli >> >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mike.thon at gmail.com Wed Sep 30 08:54:01 2015 From: mike.thon at gmail.com (Michael Thon) Date: Wed, 30 Sep 2015 16:54:01 +0200 Subject: [maker-devel] repeats Message-ID: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com> Hi all - the other issue that we're having with maker is with repeats. We have an annotation of repeats done by a colleague using REPET. I'm passing the annotation in using the rm_gff option and leaving all the other repeat masking options turned off. I found at least one case where a CDS of a final gene model overlaps with a repeat annotation. Does this indicate some problem with my input file or with MAKER? 
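One way to list such cases is to intersect the CDS features of the final gene models with the repeat annotation. The following is only a minimal sketch in Python, assuming both inputs are plain 9-column GFF3 and that every feature in the repeat file counts as a repeat; the helper names read_features and cds_repeat_overlaps are made up for illustration and are not part of MAKER or REPET.

```python
from collections import defaultdict


def read_features(gff_lines, wanted_types=None):
    """Collect (start, end, type, attributes) per sequence ID from GFF3 lines.

    wanted_types is an optional set of feature types (column 3) to keep;
    None keeps every feature.
    """
    feats = defaultdict(list)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the annotation part
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip anything that is not a 9-column feature line
        seqid, _source, ftype, start, end = cols[:5]
        if wanted_types is None or ftype in wanted_types:
            feats[seqid].append((int(start), int(end), ftype, cols[8]))
    return feats


def cds_repeat_overlaps(gene_lines, repeat_lines):
    """Return (seqid, cds_start, cds_end, rep_start, rep_end) for each overlap."""
    cds = read_features(gene_lines, {"CDS"})
    repeats = read_features(repeat_lines)
    hits = []
    for seqid, cds_list in cds.items():
        for c_start, c_end, _ctype, _cattr in cds_list:
            for r_start, r_end, _rtype, _rattr in repeats.get(seqid, []):
                # GFF3 coordinates are 1-based and inclusive on both ends
                if c_start <= r_end and r_start <= c_end:
                    hits.append((seqid, c_start, c_end, r_start, r_end))
    return hits
```

With real data the two arguments can simply be open file handles for the gene-model GFF3 and the repeat GFF3. The all-against-all comparison is quadratic per sequence, which is fine for spot checks but would need sorting or an interval index for a whole genome.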
Thanks Mike From carsonhh at gmail.com Wed Sep 30 09:43:42 2015 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 30 Sep 2015 09:43:42 -0600 Subject: [maker-devel] repeats In-Reply-To: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com> References: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com> Message-ID: <25FC758C-0197-4641-A025-6AD17F8AD31C@gmail.com> MAKER's standard repeat masking protocol is to use RepeatMasker to identify repeats, then RepeatRunner to extend masking for diverged repeats. Complex repeats will be hard masked and simple repeats will be soft masked (anything coming from GFF3 will be hard masked). BLAST then runs to identify evidence alignments against the masked genome assembly. Exonerate is then allowed to polish the BLAST alignments with any applied masking removed (this is because we already have an alignment outside of the masked region, so removing masking keeps it from interfering with the polishing). It is possible that REPET is not capturing the full repeat, which would allow partial alignment outside of masked regions that can then be polished back into masked regions, or you have mRNA-seq evidence where the repeat has been assembled into the transcript sequence (so the repeat gets polished back in). If that is the case, you may want to consider letting RepeatMasker and RepeatRunner run alongside the supplied repeat GFF3. Alternatively, you could try hard masking the genome assembly before ever giving it to MAKER (so REPET masked regions can never be unmasked), but that might cause issues with some polishing steps. Also, if your ab initio predictors are calling genes on opposite strands, and one predictor seems to perform particularly poorly, you may want to drop it from your analysis. I find that I have to do this with GeneMark sometimes. Thanks, Carson > On Sep 30, 2015, at 8:54 AM, Michael Thon wrote: > > Hi all - the other issue that we're having with maker is with repeats. 
We have an annotation of repeats done by a colleague using REPET. I'm passing the annotation in using the rm_gff option and leaving all the other repeat masking options turned off. I found at least one case where a CDS of a final gene model overlaps with a repeat annotation. Does this indicate some problem with my input file or with MAKER? > > Thanks > Mike > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From mike.thon at gmail.com Wed Sep 30 10:03:09 2015 From: mike.thon at gmail.com (Michael Thon) Date: Wed, 30 Sep 2015 18:03:09 +0200 Subject: [maker-devel] repeats In-Reply-To: <25FC758C-0197-4641-A025-6AD17F8AD31C@gmail.com> References: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com> <25FC758C-0197-4641-A025-6AD17F8AD31C@gmail.com> Message-ID: <0CE38FFE-A2B9-4BC0-83E1-A0E9F5ECFD54@gmail.com> Hi Carson - > On Sep 30, 2015, at 5:43 PM, Carson Holt wrote: > > MAKER's standard repeat masking protocol is to use RepeatMasker to identify repeats, then RepeatRunner to extend masking for diverged repeats. Complex repeats will be hard masked and simple repeats will be soft masked (anything coming from GFF3 will be hard masked). BLAST then runs to identify evidence alignments against the masked genome assembly. Exonerate is then allowed to polish the BLAST alignments with any applied masking removed (this is because we already have an alignment outside of the masked region, so removing masking keeps it from interfering with the polishing). > > It is possible that REPET is not capturing the full repeat, which would allow partial alignment outside of masked regions that can then be polished back into masked regions, or you have mRNA-seq evidence where the repeat has been assembled into the transcript sequence (so the repeat gets polished back in). 
If that is the case you may want to consider letting RepeatMasker and RepeatRunner run along side with the supplied repeat GFF3. Alternatively you could try hard masking the genome assembly before ever giving it to MAKER (so REPET masked regions can never be unmasked), but that might cause some issue with some polishing steps. > Yes, I suspect our cufflinks analysis was either run on the unmasked genome or with a different version of the repeats so that probably explains it. > Also if your ab initio predictors are calling genes on opposite strands, and one predictor seems to perform particularly poorly, you may want to drop it from your analysis. I find that I have to do this with GeneMark sometimes. > Yes I had considered that and in fact we already dropped genemark. A lot of the erroneous genes appear to come from snap but if i drop that too I'm only left with augustus which was trained on a different species. We tried training augustus but we never got results that we thought were better than the existing models. Looks like our snap training has issues too. for now I think we'll fix the problems manually and in the future work on our training procedures. thanks for your help. > Thanks, > Carson > > >> On Sep 30, 2015, at 8:54 AM, Michael Thon wrote: >> >> Hi all - the other issue that we're having with maker is with repeats. We have an annotation of repeats done by a colleague using REPET. I'm passing the annotation in using the rm_gff option and leaving all the other repeat masking options turned off. I found at least one case where a CDS of a final gene model overlaps with a repeat annotation. Does this indicate some problem with my input file or with MAKER? 
>>
>> Thanks
>> Mike
>

From carsonhh at gmail.com Wed Sep 30 10:12:39 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Sep 2015 10:12:39 -0600
Subject: [maker-devel] repeats
In-Reply-To: <0CE38FFE-A2B9-4BC0-83E1-A0E9F5ECFD54@gmail.com>
References: <9E56F15B-2FF2-40A6-98DB-D3D35E258BC5@gmail.com> <25FC758C-0197-4641-A025-6AD17F8AD31C@gmail.com> <0CE38FFE-A2B9-4BC0-83E1-A0E9F5ECFD54@gmail.com>
Message-ID: <6B4D8397-43E9-4033-8765-18F6D9A5E12E@gmail.com>

You may also want to consider using Trinity to assemble the mRNA-seq evidence rather than using Cufflinks models in GFF3 format. Cufflinks gives better sensitivity, but I find that the specificity of Trinity gives overall better annotations. Also, if your organism is a fungus, then Trinity's jaccard clip option helps to resolve some issues related to transcript merging from overlapping UTRs in fungi. It might help with some repeat issues as well.

--Carson

> On Sep 30, 2015, at 10:03 AM, Michael Thon wrote:
>
> Hi Carson -
>> On Sep 30, 2015, at 5:43 PM, Carson Holt wrote:
>>
>> MAKER's standard repeat masking protocol is to use RepeatMasker to identify repeats, then RepeatRunner to extend masking for diverged repeats. Complex repeats will be hard masked and simple repeats will be soft masked (anything coming from GFF3 will be hard masked). BLAST then runs to identify evidence alignments against the masked genome assembly. Exonerate is then allowed to polish the BLAST alignments with any applied masking removed (this is because we already have an alignment outside of the masked region, so removing masking keeps it from interfering with the polishing).
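For readers unfamiliar with the two masking styles named in the quoted paragraph, here is a minimal Python sketch. It is only an illustration of the general convention (soft masking lowercases bases, hard masking replaces them with N) and is not MAKER's or RepeatMasker's code. The function name `apply_masks` and the 0-based, end-exclusive coordinates are assumptions made for this example.

```python
def apply_masks(sequence, hard_regions, soft_regions):
    """Return sequence with soft regions lowercased and hard regions set to N.

    Regions are (start, end) pairs, 0-based and end-exclusive. This is an
    assumption for the sketch; GFF3 coordinates are 1-based and inclusive.
    """
    bases = list(sequence)
    for start, end in soft_regions:
        for i in range(start, min(end, len(bases))):
            bases[i] = bases[i].lower()
    # Hard masking is applied last so it wins where regions overlap.
    for start, end in hard_regions:
        for i in range(start, min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


masked = apply_masks("ACGTACGTACGT", hard_regions=[(2, 5)], soft_regions=[(8, 10)])
print(masked)          # ACNNNCGTacGT
print(masked.upper())  # ACNNNCGTACGT
```

Uppercasing recovers the soft-masked bases, but the hard-masked bases stay N. That is why a genome that was hard masked before being handed to a pipeline cannot have those regions unmasked later.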
>> >> It is possible that REPET is not capturing the full repeat which would allow partial alignment outside of masked regions that can then be polished back into masked regions, or you have mRNA-seq evidence where the repeat has been assembled into the transcript sequence (so the repeat gets polished back in). If that is the case you may want to consider letting RepeatMasker and RepeatRunner run along side with the supplied repeat GFF3. Alternatively you could try hard masking the genome assembly before ever giving it to MAKER (so REPET masked regions can never be unmasked), but that might cause some issue with some polishing steps. >> > > Yes, I suspect our cufflinks analysis was either run on the unmasked genome or with a different version of the repeats so that probably explains it. > > >> Also if your ab initio predictors are calling genes on opposite strands, and one predictor seems to perform particularly poorly, you may want to drop it from your analysis. I find that I have to do this with GeneMark sometimes. >> > Yes I had considered that and in fact we already dropped genemark. A lot of the erroneous genes appear to come from snap but if i drop that too I'm only left with augustus which was trained on a different species. We tried training augustus but we never got results that we thought were better than the existing models. Looks like our snap training has issues too. for now I think we'll fix the problems manually and in the future work on our training procedures. > > thanks for your help. > >> Thanks, >> Carson >> >> >>> On Sep 30, 2015, at 8:54 AM, Michael Thon wrote: >>> >>> Hi all - the other issue that we're having with maker is with repeats. We have an annotation of repeats done by a colleague using REPET. I'm passing the annotation in using the rm_gff option and leaving all the other repeat masking options turned off. I found at least one case where a CDS of a final gene model overlaps with a repeat annotation. 
Does this indicate some problem with my input file or with MAKER?
>>>
>>> Thanks
>>> Mike
>>
>

From jom2042 at qatar-med.cornell.edu Wed Sep 30 09:48:04 2015
From: jom2042 at qatar-med.cornell.edu (Joel Malek)
Date: Wed, 30 Sep 2015 15:48:04 +0000
Subject: [maker-devel] amazon instance for maker?
Message-ID: <7B53984A-3856-4EA7-B688-A2B57993BA82@qatar-med.cornell.edu>

Hello Yandell Lab - I am interested in trying out the Maker annotation pipeline. I was wondering if you had an Amazon image already available with everything installed that I could replicate. Thanks for any information!

Joel

Disclaimer: This email and its attachments may be confidential and are intended solely for the use of the individual to whom it is addressed. If you are not the intended recipient, any reading, printing, storage, disclosure, copying or any other action taken in respect of this e-mail is prohibited and may be unlawful. If you are not the intended recipient, please notify the sender immediately by using the reply function and then permanently delete what you have received.

From carsonhh at gmail.com Wed Sep 30 11:22:09 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Sep 2015 11:22:09 -0600
Subject: [maker-devel] amazon instance for maker?
In-Reply-To: <7B53984A-3856-4EA7-B688-A2B57993BA82@qatar-med.cornell.edu>
References: <7B53984A-3856-4EA7-B688-A2B57993BA82@qatar-med.cornell.edu>
Message-ID: <7FA5792C-8D7B-4DC4-93E1-28E06D65DE83@gmail.com>

Here is a blog post for an implementation of MAKER in the cloud that works with multiple instances via MPI --> http://efish.zoology.msu.edu/running-maker-genome-annotation-on-starcluster/

--Carson

> On Sep 30, 2015, at 9:48 AM, Joel Malek wrote:
>
> Hello Yandell Lab - I am interested in trying out the Maker annotation pipeline.
I was wondering if you had an Amazon image already available with everything installed that I could replicate. Thanks for any information!
> Joel

From ole.toerresen at gmail.com Wed Sep 30 13:00:23 2015
From: ole.toerresen at gmail.com (Ole Kristian Tørresen)
Date: Wed, 30 Sep 2015 21:00:23 +0200
Subject: [maker-devel] The origin of te_proteins.fasta
Message-ID:

Hi,
the file te_proteins.fasta is distributed with MAKER and is suggested as a way to find more divergent transposable elements by searching at the protein level instead of at the nucleotide level. I've been unable to find any information about its creation, and whether or not it has been kept current. There is a file with mobile-element-derived proteins distributed with RepBase, called RepeatPeps.lib, which seems to contain the same amount of sequence (about 9.4 Mbp in both), but half the number of entries (10500 vs 25000).

Does anyone know how these two files compare? Could I use RepeatPeps.lib instead, or combine them (with some clustering maybe?)?

Thank you.

Sincerely,
Ole Kristian Tørresen

From carsonhh at gmail.com Wed Sep 30 13:18:07 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Sep 2015 13:18:07 -0600
Subject: [maker-devel] The origin of te_proteins.fasta
In-Reply-To:
References:
Message-ID: <01D987FB-3709-4DD3-B55D-0A00F9EFC2FB@gmail.com>

It's from a tool called RepeatRunner. Here is the paper --> https://publications.mpi-cbg.de/Smith_2007_5404.pdf

Post RepeatRunner development, RepeatMasker also started checking against repeats to get better performance. So nowadays it may be somewhat redundant with what RepeatMasker will do, but it does add a little. It's not updated regularly, but since RepBase started adding proteins that should not be an issue.

In addition to a number of protein repeats, te_proteins also contains a number of low complexity entries from NCBI's NR database that tend to falsely align with great frequency to many genomes. All te_protein matches generate soft masking in the genome, whereas RepeatMasker results will be hard masked.

--Carson

> On Sep 30, 2015, at 1:00 PM, Ole Kristian Tørresen wrote:
>
> Hi,
> the file te_proteins.fasta is distributed with MAKER and is suggested as a way to find more divergent transposable elements by searching at the protein level instead of at the nucleotide level. I've been unable to find any information about its creation, and whether or not it has been kept current. There is a file with mobile-element-derived proteins distributed with RepBase, called RepeatPeps.lib, which seems to contain the same amount of sequence (about 9.4 Mbp in both), but half the number of entries (10500 vs 25000).
>
> Does anyone know how these two files compare? Could I use RepeatPeps.lib instead, or combine them (with some clustering maybe?)?
>
> Thank you.
>
> Sincerely,
> Ole Kristian Tørresen

From carsonhh at gmail.com Wed Sep 30 13:23:44 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 30 Sep 2015 13:23:44 -0600
Subject: [maker-devel] The origin of te_proteins.fasta
In-Reply-To: <01D987FB-3709-4DD3-B55D-0A00F9EFC2FB@gmail.com>
References: <01D987FB-3709-4DD3-B55D-0A00F9EFC2FB@gmail.com>
Message-ID: <5292984A-461E-46CF-8CB8-0D038942AC1D@gmail.com>

Sorry. Meant to say --> "RepeatMasker also started checking against protein repeats to get better performance"

--Carson

> On Sep 30, 2015, at 1:18 PM, Carson Holt wrote:
>
> It's from a tool called RepeatRunner. Here is the paper --> https://publications.mpi-cbg.de/Smith_2007_5404.pdf
>
> Post RepeatRunner development, RepeatMasker also started checking against repeats to get better performance. So nowadays it may be somewhat redundant with what RepeatMasker will do, but it does add a little. It's not updated regularly, but since RepBase started adding proteins that should not be an issue.
>
> In addition to a number of protein repeats, te_proteins also contains a number of low complexity entries from NCBI's NR database that tend to falsely align with great frequency to many genomes. All te_protein matches generate soft masking in the genome, whereas RepeatMasker results will be hard masked.
>
> --Carson
>
>
>> On Sep 30, 2015, at 1:00 PM, Ole Kristian Tørresen wrote:
>>
>> Hi,
>> the file te_proteins.fasta is distributed with MAKER and is suggested as a way to find more divergent transposable elements by searching at the protein level instead of at the nucleotide level. I've been unable to find any information about its creation, and whether or not it has been kept current.
There is a file with mobile elements derived proteins distributed with RepBase, called RepeatPeps.lib, which seem to contain the same amount of sequences (about 9.4 Mbp in both), but half the number (10500 vs 25000).
>>
>> Does anyone know how these two files compare? Could I use RepeatPeps.lib instead, or combine them (with some clustering maybe?)?
>>
>> Thank you.
>>
>> Sincerely,
>> Ole Kristian Tørresen

From myandell at genetics.utah.edu Tue Sep 8 10:13:32 2015
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 8 Sep 2015 16:13:32 +0000
Subject: [maker-devel] AED scores from MAKER pipeline - deterministic or not?
In-Reply-To:
References: <81B272E9-3439-49C6-96E0-674B03F87569@gmail.com>

<43C687F7-4B40-4E4F-B255-E1D2B9D6D4DC@gmail.com> , Message-ID: <7A60AB257EFF2B48B1F4C814817EA053E37D97AD@mxb1.hg.genetics.utah.edu> awesome detective work everybody! Mark Yandell Professor of Human Genetics H.A. & Edna Benning Presidential Endowed Chair Co-director USTAR Center for Genetic Discovery Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 ph:801-587-7707 ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com] Sent: Tuesday, September 08, 2015 10:12 AM To: Cheng, Chia-Yi Cc: maker-devel Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? Hi Chia-Yi, I?m glad to see you found a way around the issue you were seeing. Another solution may be to split up your input genome into several separate jobs, and run each one separately. Just out of curiosity could you send me the results of these two commands? df -h /tmp df -h A GFFDB.pm lock failure generally means either your working directory is network mounted and MAKER can?t detect it or that /tmp is tmpfs both of which can cause SQLite failures. Thanks, Carson On Sep 8, 2015, at 9:46 AM, Cheng, Chia-Yi > wrote: Hi Carson, Thank you for the suggestions. For my previous runs, I?ve been setting the TMP to a non-NFS position and used 4 or 8 CPUs for MPI. In the MPI log file there is a consistent error, DBD::SQLite::db selectcol_arrayref failed: database is locked at maker-2.31.8/bin/../lib/GFFDB.pm line 525./, which may associate with the IO error you pointed out. This is likely caused by the MPI setting in our institute. Therefore, my team mate Vivek suggested to run on non-MPI. It took about a day to run, compared to ~6 hours when using MPI. Yet it did not create any error and the AED from two runs were identical. 
The command for the successful runs was, maker -R -quiet -TMP /tmp -fix_nucleotides It looks like this approach has resolved the issue. Please feel free to post this update to the Google group. Again, thank you for your help. Best, Chia-Yi From: Carson Holt > Date: Friday, September 4, 2015 at 2:43 PM To: Cheng Chia-Yi > Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? Hi Chia-Yi, I think I found the issue based off the data difference between the GFF3 files. MAKER uses a number of intermediate files to store data as it progresses (will be in regional chunks). It looks like you had an IO error in one of the runs and one of these files was likely empty (note attached image with circled region where all EST/mRNA data just drops out - only happens in one of the files). It didn?t kill the job (NFS errors rarely do - it?s one of their optimizations, they always return success and assume it will complete eventually). You can run again with MAKER -a options to rebuild the data output. Make sure your TMP= environment variable is not pointing to an NFS mounted location (that would exacerbate issues). You also may need to scale back the number of CPUs you are running using MPI in order to reduce the IO burden. Thanks, Carson On Sep 4, 2015, at 9:06 AM, Cheng, Chia-Yi > wrote: Hi Carson, Thank you for clarifying it up. The two MAKER generated GFF files could be downloaded from iPlant now, http://de.iplantcollaborative.org/dl/d/0C9CBD8F-9B6E-40F1-A2FA-4F7AC7AAE4B5/Chr1.gff.20150831 http://de.iplantcollaborative.org/dl/d/4C73FD9D-BE7E-4937-84D5-1D7F32196B67/Chr1.gff.repeat_20150831 The control files for these two runs and the a list of 818 models with different AED scores are attached to this email. Please let me know if you need any other information. Thank you so much for your help. 
Best, Chia-Yi From: Carson Holt > Date: Thursday, September 3, 2015 at 6:40 PM To: Cheng Chia-Yi > Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? Hi Chia-Yi, What I really need are the MAKER produced GFF3 outputs from both runs (the individual contig files with the fasta at the end). Just Chr1 is sufficient. Thanks, Carson On Aug 31, 2015, at 10:20 AM, Cheng, Chia-Yi > wrote: Hi Carson, Please find the 1142 gene models with different AED from both runs. Due to the size, please download the annotated GFF3 and fasta files from iPlant, http://de.iplantcollaborative.org/dl/d/2C1901E6-7F52-4264-9CB7-AB72CEF6BD67/TAIR10.protein_coding_loci_27415.gff http://de.iplantcollaborative.org/dl/d/44A6AD38-E408-4DB7-AC32-6689D3D1AC7A/TAIR10.protein_coding_loci_27415.fasta The single_exon= was set to zero in both sets. The two runs have used identical control files which were also attached. I thought single_exon= only mattered for generating annotation and didn?t realize it would also affect AED calculation. Thank you. Chia-Yi From: Carson Holt > Date: Monday, August 31, 2015 at 11:08 AM To: Cheng Chia-Yi > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] AED scores from MAKER pipeline - deterministic or not? I would have to see the actual GFF3 files (full file including fast at the end). Give me both GFF3 files and the coordinates of the gene in question. My first guess is that you had the single_exon= filter set to different values on each run. The gene in question is an unspliced single exon gene (based on the QI), your primary piece of evidence appears to be a single exon EST, and the only value that changes in the QI is the exon overlap. Single exon evidence will be ignored by default for the AED calculation unless you have single_exon set to 1. Thanks, Carson On Aug 31, 2015, at 8:47 AM, Cheng, Chia-Yi > wrote: Hello MAKER team, We at JCVI have been using MAKER (2.31.8) to calculate the AED of Arabidopsis gene models. 
We provided the annotation set as ?model_gff? with evidence file in ?protein_gff? and ?est_gff?. All the other settings were default. One issue I?ve noticed was that the AED scores did not seem to be deterministic. When I compare the AED scores from two runs using identical control files, ~1,000 (out of 35,385) gene models had different AED scores. The difference between two sets of AED scores could range from 0.01 to 1.00. I looked into several gene models with lager difference, i.e. AED = 0.00 in run 1 and AED = 1.00 in run 2, and noticed a disagreement in the QI: Run 1: _AED=0.00;_eAED=-0.00;_QI=0|-1|0|1|-1|0|1|0|344 Run 2: _AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|0|344 The discrepancy in the 4th column seemed to suggest the evidence file was not used properly in run 2. I?m not sure what may have caused as both runs have used the same input. A snapshot of the evidence files are pasted in the end of the email in case needed. Please let me know if more info is needed. Any help is appreciated. Thank you. 
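The comparison of the two QI strings described above can be reproduced with a short script. This is only a sketch for spotting which QI column changed between two runs. The helper names `parse_qi` and `qi_differences` are made up for this example and are not part of MAKER; the two attribute strings are the ones quoted in the message.

```python
def parse_qi(attributes):
    """Extract the _QI fields from a GFF3 attribute string as a list of strings."""
    for item in attributes.split(";"):
        key, _, value = item.partition("=")
        if key == "_QI":
            return value.split("|")
    raise ValueError("no _QI attribute found")


def qi_differences(attrs_a, attrs_b):
    """Return (column, value_a, value_b) for each QI column that differs (1-based)."""
    qi_a, qi_b = parse_qi(attrs_a), parse_qi(attrs_b)
    if len(qi_a) != len(qi_b):
        raise ValueError("QI strings have different numbers of columns")
    return [
        (column, a, b)
        for column, (a, b) in enumerate(zip(qi_a, qi_b), start=1)
        if a != b
    ]


run1 = "_AED=0.00;_eAED=-0.00;_QI=0|-1|0|1|-1|0|1|0|344"
run2 = "_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|0|344"
print(qi_differences(run1, run2))  # [(4, '1', '0')]
```

Running the same comparison over every model in both GFF3 files would show whether the discrepancy is always confined to the 4th column, as it is here.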
Chia-Yi RNA-seq evidence file: Chr1 assembler-aerial2_pasacDNA_match36245927.+.ID=aerial2_align_161343;Target=asmbl_1 1082 1234 +%2Casmbl_1 692 1081 +%2Casmbl_1 572 691 +%2Casmbl_1 1 290 +%2Casmbl_1 291 571 +%2Casmbl_1 1235 1723 + Chr1 assembler-aerial2_pasamatch_part36243913.+.ID=aerial2_align_161343-1;Parent=aerial2_align_161343 Chr1 assembler-aerial2_pasamatch_part39964276.+.ID=aerial2_align_161343-2;Parent=aerial2_align_161343 EST evidence file: Chr1 est2genomeexpressed_sequence_match547058992150-.ID=Chr1:hit:213:3.2.0.0;Name=gi|19829901|gb|AV795918|RAFL08-19-M04 Chr1 est2genomematch_part547058992150-.ID=Chr1:hsp:500:3.2.0.0;Parent=Chr1:hit:213:3.2.0.0;Target=gi|19829901|gb|AV795918|RAFL08-19-M04 2 431 +;Gap=M430 Protein evidence file: Chr1 protein2genomeprotein_match37605284727+.ID=Chr1:hit:202:3.10.0.0;Name=UniRef90_M4EWW1 Chr1 protein2genomematch_part37603913727+.ID=Chr1:hsp:488:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1 1 50;Gap=M31 D1 M19 F1 Chr1 protein2genomematch_part39964276727+.ID=Chr1:hsp:489:3.10.0.0;Parent=Chr1:hit:202:3.10.0.0;Target=UniRef90_M4EWW1 51 144;Gap=R1 M23 D1 M28 D1 M36 I2 M5 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org <1142_models.diff_AED.gff> <818.diff_AED.20150831> From cjfields at illinois.edu Tue Sep 15 10:39:22 2015 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 15 Sep 2015 16:39:22 +0000 Subject: [maker-devel] Profiling MAKER Message-ID: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we?re seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm). 
The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated.

Thanks,

chris

Chris Fields
Technical Lead in Genome Informatics
High Performance Computing in Biology
University of Illinois at Urbana-Champaign
Roy J. Carver Biotechnology Center / W.M. Keck Center
Carl R. Woese Institute for Genomic Biology

From carsonhh at gmail.com Wed Sep 16 11:22:05 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 16 Sep 2015 11:22:05 -0600
Subject: [maker-devel] Profiling MAKER
In-Reply-To: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu>
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu>
Message-ID: <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com>

Sorry for the slow reply. I'm out of the lab right now and will be for the next two weeks.

MAKER uses MPI for parallelization, so it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system. With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2 GB. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads, two of the threads will use close to 0% CPU).

MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I've tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory.
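One way to check the advice about the temporary directory is to look up which filesystem it sits on. The sketch below is a hedged illustration and not part of MAKER. It assumes a Linux system where mount information is available in the `/proc/mounts` format (device, mount point, filesystem type, ...), and the names `filesystem_type` and `RISKY_TYPES` are invented for this example. A sample mount table is used so the logic can be followed without a live system.

```python
import os

# Filesystem types treated as unsuitable for heavy temporary IO in this sketch.
RISKY_TYPES = {"nfs", "nfs4", "tmpfs", "ramfs"}


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path."""
    path = os.path.abspath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


# Made-up mount table in /proc/mounts format, for illustration only.
sample = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "tmpfs /tmp tmpfs rw 0 0\n"
    "server:/export /home nfs4 rw 0 0\n"
)

for candidate in ("/tmp/maker_tmp", "/home/user/run", "/scratch/maker_tmp"):
    mount_point, fs_type = filesystem_type(candidate, sample)
    verdict = "avoid" if fs_type in RISKY_TYPES else "ok"
    print(candidate, mount_point, fs_type, verdict)
```

On a real machine the text of `/proc/mounts` would be passed in place of `sample`. Mount points containing spaces are escaped in that file and are not handled here.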
Also, the Berkeley DB implementation that sits behind the fasta indexes for sequence access doesn't always work well with in-memory scratch. You should always try to set /tmp to a physical drive if possible. You will get several GB of files in /tmp.

--Carson

> On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote:
>
> We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we're seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm).
>
> The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated.
>
> Thanks,
>
> chris
>
> Chris Fields
> Technical Lead in Genome Informatics
> High Performance Computing in Biology
> University of Illinois at Urbana-Champaign
> Roy J. Carver Biotechnology Center / W.M. Keck Center
> Carl R. Woese Institute for Genomic Biology

From cjfields at illinois.edu Thu Sep 17 20:05:11 2015
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 18 Sep 2015 02:05:11 +0000
Subject: [maker-devel] Profiling MAKER
In-Reply-To: <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com>
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com>
Message-ID: <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>

Carson,

Thanks! Will pass this on to the folks at NCSA, that should help quite a bit.
Yeah, I kinda think it would be nice to come up with an alternative indexing scheme for fasta indexing, or at least add some more flexibility (I'm guessing this is BioPerl still?).

chris

On Sep 16, 2015, at 12:22 PM, Carson Holt wrote:

Sorry for the slow reply. I'm out of the lab right now and will be for the next two weeks.

MAKER uses MPI for parallelization. So it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system.

With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2 GB. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads - two of the threads will use close to 0% CPU).

MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I've tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory.

Also, the Berkeley DB implementation that sits behind the fasta indexes for sequence access doesn't always work well with in-memory scratch. You should always try to set /tmp to a physical drive if possible. You will get several GB of files in /tmp.

--Carson

On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote:

We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we're seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm).
The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated.

Thanks,

chris

Chris Fields
Technical Lead in Genome Informatics
High Performance Computing in Biology
University of Illinois at Urbana-Champaign
Roy J. Carver Biotechnology Center / W.M. Keck Center
Carl R. Woese Institute for Genomic Biology

From jason.stajich at gmail.com Thu Sep 17 20:25:49 2015
From: jason.stajich at gmail.com (Jason Stajich)
Date: Fri, 18 Sep 2015 02:25:49 +0000
Subject: [maker-devel] Profiling MAKER
In-Reply-To: <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>
Message-ID:

What about cdbfasta? I wonder if a Perl API to this indexing is possible. Or it could be the NCBI BLAST index, since that is also a dependency in MAKER.

On Thu, Sep 17, 2015 at 7:05 PM Fields, Christopher J wrote:

> Carson,
>
> Thanks! Will pass this on to the folks at NCSA, that should help quite a
> bit.
>
> Yeah, I kinda think it would be nice to come up with an alternative
> indexing scheme for fasta indexing, at least add some more flexibility (I'm
> guessing this is BioPerl still?).
>
> chris
>
>
> On Sep 16, 2015, at 12:22 PM, Carson Holt wrote:
>
> Sorry for the slow reply. I'm out of the lab right now and will be for
> the next two weeks.
>
> MAKER uses MPI for parallelization. So it is optimized for distributed
> non-shared memory systems, but should still work fine on a shared memory
> system.
>
> With MPI, you specify the number of processes to start using the -n flag
> for mpiexec. Each MAKER process will need about 2 GB. It could be more or
> less depending on the amount of evidence it has to hold in RAM (i.e. deep
> evidence alignments use more memory). By default each MAKER process will
> use a single CPU (even though it will start 3 threads - two of the threads
> will use close to 0% CPU).
>
> MAKER will use a lot of IO. Each process will write/read independently of
> the others, so the more processes you start, the more simultaneous IO you
> will have. I've tried to put most very heavy IO operations in /tmp or
> whatever temporary directory you specify. It is important that you never
> specify an NFS location for your temporary directory. The rest of the IO
> will occur in the working directory.
>
> Also, the Berkeley DB implementation that sits behind the fasta indexes for
> sequence access doesn't always work well with in-memory scratch. You should
> always try to set /tmp to a physical drive if possible. You will get
> several GB of files in /tmp.
>
> --Carson
>
>
> On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote:
>
> We have a group locally (at NCSA) who is interested in profiling MAKER
> with various performance analysis tools. They would like to know CPU, RAM,
> I/O patterns and usage. In particular, we're seeing some odd performance
> problems on a local system which uses a large shared memory cache for
> storing temp/scratch data (/dev/shm).
>
> The question is: are there any particular pain points users and developers
> know of or could point us to that we can start focusing on? Any help would
> be greatly appreciated.
>
> Thanks,
>
> chris
>
> *Chris Fields*
> *Technical Lead in Genome Informatics*
> *High Performance Computing in Biology*
> University of Illinois at Urbana-Champaign
> Roy J. Carver Biotechnology Center / W.M. Keck Center
> Carl R.
Woese Institute for Genomic Biology

From cjfields at illinois.edu Thu Sep 17 20:50:09 2015
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 18 Sep 2015 02:50:09 +0000
Subject: [maker-devel] Profiling MAKER
In-Reply-To:
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>
Message-ID:

Possibly. It might also be feasible to use faidx via the samtools API (if we're intent on that path, there is Bio::DB::Sam, where I added a branch with samtools 1.2 support, so we could possibly tap into faidx at the XS level).

chris

On Sep 17, 2015, at 9:25 PM, Jason Stajich wrote:

What about cdbfasta -- wonder if perl Api to this indexing is possible -- or could be NCBI blast index since that also is a dependency in maker.

On Thu, Sep 17, 2015 at 7:05 PM Fields, Christopher J wrote:

Carson,

Thanks! Will pass this on to the folks at NCSA, that should help quite a bit.

Yeah, I kinda think it would be nice to come up with an alternative indexing scheme for fasta indexing, at least add some more flexibility (I'm guessing this is BioPerl still?).

chris

On Sep 16, 2015, at 12:22 PM, Carson Holt wrote:

Sorry for the slow reply. I'm out of the lab right now and will be for the next two weeks.

MAKER uses MPI for parallelization. So it is optimized for distributed non-shared memory systems, but should still work fine on a shared memory system.
With MPI, you specify the number of processes to start using the -n flag for mpiexec. Each MAKER process will need about 2 GB. It could be more or less depending on the amount of evidence it has to hold in RAM (i.e. deep evidence alignments use more memory). By default each MAKER process will use a single CPU (even though it will start 3 threads - two of the threads will use close to 0% CPU).

MAKER will use a lot of IO. Each process will write/read independently of the others, so the more processes you start, the more simultaneous IO you will have. I've tried to put most very heavy IO operations in /tmp or whatever temporary directory you specify. It is important that you never specify an NFS location for your temporary directory. The rest of the IO will occur in the working directory.

Also, the Berkeley DB implementation that sits behind the fasta indexes for sequence access doesn't always work well with in-memory scratch. You should always try to set /tmp to a physical drive if possible. You will get several GB of files in /tmp.

--Carson

On Sep 15, 2015, at 10:39 AM, Fields, Christopher J wrote:

We have a group locally (at NCSA) who is interested in profiling MAKER with various performance analysis tools. They would like to know CPU, RAM, I/O patterns and usage. In particular, we're seeing some odd performance problems on a local system which uses a large shared memory cache for storing temp/scratch data (/dev/shm).

The question is: are there any particular pain points users and developers know of or could point us to that we can start focusing on? Any help would be greatly appreciated.

Thanks,

chris

Chris Fields
Technical Lead in Genome Informatics
High Performance Computing in Biology
University of Illinois at Urbana-Champaign
Roy J. Carver Biotechnology Center / W.M. Keck Center
Carl R.
Woese Institute for Genomic Biology

From carsonhh at gmail.com Fri Sep 18 09:12:14 2015
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 18 Sep 2015 09:12:14 -0600
Subject: [maker-devel] Profiling MAKER
In-Reply-To:
References: <385D8628-AE97-4A35-93F9-41CC6C7136A0@illinois.edu> <47E9C2DD-4557-4C22-BB09-29ED54104C36@gmail.com> <1408A7FC-DD56-47F8-95B1-878B21D976EB@illinois.edu>