From kdelmore at zoology.ubc.ca Thu May 1 10:06:27 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Thu, 1 May 2014 08:06:27 -0700
Subject: [maker-devel] problem with dsindex
Message-ID:

Hi Carson,

I wanted to confirm that the interproscan scripts provided in maker are
now compatible with version 5 of the program and ask if there was any
additional documentation for the use of iprscan_wrap. It looks like that
script will run interproscan for us but I'm not sure what to supply on the
command line.

I could also run interproscan directly but am wondering if you have any
suggestions for what to include on the command line, as this has changed
in the new version. This is what I would propose:

./interproscan.sh -i test_proteins.fasta -f gff3 -goterms -iprlookup

Thanks,
Kira

From carsonhh at gmail.com Fri May 2 13:18:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 02 May 2014 12:18:04 -0600
Subject: [maker-devel] problem with dsindex
In-Reply-To:
References:
Message-ID:

The scripts that consume interproscan output (iprscan2gff3,
ipr_update_gff, etc.) should work with version 5. Scripts that wrap
interproscan and run it for you, such as iprscan_wrap, only work with
version 4.

Thanks,
Carson

From carsonhh at gmail.com Fri May 2 13:55:27 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 02 May 2014 12:55:27 -0600
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

Whichever has the best AED score, I believe. But you can add gene_id= to
the header of each fasta entry to ensure MAKER doesn't try to cluster
unrelated transcripts into a single gene. Then the transcript name and
gene name will be guaranteed to match up.

--Carson

From: Shaun Jackman
Date: Wednesday, April 30, 2014 at 5:25 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] est_forward and conflicting names

Hi, Carson.

I've downloaded a number of genes from GenBank using Entrez Direct, which
I'm using with est and protein to annotate a plant mitochondrion. Most of
these reference sequences have sensible and consistent gene names, and so
I'm using est_forward to retain the gene names. This workflow is working
well for me.

Some of the genes pulled in from GenBank have less useful names like
orf1234 or other numeric IDs. When multiple evidence sequences map to the
same location, how does est_forward choose which name to use? If it's
chosen arbitrarily, could it be possible to choose the most common name
instead?

Thanks,
Shaun

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
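The gene_id= suggestion in Carson's reply can be illustrated with a short script that tags every entry of an evidence FASTA file with a gene name before the file is given to MAKER. This is a minimal sketch under stated assumptions, not MAKER code: the thread only says to add gene_id= to the header, so appending it as a space-separated key=value after the existing header text is an assumption, and the function name, sequence IDs, and mapping below are hypothetical.

```python
def add_gene_ids(fasta_lines, gene_for):
    """Yield FASTA lines with ' gene_id=<name>' appended to each header.

    fasta_lines: iterable of lines read from a FASTA file.
    gene_for:    dict mapping a sequence ID (first word of the header)
                 to the gene name it should be tagged with (hypothetical).
    Headers with no entry in gene_for, or that already carry gene_id=,
    are passed through unchanged.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">") and "gene_id=" not in line:
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            gene = gene_for.get(seq_id)
            if gene:
                # Assumed layout: key=value appended after the header text.
                line = f"{line} gene_id={gene}"
        yield line


# Hypothetical records: one well-named entry and one with a numeric ORF name.
records = [">NC_001_cds1 cox1 CDS", "ATGTTC", ">orf1234", "ATGGCA"]
mapping = {"NC_001_cds1": "cox1", "orf1234": "cox1"}

for out_line in add_gene_ids(records, mapping):
    print(out_line)
```

Entries missing from the mapping are left untouched, so the script can be run on a mixed file and only the sequences with poor names need to be listed.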
From sjackman at gmail.com Fri May 2 14:40:42 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 2 May 2014 12:40:42 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson. Do you happen to have a patch that I could test out that fixes
the naming of the tRNA identified by tRNAscan?

Is the MAKER subversion repository public, and if so, what's its URL?

Cheers,
Shaun

Shaun wrote…
> The integration of MAKER-P with tRNAscan is very useful. The identified
> genes are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are
> conventionally named according to the amino acid and anticodon, such as
> trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the
> names with that convention?

On 6 March 2014 12:58, Carson Holt wrote:
> Yes. I'll fix the naming.
>
> Thanks,
> Carson

From carsonhh at gmail.com Fri May 2 14:50:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 02 May 2014 13:50:23 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

That should already be fixed in the current 2.31.3 download. I'll also
send you the subversion credentials in a separate e-mail.

Thanks,
Carson

From sjackman at gmail.com Fri May 2 15:00:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 2 May 2014 13:00:22 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Fantastic. Thanks, Carson. I didn't realize that there was a point release
of MAKER. It's not announced on the MAKER home page, which still reports
Last Software Update v2.31 (Feb 11, 2014). Where are point releases
announced?

The static link for MAKER 2.31 reports 403 Forbidden. Is there a new
static link for MAKER 2.31.3?

Cheers,
Shaun

From carsonhh at gmail.com Fri May 2 15:14:11 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 02 May 2014 14:14:11 -0600
Subject: [maker-devel] Mapping gene names
Message-ID:

I need to fix that last update tag. I did a point release, because there
were a couple of very minor fixes that didn't justify a full release (tRNA
naming and a fasta_merge bug for tRNAs - I think three lines total of
code). There won't be another major version release for a while because
we're working on MAKER-EVM, which will be version 3.0 (joint project for
full MAKER integration with EVM). So there will be just point releases on
2.31 (which will be the very last version of MAKER2).

I'll fix the static link and add a new one for 2.31.3.

--Carson

From cynsb1987 at gmail.com Sun May 4 20:58:33 2014
From: cynsb1987 at gmail.com (hueytyng)
Date: Mon, 5 May 2014 11:58:33 +1000
Subject: [maker-devel] Non-unique top level ID
Message-ID:

Hi Carson,

I ran MAKER using RNAseq as evidence (tophat+cufflinks). The gff file is
provided to Maker under "est_gff". Maker runs fine but there are a few
failed contigs, and these error messages in my log:

ERROR: Non-unique top level ID for 1:JUNC00010801:0
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs). MAKER will not handle these correctly.

--> rank=2, hostname=safs-raijen
ERROR: Failed while prepare section files
ERROR: Chunk failed at level:12, tier_type:3
FAILED CONTIG:scaffold11129|size28423

I do see multiple IDs in my gff. I have 9 RNAseq samples; is the way I
merged them causing the error? This is what I've done to prepare the gff:

1. merge cuffmerge output
   cuffmerge -o -p 4 assembly_list.txt
   cufflinks2gff3 merged.gtf > merged.gff

2. merge junctions
   find -name "junctions.bed" -exec cat {} \; >> all_junctions.bed
   tophat2gff3 all_junctions.bed > all_junctions.gff

3. combine cuffmerge and junctions
   gff3_merge -o tophatandcufflinks.gff merged.gff all_junctions.gff

4. provide in opts file
   est_gff=tophatandcufflinks.gff #EST evidence from an external gff3 file

Thank you
Jenny

From carsonhh at gmail.com Mon May 5 09:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 08:18:18 -0600
Subject: [maker-devel] Non-unique top level ID
In-Reply-To:
References:
Message-ID:

If you use gff3_merge with the -l flag, then it will check for non-unique
IDs and assign new IDs to make them unique. Also, in general it is better
to use just the cufflinks results and exclude the tophat results, as they
tend to be very noisy and decrease the quality of the final models
overall.

Thanks,
Carson

From online at davemessina.com Mon May 5 11:48:30 2014
From: online at davemessina.com (Dave Messina)
Date: Mon, 5 May 2014 11:48:30 -0500
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
Message-ID:

Hi,

Even with the sample data, I'm getting a "Sequence contains no data" error
from blastx during the RepeatRunner phase.

I've uploaded a tarball with my run on the dpp sample data to the MAKER
File Upload site (filename maker_test.tgz).

Could you please take a look and give me your thoughts?

Thanks!
Dave

From carsonhh at gmail.com Mon May 5 11:53:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 10:53:09 -0600
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

Use BLAST+ version 2.2.28. Also make sure you are not using an old version
of MAKER (2.31.3 is current).

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/

--Carson

From online at davemessina.com Mon May 5 13:05:54 2014
From: online at davemessina.com (Dave Messina)
Date: Mon, 5 May 2014 13:05:54 -0500
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

Thanks for your quick reply, Carson.

I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31
to 2.31.3, unfortunately I'm still seeing the same issue.

I've uploaded a new tarball containing the latest (failed) output on the
dpp sample data. Any thoughts you have on how to resolve this would be
great.

Thanks!
Dave
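Since the advice in this exchange hinges on running a specific BLAST+ release, a scripted version check may be handy before starting a long MAKER run. The following is a minimal sketch, not part of MAKER: the function names are hypothetical, and the parser assumes the first version-like token in the `blastx -version` output is the release number, as in the `blastx: 2.2.28+` line reported in this thread.

```python
import re
import subprocess


def parse_blast_version(version_text):
    """Return (major, minor, patch) from `blastx -version` style output."""
    match = re.search(r"\b(\d+)\.(\d+)\.(\d+)\+?", version_text)
    if match is None:
        raise ValueError("no version number found in: " + repr(version_text))
    return tuple(int(part) for part in match.groups())


def installed_blast_version(blastx="blastx"):
    """Run the given blastx executable and parse the version it reports."""
    result = subprocess.run(
        [blastx, "-version"], capture_output=True, text=True, check=True
    )
    return parse_blast_version(result.stdout)


# Output format as reported in this thread.
sample = "blastx: 2.2.28+\nPackage: blast 2.2.28, build Mar 12 2013 16:52:31"
print(parse_blast_version(sample) == (2, 2, 28))  # prints True
```

Passing the full path of the executable that MAKER is configured to use to installed_blast_version() avoids accidentally checking a different copy found earlier on PATH.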
URL: From carsonhh at gmail.com Mon May 5 14:32:01 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 13:32:01 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: I can't reproduce your issue, so it is probably something about your system or environment. 1. Is you /tmp directory full (or whatever you have $TMPDIR environmental variable is set to). Use 'df -h /tmp' to check. 2. Are you running in a directory on an NFS drive? Is it true NFS or is it something like FUSE. 3. Is your current working directory full. 4. Are you setting TMP= in the control files to either an NFS mounted location or an in memory mounted location. Same issue if you are setting the system's TMPDIR environmental variable to one of these. 5. Is your default /tmp directory in fact locally mounted (some clusters set this to in memory scratch). 6. Even though you already checked, humor me and run this exact command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version --Carson From: Dave Messina Date: Monday, May 5, 2014 at 12:05 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Thanks for your quick reply, Carson. I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue. I've uploaded a new tarball containing the latest (failed) output on the dpp sample data. Any thoughts you have on how to resolve this would be great. Thanks! Dave On Mon, May 5, 2014 at 11:53 AM, Carson Holt wrote: > Use BLAST+ version 2.2.28. Also Make sure you are not using an old version of > MAKER (2.31.3 is current). 
> > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ > > --Carson > > > From: Dave Messina > Date: Monday, May 5, 2014 at 10:48 AM > To: > Subject: [maker-devel] MAKER / RepeatRunner configuration issue > > Hi, > > Even with the sample data, I'm getting a "Sequence contains no data" error > from blastx during the RepeatRunner phase. > > I've uploaded a tarball with my run on the dpp sample data to the MAKER File > Upload site (filename maker_test.tgz). > > Could you please take a look and give me your thoughts? > > > Thanks! > Dave > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 14:44:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 13:44:11 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Could you give me the full output of this command --> df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A I'm really mostly interested in the mount information. Some non-traditional network storage implementations can induce odd behaviors (for example by not supporting operations like hard links, etc.). --Carson From: Dave Messina Date: Monday, May 5, 2014 at 12:05 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Thanks for your quick reply, Carson. I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue. I've uploaded a new tarball containing the latest (failed) output on the dpp sample data. Any thoughts you have on how to resolve this would be great. Thanks! Dave On Mon, May 5, 2014 at 11:53 AM, Carson Holt wrote: > Use BLAST+ version 2.2.28. 
Also Make sure you are not using an old version of > MAKER (2.31.3 is current). > > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ > > --Carson > > > From: Dave Messina > Date: Monday, May 5, 2014 at 10:48 AM > To: > Subject: [maker-devel] MAKER / RepeatRunner configuration issue > > Hi, > > Even with the sample data, I'm getting a "Sequence contains no data" error > from blastx during the RepeatRunner phase. > > I've uploaded a tarball with my run on the dpp sample data to the MAKER File > Upload site (filename maker_test.tgz). > > Could you please take a look and give me your thoughts? > > > Thanks! > Dave > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From online at davemessina.com Mon May 5 14:53:58 2014 From: online at davemessina.com (Dave Messina) Date: Mon, 5 May 2014 14:53:58 -0500 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A > Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > > 1. Is you /tmp directory full (or whatever you have $TMPDIR > environmental variable is set to). Use 'df -h /tmp' to check. > > $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > > 1. Are you running in a directory on an NFS drive? Is it true NFS or > is it something like FUSE. > > Same error on true NFS or on local disk. > > 1. 
Is your current working directory full. > > No. > > 1. Are you setting TMP= in the control files to either an NFS mounted > location or an in memory mounted location. Same issue if you are setting > the system's TMPDIR environmental variable to one of these. > > I tried setting it to /tmp just to be sure (no difference). > > 1. Is your default /tmp directory in fact locally mounted (some > clusters set this to in memory scratch). > > Yes. > > 1. Even though you already checked, humor me and run this exact > command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx > -version > > $ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version blastx: 2.2.28+ Package: blast 2.2.28, build Mar 12 2013 16:52:31 Thanks so much for your help. Best, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 15:00:57 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 14:00:57 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. 
Is you /tmp directory full (or whatever you have $TMPDIR environmental > variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Are you running in a directory on an NFS drive? Is it true NFS or is it > something like FUSE. Same error on true NFS or on local disk. > 1. Is your current working directory full. No. > 1. Are you setting TMP= in the control files to either an NFS mounted location > or an in memory mounted location. Same issue if you are setting the system's > TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). > 1. Is your default /tmp directory in fact locally mounted (some clusters set > this to in memory scratch). Yes. > 1. Even though you already checked, humor me and run this exact command --> > /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version $ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version blastx: 2.2.28+ Package: blast 2.2.28, build Mar 12 2013 16:52:31 Thanks so much for your help. Best, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 17:34:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 16:34:14 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: After logging in I found the issue. You have a broken BioPerl build. Specifically Bio::DB::Fasta. Quite some time ago, there was a download direct from the BioPerl website that was broken and I think you may have that broken version. Just update to the current CPAN version. I was able to run fine when I forced MAKER to use a path I made for the the newer version of BioPerl. You can delete my credentials now. 
Thanks, Carson From: Carson Holt Date: Monday, May 5, 2014 at 2:00 PM To: Dave Messina Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Is you /tmp directory full (or whatever you have $TMPDIR environmental > variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Are you running in a directory on an NFS drive? Is it true NFS or is it > something like FUSE. Same error on true NFS or on local disk. > 1. Is your current working directory full. No. > 1. Are you setting TMP= in the control files to either an NFS mounted location > or an in memory mounted location. Same issue if you are setting the system's > TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). > 1. Is your default /tmp directory in fact locally mounted (some clusters set > this to in memory scratch). Yes. > 1. 
Even though you already checked, humor me and run this exact command --> > /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version $ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version blastx: 2.2.28+ Package: blast 2.2.28, build Mar 12 2013 16:52:31 Thanks so much for your help. Best, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Mon May 5 19:09:41 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Mon, 5 May 2014 17:09:41 -0700 Subject: [maker-devel] Fewer genes in MAKER 2.31.3 Message-ID: Hi, Carson. I?m annotating a 6 Mbp plant mitochondrial genome using GenBank coding nucleotide and protein sequences from related species. I?m seeing 50 genes annotated using MAKER 2.31, and 37 genes annotated using MAKER 2.31.3. The missing genes look good based on the evidence. I see protein_match evidence in the 2.31.3 GFF file, but no resulting gene and mRNA. Is there a ChangeLog indicating the changes from 2.31 to 2.31.3? Do you know of a change that might cause this? What information can I give you that would help debug this? My maker_opts.ctl file follows. Cheers, Shaun #-----Genome (these are always required) genome=pg29mt-concat.fa #genome sequence (fasta file or fasta embeded in GFF3 file) organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic #-----EST Evidence (for best results provide a file for at least one) est=cds_na.fa #set of ESTs or assembled mRNA-seq in fasta format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=cds_aa.fa #protein sequence file in fasta format (i.e. 
from mutiple oransisms) #-----Repeat Masking (leave values blank to skip repeat masking) model_org=picea #select a model organism for RepBase masking in RepeatMasker rmlib=rmlib.fa #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein=/usr/local/opt/maker/libexec/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner #-----Gene Prediction est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no #-----External Application Behavior Options cpus=4 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options est_forward=1 #map names and attributes forward from EST evidence, 1 = yes, 0 = no single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no -------------- next part -------------- An HTML attachment was scrubbed... URL: From myandell at genetics.utah.edu Tue May 6 00:06:25 2014 From: myandell at genetics.utah.edu (Mark Yandell) Date: Tue, 6 May 2014 05:06:25 +0000 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: , Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365FB90A5@mxb2.hg.genetics.utah.edu> you are the Man, Carson. --mark ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com] Sent: Monday, May 05, 2014 4:34 PM To: Dave Messina Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue After logging in I found the issue. You have a broken BioPerl build. Specifically Bio::DB::Fasta. Quite some time ago, there was a download direct from the BioPerl website that was broken and I think you may have that broken version. Just update to the current CPAN version. 
I was able to run fine when I forced MAKER to use a path I made for the newer version of BioPerl. You can delete my credentials now.

Thanks,
Carson

From: Carson Holt
Date: Monday, May 5, 2014 at 2:00 PM
To: Dave Messina
Cc:
Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue

This is one of those things where I would need access to your system, since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. Otherwise it's really just a shot in the dark.

--Carson

From: Dave Messina
Date: Monday, May 5, 2014 at 1:53 PM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue

Hi Carson,

On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote:

df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A
Filesystem            Type  Size  Used  Avail  Use%  Mounted on
10.0.1.128:/projects  nfs   13T   9.6T  3.1T   76%   /Volumes/Qnap

That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk:

Filesystem                 Type  Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_d5-lv_root  ext4  50G   8.9G  38G    19%   /

1. Is your /tmp directory full (or whatever your $TMPDIR environment variable is set to)? Use 'df -h /tmp' to check.

$ df -h /tmp
Filesystem                 Type  Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_d5-lv_root  ext4  50G   8.9G  38G    19%   /

1. Are you running in a directory on an NFS drive? Is it true NFS or is it something like FUSE?

Same error on true NFS or on local disk.

1. Is your current working directory full?

No.

1. Are you setting TMP= in the control files to either an NFS mounted location or an in-memory mounted location? Same issue if you are setting the system's TMPDIR environment variable to one of these.

I tried setting it to /tmp just to be sure (no difference).

1. Is your default /tmp directory in fact locally mounted (some clusters set this to in-memory scratch)?

Yes.

1. Even though you already checked, humor me and run this exact command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

$ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version
blastx: 2.2.28+
Package: blast 2.2.28, build Mar 12 2013 16:52:31

Thanks so much for your help.

Best,
Dave

From kdelmore at zoology.ubc.ca Mon May 5 23:36:41 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Mon, 5 May 2014 21:36:41 -0700
Subject: [maker-devel] iprscan and ipr_update_gff
Message-ID: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>

Hi, I have a question about the interproscan scripts available with maker.

I'm following the recommendations posted by Carson in Aug 2011 to incorporate results from iprscan. I'm getting quite a few warning messages with ipr_update_gff; they're all the same and suggest that there's no value for $name. When I look through the updated gff, however, the dbxrefs have been added. Is this something I should be worried about?

I'm using iprscan version 5 and actually get some warning messages there as well but again, the output looks alright. In addition, some of my fastas don't get these warnings in iprscan and they still give me the error with ipr_update_gff so I don't think that's the problem. I'm using proteins from UniProt. My commands and errors are below. I've also attached the first 20000 lines from my initial gff and raw file from iprscan.

Thanks, I really appreciate your continued support.
Kira

###

commands for interproscan scripts available in maker
iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff
gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff
ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace

error after last step (just an example, a ton of similar lines):
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776.

###

commands for interproscan 5
interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \
> interpro_6.out 2>&1
interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml

error after first step:
04/05/2014 19:22:09:269 25% completed
04/05/2014 21:27:36:305 50% completed
04/05/2014 21:32:34:236 75% completed
04/05/2014 21:38:01:379 90% completed
2014-05-04 21:50:22,761 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:248] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959/jobPIRSF-2.84
2014-05-04 21:50:22,908 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:253] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959
04/05/2014 21:50:23:380 100% done: InterProScan analyses completed

error after second step:
interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0
05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode...
2014-05-05 21:04:00,603 [uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run completion, unable to delete temporary directory /home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_210353293_gsjh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6.maker.proteins.fasta.xml.raw
Type: application/octet-stream
Size: 1098374 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6_first20000.gff
Type: application/octet-stream
Size: 2880872 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue May 6 09:31:55 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 08:31:55 -0600
Subject: [maker-devel] Fewer genes in MAKER 2.31.3
In-Reply-To:
References:
Message-ID:

Nothing in the scoring or gene selection has changed. Changes are:

Fix trnascan naming so codon is included in name
Fix fgenesh parsing when used with correct_est_fusion
Fix final ID bug when '/' character used in GFF3 input ID.
Fix a start codon issue that could come up when the right set of parameters was used (primarily correct_est_fusion and protein2genome).

If you can provide both gff3 outputs for comparison, I could probably tell you why. Set up both runs to make sure that settings are indeed identical.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Monday, May 5, 2014 at 6:09 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Fewer genes in MAKER 2.31.3

Hi, Carson.

I'm annotating a 6 Mbp plant mitochondrial genome using GenBank coding nucleotide and protein sequences from related species. I'm seeing 50 genes annotated using MAKER 2.31, and 37 genes annotated using MAKER 2.31.3. The missing genes look good based on the evidence. I see protein_match evidence in the 2.31.3 GFF file, but no resulting gene and mRNA. Is there a ChangeLog indicating the changes from 2.31 to 2.31.3? Do you know of a change that might cause this? What information can I give you that would help debug this?

My maker_opts.ctl file follows.

Cheers,
Shaun

#-----Genome (these are always required)
genome=pg29mt-concat.fa #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
#-----EST Evidence (for best results provide a file for at least one)
est=cds_na.fa #set of ESTs or assembled mRNA-seq in fasta format
#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=cds_aa.fa #protein sequence file in fasta format (i.e. from multiple organisms)
#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=picea #select a model organism for RepBase masking in RepeatMasker
rmlib=rmlib.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/usr/local/opt/maker/libexec/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
#-----Gene Prediction
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no
#-----External Application Behavior Options
cpus=4 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
#-----MAKER Behavior Options
est_forward=1 #map names and attributes forward from EST evidence, 1 = yes, 0 = no
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no

From carsonhh at gmail.com Tue May 6 09:57:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 08:57:04 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
Message-ID:

You have entries in your interproscan output that aren't in your GFF3. Is your GFF3 file truncated?
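
One rough way to check for such entries is sketched below in Python. This is not part of MAKER; the function names are made up for illustration. It assumes the interproscan raw output is tab-separated with the protein ID in the first column, and that those IDs correspond to the ID= or Name= attribute of the mRNA lines in the GFF3.

```python
def ids_from_iprscan_raw(path):
    """Return the set of protein IDs in column 1 of a tab-separated file.

    Assumption: the InterProScan raw output keeps the protein ID in the
    first tab-separated column.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.strip():
                ids.add(line.split("\t", 1)[0])
    return ids


def ids_from_gff3(path, feature_type="mRNA"):
    """Return ID= and Name= values for features of one type in a GFF3 file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence follows; no more feature lines
            if line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 9 or cols[2] != feature_type:
                continue
            for field in cols[8].split(";"):
                key, _, value = field.partition("=")
                if key in ("ID", "Name") and value:
                    ids.add(value)
    return ids


def missing_from_gff3(raw_path, gff_path):
    """List IDs present in the InterProScan output but absent from the GFF3."""
    return sorted(ids_from_iprscan_raw(raw_path) - ids_from_gff3(gff_path))
```

Calling missing_from_gff3("6.maker.proteins.fasta.xml.raw", "6.gff") with the file names from the commands above would list any protein IDs that have interproscan hits but no matching mRNA line. An empty list would suggest the GFF3 is complete and the warnings come from somewhere else.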
--Carson On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" wrote: >Hi, I have a question about the interproscan scripts available with maker. > >I'm following the recommendations posted by Carson in Aug 2011 to >incorporate results from iprscan. I'm getting quite a few warning messages >with ipr_update_gff; they're all the same and suggest that there's no >value for $name. When I look through the updated gff, however, the dbxrefs >have been added. Is this something I should be worried about? > >I'm using iprscan version 5 and actually get some warning messages there >as well but again, the output looks alright. In addition, some of my >fastas don't get these warnings in iprscan and they still give me the >error with ipr_update_gff so I don't think that's the problem. I'm using >proteins from UniProt. My commands and errors are below. I've also >attached the first 20000 lines from my initial gff and raw file from >iprscan. > >Thanks, I really appreciate your continued support. >Kira > >### > >commands for interproscan scripts available in maker >iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace > >error after last step (just an example, a ton of similar lines): >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776. 
> > >### > >commands for interproscan 5 >interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >interpro_6.out 2>&1 >interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml > >error after first step: >04/05/2014 19:22:09:269 25% completed >04/05/2014 21:27:36:305 50% completed >04/05/2014 21:32:34:236 75% completed >04/05/2014 21:38:01:379 90% completed >2014-05-04 21:50:22,761 >[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >248] >WARN - At run completion, unable to delete temporary directory >/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >7921_l959/jobPIRSF-2.84 >2014-05-04 21:50:22,908 >[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >253] >WARN - At run completion, unable to delete temporary directory >/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >7921_l959 >04/05/2014 21:50:23:380 100% done: InterProScan analyses completed > >error after second step: >interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>2014-05-05 21:04:00,603 >[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >completion, unable to delete temporary directory >/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_2 >10353293_gsjh_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From kdelmore at zoology.ubc.ca Tue May 6 10:06:56 2014 From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca) Date: Tue, 6 May 2014 08:06:56 -0700 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> Message-ID: <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> Thanks for your reply. I have not truncated the gff3. I'm using files from the datastore that were written at the same time so I'm not sure how that would happen. I split my multifasta before running it through maker and have not merged the gff or protein.fasta for iprscan. That wouldn't be the problem would it? > You have entries in your interproscan output that aren't in your GFF3. Is > your GFF3 file truncated? > > --Carson > > > On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" > wrote: > >>Hi, I have a question about the interproscan scripts available with >> maker. >> >>I'm following the recommendations posted by Carson in Aug 2011 to >>incorporate results from iprscan. I'm getting quite a few warning >> messages >>with ipr_update_gff; they're all the same and suggest that there's no >>value for $name. When I look through the updated gff, however, the >> dbxrefs >>have been added. Is this something I should be worried about? >> >>I'm using iprscan version 5 and actually get some warning messages there >>as well but again, the output looks alright. 
In addition, some of my >>fastas don't get these warnings in iprscan and they still give me the >>error with ipr_update_gff so I don't think that's the problem. I'm using >>proteins from UniProt. My commands and errors are below. I've also >>attached the first 20000 lines from my initial gff and raw file from >>iprscan. >> >>Thanks, I really appreciate your continued support. >>Kira >> >>### >> >>commands for interproscan scripts available in maker >>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >> -inplace >> >>error after last step (just an example, a ton of similar lines): >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776. 
>> >> >>### >> >>commands for interproscan 5 >>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >>interpro_6.out 2>&1 >>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >> >>error after first step: >>04/05/2014 19:22:09:269 25% completed >>04/05/2014 21:27:36:305 50% completed >>04/05/2014 21:32:34:236 75% completed >>04/05/2014 21:38:01:379 90% completed >>2014-05-04 21:50:22,761 >>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >>248] >>WARN - At run completion, unable to delete temporary directory >>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >>7921_l959/jobPIRSF-2.84 >>2014-05-04 21:50:22,908 >>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >>253] >>WARN - At run completion, unable to delete temporary directory >>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >>7921_l959 >>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed >> >>error after second step: >>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>>2014-05-05 21:04:00,603 >>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >>completion, unable to delete temporary directory >>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_2 >>10353293_gsjh_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > From carsonhh at gmail.com Tue May 6 10:09:13 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 06 May 2014 09:09:13 -0600 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> Message-ID: The file you sent was missing the ##FASTA entry and all sequence at the bottom for example. Is that the way it is in the datastore? --Carson On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" wrote: >Thanks for your reply. I have not truncated the gff3. I'm using files from >the datastore that were written at the same time so I'm not sure how that >would happen. I split my multifasta before running it through maker and >have not merged the gff or protein.fasta for iprscan. That wouldn't be the >problem would it? > >> You have entries in your interproscan output that aren't in your GFF3. >>Is >> your GFF3 file truncated? >> >> --Carson >> >> >> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" >> wrote: >> >>>Hi, I have a question about the interproscan scripts available with >>> maker. >>> >>>I'm following the recommendations posted by Carson in Aug 2011 to >>>incorporate results from iprscan. I'm getting quite a few warning >>> messages >>>with ipr_update_gff; they're all the same and suggest that there's no >>>value for $name. When I look through the updated gff, however, the >>> dbxrefs >>>have been added. 
Is this something I should be worried about? >>> >>>I'm using iprscan version 5 and actually get some warning messages there >>>as well but again, the output looks alright. In addition, some of my >>>fastas don't get these warnings in iprscan and they still give me the >>>error with ipr_update_gff so I don't think that's the problem. I'm using >>>proteins from UniProt. My commands and errors are below. I've also >>>attached the first 20000 lines from my initial gff and raw file from >>>iprscan. >>> >>>Thanks, I really appreciate your continued support. >>>Kira >>> >>>### >>> >>>commands for interproscan scripts available in maker >>>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >>> -inplace >>> >>>error after last step (just an example, a ton of similar lines): >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15242. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15353. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15674. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15776. 
>>> >>> >>>### >>> >>>commands for interproscan 5 >>>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >>>interpro_6.out 2>&1 >>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>> >>>error after first step: >>>04/05/2014 19:22:09:269 25% completed >>>04/05/2014 21:27:36:305 50% completed >>>04/05/2014 21:32:34:236 75% completed >>>04/05/2014 21:38:01:379 90% completed >>>2014-05-04 21:50:22,761 >>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>p: >>>248] >>>WARN - At run completion, unable to delete temporary directory >>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>83 >>>7921_l959/jobPIRSF-2.84 >>>2014-05-04 21:50:22,908 >>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>p: >>>253] >>>WARN - At run completion, unable to delete temporary directory >>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>83 >>>7921_l959 >>>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed >>> >>>error after second step: >>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >>>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>>>2014-05-05 21:04:00,603 >>>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >>>completion, unable to delete temporary directory >>>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505 >>>_2 >>>10353293_gsjh_______________________________________________ >>>maker-devel mailing list >>>maker-devel at box290.bluehost.com >>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > From kdelmore at zoology.ubc.ca Tue May 6 10:26:07 2014 From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca) Date: Tue, 6 May 2014 08:26:07 -0700 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> Message-ID: <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca> I just printed the first 20000 lines of the gff to send to you because it was too large to send through email. I've included a dropbox link to the full file below. I've also included a link to the final gff with dbx refs; as I mentioned, it does seem to add them even with the error. If I run ipr_update_gff twice, I get the warnings on the first run but not on the second. Does that help diagnose the problem? The only other red flag I've encountered with maker was in including external gff3 from geneid and sgp2. These gff3s failed validation at the website suggested the the README file, with the warning message "cds: non-unique id" for all cds, but maker didn't give me a warning and they seem to be incorporated into the annotation fine. original gff https://www.dropbox.com/s/nimoh605jdk9myx/6.gff final gff https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta Thanks again for getting back to me. > The file you sent was missing the ##FASTA entry and all sequence at the > bottom for example. Is that the way it is in the datastore? 
> > --Carson > > > On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" > wrote: > >>Thanks for your reply. I have not truncated the gff3. I'm using files >> from >>the datastore that were written at the same time so I'm not sure how that >>would happen. I split my multifasta before running it through maker and >>have not merged the gff or protein.fasta for iprscan. That wouldn't be >> the >>problem would it? >> >>> You have entries in your interproscan output that aren't in your GFF3. >>>Is >>> your GFF3 file truncated? >>> >>> --Carson >>> >>> >>> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" >>> >>> wrote: >>> >>>>Hi, I have a question about the interproscan scripts available with >>>> maker. >>>> >>>>I'm following the recommendations posted by Carson in Aug 2011 to >>>>incorporate results from iprscan. I'm getting quite a few warning >>>> messages >>>>with ipr_update_gff; they're all the same and suggest that there's no >>>>value for $name. When I look through the updated gff, however, the >>>> dbxrefs >>>>have been added. Is this something I should be worried about? >>>> >>>>I'm using iprscan version 5 and actually get some warning messages >>>> there >>>>as well but again, the output looks alright. In addition, some of my >>>>fastas don't get these warnings in iprscan and they still give me the >>>>error with ipr_update_gff so I don't think that's the problem. I'm >>>> using >>>>proteins from UniProt. My commands and errors are below. I've also >>>>attached the first 20000 lines from my initial gff and raw file from >>>>iprscan. >>>> >>>>Thanks, I really appreciate your continued support. 
>>>>Kira >>>> >>>>### >>>> >>>>commands for interproscan scripts available in maker >>>>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>>>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>>>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >>>> -inplace >>>> >>>>error after last step (just an example, a ton of similar lines): >>>>Use of uninitialized value $name in hash element at >>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>15242. >>>>Use of uninitialized value $name in hash element at >>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>15353. >>>>Use of uninitialized value $name in hash element at >>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>15674. >>>>Use of uninitialized value $name in hash element at >>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>15776. >>>> >>>> >>>>### >>>> >>>>commands for interproscan 5 >>>>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ >>>> > >>>>interpro_6.out 2>&1 >>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>> >>>>error after first step: >>>>04/05/2014 19:22:09:269 25% completed >>>>04/05/2014 21:27:36:305 50% completed >>>>04/05/2014 21:32:34:236 75% completed >>>>04/05/2014 21:38:01:379 90% completed >>>>2014-05-04 21:50:22,761 >>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>>p: >>>>248] >>>>WARN - At run completion, unable to delete temporary directory >>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>>83 >>>>7921_l959/jobPIRSF-2.84 >>>>2014-05-04 21:50:22,908 >>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>>p: >>>>253] >>>>WARN - At run completion, unable to delete temporary directory >>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>>83 >>>>7921_l959 >>>>04/05/2014 21:50:23:380 100% done: InterProScan 
analyses completed
>>>>
>>>>error after second step:
>>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
>>>>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0
>>>>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode...
>>>>2014-05-05 21:04:00,603
>>>>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run
>>>>completion, unable to delete temporary directory
>>>>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_210353293_gsjh
>>>
>>>
>>>
>>
>>
>
>
>

From carsonhh at gmail.com Tue May 6 10:47:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 09:47:23 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Ok. With the full file I can see what was causing the message. It is a parsing bug that was happening in a few cases, and I've now fixed it. But you can ignore it, because it has no effect on the output.

It would only be an issue if the ID= and Name= tags were different in the GFF3 for the gene feature lines (which is never true for MAKER's output). It was correctly parsing the 'mRNA' Name and ID tags, but was sometimes having issues with the Name= tags for the 'gene' lines (but because they are redundant with the ID= tag, the script still finds what it needs to add the Dbxref= tags).

--Carson

On 5/6/14, 9:26 AM, "kdelmore at zoology.ubc.ca" wrote:

>I just printed the first 20000 lines of the gff to send to you because it
>was too large to send through email.
I've included a dropbox link to the >full file below. I've also included a link to the final gff with dbx refs; >as I mentioned, it does seem to add them even with the error. If I run >ipr_update_gff twice, I get the warnings on the first run but not on the >second. Does that help diagnose the problem? > >The only other red flag I've encountered with maker was in including >external gff3 from geneid and sgp2. These gff3s failed validation at the >website suggested the the README file, with the warning message "cds: >non-unique id" for all cds, but maker didn't give me a warning and they >seem to be incorporated into the annotation fine. > >original gff >https://www.dropbox.com/s/nimoh605jdk9myx/6.gff > >final gff >https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta > >Thanks again for getting back to me. > >> The file you sent was missing the ##FASTA entry and all sequence at the >> bottom for example. Is that the way it is in the datastore? >> >> --Carson >> >> >> On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" >> wrote: >> >>>Thanks for your reply. I have not truncated the gff3. I'm using files >>> from >>>the datastore that were written at the same time so I'm not sure how >>>that >>>would happen. I split my multifasta before running it through maker and >>>have not merged the gff or protein.fasta for iprscan. That wouldn't be >>> the >>>problem would it? >>> >>>> You have entries in your interproscan output that aren't in your GFF3. >>>>Is >>>> your GFF3 file truncated? >>>> >>>> --Carson >>>> >>>> >>>> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" >>>> >>>> wrote: >>>> >>>>>Hi, I have a question about the interproscan scripts available with >>>>> maker. >>>>> >>>>>I'm following the recommendations posted by Carson in Aug 2011 to >>>>>incorporate results from iprscan. I'm getting quite a few warning >>>>> messages >>>>>with ipr_update_gff; they're all the same and suggest that there's no >>>>>value for $name. 

From carsonhh at gmail.com Tue May 6 10:54:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 09:54:41 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To:
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
 <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca>
 <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Actually, looking a little closer, it wouldn't even matter if the ID= and
Name= tags were different for the 'gene', because interproscan gives the
results for the transcripts (mRNA) and not the gene. So Dbxref still gets
populated correctly regardless.

--Carson

On 5/6/14, 9:47 AM, "Carson Holt" wrote:

>Ok. With the full file I can see what was causing the message. It is
>a parsing bug that was happening in a few cases, and I've now fixed it.
>But you can ignore it, because it has no effect on the output.
>
>It would only be an issue if the ID= and Name= tags were different in the
>GFF3 for the gene feature lines (which is never true for MAKER's
>output). It was correctly parsing the 'mRNA' Name and ID tags, but was
>sometimes having issues with the Name= tags for the 'gene' lines (but
>because they are redundant with the ID= tag, the script still finds what it
>needs to add the Dbxref= tags).
>
>--Carson

From sjackman at gmail.com Thu May 8 17:26:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 8 May 2014 15:26:34 -0700
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

Hi, Carson. Could you give an example of how to add gene_id= to the header of
the FASTA file? I'm not clear on what you mean by this. In the FASTA header,
what portion is the transcript name, and what portion is the gene name?

Cheers,
Shaun

http://sjackman.ca

On 2 May 2014 11:55, Carson Holt wrote:

> Whichever has the best AED score I believe, but you can add gene_id= to
> the header of each fasta file to ensure MAKER doesn't try and cluster
> unrelated transcripts into a single gene. Then the transcript name and
> gene name will be guaranteed to match up.
>
> --Carson
>
> From: Shaun Jackman
> Date: Wednesday, April 30, 2014 at 5:25 PM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] est_forward and conflicting names
>
> Hi, Carson.
>
> I've downloaded a number of genes from GenBank using Entrez Direct, which I'm
> using with est and protein to annotate a plant mitochondrion. Most of
> these reference sequences have sensible and consistent gene names, and so
> I'm using est_forward to retain the gene names. This workflow is working
> well for me. Some of the genes pulled in from GenBank have less useful
> names like orf1234 or other numeric IDs.
> When multiple evidence sequences
> map to the same location, how does est_forward choose which name to use?
> If it's chosen arbitrarily, could it be possible to choose the most common
> name instead?
>
> Thanks,
> Shaun

From carsonhh at gmail.com Thu May 8 17:33:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 08 May 2014 16:33:36 -0600
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

When moving transcripts onto a new assembly, you may have multiple
transcripts of the same gene. Because your transcript name should be your
FASTA ID, there is no way for MAKER to know that they go together when
moving the models forward, so you can use the gene= option to make MAKER
aware that these belong to the same gene. They will be grouped and you
recover all splice forms as a group.

Example:

>SMEDT_00004 gene=dpp
AAAAAAA

>SMEDT_00005 gene=dpp
AAAAAAA

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Thursday, May 8, 2014 at 4:26 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] est_forward and conflicting names

Hi, Carson. Could you give an example of how to add gene_id= to the header of
the FASTA file? I'm not clear on what you mean by this. In the FASTA header,
what portion is the transcript name, and what portion is the gene name?

Cheers,
Shaun

http://sjackman.ca

On 2 May 2014 11:55, Carson Holt wrote:
> Whichever has the best AED score I believe, but you can add gene_id= to the
> header of each fasta file to ensure MAKER doesn't try and cluster unrelated
> transcripts into a single gene. Then the transcript name and gene name will
> be guaranteed to match up.
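
A minimal sketch of how such a tag could be appended to FASTA headers before the file is handed to MAKER. It is not taken from MAKER itself: the function name tag_fasta_headers and the ID-to-gene mapping are hypothetical, and the tag name is left as a parameter because this thread uses both gene= and gene_id=. Whether MAKER cares about the position of the tag within the header is not stated here, so the tag is simply added at the end.

```python
def tag_fasta_headers(lines, gene_of, tag="gene"):
    """Append ``tag=<gene>`` to each FASTA header whose ID is in gene_of.

    lines   -- iterable of FASTA lines (with trailing newlines)
    gene_of -- dict mapping a FASTA ID (first word after '>') to a gene name
    tag     -- tag name to write; hypothetical default, adjust as needed
    """
    out = []
    for line in lines:
        if not line.startswith(">"):
            out.append(line)  # sequence lines pass through unchanged
            continue
        header = line.rstrip("\n")
        words = header[1:].split()
        seq_id = words[0] if words else ""
        gene = gene_of.get(seq_id)
        # Skip headers that already carry the tag, so reruns change nothing.
        already_tagged = any(w.startswith(tag + "=") for w in words[1:])
        if gene is not None and not already_tagged:
            header = f"{header} {tag}={gene}"
        out.append(header + "\n")
    return out


records = [">SMEDT_00004\n", "AAAAAAA\n", ">SMEDT_00005 note\n", "AAAAAAA\n"]
tagged = tag_fasta_headers(records, {"SMEDT_00004": "dpp", "SMEDT_00005": "dpp"})
print("".join(tagged), end="")
```

Headers whose ID is not in the mapping are left untouched, so a partial mapping is safe to apply.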
>
> --Carson
>
> From: Shaun Jackman
> Date: Wednesday, April 30, 2014 at 5:25 PM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] est_forward and conflicting names
>
> Hi, Carson.
>
> I've downloaded a number of genes from GenBank using Entrez Direct, which I'm
> using with est and protein to annotate a plant mitochondrion. Most of these
> reference sequences have sensible and consistent gene names, and so I'm using
> est_forward to retain the gene names. This workflow is working well for me.
> Some of the genes pulled in from GenBank have less useful names like orf1234
> or other numeric IDs. When multiple evidence sequences map to the same
> location, how does est_forward choose which name to use? If it's chosen
> arbitrarily, could it be possible to choose the most common name instead?
>
> Thanks,
> Shaun

From sjackman at gmail.com Thu May 8 17:41:41 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 8 May 2014 15:41:41 -0700
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

Interesting. Thanks for the clarification. I'm working on a plant
mitochondrion, and so as far as I know, there's no alternative splicing. My
protein FASTA file is composed of the protein sequences of ~100 species
downloaded from GenBank. It looks like this:

>cox1|lcl|KJ461445.1_cdsid_AHY20320.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=AHY20320.1] [location=complement(59212..60795)]
...
>cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1] [protein_id=ACA62629.1] [location=245282..246856]
...
>cox1|lcl|NC_023103.1_cdsid_YP_008964124.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_008964124.1] [location=join(317824..318438,319511..320368)]
...

I'm not sure that I actually want the fancy behaviour that you describe,
though it probably wouldn't hurt anything. Will this FASTA format trigger the
fancy behaviour?

Cheers,
Shaun

http://sjackman.ca

From carsonhh at gmail.com Thu May 8 17:43:40 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 08 May 2014 16:43:40 -0600
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

Only if you were to remove the brackets around gene= (i.e. gene=cox1 rather
than [gene=cox1]).

--Carson

From sjackman at gmail.com Wed May 14 16:07:52 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 14:07:52 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
 <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.

Cheers,
Shaun

On 4 March 2014 18:33, Carson Holt wrote:

> Trying to call non-coding RNA from ESTs or even sequence homology is
> extremely messy (non-trivial problem in most organisms with high false
> positive rate), so MAKER for the most part doesn't even try to do that. It
> focuses only on the coding genes. You can now use tRNAscan and snoscan in
> the newest version for some non-coding RNA support (those features were
> only added a couple of months ago). So just like other prediction tools
> (snap, augustus etc.), the primary focus has always been the coding genes.
> We've only started adding non-coding RNA support recently for iPlant, so
> it's still relatively immature.
>
> Thanks,
> Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Tuesday, March 4, 2014 at 7:10 PM
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks
> for the tip.
>
> The rRNA genes that are found with est2genome have the feature type set to
> *mRNA* and have corresponding *five_prime_UTR*, *CDS* and
> *three_prime_UTR* features. Ideally the feature type would be set to
> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features.
> Is that a feature that you would be interested in adding to MAKER? The rRNA
> gene names all start with 'rrn' and the tRNA gene names with 'trn', as is
> standard, so determining the appropriate type should be straightforward.
>
> Thanks again for your help with this. Cheers,
> Shaun
>
> On 27 February 2014 17:13, Carson Holt wrote:
>
>> Set single_exon=1, and the minimum size to a smaller value. I think it's
>> set to 250 right now. Also est2genome is looking for an ORF, so if there is
>> none (as with tRNAs) they probably won't get picked up.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>
>> Sorry, ignore my previous question. est_forward also carries forward the
>> names of protein evidence and works like a charm. Thank you!
>>
>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>> these hits?
>>
>> organism_type=prokaryotic
>> est2genome=1
>> protein2genome=1
>> est_forward=1
>>
>> Cheers,
>> Shaun
>>
>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>
>>> Is there a corresponding protein_forward=1 option to map forward protein
>>> names from protein2genome?
>>>
>>> Cheers,
>>> Shaun
>>>
>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>>> wrote:
>>>
>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>> passing the gff3 to model_gff.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>
>>> What you can do is run it once with just est_forward=1 and
>>> est2genome/protein2genome set to 1. Then take those results, pass them in
>>> as model_gff and use the map_forward option to then filter the results
>>> based on mRNA score and that would copy names onto new gene under the
>>> standard MAKER pipeline. Eventually it's really supposed to go into a
>>> separate tool that will map genes onto new assemblies (but under the hood
>>> the tool will just be calling MAKER with certain parameters restricted). I
>>> do this because if people commonly use it mixed with things like SNAP I can
>>> start to get some very weird behaviors.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Mikael Brandström Durling
>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> It seems that this could be a very useful option in those cases where
>>> you have firm a priori knowledge of the placement of ESTs. However, while
>>> trying it I note that est_forward implies that the est2genome predictor is
>>> turned on, implicitly. Is this necessary for this to work? I'm after the
>>> behavior you describe below where exonerate is made to try really hard
>>> within a limited region to align an est, but I would not like maker to
>>> produce est2genome predictions.
>>>
>>> In general, I think this maker_coor and est_forward is a feature set
>>> that is worthy to be promoted into a documented feature.
>>>
>>> Thanks,
>>> Mikael
>>>
>>> On 26 Feb 2014 at 17:09, Carson Holt wrote:
>>>
>>> It will still work without est_forward. It just works a little
>>> differently. Keep in mind this was a hidden feature I used to find
>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>
>>> If est_forward is provided, MAKER will parse the database to look for
>>> the maker_coor tags early in the pipeline. Then it will create a list of
>>> locations to search, and it will search them even if there are no BLAST
>>> results to seed the search (normally MAKER gets a BLAST result first and
>>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to
>>> look for a match using all of chr1 as the input to exonerate even when
>>> BLAST finds nothing (this is a very very slow search, but can help pick up
>>> one or two stubborn genes that don't remap well). To allow this, MAKER
>>> gives exonerate looser matching parameters (i.e. allows for single base
>>> pair introns perhaps caused by assembly errors). The logic here is that
>>> given the fact that I already told MAKER that with some degree of
>>> confidence I expect sequence A to map to location X, it will try its
>>> hardest to make it match.
>>>
>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>> at line 1563, but only after a BLAST alignment has already seeded it to the
>>> region (that BLAST result has the information in its description
>>> parameter). MAKER will then ignore seeds completely outside of maker_coor.
>>> In addition any BLAST seeds that overlap maker_coor will get the search
>>> space for alignment polishing adjusted to match maker_coor exactly. Also
>>> match parameters for exonerate will not be relaxed as they were with
>>> est_forward.
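
For checking headers before a long run, a small sketch that reads a maker_coor= tag back out of a FASTA header. It assumes only the two forms quoted in this thread (maker_coor=chr1 and maker_coor=chr1:1-10000). The function name read_maker_coor is hypothetical, and this is not MAKER's own parser, which according to the message above lives in GI.pm.

```python
import re

# seqid, optionally followed by ":start-end" (the two forms seen in the thread)
_COOR = re.compile(r"^(?P<seqid>.+?)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def read_maker_coor(header):
    """Return (seqid, start, end) from a header's maker_coor= tag, or None.

    start and end are None when the tag names a whole sequence.
    """
    for word in header.split()[1:]:  # skip the FASTA ID itself
        if word.startswith("maker_coor="):
            match = _COOR.match(word[len("maker_coor="):])
            if match is None:  # tag present but empty
                return None
            start, end = match.group("start"), match.group("end")
            return (
                match.group("seqid"),
                int(start) if start else None,
                int(end) if end else None,
            )
    return None


print(read_maker_coor(">est1 maker_coor=chr1:1-10000"))
print(read_maker_coor(">est2 gene_id=g1 maker_coor=chr1"))
```

Returning None for untagged headers makes it easy to list which evidence sequences will be searched without a placement hint.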
>>>
>>> As you can see, the behavior is slightly different (because it's an
>>> accidental feature).
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Mikael Brandström Durling
>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> That might be a useful and time saving accidental feature. But, reading
>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>> well as the configuration option est_forward for this to work. Any
>>> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
>>> right?
>>>
>>> Mikael
>>>
>>> On 26 Feb 2014 at 14:22, Carson Holt wrote:
>>>
>>> Yes. That should work as well as an accidental feature.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>>> mikael.durling at slu.se> wrote:
>>>
>>> Can this use of maker_coor be used only to hint about the placement of
>>> the ests, without affecting the naming of the final genes? I.e. if I have a
>>> database of EST where I have a priori knowledge of their rough placement,
>>> can this placement be given to maker without providing est_forward=1?
>>>
>>> Thanks,
>>> Mikael
>>>
>>> On 26 Feb 2014 at 01:58, Carson Holt wrote:
>>>
>>> There is a way. It's not a standard option and it's undocumented, but
>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>> that. The option won't already be there so you'll have to type it in.
>>>
>>> There is also a feature designed to work with this option. If you add
>>> tags to your fasta headers, those can be used to guide the mapping and
>>> naming.
>>> For example, gene_id= will ensure different isoforms
>>> that share a common gene_id get clustered into the same gene,
>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>> sequence to only be mapped against chr1 within the range of 1-10000 bp and
>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>
>>> This is an undocumented way to remap genes onto new assemblies using
>>> blast alignments of earlier transcript or protein annotations as a guide.
>>>
>>> --Carson
>>>
>>> From: Shaun Jackman
>>> Reply-To: Shaun Jackman
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To:
>>> Subject: [maker-devel] Mapping gene names
>>>
>>> Hi,
>>>
>>> I'm annotating a genome using a closely related genome from Genbank,
>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>>> annotate my genome. I've run Maker, and the annotation seems to have worked
>>> well. Is it possible to map the names of the genes from the related species
>>> to my annotation? I see the *map_forward* option, which applies to the
>>> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>>
>>> *maker_opts.ctl*
>>>
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>>
>>> Thanks,
>>> Shaun

From carsonhh at gmail.com Wed May 14 16:18:52 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 May 2014 15:18:52 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com>
 <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Thanks. Looks interesting. Also since output is already GFF3, you could
probably just use it with gff passthrough. It doesn't appear to support
eukaryotes though.

--Carson

Sent from my iPhone

> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>
> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>
> Cheers,
> Shaun
>
>> On 4 March 2014 18:33, Carson Holt wrote:
>> Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago).
So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We?ve only started adding non-coding RNA support recently for iPlant, so it?s still relatively immature. >> >> Thanks, >> Carson >> >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, March 4, 2014 at 7:10 PM >> >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip. >> >> The rRNA genes that are found with est2genome have the feature type set to mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR features. Ideally the feature type would be set to rRNA or tRNA as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with ?rrn? and the tRNA gene names with ?trn?, as is standard, so determining the appropriate type should be straight forward. >> >> Thanks again for your help with this. Cheers, >> Shaun >> >> >> >>> On 27 February 2014 17:13, Carson Holt wrote: >>> Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also est2genome is looking for ORF, so if there is none (as with tRNAs) they probably won't get picked up. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>>> >>>> Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you! >>>> >>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits? 
>>>>
>>>> organism_type=prokaryotic
>>>> est2genome=1
>>>> protein2genome=1
>>>> est_forward=1
>>>> Cheers,
>>>> Shaun
>>>>
>>>>
>>>>
>>>>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>>>> Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?
>>>>>
>>>>> Cheers,
>>>>> Shaun
>>>>>
>>>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) wrote:
>>>>>>
>>>>>> Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
>>>>>>
>>>>>> --Carson
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>>>>>
>>>>>>> What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>>
>>>>>>> From: Mikael Brandström Durling
>>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>>> To: Carson Holt
>>>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work?
>>>>>>> I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions.
>>>>>>>
>>>>>>> In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>>
>>>>>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>>>>>>>
>>>>>>>> It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.
>>>>>>>>
>>>>>>>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.
>>>>>>>>
>>>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly.
>>>>>>>> Also match parameters for exonerate will not be relaxed as they were with est_forward.
>>>>>>>>
>>>>>>>> As you can see, the behavior is slightly different (because it's an accidental feature).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Carson
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Mikael Brandström Durling
>>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>>> To: Carson Holt
>>>>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>>>
>>>>>>>> That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?
>>>>>>>>
>>>>>>>> Mikael
>>>>>>>>
>>>>>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>>>>>>>
>>>>>>>>> Yes. That should work as well as an accidental feature.
>>>>>>>>>
>>>>>>>>> --Carson
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:
>>>>>>>>>>
>>>>>>>>>> Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? I.e., if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Mikael
>>>>>>>>>>
>>>>>>>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>>>>>>>>
>>>>>>>>>>> There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.
>>>>>>>>>>>
>>>>>>>>>>> There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming.
>>>>>>>>>>> For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp and just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>>>>>>>>
>>>>>>>>>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
>>>>>>>>>>>
>>>>>>>>>>> --Carson
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> From: Shaun Jackman
>>>>>>>>>>> Reply-To: Shaun Jackman
>>>>>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>>>>>> To:
>>>>>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the map_forward option, which applies to the model_gff parameter. Is there a similar option for est and protein?
>>>>>>>>>>>
>>>>>>>>>>> maker_opts.ctl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> est=NC_123456.frn
>>>>>>>>>>> protein=NC_123456.faa
>>>>>>>>>>> est2genome=1
>>>>>>>>>>> protein2genome=1
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Shaun
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> maker-devel mailing list
>>>>>>>>>>> maker-devel at box290.bluehost.com
>>>>>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From sjackman at gmail.com Wed May 14 16:25:21 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 14:25:21 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson, Torsten.

> It doesn't appear to support eukaryotes though.

Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap --help output seems to be out of date.

Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

barrnap --help
...
--kingdom [X] Kingdom: [b]acteria [a]rchaea (default 'bacteria')

Cheers,
Shaun

http://sjackman.ca

On 14 May 2014 14:18, Carson Holt wrote:

> Thanks. Looks interesting. Also since output is already GFF3, you could
> probably just use it with gff passthrough. It doesn't appear to support
> eukaryotes though.
>
> --Carson
>
> Sent from my iPhone

From sjackman at gmail.com Wed May 14 19:06:31 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 17:06:31 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson.

I used other_gff to pass the following four-line GFF file of Barrnap rRNA annotations through. The output of gff3_merge is quite bizarre. See below.

Input:

##gff-version 3
200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . Name=12S_rRNA;product=12S ribosomal RNA
200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA
200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA
200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA

Output:

###
ARRAY(0x7feceb928780)
###
ARRAY(0x7feceaa548a0)
###
ARRAY(0x7feceeb01c60)
###
ARRAY(0x7fecedf6fef8)
###

Cheers,
Shaun

http://sjackman.ca

On 14 May 2014 14:18, Carson Holt wrote:

> Thanks. Looks interesting.
> Also since output is already GFF3, you could
> probably just use it with gff passthrough. It doesn't appear to support
> eukaryotes though.
>
> --Carson

From carsonhh at gmail.com Wed May 14 19:19:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 May 2014 18:19:43 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

That should be fixed in the current download? It came up on the mailing list a couple of weeks ago. I'll check.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Wednesday, May 14, 2014 at 6:06 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson.

I used other_gff to pass the following four-line GFF file of Barrnap rRNA annotations through. The output of gff3_merge is quite bizarre. See below.

Input:

##gff-version 3
200408_86 barrnap:0.4 rRNA 2171785 2173036 . + .
Name=12S_rRNA;product=12S ribosomal RNA 200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA 200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA 200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA Output: ### ARRAY(0x7feceb928780) ### ARRAY(0x7feceaa548a0) ### ARRAY(0x7feceeb01c60) ### ARRAY(0x7fecedf6fef8) ### Cheers, Shaun http://sjackman.ca On 14 May 2014 14:18, Carson Holt wrote: > Thanks. Looks interesting. Also since output is already GFF3, you could > probably just use it with gff passthrough. It doesn't appear to support > eukaryotes though. > > --Carson > > > Sent from my iPhone > > On May 14, 2014, at 3:07 PM, Shaun Jackman wrote: > >> Hi, Carson. Perhaps MAKER could integrate Barrnap >> to predict rRNA. >> >> Cheers, >> Shaun >> >> >> On 4 March 2014 18:33, Carson Holt wrote: >>> Trying to call non-coding RNA from ESTs or even sequence homology is >>> extremely messy (non-trivial problem in most organisms with high false >>> positive rate), so MAKER for the most part doesn?t even try to do that. It >>> focuses only on the coding genes. You can now use tRNAscan and snoscan in >>> the newest version for some non-coding RNA support (those features were only >>> added a couple of months ago). So just like other prediction tools (snap, >>> augustus etc.), the primary focus has always been the coding genes. We?ve >>> only started adding non-coding RNA support recently for iPlant, so it?s >>> still relatively immature. >>> >>> Thanks, >>> Carson >>> >>> >>> From: Shaun Jackman >>> Reply-To: Shaun Jackman >>> Date: Tuesday, March 4, 2014 at 7:10 PM >>> >>> To: Carson Holt >>> Cc: "maker-devel at yandell-lab.org" >>> Subject: Re: [maker-devel] Mapping gene names >>> >>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for >>> the tip. 
>>>
>>> The rRNA genes that are found with est2genome have the feature type set to
>>> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR
>>> features. Ideally the feature type would be set to rRNA or tRNA as
>>> appropriate, and would omit the UTR and CDS features. Is that a feature that
>>> you would be interested in adding to MAKER? The rRNA gene names all start
>>> with "rrn" and the tRNA gene names with "trn", as is standard, so
>>> determining the appropriate type should be straight forward.
>>>
>>> Thanks again for your help with this. Cheers,
>>> Shaun
>>>
>>> On 27 February 2014 17:13, Carson Holt wrote:
>>>> Set single_exon=1, and the minimum size to a smaller value. I think it's
>>>> set to 250 right now. Also est2genome is looking for ORF, so if there is
>>>> none (as with tRNAs) they probably won't get picked up.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>>>
>>>>> Sorry, ignore my previous question. est_forward also carries forward the
>>>>> names of protein evidence and works like a charm. Thank you!
>>>>>
>>>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>>>>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>>>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>>>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>>>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>>>>> these hits?
>>>>>
>>>>> organism_type=prokaryotic
>>>>> est2genome=1
>>>>> protein2genome=1
>>>>> est_forward=1
>>>>>
>>>>> Cheers,
>>>>> Shaun
>>>>>
>>>>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>>>>> Is there a corresponding protein_forward=1 option to map forward protein
>>>>>> names from protein2genome?
>>>>>>
>>>>>> Cheers,
>>>>>> Shaun
>>>>>>
>>>>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
>>>>>>
>>>>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>>>>> passing the gff3 to model_gff.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>>>>>
>>>>>>> What you can do is run it once with just est_forward=1 and
>>>>>>> est2genome/protein2genome set to 1. Then take those results, pass them
>>>>>>> in as model_gff and use the map_forward option to then filter the
>>>>>>> results based on mRNA score and that would copy names onto new gene
>>>>>>> under the standard MAKER pipeline. Eventually it's really supposed to
>>>>>>> go into a separate tool that will map genes onto new assemblies (but
>>>>>>> under the hood the tool will just be calling MAKER with certain
>>>>>>> parameters restricted). I do this because if people commonly use it
>>>>>>> mixed with things like SNAP I can start to get some very weird
>>>>>>> behaviors.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>>
>>>>>>> From: Mikael Brandström Durling
>>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>>> To: Carson Holt
>>>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> It seems that this could be a very useful option in those cases where
>>>>>>> you have firm a priori knowledge of the placement of ESTs. However,
>>>>>>> while trying it I note that est_forward implies that the est2genome
>>>>>>> predictor is turned on, implicitly. Is this necessary for this to work?
>>>>>>> I'm after the behavior you describe below where exonerate is made to try
>>>>>>> really hard within a limited region to align an est, but I would not
>>>>>>> like maker to produce est2genome predictions.
>>>>>>>
>>>>>>> In general, I think this maker_coor and est_forward is a feature set
>>>>>>> that is worthy to be promoted into a documented feature.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>>
>>>>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>>>>>>
>>>>>>> It will still work without est_forward. It just works a little
>>>>>>> differently. Keep in mind this was a hidden feature I used to find
>>>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>>>
>>>>>>> If est_forward is provided, MAKER will parse the database to look for
>>>>>>> the maker_coor tags early in the pipeline. Then it will create a list
>>>>>>> of locations to search, and it will search them even if there are no
>>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result
>>>>>>> first and then polishes it with exonerate). So maker_coor=chr1 will
>>>>>>> cause MAKER to look for a match using all of chr1 as the input to
>>>>>>> exonerate even when BLAST finds nothing (this is a very very slow
>>>>>>> search, but can help pick up one or two stubborn genes that don't remap
>>>>>>> well). To allow this, MAKER gives exonerate looser matching parameters
>>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly
>>>>>>> errors). The logic here is that given the fact that I already told
>>>>>>> MAKER that with some degree of confidence I expect sequence A to map
>>>>>>> to location X, it will try its hardest to make it match.
>>>>>>>
>>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to
>>>>>>> the region (that BLAST result has the information in its description
>>>>>>> parameter). MAKER will then ignore seeds completely outside of
>>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get
>>>>>>> the search space for alignment polishing adjusted to match maker_coor
>>>>>>> exactly.
>>>>>>> Also match parameters for exonerate will not be relaxed as
>>>>>>> they were with est_forward.
>>>>>>>
>>>>>>> As you can see, the behavior is slightly different (because it's an
>>>>>>> accidental feature).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>>
>>>>>>> From: Mikael Brandström Durling
>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>> To: Carson Holt
>>>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>>> occurrences of maker_coor in GI.pm seems to be conditioned on
>>>>>>> est_forward=1 right?
>>>>>>>
>>>>>>> Mikael
>>>>>>>
>>>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>>>>>
>>>>>>> Yes. That should work as well as an accidental feature.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling wrote:
>>>>>>>
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ests, without affecting the naming of the final genes? I.e. if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>>
>>>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>>>>
>>>>>>> There is a way. It's not a standard option and it's undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that. The option won't already be there so you'll have to type it
>>>>>>> in.
>>>>>>>
>>>>>>> There is also a feature designed to work with this option. If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming.
>>>>>>> For example, gene_id= will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>>
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> From: Shaun Jackman
>>>>>>> Reply-To: Shaun Jackman
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To:
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>>
>>>>>>> maker_opts.ctl
>>>>>>>
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shaun
>>>>>>> _______________________________________________
>>>>>>> maker-devel mailing list
>>>>>>> maker-devel at box290.bluehost.com
>>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sjackman at gmail.com Wed May 14 19:22:37 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 17:22:37 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

I'm using MAKER 2.31.4.

http://sjackman.ca

On 14 May 2014 17:19, Carson Holt wrote:

> That should be fixed in the current download? It came up on the mailing
> list a couple of weeks ago. I'll check.
>
> --Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Wednesday, May 14, 2014 at 6:06 PM
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. I used other_gff to pass the following four-line GFF file of
> Barrnap rRNA annotations through. The output of gff3_merge is quite
> bizarre. See below.
>
> Input:
>
> ##gff-version 3
> 200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . Name=12S_rRNA;product=12S ribosomal RNA
> 200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA
> 200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA
> 200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA
>
> Output:
>
> ###
> ARRAY(0x7feceb928780)
> ###
> ARRAY(0x7feceaa548a0)
> ###
> ARRAY(0x7feceeb01c60)
> ###
> ARRAY(0x7fecedf6fef8)
> ###
>
> Cheers,
> Shaun
>
> http://sjackman.ca
>
> On 14 May 2014 14:18, Carson Holt wrote:
>
>> Thanks. Looks interesting. Also since output is already GFF3, you could
>> probably just use it with gff passthrough. It doesn't appear to support
>> eukaryotes though.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>>
>> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>>
>> Cheers,
>> Shaun
>>
>> On 4 March 2014 18:33, Carson Holt wrote:
>>
>>> Trying to call non-coding RNA from ESTs or even sequence homology is
>>> extremely messy (non-trivial problem in most organisms with high false
>>> positive rate), so MAKER for the most part doesn't even try to do that. It
>>> focuses only on the coding genes. You can now use tRNAscan and snoscan in
>>> the newest version for some non-coding RNA support (those features were
>>> only added a couple of months ago). So just like other prediction tools
>>> (snap, augustus etc.), the primary focus has always been the coding genes.
>>> We've only started adding non-coding RNA support recently for iPlant, so
>>> it's still relatively immature.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Shaun Jackman
>>> Reply-To: Shaun Jackman
>>> Date: Tuesday, March 4, 2014 at 7:10 PM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks
>>> for the tip.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From torsten.seemann at monash.edu Wed May 14 18:33:55 2014
From: torsten.seemann at monash.edu (Torsten Seemann)
Date: Thu, 15 May 2014 09:33:55 +1000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Carson & Shaun

> It doesn't appear to support eukaryotes though.
>
> Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap
> --help output seems to be out of date.
>
> Barrnap predicts the location of ribosomal RNA genes in genomes. It
> supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria
> (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

It does support eukaryota and mitochondria; I just forgot to push the
documentation changes. This has been resolved now in the 0.4.2 release.

--kingdom [X] Kingdom: euk arc bac mito (default 'bac')

Next release 0.5 will have an 'accurate' mode which will fine tune the
predictions using cmalign glocal alignment.

Thanks for your interest!
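
For anyone scripting this, here is a minimal Python sketch of picking a --kingdom value from the list in the message above. The kingdom codes and subunit lists are copied from that message; the helper name barrnap_command and the file name genome.fasta are made up for illustration, and passing the FASTA file as the final argument is an assumption about the command line, so check barrnap --help before relying on it.

```python
# Sketch only: kingdom codes and subunit lists come from the 0.4.2 note above;
# barrnap_command() and "genome.fasta" are hypothetical names for illustration.
import shlex

# rRNA subunits reported for each --kingdom code.
KINGDOM_RRNA = {
    "bac": ["5S", "23S", "16S"],
    "arc": ["5S", "5.8S", "23S", "16S"],
    "mito": ["12S", "16S"],
    "euk": ["5S", "5.8S", "28S", "18S"],
}


def barrnap_command(fasta, kingdom="bac"):
    """Build an argv list for one barrnap run (assumes FASTA is the last argument)."""
    if kingdom not in KINGDOM_RRNA:
        raise ValueError(
            f"unknown kingdom {kingdom!r}; expected one of {sorted(KINGDOM_RRNA)}"
        )
    return ["barrnap", "--kingdom", kingdom, fasta]


if __name__ == "__main__":
    cmd = barrnap_command("genome.fasta", kingdom="mito")
    # Print the command instead of running it, so the sketch works without barrnap.
    print(shlex.join(cmd))
    print("rRNA subunits to expect:", ", ".join(KINGDOM_RRNA["mito"]))
```

The argv list can be handed to subprocess.run once barrnap is installed; the subunit list is only a sanity check on what the chosen kingdom should report.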
--
--Dr Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA
--http://www.bioinformatics.net.au/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cjfields at illinois.edu Wed May 14 22:23:03 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 15 May 2014 03:23:03 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID: <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu>

\o/

(now I can get rid of rnammer forever!)

chris

On May 14, 2014, at 6:33 PM, Torsten Seemann wrote:

Carson & Shaun

It doesn't appear to support eukaryotes though.

Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap
--help output seems to be out of date.

Barrnap predicts the location of ribosomal RNA genes in genomes. It
supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria
(12S,16S) and eukaryotes (5S,5.8S,28S,18S).

It does support eukaryota and mitochondria, I just forgot to push the
documentation changes. This has been resolved now in the 0.4.2 release.

--kingdom [X] Kingdom: euk arc bac mito (default 'bac')

Next release 0.5 will have an 'accurate' mode which will fine tune the
predictions using cmalign glocal alignment.

Thanks for your interest!

--
--Dr Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA
--http://www.bioinformatics.net.au/
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sajeet at gmail.com Thu May 15 12:36:00 2014
From: sajeet at gmail.com (Sajeet Haridas)
Date: Thu, 15 May 2014 10:36:00 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu>
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu>
Message-ID:

My brief test of barrnap suggests that it does not perform well on rRNA
genes with introns such as those found in fungal mitochondria. Setting a
lower threshold for --reject and --evalue helps, but is not enough.

Looks like I cannot abandon rnammer for now.

FYI - if you want to test barrnap with fungal mitochondria, use --kingdom
bacteria because they have 23S and 16S unlike the human mitochondria.

Sajeet

On Wed, May 14, 2014 at 8:23 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:

> \o/
>
> (now I can get rid of rnammer forever!)
>
> chris
>
> On May 14, 2014, at 6:33 PM, Torsten Seemann wrote:
>
> Carson & Shaun
>
>> It doesn't appear to support eukaryotes though.
>>
>> Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap
>> --help output seems to be out of date.
>>
>> Barrnap predicts the location of ribosomal RNA genes in genomes. It
>> supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria
>> (12S,16S) and eukaryotes (5S,5.8S,28S,18S).
>
> It does support eukaryota and mitochondria, I just forgot to push the
> documentation changes. This has been resolved now in the 0.4.2 release.
>
> --kingdom [X] Kingdom: euk arc bac mito (default 'bac')
>
> Next release 0.5 will have an 'accurate' mode which will fine tune the
> predictions using cmalign glocal alignment.
>
> Thanks for your interest!
>
> --
> --Dr Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
> --Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA
> --http://www.bioinformatics.net.au/
>
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ranjani at uga.edu Thu May 15 14:00:47 2014
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Thu, 15 May 2014 19:00:47 +0000
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To:
References:
Message-ID: <1400180446764.46375@uga.edu>

Hi Carson,

I upgraded to MAKER version 2.31.3 (from MAKER 2.10).

I want to predict gene models directly from proteins. I provided proteins
from a related organism as input and set protein2genome to 1. However I do
not get any gene models predicted. I also tried this by using a
transcriptome data set in addition to the protein dataset and set
est2genome and protein2genome to 1. I get gene models from the transcripts
but not proteins.

When I look at the alignment of the proteins on the genome, they seem to be
aligning rather well and I would expect to see a gene model predicted.
Would you know why this might be?

Also the number of gene models predicted (directly from the transcriptome)
in this version is lower than the previous version I was using (MAKER
2.10). I did notice this version is not predicting overlapping gene models,
but that is not the rule.

Thanks,
Ranjani

________________________________
From: maker-devel on behalf of Carson Holt
Sent: Wednesday, April 30, 2014 10:55 AM
To: Carson Holt; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] FW: protein2genome gene models

Make sure you're using the current version of MAKER. It works on
eukaryotes as well.

--Carson

From: Carson Holt
Date: Wednesday, April 30, 2014 at 8:53 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] FW: protein2genome gene models

From: Sivaranjani Namasivayam
Date: Wednesday, April 30, 2014 at 8:45 AM
To: "maker-devel-bounces at yandell-lab.org"
Subject: protein2genome gene models

Hi,

I want to examine the gene models predicted directly from protein data for
my genome. MAKER has an option for this in the maker_opts.ctl file:
protein2genome=1, but it says for prokaryotes only. Will this not work for
eukaryotes? Is it because of introns?

Thanks,
Ranjani
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From torsten.seemann at monash.edu Thu May 15 17:42:53 2014
From: torsten.seemann at monash.edu (Torsten Seemann)
Date: Fri, 16 May 2014 08:42:53 +1000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu>
Message-ID:

Sajeet,

> Brief test of barrnap suggests that it does not perform well on rRNA genes
> with introns such as those found in fungal mitochondria. Setting a lower
> threshold for --reject and --evalue helps, but is not enough.
> Looks like I cannot abandon rnammer for now.
> FYI - if you want to test barrnap with fungal mitochondria, use --kingdom
> bacteria because they have 23S and 16S unlike the human mitochondria.

This is good feedback. Paul Gardner also mentioned the intron issue.

A "fungi" kingdom is clearly needed. I am not a mycologist, so coming up with a detailed rRNA architecture for eukaryotic phyla etc. is something I have started but need assistance with.

Adjustment of nhmmer alignment parameters could be done to improve the intronic rRNAs too.

Here is what I have so far in terms of models:
https://github.com/Victorian-Bioinformatics-Consortium/barrnap/blob/master/README.md#data-sources-for-hmm-models

- do i need to split euk into protist / plant / animal / fungi?
- should the current 'mito' be placed inside the current 'euk'? as mito data is likely to end up in assemblies, but keep separate for mito-only data?
- plastids, chloroplasts, apicoplasts; i am not sure of the subtleties of these organelles' rRNA but am willing to learn.

Thank you again for testing. Any help appreciated,

--
*--Dr Torsten Seemann --Victorian Bioinformatics Consortium, Monash University, AUSTRALIA*
*--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA --http://www.bioinformatics.net.au/ *

From sjackman at gmail.com Fri May 16 12:16:27 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 16 May 2014 10:16:27 -0700
Subject: [maker-devel] Specify multiple files to rmlib
Message-ID:

Hi, Carson.

Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it?

Thanks!
Shaun

P.S. Any update on the fix to other_gff?

http://sjackman.ca

From carsonhh at gmail.com Fri May 16 15:33:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 14:33:15 -0600
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

It could be done. I've made some changes to the subversion repository if you want to test it. You should also be able to use labels just as you can with other comma separated lists in MAKER, using ':' to separate the label.

Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label

I've also found the other_gff issue. It was fixed in the subversion repository but not in the release package I made the other day, so I've updated the release to 2.31.5.

--Carson

From carsonhh at gmail.com Fri May 16 15:42:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 14:42:50 -0600
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To: <1400180446764.46375@uga.edu>
References: <1400180446764.46375@uga.edu>
Message-ID:

Upgrade to 2.31.5.

Changes since 2.31.3
*a protein2genome issue that was introduced in 2.31.3 was fixed
*fasta_merge failing with trnascan results was fixed
*other_gff input resulting in an ARRAY reference being printed was fixed
*naming of tRNA genes was improved to include amino acid identity

--Carson
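
When alignments look fine but no gene models appear, one quick check is to tally the output GFF3 by source and type (columns 2 and 3) and see which kinds of features were actually written. The following is a minimal Perl sketch of that idea, not a MAKER script: the helper name `tally_gff3`, the sample records, and the source names used in them are assumptions made for illustration.

```perl
use strict;
use warnings;

# Count GFF3 features by "source<TAB>type" (columns 2 and 3).
# tally_gff3 is a hypothetical helper, not part of MAKER.
sub tally_gff3 {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ /^##FASTA/;              # sequence section follows
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blank lines
        my @col = split /\t/, $line;
        next unless @col >= 9;                    # not a feature line
        $count{"$col[1]\t$col[2]"}++;
    }
    return \%count;
}

# Illustrative input only; the source names here are assumptions.
my $sample = join "\n",
    join("\t", 'scaffold1', 'maker',          'gene',                     106,  2985, '.', '+', '.', 'ID=g1'),
    join("\t", 'scaffold1', 'est2genome',     'expressed_sequence_match', 106,  2985, '.', '+', '.', 'ID=e1'),
    join("\t", 'scaffold1', 'protein2genome', 'protein_match',            120,  2900, '.', '+', '.', 'ID=p1'),
    join("\t", 'scaffold1', 'protein2genome', 'protein_match',            3000, 4000, '.', '+', '.', 'ID=p2');

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $count = tally_gff3($fh);
close $fh;

printf "%-45s %d\n", $_, $count->{$_} for sort keys %$count;
```

To use it on a real run, open the merged GFF3 file instead of the in-memory sample and pass that filehandle to `tally_gff3`.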

From sjackman at gmail.com Fri May 16 15:45:59 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 16 May 2014 13:45:59 -0700
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

Excellent. Thanks, Carson. Is the rmlib feature included in 2.31.5?

What is the purpose of the label? Does it affect the GFF file output by MAKER?

--
http://sjackman.ca

From carsonhh at gmail.com Fri May 16 16:02:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 15:02:59 -0600
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

No. There are some implementation issues related to how repeats are processed and collapsed that may cause hidden bugs with the comma separated list, so it needs some more testing.

The label is added to the output GFF3. For example, protein=uniprot.fasta:uniprot would cause the GFF3 label to be protein2genome:uniprot rather than just protein2genome. Programs like GBrowse know how to use the labels to generate on/off check boxes, so you can turn just some of your protein results on or off in a viewer rather than all of them.

--Carson
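
The `file:label` convention discussed in this thread can be sketched as a small parser. This only illustrates the syntax and is not MAKER's own option handling: the helper name `parse_labeled_list` is hypothetical, and the sketch assumes file names contain neither ':' nor ','.

```perl
use strict;
use warnings;

# Split a comma separated control-file value such as
#   repeats.fasta:some_label,repeats2.fasta:another_label
# into [file, label] pairs. The label is optional.
# Hypothetical helper for illustration; assumes no ':' or ',' in file names.
sub parse_labeled_list {
    my ($value) = @_;
    my @pairs;
    for my $item (split /,/, $value) {
        $item =~ s/^\s+|\s+$//g;                   # trim surrounding whitespace
        next if $item eq '';                       # ignore empty entries
        my ($file, $label) = split /:/, $item, 2;  # split on the first ':' only
        push @pairs, [$file, defined $label ? $label : ''];
    }
    return @pairs;
}

for my $pair (parse_labeled_list('repeats.fasta:some_label,repeats2.fasta:another_label')) {
    my ($file, $label) = @$pair;
    print "file=$file label=$label\n";
}
```

An entry without a label, such as `repeats.fasta` on its own, comes back with an empty label string.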

From cjfields at illinois.edu Tue May 20 14:17:14 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 20 May 2014 19:17:14 +0000
Subject: [maker-devel] tRNAscan and map_gff_ids
Message-ID: <520E7E32-B4E2-486F-B730-F15683679440@illinois.edu>

I found a problem with some tRNAscan output using MAKER 2.31.5. I had a full MAKER data set (run initially using MAKER 2.31.5) that I mapped IDs for. This was then run as follows, with the resulting error:

-system-specific-4.1$ map_gff_ids id.map Zalbi.all.gff3
Nested quantifiers in regex; marked by <-- HERE in m/trnascan-KB913038.1-noncoding-Undet_??? <-- HERE -gene-79.0/ at /home/groups/hpcbio/apps/maker/maker-2.31.5/bin/map_gff_ids line 111, <$IN> line 3067590.

The problematic lines:

----------------------------------------------
-system-specific-4.1$ grep "???" Zalbi.all.gff3
KB913038.1 maker gene 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0
KB913038.1 maker tRNA 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1;Parent=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|70|0
KB913038.1 maker exon 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1:exon:2193;Parent=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1
KB913039.1 maker gene 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0
KB913039.1 maker tRNA 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1;Parent=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|74|0
KB913039.1 maker exon 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1:exon:4036;Parent=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1
----------------------------------------------

I managed to get it going by using the following modifications (regex quotemeta) in map_gff_ids (lines 107-112):

    for my $id (@map_ids) {
        # Only if the value (or the portion preceding
        # the first colon) is equal to the map key.
        next unless ($value eq $id || $value =~ /^\Q$id\E:/);
        $value =~ s/\Q$id\E/$map{$id}/ unless($tag eq 'Name' && $id !~ /\-gene\-\d+\.\d+|^CG\:|^....\:|^[^\:]+\:temp\d+\:/);
    }

I'm guessing there may be a similar problem with map_fasta_ids?

chris

From carsonhh at gmail.com Tue May 20 14:43:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 May 2014 13:43:48 -0600
Subject: [maker-devel] tRNAscan and map_gff_ids
Message-ID:

Thanks. trnascan support is new enough that there are these kinds of issues that we need to find and fix. MAKER tries to use the codon name supplied by trnascan, and it looks like the codon is 'Undet_???'. I don't know why that is. We currently don't do any filtering of trnascan results (i.e. we keep everything). This might be something that we really just want to be filtering out, since it doesn't have a determinable codon? At the very least I should change the codon to NNN instead of ??? to correspond to the standard ambiguity nucleotides used in FASTA format.

--Carson
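
The failure and the fix discussed in this thread can be reproduced in isolation. The following is a minimal sketch using only core Perl: an ID containing `???` is first interpolated raw into a pattern, which fails to compile, and is then wrapped in `\Q...\E` so that it matches literally. The replacement name `tRNA_0001` is hypothetical, and the snippet is not the actual `map_gff_ids` code.

```perl
use strict;
use warnings;

my $id    = 'trnascan-KB913038.1-noncoding-Undet_???-gene-79.0';
my $value = "$id-tRNA-1";
my %map   = ($id => 'tRNA_0001');   # hypothetical replacement name

# Unescaped: '???' is parsed as stacked quantifiers, so compiling the
# pattern dies at run time. eval traps the error so it can be reported.
my $ok = eval { my $re = qr/$id/; 1 };
print "unescaped pattern failed: $@" unless $ok;

# Escaped: \Q...\E (quotemeta) turns every metacharacter into a literal,
# so the ID is matched character for character.
(my $renamed = $value) =~ s/\Q$id\E/$map{$id}/;
print "$renamed\n";
```

The escaped substitution turns the value into `tRNA_0001-tRNA-1`. Escaping also makes the '.' characters in the ID literal, which the raw interpolation silently treated as "any character".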

From caigh02 at gmail.com Mon May 19 22:43:18 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Mon, 19 May 2014 23:43:18 -0400
Subject: [maker-devel] Maker exon number
Message-ID:

Hi Carson,

I am using MAKER to annotate a few small genomes. When looking through the GFF file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold starts with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks.

Guohong

scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1
scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0
scaffold1 maker mRNA 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840
scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 106 1684 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 2605 2985 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254
scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892
scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40259 40666 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40073 40154 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40745 41604 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1

From dence at genetics.utah.edu Tue May 20 15:34:20 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 20 May 2014 20:34:20 +0000
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

Hi Guohong,

What version of MAKER are you running?

Thanks,
Daniel

From carsonhh at gmail.com Tue May 20 15:50:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 May 2014 14:50:44 -0600
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

I can do that. Just a note of caution though. The ID= attribute is not protected (it's just an identifier to relate things to one another for correct parentage). Downstream scripts that use or manipulate GFF3 files can change it, so there is no guarantee that it will stay the same or even be informative.

--Carson

From carsonhh at gmail.com Tue May 20 19:52:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 May 2014 18:52:34 -0600
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

I've gone ahead and made the change in the development version. It will probably be convenient in most cases, but it's important to note one caveat. Exon features are shared in GFF3 format. So if there are multiple isoforms that contain the same exon, there will only be a single exon line in the GFF3, but it will list several transcript IDs in its Parent= attribute.

What does that have to do with the ID= attribute or exon order? Well, it means that ID=exon:2 in the first transcript may be the second exon, but in another transcript ID=exon:2 may be the first exon or third exon, etc. This is because there is only a single line for a given exon and it gets shared by all the transcripts. So it will always have the same ID= tag, but will hold a different position in different isoforms (so its ordinal value will not go along with the ID in those cases).

But since most gene calls from MAKER will have only one isoform (default), it could still be convenient in those cases.

Thanks,
Carson

From caigh02 at gmail.com Wed May 21 08:14:40 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Wed, 21 May 2014 08:14:40 -0500
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

Hi Daniel,

I am using maker-2.31.5.

---Guohong

On Tue, May 20, 2014 at 3:34 PM, Daniel Ence wrote:
> Hi Guohong,
>
> What version of MAKER are you running?
>
> Thanks,
> Daniel
>
>
> On May 19, 2014, at 9:43 PM, Guohong Cai wrote:
>
> > Hi Carson,
> >
> > I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold start with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks.
> >
> > Guohong
> >
> >
> > scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1
> > scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0
> > scaffold1 maker mRNA 106 2985 . + .
> ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 > > scaffold1 maker exon 106 1684 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker exon 1878 2440 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker exon 2605 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker CDS 106 1684 . + 0 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker CDS 1878 2440 . + 2 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker CDS 2605 2985 . + 0 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker gene 38466 41604 . + . > ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 > > scaffold1 maker mRNA 38466 41604 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892 > > scaffold1 maker exon 38466 38511 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 38616 38742 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 38831 39986 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 40073 40154 . + . 
> ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 40259 40666 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 40745 41604 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 38466 38511 . + 0 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 38616 38742 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 38831 39986 . + 1 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 40073 40154 . + 0 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 40259 40666 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 40745 41604 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From caigh02 at gmail.com Wed May 21 09:40:47 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Wed, 21 May 2014 09:40:47 -0500 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: Thanks a lot.---Guohong On Tue, May 20, 2014 at 7:52 PM, Carson Holt wrote: > I've gone ahead and made the change in the devlopment version. It will > probably be convenient in most cases, but it's important to note one > caveat. Exon features are shared in GFF3 format. 
So if there are multiple > isoforms that contain the same exon, there will only be a single exon line > in the GFF3, but it will list several transcript IDs in it's Parent= > attribute. > > What does that have to do with with the ID= attribute or exon order? Well > it means that ID=exon:2 in the first transcript may be the second exon, but > in another transcript ID=exon:2 may be the first exon or third exon, etc. > This is because there is only a single line for a given exon and it gets > shared by all the transcripts. So it will always have the same ID= tag, > but will hold a different position in different isoforms (so it's ordinal > value will not go along with the ID in those cases). But since most gene > calls from MAKER will have only one isoform (default) it could still be > convenient in those cases. > > Thanks, > Carson > > > From: Guohong Cai > Date: Monday, May 19, 2014 at 9:43 PM > To: > Subject: [maker-devel] Maker exon number > > Hi Carson, > > I am using MAKER to annotate a few small genomes. When looking through the > gff file, I notice that the exon numbers do not start from 0 or 1 for each > gene. Only the first gene in a scaffold start with exon 0. If the first > gene has 3 exons (0-2), then the second gene will start from exon 3 (an > example is shown below). It seems many people would prefer that in each > gene, the first exon be exon 1. Is it possible to make such a change? > Thanks. > > Guohong > > > scaffold1 . contig 1 347483 . . . > ID=scaffold1;Name=scaffold1 > scaffold1 maker gene 106 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0 > scaffold1 maker mRNA 106 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 > scaffold1 maker exon 106 1684 . + . 
> ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker exon 1878 2440 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker exon 2605 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 106 1684 . + 0 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 1878 2440 . + 2 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 2605 2985 . + 0 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker gene 38466 41604 . + . > ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 > scaffold1 maker mRNA 38466 41604 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892 > scaffold1 maker exon 38466 38511 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 38616 38742 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 38831 39986 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40073 40154 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40259 40666 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40745 41604 . + . 
> ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38466 38511 . + 0 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38616 38742 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38831 39986 . + 1 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40073 40154 . + 0 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40259 40666 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40745 41604 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From caigh02 at gmail.com Wed May 21 22:16:52 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Wed, 21 May 2014 23:16:52 -0400 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: Hi Carson, is the development version available for download? Only maker2.31.5 is available on Yandell Lab website.---Guohong On Tue, May 20, 2014 at 8:52 PM, Carson Holt wrote: > I've gone ahead and made the change in the devlopment version. It will > probably be convenient in most cases, but it's important to note one > caveat. Exon features are shared in GFF3 format. So if there are multiple > isoforms that contain the same exon, there will only be a single exon line > in the GFF3, but it will list several transcript IDs in it's Parent= > attribute. 
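A minimal sketch of the caveat quoted above, assuming tab-separated GFF3 input. Because a shared exon line can list several transcripts in Parent=, the position of an exon within each transcript is derived here from coordinates and strand rather than read from the number in its ID. The function name exon_ordinals, the transcript IDs tx1 and tx2, and the sample lines are made up for illustration and are not part of MAKER.

```python
from collections import defaultdict


def parse_attrs(field):
    """Turn a GFF3 attribute column ("k=v;k=v") into a dict."""
    return dict(p.split("=", 1) for p in field.split(";") if "=" in p)


def exon_ordinals(gff_lines):
    """Map each transcript ID to [(ordinal, exon ID), ...] in 5'->3' order."""
    by_tx = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = parse_attrs(cols[8])
        start, strand = int(cols[3]), cols[6]
        # GFF3 allows several comma-separated parents on one feature line.
        for tx in attrs.get("Parent", "").split(","):
            if tx:
                by_tx[tx].append((start, strand, attrs.get("ID", "")))

    result = {}
    for tx, exons in by_tx.items():
        # Ascending start on the + strand, descending on the - strand.
        minus = exons[0][1] == "-"
        ordered = sorted(exons, key=lambda e: e[0], reverse=minus)
        result[tx] = [(i, e[2]) for i, e in enumerate(ordered, start=1)]
    return result


# Hypothetical input: exon:2 is shared by two transcripts.
demo = [
    "chr1\tmaker\texon\t100\t200\t.\t+\t.\tID=exon:1;Parent=tx1",
    "chr1\tmaker\texon\t300\t400\t.\t+\t.\tID=exon:2;Parent=tx1,tx2",
    "chr1\tmaker\texon\t500\t600\t.\t+\t.\tID=exon:3;Parent=tx2",
]
ordinals = exon_ordinals(demo)
for tx, exons in sorted(ordinals.items()):
    print(tx, exons)
```

In this made-up input, exon:2 comes out as the second exon of tx1 and the first exon of tx2, which is the situation the caveat describes.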
From fbarreto at ucsd.edu Fri May 23 00:13:37 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Thu, 22 May 2014 22:13:37 -0700
Subject: [maker-devel] Alternative splicing options
Message-ID:

Hi, all,

I just finished a fourth and final iterative round with Maker, training predictors in between, and I am very happy with the results. What I would like to try now is to annotate alternative splicing variants, and I know the ctrl file has the alt_splice option.
However, I am intrigued by the lack of information regarding this option. I could not find many discussions in this group, and most genome publications using Maker are unclear about whether they annotated alternative transcripts, so my guess is they didn't.
So I was wondering whether there is a reason for that. Is that function not well developed in Maker? Should I stay away from it?
Assuming it is OK to give it a try (provided I don't get discouraged here), what is the best approach to take, considering I already obtained what I consider a solid set of gene models after four rounds of annotation? Should I start over by turning on alt_splice, and training gene predictors from those outputs? Or would it be appropriate to simply repeat my latest round, changing only alt_splice=1?

Thanks for any help. I can see the light at the end of the tunnel!

Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From dence at genetics.utah.edu Fri May 23 09:55:50 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 23 May 2014 14:55:50 +0000
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

Hi Felipe,

The alternative splice option is a fully developed and functional option in MAKER. What it does is tell MAKER to consider gene models with mutually exclusive evidence. For example, if there are two models at a locus and evidence that supports one exon in one model and a different exon in another model, both those models might make it into the final geneset.

From the workflow you described, I think you'd have to redo only the fourth and final round of MAKER annotation. As a general principle for trying out new options on your annotations, I'd recommend choosing a big scaffold, running it with alt_splice=1, and seeing how you like the results.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Fri May 23 10:07:26 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 May 2014 09:07:26 -0600
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

I'd like to add that alternate splice forms will be generated from the mutually exclusive EST evidence, so how well it performs, and whether it can generate other splice forms at all, will depend entirely on the quality of your EST evidence.
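One way to judge a trial run with alt_splice=1 on a single scaffold is to count how many mRNA features each gene ends up with. The sketch below is only an illustration, assuming tab-separated GFF3 in which the annotated transcripts carry "maker" in the source column and a Parent= attribute naming their gene. The function name isoforms_per_gene and the sample IDs are hypothetical and not part of MAKER.

```python
from collections import Counter


def isoforms_per_gene(gff_lines):
    """Count mRNA features per parent gene in MAKER-style GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Keep only annotated transcripts; evidence alignments are skipped.
        if len(cols) != 9 or cols[1] != "maker" or cols[2] != "mRNA":
            continue
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        for gene_id in attrs.get("Parent", "").split(","):
            if gene_id:
                counts[gene_id] += 1
    return counts


# Hypothetical lines standing in for the GFF3 of one test scaffold.
demo = [
    "chr1\tmaker\tgene\t1\t900\t.\t+\t.\tID=geneA",
    "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=geneA-mRNA-1;Parent=geneA",
    "chr1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=geneA-mRNA-2;Parent=geneA",
    "chr1\tmaker\tgene\t1500\t2000\t.\t-\t.\tID=geneB",
    "chr1\tmaker\tmRNA\t1500\t2000\t.\t-\t.\tID=geneB-mRNA-1;Parent=geneB",
    "chr1\test2genome\tmatch\t1\t900\t.\t+\t.\tID=est-hit-1",
]
counts = isoforms_per_gene(demo)
multi = {gene: n for gene, n in counts.items() if n > 1}
print("genes with more than one isoform:", multi)
```

Comparing these counts between the run without alt_splice and the trial run with it shows which loci gained isoforms, which in turn points at where the EST evidence is strong enough to support them.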
--Carson

From fbarreto at ucsd.edu Fri May 23 10:56:27 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Fri, 23 May 2014 08:56:27 -0700
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

Hey guys,

Great to hear!! I will be anxious to try it out. Thanks for your prompt help!

Cheers,
Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From cjfields at illinois.edu Fri May 23 11:21:38 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 23 May 2014 16:21:38 +0000
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID: <14271D2B-4D83-47C9-8661-682599E94E8F@illinois.edu>

That is exactly what I have seen using this option; genes with very good transcriptome evidence (as one might expect) tend to have more isoforms. The problem we run into is not having a diverse enough transcriptome set to work with (ours tend to be tissue-specific, unfortunately), so we have some genes giving more isoforms than others, but we don't design the libraries so have no control over it. We are currently only using Trinity assemblies as input rather than TopHat2/Cufflinks.

chris

From fbarreto at ucsd.edu Fri May 23 15:31:36 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Fri, 23 May 2014 13:31:36 -0700
Subject: [maker-devel] gff3_merge on models only for SNAP training?
Message-ID:

Hi, all,

I should have confirmed this well before starting my Maker runs, but better now than never. When generating a merged gff file to be used for SNAP training, is it OK to use the default gff output from gff3_merge, which contains all protein/EST evidence alignments (this is what I did)? Or should I have generated a gene models-only merged gff (using the -g flag) for training? I assume the Maker flag within the larger gff file will allow the subsequent scripts (e.g. maker2zff) to ignore the other alignments, but just wanted to check.

Thanks again!
Felipe

-------------- next part --------------
An HTML attachment was scrubbed...
From carsonhh at gmail.com Fri May 23 15:33:17 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 May 2014 14:33:17 -0600
Subject: [maker-devel] gff3_merge on models only for SNAP training?
In-Reply-To:
References:
Message-ID:

Yes. It's ok. Non-genic feature lines will be ignored.

--Carson

From jacques.dainat at imbim.uu.se Fri May 23 02:56:05 2014
From: jacques.dainat at imbim.uu.se (Jacques Dainat)
Date: Fri, 23 May 2014 09:56:05 +0200
Subject: [maker-devel] Possible error in tRNA annotation by maker
Message-ID:

Hi,

I would like to report a possible error in the tRNA annotation produced by maker. I saw the problem in the gff result file. It affects all tRNAs, and only those tRNAs, that have an intron and lie on the + strand. In these cases the strand of one of the exons seems to be wrong (see the example below).

For example, we have:

scaffold6501  maker  gene  2126  2230  .  +  .  XXX
scaffold6501  maker  tRNA  2126  2230  .  +  .  XXX
scaffold6501  maker  exon  2185  2230  .  -  .  XXX
scaffold6501  maker  exon  2126  2163  .  +  .  XXX

Theoretically, we should obtain:

scaffold6501  maker  gene  2126  2230  .  +  .  XXX
scaffold6501  maker  tRNA  2126  2230  .  +  .  XXX
scaffold6501  maker  exon  2126  2163  .  +  .  XXX
scaffold6501  maker  exon  2185  2230  .  +  .  XXX

Kind regards,

Jacques Dainat, PhD
BILS (Bioinformatics Infrastructure for Life Sciences)
Address: (room E10:3312)
Uppsala University, BMC
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala
Sweden

From marc.hoeppner at bils.se Tue May 27 03:12:07 2014
From: marc.hoeppner at bils.se (Marc Höppner)
Date: Tue, 27 May 2014 10:12:07 +0200
Subject: [maker-devel] Some questions regarding ab-initio training
Message-ID: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>

Hi,

I wanted to get some feedback regarding the training of ab-initio gene finders - it's not strictly Maker related, but I suppose there are many people on this list that have encountered and solved this issue in one way or another.

Specifically, I am trying to train Augustus (and possibly SNAP) for a plant genome. This has always been a very frustrating process for me, but while I have a better idea now how to do it, I still don't get the sort of accuracy that I am hoping for.
A quick run-through of my process:

Evidence build with maker on level 1 and 2 proteins from Uniprot + Sanger-sequenced EST data

Filtered for models with an AED <= 0.3

Loaded that into WebApollo, together with an existing reference annotation and the evidence tracks

Manually curated/selected 750 gene models using the following rules:
- Must have start/stop codon
- Must have as many exons as possible
- Must agree with evidence
- Must be >= 2kb apart from other gene models (provided as flanking regions for augustus to train intergenic sequence)

From these models, I created a GBK file, split it into 650 (train) and 100 (test) models and created a new profile using the documented procedure.

But:

While the naked ab-init models created through maker get a lot of genes "sort of right", I still see too many issues to be really satisfied. Problems include:

- random exon calls which are not supported by any line of evidence (~1 per gene model, I would guess)
- poor congruency with some gene models (especially ones not used for training/testing)

Is there any best-practice guide on how to improve this? The Augustus website is unfortunately quite poor on detail. My impression so far is that ramping up the number of training models isn't really doing too much beyond a certain point (tried 400, 500 and 750).

Regards,

Marc

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at bils.se

From carsonhh at gmail.com Tue May 27 10:25:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 27 May 2014 09:25:39 -0600
Subject: [maker-devel] Some questions regarding ab-initio training
In-Reply-To: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>
References: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>
Message-ID:

Extra exons can be required for predictors to make sense of a region (they do the best they can). This can be due to imperfect assemblies or repeats. For plants, the repeat library is the one thing that will most affect the annotation quality. You may need to spend some time building the best repeat library you can. The repeat library is the most important thing after training the predictor, because repeats confuse the predictor (sometimes a lot), causing it to behave oddly in those regions (repeats do encode real proteins and protein fragments).

Also, when running with MAKER now, make sure to include the entire proteome of a related species and not just UniProt, and you will get better performance. Now that you have Augustus trained, using it inside of MAKER with an improved repeat library and additional protein evidence should give it the feedback that will allow it to perform better than it would with just naked ab initio prediction.

Thanks,
Carson

From carsonhh at gmail.com Tue May 27 10:26:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 27 May 2014 09:26:25 -0600
Subject: [maker-devel] Possible error in tRNA annotation by maker
In-Reply-To:
References:
Message-ID:

Do you have a small test contig I could use to duplicate the error? That will make it easier to fix.

Thanks,
Carson

From panos.ioannidis at gmail.com Wed May 28 02:28:14 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 09:28:14 +0200
Subject: [maker-devel] Problem with installation
Message-ID:

Hello Maker community,

I just finished installing Maker and even though everything seems to be okay, when I run

./maker -h

or

./maker

the program apparently hangs without giving any output, warning or error.

Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/.

Any ideas?

Panos

From panos.ioannidis at gmail.com Wed May 28 03:26:08 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 10:26:08 +0200
Subject: [maker-devel] General question
Message-ID:

I'm going through the Maker tutorial and saw that among the input files you give it, there's a fasta file with proteins (the protein=xxx parameter in the maker_opts.ctl file).

What exactly are these proteins? I thought Maker both predicts genes (i.e. proteins) and also annotates them. Does it only do annotation of already predicted genes/proteins? But then, why is it using gene predictors like Augustus, SNAP, etc?

Thanks,
Panos
From b.cantarel at gmail.com Wed May 28 06:11:18 2014
From: b.cantarel at gmail.com (Brandi Cantarel)
Date: Wed, 28 May 2014 06:11:18 -0500
Subject: [maker-devel] General question
In-Reply-To:
References:
Message-ID:

Maker's predictions are improved with evidence. These proteins can be from uniprot (I recommend uniprot50) or from closely related taxa.

Maker uses comparisons to these proteins in its prediction. There is more detail on this in the paper.

From panos.ioannidis at gmail.com Wed May 28 06:29:43 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 13:29:43 +0200
Subject: [maker-devel] General question
In-Reply-To:
References:
Message-ID:

Thanks Brandi.

From dence at genetics.utah.edu Wed May 28 08:29:58 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 28 May 2014 13:29:58 +0000
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hi Panos,

When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running?

Thanks,
Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From panos.ioannidis at gmail.com Wed May 28 08:46:12 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 15:46:12 +0200
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hi Daniel,

Here's the output of ./Build status

==============================================================================
STATUS MAKER v2.31.4
==============================================================================
PERL Dependencies:     VERIFIED
External Programs:     VERIFIED
External C Libraries:  VERIFIED
MPI SUPPORT:           DISABLED
MWAS Web Interface:    DISABLED
MAKER PACKAGE:         CONFIGURATION OK

I think everything looks okay, right?

From dence at genetics.utah.edu Wed May 28 09:03:33 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 28 May 2014 14:03:33 +0000
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hi Panos,

I just tried the commands that you used on my install of maker, and it took a surprisingly long time for the error messages to print. The test that we use in the tutorials (it seems to run faster than running maker with -h or with no options) is maker -CTL, which will create the control files that you use to set the many options for maker. Try running ./maker -CTL and let me know whether it creates those files. I guess that it might take more or less time, depending on your machine.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Wed May 28 09:32:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 08:32:07 -0600
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Perl is a scripting language rather than a compiled language, and one thing that happens when you first use a new module or script is that the interpreter follows the dependency tree, validating that everything executes/loads correctly. Since you installed a number of dependencies and MAKER itself, the first time you launch MAKER Perl has to do this check on the dependency tree. This only happens the first time, and after that Perl remembers it already ran the check, so the dependencies and MAKER will just start from then on.

Normally this process takes less than 30 seconds; however, on some systems (especially clusters) there may be a heavy IO burden and this process can take a while. For example, does it take a moment for 'ls -al' to return in some directories rather than returning instantaneously like it is supposed to? If it takes 3 seconds to return, for example, then each dependency check may take up to 3 seconds. If you just installed a bunch of new perl modules, then there may be a hundred or more dependencies that have to be validated for the first time.

--Carson

From panos.ioannidis at gmail.com Wed May 28 11:13:05 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 18:13:05 +0200
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hello Daniel and Carson,

Thank you both for your comments.

Carson, I gave it a lot more than 30 seconds.
I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one > thing that happens when you first use a new module or script Is that the > interpreter follows the dependency tree validating that everything > executes/loads correctly. Since you installed a number of dependencies and > MAKER itself, the first time you launch MAKER Perl has to do this check on > the dependency tree. This only happens the first time, and after that Perl > remembers it already ran the check so the dependencies and MAKER will just > start from then on. Normally this proccess takes less than 30 seconds; > however, on some systems (especially clusters) there may a heavy IO burden > and this process can take a while. For example does it take a moment for > 'ls -al' to return in some directories rather than returning > instantaneously like it is supposed to? If it takes 3 seconds to return or > example, then each dependency check may take up to 3 seconds. If you just > installed a bunch of new perl modules then there may be a hundred or more > dependencies that may have to be validated for the first time. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? 
Also, what version of maker are you running? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > > Hello Maker community, > > I just finished installing Maker and even though everything seems to be > okay, when I give > > ./maker -h > > or > > ./maker > > the program apparently hangs without giving any output or warning or error. > > Just so you know, I have installed all dependencies (Perl libraries and > third-party programs) and am executing from bin/, not src/bin/. > > Any ideas? > > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 11:15:20 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 10:15:20 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Normally it takes 30 seconds, but if your IO response is slow (I.e. 3 seconds per query which is why you should do the 'ls -al' test), it can take several minutes because it's an IO issue. --Carson From: Panos Ioannidis Date: Wednesday, May 28, 2014 at 10:13 AM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens. 
Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one thing > that happens when you first use a new module or script Is that the interpreter > follows the dependency tree validating that everything executes/loads > correctly. Since you installed a number of dependencies and MAKER itself, the > first time you launch MAKER Perl has to do this check on the dependency tree. > This only happens the first time, and after that Perl remembers it already ran > the check so the dependencies and MAKER will just start from then on. > Normally this proccess takes less than 30 seconds; however, on some systems > (especially clusters) there may a heavy IO burden and this process can take a > while. For example does it take a moment for 'ls -al' to return in some > directories rather than returning instantaneously like it is supposed to? If > it takes 3 seconds to return or example, then each dependency check may take > up to 3 seconds. If you just installed a bunch of new perl modules then there > may be a hundred or more dependencies that may have to be validated for the > first time. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? Also, what version of maker are you running? 
> > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be okay, >> when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? >> >> Panos >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 11:16:58 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 10:16:58 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: You may also want to look into if you need to reinstall perl on another drive. --Carson From: Carson Holt Date: Wednesday, May 28, 2014 at 10:15 AM To: Panos Ioannidis Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Normally it takes 30 seconds, but if your IO response is slow (I.e. 3 seconds per query which is why you should do the 'ls -al' test), it can take several minutes because it's an IO issue. 
--Carson From: Panos Ioannidis Date: Wednesday, May 28, 2014 at 10:13 AM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one thing > that happens when you first use a new module or script Is that the interpreter > follows the dependency tree validating that everything executes/loads > correctly. Since you installed a number of dependencies and MAKER itself, the > first time you launch MAKER Perl has to do this check on the dependency tree. > This only happens the first time, and after that Perl remembers it already ran > the check so the dependencies and MAKER will just start from then on. > Normally this proccess takes less than 30 seconds; however, on some systems > (especially clusters) there may a heavy IO burden and this process can take a > while. For example does it take a moment for 'ls -al' to return in some > directories rather than returning instantaneously like it is supposed to? If > it takes 3 seconds to return or example, then each dependency check may take > up to 3 seconds. If you just installed a bunch of new perl modules then there > may be a hundred or more dependencies that may have to be validated for the > first time. 
> > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? Also, what version of maker are you running? > > Thanks, > Daniel -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 11:25:04 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 18:25:04 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: "ls -al" is instantaneous in all directories... I'll try installing it on my workstation, although it's not possible to do annotation on my machine! And the machine I currently have it installed on is our server, and I can't really make any big changes there. Anyway, I'll let you know how it goes.
P -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 11:28:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 10:28:30 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Try perlbrew to set up your own local version of perl just for your user. http://perlbrew.pl --Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From fbarreto at ucsd.edu Wed May 28 12:39:45 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Wed, 28 May 2014 10:39:45 -0700 Subject: [maker-devel] Adding non-overlapping models to final set Message-ID: Hi, all, I finished generating Maker gene models. Following suggestions here and from publications, I used IPRscan on the set of non-overlapping ab initio protein models. This identified ~200 models with protein domains, and I would like to add those to my final gene set. However, I am having trouble figuring out how to use Maker's options to update my final maker_genome.gff file to include these 200 models, without also adding the remaining ~8000 non-overlapping models I don't want. The discussions about the re-annotation options don't seem to get at this. Do I have to first find a way to create a new gff file containing only the 200 new models, and then simply use gff3_merge with the full genome gff? At this point, I am not concerned about incorporating IPRscan functional info into the gff file. I want simply to generate an updated (and final) gene set and then move on to functional annotation. Thanks yet again! Felipe -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dence at genetics.utah.edu Wed May 28 13:35:06 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 28 May 2014 18:35:06 +0000 Subject: [maker-devel] Adding non-overlapping models to final set In-Reply-To: References: Message-ID: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu> Hi Felipe, I'm glad to hear that you got some more genes from IPRscan. If you don't care about getting the functional information from the IPRscan report into the gff file, then you just need to pull those predictions out from all the ab-initio predictions that you don't care about and put them in a GFF3 file. Then you put that file in for the "pred_gff" option and set keep_preds=1. That will promote those predictions to full gene models. Then you can merge with your other gff3 file. ~Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 13:45:05 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 12:45:05 -0600 Subject: [maker-devel] Adding non-overlapping models to final set In-Reply-To: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu> References: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu> Message-ID: For convenience you can use the attached script to help pull out the match/match_part features you want from the GFF3 file (or you can pull them out yourself). Then do just like Daniel said by setting keep_preds=1 and giving the selected match/match_part features to pred_gff, and your current MAKER models to model_gff. --Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gff3_select Type: application/octet-stream Size: 3236 bytes Desc: not available URL: From fbarreto at ucsd.edu Wed May 28 15:28:48 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Wed, 28 May 2014 13:28:48 -0700 Subject: [maker-devel] Adding non-overlapping models to final set In-Reply-To: References: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu> Message-ID: Awesome! Thanks for the tips and script. This should do the trick. Will come back if I get stuck.
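The selection step discussed in this thread (pulling only the wanted ab initio match/match_part features out of a GFF3 file so they can be given to pred_gff) can be sketched in Python. This is not the attached gff3_select script, whose contents are not shown in the archive. It is a minimal illustration that assumes the wanted predictions are identified by the ID attribute of their match features and that match_part features refer to them through Parent. The function name `select_matches`, the source label `abinit`, and the example IDs are all made up for illustration.

```python
def select_matches(gff3_lines, wanted_ids):
    """Yield GFF3 lines for match features whose ID is in `wanted_ids`,
    plus the match_part features whose Parent is one of those IDs.

    Comment lines, other feature types, and everything after a ##FASTA
    directive are skipped.
    """
    wanted = set(wanted_ids)
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if cols[2] == "match" and attrs.get("ID") in wanted:
            yield line
        elif cols[2] == "match_part":
            parents = attrs.get("Parent", "").split(",")
            if any(parent in wanted for parent in parents):
                yield line


if __name__ == "__main__":
    # Invented records; real MAKER output will have different sources and IDs.
    example = [
        "##gff-version 3",
        "scaffold1\tabinit\tmatch\t100\t900\t.\t+\t.\tID=pred1;Name=pred1",
        "scaffold1\tabinit\tmatch_part\t100\t400\t.\t+\t.\tID=pred1:e1;Parent=pred1",
        "scaffold1\tabinit\tmatch_part\t600\t900\t.\t+\t.\tID=pred1:e2;Parent=pred1",
        "scaffold1\tabinit\tmatch\t2000\t2500\t.\t-\t.\tID=pred2;Name=pred2",
        "scaffold1\tabinit\tmatch_part\t2000\t2500\t.\t-\t.\tID=pred2:e1;Parent=pred2",
    ]
    for kept in select_matches(example, {"pred1"}):
        print(kept)
```

The lines kept this way would be written to their own GFF3 file and supplied as pred_gff, with keep_preds=1 and the existing models supplied as model_gff, as described in the replies above.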
Felipe -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Thu May 29 04:21:24 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Thu, 29 May 2014 11:21:24 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: So I managed to install it on my workstation and it works fine! Thanks for the information on perlbrew. I will also give it a try. I did a test run on my workstation using just a few contigs and was wondering where the annotation is saved. Is it the gff files (one gff per contig) in the *.maker.output/ directory? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Thu May 29 09:58:22 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 29 May 2014 14:58:22 +0000 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hi Panos, The results are stored in the datastore directory in the "maker.output" directory. You can merge those results into one gff file with the gff3_merge accessory script. It's included in the bin directory.
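To see where the per-contig results sit before merging them, the datastore can be walked with a short Python sketch. It rests on an assumption that is not stated in this thread: that the master datastore index log in the maker.output directory is tab-separated with three fields per line (sequence name, directory relative to the output directory, and a status such as STARTED or FINISHED). Check your own file before relying on this. The function names are hypothetical, and gff3_merge remains the tool recommended above for producing the combined file.

```python
import glob
import os


def finished_dirs(index_log_text):
    """Map sequence name -> datastore directory for entries marked FINISHED.

    Assumes three tab-separated fields per line: name, directory
    (relative to the maker.output folder), and status.
    """
    finished = {}
    for line in index_log_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, directory, status = fields
        if status.strip() == "FINISHED":
            finished[name] = directory
    return finished


def per_contig_gffs(output_dir, index_log_text):
    """Return sorted paths of the .gff files found in each finished directory."""
    paths = []
    for directory in finished_dirs(index_log_text).values():
        paths.extend(glob.glob(os.path.join(output_dir, directory, "*.gff")))
    return sorted(paths)
```

Calling `per_contig_gffs` with the path of the maker.output directory and the text of its index log would list one GFF file per finished contig, provided the layout assumption holds.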
~Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 -------------- next part -------------- An HTML attachment was scrubbed... URL: From caigh02 at gmail.com Thu May 29 14:15:39 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Thu, 29 May 2014 15:15:39 -0400 Subject: [maker-devel] maker gene order in gff output Message-ID: Hi Carson, In the maker output, the genes have names like "genemark-scaffold17-processed-gene-0.0". Many users will probably eventually give the genes different names, such as GSGxxx (Genus Species Gene #). In the gff output, the scaffolds are not in order (either numerical order or the order of the input assembly). On the same scaffold, the genes are not listed in order either. This makes it a little harder for users to change the gene IDs. We may name the genes in order from scaffold 1 to scaffold N, and on each scaffold, order the genes from left to right (e.g. GSG00001, GSG00002). Do you think you can order the genes in the gff output? For example, order the scaffolds according to the input genome assembly, and on each scaffold, order the genes from 5' to 3'. Thanks. Guohong Rutgers University -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Thu May 29 15:37:24 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 16:37:24 -0400 Subject: [maker-devel] Question about 'keep_pred' setting Message-ID: Good afternoon! I have a quick question about the keep_pred setting in Maker. In older versions of Maker, this was a binary value indicating whether unsupported predictions should be kept.
I'm now using Maker 2.31.3, where it's described as a scaled value indicating a "concordance threshold" for unsupported predictions. As far as I can tell from the code, however, it's still treated in the same way as before. Could you briefly describe the motivation for this setting and the intended (although possibly incomplete) change in its functionality in new versions of Maker? Thanks! -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Thu May 29 15:44:28 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 29 May 2014 20:44:28 +0000 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: Message-ID: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Hi Daniel, Your interpretation of the code is correct. keep_preds is a binary setting. There's been some discussion behind-the-scenes about making it more flexible, but that hasn't been implemented yet. We need to fix what it says in the control file. Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Thu May 29 15:47:47 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 16:47:47 -0400 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: Thanks. Just curious: how would the intended behavior differ if keep_pred was set to, say, 0.5, instead of 0 or 1? -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 29 16:43:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 May 2014 15:43:35 -0600 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: There is a hidden score called abAED that measures concordance among the ab initio gene predictors. The idea was to have ab initio models that are the same across multiple ab initio predictors be kept if their group concordance is high enough, and to drop ab initio predictions that only occur in one ab initio predictor. Currently the option is all or nothing; the threshold would give more fine-grained control over keeping just some unsupported predictions. --Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Thu May 29 17:29:39 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 18:29:39 -0400 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: Ah, that makes sense. Thanks! -- Daniel S. Standage Ph.D.
Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 5:43 PM, Carson Holt wrote: > There is a hidden score called abAED that measures concordance among the > ab initio gene predictors . The idea was to have ab initio models that are > the same across multiple ab initio predictor be kept if they're group > concordance is high enough, then drop ab initio predictions that only > happen in one ab initio predictor. Currently the option is all or nothing, > the threshold would give a more fine grained control of keeping just some > unsupported predictions. > > --Carson > > > From: Daniel Standage > Date: Thursday, May 29, 2014 at 2:47 PM > To: Daniel Ence > Cc: Maker Mailing List > Subject: Re: [maker-devel] Question about 'keep_pred' setting > > Thanks. > > Just curious: how would the intended behavior differ if keep_pred was set > to, say, 0.5, instead of 0 or 1? > > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, May 29, 2014 at 4:44 PM, Daniel Ence > wrote: > >> Hi Daniel, >> >> Your interpretation of the code is correct. keep_preds is a binary >> setting. There's been some discussion behind-the-scenes about making it >> more flexible, but that hasn't been implemented yet. We need to fix what it >> says in the control file. >> >> >> Daniel Ence >> Graduate Student >> dence at genetics.utah.edu >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> On May 29, 2014, at 2:37 PM, Daniel Standage >> wrote: >> >> Good afternoon! >> >> I have a quick question about the keep_pred setting in Maker. In older >> versions of Maker, this was a binary value indicating whether unsupported >> predictions should be kept. I'm now using Maker 2.31.3, where it's >> described as a scaled value indicating a "concordance threshold" for >> unsupported predictions. 
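The thresholded behaviour Carson describes for keep_pred (keep an unsupported ab initio model only when other predictors agree with it closely enough) could look roughly like the sketch below. This is an illustration only: the thread does not say how abAED is computed, so a plain base-overlap (Jaccard) score is used as a stand-in, and the names covered_bases, concordance and keep_unsupported are hypothetical and not part of MAKER.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive exon coordinates


def covered_bases(exons: List[Interval]) -> Set[int]:
    """Expand exon intervals into the set of bases they cover."""
    bases: Set[int] = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def concordance(a: List[Interval], b: List[Interval]) -> float:
    """Jaccard overlap of two models (stand-in score, NOT MAKER's abAED).

    1.0 means identical base coverage, 0.0 means no shared bases.
    """
    bases_a, bases_b = covered_bases(a), covered_bases(b)
    union = bases_a | bases_b
    return len(bases_a & bases_b) / len(union) if union else 0.0


def keep_unsupported(models: Dict[str, List[Interval]], threshold: float) -> List[str]:
    """Return the predictors whose model agrees with at least one other
    predictor's model at or above the threshold."""
    kept = []
    for name, exons in models.items():
        best = max(
            (concordance(exons, other)
             for other_name, other in models.items() if other_name != name),
            default=0.0,
        )
        if best >= threshold:
            kept.append(name)
    return kept


# Made-up models at one locus: two predictors agree, one stands alone.
models = {
    "predictor_a": [(100, 200), (300, 400)],
    "predictor_b": [(100, 200), (300, 400)],
    "predictor_c": [(5000, 5100)],
}
print(keep_unsupported(models, threshold=0.5))  # ['predictor_a', 'predictor_b']
```

With a threshold of 0.5 the two identical models are kept and the model seen by a single predictor is dropped, which is the kind of intermediate outcome a purely binary setting cannot express.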
From carsonhh at gmail.com Thu May 29 22:11:11 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 29 May 2014 21:11:11 -0600
Subject: [maker-devel] maker gene order in gff output
In-Reply-To:
References:
Message-ID:

The maker_map_ids script that comes with MAKER can be used to generate new names of the style PREFIX###### or PREFIX_######. You can use the --sort_order flag to sort the contigs in whatever your preferred order is before generating the new names.

Then use map_gff_ids and map_fasta_ids to change the names in the gff3 and fasta files respectively.

Here is some extra information from a tutorial where the maker_map_ids script is used --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Post_Processing_of_Annotations

--Carson

From: Guohong Cai
Date: Thursday, May 29, 2014 at 1:15 PM
To: ""
Subject: [maker-devel] maker gene order in gff output

Hi Carson,

In the maker output, the genes have names like "genemark-scaffold17-processed-gene-0.0". Many users probably will eventually give the genes different names, such as GSGxxx (Genus Species Gene #).

In the gff output, the scaffolds are not in order (either numerical order or the order of the input assembly). On the same scaffold, the genes are not listed in order either. This will make it a little harder for users to change the gene IDs. We may name the genes in order from scaffold 1 to scaffold N, and on each scaffold, order the genes from left to right (e.g. GSG00001, GSG00002). Do you think you can order the genes in the gff output? For example, order the scaffolds according to the input genome assembly, and on each scaffold, order the genes from 5' to 3'.

Thanks.

Guohong
Rutgers University

From caigh02 at gmail.com Fri May 30 06:40:17 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Fri, 30 May 2014 06:40:17 -0500
Subject: [maker-devel] maker gene order in gff output
In-Reply-To:
References:
Message-ID:

Great.

Guohong
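The ordering asked for in this thread (scaffolds in assembly order, genes left to right, names like GSG00001) can be pictured with a short sketch. It is not the maker_map_ids implementation, and it does not use that script's options; build_id_map is a hypothetical helper, and apart from "genemark-scaffold17-processed-gene-0.0" the gene IDs and coordinates below are made up for illustration.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, str]  # (scaffold name, start coordinate, current gene ID)


def build_id_map(genes: List[Gene], scaffold_order: List[str],
                 prefix: str = "GSG", width: int = 5) -> Dict[str, str]:
    """Map old gene IDs to PREFIX + zero-padded counter.

    Genes are numbered scaffold by scaffold (in the order given by
    scaffold_order) and, within a scaffold, by increasing start coordinate.
    """
    rank = {name: i for i, name in enumerate(scaffold_order)}
    ordered = sorted(genes, key=lambda g: (rank[g[0]], g[1]))
    return {
        old_id: f"{prefix}{n:0{width}d}"
        for n, (_scaffold, _start, old_id) in enumerate(ordered, start=1)
    }


genes = [
    ("scaffold17", 9000, "genemark-scaffold17-processed-gene-0.0"),
    ("scaffold2", 500, "genemark-scaffold2-processed-gene-0.0"),
    ("scaffold17", 100, "genemark-scaffold17-processed-gene-0.1"),
]
id_map = build_id_map(genes, scaffold_order=["scaffold2", "scaffold17"])
for old_id, new_id in id_map.items():
    print(old_id, new_id)
```

A two-column old/new table of this kind is the sort of mapping that is then applied to the GFF3 and FASTA files in a separate step, which keeps the renaming reproducible.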
From daniel.standage at gmail.com Sat May 31 10:23:23 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Sat, 31 May 2014 11:23:23 -0400
Subject: [maker-devel] Precomputed alignments
Message-ID:

Hello again!

About a year ago I asked about using precomputed alignments with Maker. The thread quickly took a different direction as we tried to track down other issues, and I never got the thread back on its original track.

So, to return to the original question, what exactly is required when providing pre-computed alignments in GFF3 format? For example, does it affect Maker's behavior whether a score is given? The "Target" attribute? The "Gap" attribute?

Thanks!

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From sjackman at gmail.com Fri May 2 13:40:42 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 2 May 2014 12:40:42 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson. Do you happen to have a patch that I could test out that fixes the naming of the tRNA identified by tRNAscan?

Is the MAKER subversion repository public, and if so, what's its URL?

Cheers,
Shaun

Shaun wrote:

> The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are conventionally named according to the amino acid and anticodon, such as trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

On 6 March 2014 12:58, Carson Holt wrote:

> Yes. I'll fix the naming.
>
> Thanks,
> Carson

From carsonhh at gmail.com Fri May 2 13:50:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 02 May 2014 13:50:23 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

That should already be fixed in the current 2.31.3 download. I'll also send you the subversion credentials in a separate e-mail.

Thanks,
Carson

From sjackman at gmail.com Fri May 2 14:00:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 2 May 2014 13:00:22 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Fantastic. Thanks, Carson. I didn't realize that there was a point release of MAKER. It's not announced on the MAKER home page, which still reports Last Software Update v2.31 (Feb 11, 2014). Where are point releases announced?

The static link for MAKER 2.31 reports 403 Forbidden. Is there a new static link for MAKER 2.31.3?

Cheers,
Shaun

From carsonhh at gmail.com Fri May 2 14:14:11 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 02 May 2014 14:14:11 -0600
Subject: [maker-devel] Mapping gene names
Message-ID:

I need to fix that last update tag. I did a point release, because there were a couple of very minor fixes that didn't justify a full release (tRNA naming and a fasta_merge bug for tRNAs - I think three lines total of code). There won't be another major version release for a while because we're working on MAKER-EVM, which will be version 3.0 (joint project for full MAKER integration with EVM). So just point releases on 2.31 (which will be the very last version of MAKER2).

I'll fix the static link and add a new one for 2.31.3.

--Carson

From cynsb1987 at gmail.com Sun May 4 19:58:33 2014
From: cynsb1987 at gmail.com (hueytyng)
Date: Mon, 5 May 2014 11:58:33 +1000
Subject: [maker-devel] Non-unique top level ID
Message-ID:

Hi Carson,

I ran MAKER using RNAseq as evidence (tophat+cufflinks). The gff file is provided to Maker under "est_gff". Maker runs fine but there are a few failed contigs, and these error messages in my log:

ERROR: Non-unique top level ID for 1:JUNC00010801:0
While this is technically legal in GFF3, it usually indicates a poorly fomatted GFF3 file (perhaps you tried to merge two GFF3 files without accounting for unique IDs). MAKER will not handle these correctly.
--> rank=2, hostname=safs-raijen
ERROR: Failed while prepare section files
ERROR: Chunk failed at level:12, tier_type:3
FAILED CONTIG:scaffold11129|size28423

I do see multiple IDs in my gff. I have 9 RNAseq samples; is the way I merged them causing the error? This is what I've done to prepare the gff:

1. merge cuffmerge output
   cuffmerge -o -p 4 assembly_list.txt
   cufflinks2gff3 merged.gtf > merged.gff
2. merge junctions
   find -name "junctions.bed" -exec cat {} \; >> all_junctions.bed
   tophat2gff3 all_junctions.bed > all_junctions.gff
3. combine cuffmerge and junctions
   gff3_merge -o tophatandcufflinks.gff merged.gff all_junctions.gff
4. provide in opts file
   est_gff=tophatandcufflinks.gff #EST evidence from an external gff3 file

Thank you
Jenny

From carsonhh at gmail.com Mon May 5 08:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 08:18:18 -0600
Subject: [maker-devel] Non-unique top level ID
In-Reply-To:
References:
Message-ID:

If you use gff3_merge with the -l flag, then it will check for non-unique IDs and give new IDs to make them unique. Also, in general it is better just to use the cufflinks results and exclude the tophat results, as they tend to be very noisy and decrease the quality of the final models overall.

Thanks,
Carson

From online at davemessina.com Mon May 5 10:48:30 2014
From: online at davemessina.com (Dave Messina)
Date: Mon, 5 May 2014 11:48:30 -0500
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
Message-ID:

Hi,

Even with the sample data, I'm getting a "Sequence contains no data" error from blastx during the RepeatRunner phase.

I've uploaded a tarball with my run on the dpp sample data to the MAKER File Upload site (filename maker_test.tgz).

Could you please take a look and give me your thoughts?

Thanks!
Dave

From carsonhh at gmail.com Mon May 5 10:53:09 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 10:53:09 -0600
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

Use BLAST+ version 2.2.28. Also make sure you are not using an old version of MAKER (2.31.3 is current).

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/

--Carson
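Because the advice in this thread depends on the exact BLAST+ release, a small check along these lines could confirm what a given blastx binary reports. It is a sketch under assumptions: parse_blast_version and installed_blast_version are hypothetical helper names, and the sample text is the `blastx -version` output reported later in this thread.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_blast_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from `blastx -version` output.

    Returns None when no "blastx: X.Y.Z" line is found.
    """
    match = re.search(r"blastx:\s*(\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_blast_version(blastx_path: str) -> Optional[Tuple[int, ...]]:
    """Run the given blastx binary with -version and parse what it prints."""
    result = subprocess.run([blastx_path, "-version"],
                            capture_output=True, text=True, check=True)
    return parse_blast_version(result.stdout)


# Output as reported in this thread for the poster's installation.
sample = "blastx: 2.2.28+\nPackage: blast 2.2.28, build Mar 12 2013 16:52:31"
print(parse_blast_version(sample) == (2, 2, 28))  # True
```

Parsing is kept separate from running the binary so the comparison can be tried on pasted output first; installed_blast_version needs a real blastx path and is not exercised here.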
From online at davemessina.com Mon May 5 12:05:54 2014
From: online at davemessina.com (Dave Messina)
Date: Mon, 5 May 2014 13:05:54 -0500
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

Thanks for your quick reply, Carson.

I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue.

I've uploaded a new tarball containing the latest (failed) output on the dpp sample data. Any thoughts you have on how to resolve this would be great.

Thanks!
Dave

From carsonhh at gmail.com Mon May 5 13:32:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 13:32:01 -0600
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

I can't reproduce your issue, so it is probably something about your system or environment.

1. Is your /tmp directory full (or whatever your $TMPDIR environment variable is set to)? Use 'df -h /tmp' to check.
2. Are you running in a directory on an NFS drive? Is it true NFS or is it something like FUSE?
3. Is your current working directory full?
4. Are you setting TMP= in the control files to either an NFS-mounted location or an in-memory mounted location? Same issue if you are setting the system's TMPDIR environment variable to one of these.
5. Is your default /tmp directory in fact locally mounted (some clusters set this to in-memory scratch)?
6. Even though you already checked, humor me and run this exact command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

--Carson
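The first and fourth checks in Carson's list (is the temporary directory full, and where does it point) can be gathered in one place with the Python standard library. This is only a sketch: tmp_report is a hypothetical helper, the 1 GiB default threshold is an arbitrary assumption, and it does not detect whether the location is NFS, FUSE or in-memory scratch, for which the mount information from df is still needed.

```python
import os
import shutil
import tempfile


def tmp_report(min_free_bytes: int = 1 << 30) -> dict:
    """Summarise the temporary directory this process would use.

    tempfile.gettempdir() honours $TMPDIR (then $TEMP and $TMP) before
    falling back to platform defaults such as /tmp.
    """
    tmp_dir = tempfile.gettempdir()
    usage = shutil.disk_usage(tmp_dir)
    return {
        "tmp_dir": tmp_dir,
        "tmpdir_env": os.environ.get("TMPDIR"),  # None when unset
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "low_space": usage.free < min_free_bytes,  # assumed threshold
    }


report = tmp_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running it in the same shell, and on the same node, that launches MAKER matters, since TMPDIR and the mounted filesystems can differ between login and compute nodes.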
From carsonhh at gmail.com Mon May 5 13:44:11 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 13:44:11 -0600
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

Could you give me the full output of this command --> df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A

I'm really mostly interested in the mount information. Some non-traditional network storage implementations can induce odd behaviors (for example by not supporting operations like hard links, etc.).

--Carson

From online at davemessina.com Mon May 5 13:53:58 2014
From: online at davemessina.com (Dave Messina)
Date: Mon, 5 May 2014 14:53:58 -0500
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

Hi Carson,

On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote:

> df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A

Filesystem            Type  Size  Used  Avail  Use%  Mounted on
10.0.1.128:/projects  nfs   13T   9.6T  3.1T   76%   /Volumes/Qnap

That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk:

Filesystem                 Type  Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_d5-lv_root  ext4  50G   8.9G  38G    19%   /

> 1. Is your /tmp directory full (or whatever your $TMPDIR environment variable is set to)? Use 'df -h /tmp' to check.

$ df -h /tmp
Filesystem                 Type  Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_d5-lv_root  ext4  50G   8.9G  38G    19%   /

> 2. Are you running in a directory on an NFS drive? Is it true NFS or is it something like FUSE?

Same error on true NFS or on local disk.

> 3. Is your current working directory full?

No.

> 4. Are you setting TMP= in the control files to either an NFS-mounted location or an in-memory mounted location? Same issue if you are setting the system's TMPDIR environment variable to one of these.

I tried setting it to /tmp just to be sure (no difference).

> 5. Is your default /tmp directory in fact locally mounted (some clusters set this to in-memory scratch)?

Yes.

> 6. Even though you already checked, humor me and run this exact command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

$ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version
blastx: 2.2.28+
Package: blast 2.2.28, build Mar 12 2013 16:52:31

Thanks so much for your help.

Best,
Dave

From carsonhh at gmail.com Mon May 5 14:00:57 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 14:00:57 -0600
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

This is one of those things where I would need access to your system, since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. Otherwise it's really just a shot in the dark.

--Carson

From: Dave Messina
Date: Monday, May 5, 2014 at 1:53 PM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue

Hi Carson,

On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote:

> df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A

Filesystem            Type  Size  Used  Avail  Use%  Mounted on
10.0.1.128:/projects  nfs   13T   9.6T  3.1T   76%   /Volumes/Qnap

That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk:

Filesystem                 Type  Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_d5-lv_root  ext4  50G   8.9G  38G    19%   /

> 1.
Is you /tmp directory full (or whatever you have $TMPDIR environmental > variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Are you running in a directory on an NFS drive? Is it true NFS or is it > something like FUSE. Same error on true NFS or on local disk. > 1. Is your current working directory full. No. > 1. Are you setting TMP= in the control files to either an NFS mounted location > or an in memory mounted location. Same issue if you are setting the system's > TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). > 1. Is your default /tmp directory in fact locally mounted (some clusters set > this to in memory scratch). Yes. > 1. Even though you already checked, humor me and run this exact command --> > /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version $ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version blastx: 2.2.28+ Package: blast 2.2.28, build Mar 12 2013 16:52:31 Thanks so much for your help. Best, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 16:34:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 16:34:14 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: After logging in I found the issue. You have a broken BioPerl build. Specifically Bio::DB::Fasta. Quite some time ago, there was a download direct from the BioPerl website that was broken and I think you may have that broken version. Just update to the current CPAN version. I was able to run fine when I forced MAKER to use a path I made for the the newer version of BioPerl. You can delete my credentials now. 
Thanks, Carson From: Carson Holt Date: Monday, May 5, 2014 at 2:00 PM To: Dave Messina Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Is you /tmp directory full (or whatever you have $TMPDIR environmental > variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Are you running in a directory on an NFS drive? Is it true NFS or is it > something like FUSE. Same error on true NFS or on local disk. > 1. Is your current working directory full. No. > 1. Are you setting TMP= in the control files to either an NFS mounted location > or an in memory mounted location. Same issue if you are setting the system's > TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). > 1. Is your default /tmp directory in fact locally mounted (some clusters set > this to in memory scratch). Yes. > 1. 
Even though you already checked, humor me and run this exact command --> > /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version $ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version blastx: 2.2.28+ Package: blast 2.2.28, build Mar 12 2013 16:52:31 Thanks so much for your help. Best, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Mon May 5 18:09:41 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Mon, 5 May 2014 17:09:41 -0700 Subject: [maker-devel] Fewer genes in MAKER 2.31.3 Message-ID: Hi, Carson. I?m annotating a 6 Mbp plant mitochondrial genome using GenBank coding nucleotide and protein sequences from related species. I?m seeing 50 genes annotated using MAKER 2.31, and 37 genes annotated using MAKER 2.31.3. The missing genes look good based on the evidence. I see protein_match evidence in the 2.31.3 GFF file, but no resulting gene and mRNA. Is there a ChangeLog indicating the changes from 2.31 to 2.31.3? Do you know of a change that might cause this? What information can I give you that would help debug this? My maker_opts.ctl file follows. Cheers, Shaun #-----Genome (these are always required) genome=pg29mt-concat.fa #genome sequence (fasta file or fasta embeded in GFF3 file) organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic #-----EST Evidence (for best results provide a file for at least one) est=cds_na.fa #set of ESTs or assembled mRNA-seq in fasta format #-----Protein Homology Evidence (for best results provide a file for at least one) protein=cds_aa.fa #protein sequence file in fasta format (i.e. 
from multiple organisms)

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=picea #select a model organism for RepBase masking in RepeatMasker
rmlib=rmlib.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/usr/local/opt/maker/libexec/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner

#-----Gene Prediction
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no

#-----External Application Behavior Options
cpus=4 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
est_forward=1 #map names and attributes forward from EST evidence, 1 = yes, 0 = no
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no

From myandell at genetics.utah.edu Mon May 5 23:06:25 2014
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 6 May 2014 05:06:25 +0000
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365FB90A5@mxb2.hg.genetics.utah.edu>

you are the Man, Carson.

--mark

From kdelmore at zoology.ubc.ca Mon May 5 22:36:41 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Mon, 5 May 2014 21:36:41 -0700
Subject: [maker-devel] iprscan and ipr_update_gff
Message-ID: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>

Hi, I have a question about the interproscan scripts available with maker.

I'm following the recommendations posted by Carson in Aug 2011 to incorporate results from iprscan. I'm getting quite a few warning messages with ipr_update_gff; they're all the same and suggest that there's no value for $name. When I look through the updated gff, however, the dbxrefs have been added. Is this something I should be worried about?

I'm using iprscan version 5 and actually get some warning messages there as well, but again, the output looks alright. In addition, some of my fastas don't get these warnings in iprscan and they still give me the error with ipr_update_gff, so I don't think that's the problem. I'm using proteins from UniProt. My commands and errors are below. I've also attached the first 20000 lines from my initial gff and raw file from iprscan.

Thanks, I really appreciate your continued support.
Kira

###

commands for interproscan scripts available in maker
iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff
gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff
ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace

error after last step (just an example, a ton of similar lines):
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776.

###

commands for interproscan 5
interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \
> interpro_6.out 2>&1
interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml

error after first step:
04/05/2014 19:22:09:269 25% completed
04/05/2014 21:27:36:305 50% completed
04/05/2014 21:32:34:236 75% completed
04/05/2014 21:38:01:379 90% completed
2014-05-04 21:50:22,761 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:248] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959/jobPIRSF-2.84
2014-05-04 21:50:22,908 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:253] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959
04/05/2014 21:50:23:380 100% done: InterProScan analyses completed

error after second step:
interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0
05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode...
2014-05-05 21:04:00,603 [uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run completion, unable to delete temporary directory /home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_210353293_gsjh

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6.maker.proteins.fasta.xml.raw
Type: application/octet-stream
Size: 1098374 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6_first20000.gff
Type: application/octet-stream
Size: 2880872 bytes
Desc: not available

From carsonhh at gmail.com Tue May 6 08:31:55 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 08:31:55 -0600
Subject: [maker-devel] Fewer genes in MAKER 2.31.3
In-Reply-To:
References:
Message-ID:

Nothing in the scoring or gene selection has changed. Changes are:

Fix trnascan naming so codon is included in name.
Fix fgenesh parsing when used with correct_est_fusion.
Fix final ID bug when '/' character used in GFF3 input ID.
Fix a start codon issue that could come up when the right set of parameters was used (primarily correct_est_fusion and protein2genome).

If you can provide both gff3 outputs for comparison, I could probably tell you why. Set up both runs to make sure that the settings are indeed identical.

--Carson

From carsonhh at gmail.com Tue May 6 08:57:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 08:57:04 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
Message-ID:

You have entries in your interproscan output that aren't in your GFF3. Is your GFF3 file truncated?
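
A minimal sketch of how one might check for that mismatch, offered as an illustration rather than anything shipped with MAKER. It assumes the raw InterProScan file is tab-separated with the protein ID in the first column, and that those IDs are expected to match the ID= or Name= attribute of mRNA features in the GFF3. The function names are hypothetical.

```python
from urllib.parse import unquote


def raw_protein_ids(lines):
    """Collect protein IDs (assumed: first tab-separated column) from raw output."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            ids.add(line.split("\t", 1)[0])
    return ids


def gff3_mrna_names(lines):
    """Collect ID= and Name= values of mRNA features, stopping at ##FASTA."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section: no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for field in cols[8].split(";"):
            key, sep, value = field.partition("=")
            if sep and key in ("ID", "Name"):
                names.add(unquote(value))  # GFF3 attributes are URL-escaped
    return names


def missing_from_gff3(raw_lines, gff_lines):
    """Return sorted protein IDs present in the raw file but absent from the GFF3."""
    return sorted(raw_protein_ids(raw_lines) - gff3_mrna_names(gff_lines))


def report(raw_path, gff_path):
    """Print one missing ID per line for the two given files."""
    with open(raw_path) as raw, open(gff_path) as gff:
        for protein_id in missing_from_gff3(raw, gff):
            print(protein_id)
```

Calling report("6.maker.proteins.fasta.xml.raw", "6.gff") would list any protein IDs that have InterProScan hits but no matching mRNA in the GFF3; an empty result would suggest the two files are in sync under the assumptions above.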
--Carson On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" wrote: >Hi, I have a question about the interproscan scripts available with maker. > >I'm following the recommendations posted by Carson in Aug 2011 to >incorporate results from iprscan. I'm getting quite a few warning messages >with ipr_update_gff; they're all the same and suggest that there's no >value for $name. When I look through the updated gff, however, the dbxrefs >have been added. Is this something I should be worried about? > >I'm using iprscan version 5 and actually get some warning messages there >as well but again, the output looks alright. In addition, some of my >fastas don't get these warnings in iprscan and they still give me the >error with ipr_update_gff so I don't think that's the problem. I'm using >proteins from UniProt. My commands and errors are below. I've also >attached the first 20000 lines from my initial gff and raw file from >iprscan. > >Thanks, I really appreciate your continued support. >Kira > >### > >commands for interproscan scripts available in maker >iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace > >error after last step (just an example, a ton of similar lines): >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776. 
> > >### > >commands for interproscan 5 >interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >interpro_6.out 2>&1 >interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml > >error after first step: >04/05/2014 19:22:09:269 25% completed >04/05/2014 21:27:36:305 50% completed >04/05/2014 21:32:34:236 75% completed >04/05/2014 21:38:01:379 90% completed >2014-05-04 21:50:22,761 >[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >248] >WARN - At run completion, unable to delete temporary directory >/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >7921_l959/jobPIRSF-2.84 >2014-05-04 21:50:22,908 >[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >253] >WARN - At run completion, unable to delete temporary directory >/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >7921_l959 >04/05/2014 21:50:23:380 100% done: InterProScan analyses completed > >error after second step: >interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>2014-05-05 21:04:00,603 >[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >completion, unable to delete temporary directory >/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_2 >10353293_gsjh_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From kdelmore at zoology.ubc.ca Tue May 6 09:06:56 2014 From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca) Date: Tue, 6 May 2014 08:06:56 -0700 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> Message-ID: <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> Thanks for your reply. I have not truncated the gff3. I'm using files from the datastore that were written at the same time so I'm not sure how that would happen. I split my multifasta before running it through maker and have not merged the gff or protein.fasta for iprscan. That wouldn't be the problem would it? > You have entries in your interproscan output that aren't in your GFF3. Is > your GFF3 file truncated? > > --Carson > > > On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" > wrote: > >>Hi, I have a question about the interproscan scripts available with >> maker. >> >>I'm following the recommendations posted by Carson in Aug 2011 to >>incorporate results from iprscan. I'm getting quite a few warning >> messages >>with ipr_update_gff; they're all the same and suggest that there's no >>value for $name. When I look through the updated gff, however, the >> dbxrefs >>have been added. Is this something I should be worried about? >> >>I'm using iprscan version 5 and actually get some warning messages there >>as well but again, the output looks alright. 
In addition, some of my >>fastas don't get these warnings in iprscan and they still give me the >>error with ipr_update_gff so I don't think that's the problem. I'm using >>proteins from UniProt. My commands and errors are below. I've also >>attached the first 20000 lines from my initial gff and raw file from >>iprscan. >> >>Thanks, I really appreciate your continued support. >>Kira >> >>### >> >>commands for interproscan scripts available in maker >>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >> -inplace >> >>error after last step (just an example, a ton of similar lines): >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776. 
>> >> >>### >> >>commands for interproscan 5 >>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >>interpro_6.out 2>&1 >>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >> >>error after first step: >>04/05/2014 19:22:09:269 25% completed >>04/05/2014 21:27:36:305 50% completed >>04/05/2014 21:32:34:236 75% completed >>04/05/2014 21:38:01:379 90% completed >>2014-05-04 21:50:22,761 >>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >>248] >>WARN - At run completion, unable to delete temporary directory >>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >>7921_l959/jobPIRSF-2.84 >>2014-05-04 21:50:22,908 >>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >>253] >>WARN - At run completion, unable to delete temporary directory >>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >>7921_l959 >>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed >> >>error after second step: >>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>>2014-05-05 21:04:00,603 >>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >>completion, unable to delete temporary directory >>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_2 >>10353293_gsjh_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > From carsonhh at gmail.com Tue May 6 09:09:13 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 06 May 2014 09:09:13 -0600 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> Message-ID: The file you sent was missing the ##FASTA entry and all sequence at the bottom for example. Is that the way it is in the datastore? --Carson On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" wrote: >Thanks for your reply. I have not truncated the gff3. I'm using files from >the datastore that were written at the same time so I'm not sure how that >would happen. I split my multifasta before running it through maker and >have not merged the gff or protein.fasta for iprscan. That wouldn't be the >problem would it? > >> You have entries in your interproscan output that aren't in your GFF3. >>Is >> your GFF3 file truncated? >> >> --Carson >> >> >> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" >> wrote: >> >>>Hi, I have a question about the interproscan scripts available with >>> maker. >>> >>>I'm following the recommendations posted by Carson in Aug 2011 to >>>incorporate results from iprscan. I'm getting quite a few warning >>> messages >>>with ipr_update_gff; they're all the same and suggest that there's no >>>value for $name. When I look through the updated gff, however, the >>> dbxrefs >>>have been added. 
Is this something I should be worried about? >>> >>>I'm using iprscan version 5 and actually get some warning messages there >>>as well but again, the output looks alright. In addition, some of my >>>fastas don't get these warnings in iprscan and they still give me the >>>error with ipr_update_gff so I don't think that's the problem. I'm using >>>proteins from UniProt. My commands and errors are below. I've also >>>attached the first 20000 lines from my initial gff and raw file from >>>iprscan. >>> >>>Thanks, I really appreciate your continued support. >>>Kira >>> >>>### >>> >>>commands for interproscan scripts available in maker >>>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >>> -inplace >>> >>>error after last step (just an example, a ton of similar lines): >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15242. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15353. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15674. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15776. 
>>> >>> >>>### >>> >>>commands for interproscan 5 >>>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >>>interpro_6.out 2>&1 >>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>> >>>error after first step: >>>04/05/2014 19:22:09:269 25% completed >>>04/05/2014 21:27:36:305 50% completed >>>04/05/2014 21:32:34:236 75% completed >>>04/05/2014 21:38:01:379 90% completed >>>2014-05-04 21:50:22,761 >>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>p: >>>248] >>>WARN - At run completion, unable to delete temporary directory >>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>83 >>>7921_l959/jobPIRSF-2.84 >>>2014-05-04 21:50:22,908 >>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>p: >>>253] >>>WARN - At run completion, unable to delete temporary directory >>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>83 >>>7921_l959 >>>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed >>> >>>error after second step: >>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >>>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>>>2014-05-05 21:04:00,603
>>>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run
>>>completion, unable to delete temporary directory
>>>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505
>>>_2
>>>10353293_gsjh

From kdelmore at zoology.ubc.ca Tue May 6 09:26:07 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Tue, 6 May 2014 08:26:07 -0700
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To:
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca>
Message-ID: <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>

I sent only the first 20000 lines of the gff because the full file was
too large to send through email. I've included a dropbox link to the
full file below. I've also included a link to the final gff with
dbxrefs; as I mentioned, it does seem to add them even with the error.
If I run ipr_update_gff twice, I get the warnings on the first run but
not on the second. Does that help diagnose the problem?

The only other red flag I've encountered with maker was in including
external gff3 from geneid and sgp2. These gff3s failed validation at the
website suggested in the README file, with the warning message "cds:
non-unique id" for all cds, but maker didn't give me a warning and they
seem to be incorporated into the annotation fine.

original gff
https://www.dropbox.com/s/nimoh605jdk9myx/6.gff

final gff
https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta

Thanks again for getting back to me.

> The file you sent was missing the ##FASTA entry and all sequence at the
> bottom for example. Is that the way it is in the datastore?
>
> --Carson

From carsonhh at gmail.com Tue May 6 09:47:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 09:47:23 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Ok. With the full file I can see what was causing the message. It is a
parsing bug that was happening in a few cases, and I've now fixed it.
But you can ignore it, because it has no effect on the output.

It would only be an issue if the ID= and Name= tags were different in
the GFF3 for the gene feature lines (which is never true for MAKER's
output). It was correctly parsing the 'mRNA' Name and ID tags, but was
sometimes having issues with the Name= tags for the 'gene' lines (but
because they are redundant with the ID= tag, the script still finds what
it needs to add the Dbxref= tags).

--Carson

On 5/6/14, 9:26 AM, "kdelmore at zoology.ubc.ca" wrote:

>I just printed the first 20000 lines of the gff to send to you because it
>was too large to send through email.
>I've included a dropbox link to the
>full file below. I've also included a link to the final gff with dbx refs;
>as I mentioned, it does seem to add them even with the error. If I run
>ipr_update_gff twice, I get the warnings on the first run but not on the
>second. Does that help diagnose the problem?
>
>The only other red flag I've encountered with maker was in including
>external gff3 from geneid and sgp2. These gff3s failed validation at the
>website suggested in the README file, with the warning message "cds:
>non-unique id" for all cds, but maker didn't give me a warning and they
>seem to be incorporated into the annotation fine.
>
>original gff
>https://www.dropbox.com/s/nimoh605jdk9myx/6.gff
>
>final gff
>https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta
>
>Thanks again for getting back to me.

From carsonhh at gmail.com Tue May 6 09:54:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 09:54:41 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To:
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Actually looking a little closer, it wouldn't even matter if the ID= and
Name= tags were different for the 'gene', because interproscan gives the
results for the transcripts (mRNA) and not the gene. So Dbxref still
gets populated correctly regardless.

--Carson

On 5/6/14, 9:47 AM, "Carson Holt" wrote:

>Ok. With the full file I can see what was causing the message. It is
>a parsing bug that was happening in a few cases, and I've now fixed it.
>But you can ignore it, because it has no effect on the output.
>
>It would only be an issue if the ID= and Name= tags were different in the
>GFF3 for the gene feature lines (which is never true for MAKER's
>output). It was correctly parsing the 'mRNA' Name and ID tags, but was
>sometimes having issue with the Name= tags for the 'gene' lines (but
>because they are redundant with ID= tag, the script still finds what it
>needs to add the Dbxref= tags).
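
For readers following the thread, the fallback Carson describes (a missing Name= on a 'gene' line is harmless because ID= carries the same value) can be sketched in a few lines of Python. This is an illustration only and not the code of ipr_update_gff, which is a Perl script. The function names parse_attributes and feature_name, and the example feature line, are hypothetical.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column (tag=value;tag=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" not in field:
            continue  # skip empty or malformed fields
        tag, value = field.split("=", 1)
        attrs[tag.strip()] = unquote(value.strip())
    return attrs


def feature_name(gff3_line):
    """Return Name= for a feature line, falling back to ID= (hypothetical helper)."""
    columns = gff3_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return None  # comment, directive, or FASTA line
    attrs = parse_attributes(columns[8])
    # Using ID= when Name= is absent keeps the lookup key defined.
    return attrs.get("Name") or attrs.get("ID")


# Made-up 'gene' line that carries ID= but no Name=.
gene_line = "chr6\tmaker\tgene\t100\t900\t.\t+\t.\tID=maker-chr6-gene-0.1"
print(feature_name(gene_line))
```

With a fallback of this kind, a key is always available for attaching Dbxref= entries, which matches the observation in the thread that the output was correct despite the warnings.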
>
>--Carson

From sjackman at gmail.com Thu May 8 16:26:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 8 May 2014 15:26:34 -0700
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

Hi, Carson. Could you give an example of how to add gene_id= to the
header of the FASTA file? I'm not clear on what you mean by this. In the
FASTA header, what portion is the transcript name, and what portion is
the gene name?

Cheers,
Shaun

http://sjackman.ca

On 2 May 2014 11:55, Carson Holt wrote:

> Whichever has the best AED score I believe, but you can add gene_id= to
> the header of each fasta file to ensure MAKER doesn't try and cluster
> unrelated transcripts into a single gene. Then the transcript name and
> gene name will be guaranteed to match up.
>
> --Carson

From carsonhh at gmail.com Thu May 8 16:33:36 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 08 May 2014 16:33:36 -0600
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

When moving transcripts onto a new assembly, you may have multiple
transcripts of the same gene. Because your transcript name should be
your fasta ID, there is no way for MAKER to know that they go together
when moving the models forward, so you can use the gene= option to make
MAKER aware that these belong to the same gene. They will be grouped and
you recover all splice forms as a group.

Example:

>SMEDT_00004 gene=dpp
AAAAAAA

>SMEDT_00005 gene=dpp
AAAAAAA

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Thursday, May 8, 2014 at 4:26 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] est_forward and conflicting names

Hi, Carson. Could you give an example of how to add gene_id= to the
header of the FASTA file? I'm not clear on what you mean by this. In the
FASTA header, what portion is the transcript name, and what portion is
the gene name?

Cheers,
Shaun

http://sjackman.ca

On 2 May 2014 11:55, Carson Holt wrote:

> Whichever has the best AED score I believe, but you can add gene_id= to the
> header of each fasta file to ensure MAKER doesn't try and cluster unrelated
> transcripts into a single gene. Then the transcript name and gene name will
> be guaranteed to match up.
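
As a rough sketch of how the gene= tag from Carson's example might be added in bulk, the Python below appends gene=<name> to each FASTA header whose sequence ID appears in a lookup table. Only the header layout (sequence ID, a space, then gene=) comes from the example in the thread. The helper name tag_headers and the ID-to-gene mapping are assumptions made for illustration, and the script is not part of MAKER.

```python
def tag_headers(fasta_lines, gene_for):
    """Yield FASTA lines with ' gene=<name>' appended to headers that lack it.

    gene_for maps a sequence ID (first word of the header) to a gene name.
    It is a hypothetical input, for example built from a GenBank feature table.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            seq_id = words[0] if words else ""
            already_tagged = any(w.startswith("gene=") for w in words[1:])
            if seq_id in gene_for and not already_tagged:
                line = f"{line} gene={gene_for[seq_id]}"
        yield line


# The two records from the example in the thread, before tagging.
records = [">SMEDT_00004", "AAAAAAA", ">SMEDT_00005", "AAAAAAA"]
genes = {"SMEDT_00004": "dpp", "SMEDT_00005": "dpp"}
for out in tag_headers(records, genes):
    print(out)
```

Headers that already carry a gene= token, and sequences missing from the table, pass through unchanged, so the function can be rerun on its own output.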
>
> --Carson

From sjackman at gmail.com Thu May 8 16:41:41 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 8 May 2014 15:41:41 -0700
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

Interesting. Thanks for the clarification. I'm working on a plant
mitochondrion, and so as far as I know, there's no alternative splicing.
My protein FASTA file is composed of the protein sequences of ~100
species downloaded from GenBank. It looks like this:

>cox1|lcl|KJ461445.1_cdsid_AHY20320.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=AHY20320.1] [location=complement(59212..60795)]
...
>cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1] [protein_id=ACA62629.1] [location=245282..246856]
...
>cox1|lcl|NC_023103.1_cdsid_YP_008964124.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_008964124.1] [location=join(317824..318438,319511..320368)]
...

I'm not sure that I actually want the fancy behaviour that you describe,
though it probably wouldn't hurt anything. Will this FASTA format
trigger the fancy behaviour?

Cheers,
Shaun

http://sjackman.ca

On 8 May 2014 15:33, Carson Holt wrote:

> When moving transcripts onto a new assembly, you may have multiple
> transcripts of the same gene. Because your transcript name should be your
> fasta ID there is no way for MAKER to know that they go together when
> moving the models forward, so you can use the gene= option to make MAKER
> aware that these belong to the same genes. They will be grouped and you
> recover all splice forms as a group.
>
> Example:
>
> >SMEDT_00004 gene=dpp
> AAAAAAA
>
> >SMEDT_00005 gene=dpp
> AAAAAAA
>
> --Carson

From carsonhh at gmail.com Thu May 8 16:43:40 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Thu, 08 May 2014 16:43:40 -0600
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

Only if you were to remove the brackets around gene=.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Thursday, May 8, 2014 at 4:41 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] est_forward and conflicting names

Interesting. Thanks for the clarification. I'm working on a plant
mitochondrion, and so as far as I know, there's no alternative splicing.
My protein FASTA file is composed of the protein sequences of ~100
species downloaded from GenBank. It looks like this:

>cox1|lcl|KJ461445.1_cdsid_AHY20320.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=AHY20320.1] [location=complement(59212..60795)]
...
>cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1] [protein_id=ACA62629.1] [location=245282..246856]
...
>cox1|lcl|NC_023103.1_cdsid_YP_008964124.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_008964124.1] [location=join(317824..318438,319511..320368)]
...

I'm not sure that I actually want the fancy behaviour that you describe,
though it probably wouldn't hurt anything. Will this FASTA format
trigger the fancy behaviour?

Cheers,
Shaun

http://sjackman.ca

From sjackman at gmail.com Wed May 14 15:07:52 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 14:07:52 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.

Cheers,
Shaun

On 4 March 2014 18:33, Carson Holt wrote:

> Trying to call non-coding RNA from ESTs or even sequence homology is
> extremely messy (non-trivial problem in most organisms with high false
> positive rate), so MAKER for the most part doesn't even try to do that. It
> focuses only on the coding genes. You can now use tRNAscan and snoscan in
> the newest version for some non-coding RNA support (those features were
> only added a couple of months ago). So just like other prediction tools
> (snap, augustus etc.), the primary focus has always been the coding genes.
> We've only started adding non-coding RNA support recently for iPlant, so
> it's still relatively immature.
>
> Thanks,
> Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Tuesday, March 4, 2014 at 7:10 PM
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for the tip.
>
> The rRNA genes that are found with est2genome have the feature type set to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and *three_prime_UTR* features. Ideally the feature type would be set to *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features. Is that a feature that you would be interested in adding to MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names with "trn", as is standard, so determining the appropriate type should be straightforward.
>
> Thanks again for your help with this. Cheers,
> Shaun
>
> On 27 February 2014 17:13, Carson Holt wrote:
>
>> Set single_exon=1, and the minimum size to a smaller value. I think it's set to 250 right now. Also est2genome is looking for ORF, so if there is none (as with tRNAs) they probably won't get picked up.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>
>> Sorry, ignore my previous question. est_forward also carries forward the names of protein evidence and works like a charm. Thank you!
>>
>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They are in the blastn output, and in the evidence_0.gff. rrn5 has perfect identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing these hits?
>>
>> organism_type=prokaryotic
>> est2genome=1
>> protein2genome=1
>> est_forward=1
>>
>> Cheers,
>> Shaun
>>
>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>
>>> Is there a corresponding protein_forward=1 option to map forward protein names from protein2genome?
>>>
>>> Cheers,
>>> Shaun
>>>
>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com) wrote:
>>>
>>> Sorry I meant to say prefilter on the score in the mRNA column before passing the gff3 to model_gff.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>
>>> What you can do is run it once with just est_forward=1 and est2genome/protein2genome set to 1. Then take those results, pass them in as model_gff and use the map_forward option to then filter the results based on mRNA score and that would copy names onto new gene under the standard MAKER pipeline. Eventually it's really supposed to go into a separate tool that will map genes onto new assemblies (but under the hood the tool will just be calling MAKER with certain parameters restricted). I do this because if people commonly use it mixed with things like SNAP I can start to get some very weird behaviors.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Mikael Brandström Durling
>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> It seems that this could be a very useful option in those cases where you have firm a priori knowledge of the placement of ESTs. However, while trying it I note that est_forward implies that the est2genome predictor is turned on, implicitly. Is this necessary for this to work? I'm after the behavior you describe below where exonerate is made to try really hard within a limited region to align an est, but I would not like maker to produce est2genome predictions.
>>>
>>> In general, I think this maker_coor and est_forward is a feature set that is worthy to be promoted into a documented feature.
>>>
>>> Thanks,
>>> Mikael
>>>
>>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>>
>>> It will still work without est_forward. It just works a little differently. Keep in mind this was a hidden feature I used to find stubborn or hard to find missing genes after reassembly of a genome.
>>>
>>> If est_forward is provided, MAKER will parse the database to look for the maker_coor tags early in the pipeline. Then it will create a list of locations to search, and it will search them even if there are no BLAST results to seed the search (normally MAKER gets a BLAST result first and then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to look for a match using all of chr1 as the input to exonerate even when BLAST finds nothing (this is a very very slow search, but can help pick up one or two stubborn genes that don't remap well). To allow this, MAKER gives exonerate looser matching parameters (i.e. allows for single base pair introns perhaps caused by assembly errors). The logic here is that given the fact that I already told MAKER that with some degree of confidence I expect sequence A to map to location X, it will try its hardest to make it match.
>>>
>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm at line 1563, but only after a BLAST alignment has already seeded it to the region (that BLAST result has the information in its description parameter). MAKER will then ignore seeds completely outside of maker_coor. In addition any BLAST seeds that overlap maker_coor will get the search space for alignment polishing adjusted to match maker_coor exactly. Also match parameters for exonerate will not be relaxed as they were with est_forward.
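
The seed handling described above for the case where est_forward is not set can be sketched roughly as follows. This is only an illustration in Python and not MAKER's own code (MAKER is written in Perl); the names Region, parse_maker_coor and restrict_seed are hypothetical. The tag syntax follows the maker_coor=chr1:1-10000 and maker_coor=chr1 forms mentioned in this thread, and treating a bare contig name as "the whole sequence" is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    seqid: str
    start: Optional[int]  # 1-based inclusive; None means the whole sequence
    end: Optional[int]


# Hypothetical helper: pull a maker_coor= tag out of a FASTA header line.
TAG_RE = re.compile(r"maker_coor=(\S+)")


def parse_maker_coor(header: str) -> Optional[Region]:
    match = TAG_RE.search(header)
    if match is None:
        return None
    value = match.group(1)
    seqid, sep, span = value.rpartition(":")
    if sep and re.fullmatch(r"\d+-\d+", span):
        start, end = (int(x) for x in span.split("-"))
        return Region(seqid, start, end)
    # No coordinate range given: the hint covers the whole sequence (assumption).
    return Region(value, None, None)


def restrict_seed(hint: Optional[Region], seqid: str, start: int, end: int) -> Optional[Region]:
    """Return the window to polish for one BLAST seed, or None to ignore the seed."""
    if hint is None:
        return Region(seqid, start, end)  # no hint: keep the seed as it is
    if seqid != hint.seqid:
        return None  # seed is on a different sequence
    if hint.start is None or hint.end is None:
        return Region(seqid, None, None)  # whole-sequence hint
    if end < hint.start or start > hint.end:
        return None  # seed lies completely outside the hinted range
    # Overlapping seed: the search space becomes exactly the hinted range.
    return Region(seqid, hint.start, hint.end)


if __name__ == "__main__":
    # Hypothetical header combining tags discussed in this thread.
    hint = parse_maker_coor(">SMEDT_00004 gene_id=dpp maker_coor=chr1:1-10000")
    print(restrict_seed(hint, "chr1", 9500, 12000))
    print(restrict_seed(hint, "chr1", 20000, 21000))
```

The sketch leaves out the est_forward=1 case, where, as described, locations are searched even without a BLAST seed and exonerate is given looser parameters.
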
>>>
>>> As you can see, the behavior is slightly different (because it's an accidental feature).
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Mikael Brandström Durling
>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> That might be a useful and time saving accidental feature. But, reading the code, it seems that I need to supply maker_coor but not gene_id, as well as the configuration option est_forward for this to work. Any occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1, right?
>>>
>>> Mikael
>>>
>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>
>>> Yes. That should work as well as an accidental feature.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <mikael.durling at slu.se> wrote:
>>>
>>> Can this use of maker_coor be used only to hint about the placement of the ests, without affecting the naming of the final genes? I.e. if I have a database of EST where I have a priori knowledge of their rough placement, can this placement be given to maker without providing est_forward=1?
>>>
>>> Thanks,
>>> Mikael
>>>
>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>
>>> There is a way. It's not a standard option and it's undocumented, but if you add est_forward=1 to the maker_opts.ctl file, then it will do just that. The option won't already be there so you'll have to type it in.
>>>
>>> There is also a feature designed to work with this option. If you add tags to your fasta headers, those can be used to guide the mapping and naming.
>>> For example, gene_id= will ensure different isoforms that share a common gene_id get clustered into the same gene, and maker_coor=chr1:1-10000 in the fasta header will force a particular sequence to only be mapped against chr1 within the range of 1-10000 bp and just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>
>>> This is an undocumented way to remap genes onto new assemblies using blast alignments of earlier transcript or protein annotations as a guide.
>>>
>>> --Carson
>>>
>>> From: Shaun Jackman
>>> Reply-To: Shaun Jackman
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To:
>>> Subject: [maker-devel] Mapping gene names
>>>
>>> Hi,
>>>
>>> I'm annotating a genome using a closely related genome from Genbank, using the .frn (RNA) and .faa (protein) files from Genbank as evidence to annotate my genome. I've run Maker, and the annotation seems to have worked well. Is it possible to map the names of the genes from the related species to my annotation? I see the *map_forward* option, which applies to the *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>>
>>> *maker_opts.ctl*
>>>
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>>
>>> Thanks,
>>> Shaun

From carsonhh at gmail.com Wed May 14 15:18:52 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 May 2014 15:18:52 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Thanks. Looks interesting. Also since output is already GFF3, you could probably just use it with gff passthrough. It doesn't appear to support eukaryotes though.

--Carson

Sent from my iPhone

> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>
> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>
> Cheers,
> Shaun

From sjackman at gmail.com Wed May 14 15:25:21 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 14:25:21 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson, Torsten.

> It doesn't appear to support eukaryotes though.

Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap --help output seems to be out of date.

    Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

    barrnap --help
    ...
    --kingdom [X]     Kingdom: [b]acteria [a]rchaea (default 'bacteria')

Cheers,
Shaun

http://sjackman.ca

On 14 May 2014 14:18, Carson Holt wrote:

> Thanks. Looks interesting. Also since output is already GFF3, you could probably just use it with gff passthrough. It doesn't appear to support eukaryotes though.
>
> --Carson
>
> Sent from my iPhone
>
> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>
> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>
> Cheers,
> Shaun

From sjackman at gmail.com Wed May 14 18:06:31 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 17:06:31 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson. I used other_gff to pass the following four-line GFF file of Barrnap rRNA annotations through. The output of gff3_merge is quite bizarre. See below.

Input:

##gff-version 3
200408_86	barrnap:0.4	rRNA	2171785	2173036	.	+	.	Name=12S_rRNA;product=12S ribosomal RNA
200408_86	barrnap:0.4	rRNA	3665772	3666686	.	-	.	Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA
200408_86	barrnap:0.4	rRNA	3826637	3827887	.	-	.	Name=12S_rRNA;product=12S ribosomal RNA
200408_86	barrnap:0.4	rRNA	4355857	4357119	.	+	.	Name=12S_rRNA;product=12S ribosomal RNA

Output:

###
ARRAY(0x7feceb928780)
###
ARRAY(0x7feceaa548a0)
###
ARRAY(0x7feceeb01c60)
###
ARRAY(0x7fecedf6fef8)
###

Cheers,
Shaun

http://sjackman.ca

On 14 May 2014 14:18, Carson Holt wrote:

> Thanks. Looks interesting.
> Also since output is already GFF3, you could
> probably just use it with gff passthrough. It doesn't appear to support
> eukaryotes though.
>
> --Carson
>
> Sent from my iPhone
>
> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>
> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>
> Cheers,
> Shaun
>
> On 4 March 2014 18:33, Carson Holt wrote:
>
>> Trying to call non-coding RNA from ESTs or even sequence homology is
>> extremely messy (non-trivial problem in most organisms with high false
>> positive rate), so MAKER for the most part doesn't even try to do that. It
>> focuses only on the coding genes. You can now use tRNAscan and snoscan in
>> the newest version for some non-coding RNA support (those features were
>> only added a couple of months ago). So just like other prediction tools
>> (snap, augustus etc.), the primary focus has always been the coding genes.
>> We've only started adding non-coding RNA support recently for iPlant, so
>> it's still relatively immature.
>>
>> Thanks,
>> Carson
>>
>> From: Shaun Jackman
>> Reply-To: Shaun Jackman
>> Date: Tuesday, March 4, 2014 at 7:10 PM
>> To: Carson Holt
>> Cc: "maker-devel at yandell-lab.org"
>> Subject: Re: [maker-devel] Mapping gene names
>>
>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks
>> for the tip.
>>
>> The rRNA genes that are found with est2genome have the feature type set
>> to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and
>> *three_prime_UTR* features. Ideally the feature type would be set to
>> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS
>> features. Is that a feature that you would be interested in adding to
>> MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names
>> with "trn", as is standard, so determining the appropriate type should be
>> straightforward.
>>
>> Thanks again for your help with this.
>> Cheers,
>> Shaun
>>
>> On 27 February 2014 17:13, Carson Holt wrote:
>>
>>> Set single_exon=1, and the minimum size to a smaller value. I think
>>> it's set to 250 right now. Also est2genome is looking for ORF, so if there
>>> is none (as with tRNAs) they probably won't get picked up.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>>
>>> Sorry, ignore my previous question. est_forward also carries forward the
>>> names of protein evidence and works like a charm. Thank you!
>>>
>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>>> these hits?
>>>
>>> organism_type=prokaryotic
>>> est2genome=1
>>> protein2genome=1
>>> est_forward=1
>>>
>>> Cheers,
>>> Shaun
>>>
>>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>>
>>>> Is there a corresponding protein_forward=1 option to map forward
>>>> protein names from protein2genome?
>>>>
>>>> Cheers,
>>>> Shaun
>>>>
>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com)
>>>> wrote:
>>>>
>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>> passing the gff3 to model_gff.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>>
>>>> What you can do is run it once with just est_forward=1 and
>>>> est2genome/protein2genome set to 1. Then take those results, pass them in
>>>> as model_gff and use the map_forward option to then filter the results
>>>> based on mRNA score and that would copy names onto new gene under the
>>>> standard MAKER pipeline.
>>>> Eventually it's really supposed to go into a
>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>> the tool will just be calling MAKER with certain parameters restricted). I
>>>> do this because if people commonly use it mixed with things like SNAP I can
>>>> start to get some very weird behaviors.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>> From: Mikael Brandström Durling
>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>> To: Carson Holt
>>>> Cc: "maker-devel at yandell-lab.org"
>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>
>>>> It seems that this could be a very useful option in those cases where
>>>> you have firm a priori knowledge of the placement of ESTs. However, while
>>>> trying it I note that est_forward implies that the est2genome predictor is
>>>> turned on, implicitly. Is this necessary for this to work? I'm after the
>>>> behavior you describe below where exonerate is made to try really hard
>>>> within a limited region to align an est, but I would not like maker to
>>>> produce est2genome predictions.
>>>>
>>>> In general, I think this maker_coor and est_forward is a feature set
>>>> that is worthy to be promoted into a documented feature.
>>>>
>>>> Thanks,
>>>> Mikael
>>>>
>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>>>
>>>> It will still work without est_forward. It just works a little
>>>> differently. Keep in mind this was a hidden feature I used to find
>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>
>>>> If est_forward is provided, MAKER will parse the database to look for
>>>> the maker_coor tags early in the pipeline. Then it will create a list of
>>>> locations to search, and it will search them even if there are no BLAST
>>>> results to seed the search (normally MAKER gets a BLAST result first and
>>>> then polishes it with exonerate).
>>>> So maker_coor=chr1 will cause MAKER to
>>>> look for a match using all of chr1 as the input to exonerate even when
>>>> BLAST finds nothing (this is a very very slow search, but can help pick up
>>>> one or two stubborn genes that don't remap well). To allow this, MAKER
>>>> gives exonerate looser matching parameters (i.e. allows for single base
>>>> pair introns perhaps caused by assembly errors). The logic here is that
>>>> given the fact that I already told MAKER that with some degree of
>>>> confidence I expect sequence A to map to location X, it will try its
>>>> hardest to make it match.
>>>>
>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>> at line 1563, but only after a BLAST alignment has already seeded it to the
>>>> region (that BLAST result has the information in its description
>>>> parameter). MAKER will then ignore seeds completely outside of maker_coor.
>>>> In addition any BLAST seeds that overlap maker_coor will get the search
>>>> space for alignment polishing adjusted to match maker_coor exactly. Also
>>>> match parameters for exonerate will not be relaxed as they were with
>>>> est_forward.
>>>>
>>>> As you can see, the behavior is slightly different (because it's an
>>>> accidental feature).
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>> From: Mikael Brandström Durling
>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>> To: Carson Holt
>>>> Cc: "maker-devel at yandell-lab.org"
>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>
>>>> That might be a useful and time saving accidental feature. But, reading
>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>> well as the configuration option est_forward for this to work. Any
>>>> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
>>>> right?
>>>>
>>>> Mikael
>>>>
>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>>
>>>> Yes. That should work as well as an accidental feature.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>>>> mikael.durling at slu.se> wrote:
>>>>
>>>> Can this use of maker_coor be used only to hint about the placement of
>>>> the ests, without affecting the naming of the final genes? I.e., if I have a
>>>> database of EST where I have a priori knowledge of their rough placement,
>>>> can this placement be given to maker without providing est_forward=1?
>>>>
>>>> Thanks,
>>>> Mikael
>>>>
>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>
>>>> There is a way. It's not a standard option and it's undocumented, but
>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>>> that. The option won't already be there so you'll have to type it in.
>>>>
>>>> There is also a feature designed to work with this option. If you add
>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>> naming. For example, gene_id= will ensure different isoforms
>>>> that share a common gene_id get clustered into the same gene,
>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp and
>>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>
>>>> This is an undocumented way to remap genes onto new assemblies using
>>>> blast alignments of earlier transcript or protein annotations as a guide.
>>>>
>>>> --Carson
>>>>
>>>> From: Shaun Jackman
>>>> Reply-To: Shaun Jackman
>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>> To:
>>>> Subject: [maker-devel] Mapping gene names
>>>>
>>>> Hi,
>>>>
>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>>>> annotate my genome. I've run Maker, and the annotation seems to have worked
>>>> well.
Is it possible to map the names of the genes from the related species >>>> to my annotation? I see the *map_forward* option, which applies to the >>>> *model_gff* parameter. Is there a similar option for *est* and >>>> *protein*? >>>> >>>> *maker_opts.ctl* >>>> >>>> est=NC_123456.frn >>>> protein=NC_123456.faa >>>> est2genome=1 >>>> protein2genome=1 >>>> >>>> Thanks, >>>> Shaun >>>> _______________________________________________ maker-devel mailing >>>> list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 14 18:19:43 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 May 2014 18:19:43 -0600 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: That should be fixed in the current download? It came up on the mailing list a couple of weeks ago. I'll check. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Wednesday, May 14, 2014 at 6:06 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I used other_gff to pass the following four-line GFF file of Barrnap rRNA annotations through. The output of gff3_merge is quite bizarre. See below. Input: ##gff-version 3 200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . 
Name=12S_rRNA;product=12S ribosomal RNA 200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA 200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA 200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA Output: ### ARRAY(0x7feceb928780) ### ARRAY(0x7feceaa548a0) ### ARRAY(0x7feceeb01c60) ### ARRAY(0x7fecedf6fef8) ### Cheers, Shaun http://sjackman.ca On 14 May 2014 14:18, Carson Holt wrote: > Thanks. Looks interesting. Also since output is already GFF3, you could > probably just use it with gff passthrough. It doesn't appear to support > eukaryotes though. > > --Carson > > > Sent from my iPhone > > On May 14, 2014, at 3:07 PM, Shaun Jackman wrote: > >> Hi, Carson. Perhaps MAKER could integrate Barrnap >> to predict rRNA. >> >> Cheers, >> Shaun >> >> >> On 4 March 2014 18:33, Carson Holt wrote: >>> Trying to call non-coding RNA from ESTs or even sequence homology is >>> extremely messy (non-trivial problem in most organisms with high false >>> positive rate), so MAKER for the most part doesn?t even try to do that. It >>> focuses only on the coding genes. You can now use tRNAscan and snoscan in >>> the newest version for some non-coding RNA support (those features were only >>> added a couple of months ago). So just like other prediction tools (snap, >>> augustus etc.), the primary focus has always been the coding genes. We?ve >>> only started adding non-coding RNA support recently for iPlant, so it?s >>> still relatively immature. >>> >>> Thanks, >>> Carson >>> >>> >>> From: Shaun Jackman >>> Reply-To: Shaun Jackman >>> Date: Tuesday, March 4, 2014 at 7:10 PM >>> >>> To: Carson Holt >>> Cc: "maker-devel at yandell-lab.org" >>> Subject: Re: [maker-devel] Mapping gene names >>> >>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for >>> the tip. 
>>> >>> The rRNA genes that are found with est2genome have the feature type set to >>> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR >>> features. Ideally the feature type would be set to rRNA or tRNA as >>> appropriate, and would omit the UTR and CDS features. Is that a feature that >>> you would be interested in adding to MAKER? The rRNA gene names all start >>> with ?rrn? and the tRNA gene names with ?trn?, as is standard, so >>> determining the appropriate type should be straight forward. >>> >>> Thanks again for your help with this. Cheers, >>> Shaun >>> >>> >>> >>> On 27 February 2014 17:13, Carson Holt wrote: >>>> Set single_exon=1, and the minimum size to a smaller value. I think it's >>>> set to 250 right now. Also est2genome is looking for ORF, so if there is >>>> none (as with tRNAs) they probably won't get picked up. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>>> >>>>> Sorry, ignore my previous question. est_forward also carries forward the >>>>> names of protein evidence and works like a charm. Thank you! >>>>> >>>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >>>>> rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They >>>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >>>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value >>>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing >>>>> these hits? >>>>> organism_type=prokaryotic >>>>> est2genome=1 >>>>> protein2genome=1 >>>>> est_forward=1 >>>>> Cheers, >>>>> Shaun >>>>> >>>>> >>>>> >>>>> On 27 February 2014 15:17, Shaun Jackman wrote: >>>>>> Is there a corresponding protein_forward=1 option to map forward protein >>>>>> names from protein2genome? 
>>>>>> >>>>>> >>>>>> Cheers, >>>>>> Shaun >>>>>> >>>>>> >>>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com >>>>>> ) wrote: >>>>>> >>>>>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>>>>> passing the gff3 to model_gff. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>>>>> >>>>>>> What you can do is run it once with just est_forward=1 and >>>>>>> est2genome/protein2genome set to 1. Then take those results, pass them >>>>>>> in as model_gff and use the map_forward option to then filter the >>>>>>> results based on mRNA score and that would copy names onto new gene >>>>>>> under the standard MAKER pipeline. Eventually it?s really supposed to >>>>>>> go into a separate tool that will map genes onto new assemblies (but >>>>>>> under the hood the tool will just be calling MAKER with certain >>>>>>> parameters restricted). I do this because if people commonly use it >>>>>>> mixed with things like SNAP I can start to get some very weird >>>>>>> behaviors. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> From: Mikael Brandstr?m Durling >>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> It seems that this could be a very useful option in those cases where >>>>>>> you have firm a priori knowledge of the placement of ESTs. However, >>>>>>> while trying it I note that est_forward implies that the est2genome >>>>>>> predictor is turned on, implicitly. Is this necessary for this to work? >>>>>>> I?m after the behavior you describe below where exonerate is made to try >>>>>>> really hard within a limited region to align an est, but I would not >>>>>>> like maker to produce est2genome predictions. 
>>>>>>> >>>>>>> In general, I think this maker_coor and est_forward is a feature set >>>>>>> that is worthy to be promoted into a documented feature. >>>>>>> >>>>>>> THanks, >>>>>>> Mikael >>>>>>> >>>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>>>>>> >>>>>>> It will still work without est_forward. It just works a little >>>>>>> differently. Keep in mind this was a hidden feature I used to find >>>>>>> stubborn or hard to find missing genes after reassembly of a genome. >>>>>>> >>>>>>> If est_forward is provided, MAKER will parse the database to look for >>>>>>> the maker_coor tags early in the pipeline. Then it will create a list >>>>>>> of locations to search, and it will search them even if there are no >>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result >>>>>>> first and then polishes it with exonerate). So maker_coor=chr1 will >>>>>>> cause MAKER to look for a match using all of chr1 as the input to >>>>>>> exonerate even when BLAST finds nothing (this is a very very slow >>>>>>> search, but can help pick up one or two stubborn genes that don?t remap >>>>>>> well). To allow this, MAKER gives exonerate looser matching parameters >>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly >>>>>>> errors). The logic here is that given the fact that I already told >>>>>>> MAKER that with some degree of confidence I expect sequence A to map to >>>>>>> to location X, it will try its hardest to make it match. >>>>>>> >>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to >>>>>>> the region (that BLAST result has the information in its description >>>>>>> parameter). MAKER will then ignore seeds completely outside of >>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get >>>>>>> the search space for alignment polishing adjusted to match maker_coor >>>>>>> exactly. 
Also match parameters for exonerate will not be relaxed as >>>>>>> they were with est_forward. >>>>>>> >>>>>>> As you can see the behavior, is slightly different (because it?s an >>>>>>> accidental feature). >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Mikael Brandstr?m Durling >>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> That might be a useful and time saving accidental feature. But, reading >>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as >>>>>>> well as the configuration option est_forward for this to work. Any >>>>>>> occurrences of maker_coor in GI.pm seems to be conditioned on >>>>>>> set_forward=1 right? >>>>>>> >>>>>>> Mikael >>>>>>> >>>>>>> 26 feb 2014 kl. 14:22 skrev Carson Holt : >>>>>>> >>>>>>> Yes. That should work as well as an accidental feature. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling >>>>>>> wrote: >>>>>>> >>>>>>> Can this use of maker_coor be used only to hint about the placement of >>>>>>> the ests, without affecting the naming of the final genes? Ie if I have >>>>>>> a database of EST where I have a priori knowledge of their rough >>>>>>> placement, can this placement be given to maker without providing >>>>>>> est_forward=1? >>>>>>> >>>>>>> Thanks, >>>>>>> Mikael >>>>>>> >>>>>>> 26 feb 2014 kl. 01:58 skrev Carson Holt : >>>>>>> >>>>>>> There is a way. It?s not a standard option and it?s undocumented, but >>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do >>>>>>> just that. The option won?t already be there so you?ll have to type it >>>>>>> in. >>>>>>> >>>>>>> There is also a feature designed to work with this option. If you add >>>>>>> tags to your fasta headers, those can be used to guide the mapping and >>>>>>> naming. 
For example, gene_id= will ensure different >>>>>>> isoforms that share a common gene_id get clustered into the same gene, >>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular >>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp >>>>>>> and just using maker_coor=chr1 will force it to only be mapped against >>>>>>> chr1. >>>>>>> >>>>>>> This is an undocumented way to remap genes onto new assemblies using >>>>>>> blast alignments of earlier transcript or protein annotations as a >>>>>>> guide. >>>>>>> >>>>>>> ?Carson >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> From: Shaun Jackman >>>>>>> Reply-To: Shaun Jackman >>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>>>>> To: >>>>>>> Subject: [maker-devel] Mapping gene names >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I?m annotating a genome using a closely related genome from Genbank, >>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence >>>>>>> to annotate my genome. I?ve run Maker, and the annotation seems to have >>>>>>> worked well. Is it possible to map the names of the genes from the >>>>>>> related species to my annotation? I see the map_forward option, which >>>>>>> applies to the model_gff parameter. Is there a similar option for est >>>>>>> and protein? 
>>>>>>> >>>>>>> maker_opts.ctl >>>>>>> est=NC_123456.frn >>>>>>> protein=NC_123456.faa >>>>>>> est2genome=1 >>>>>>> protein2genome=1 >>>>>>> Thanks, >>>>>>> Shaun >>>>>>> _______________________________________________ maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>> >>>>>> > >>>>>>> _______________________________________________ >>>>>>> maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> maker-devel mailing list >>>>>>> maker-devel at box290.bluehost.com >>>>>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Wed May 14 18:22:37 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Wed, 14 May 2014 17:22:37 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: I'm using MAKER 2.31.4. *http://sjackman.ca * On 14 May 2014 17:19, Carson Holt wrote: > That should be fixed in the current download? It came up on the mailing > list a couple of weeks ago. I'll check. > > --Carson > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Wednesday, May 14, 2014 at 6:06 PM > > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. I used other_gff to pass the following four-line GFF file of > Barrnap rRNA annotations through. The output of gff3_merge is quite > bizarre. See below. > > Input: > > ##gff-version 3 > 200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . Name=12S_rRNA;product=12S ribosomal RNA > 200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . 
Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA > 200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA > 200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA > > Output: > > ### > ARRAY(0x7feceb928780) > ### > ARRAY(0x7feceaa548a0) > ### > ARRAY(0x7feceeb01c60) > ### > ARRAY(0x7fecedf6fef8) > ### > > Cheers, > Shaun > > *http://sjackman.ca * > > > On 14 May 2014 14:18, Carson Holt wrote: > >> Thanks. Looks interesting. Also since output is already GFF3, you could >> probably just use it with gff passthrough. It doesn't appear to support >> eukaryotes though. >> >> --Carson >> >> >> Sent from my iPhone >> >> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote: >> >> Hi, Carson. Perhaps MAKER could integrate Barrnapto predict rRNA. >> >> Cheers, >> Shaun >> >> On 4 March 2014 18:33, Carson Holt wrote: >> >>> Trying to call non-coding RNA from ESTs or even sequence homology is >>> extremely messy (non-trivial problem in most organisms with high false >>> positive rate), so MAKER for the most part doesn?t even try to do that. It >>> focuses only on the coding genes. You can now use tRNAscan and snoscan in >>> the newest version for some non-coding RNA support (those features were >>> only added a couple of months ago). So just like other prediction tools >>> (snap, augustus etc.), the primary focus has always been the coding genes. >>> We?ve only started adding non-coding RNA support recently for iPlant, so >>> it?s still relatively immature. >>> >>> Thanks, >>> Carson >>> >>> >>> From: Shaun Jackman >>> Reply-To: Shaun Jackman >>> Date: Tuesday, March 4, 2014 at 7:10 PM >>> >>> To: Carson Holt >>> Cc: "maker-devel at yandell-lab.org" >>> Subject: Re: [maker-devel] Mapping gene names >>> >>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks >>> for the tip. 
>>> >>> The rRNA genes that are found with est2genome have the feature type set >>> to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and >>> *three_prime_UTR* features. Ideally the feature type would be set to >>> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS >>> features. Is that a feature that you would be interested in adding to >>> MAKER? The rRNA gene names all start with ?rrn? and the tRNA gene names >>> with ?trn?, as is standard, so determining the appropriate type should be >>> straight forward. >>> >>> Thanks again for your help with this. Cheers, >>> Shaun >>> >>> >>> On 27 February 2014 17:13, Carson Holt wrote: >>> >>>> Set single_exon=1, and the minimum size to a smaller value. I think >>>> it's set to 250 right now. Also est2genome is looking for ORF, so if there >>>> is none (as with tRNAs) they probably won't get picked up. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>>> >>>> Sorry, ignore my previous question. est_forward also carries forward >>>> the names of protein evidence and works like a charm. Thank you! >>>> >>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >>>> rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They >>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value >>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing >>>> these hits? >>>> >>>> organism_type=prokaryotic >>>> est2genome=1 >>>> protein2genome=1 >>>> est_forward=1 >>>> >>>> Cheers, >>>> Shaun >>>> >>>> >>>> On 27 February 2014 15:17, Shaun Jackman wrote: >>>> >>>>> Is there a corresponding protein_forward=1 option to map forward >>>>> protein names from protein2genome? 
>>>>> >>>>> Cheers, >>>>> Shaun >>>>> >>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) >>>>> wrote: >>>>> >>>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>>> passing the gff3 to model_gff. >>>>> >>>>> --Carson >>>>> >>>>> Sent from my iPhone >>>>> >>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>>> >>>>> What you can do is run it once with just est_forward=1 and >>>>> est2genome/protein2genome set to 1. Then take those results, pass them in >>>>> as model_gff and use the map_forward option to then filter the results >>>>> based on mRNA score and that would copy names onto new gene under the >>>>> standard MAKER pipeline. Eventually it?s really supposed to go into a >>>>> separate tool that will map genes onto new assemblies (but under the hood >>>>> the tool will just be calling MAKER with certain parameters restricted). I >>>>> do this because if people commonly use it mixed with things like SNAP I can >>>>> start to get some very weird behaviors. >>>>> >>>>> Thanks, >>>>> Carson >>>>> >>>>> From: Mikael Brandstr?m Durling >>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>> To: Carson Holt >>>>> Cc: "maker-devel at yandell-lab.org" >>>>> Subject: Re: [maker-devel] Mapping gene names >>>>> >>>>> It seems that this could be a very useful option in those cases where >>>>> you have firm a priori knowledge of the placement of ESTs. However, while >>>>> trying it I note that est_forward implies that the est2genome predictor is >>>>> turned on, implicitly. Is this necessary for this to work? I?m after the >>>>> behavior you describe below where exonerate is made to try really hard >>>>> within a limited region to align an est, but I would not like maker to >>>>> produce est2genome predictions. >>>>> >>>>> In general, I think this maker_coor and est_forward is a feature set >>>>> that is worthy to be promoted into a documented feature. >>>>> >>>>> THanks, >>>>> Mikael >>>>> >>>>> 26 feb 2014 kl. 
17:09 skrev Carson Holt : >>>>> >>>>> It will still work without est_forward. It just works a little >>>>> differently. Keep in mind this was a hidden feature I used to find >>>>> stubborn or hard to find missing genes after reassembly of a genome. >>>>> >>>>> If est_forward is provided, MAKER will parse the database to look for >>>>> the maker_coor tags early in the pipeline. Then it will create a list of >>>>> locations to search, and it will search them even if there are no BLAST >>>>> results to seed the search (normally MAKER gets a BLAST result first and >>>>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to >>>>> look for a match using all of chr1 as the input to exonerate even when >>>>> BLAST finds nothing (this is a very very slow search, but can help pick up >>>>> one or two stubborn genes that don?t remap well). To allow this, MAKER >>>>> gives exonerate looser matching parameters (i.e. allows for single base >>>>> pair introns perhaps caused by assembly errors). The logic here is that >>>>> given the fact that I already told MAKER that with some degree of >>>>> confidence I expect sequence A to map to to location X, it will try its >>>>> hardest to make it match. >>>>> >>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>>>> at line 1563, but only after a BLAST alignment has already seeded it to the >>>>> region (that BLAST result has the information in its description >>>>> parameter). MAKER will then ignore seeds completely outside of maker_coor. >>>>> In addition any BLAST seeds that overlap maker_coor will get the search >>>>> space for alignment polishing adjusted to match maker_coor exactly. Also >>>>> match parameters for exonerate will not be relaxed as they were with >>>>> est_forward. >>>>> >>>>> As you can see the behavior, is slightly different (because it?s an >>>>> accidental feature). 
>>>>>
>>>>> Thanks,
>>>>> Carson
>>>>>
>>>>>
>>>>>
>>>>> From: Mikael Brandström Durling
>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>> To: Carson Holt
>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>
>>>>> That might be a useful and time saving accidental feature. But,
>>>>> reading the code, it seems that I need to supply maker_coor but not
>>>>> gene_id, as well as the configuration option est_forward for this to work.
>>>>> Any occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>> est_forward=1, right?
>>>>>
>>>>> Mikael
>>>>>
>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>>>
>>>>> Yes. That should work as well, as an accidental feature.
>>>>>
>>>>> --Carson
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>>>>> mikael.durling at slu.se> wrote:
>>>>>
>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>> the ests, without affecting the naming of the final genes? I.e. if I have a
>>>>> database of EST where I have a priori knowledge of their rough placement,
>>>>> can this placement be given to maker without providing est_forward=1?
>>>>>
>>>>> Thanks,
>>>>> Mikael
>>>>>
>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>>
>>>>> There is a way. It's not a standard option and it's undocumented, but
>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>>>> that. The option won't already be there so you'll have to type it in.
>>>>>
>>>>> There is also a feature designed to work with this option. If you add
>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>> naming.
For example, gene_id= will ensure different isoforms
>>>>> that share a common gene_id get clustered into the same gene,
>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp and
>>>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>>
>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>> blast alignments of earlier transcript or protein annotations as a guide.
>>>>>
>>>>> --Carson
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> From: Shaun Jackman
>>>>> Reply-To: Shaun Jackman
>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>> To:
>>>>> Subject: [maker-devel] Mapping gene names
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>>>>> annotate my genome. I've run Maker, and the annotation seems to have worked
>>>>> well. Is it possible to map the names of the genes from the related species
>>>>> to my annotation? I see the *map_forward* option, which applies to
>>>>> the *model_gff* parameter. Is there a similar option for *est* and
>>>>> *protein*?
>>>>>
>>>>> *maker_opts.ctl*
>>>>>
>>>>> est=NC_123456.frn
>>>>> protein=NC_123456.faa
>>>>> est2genome=1
>>>>> protein2genome=1
>>>>>
>>>>> Thanks,
>>>>> Shaun
>>>>>
>>>>
>>>
>>
>

From torsten.seemann at monash.edu Wed May 14 17:33:55 2014
From: torsten.seemann at monash.edu (Torsten Seemann)
Date: Thu, 15 May 2014 09:33:55 +1000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Carson & Shaun

> It doesn't appear to support eukaryotes though.
>
> Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap
> --help output seems to be out of date.
>
> Barrnap predicts the location of ribosomal RNA genes in genomes. It
> supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria
> (12S,16S) and eukaryotes (5S,5.8S,28S,18S).
>
>
It does support eukaryota and mitochondria, I just forgot to push the documentation changes. This has been resolved now in the 0.4.2 release.

--kingdom [X] Kingdom: euk arc bac mito (default 'bac')

Next release 0.5 will have an 'accurate' mode which will fine tune the predictions using cmalign glocal alignment.

Thanks for your interest!
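
[Editorial sketch] For readers scripting barrnap, the --kingdom values listed in the help text above can be validated before the program is launched. This is a minimal Python sketch, not part of barrnap: the helper name and the file name are hypothetical, and passing the FASTA file as a positional argument is an assumption about the command line. Only the --kingdom option and its four values come from the message.

```python
# Kingdom codes quoted from the barrnap help text in this message.
KINGDOMS = ("euk", "arc", "bac", "mito")


def barrnap_command(fasta_path, kingdom="bac"):
    """Return an argument list for running barrnap on one FASTA file.

    Raises ValueError for a kingdom code that the help text does not list.
    """
    if kingdom not in KINGDOMS:
        raise ValueError(
            f"unknown kingdom {kingdom!r}; expected one of {', '.join(KINGDOMS)}"
        )
    # Assumption: the input FASTA is given as the final positional argument.
    return ["barrnap", "--kingdom", kingdom, fasta_path]


if __name__ == "__main__":
    # "genome.fasta" is a placeholder file name.
    print(" ".join(barrnap_command("genome.fasta", kingdom="euk")))
```

The returned list can be handed to subprocess.run(cmd, check=True, capture_output=True, text=True) on a machine where barrnap is installed.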
-- *--Dr Torsten Seemann--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA* *--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA --http://www.bioinformatics.net.au/ * -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Wed May 14 21:23:03 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 15 May 2014 03:23:03 +0000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu> \o/ (now I can get rid of rnammer forever!) chris On May 14, 2014, at 6:33 PM, Torsten Seemann > wrote: Carson & Shaun It doesn?t appear to support eukaryotes though. Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap --help output seems to be out of date. Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S). It does support eukaryota and mitochondria, I just forgot to push the documentation changes. This has been resolved now in the 0.4.2 release. --kingdom [X] Kingdom: euk arc bac mito (default 'bac') Next release 0.5 will have an 'accurate' mode which will fine tune the predictions using cmalign glocal alignment. Thanks for your interest! -- --Dr Torsten Seemann --Victorian Bioinformatics Consortium, Monash University, AUSTRALIA --Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA --http://www.bioinformatics.net.au/ _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sajeet at gmail.com Thu May 15 11:36:00 2014 From: sajeet at gmail.com (Sajeet Haridas) Date: Thu, 15 May 2014 10:36:00 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu> References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu> Message-ID: My brief test of barrnap suggests that it does not perform well on rRNA genes with introns such as those found in fungal mitochondria. Setting a lower threshold for --reject and --evalue helps, but is not enough. Looks like I cannot abandon rnammer for now. FYI - if you want to test barrnap with fungal mitochondria, use --kingdom bacteria because they have 23S and 16S unlike the human mitochondria. Sajeet On Wed, May 14, 2014 at 8:23 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > \o/ > > (now I can get rid of rnammer forever!) > > chris > > On May 14, 2014, at 6:33 PM, Torsten Seemann > wrote: > > Carson & Shaun > >> It doesn?t appear to support eukaryotes though. >> >> Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap >> --help output seems to be out of date. >> >> Barrnap predicts the location of ribosomal RNA genes in genomes. It >> supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria >> (12S,16S) and eukaryotes (5S,5.8S,28S,18S). >> >> It does support eukaryota and mitochondria, I just forgot to push the > documentation changes. This has been resolved now in the 0.4.2 release. > > --kingdom [X] Kingdom: euk arc bac mito (default 'bac') > > Next release 0.5 will have an 'accurate' mode which will fine tune the > predictions using cmalign glocal alignment. > > Thanks for your interest! 
>
> --
>
> *--Dr Torsten Seemann --Victorian Bioinformatics Consortium, Monash
> University, AUSTRALIA*
>
> *--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA
> --http://www.bioinformatics.net.au/ *
>
>

From ranjani at uga.edu Thu May 15 13:00:47 2014
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Thu, 15 May 2014 19:00:47 +0000
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To:
References:
Message-ID: <1400180446764.46375@uga.edu>

Hi Carson,

I upgraded to MAKER version 2.31.3 (from MAKER 2.10). I want to predict gene models directly from proteins. I provided proteins from a related organism as input and set protein2genome to 1. However, I do not get any gene models predicted.

I also tried this by using a transcriptome data set in addition to the protein dataset and set est2genome and protein2genome to 1. I get gene models from the transcripts but not from the proteins. When I look at the alignment of the proteins on the genome, they seem to be aligning rather well and I would expect to see a gene model predicted. Would you know why this might be?

Also, the number of gene models predicted (directly from the transcriptome) in this version is lower than in the previous version I was using (MAKER 2.10). I did notice this version is not predicting overlapping gene models, but that is not the rule.
Thanks, Ranjani ________________________________ From: maker-devel on behalf of Carson Holt Sent: Wednesday, April 30, 2014 10:55 AM To: Carson Holt; maker-devel at yandell-lab.org Subject: Re: [maker-devel] FW: protein2genome gene models Make sure you're using the current version of MAKER. It works on eukaryotes as well. --Carson From: Carson Holt > Date: Wednesday, April 30, 2014 at 8:53 AM To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] FW: protein2genome gene models From: Sivaranjani Namasivayam > Date: Wednesday, April 30, 2014 at 8:45 AM To: "maker-devel-bounces at yandell-lab.org" > Subject: protein2genome gene models Hi, I want to examine the gene models predicted diectly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome =1 , but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns? Thanks, Ranjani _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From torsten.seemann at monash.edu Thu May 15 16:42:53 2014 From: torsten.seemann at monash.edu (Torsten Seemann) Date: Fri, 16 May 2014 08:42:53 +1000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu> Message-ID: Sajeet, Brief test of barrnap suggests that it does not perform well on rRNA genes > with introns such as those found in fungal mitochondria. Setting a lower > threshold for --reject and --evalue helps, but is not enough. > Looks like I cannot abandon rnammer for now. > FYI - if you want to test barrnap with fungal mitochondria, use --kingdom > bacteria because they have 23S and 16S unlike the human mitochondria. 
>
This is good feedback. Paul Gardner also mentioned the intron issue.

A "fungi" kingdom is clearly needed. I am not a mycologist, so coming up with a detailed rRNA architecture for eukaryotic phyla etc. is something I have started but need assistance with.

Adjusting the nhmmer alignment parameters could improve results for the intronic rRNAs too.

Here is what I have so far in terms of models:
https://github.com/Victorian-Bioinformatics-Consortium/barrnap/blob/master/README.md#data-sources-for-hmm-models

- do I need to split euk into protist / plant / animal / fungi?
- should the current 'mito' be placed inside the current 'euk', as mito data is likely to end up in assemblies, but kept separate for mito-only data?
- plastids, chloroplasts, apicoplasts; I am not sure of the subtleties of these organelles' rRNA but am willing to learn.

Thank you again for testing. Any help appreciated,

--
*--Dr Torsten Seemann--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA*
*--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA --http://www.bioinformatics.net.au/ *

From sjackman at gmail.com Fri May 16 11:16:27 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 16 May 2014 10:16:27 -0700
Subject: [maker-devel] Specify multiple files to rmlib
Message-ID:

Hi, Carson. Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it?

Thanks!
Shaun

P.S. Any update on the fix to other_gff?

http://sjackman.ca

From carsonhh at gmail.com Fri May 16 14:33:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 14:33:15 -0600
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

It could be done. I've made some changes to the subversion repository if you want to test it.
You should also be able to use labels just as you can with other comma separated lists in MAKER using ':' to separate the label.

Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label

I've also found the other_gff issue. It was fixed in the subversion repository but not in the release package I made the other day, so I've updated the release to 2.31.5.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Friday, May 16, 2014 at 11:16 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Specify multiple files to rmlib

Hi, Carson. Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it?

Thanks!
Shaun

P.S. Any update on the fix to other_gff?

http://sjackman.ca

From carsonhh at gmail.com Fri May 16 14:42:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 14:42:50 -0600
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To: <1400180446764.46375@uga.edu>
References: <1400180446764.46375@uga.edu>
Message-ID:

Upgrade to 2.31.5.

Changes since 2.31.3
*a protein2genome issue that was introduced in 2.31.3 was fixed
*fasta_merge failing with trnascan results issue was fixed
*other_gff input resulting in ARRAY reference being printed was fixed.
*naming of tRNA genes was improved to include amino acid identity

--Carson

From: Sivaranjani Namasivayam
Date: Thursday, May 15, 2014 at 1:00 PM
To: Carson Holt, Carson Holt, "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] FW: protein2genome gene models

Hi Carson,

I upgraded to the MAKER version 2.31.3 (from MAKER 2.10). I want to predict gene models directly from proteins. I provided proteins from a related organism as input and set protein2genome to 1.
However I do not get any gene models predicted. I also tried this by using a transcriptome data set in addition to the protein dataset and set est2genome and protein2genome to 1. I get gene models from the transcripts but not proteins. When I look at the alignment of the proteins on the genome, they seem to be aligning rather well and I would expect to see a gene model predicted. Would you know why this might be? Also the number of gene models predicted (directly from the transriptome)in this version is lower than the previous version I was using (MAKER 2.10). I did notice this version is not predicting overlapping gene models, but that is not rule. Thanks, Ranjani From: maker-devel on behalf of Carson Holt Sent: Wednesday, April 30, 2014 10:55 AM To: Carson Holt; maker-devel at yandell-lab.org Subject: Re: [maker-devel] FW: protein2genome gene models Make sure you're using the current version of MAKER. It works on eukaryotes as well. --Carson From: Carson Holt Date: Wednesday, April 30, 2014 at 8:53 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] FW: protein2genome gene models From: Sivaranjani Namasivayam Date: Wednesday, April 30, 2014 at 8:45 AM To: "maker-devel-bounces at yandell-lab.org" Subject: protein2genome gene models Hi, I want to examine the gene models predicted diectly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome =1 , but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns? Thanks, Ranjani _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From sjackman at gmail.com Fri May 16 14:45:59 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 16 May 2014 13:45:59 -0700
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

Excellent. Thanks, Carson. Is the rmlib feature included in 2.31.5? What is the purpose of the label? Does it affect the GFF file output by MAKER?

--
http://sjackman.ca

On 2014-May-16 at 13:33:23 , Carson Holt (carsonhh at gmail.com) wrote:

It could be done. I've made some changes to the subversion repository if you want to test it. You should also be able to use labels just as you can with other comma separated lists in MAKER using ':' to separate the label.

Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label

I've also found the other_gff issue. It was fixed in the subversion repository but not in the release package I made the other day, so I've updated the release to 2.31.5.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Friday, May 16, 2014 at 11:16 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Specify multiple files to rmlib

Hi, Carson. Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it?

Thanks!
Shaun

P.S. Any update on the fix to other_gff?

http://sjackman.ca

From carsonhh at gmail.com Fri May 16 15:02:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 15:02:59 -0600
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

No. There are some implementation issues related to how repeats are processed and collapsed that may cause hidden bugs with the comma separated list, so it needs some more testing.
The label is added to the output GFF3. For example, protein=uniprot.fasta:uniprot would cause the gff3 label to be protein2genome:uniprot rather than just protein2genome. Programs like GBrowse know how to use the labels to generate on/off check boxes to turn just some of your protein results on/off in a viewer rather than all of them.

--Carson

From: Shaun Jackman
Date: Friday, May 16, 2014 at 2:45 PM
To: Carson Holt, "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Specify multiple files to rmlib

Excellent. Thanks, Carson. Is the rmlib feature included in 2.31.5? What is the purpose of the label? Does it affect the GFF file output by MAKER?

--
http://sjackman.ca

On 2014-May-16 at 13:33:23 , Carson Holt (carsonhh at gmail.com) wrote:

> It could be done. I've made some changes to the subversion repository if you
> want to test it. You should also be able to use labels just as you can with
> other comma separated lists in MAKER using ':' to separate the label.
>
> Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label
>
> I've also found the other_gff issue. It was fixed in the subversion
> repository but not in the release package I made the other day, so I've
> updated the release to 2.31.5.
>
> --Carson
>
>
>
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Friday, May 16, 2014 at 11:16 AM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] Specify multiple files to rmlib
>
> Hi, Carson. Some options of maker accept multiple files as a comma separated
> list, but rmlib does not. Could it?
>
> Thanks!
> Shaun
>
> P.S. Any update on the fix to other_gff?
>
> http://sjackman.ca
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cjfields at illinois.edu Tue May 20 13:17:14 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 20 May 2014 19:17:14 +0000
Subject: [maker-devel] tRNAscan and map_gff_ids
Message-ID: <520E7E32-B4E2-486F-B730-F15683679440@illinois.edu>

I found a problem with some tRNAscan output using MAKER 2.31.5. I had a full MAKER data set (run initially using MAKER 2.31.5) that I mapped IDs for. This was then run as follows, with the requisite error:

-system-specific-4.1$ map_gff_ids id.map Zalbi.all.gff3
Nested quantifiers in regex; marked by <-- HERE in m/trnascan-KB913038.1-noncoding-Undet_??? <-- HERE -gene-79.0/ at /home/groups/hpcbio/apps/maker/maker-2.31.5/bin/map_gff_ids line 111, <$IN> line 3067590.

The problematic lines:

----------------------------------------------
-system-specific-4.1$ grep "???" Zalbi.all.gff3
KB913038.1 maker gene 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0
KB913038.1 maker tRNA 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1;Parent=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|70|0
KB913038.1 maker exon 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1:exon:2193;Parent=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1
KB913039.1 maker gene 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0
KB913039.1 maker tRNA 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1;Parent=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|74|0
KB913039.1 maker exon 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1:exon:4036;Parent=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1
----------------------------------------------

I managed to get it going by using the following modifications (regex quotemeta) in map_gff_ids (lines 107-112):

    for my $id (@map_ids) {
        # Only if the value (or the portion preceding
        # the first colon) is equal to the map key.
        next unless ($value eq $id || $value =~ /^\Q$id\E:/);
        $value =~ s/\Q$id\E/$map{$id}/ unless($tag eq 'Name' && $id !~ /\-gene\-\d+\.\d+|^CG\:|^....\:|^[^\:]+\:temp\d+\:/);
    }

I'm guessing there may be a similar problem with map_fasta_ids?

chris

From carsonhh at gmail.com Tue May 20 13:43:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 May 2014 13:43:48 -0600
Subject: [maker-devel] tRNAscan and map_gff_ids
Message-ID:

Thanks. trnascan support is new enough that there are these kinds of issues that we need to find and fix. MAKER tries to use the codon name supplied by trnascan, and it looks like the codon is 'Undet_???'. I don't know why that is. We currently don't do any filtering of trnascan results (i.e. we keep everything). This might be something that we really just want to be filtering out since it doesn't have a determinable codon? At the very least I should change the codon to NNN instead of ??? to correspond to the standard ambiguity nucleotides used in FASTA format.

--Carson

On 5/20/14, 1:17 PM, "Fields, Christopher J" wrote:

>I found a problem with some tRNAscan output using MAKER 2.31.5. I had a
>full MAKER data set (run initially using MAKER 2.31.5) that I mapped IDs
>for. This was then run as follows, with the requisite error:
>
>-system-specific-4.1$ map_gff_ids id.map Zalbi.all.gff3
>Nested quantifiers in regex; marked by <-- HERE in
>m/trnascan-KB913038.1-noncoding-Undet_??? <-- HERE -gene-79.0/ at
>/home/groups/hpcbio/apps/maker/maker-2.31.5/bin/map_gff_ids line 111,
><$IN> line 3067590.
> >The problematic lines: > >---------------------------------------------- >-system-specific-4.1$ grep "???" Zalbi.all.gff3 >KB913038.1 maker gene 23847890 23847958 . - . ID=trnascan-KB913038.1-nonco >ding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene >-79.0 >KB913038.1 maker tRNA 23847890 23847958 . - . ID=trnascan-KB913038.1-nonco >ding-Undet_???-gene-79.0-tRNA-1;Parent=trnascan-KB913038.1-noncoding-Undet >_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA >-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|70|0 >KB913038.1 maker exon 23847890 23847958 . - . ID=trnascan-KB913038.1-nonco >ding-Undet_???-gene-79.0-tRNA-1:exon:2193;Parent=trnascan-KB913038.1-nonco >ding-Undet_???-gene-79.0-tRNA-1 >KB913039.1 maker gene 21710152 21710224 . - . ID=trnascan-KB913039.1-nonco >ding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene >-72.0 >KB913039.1 maker tRNA 21710152 21710224 . - . ID=trnascan-KB913039.1-nonco >ding-Undet_???-gene-72.0-tRNA-1;Parent=trnascan-KB913039.1-noncoding-Undet >_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA >-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|74|0 >KB913039.1 maker exon 21710152 21710224 . - . ID=trnascan-KB913039.1-nonco >ding-Undet_???-gene-72.0-tRNA-1:exon:4036;Parent=trnascan-KB913039.1-nonco >ding-Undet_???-gene-72.0-tRNA-1 >---------------------------------------------- > >I managed to get it going by using the following modifications (regex >quotemeta) in map_gff_ids (lines 107-112): > > for my $id (@map_ids) { > # Only if the value (or the portion preceding > # the first colon) is equal to the map key. > next unless ($value eq $id || $value =~ /^\Q$id\E:/); > $value =~ s/\Q$id\E/$map{$id}/ unless($tag eq 'Name' && $id !~ >/\-gene\-\d+\.\d+|^CG\:|^....\:|^[^\:]+\:temp\d+\:/); > } > >I?m guessing there may be a similar problem with map_fasta_ids? 
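
[Editorial sketch] The quotemeta fix quoted above can be illustrated outside Perl. The Python sketch below is not MAKER code: the function name and the replacement IDs are hypothetical, and it deliberately leaves out the extra Name-tag condition from the Perl snippet. It assumes Python's re.escape plays the role of Perl's \Q...\E, and shows that an ID containing ??? is rejected as a raw pattern but handled once it is escaped.

```python
import re

# An ID taken from the GFF3 lines in this thread; '???' is regex syntax.
BAD_ID = "trnascan-KB913038.1-noncoding-Undet_???-gene-79.0"


def map_value(value, id_map):
    """Rewrite one GFF3 attribute value, treating every map key as literal text."""
    for old_id, new_id in id_map.items():
        literal = re.escape(old_id)  # analogous to Perl's \Q$id\E
        # Only if the value (or the portion preceding the first colon)
        # is equal to the map key.
        if value != old_id and not re.match(literal + ":", value):
            continue
        # A function replacement keeps backslashes in new_id from being
        # interpreted; count=1 mirrors Perl's s/// without the /g flag.
        value = re.sub(literal, lambda match: new_id, value, count=1)
    return value


if __name__ == "__main__":
    try:
        re.compile(BAD_ID)
    except re.error as exc:
        print("unescaped ID rejected as a pattern:", exc)
    # "GENE0001" is a made-up replacement ID.
    print(map_value(BAD_ID, {BAD_ID: "GENE0001"}))
```

The same escaping would apply to any other script that interpolates IDs into a pattern, which is the concern raised about map_fasta_ids.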
> >chris >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From caigh02 at gmail.com Mon May 19 21:43:18 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Mon, 19 May 2014 23:43:18 -0400 Subject: [maker-devel] Maker exon number Message-ID: Hi Carson, I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold start with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks. Guohong scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1 scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0 scaffold1 maker mRNA 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 106 1684 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 2605 2985 . 
+ 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254
scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892
scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40259 40666 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40073 40154 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40745 41604 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1

From dence at genetics.utah.edu Tue May 20 14:34:20 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Tue, 20 May 2014 20:34:20 +0000
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

Hi Guohong,

What version of MAKER are you running?

Thanks,
Daniel

From carsonhh at gmail.com Tue May 20 14:50:44 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 May 2014 14:50:44 -0600
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

I can do that. Just a note of caution though. The ID= attribute is not protected (it's just an identifier used to relate features to one another for correct parentage). Downstream scripts that use or manipulate GFF3 files can change it, so there is no guarantee that it will stay the same or even be informative.

--Carson

From carsonhh at gmail.com Tue May 20 18:52:34 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 May 2014 18:52:34 -0600
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

I've gone ahead and made the change in the development version. It will probably be convenient in most cases, but it's important to note one caveat. Exon features are shared in GFF3 format. So if there are multiple isoforms that contain the same exon, there will only be a single exon line in the GFF3, but it will list several transcript IDs in its Parent= attribute.

What does that have to do with the ID= attribute or exon order?
Well it means that ID=exon:2 in the first transcript may be the second exon, but in another transcript ID=exon:2 may be the first exon or third exon, etc. This is because there is only a single line for a given exon and it gets shared by all the transcripts. So it will always have the same ID= tag, but will hold a different position in different isoforms (so its ordinal value will not go along with the ID in those cases). But since most gene calls from MAKER will have only one isoform (the default), it could still be convenient in those cases.

Thanks,
Carson

From caigh02 at gmail.com Wed May 21 07:14:40 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Wed, 21 May 2014 08:14:40 -0500
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

Hi Daniel,

I am using maker-2.31.5.

---Guohong

From caigh02 at gmail.com Wed May 21 08:40:47 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Wed, 21 May 2014 09:40:47 -0500
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

Thanks a lot.

---Guohong

From caigh02 at gmail.com Wed May 21 21:16:52 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Wed, 21 May 2014 23:16:52 -0400
Subject: [maker-devel] Maker exon number
In-Reply-To:
References:
Message-ID:

Hi Carson,

Is the development version available for download? Only maker2.31.5 is available on the Yandell Lab website.

---Guohong

From fbarreto at ucsd.edu Thu May 22 23:13:37 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Thu, 22 May 2014 22:13:37 -0700
Subject: [maker-devel] Alternative splicing options
Message-ID:

Hi, all,

I just finished a fourth and final iterative round with Maker, training predictors in between, and I am very happy with the results. What I would like to try now is to annotate alternative splicing variants, and I know the ctrl file has the alt_splice option. However, I am intrigued by the lack of information regarding this option. I could not find many discussions in this group, and most genome publications using Maker are unclear about whether they annotated alternative transcripts, so my guess is they didn't. So I was wondering whether there is a reason for that. Is that function not well developed in Maker? Should I stay away from it?

Assuming it is OK to give it a try (provided I don't get discouraged here), what is the best approach to take, considering I already obtained what I consider a solid set of gene models after four rounds of annotation? Should I start over by turning on alt_splice, and training gene predictors from those outputs? Or would it be appropriate to simply repeat my latest round, changing only alt_splice=1?

Thanks for any help. I can see the light at the end of the tunnel!

Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From dence at genetics.utah.edu Fri May 23 08:55:50 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 23 May 2014 14:55:50 +0000
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

Hi Felipe,

The alternative splice option is a fully developed and functional option in MAKER. What it does is tell MAKER to consider gene models with mutually exclusive evidence. For example, if there are two models at a locus and evidence that supports one exon in one model and a different exon in another model, both those models might make it into the final gene set.

From the workflow you described, I think you'd have to redo only the fourth and final round of MAKER annotation. As a general principle for trying out new options on your annotations, I'd recommend choosing a big scaffold, running it with alt_splice=1, and seeing how you like the results.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Fri May 23 09:07:26 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 May 2014 09:07:26 -0600
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

I'd like to add that alternate splice forms will be generated off of the mutually exclusive EST evidence, so how well it performs, as well as whether or not it can even generate other splice forms, will depend entirely on the quality of your EST evidence.
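
One way to judge a test run with alt_splice=1 on a single scaffold is to count how many mRNA features each gene ends up with in the output GFF3. The following is only a minimal sketch and not part of MAKER: the function name and the sample rows (geneA, geneB) are made up for illustration, and it assumes nothing beyond tab-separated GFF3 columns with ID= and Parent= attributes.

```python
from collections import defaultdict


def isoforms_per_gene(gff_lines):
    """Map each gene ID to the number of mRNA features listing it as Parent."""
    counts = defaultdict(int)
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines, comments and directives such as ##gff-version.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Anything without nine columns (e.g. a trailing FASTA section) is ignored.
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # Parent= may hold a comma-separated list in GFF3.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return dict(counts)


# Hypothetical rows for illustration only (not real MAKER output).
sample = [
    "scaffold1\tmaker\tgene\t106\t2985\t.\t+\t.\tID=geneA;Name=geneA",
    "scaffold1\tmaker\tmRNA\t106\t2985\t.\t+\t.\tID=geneA-mRNA-1;Parent=geneA",
    "scaffold1\tmaker\tmRNA\t106\t2985\t.\t+\t.\tID=geneA-mRNA-2;Parent=geneA",
    "scaffold1\tmaker\tgene\t38466\t41604\t.\t+\t.\tID=geneB;Name=geneB",
    "scaffold1\tmaker\tmRNA\t38466\t41604\t.\t+\t.\tID=geneB-mRNA-1;Parent=geneB",
]

counts = isoforms_per_gene(sample)
# Keep only genes that received more than one transcript.
multi = {gene: n for gene, n in counts.items() if n > 1}
print(multi)
```

Running the same function over the GFF3 from a run with alt_splice=0 and one with alt_splice=1 gives a quick comparison of how many genes gained additional isoforms.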
--Carson

From fbarreto at ucsd.edu Fri May 23 09:56:27 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Fri, 23 May 2014 08:56:27 -0700
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

Hey guys,

Great to hear!! I will be anxious to try it out. Thanks for your prompt help!

Cheers,
Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From cjfields at illinois.edu Fri May 23 10:21:38 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 23 May 2014 16:21:38 +0000
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID: <14271D2B-4D83-47C9-8661-682599E94E8F@illinois.edu>

That is exactly what I have seen using this option; genes with very good transcriptome evidence (as one might expect) tend to have more isoforms. The problem we run into is not having a diverse enough transcriptome set to work with (ours tend to be tissue-specific, unfortunately), so we have some genes giving more isoforms than others, but we don't design the libraries so have no control over it. We are currently only using Trinity assemblies as input rather than TopHat2/Cufflinks.

chris

From fbarreto at ucsd.edu Fri May 23 14:31:36 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Fri, 23 May 2014 13:31:36 -0700
Subject: [maker-devel] gff3_merge on models only for SNAP training?
Message-ID:

Hi, all,

I should have confirmed this well before starting my Maker runs, but better now than never. When generating a merged gff file to be used for SNAP training, is it OK to use the default gff output from gff3_merge, which contains all protein/EST evidence alignments (this is what I did)? Or should I have generated a gene models-only merged gff (using the -g flag) for training? I assume the Maker flag within the larger gff file will allow the subsequent scripts (e.g. maker2zff) to ignore the other alignments, but just wanted to check.

Thanks again!

Felipe
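
To make the distinction in this question concrete: in MAKER's GFF3 output, gene-model lines carry maker in the source column (column 2), while evidence alignments carry the name of the aligner, such as est2genome or protein2genome. The sketch below uses invented rows and hypothetical file names, and it only mimics the idea of a models-only file; it is not a description of how gff3_merge -g or maker2zff are implemented.

```shell
# Invented rows standing in for a merged MAKER GFF3: three gene-model
# lines (source "maker") followed by two evidence alignments.
{
  printf '##gff-version 3\n'
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    scaffold1 maker gene 100 900 . + . ID=gene1 \
    scaffold1 maker mRNA 100 900 . + . 'ID=mRNA1;Parent=gene1' \
    scaffold1 maker exon 100 900 . + . 'ID=exon1;Parent=mRNA1' \
    scaffold1 est2genome expressed_sequence_match 120 880 . + . ID=est1 \
    scaffold1 protein2genome protein_match 150 870 . + . ID=prot1
} > merged_example.gff

# Keep header lines plus rows whose source column is "maker".
awk -F '\t' '/^#/ || $2 == "maker"' merged_example.gff > models_only.gff

# Number of feature rows before and after filtering.
grep -vc '^#' merged_example.gff
grep -vc '^#' models_only.gff
```

A quick count like this on a real merged file shows how much of it is evidence rather than gene models.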
From carsonhh at gmail.com Fri May 23 14:33:17 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 May 2014 14:33:17 -0600
Subject: [maker-devel] gff3_merge on models only for SNAP training?
In-Reply-To:
References:
Message-ID:

Yes. It's ok. Non-genic feature lines will be ignored.

--Carson

From jacques.dainat at imbim.uu.se Fri May 23 01:56:05 2014
From: jacques.dainat at imbim.uu.se (Jacques Dainat)
Date: Fri, 23 May 2014 09:56:05 +0200
Subject: [maker-devel] Possible error in tRNA annotation by maker
Message-ID:

Hi,

I would like to report a possible error that occurs in the tRNA annotation by maker. I saw the problem in the gff result file. The problem occurs only in, and in all of, the tRNAs that have an intron and are on the + strand. In this case the strand of one of the exons seems to be wrong (see the example below).

As an example we have:

scaffold6501 maker gene 2126 2230 . + . XXX
scaffold6501 maker tRNA 2126 2230 . + . XXX
scaffold6501 maker exon 2185 2230 . - . XXX
scaffold6501 maker exon 2126 2163 . + . XXX

Theoretically, we should obtain:

scaffold6501 maker gene 2126 2230 . + . XXX
scaffold6501 maker tRNA 2126 2230 . + . XXX
scaffold6501 maker exon 2126 2163 . + . XXX
scaffold6501 maker exon 2185 2230 . + . XXX

Kind regards,

Jacques Dainat, PhD
BILS (Bioinformatics Infrastructure for Life Sciences)
Address: (room E10:3312)
Uppsala University, BMC
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala
Sweden

From marc.hoeppner at bils.se Tue May 27 02:12:07 2014
From: marc.hoeppner at bils.se (Marc Höppner)
Date: Tue, 27 May 2014 10:12:07 +0200
Subject: [maker-devel] Some questions regarding ab-initio training
Message-ID: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>

Hi,

I wanted to get some feedback regarding the training of ab-initio gene finders - it's not strictly Maker related, but I suppose there are many people on this list that have encountered and solved this issue in one way or another.

Specifically, I am trying to train Augustus (and possibly SNAP) for a plant genome. This has always been a very frustrating process for me, but while I have a better idea now how to do it, I still don't get the sort of accuracy that I am hoping for.
A quick run-through of my process:

Evidence build with maker on level 1 and 2 proteins from Uniprot + Sanger-sequenced EST data

Filtered for models with an AED <= 0.3

Loaded that into WebApollo, together with an existing reference annotation and the evidence tracks

Manually curated/selected 750 gene models using the following rules:
- Must have start/stop codon
- Must have as many exons as possible
- Must agree with evidence
- Must be >= 2kb apart from other gene models (provided as flanking regions for augustus to train intergenic sequence)

From these models, I created a GBK file, split it into 650 (train) and 100 (test) models and created a new profile using the documented procedure.

But:

While the naked ab-init models created through maker get a lot of genes "sort of right", I still see too many issues to be really satisfied. Problems include:

- random exon calls which are not supported by any line of evidence (~1 per gene model, I would guess)
- poor congruency with some gene models (especially ones not used for training/testing)

Is there any best-practice guide on how to improve this? The Augustus website is unfortunately quite poor on detail... My impression so far is that ramping up the number of training models isn't really doing too much beyond a certain point (tried 400, 500 and 750).

Regards,

Marc

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at bils.se

From carsonhh at gmail.com Tue May 27 09:25:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 27 May 2014 09:25:39 -0600
Subject: [maker-devel] Some questions regarding ab-initio training
In-Reply-To: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>
References: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>
Message-ID:

Extra exons can be required for predictors to make sense of a region (they do the best they can). This can be due to imperfect assemblies or repeats. For plants the repeat database is the one thing that will most affect the annotation quality. You may need to spend some time building the best repeat library you can. The repeat library is the most important thing after training the predictor, because repeats confuse the predictor (sometimes a lot), causing it to behave oddly in those regions (repeats do encode real proteins and protein fragments). Also, when running now with MAKER, make sure to include the entire proteome of a related species and not just UniProt, and you will get better performance. Now that you have Augustus trained, using it inside of MAKER with an improved repeat library and additional protein evidence should give it the feedback that will allow it to perform better than it would with just naked ab initio prediction.

Thanks,
Carson

From carsonhh at gmail.com Tue May 27 09:26:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 27 May 2014 09:26:25 -0600
Subject: [maker-devel] Possible error in tRNA annotation by maker
In-Reply-To:
References:
Message-ID:

Do you have a small test contig I could use to duplicate the error? That will make it easier to fix.

Thanks,
Carson

From panos.ioannidis at gmail.com Wed May 28 01:28:14 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 09:28:14 +0200
Subject: [maker-devel] Problem with installation
Message-ID:

Hello Maker community,

I just finished installing Maker and even though everything seems to be okay, when I give

./maker -h

or

./maker

the program apparently hangs without giving any output or warning or error.

Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/.

Any ideas?

Panos

From panos.ioannidis at gmail.com Wed May 28 02:26:08 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 10:26:08 +0200
Subject: [maker-devel] General question
Message-ID:

I'm going through the Maker tutorial and saw that among the input files you give it, there's a fasta file with proteins (the protein=xxx parameter in the maker_opts.ctl file).

What exactly are these proteins? I thought Maker both predicts genes (i.e. proteins) and also annotates them. Does it only do annotation of already predicted genes/proteins? But then, why is it using gene predictors like Augustus, SNAP, etc?

Thanks,
Panos
From b.cantarel at gmail.com Wed May 28 05:11:18 2014
From: b.cantarel at gmail.com (Brandi Cantarel)
Date: Wed, 28 May 2014 06:11:18 -0500
Subject: [maker-devel] General question
In-Reply-To:
References:
Message-ID:

Maker's predictions are improved with evidence. These proteins can be from uniprot (I recommend uniprot50) or from a closely related taxon.

Maker uses comparisons to these proteins in its prediction. There is more detail on this in the paper.

From panos.ioannidis at gmail.com Wed May 28 05:29:43 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 13:29:43 +0200
Subject: [maker-devel] General question
In-Reply-To:
References:
Message-ID:

Thanks Brandi.

From dence at genetics.utah.edu Wed May 28 07:29:58 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 28 May 2014 13:29:58 +0000
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hi Panos,

When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running?

Thanks,
Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From panos.ioannidis at gmail.com Wed May 28 07:46:12 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 15:46:12 +0200
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hi Daniel,

Here's the output of ./Build status

==============================================================================
STATUS MAKER v2.31.4
==============================================================================
PERL Dependencies: VERIFIED
External Programs: VERIFIED
External C Libraries: VERIFIED
MPI SUPPORT: DISABLED
MWAS Web Interface: DISABLED
MAKER PACKAGE: CONFIGURATION OK

I think everything looks okay, right?

From dence at genetics.utah.edu Wed May 28 08:03:33 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 28 May 2014 14:03:33 +0000
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hi Panos,

So I just tried the commands that you used on my install of maker, and it took a surprisingly long time for the error messages to print. The test that we use in the tutorials (it seems to run faster than running maker with -h or with no options) is maker -CTL, which will create control files that you use to set the many options for maker. Try running ./maker -CTL and let me know whether it creates those files. I guess that it might take more or less time, depending on your machine.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Wed May 28 08:32:07 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 08:32:07 -0600
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Perl is a scripting language rather than a compiled language, and one thing that happens when you first use a new module or script is that the interpreter follows the dependency tree validating that everything executes/loads correctly. Since you installed a number of dependencies and MAKER itself, the first time you launch MAKER Perl has to do this check on the dependency tree. This only happens the first time, and after that Perl remembers it already ran the check so the dependencies and MAKER will just start from then on. Normally this process takes less than 30 seconds; however, on some systems (especially clusters) there may be a heavy IO burden and this process can take a while. For example, does it take a moment for 'ls -al' to return in some directories rather than returning instantaneously like it is supposed to?
If it takes 3 seconds to return, for example, then each dependency check may take up to 3 seconds. If you just installed a bunch of new perl modules then there may be a hundred or more dependencies that may have to be validated for the first time.

--Carson

From: Daniel Ence
Date: Wednesday, May 28, 2014 at 7:29 AM
To: Panos Ioannidis
Cc: ""
Subject: Re: [maker-devel] Problem with installation

Hi Panos, When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running?

Thanks,
Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote:

> Hello Maker community,
>
> I just finished installing Maker and even though everything seems to be okay,
> when I give
>
> ./maker -h
>
> or
>
> ./maker
>
> the program apparently hangs without giving any output or warning or error.
>
> Just so you know, I have installed all dependencies (Perl libraries and
> third-party programs) and am executing from bin/, not src/bin/.
>
> Any ideas?
>
> Panos
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From panos.ioannidis at gmail.com Wed May 28 10:13:05 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 18:13:05 +0200
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hello Daniel and Carson,

Thank you both for your comments.

Carson, I gave it a lot more than 30 seconds.
I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one > thing that happens when you first use a new module or script Is that the > interpreter follows the dependency tree validating that everything > executes/loads correctly. Since you installed a number of dependencies and > MAKER itself, the first time you launch MAKER Perl has to do this check on > the dependency tree. This only happens the first time, and after that Perl > remembers it already ran the check so the dependencies and MAKER will just > start from then on. Normally this proccess takes less than 30 seconds; > however, on some systems (especially clusters) there may a heavy IO burden > and this process can take a while. For example does it take a moment for > 'ls -al' to return in some directories rather than returning > instantaneously like it is supposed to? If it takes 3 seconds to return or > example, then each dependency check may take up to 3 seconds. If you just > installed a bunch of new perl modules then there may be a hundred or more > dependencies that may have to be validated for the first time. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? 
Also, what version of maker are you running? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > > Hello Maker community, > > I just finished installing Maker and even though everything seems to be > okay, when I give > > ./maker -h > > or > > ./maker > > the program apparently hangs without giving any output or warning or error. > > Just so you know, I have installed all dependencies (Perl libraries and > third-party programs) and am executing from bin/, not src/bin/. > > Any ideas? > > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 10:15:20 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 10:15:20 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Normally it takes 30 seconds, but if your IO response is slow (I.e. 3 seconds per query which is why you should do the 'ls -al' test), it can take several minutes because it's an IO issue. --Carson From: Panos Ioannidis Date: Wednesday, May 28, 2014 at 10:13 AM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens. 
Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one thing > that happens when you first use a new module or script Is that the interpreter > follows the dependency tree validating that everything executes/loads > correctly. Since you installed a number of dependencies and MAKER itself, the > first time you launch MAKER Perl has to do this check on the dependency tree. > This only happens the first time, and after that Perl remembers it already ran > the check so the dependencies and MAKER will just start from then on. > Normally this proccess takes less than 30 seconds; however, on some systems > (especially clusters) there may a heavy IO burden and this process can take a > while. For example does it take a moment for 'ls -al' to return in some > directories rather than returning instantaneously like it is supposed to? If > it takes 3 seconds to return or example, then each dependency check may take > up to 3 seconds. If you just installed a bunch of new perl modules then there > may be a hundred or more dependencies that may have to be validated for the > first time. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? Also, what version of maker are you running? 
> > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be okay, >> when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? >> >> Panos >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 10:16:58 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 10:16:58 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: You may also want to look into if you need to reinstall perl on another drive. --Carson From: Carson Holt Date: Wednesday, May 28, 2014 at 10:15 AM To: Panos Ioannidis Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Normally it takes 30 seconds, but if your IO response is slow (I.e. 3 seconds per query which is why you should do the 'ls -al' test), it can take several minutes because it's an IO issue. 
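The 'ls -al' test can be made a little more systematic by timing a directory listing directly. What follows is a minimal Python sketch, not part of MAKER: `time_listing` is a hypothetical helper that lists a directory and stats every entry (roughly the work `ls -al` does) and reports how long that took. Listings that take whole seconds on the directories holding perl and its modules would point to the slow I/O described above.

```python
import os
import time


def time_listing(path):
    """Return (entry_count, seconds) for listing `path` with a stat per entry.

    This approximates what `ls -al` has to do: read the directory and
    look up metadata for every entry in it.
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)  # force a metadata lookup
            count += 1
    return count, time.perf_counter() - start
```

For instance, `time_listing("/path/to/perl/lib")` (a placeholder path) returns the number of entries and the elapsed seconds. Running it twice on the same directory is worthwhile, since the second call may be served from the operating system's cache and look much faster.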
--Carson From: Panos Ioannidis Date: Wednesday, May 28, 2014 at 10:13 AM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one thing > that happens when you first use a new module or script Is that the interpreter > follows the dependency tree validating that everything executes/loads > correctly. Since you installed a number of dependencies and MAKER itself, the > first time you launch MAKER Perl has to do this check on the dependency tree. > This only happens the first time, and after that Perl remembers it already ran > the check so the dependencies and MAKER will just start from then on. > Normally this proccess takes less than 30 seconds; however, on some systems > (especially clusters) there may a heavy IO burden and this process can take a > while. For example does it take a moment for 'ls -al' to return in some > directories rather than returning instantaneously like it is supposed to? If > it takes 3 seconds to return or example, then each dependency check may take > up to 3 seconds. If you just installed a bunch of new perl modules then there > may be a hundred or more dependencies that may have to be validated for the > first time. 
> > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? Also, what version of maker are you running? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be okay, >> when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? >> >> Panos >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 10:25:04 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 18:25:04 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: "ls -al" is instantaneous in all directories... I'll try installing it on my workstation, although it's not possible to do annotation on my machine! And the machine I currently have installed it, is our server and I can't really make any big changes there. Anyway, I'll let you know how it goes. 
P On Wed, May 28, 2014 at 6:16 PM, Carson Holt wrote: > You may also want to look into if you need to reinstall perl on another > drive. > > --Carson > > > From: Carson Holt > Date: Wednesday, May 28, 2014 at 10:15 AM > To: Panos Ioannidis > > Cc: Daniel Ence , "" > > Subject: Re: [maker-devel] Problem with installation > > Normally it takes 30 seconds, but if your IO response is slow (I.e. 3 > seconds per query which is why you should do the 'ls -al' test), it can > take several minutes because it's an IO issue. > > --Carson > > From: Panos Ioannidis > Date: Wednesday, May 28, 2014 at 10:13 AM > To: Carson Holt > Cc: Daniel Ence , "" > > Subject: Re: [maker-devel] Problem with installation > > Hello Daniel and Carson, > > Thank you both for your comments. > > Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes > but still nothing happens. > > Daniel, the same is true for maker -CTL; it appears as if it's doing > something, but if you give a top you'll see that the CPU usage is ALWAYS > 0%. > > Three things that might be helpful: > 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " > 2. when I give top I see that there are two maker processes running. Is > this normal? > 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - > I guess that's memory) for one of the two maker processes go to zero, but > it doesn't go away. > > > > > > On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > >> Perl is a scripting language rather than a compiled language, and one >> thing that happens when you first use a new module or script Is that the >> interpreter follows the dependency tree validating that everything >> executes/loads correctly. Since you installed a number of dependencies and >> MAKER itself, the first time you launch MAKER Perl has to do this check on >> the dependency tree. 
This only happens the first time, and after that Perl >> remembers it already ran the check so the dependencies and MAKER will just >> start from then on. Normally this proccess takes less than 30 seconds; >> however, on some systems (especially clusters) there may a heavy IO burden >> and this process can take a while. For example does it take a moment for >> 'ls -al' to return in some directories rather than returning >> instantaneously like it is supposed to? If it takes 3 seconds to return or >> example, then each dependency check may take up to 3 seconds. If you just >> installed a bunch of new perl modules then there may be a hundred or more >> dependencies that may have to be validated for the first time. >> >> --Carson >> >> >> >> From: Daniel Ence >> Date: Wednesday, May 28, 2014 at 7:29 AM >> To: Panos Ioannidis >> Cc: "" >> Subject: Re: [maker-devel] Problem with installation >> >> Hi Panos, When you go to the src directory and type "./Build status", >> what message do you get? Also, what version of maker are you running? >> >> Thanks, >> Daniel >> >> >> Daniel Ence >> Graduate Student >> dence at genetics.utah.edu >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> On May 28, 2014, at 1:28 AM, Panos Ioannidis >> wrote: >> >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be >> okay, when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or >> error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? 
>>
>> Panos
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>>
>> _______________________________________________ maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Wed May 28 10:28:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 10:28:30 -0600
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Try perlbrew to set up your own local version of perl just for your user. http://perlbrew.pl

--Carson

From: Panos Ioannidis
Date: Wednesday, May 28, 2014 at 10:13 AM
To: Carson Holt
Cc: Daniel Ence, ""
Subject: Re: [maker-devel] Problem with installation

Hello Daniel and Carson,

Thank you both for your comments.

Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens.

Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%.

Three things that might be helpful:
1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 "
2. when I give top I see that there are two maker processes running. Is this normal?
3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away.

On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote:

> Perl is a scripting language rather than a compiled language, and one thing
> that happens when you first use a new module or script is that the interpreter
> follows the dependency tree validating that everything executes/loads
> correctly.
Since you installed a number of dependencies and MAKER itself, the > first time you launch MAKER Perl has to do this check on the dependency tree. > This only happens the first time, and after that Perl remembers it already ran > the check so the dependencies and MAKER will just start from then on. > Normally this proccess takes less than 30 seconds; however, on some systems > (especially clusters) there may a heavy IO burden and this process can take a > while. For example does it take a moment for 'ls -al' to return in some > directories rather than returning instantaneously like it is supposed to? If > it takes 3 seconds to return or example, then each dependency check may take > up to 3 seconds. If you just installed a bunch of new perl modules then there > may be a hundred or more dependencies that may have to be validated for the > first time. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? Also, what version of maker are you running? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be okay, >> when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? 
>>
>> Panos
>> _______________________________________________
>> maker-devel mailing list
>> maker-devel at box290.bluehost.com
>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>
> _______________________________________________ maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From fbarreto at ucsd.edu Wed May 28 11:39:45 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Wed, 28 May 2014 10:39:45 -0700
Subject: [maker-devel] Adding non-overlapping models to final set
Message-ID:

Hi, all,

I finished generating Maker gene models. Following suggestions here and from publications, I used IPRscan on the set of non-overlapping ab initio protein models. This identified ~200 models with protein domains, and I would like to add those to my final gene set.

However, I am having trouble figuring out how to use Maker's options to update my final maker_genome.gff file to include these 200 models, without also adding the remaining ~8000 non-overlapping models I don't want. The discussions about the re-annotation options don't seem to get at this.

Do I have to first find a way to create a new gff file containing only the 200 new models, and then simply use gff3_merge with the full genome gff?

At this point, I am not concerned about incorporating IPRscan functional info into the gff file. I want simply to generate an updated (and final) gene set and then move on to functional annotation.

Thanks yet again!

Felipe
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dence at genetics.utah.edu Wed May 28 12:35:06 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 28 May 2014 18:35:06 +0000
Subject: [maker-devel] Adding non-overlapping models to final set
In-Reply-To:
References:
Message-ID: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>

Hi Felipe, I'm glad to hear that you got some more genes from IPRscan. If you don't care about getting the functional information from the IPRscan report into the gff file, then you just need to pull those predictions out from all the ab-initio predictions that you don't care about and put them in a GFF3 file. Then you put that file in for the "pred_gff" option and set keep_preds=1. That will promote those predictions to full gene models. Then you can merge with your other gff3 file.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On May 28, 2014, at 11:39 AM, Felipe Barreto wrote:

Hi, all,

I finished generating Maker gene models. Following suggestions here and from publications, I used IPRscan on the set of non-overlapping ab initio protein models. This identified ~200 models with protein domains, and I would like to add those to my final gene set.

However, I am having trouble figuring out how to use Maker's options to update my final maker_genome.gff file to include these 200 models, without also adding the remaining ~8000 non-overlapping models I don't want. The discussions about the re-annotation options don't seem to get at this.

Do I have to first find a way to create a new gff file containing only the 200 new models, and then simply use gff3_merge with the full genome gff?

At this point, I am not concerned about incorporating IPRscan functional info into the gff file. I want simply to generate an updated (and final) gene set and then move on to functional annotation.

Thanks yet again!
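The control-file change suggested in this reply can be sketched in Python. This is an illustration only, assuming that maker_opts.ctl consists of `key=value #comment` lines; `set_ctl_options` and the file name `selected_preds.gff` are hypothetical, while `pred_gff` and `keep_preds` are the options named in the message. Editing the file by hand achieves the same thing.

```python
import re


def set_ctl_options(ctl_text, options):
    """Return ctl_text with `key=value` lines updated from `options`.

    Any trailing `#comment` on an updated line is preserved. Keys that do
    not already appear in the file are ignored rather than appended.
    """
    updated = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in options:
            key = match.group(1)
            comment = match.group(3) or ""
            line = f"{key}={options[key]} {comment}".rstrip()
        updated.append(line)
    return "\n".join(updated) + "\n"
```

A possible use, again with placeholder names: read maker_opts.ctl into a string, call `set_ctl_options(text, {"pred_gff": "selected_preds.gff", "keep_preds": 1})`, and write the result to a copy of the control file so the original settings stay available.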
Felipe
_______________________________________________
maker-devel mailing list
maker-devel at box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From carsonhh at gmail.com Wed May 28 12:45:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 12:45:05 -0600
Subject: [maker-devel] Adding non-overlapping models to final set
In-Reply-To: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>
References: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>
Message-ID:

For convenience you can use the attached script to help pull out the match/match_part features you want from the GFF3 file (or you can pull them out yourself). Then do just like Daniel said by setting keep_preds=1 and giving the selected match/match_part features to pred_gff, and your current MAKER models to model_gff.

--Carson

From: Daniel Ence
Date: Wednesday, May 28, 2014 at 12:35 PM
To: Felipe Barreto
Cc: MAKER group
Subject: Re: [maker-devel] Adding non-overlapping models to final set

Hi Felipe, I'm glad to hear that you got some more genes from IPRscan. If you don't care about getting the functional information from the IPRscan report into the gff file, then you just need to pull those predictions out from all the ab-initio predictions that you don't care about and put them in a GFF3 file. Then you put that file in for the "pred_gff" option and set keep_preds=1. That will promote those predictions to full gene models. Then you can merge with your other gff3 file.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

On May 28, 2014, at 11:39 AM, Felipe Barreto wrote:

> Hi, all,
>
> I finished generating Maker gene models.
Following suggestions here and from > publications, I used IPRscan on the set of non-ovelapping ab initio protein > models. This identified ~200 models with protein domains, and I would like to > add those to my final gene set. > > However, I am having trouble figuring out how to use Maker's options to update > my final maker_genome.gff file to include these 200 models, without also > adding the remaining ~8000 non-overlapping models I don't want. The > discussions about the re-annotation options don't seem to get at this. > > Do I have to first find a way to create a new gff file containing only the 200 > new models, and then simply use gff3_merge with the full genome gff? > > At this point, I am not concerned about incorporating IPRscan functional info > into the gff file. I want simply to generate an updated (and final) gene set > and then move on to functional annotation. > > > Thanks yet again! > > Felipe > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gff3_select Type: application/octet-stream Size: 3236 bytes Desc: not available URL: From fbarreto at ucsd.edu Wed May 28 14:28:48 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Wed, 28 May 2014 13:28:48 -0700 Subject: [maker-devel] Adding non-overlapping models to final set In-Reply-To: References: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu> Message-ID: Awesome! Thanks for the tips and script. This should do the trick. Will come back if I get stuck. 
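The selection step discussed in this thread (keeping only the wanted match/match_part features) could look roughly like the sketch below. It is not the attached gff3_select script, whose contents are not shown in the archive. `select_predictions` is a hypothetical helper, and it rests on two assumptions: that the wanted models can be identified by the ID or Name attribute of their match features, and that every match line appears before its match_part children. It emits feature lines only, so a `##gff-version 3` header would need to be written separately.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (no URL-decoding)."""
    attributes = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key] = value
    return attributes


def select_predictions(gff_lines, wanted):
    """Yield match/match_part lines whose match ID or Name is in `wanted`."""
    wanted = set(wanted)
    kept_ids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # a sequence section follows; no more features
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue
        attrs = parse_attributes(columns[8])
        if columns[2] == "match":
            if wanted & {attrs.get("ID"), attrs.get("Name")}:
                if "ID" in attrs:
                    kept_ids.add(attrs["ID"])
                yield line
        elif columns[2] == "match_part":
            # keep only the parts that belong to a selected match
            if attrs.get("Parent") in kept_ids:
                yield line
```

The names passed in `wanted` would come from the IPRscan results, for example the protein identifiers of the roughly 200 models with domains, provided those identifiers match the attributes in the GFF3 file.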
Felipe On Wed, May 28, 2014 at 11:45 AM, Carson Holt wrote: > For convenience you can use the attached script to help pull out the > match/match_part features you want from the GFF3 file (or you can pull them > out yourself). Then do just like Daniel said by setting keep_preds=1 and > giving the selected match/match_part features to pred_gf, and your current > MAKER models to model_gff. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 12:35 PM > To: Felipe Barreto > Cc: MAKER group > Subject: Re: [maker-devel] Adding non-overlapping models to final set > > Hi Felipe, I'm glad to hear that you got some more genes from IPRscan. If > you don't care about getting the functional information from the IPRscan > report and into the gff file, then you just need to pull those predictions > out from all the ab-initio predictions that you don't care about and put > them in a fasta file. Then you put that file in for the "pred_gff" option > and set keep_preds=1. That will promote those predictions to full gene > models. Then you can merge with your other gff3 file. > > ~Daniel > > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 11:39 AM, Felipe Barreto > wrote: > > Hi, all, > > I finished generating Maker gene models. Following suggestions here and > from publications, I used IPRscan on the set of non-ovelapping ab initio > protein models. This identified ~200 models with protein domains, and I > would like to add those to my final gene set. > > However, I am having trouble figuring out how to use Maker's options to > update my final maker_genome.gff file to include these 200 models, without > also adding the remaining ~8000 non-overlapping models I don't want. The > discussions about the re-annotation options don't seem to get at this. 
> > Do I have to first find a way to create a new gff file containing only the > 200 new models, and then simply use gff3_merge with the full genome gff? > > At this point, I am not concerned about incorporating IPRscan functional > info into the gff file. I want simply to generate an updated (and final) > gene set and then move on to functional annotation. > > > Thanks yet again! > > Felipe > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Thu May 29 03:21:24 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Thu, 29 May 2014 11:21:24 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: So I managed to install it on my workstation and it works fine! Thanks for the information on perlbrew. I will also give it a try. I did a test run on my workstation using just a few contigs and was wondering where the annotation is saved. Is it the gff files (one gff per contig) in the *.maker.output/ directory? On Wed, May 28, 2014 at 6:28 PM, Carson Holt wrote: > Try perlbrew to set up yor own local version of perl just for your user. > http://perlbrew.pl > > --Carson > > > From: Panos Ioannidis > Date: Wednesday, May 28, 2014 at 10:13 AM > To: Carson Holt > Cc: Daniel Ence , "" > > > Subject: Re: [maker-devel] Problem with installation > > Hello Daniel and Carson, > > Thank you both for your comments. > > Carson, I gave it a lot more than 30 seconds. 
I gave it about 5 minutes > but still nothing happens. > > Daniel, the same is true for maker -CTL; it appears as if it's doing > something, but if you run top you'll see that the CPU usage is ALWAYS > 0%. > > Three things that might be helpful: > 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " > 2. When I run top I see that there are two maker processes running. Is > this normal? > 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - > I guess that's memory) for one of the two maker processes go to zero, but > it doesn't go away. > > > > > > On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > >> Perl is a scripting language rather than a compiled language, and one >> thing that happens when you first use a new module or script is that the >> interpreter follows the dependency tree validating that everything >> executes/loads correctly. Since you installed a number of dependencies and >> MAKER itself, the first time you launch MAKER Perl has to do this check on >> the dependency tree. This only happens the first time, and after that Perl >> remembers it already ran the check so the dependencies and MAKER will just >> start from then on. Normally this process takes less than 30 seconds; >> however, on some systems (especially clusters) there may be a heavy I/O burden >> and this process can take a while. For example, does it take a moment for >> 'ls -al' to return in some directories rather than returning >> instantaneously like it is supposed to? If it takes 3 seconds to return, for >> example, then each dependency check may take up to 3 seconds. If you just >> installed a bunch of new perl modules then there may be a hundred or more >> dependencies that may have to be validated for the first time.
>> >> --Carson >> >> >> >> From: Daniel Ence >> Date: Wednesday, May 28, 2014 at 7:29 AM >> To: Panos Ioannidis >> Cc: "" >> Subject: Re: [maker-devel] Problem with installation >> >> Hi Panos, When you go to the src directory and type "./Build status", >> what message do you get? Also, what version of maker are you running? >> >> Thanks, >> Daniel >> >> >> Daniel Ence >> Graduate Student >> dence at genetics.utah.edu >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> On May 28, 2014, at 1:28 AM, Panos Ioannidis >> wrote: >> >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be >> okay, when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or >> error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? >> >> Panos From dence at genetics.utah.edu Thu May 29 08:58:22 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 29 May 2014 14:58:22 +0000 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hi Panos, The results are stored in the datastore directory in the "maker.output" directory. You can merge those results into one gff file with the gff3_merge accessory script. It's included in the bin directory.
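For readers wondering what "merging" per-contig GFF3 files amounts to, here is a minimal Python sketch. It is not the gff3_merge implementation, which does more; the function name merge_gff3 and the choice to drop embedded ##FASTA sections are assumptions made only for illustration.

```python
from typing import Iterable, List


def merge_gff3(texts: Iterable[str]) -> str:
    """Concatenate several GFF3 documents into one.

    Hypothetical helper: keeps a single '##gff-version 3' pragma and
    skips any embedded '##FASTA' section in each input.
    """
    out: List[str] = ["##gff-version 3"]
    for text in texts:
        for line in text.splitlines():
            if line.startswith("##FASTA"):
                break  # the rest of this document is sequence, not features
            if not line.strip() or line.startswith("##gff-version"):
                continue  # drop blank lines and repeated version pragmas
            out.append(line)
    return "\n".join(out) + "\n"
```

In a real run the per-contig files would be read from the datastore directory; for anything beyond a quick look, the bundled gff3_merge script is the safer choice.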
~Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 29, 2014, at 3:21 AM, Panos Ioannidis wrote: So I managed to install it on my workstation and it works fine! Thanks for the information on perlbrew. I will also give it a try. I did a test run on my workstation using just a few contigs and was wondering where the annotation is saved. Is it the gff files (one gff per contig) in the *.maker.output/ directory? On Wed, May 28, 2014 at 6:28 PM, Carson Holt wrote: Try perlbrew to set up your own local version of perl just for your user. http://perlbrew.pl --Carson From: Panos Ioannidis Date: Wednesday, May 28, 2014 at 10:13 AM To: Carson Holt Cc: Daniel Ence, "" Subject: Re: [maker-devel] Problem with installation Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you run top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. When I run top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: Perl is a scripting language rather than a compiled language, and one thing that happens when you first use a new module or script is that the interpreter follows the dependency tree validating that everything executes/loads correctly. Since you installed a number of dependencies and MAKER itself, the first time you launch MAKER Perl has to do this check on the dependency tree.
This only happens the first time, and after that Perl remembers it already ran the check so the dependencies and MAKER will just start from then on. Normally this process takes less than 30 seconds; however, on some systems (especially clusters) there may be a heavy I/O burden and this process can take a while. For example, does it take a moment for 'ls -al' to return in some directories rather than returning instantaneously like it is supposed to? If it takes 3 seconds to return, for example, then each dependency check may take up to 3 seconds. If you just installed a bunch of new perl modules then there may be a hundred or more dependencies that may have to be validated for the first time. --Carson From: Daniel Ence Date: Wednesday, May 28, 2014 at 7:29 AM To: Panos Ioannidis Cc: "" Subject: Re: [maker-devel] Problem with installation Hi Panos, When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running? Thanks, Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote: Hello Maker community, I just finished installing Maker and even though everything seems to be okay, when I give ./maker -h or ./maker the program apparently hangs without giving any output or warning or error. Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/. Any ideas?
Panos From caigh02 at gmail.com Thu May 29 13:15:39 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Thu, 29 May 2014 15:15:39 -0400 Subject: [maker-devel] maker gene order in gff output Message-ID: Hi Carson, In the maker output, the genes have names like "genemark-scaffold17-processed-gene-0.0". Many users probably will eventually give the genes different names, such as GSGxxx (Genus Species Gene #). In the gff output, the scaffolds are not in order (either numerical order or the order of input assembly). On the same scaffold, the genes are not listed in order either. This will make it a little harder for users to change the gene IDs. We may name the genes in order from scaffold 1 to scaffold N and, on each scaffold, order the genes from left to right (e.g. GSG00001, GSG00002). Do you think you can order the genes in the gff output? For example, order the scaffolds according to the input genome assembly, and on each scaffold, order the genes from 5' to 3'. Thanks. Guohong Rutgers University From daniel.standage at gmail.com Thu May 29 14:37:24 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 16:37:24 -0400 Subject: [maker-devel] Question about 'keep_pred' setting Message-ID: Good afternoon! I have a quick question about the keep_pred setting in Maker. In older versions of Maker, this was a binary value indicating whether unsupported predictions should be kept.
I'm now using Maker 2.31.3, where it's described as a scaled value indicating a "concordance threshold" for unsupported predictions. As far as I can tell from the code, however, it's still treated in the same way as before. Could you briefly describe the motivation for this setting and the intended (although possibly incomplete) change in its functionality in new versions of Maker? Thanks! -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University From dence at genetics.utah.edu Thu May 29 14:44:28 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 29 May 2014 20:44:28 +0000 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: Message-ID: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Hi Daniel, Your interpretation of the code is correct. keep_preds is a binary setting. There's been some discussion behind-the-scenes about making it more flexible, but that hasn't been implemented yet. We need to fix what it says in the control file. Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 29, 2014, at 2:37 PM, Daniel Standage wrote: Good afternoon! I have a quick question about the keep_pred setting in Maker. In older versions of Maker, this was a binary value indicating whether unsupported predictions should be kept. I'm now using Maker 2.31.3, where it's described as a scaled value indicating a "concordance threshold" for unsupported predictions. As far as I can tell from the code, however, it's still treated in the same way as before. Could you briefly describe the motivation for this setting and the intended (although possibly incomplete) change in its functionality in new versions of Maker? Thanks! -- Daniel S. Standage Ph.D.
Candidate Computational Genome Science Laboratory Indiana University From daniel.standage at gmail.com Thu May 29 14:47:47 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 16:47:47 -0400 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: Thanks. Just curious: how would the intended behavior differ if keep_pred was set to, say, 0.5, instead of 0 or 1? -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 4:44 PM, Daniel Ence wrote: > Hi Daniel, > > Your interpretation of the code is correct. keep_preds is a binary > setting. There's been some discussion behind-the-scenes about making it > more flexible, but that hasn't been implemented yet. We need to fix what it > says in the control file. > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 29, 2014, at 2:37 PM, Daniel Standage > wrote: > > Good afternoon! > > I have a quick question about the keep_pred setting in Maker. In older > versions of Maker, this was a binary value indicating whether unsupported > predictions should be kept. I'm now using Maker 2.31.3, where it's > described as a scaled value indicating a "concordance threshold" for > unsupported predictions. As far as I can tell from the code, however, it's > still treated in the same way as before.
> > Could you briefly describe the motivation for this setting and the > intended (although possibly incomplete) change in its functionality in new > versions of Maker? > > Thanks! > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University From carsonhh at gmail.com Thu May 29 15:43:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 May 2014 15:43:35 -0600 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: There is a hidden score called abAED that measures concordance among the ab initio gene predictors. The idea was to have ab initio models that are the same across multiple ab initio predictors be kept if their group concordance is high enough, then drop ab initio predictions that only happen in one ab initio predictor. Currently the option is all or nothing; the threshold would give more fine-grained control of keeping just some unsupported predictions. --Carson From: Daniel Standage Date: Thursday, May 29, 2014 at 2:47 PM To: Daniel Ence Cc: Maker Mailing List Subject: Re: [maker-devel] Question about 'keep_pred' setting Thanks. Just curious: how would the intended behavior differ if keep_pred was set to, say, 0.5, instead of 0 or 1? -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 4:44 PM, Daniel Ence wrote: > Hi Daniel, > > Your interpretation of the code is correct. keep_preds is a binary setting. > There's been some discussion behind-the-scenes about making it more flexible, > but that hasn't been implemented yet.
We need to fix what it says in the > control file. > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 29, 2014, at 2:37 PM, Daniel Standage > wrote: > >> Good afternoon! >> >> I have a quick question about the keep_pred setting in Maker. In older >> versions of Maker, this was a binary value indicating whether unsupported >> predictions should be kept. I'm now using Maker 2.31.3, where it's described >> as a scaled value indicating a "concordance threshold" for unsupported >> predictions. As far as I can tell from the code, however, it's still treated >> in the same way as before. >> >> Could you briefly describe the motivation for this setting and the intended >> (although possibly incomplete) change in its functionality in new versions of >> Maker? >> >> Thanks! >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University From daniel.standage at gmail.com Thu May 29 16:29:39 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 18:29:39 -0400 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: Ah, that makes sense. Thanks! -- Daniel S. Standage Ph.D.
Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 5:43 PM, Carson Holt wrote: > There is a hidden score called abAED that measures concordance among the > ab initio gene predictors. The idea was to have ab initio models that are > the same across multiple ab initio predictors be kept if their group > concordance is high enough, then drop ab initio predictions that only > happen in one ab initio predictor. Currently the option is all or nothing; > the threshold would give more fine-grained control of keeping just some > unsupported predictions. > > --Carson > > > From: Daniel Standage > Date: Thursday, May 29, 2014 at 2:47 PM > To: Daniel Ence > Cc: Maker Mailing List > Subject: Re: [maker-devel] Question about 'keep_pred' setting > > Thanks. > > Just curious: how would the intended behavior differ if keep_pred was set > to, say, 0.5, instead of 0 or 1? > > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, May 29, 2014 at 4:44 PM, Daniel Ence > wrote: > >> Hi Daniel, >> >> Your interpretation of the code is correct. keep_preds is a binary >> setting. There's been some discussion behind-the-scenes about making it >> more flexible, but that hasn't been implemented yet. We need to fix what it >> says in the control file. >> >> >> Daniel Ence >> Graduate Student >> dence at genetics.utah.edu >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> On May 29, 2014, at 2:37 PM, Daniel Standage >> wrote: >> >> Good afternoon! >> >> I have a quick question about the keep_pred setting in Maker. In older >> versions of Maker, this was a binary value indicating whether unsupported >> predictions should be kept. I'm now using Maker 2.31.3, where it's >> described as a scaled value indicating a "concordance threshold" for >> unsupported predictions.
As far as I can tell from the code, however, it's >> still treated in the same way as before. >> >> Could you briefly describe the motivation for this setting and the >> intended (although possibly incomplete) change in its functionality in new >> versions of Maker? >> >> Thanks! >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University From carsonhh at gmail.com Thu May 29 21:11:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 May 2014 21:11:11 -0600 Subject: [maker-devel] maker gene order in gff output In-Reply-To: References: Message-ID: The maker_map_ids script that comes with MAKER can be used to generate new names of the style PREFIX###### or PREFIX_######. You can use the --sort_order flag to sort the contigs in whatever your preferred order is before generating the new names. Then use the map_gff_ids and map_fasta_ids to change the names in the gff3 and fasta files respectively. Here is some extra information from a tutorial where the maker_map_ids script is used --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Post_Processing_of_Annotations --Carson From: Guohong Cai Date: Thursday, May 29, 2014 at 1:15 PM To: "" Subject: [maker-devel] maker gene order in gff output Hi Carson, In the maker output, the genes have names like "genemark-scaffold17-processed-gene-0.0". Many users probably will eventually give the genes different names, such as GSGxxx (Genus Species Gene #).
In the gff output, the scaffolds are not in order (either numerical order or the order of input assembly). On the same scaffold, the genes are not listed in order either. This will make it a little harder for users to change the gene IDs. We may name the genes in order from scaffold 1 to scaffold N and, on each scaffold, order the genes from left to right (e.g. GSG00001, GSG00002). Do you think you can order the genes in the gff output? For example, order the scaffolds according to the input genome assembly, and on each scaffold, order the genes from 5' to 3'. Thanks. Guohong Rutgers University From caigh02 at gmail.com Fri May 30 05:40:17 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Fri, 30 May 2014 06:40:17 -0500 Subject: [maker-devel] maker gene order in gff output In-Reply-To: References: Message-ID: Great! Guohong On Thu, May 29, 2014 at 10:11 PM, Carson Holt wrote: > The maker_map_ids script that comes with MAKER can be used to generate new > names of the style PREFIX###### or PREFIX_######. You can use > the --sort_order flag to sort the contigs in whatever your preferred order > is before generating the new names. > > Then use the map_gff_ids and map_fasta_ids to change the names in the > gff3 and fasta files respectively.
> > Here is some extra information from a tutorial where the maker_map_ids > script is used --> > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Post_Processing_of_Annotations > > --Carson > > > From: Guohong Cai > Date: Thursday, May 29, 2014 at 1:15 PM > To: "" > Subject: [maker-devel] maker gene order in gff output > > Hi Carson, > > In the maker output, the genes have names like > "genemark-scaffold17-processed-gene-0.0". Many users probably will eventually give the genes > different names, such as GSGxxx (Genus Species Gene #). > > In the gff output, the scaffolds are not in order (either numerical order > or the order of input assembly). On the same scaffold, the genes are not > listed in order either. This will make it a little harder for users to > change the gene IDs. We may name the genes in order from scaffold 1 to > scaffold N and, on each scaffold, order the genes from left to right (e.g. > GSG00001, GSG00002). Do you think you can order the genes in the gff > output? For example, order the scaffolds according to the input genome > assembly, and on each scaffold, order the genes from 5' to 3'. > > Thanks. > > Guohong > Rutgers University From daniel.standage at gmail.com Sat May 31 09:23:23 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Sat, 31 May 2014 11:23:23 -0400 Subject: [maker-devel] Precomputed alignments Message-ID: Hello again! About a year ago I asked about using precomputed alignments with Maker. The thread quickly took a different direction as we tried to track down other issues, and I never got the thread back on its original track.
So, to return to the original question, what exactly is required when providing pre-computed alignments in GFF3 format? For example, does it affect Maker's behavior whether a score is given? The "Target" attribute? The "Gap" attribute? Thanks! -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University From kdelmore at zoology.ubc.ca Thu May 1 09:06:27 2014 From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca) Date: Thu, 1 May 2014 08:06:27 -0700 Subject: [maker-devel] problem with dsindex Message-ID: Hi Carson, I wanted to confirm that the interproscan scripts provided in maker are now compatible with version 5 of the program and ask if there was any additional documentation for the use of iprscan_wrap. It looks like that script will run interproscan for us but I'm not sure what to supply on the command line. I could also run interproscan directly but am wondering if you have any suggestions for what to include on the command line, as this has changed in the new version. This is what I would propose: ./interproscan.sh -i test_proteins.fasta -f gff3 -goterms -iprlookup Thanks, Kira From carsonhh at gmail.com Fri May 2 12:18:04 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 02 May 2014 12:18:04 -0600 Subject: [maker-devel] problem with dsindex In-Reply-To: References: Message-ID: The scripts that use interproscan output should work with version 5 (iprscan2gff3, ipr_update_gff, etc.). But scripts that wrap interproscan and run it for you like iprscan_wrap only work with version 4. Thanks, Carson On 5/1/14, 9:06 AM, "kdelmore at zoology.ubc.ca" wrote: >Hi Carson, > >I wanted to confirm that the interproscan scripts provided in maker are >now compatible with version 5 of the program and ask if there was any >additional documentation for the use of iprscan_wrap.
It looks like that >script will run interproscan for us but I'm not sure what to supply on the >command line. > >I could also run interproscan directly but am wondering if you have any >suggestions for what to include on the command line, as this has changed >in the new version. This is what I would propose: > >./interproscan.sh -i test_proteins.fasta -f gff3 -goterms -iprlookup > >Thanks, >Kira From carsonhh at gmail.com Fri May 2 12:55:27 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 02 May 2014 12:55:27 -0600 Subject: [maker-devel] est_forward and conflicting names In-Reply-To: References: Message-ID: Whichever has the best AED score I believe, but you can add gene_id= to the header of each fasta file to ensure MAKER doesn't try and cluster unrelated transcripts into a single gene. Then the transcript name and gene name will be guaranteed to match up. --Carson From: Shaun Jackman Date: Wednesday, April 30, 2014 at 5:25 PM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] est_forward and conflicting names Hi, Carson. I've downloaded a number of genes from GenBank using Entrez Direct, which I'm using with est and protein to annotate a plant mitochondrion. Most of these reference sequences have sensible and consistent gene names, and so I'm using est_forward to retain the gene names. This workflow is working well for me. Some of the genes pulled in from GenBank have less useful names like orf1234 or other numeric IDs. When multiple evidence sequences map to the same location, how does est_forward choose which name to use? If it's chosen arbitrarily, could it be possible to choose the most common name instead?
Thanks, Shaun From sjackman at gmail.com Fri May 2 13:40:42 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Fri, 2 May 2014 12:40:42 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Hi, Carson. Do you happen to have a patch that I could test out that fixes the naming of the tRNA identified by tRNAscan? Is the MAKER subversion repository public, and if so, what's its URL? Cheers, Shaun Shaun wrote… The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are conventionally named according to the amino acid and anticodon, such as trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the names with that convention? On 6 March 2014 12:58, Carson Holt wrote: Yes. I'll fix the naming. > > Thanks, > Carson From carsonhh at gmail.com Fri May 2 13:50:23 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 02 May 2014 13:50:23 -0600 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: That should already be fixed in the current 2.31.3 download. I'll also send you the subversion credentials in a separate e-mail. Thanks, Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Friday, May 2, 2014 at 1:40 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson.
Do you happen to have a patch that I could test out that fixes the naming of the tRNA identified by tRNAscan? Is the MAKER subversion repository public, and if so, what's its URL? Cheers, Shaun Shaun wrote… > > The integration of MAKER-P with tRNAscan is very useful. The identified genes > are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are > conventionally named according to the amino acid and anticodon, such as > trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the names > with that convention? On 6 March 2014 12:58, Carson Holt wrote: > Yes. I'll fix the naming. > > Thanks, > Carson From sjackman at gmail.com Fri May 2 14:00:22 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Fri, 2 May 2014 13:00:22 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Fantastic. Thanks, Carson. I didn't realize that there was a point release of MAKER. It's not announced on the MAKER home page, which still reports Last Software Update v2.31 (Feb 11, 2014). Where are point releases announced? The static link for MAKER 2.31 reports 403 Forbidden. Is there a new static link for MAKER 2.31.3? Cheers, Shaun On 2 May 2014 12:50, Carson Holt wrote: > That should already be fixed in the current 2.31.3 download. I'll also > send you the subversion credentials in a separate e-mail. > > Thanks, > Carson > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Friday, May 2, 2014 at 1:40 PM > > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. Do you happen to have a patch that I could test out that fixes > the naming of the tRNA identified by tRNAscan? > > Is the MAKER subversion repository public, and if so, what's its URL? > > Cheers, > Shaun > > Shaun wrote…
> > The integration of MAKER-P with tRNAscan is very useful. The identified > genes are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are > conventionally named according to the amino acid and anticodon, such as > trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the > names with that convention? > > On 6 March 2014 12:58, Carson Holt wrote: > > Yes. I'll fix the naming. >> >> Thanks, >> Carson >> > From carsonhh at gmail.com Fri May 2 14:14:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 02 May 2014 14:14:11 -0600 Subject: [maker-devel] Mapping gene names Message-ID: I need to fix that last update tag. I did a point release, because there were a couple of very minor fixes that didn't justify a full release (tRNA naming and a fasta_merge bug for tRNAs - I think three lines total of code). There won't be another major version release for a while because we're working on MAKER-EVM which will be version 3.0 (joint project for full MAKER integration with EVM). So just point releases on 2.31 (which will be the very last version of MAKER2). I'll fix the static link and add a new one for 2.31.3. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Friday, May 2, 2014 at 2:00 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Fantastic. Thanks, Carson. I didn't realize that there was a point release of MAKER. It's not announced on the MAKER home page, which still reports Last Software Update v2.31 (Feb 11, 2014). Where are point releases announced? The static link for MAKER 2.31 reports 403 Forbidden. Is there a new static link for MAKER 2.31.3? Cheers, Shaun On 2 May 2014 12:50, Carson Holt wrote: > That should already be fixed in the current 2.31.3 download. I'll also send > you the subversion credentials in a separate e-mail.
> > Thanks, > Carson > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Friday, May 2, 2014 at 1:40 PM > > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. Do you happen to have a patch that I could test out that fixes the > naming of the tRNA identified by tRNAscan? > > Is the MAKER subversion repository public, and if so, what?s its URL? > > Cheers, > Shaun > > Shaun wrote? >> >> The integration of MAKER-P with tRNAscan is very useful. The identified genes >> are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are >> conventionally named according to the amino acid and anticodon, such as >> trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the names >> with that convention? > > On 6 March 2014 12:58, Carson Holt wrote: > >> Yes. I?ll fix the naming. >> >> Thanks, >> Carson -------------- next part -------------- An HTML attachment was scrubbed... URL: From cynsb1987 at gmail.com Sun May 4 19:58:33 2014 From: cynsb1987 at gmail.com (hueytyng) Date: Mon, 5 May 2014 11:58:33 +1000 Subject: [maker-devel] Non-unique top level ID Message-ID: Hi Carson, I ran MAKER using RNAseq as evidence (tophat+cufflinks). The gff file is provided to Maker under "est_gff". Maker runs fine but there are a few failed contigs, and these error messages in my log: ERROR: Non-unique top level ID for 1:JUNC00010801:0 While this is technically legal in GFF3, it usually indicates a poorly fomatted GFF3 file (perhaps you tried to merge two GFF3 files without accounting for unique IDs). MAKER will not handle these correctly. --> rank=2, hostname=safs-raijen ERROR: Failed while prepare section files ERROR: Chunk failed at level:12, tier_type:3 FAILED CONTIG:scaffold11129|size28423 I do see multiple IDs in my gff. I have 9 RNAseq samples, is the way I merged them causing the error? This is what I've done to prepare the gff: 1. 
merge cuffmerge output
   cuffmerge -o -p 4 assembly_list.txt
   cufflinks2gff3 merged.gtf > merged.gff

2. merge junctions
   find -name "junctions.bed" -exec cat {} \; >> all_junctions.bed
   tophat2gff3 all_junctions.bed > all_junctions.gff

3. combine cuffmerge and junctions
   gff3_merge -o tophatandcufflinks.gff merged.gff all_junctions.gff

4. provide in opts file
   est_gff=tophatandcufflinks.gff #EST evidence from an external gff3 file

Thank you
Jenny

From carsonhh at gmail.com Mon May 5 08:18:18 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 08:18:18 -0600
Subject: [maker-devel] Non-unique top level ID
In-Reply-To:
References:
Message-ID:

If you use gff3_merge with the -l flag, then it will check for non-unique IDs and give new IDs to make them unique. Also, in general it is better just to use the cufflinks results and exclude the tophat results, as they tend to be very noisy and decrease the quality of the final models overall.

Thanks,
Carson

From: hueytyng
Date: Sunday, May 4, 2014 at 7:58 PM
To:
Subject: [maker-devel] Non-unique top level ID

Hi Carson,

I ran MAKER using RNAseq as evidence (tophat+cufflinks). The gff file is provided to Maker under "est_gff". Maker runs fine but there are a few failed contigs, and these error messages in my log:

ERROR: Non-unique top level ID for 1:JUNC00010801:0
While this is technically legal in GFF3, it usually
indicates a poorly fomatted GFF3 file (perhaps you
tried to merge two GFF3 files without accounting for
unique IDs). MAKER will not handle these correctly.
--> rank=2, hostname=safs-raijen
ERROR: Failed while prepare section files
ERROR: Chunk failed at level:12, tier_type:3
FAILED CONTIG:scaffold11129|size28423

I do see multiple IDs in my gff. I have 9 RNAseq samples, is the way I merged them causing the error? This is what I've done to prepare the gff:

1.
merge cuffmerge output cuffmerge -o -p 4 assembly_list.txt cufflinks2gff3 merged.gtf > merged.gff 2. merge junctions find -name "junctions.bed" -exec cat {} \; >> all_junctions.bed tophat2gff3 all_junctions.bed > all_junctions.gff 3. combine cuffmerge and junctions gff3_merge -o tophatandcufflinks.gff merged.gff all_junctions.gff 4. provide in opts file est_gff=tophatandcufflinks.gff #EST evidence from an external gff3 file Thank you Jenny _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From online at davemessina.com Mon May 5 10:48:30 2014 From: online at davemessina.com (Dave Messina) Date: Mon, 5 May 2014 11:48:30 -0500 Subject: [maker-devel] MAKER / RepeatRunner configuration issue Message-ID: Hi, Even with the sample data, I'm getting a "Sequence contains no data" error from blastx during the RepeatRunner phase. I've uploaded a tarball with my run on the dpp sample data to the MAKER File Upload site (filename maker_test.tgz). Could you please take a look and give me your thoughts? Thanks! Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 10:53:09 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 10:53:09 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Use BLAST+ version 2.2.28. Also Make sure you are not using an old version of MAKER (2.31.3 is current). ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ --Carson From: Dave Messina Date: Monday, May 5, 2014 at 10:48 AM To: Subject: [maker-devel] MAKER / RepeatRunner configuration issue Hi, Even with the sample data, I'm getting a "Sequence contains no data" error from blastx during the RepeatRunner phase. 
I've uploaded a tarball with my run on the dpp sample data to the MAKER File Upload site (filename maker_test.tgz). Could you please take a look and give me your thoughts? Thanks! Dave _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From online at davemessina.com Mon May 5 12:05:54 2014 From: online at davemessina.com (Dave Messina) Date: Mon, 5 May 2014 13:05:54 -0500 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Thanks for your quick reply, Carson. I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue. I've uploaded a new tarball containing the latest (failed) output on the dpp sample data. Any thoughts you have on how to resolve this would be great. Thanks! Dave On Mon, May 5, 2014 at 11:53 AM, Carson Holt wrote: > Use BLAST+ version 2.2.28. Also Make sure you are not using an old > version of MAKER (2.31.3 is current). > > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ > > --Carson > > > From: Dave Messina > Date: Monday, May 5, 2014 at 10:48 AM > To: > Subject: [maker-devel] MAKER / RepeatRunner configuration issue > > Hi, > > Even with the sample data, I'm getting a "Sequence contains no data" error > from blastx during the RepeatRunner phase. > > I've uploaded a tarball with my run on the dpp sample data to the MAKER > File Upload site (filename maker_test.tgz). > > Could you please take a look and give me your thoughts? > > > Thanks! > Dave > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Mon May 5 13:32:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 13:32:01 -0600
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

I can't reproduce your issue, so it is probably something about your system or environment.

1. Is your /tmp directory full (or whatever your $TMPDIR environment variable is set to)? Use 'df -h /tmp' to check.
2. Are you running in a directory on an NFS drive? Is it true NFS or is it something like FUSE?
3. Is your current working directory full?
4. Are you setting TMP= in the control files to either an NFS mounted location or an in-memory mounted location? Same issue if you are setting the system's TMPDIR environment variable to one of these.
5. Is your default /tmp directory in fact locally mounted (some clusters set this to in-memory scratch)?
6. Even though you already checked, humor me and run this exact command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

--Carson

From: Dave Messina
Date: Monday, May 5, 2014 at 12:05 PM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue

Thanks for your quick reply, Carson.

I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue.

I've uploaded a new tarball containing the latest (failed) output on the dpp sample data.

Any thoughts you have on how to resolve this would be great.

Thanks!
Dave

On Mon, May 5, 2014 at 11:53 AM, Carson Holt wrote:
> Use BLAST+ version 2.2.28. Also make sure you are not using an old version of
> MAKER (2.31.3 is current).
> > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ > > --Carson > > > From: Dave Messina > Date: Monday, May 5, 2014 at 10:48 AM > To: > Subject: [maker-devel] MAKER / RepeatRunner configuration issue > > Hi, > > Even with the sample data, I'm getting a "Sequence contains no data" error > from blastx during the RepeatRunner phase. > > I've uploaded a tarball with my run on the dpp sample data to the MAKER File > Upload site (filename maker_test.tgz). > > Could you please take a look and give me your thoughts? > > > Thanks! > Dave > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 13:44:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 13:44:11 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Could you give me the full output of this command --> df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A I'm really mostly interested in the mount information. Some non-traditional network storage implementations can induce odd behaviors (for example by not supporting operations like hard links, etc.). --Carson From: Dave Messina Date: Monday, May 5, 2014 at 12:05 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Thanks for your quick reply, Carson. I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue. I've uploaded a new tarball containing the latest (failed) output on the dpp sample data. Any thoughts you have on how to resolve this would be great. Thanks! Dave On Mon, May 5, 2014 at 11:53 AM, Carson Holt wrote: > Use BLAST+ version 2.2.28. 
Also Make sure you are not using an old version of > MAKER (2.31.3 is current). > > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ > > --Carson > > > From: Dave Messina > Date: Monday, May 5, 2014 at 10:48 AM > To: > Subject: [maker-devel] MAKER / RepeatRunner configuration issue > > Hi, > > Even with the sample data, I'm getting a "Sequence contains no data" error > from blastx during the RepeatRunner phase. > > I've uploaded a tarball with my run on the dpp sample data to the MAKER File > Upload site (filename maker_test.tgz). > > Could you please take a look and give me your thoughts? > > > Thanks! > Dave > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From online at davemessina.com Mon May 5 13:53:58 2014 From: online at davemessina.com (Dave Messina) Date: Mon, 5 May 2014 14:53:58 -0500 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A > Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > > 1. Is you /tmp directory full (or whatever you have $TMPDIR > environmental variable is set to). Use 'df -h /tmp' to check. > > $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > > 1. Are you running in a directory on an NFS drive? Is it true NFS or > is it something like FUSE. > > Same error on true NFS or on local disk. > > 1. 
Is your current working directory full. > > No. > > 1. Are you setting TMP= in the control files to either an NFS mounted > location or an in memory mounted location. Same issue if you are setting > the system's TMPDIR environmental variable to one of these. > > I tried setting it to /tmp just to be sure (no difference). > > 1. Is your default /tmp directory in fact locally mounted (some > clusters set this to in memory scratch). > > Yes. > > 1. Even though you already checked, humor me and run this exact > command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx > -version > > $ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version blastx: 2.2.28+ Package: blast 2.2.28, build Mar 12 2013 16:52:31 Thanks so much for your help. Best, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 14:00:57 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 14:00:57 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. 
Is your /tmp directory full (or whatever your $TMPDIR environment
> variable is set to)? Use 'df -h /tmp' to check.

$ df -h /tmp
Filesystem                 Type  Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_d5-lv_root  ext4  50G   8.9G  38G    19%   /

> 2. Are you running in a directory on an NFS drive? Is it true NFS or is it
> something like FUSE?

Same error on true NFS or on local disk.

> 3. Is your current working directory full?

No.

> 4. Are you setting TMP= in the control files to either an NFS mounted location
> or an in-memory mounted location? Same issue if you are setting the system's
> TMPDIR environment variable to one of these.

I tried setting it to /tmp just to be sure (no difference).

> 5. Is your default /tmp directory in fact locally mounted (some clusters set
> this to in-memory scratch)?

Yes.

> 6. Even though you already checked, humor me and run this exact command -->
> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

$ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version
blastx: 2.2.28+
Package: blast 2.2.28, build Mar 12 2013 16:52:31

Thanks so much for your help.

Best,
Dave

From carsonhh at gmail.com Mon May 5 16:34:14 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 16:34:14 -0600
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

After logging in I found the issue. You have a broken BioPerl build, specifically Bio::DB::Fasta. Quite some time ago, there was a download direct from the BioPerl website that was broken, and I think you may have that broken version. Just update to the current CPAN version. I was able to run fine when I forced MAKER to use a path I made for the newer version of BioPerl.

You can delete my credentials now.
Thanks, Carson From: Carson Holt Date: Monday, May 5, 2014 at 2:00 PM To: Dave Messina Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Is you /tmp directory full (or whatever you have $TMPDIR environmental > variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Are you running in a directory on an NFS drive? Is it true NFS or is it > something like FUSE. Same error on true NFS or on local disk. > 1. Is your current working directory full. No. > 1. Are you setting TMP= in the control files to either an NFS mounted location > or an in memory mounted location. Same issue if you are setting the system's > TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). > 1. Is your default /tmp directory in fact locally mounted (some clusters set > this to in memory scratch). Yes. > 1. 
Even though you already checked, humor me and run this exact command -->
> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

$ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version
blastx: 2.2.28+
Package: blast 2.2.28, build Mar 12 2013 16:52:31

Thanks so much for your help.

Best,
Dave

From sjackman at gmail.com Mon May 5 18:09:41 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Mon, 5 May 2014 17:09:41 -0700
Subject: [maker-devel] Fewer genes in MAKER 2.31.3
Message-ID:

Hi, Carson.

I'm annotating a 6 Mbp plant mitochondrial genome using GenBank coding nucleotide and protein sequences from related species. I'm seeing 50 genes annotated using MAKER 2.31, and 37 genes annotated using MAKER 2.31.3. The missing genes look good based on the evidence. I see protein_match evidence in the 2.31.3 GFF file, but no resulting gene and mRNA.

Is there a ChangeLog indicating the changes from 2.31 to 2.31.3? Do you know of a change that might cause this? What information can I give you that would help debug this? My maker_opts.ctl file follows.

Cheers,
Shaun

#-----Genome (these are always required)
genome=pg29mt-concat.fa #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est=cds_na.fa #set of ESTs or assembled mRNA-seq in fasta format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=cds_aa.fa #protein sequence file in fasta format (i.e.
from mutiple oransisms) #-----Repeat Masking (leave values blank to skip repeat masking) model_org=picea #select a model organism for RepBase masking in RepeatMasker rmlib=rmlib.fa #provide an organism specific repeat library in fasta format for RepeatMasker repeat_protein=/usr/local/opt/maker/libexec/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner #-----Gene Prediction est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no #-----External Application Behavior Options cpus=4 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI) #-----MAKER Behavior Options est_forward=1 #map names and attributes forward from EST evidence, 1 = yes, 0 = no single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no -------------- next part -------------- An HTML attachment was scrubbed... URL: From myandell at genetics.utah.edu Mon May 5 23:06:25 2014 From: myandell at genetics.utah.edu (Mark Yandell) Date: Tue, 6 May 2014 05:06:25 +0000 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: , Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365FB90A5@mxb2.hg.genetics.utah.edu> you are the Man, Carson. --mark ________________________________________ From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com] Sent: Monday, May 05, 2014 4:34 PM To: Dave Messina Cc: maker-devel at yandell-lab.org Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue After logging in I found the issue. You have a broken BioPerl build. Specifically Bio::DB::Fasta. Quite some time ago, there was a download direct from the BioPerl website that was broken and I think you may have that broken version. Just update to the current CPAN version. 
I was able to run fine when I forced MAKER to use a path I made for the the newer version of BioPerl. You can delete my credentials now. Thanks, Carson From: Carson Holt > Date: Monday, May 5, 2014 at 2:00 PM To: Dave Messina > Cc: > Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina > Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt > Cc: > Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt > wrote: df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / 1. Is you /tmp directory full (or whatever you have $TMPDIR environmental variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / 1. Are you running in a directory on an NFS drive? Is it true NFS or is it something like FUSE. Same error on true NFS or on local disk. 1. Is your current working directory full. No. 1. Are you setting TMP= in the control files to either an NFS mounted location or an in memory mounted location. Same issue if you are setting the system's TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). 1. Is your default /tmp directory in fact locally mounted (some clusters set this to in memory scratch). Yes. 1. 
Even though you already checked, humor me and run this exact command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

$ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version
blastx: 2.2.28+
Package: blast 2.2.28, build Mar 12 2013 16:52:31

Thanks so much for your help.

Best,
Dave

From kdelmore at zoology.ubc.ca Mon May 5 22:36:41 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Mon, 5 May 2014 21:36:41 -0700
Subject: [maker-devel] iprscan and ipr_update_gff
Message-ID: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>

Hi, I have a question about the interproscan scripts available with maker.

I'm following the recommendations posted by Carson in Aug 2011 to incorporate results from iprscan. I'm getting quite a few warning messages with ipr_update_gff; they're all the same and suggest that there's no value for $name. When I look through the updated gff, however, the dbxrefs have been added. Is this something I should be worried about?

I'm using iprscan version 5 and actually get some warning messages there as well but again, the output looks alright. In addition, some of my fastas don't get these warnings in iprscan and they still give me the error with ipr_update_gff so I don't think that's the problem. I'm using proteins from UniProt. My commands and errors are below. I've also attached the first 20000 lines from my initial gff and raw file from iprscan.

Thanks, I really appreciate your continued support.
Kira

###

commands for interproscan scripts available in maker
iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff
gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff
ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace

error after last step (just an example, a ton of similar lines):
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776.

###

commands for interproscan 5
interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \
  > interpro_6.out 2>&1
interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml

error after first step:
04/05/2014 19:22:09:269 25% completed
04/05/2014 21:27:36:305 50% completed
04/05/2014 21:32:34:236 75% completed
04/05/2014 21:38:01:379 90% completed
2014-05-04 21:50:22,761 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:248] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959/jobPIRSF-2.84
2014-05-04 21:50:22,908 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:253] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959
04/05/2014 21:50:23:380 100% done: InterProScan analyses completed

error after second step:
interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0
05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode...
2014-05-05 21:04:00,603 [uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run completion, unable to delete temporary directory /home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_210353293_gsjh

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6.maker.proteins.fasta.xml.raw
Type: application/octet-stream
Size: 1098375 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6_first20000.gff
Type: application/octet-stream
Size: 2880873 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue May 6 08:31:55 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 08:31:55 -0600
Subject: [maker-devel] Fewer genes in MAKER 2.31.3
In-Reply-To:
References:
Message-ID:

Nothing in the scoring or gene selection has changed.

Changes are:
Fix trnascan naming so codon is included in name
Fix fgenesh parsing when used with correct_est_fusion
Fix final ID bug when '/' character used in GFF3 input ID.
Fix a start codon issue that could come up when the right set of parameters were used (primarily correct_est_fusion and protein2genome).

If you can provide both gff3 outputs for comparison, I could probably tell you why. Set up both runs to make sure that settings are indeed identical.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Monday, May 5, 2014 at 6:09 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Fewer genes in MAKER 2.31.3

Hi, Carson.

I'm annotating a 6 Mbp plant mitochondrial genome using GenBank coding nucleotide and protein sequences from related species. I'm seeing 50 genes annotated using MAKER 2.31, and 37 genes annotated using MAKER 2.31.3. The missing genes look good based on the evidence. I see protein_match evidence in the 2.31.3 GFF file, but no resulting gene and mRNA.

Is there a ChangeLog indicating the changes from 2.31 to 2.31.3? Do you know of a change that might cause this? What information can I give you that would help debug this? My maker_opts.ctl file follows.

Cheers,
Shaun

#-----Genome (these are always required)
genome=pg29mt-concat.fa #genome sequence (fasta file or fasta embeded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic.
Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est=cds_na.fa #set of ESTs or assembled mRNA-seq in fasta format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=cds_aa.fa #protein sequence file in fasta format (i.e. from mutiple oransisms)

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=picea #select a model organism for RepBase masking in RepeatMasker
rmlib=rmlib.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/usr/local/opt/maker/libexec/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner

#-----Gene Prediction
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no

#-----External Application Behavior Options
cpus=4 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
est_forward=1 #map names and attributes forward from EST evidence, 1 = yes, 0 = no
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no

From carsonhh at gmail.com Tue May 6 08:57:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 08:57:04 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
Message-ID:

You have entries in your interproscan output that aren't in your GFF3. Is your GFF3 file truncated?
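
One rough way to check for this is to list the protein IDs that appear in the InterProScan raw file but have no matching mRNA in the GFF3. The Python below is only a sketch and is not part of MAKER. It assumes the raw file is tab-delimited with the protein ID in the first column, and that the protein IDs correspond to the ID or Name attribute of mRNA features in the GFF3. The function names are made up for illustration.

```python
import sys
from urllib.parse import unquote


def gff3_mrna_names(lines):
    """Collect ID and Name attribute values from mRNA features in GFF3 text."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for pair in cols[8].split(";"):
            key, _, value = pair.partition("=")
            if key in ("ID", "Name") and value:
                names.add(unquote(value))  # GFF3 attribute values may be URL-escaped
    return names


def ipr_protein_ids(lines):
    """Collect the first tab-delimited column (assumed to be the protein ID)."""
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def missing_from_gff(gff_lines, ipr_lines):
    """Return sorted protein IDs found in the InterProScan output but not in the GFF3."""
    return sorted(ipr_protein_ids(ipr_lines) - gff3_mrna_names(gff_lines))


# Command-line use (hypothetical script name): python check_ids.py in.gff in.raw
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as gff, open(sys.argv[2]) as ipr:
        for protein_id in missing_from_gff(gff, ipr):
            print(protein_id)
```

Run against the full GFF3 rather than a 20000-line excerpt, an empty result would suggest the IDs line up; any IDs printed are candidates for the entries that trigger the warning.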
--Carson


On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" wrote:

>Hi, I have a question about the interproscan scripts available with maker.
>
>I'm following the recommendations posted by Carson in Aug 2011 to
>incorporate results from iprscan. I'm getting quite a few warning messages
>with ipr_update_gff; they're all the same and suggest that there's no
>value for $name. When I look through the updated gff, however, the dbxrefs
>have been added. Is this something I should be worried about?
>
>I'm using iprscan version 5 and actually get some warning messages there
>as well but again, the output looks alright. In addition, some of my
>fastas don't get these warnings in iprscan and they still give me the
>error with ipr_update_gff so I don't think that's the problem. I'm using
>proteins from UniProt. My commands and errors are below. I've also
>attached the first 20000 lines from my initial gff and raw file from
>iprscan.
>
>Thanks, I really appreciate your continued support.
>Kira
>
>###
>
>commands for interproscan scripts available in maker
>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff
>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff
>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace
>
>error after last step (just an example, a ton of similar lines):
>Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242.
>Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353.
>Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674.
>Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776.
>
>
>###
>
>commands for interproscan 5
>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \
>    > interpro_6.out 2>&1
>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
>
>error after first step:
>04/05/2014 19:22:09:269 25% completed
>04/05/2014 21:27:36:305 50% completed
>04/05/2014 21:32:34:236 75% completed
>04/05/2014 21:38:01:379 90% completed
>2014-05-04 21:50:22,761 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:248] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959/jobPIRSF-2.84
>2014-05-04 21:50:22,908 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:253] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959
>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed
>
>error after second step:
>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0
>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode...
>2014-05-05 21:04:00,603 [uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run completion, unable to delete temporary directory /home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_210353293_gsjh

From kdelmore at zoology.ubc.ca Tue May 6 09:06:56 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Tue, 6 May 2014 08:06:56 -0700
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To:
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
Message-ID: <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca>

Thanks for your reply. I have not truncated the gff3. I'm using files from the datastore that were written at the same time, so I'm not sure how that would happen. I split my multifasta before running it through maker and have not merged the gff or protein.fasta for iprscan. That wouldn't be the problem, would it?

From carsonhh at gmail.com Tue May 6 09:09:13 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 09:09:13 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca>
Message-ID:

The file you sent was missing the ##FASTA entry and all of the sequence at the bottom, for example. Is that the way it is in the datastore?

--Carson

From kdelmore at zoology.ubc.ca Tue May 6 09:26:07 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Tue, 6 May 2014 08:26:07 -0700
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To:
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca>
Message-ID: <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>

I just printed the first 20000 lines of the gff to send to you because it was too large to send through email. I've included a dropbox link to the full file below. I've also included a link to the final gff with dbxrefs; as I mentioned, it does seem to add them even with the error. If I run ipr_update_gff twice, I get the warnings on the first run but not on the second. Does that help diagnose the problem?

The only other red flag I've encountered with maker was in including external gff3 from geneid and sgp2. These gff3s failed validation at the website suggested in the README file, with the warning message "cds: non-unique id" for all cds, but maker didn't give me a warning and they seem to be incorporated into the annotation fine.

original gff
https://www.dropbox.com/s/nimoh605jdk9myx/6.gff

final gff
https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta

Thanks again for getting back to me.

From carsonhh at gmail.com Tue May 6 09:47:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 09:47:23 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Ok. With the full file I can see what was causing the message. It is a parsing bug that was happening in a few cases, and I've now fixed it. But you can ignore it, because it has no effect on the output.

It would only be an issue if the ID= and Name= tags were different in the GFF3 for the gene feature lines (which is never true for MAKER's output). It was correctly parsing the 'mRNA' Name and ID tags, but was sometimes having issues with the Name= tags for the 'gene' lines (but because they are redundant with the ID= tag, the script still finds what it needs to add the Dbxref= tags).

--Carson

From carsonhh at gmail.com Tue May 6 09:54:41 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 09:54:41 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To:
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Actually, looking a little closer, it wouldn't even matter if the ID= and Name= tags were different for the 'gene', because interproscan gives the results for the transcripts (mRNA) and not the gene. So Dbxref still gets populated correctly regardless.

--Carson

From sjackman at gmail.com Thu May 8 16:26:34 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Thu, 8 May 2014 15:26:34 -0700
Subject: [maker-devel] est_forward and conflicting names
In-Reply-To:
References:
Message-ID:

Hi, Carson.

Could you give an example of how to add gene_id= to the header of the FASTA file? I'm not clear on what you mean by this. In the FASTA header, what portion is the transcript name, and what portion is the gene name?

Cheers,
Shaun
http://sjackman.ca

On 2 May 2014 11:55, Carson Holt wrote:

> Whichever has the best AED score I believe, but you can add gene_id= to
> the header of each fasta file to ensure MAKER doesn't try and cluster
> unrelated transcripts into a single gene. Then the transcript name and
> gene name will be guaranteed to match up.
>
> --Carson
>
>
> From: Shaun Jackman
> Date: Wednesday, April 30, 2014 at 5:25 PM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] est_forward and conflicting names
>
> Hi, Carson.
>
> I've downloaded a number of genes from GenBank using Entrez Direct, which I'm
> using with est and protein to annotate a plant mitochondrion. Most of
> these reference sequences have sensible and consistent gene names, and so
> I'm using est_forward to retain the gene names. This workflow is working
> well for me. Some of the genes pulled in from GenBank have less useful
> names like orf1234 or other numeric IDs.
When multiple evidence sequences > map to the same location, how does est_forward choose which name to use? > If it?s chosen arbitrarily, could it be possible to choose the most common > name instead? > > Thanks, > Shaun > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 8 16:33:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 08 May 2014 16:33:36 -0600 Subject: [maker-devel] est_forward and conflicting names In-Reply-To: References: Message-ID: When moving transcripts onto a new assembly, you may have multiple transcripts of the same gene. Because your transcript name should be your fasta ID there is no way for MAKER to know that they go together when moving the models forward, so you can use the gene= option to make MAKER aware that these belong to the same genes. They will be grouped and you recover all splice forms as a group. Example: >SMEDT_00004 gene=dpp AAAAAAA >SMEDT_00005 gene=dpp AAAAAAA --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, May 8, 2014 at 4:26 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] est_forward and conflicting names Hi, Carson. Could you give an example of how to add gene_id= to the header of the FASTA file? I?m not clear on what you mean by this. In the FASTA header, what portion is the transcript name, and what portion is the gene name? Cheers, Shaun http://sjackman.ca On 2 May 2014 11:55, Carson Holt wrote: > Whichever has the best AED score I believe, but you can add gene_id= to the > header of each fasta file to ensure MAKER doesn't try and cluster unrelated > transcripts into a single gene. Then the transcript name and gene name will > be guaranteed to match up. 
> > --Carson > > > From: Shaun Jackman > Date: Wednesday, April 30, 2014 at 5:25 PM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] est_forward and conflicting names > > Hi, Carson. > > I?ve downloaded a number genes from GenBank using Entrez Direct, which I?m > using with est and protein to annotate a plant mitochondrion. Most of these > reference sequences have sensible and consistent gene names, and so I?m using > est_forward to retain the gene names. This workflow is working well for me. > Some of the genes pulled in from GenBank have less useful names like orf1234 > or other numeric IDs. When multiple evidence sequences map to the same > location, how does est_forward choose which name to use? If it?s chosen > arbitrarily, could it be possible to choose the most common name instead? > > Thanks, > Shaun > > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Thu May 8 16:41:41 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 8 May 2014 15:41:41 -0700 Subject: [maker-devel] est_forward and conflicting names In-Reply-To: References: Message-ID: Interesting. Thanks for the clarification. I?m working on a plant mitochondrion, and so as far as I know, there?s no alternative splicing. My protein FASTA file is composed of the protein sequences of ~100 species downloaded from GenBank. It looks like this: >cox1|lcl|KJ461445.1_cdsid_AHY20320.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=AHY20320.1] [location=complement(59212..60795)] ? >cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1] [protein_id=ACA62629.1] [location=245282..246856] ? 
>cox1|lcl|NC_023103.1_cdsid_YP_008964124.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_008964124.1] [location=join(317824..318438,319511..320368)] ? I?m not sure that I actually want the fancy behaviour that you describe, though it probably wouldn?t hurt anything. Will this FASTA format trigger the fancy behaviour? Cheers, Shaun *http://sjackman.ca * On 8 May 2014 15:33, Carson Holt wrote: > When moving transcripts onto a new assembly, you may have multiple > transcripts of the same gene. Because your transcript name should be your > fasta ID there is no way for MAKER to know that they go together when > moving the models forward, so you can use the gene= option to make MAKER > aware that these belong to the same genes. They will be grouped and you > recover all splice forms as a group. > > Example: > > >SMEDT_00004 gene=dpp > AAAAAAA > > >SMEDT_00005 gene=dpp > AAAAAAA > > --Carson > > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Thursday, May 8, 2014 at 4:26 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] est_forward and conflicting names > > Hi, Carson. Could you give an example of how to add gene_id= to the > header of the FASTA file? I?m not clear on what you mean by this. In the > FASTA header, what portion is the transcript name, and what portion is the > gene name? > > Cheers, > Shaun > > *http://sjackman.ca * > > > On 2 May 2014 11:55, Carson Holt wrote: > >> Whichever has the best AED score I believe, but you can add gene_id= to >> the header of each fasta file to ensure MAKER doesn't try and cluster >> unrelated transcripts into a single gene. Then the transcript name and >> gene name will be guaranteed to match up. >> >> --Carson >> >> >> From: Shaun Jackman >> Date: Wednesday, April 30, 2014 at 5:25 PM >> To: "maker-devel at yandell-lab.org" >> Subject: [maker-devel] est_forward and conflicting names >> >> Hi, Carson. 
>> >> I?ve downloaded a number genes from GenBank using Entrez Direct, which >> I?m using with est and protein to annotate a plant mitochondrion. Most >> of these reference sequences have sensible and consistent gene names, and >> so I?m using est_forward to retain the gene names. This workflow is >> working well for me. Some of the genes pulled in from GenBank have less >> useful names like orf1234 or other numeric IDs. When multiple evidence >> sequences map to the same location, how does est_forward choose which >> name to use? If it?s chosen arbitrarily, could it be possible to choose the >> most common name instead? >> >> Thanks, >> Shaun >> >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 8 16:43:40 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 08 May 2014 16:43:40 -0600 Subject: [maker-devel] est_forward and conflicting names In-Reply-To: References: Message-ID: Only if you were to remove the brackets around gene=. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, May 8, 2014 at 4:41 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] est_forward and conflicting names Interesting. Thanks for the clarification. I?m working on a plant mitochondrion, and so as far as I know, there?s no alternative splicing. My protein FASTA file is composed of the protein sequences of ~100 species downloaded from GenBank. It looks like this: >cox1|lcl|KJ461445.1_cdsid_AHY20320.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=AHY20320.1] [location=complement(59212..60795)] ? >cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1] [protein_id=ACA62629.1] [location=245282..246856] ? 
>cox1|lcl|NC_023103.1_cdsid_YP_008964124.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_008964124.1] [location=join(317824..318438,319511..320368)] ? I?m not sure that I actually want the fancy behaviour that you describe, though it probably wouldn?t hurt anything. Will this FASTA format trigger the fancy behaviour? Cheers, Shaun http://sjackman.ca On 8 May 2014 15:33, Carson Holt wrote: > When moving transcripts onto a new assembly, you may have multiple transcripts > of the same gene. Because your transcript name should be your fasta ID there > is no way for MAKER to know that they go together when moving the models > forward, so you can use the gene= option to make MAKER aware that these belong > to the same genes. They will be grouped and you recover all splice forms as a > group. > > Example: > >> >SMEDT_00004 gene=dpp > AAAAAAA > >> >SMEDT_00005 gene=dpp > AAAAAAA > > --Carson > > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Thursday, May 8, 2014 at 4:26 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] est_forward and conflicting names > > Hi, Carson. Could you give an example of how to add gene_id= to the header of > the FASTA file? I?m not clear on what you mean by this. In the FASTA header, > what portion is the transcript name, and what portion is the gene name? > > Cheers, > Shaun > > > http://sjackman.ca > > > On 2 May 2014 11:55, Carson Holt wrote: >> Whichever has the best AED score I believe, but you can add gene_id= to the >> header of each fasta file to ensure MAKER doesn't try and cluster unrelated >> transcripts into a single gene. Then the transcript name and gene name will >> be guaranteed to match up. >> >> --Carson >> >> >> From: Shaun Jackman >> Date: Wednesday, April 30, 2014 at 5:25 PM >> To: "maker-devel at yandell-lab.org" >> Subject: [maker-devel] est_forward and conflicting names >> >> Hi, Carson. 
>> >> I?ve downloaded a number genes from GenBank using Entrez Direct, which I?m >> using with est and protein to annotate a plant mitochondrion. Most of these >> reference sequences have sensible and consistent gene names, and so I?m using >> est_forward to retain the gene names. This workflow is working well for me. >> Some of the genes pulled in from GenBank have less useful names like orf1234 >> or other numeric IDs. When multiple evidence sequences map to the same >> location, how does est_forward choose which name to use? If it?s chosen >> arbitrarily, could it be possible to choose the most common name instead? >> >> Thanks, >> Shaun >> >> >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/ma >> ker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Wed May 14 15:07:52 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Wed, 14 May 2014 14:07:52 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Hi, Carson. Perhaps MAKER could integrate Barrnapto predict rRNA. Cheers, Shaun On 4 March 2014 18:33, Carson Holt wrote: > Trying to call non-coding RNA from ESTs or even sequence homology is > extremely messy (non-trivial problem in most organisms with high false > positive rate), so MAKER for the most part doesn?t even try to do that. It > focuses only on the coding genes. You can now use tRNAscan and snoscan in > the newest version for some non-coding RNA support (those features were > only added a couple of months ago). So just like other prediction tools > (snap, augustus etc.), the primary focus has always been the coding genes. > We?ve only started adding non-coding RNA support recently for iPlant, so > it?s still relatively immature. 
>
> Thanks,
> Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Tuesday, March 4, 2014 at 7:10 PM
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks
> for the tip.
>
> The rRNA genes that are found with est2genome have the feature type set to
> *mRNA* and have corresponding *five_prime_UTR*, *CDS* and
> *three_prime_UTR* features. Ideally the feature type would be set to
> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features.
> Is that a feature that you would be interested in adding to MAKER? The rRNA
> gene names all start with "rrn" and the tRNA gene names with "trn", as is
> standard, so determining the appropriate type should be straightforward.
>
> Thanks again for your help with this. Cheers,
> Shaun
>
> On 27 February 2014 17:13, Carson Holt wrote:
>
>> Set single_exon=1, and the minimum size to a smaller value. I think it's
>> set to 250 right now. Also est2genome is looking for ORF, so if there is
>> none (as with tRNAs) they probably won't get picked up.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>
>> Sorry, ignore my previous question. est_forward also carries forward the
>> names of protein evidence and works like a charm. Thank you!
>>
>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>> these hits?
>>
>> organism_type=prokaryotic
>> est2genome=1
>> protein2genome=1
>> est_forward=1
>>
>> Cheers,
>> Shaun
>>
>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>
>>> Is there a corresponding protein_forward=1 option to map forward protein
>>> names from protein2genome?
>>>
>>> Cheers,
>>> Shaun
>>>
>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>>> wrote:
>>>
>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>> passing the gff3 to model_gff.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>
>>> What you can do is run it once with just est_forward=1 and
>>> est2genome/protein2genome set to 1. Then take those results, pass them in
>>> as model_gff and use the map_forward option to then filter the results
>>> based on mRNA score and that would copy names onto new gene under the
>>> standard MAKER pipeline. Eventually it's really supposed to go into a
>>> separate tool that will map genes onto new assemblies (but under the hood
>>> the tool will just be calling MAKER with certain parameters restricted). I
>>> do this because if people commonly use it mixed with things like SNAP I can
>>> start to get some very weird behaviors.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Mikael Brandström Durling
>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> It seems that this could be a very useful option in those cases where
>>> you have firm a priori knowledge of the placement of ESTs. However, while
>>> trying it I note that est_forward implies that the est2genome predictor is
>>> turned on, implicitly. Is this necessary for this to work? I'm after the
>>> behavior you describe below where exonerate is made to try really hard
>>> within a limited region to align an est, but I would not like maker to
>>> produce est2genome predictions.
>>>
>>> In general, I think this maker_coor and est_forward is a feature set
>>> that is worthy to be promoted into a documented feature.
>>>
>>> Thanks,
>>> Mikael
>>>
>>> On 26 Feb 2014 at 17:09, Carson Holt wrote:
>>>
>>> It will still work without est_forward. It just works a little
>>> differently. Keep in mind this was a hidden feature I used to find
>>> stubborn or hard-to-find missing genes after reassembly of a genome.
>>>
>>> If est_forward is provided, MAKER will parse the database to look for
>>> the maker_coor tags early in the pipeline. Then it will create a list of
>>> locations to search, and it will search them even if there are no BLAST
>>> results to seed the search (normally MAKER gets a BLAST result first and
>>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to
>>> look for a match using all of chr1 as the input to exonerate even when
>>> BLAST finds nothing (this is a very very slow search, but can help pick up
>>> one or two stubborn genes that don't remap well). To allow this, MAKER
>>> gives exonerate looser matching parameters (i.e. allows for single base
>>> pair introns perhaps caused by assembly errors). The logic here is that
>>> given the fact that I already told MAKER that with some degree of
>>> confidence I expect sequence A to map to location X, it will try its
>>> hardest to make it match.
>>>
>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>> at line 1563, but only after a BLAST alignment has already seeded it to the
>>> region (that BLAST result has the information in its description
>>> parameter). MAKER will then ignore seeds completely outside of maker_coor.
>>> In addition any BLAST seeds that overlap maker_coor will get the search
>>> space for alignment polishing adjusted to match maker_coor exactly. Also
>>> match parameters for exonerate will not be relaxed as they were with
>>> est_forward.
>>>
>>> As you can see, the behavior is slightly different (because it's an
>>> accidental feature).
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Mikael Brandström Durling
>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> That might be a useful and time-saving accidental feature. But, reading
>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>> well as the configuration option est_forward for this to work. Any
>>> occurrences of maker_coor in GI.pm seem to be conditioned on est_forward=1,
>>> right?
>>>
>>> Mikael
>>>
>>> On 26 Feb 2014 at 14:22, Carson Holt wrote:
>>>
>>> Yes. That should work as well as an accidental feature.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>> <mikael.durling at slu.se> wrote:
>>>
>>> Can this use of maker_coor be used only to hint about the placement of
>>> the ests, without affecting the naming of the final genes? I.e., if I have a
>>> database of EST where I have a priori knowledge of their rough placement,
>>> can this placement be given to maker without providing est_forward=1?
>>>
>>> Thanks,
>>> Mikael
>>>
>>> On 26 Feb 2014 at 01:58, Carson Holt wrote:
>>>
>>> There is a way. It's not a standard option and it's undocumented, but
>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>> that. The option won't already be there so you'll have to type it in.
>>>
>>> There is also a feature designed to work with this option. If you add
>>> tags to your fasta headers, those can be used to guide the mapping and
>>> naming.
>>> For example, gene_id= will ensure different isoforms
>>> that share a common gene_id get clustered into the same gene,
>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>> sequence to only be mapped against chr1 within the range of 1-10000 bp and
>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>
>>> This is an undocumented way to remap genes onto new assemblies using
>>> blast alignments of earlier transcript or protein annotations as a guide.
>>>
>>> --Carson
>>>
>>> From: Shaun Jackman
>>> Reply-To: Shaun Jackman
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To:
>>> Subject: [maker-devel] Mapping gene names
>>>
>>> Hi,
>>>
>>> I'm annotating a genome using a closely related genome from Genbank,
>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>>> annotate my genome. I've run Maker, and the annotation seems to have worked
>>> well. Is it possible to map the names of the genes from the related species
>>> to my annotation? I see the *map_forward* option, which applies to the
>>> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>>
>>> *maker_opts.ctl*
>>>
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>>
>>> Thanks,
>>> Shaun

From carsonhh at gmail.com Wed May 14 15:18:52 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 May 2014 15:18:52 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Thanks. Looks interesting. Also since output is already GFF3, you could probably just use it with gff passthrough. It doesn't appear to support eukaryotes though.

--Carson

Sent from my iPhone

From sjackman at gmail.com Wed May 14 15:25:21 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 14:25:21 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson, Torsten.

> It doesn't appear to support eukaryotes though.

Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap --help output seems to be out of date.

Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

barrnap --help
...
--kingdom [X]    Kingdom: [b]acteria [a]rchaea (default 'bacteria')

Cheers,
Shaun
http://sjackman.ca

On 14 May 2014 14:18, Carson Holt wrote:

> Thanks. Looks interesting. Also since output is already GFF3, you could
> probably just use it with gff passthrough. It doesn't appear to support
> eukaryotes though.
>
> --Carson
>
>
> Sent from my iPhone
>
> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>
> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>
> Cheers,
> Shaun
>
> On 4 March 2014 18:33, Carson Holt wrote:
>
>> Trying to call non-coding RNA from ESTs or even sequence homology is
>> extremely messy (non-trivial problem in most organisms with high false
>> positive rate), so MAKER for the most part doesn't even try to do that. It
>> focuses only on the coding genes. You can now use tRNAscan and snoscan in
>> the newest version for some non-coding RNA support (those features were
>> only added a couple of months ago). So just like other prediction tools
>> (snap, augustus etc.), the primary focus has always been the coding genes.
>> We've only started adding non-coding RNA support recently for iPlant, so
>> it's still relatively immature.
>>
>> Thanks,
>> Carson
>>
>>
>> From: Shaun Jackman
>> Reply-To: Shaun Jackman
>> Date: Tuesday, March 4, 2014 at 7:10 PM
>>
>> To: Carson Holt
>> Cc: "maker-devel at yandell-lab.org"
>> Subject: Re: [maker-devel] Mapping gene names
>>
>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks
>> for the tip.
>>
>> The rRNA genes that are found with est2genome have the feature type set
>> to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and
>> *three_prime_UTR* features. Ideally the feature type would be set to
>> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS
>> features. Is that a feature that you would be interested in adding to
>> MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names
>> with "trn", as is standard, so determining the appropriate type should be
>> straightforward.
>>
>> Thanks again for your help with this. Cheers,
>> Shaun
>>
>>
>> On 27 February 2014 17:13, Carson Holt wrote:
>>
>>> Set single_exon=1, and the minimum size to a smaller value. I think
>>> it's set to 250 right now. Also est2genome is looking for ORF, so if there
>>> is none (as with tRNAs) they probably won't get picked up.
>>>
>>> --Carson
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>>
>>> Sorry, ignore my previous question. est_forward also carries forward the
>>> names of protein evidence and works like a charm. Thank you!
>>>
>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>>> these hits?
>>>
>>> organism_type=prokaryotic
>>> est2genome=1
>>> protein2genome=1
>>> est_forward=1
>>>
>>> Cheers,
>>> Shaun
>>>
>>>
>>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>>
>>>> Is there a corresponding protein_forward=1 option to map forward
>>>> protein names from protein2genome?
>>>>
>>>> Cheers,
>>>> Shaun
>>>>
>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com)
>>>> wrote:
>>>>
>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>> passing the gff3 to model_gff.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>>
>>>> What you can do is run it once with just est_forward=1 and
>>>> est2genome/protein2genome set to 1. Then take those results, pass them in
>>>> as model_gff and use the map_forward option to then filter the results
>>>> based on mRNA score and that would copy names onto new gene under the
>>>> standard MAKER pipeline. Eventually it's really supposed to go into a
>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>> the tool will just be calling MAKER with certain parameters restricted). I
>>>> do this because if people commonly use it mixed with things like SNAP I can
>>>> start to get some very weird behaviors.
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>> From: Mikael Brandström Durling
>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>> To: Carson Holt
>>>> Cc: "maker-devel at yandell-lab.org"
>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>
>>>> It seems that this could be a very useful option in those cases where
>>>> you have firm a priori knowledge of the placement of ESTs. However, while
>>>> trying it I note that est_forward implies that the est2genome predictor is
>>>> turned on, implicitly. Is this necessary for this to work? I'm after the
>>>> behavior you describe below where exonerate is made to try really hard
>>>> within a limited region to align an est, but I would not like maker to
>>>> produce est2genome predictions.
>>>>
>>>> In general, I think this maker_coor and est_forward is a feature set
>>>> that is worthy to be promoted into a documented feature.
>>>>
>>>> Thanks,
>>>> Mikael
>>>>
>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>>>
>>>> It will still work without est_forward. It just works a little
>>>> differently. Keep in mind this was a hidden feature I used to find
>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>
>>>> If est_forward is provided, MAKER will parse the database to look for
>>>> the maker_coor tags early in the pipeline. Then it will create a list of
>>>> locations to search, and it will search them even if there are no BLAST
>>>> results to seed the search (normally MAKER gets a BLAST result first and
>>>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to
>>>> look for a match using all of chr1 as the input to exonerate even when
>>>> BLAST finds nothing (this is a very very slow search, but can help pick up
>>>> one or two stubborn genes that don't remap well). To allow this, MAKER
>>>> gives exonerate looser matching parameters (i.e. allows for single base
>>>> pair introns perhaps caused by assembly errors). The logic here is that
>>>> given the fact that I already told MAKER that with some degree of
>>>> confidence I expect sequence A to map to location X, it will try its
>>>> hardest to make it match.
>>>>
>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>> at line 1563, but only after a BLAST alignment has already seeded it to the
>>>> region (that BLAST result has the information in its description
>>>> parameter). MAKER will then ignore seeds completely outside of maker_coor.
>>>> In addition any BLAST seeds that overlap maker_coor will get the search
>>>> space for alignment polishing adjusted to match maker_coor exactly. Also
>>>> match parameters for exonerate will not be relaxed as they were with
>>>> est_forward.
>>>>
>>>> As you can see, the behavior is slightly different (because it's an
>>>> accidental feature).
>>>>
>>>> Thanks,
>>>> Carson
>>>>
>>>>
>>>>
>>>> From: Mikael Brandström Durling
>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>> To: Carson Holt
>>>> Cc: "maker-devel at yandell-lab.org"
>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>
>>>> That might be a useful and time saving accidental feature. But, reading
>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>> well as the configuration option est_forward for this to work. Any
>>>> occurrences of maker_coor in GI.pm seems to be conditioned on est_forward=1
>>>> right?
>>>>
>>>> Mikael
>>>>
>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>>
>>>> Yes. That should work as well as an accidental feature.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>>>> mikael.durling at slu.se> wrote:
>>>>
>>>> Can this use of maker_coor be used only to hint about the placement of
>>>> the ests, without affecting the naming of the final genes? I.e., if I have a
>>>> database of EST where I have a priori knowledge of their rough placement,
>>>> can this placement be given to maker without providing est_forward=1?
>>>>
>>>> Thanks,
>>>> Mikael
>>>>
>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>
>>>> There is a way. It's not a standard option and it's undocumented, but
>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>>> that. The option won't already be there so you'll have to type it in.
>>>>
>>>> There is also a feature designed to work with this option. If you add
>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>> naming. For example, gene_id= will ensure different isoforms
>>>> that share a common gene_id get clustered into the same gene,
>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp and
>>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>
>>>> This is an undocumented way to remap genes onto new assemblies using
>>>> blast alignments of earlier transcript or protein annotations as a guide.
>>>>
>>>> --Carson
>>>>
>>>>
>>>>
>>>>
>>>> From: Shaun Jackman
>>>> Reply-To: Shaun Jackman
>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>> To:
>>>> Subject: [maker-devel] Mapping gene names
>>>>
>>>> Hi,
>>>>
>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>>>> annotate my genome. I've run Maker, and the annotation seems to have worked
>>>> well. Is it possible to map the names of the genes from the related species
>>>> to my annotation? I see the *map_forward* option, which applies to the
>>>> *model_gff* parameter. Is there a similar option for *est* and
>>>> *protein*?
>>>> >>>> *maker_opts.ctl* >>>> >>>> est=NC_123456.frn >>>> protein=NC_123456.faa >>>> est2genome=1 >>>> protein2genome=1 >>>> >>>> Thanks, >>>> Shaun >>>> _______________________________________________ maker-devel mailing >>>> list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Wed May 14 18:06:31 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Wed, 14 May 2014 17:06:31 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Hi, Carson. I used other_gff to pass the following four-line GFF file of Barrnap rRNA annotations through. The output of gff3_merge is quite bizarre. See below. Input: ##gff-version 3 200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . Name=12S_rRNA;product=12S ribosomal RNA 200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA 200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA 200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA Output: ### ARRAY(0x7feceb928780) ### ARRAY(0x7feceaa548a0) ### ARRAY(0x7feceeb01c60) ### ARRAY(0x7fecedf6fef8) ### Cheers, Shaun *http://sjackman.ca * On 14 May 2014 14:18, Carson Holt wrote: > Thanks. Looks interesting. 
Also since output is already GFF3, you could > probably just use it with gff passthrough. It doesn't appear to support > eukaryotes though. > > --Carson > > > Sent from my iPhone > > On May 14, 2014, at 3:07 PM, Shaun Jackman wrote: > > Hi, Carson. Perhaps MAKER could integrate Barrnapto predict rRNA. > > Cheers, > Shaun > > On 4 March 2014 18:33, Carson Holt wrote: > >> Trying to call non-coding RNA from ESTs or even sequence homology is >> extremely messy (non-trivial problem in most organisms with high false >> positive rate), so MAKER for the most part doesn?t even try to do that. It >> focuses only on the coding genes. You can now use tRNAscan and snoscan in >> the newest version for some non-coding RNA support (those features were >> only added a couple of months ago). So just like other prediction tools >> (snap, augustus etc.), the primary focus has always been the coding genes. >> We?ve only started adding non-coding RNA support recently for iPlant, so >> it?s still relatively immature. >> >> Thanks, >> Carson >> >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, March 4, 2014 at 7:10 PM >> >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks >> for the tip. >> >> The rRNA genes that are found with est2genome have the feature type set >> to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and >> *three_prime_UTR* features. Ideally the feature type would be set to >> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS >> features. Is that a feature that you would be interested in adding to >> MAKER? The rRNA gene names all start with ?rrn? and the tRNA gene names >> with ?trn?, as is standard, so determining the appropriate type should be >> straight forward. >> >> Thanks again for your help with this. 
Cheers, >> Shaun >> >> >> On 27 February 2014 17:13, Carson Holt wrote: >> >>> Set single_exon=1, and the minimum size to a smaller value. I think >>> it's set to 250 right now. Also est2genome is looking for ORF, so if there >>> is none (as with tRNAs) they probably won't get picked up. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>> >>> Sorry, ignore my previous question. est_forward also carries forward the >>> names of protein evidence and works like a charm. Thank you! >>> >>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >>> rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They >>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value >>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing >>> these hits? >>> >>> organism_type=prokaryotic >>> est2genome=1 >>> protein2genome=1 >>> est_forward=1 >>> >>> Cheers, >>> Shaun >>> >>> >>> On 27 February 2014 15:17, Shaun Jackman wrote: >>> >>>> Is there a corresponding protein_forward=1 option to map forward >>>> protein names from protein2genome? >>>> >>>> Cheers, >>>> Shaun >>>> >>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) >>>> wrote: >>>> >>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>> passing the gff3 to model_gff. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>> >>>> What you can do is run it once with just est_forward=1 and >>>> est2genome/protein2genome set to 1. Then take those results, pass them in >>>> as model_gff and use the map_forward option to then filter the results >>>> based on mRNA score and that would copy names onto new gene under the >>>> standard MAKER pipeline. 
Eventually it?s really supposed to go into a >>>> separate tool that will map genes onto new assemblies (but under the hood >>>> the tool will just be calling MAKER with certain parameters restricted). I >>>> do this because if people commonly use it mixed with things like SNAP I can >>>> start to get some very weird behaviors. >>>> >>>> Thanks, >>>> Carson >>>> >>>> From: Mikael Brandstr?m Durling >>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>> To: Carson Holt >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] Mapping gene names >>>> >>>> It seems that this could be a very useful option in those cases where >>>> you have firm a priori knowledge of the placement of ESTs. However, while >>>> trying it I note that est_forward implies that the est2genome predictor is >>>> turned on, implicitly. Is this necessary for this to work? I?m after the >>>> behavior you describe below where exonerate is made to try really hard >>>> within a limited region to align an est, but I would not like maker to >>>> produce est2genome predictions. >>>> >>>> In general, I think this maker_coor and est_forward is a feature set >>>> that is worthy to be promoted into a documented feature. >>>> >>>> THanks, >>>> Mikael >>>> >>>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>>> >>>> It will still work without est_forward. It just works a little >>>> differently. Keep in mind this was a hidden feature I used to find >>>> stubborn or hard to find missing genes after reassembly of a genome. >>>> >>>> If est_forward is provided, MAKER will parse the database to look for >>>> the maker_coor tags early in the pipeline. Then it will create a list of >>>> locations to search, and it will search them even if there are no BLAST >>>> results to seed the search (normally MAKER gets a BLAST result first and >>>> then polishes it with exonerate). 
So maker_coor=chr1 will cause MAKER to >>>> look for a match using all of chr1 as the input to exonerate even when >>>> BLAST finds nothing (this is a very very slow search, but can help pick up >>>> one or two stubborn genes that don?t remap well). To allow this, MAKER >>>> gives exonerate looser matching parameters (i.e. allows for single base >>>> pair introns perhaps caused by assembly errors). The logic here is that >>>> given the fact that I already told MAKER that with some degree of >>>> confidence I expect sequence A to map to to location X, it will try its >>>> hardest to make it match. >>>> >>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>>> at line 1563, but only after a BLAST alignment has already seeded it to the >>>> region (that BLAST result has the information in its description >>>> parameter). MAKER will then ignore seeds completely outside of maker_coor. >>>> In addition any BLAST seeds that overlap maker_coor will get the search >>>> space for alignment polishing adjusted to match maker_coor exactly. Also >>>> match parameters for exonerate will not be relaxed as they were with >>>> est_forward. >>>> >>>> As you can see the behavior, is slightly different (because it?s an >>>> accidental feature). >>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> >>>> From: Mikael Brandstr?m Durling >>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>> To: Carson Holt >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] Mapping gene names >>>> >>>> That might be a useful and time saving accidental feature. But, reading >>>> the code, it seems that I need to supply maker_coor but not gene_id, as >>>> well as the configuration option est_forward for this to work. Any >>>> occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 >>>> right? >>>> >>>> Mikael >>>> >>>> 26 feb 2014 kl. 14:22 skrev Carson Holt : >>>> >>>> Yes. That should work as well as an accidental feature. 
>>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandstr?m Durling < >>>> mikael.durling at slu.se> wrote: >>>> >>>> Can this use of maker_coor be used only to hint about the placement of >>>> the ests, without affecting the naming of the final genes? Ie if I have a >>>> database of EST where I have a priori knowledge of their rough placement, >>>> can this placement be given to maker without providing est_forward=1? >>>> >>>> Thanks, >>>> Mikael >>>> >>>> 26 feb 2014 kl. 01:58 skrev Carson Holt : >>>> >>>> There is a way. It?s not a standard option and it?s undocumented, but >>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just >>>> that. The option won?t already be there so you?ll have to type it in. >>>> >>>> There is also a feature designed to work with this option. If you add >>>> tags to your fasta headers, those can be used to guide the mapping and >>>> naming. For example, gene_id= will ensure different isoforms >>>> that share a common gene_id get clustered into the same gene, >>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular >>>> sequence to only be mapped against chr1 within the range of 1-10000 bp and >>>> just using maker_coor=chr1 will force it to only be mapped against chr1. >>>> >>>> This is an undocumented way to remap genes onto new assemblies using >>>> blast alignments of earlier transcript or protein annotations as a guide. >>>> >>>> ?Carson >>>> >>>> >>>> >>>> >>>> From: Shaun Jackman >>>> Reply-To: Shaun Jackman >>>> Date: Tuesday, February 25, 2014 at 5:06 PM >>>> To: >>>> Subject: [maker-devel] Mapping gene names >>>> >>>> Hi, >>>> >>>> I?m annotating a genome using a closely related genome from Genbank, >>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to >>>> annotate my genome. I?ve run Maker, and the annotation seems to have worked >>>> well. 
Is it possible to map the names of the genes from the related species >>>> to my annotation? I see the *map_forward* option, which applies to the >>>> *model_gff* parameter. Is there a similar option for *est* and >>>> *protein*? >>>> >>>> *maker_opts.ctl* >>>> >>>> est=NC_123456.frn >>>> protein=NC_123456.faa >>>> est2genome=1 >>>> protein2genome=1 >>>> >>>> Thanks, >>>> Shaun >>>> _______________________________________________ maker-devel mailing >>>> list maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> maker-devel mailing list >>>> maker-devel at box290.bluehost.com >>>> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 14 18:19:43 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 14 May 2014 18:19:43 -0600 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: That should be fixed in the current download? It came up on the mailing list a couple of weeks ago. I'll check. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Wednesday, May 14, 2014 at 6:06 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Mapping gene names Hi, Carson. I used other_gff to pass the following four-line GFF file of Barrnap rRNA annotations through. The output of gff3_merge is quite bizarre. See below. Input: ##gff-version 3 200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . 
Name=12S_rRNA;product=12S ribosomal RNA 200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA 200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA 200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA Output: ### ARRAY(0x7feceb928780) ### ARRAY(0x7feceaa548a0) ### ARRAY(0x7feceeb01c60) ### ARRAY(0x7fecedf6fef8) ### Cheers, Shaun http://sjackman.ca On 14 May 2014 14:18, Carson Holt wrote: > Thanks. Looks interesting. Also since output is already GFF3, you could > probably just use it with gff passthrough. It doesn't appear to support > eukaryotes though. > > --Carson > > > Sent from my iPhone > > On May 14, 2014, at 3:07 PM, Shaun Jackman wrote: > >> Hi, Carson. Perhaps MAKER could integrate Barrnap >> to predict rRNA. >> >> Cheers, >> Shaun >> >> >> On 4 March 2014 18:33, Carson Holt wrote: >>> Trying to call non-coding RNA from ESTs or even sequence homology is >>> extremely messy (non-trivial problem in most organisms with high false >>> positive rate), so MAKER for the most part doesn?t even try to do that. It >>> focuses only on the coding genes. You can now use tRNAscan and snoscan in >>> the newest version for some non-coding RNA support (those features were only >>> added a couple of months ago). So just like other prediction tools (snap, >>> augustus etc.), the primary focus has always been the coding genes. We?ve >>> only started adding non-coding RNA support recently for iPlant, so it?s >>> still relatively immature. >>> >>> Thanks, >>> Carson >>> >>> >>> From: Shaun Jackman >>> Reply-To: Shaun Jackman >>> Date: Tuesday, March 4, 2014 at 7:10 PM >>> >>> To: Carson Holt >>> Cc: "maker-devel at yandell-lab.org" >>> Subject: Re: [maker-devel] Mapping gene names >>> >>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for >>> the tip. 
>>> >>> The rRNA genes that are found with est2genome have the feature type set to >>> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR >>> features. Ideally the feature type would be set to rRNA or tRNA as >>> appropriate, and would omit the UTR and CDS features. Is that a feature that >>> you would be interested in adding to MAKER? The rRNA gene names all start >>> with ?rrn? and the tRNA gene names with ?trn?, as is standard, so >>> determining the appropriate type should be straight forward. >>> >>> Thanks again for your help with this. Cheers, >>> Shaun >>> >>> >>> >>> On 27 February 2014 17:13, Carson Holt wrote: >>>> Set single_exon=1, and the minimum size to a smaller value. I think it's >>>> set to 250 right now. Also est2genome is looking for ORF, so if there is >>>> none (as with tRNAs) they probably won't get picked up. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>>> >>>>> Sorry, ignore my previous question. est_forward also carries forward the >>>>> names of protein evidence and works like a charm. Thank you! >>>>> >>>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >>>>> rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They >>>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >>>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value >>>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing >>>>> these hits? >>>>> organism_type=prokaryotic >>>>> est2genome=1 >>>>> protein2genome=1 >>>>> est_forward=1 >>>>> Cheers, >>>>> Shaun >>>>> >>>>> >>>>> >>>>> On 27 February 2014 15:17, Shaun Jackman wrote: >>>>>> Is there a corresponding protein_forward=1 option to map forward protein >>>>>> names from protein2genome? 
>>>>>> >>>>>> >>>>>> Cheers, >>>>>> Shaun >>>>>> >>>>>> >>>>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com >>>>>> ) wrote: >>>>>> >>>>>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>>>>> passing the gff3 to model_gff. >>>>>>> >>>>>>> --Carson >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>>>>> >>>>>>> What you can do is run it once with just est_forward=1 and >>>>>>> est2genome/protein2genome set to 1. Then take those results, pass them >>>>>>> in as model_gff and use the map_forward option to then filter the >>>>>>> results based on mRNA score and that would copy names onto new gene >>>>>>> under the standard MAKER pipeline. Eventually it?s really supposed to >>>>>>> go into a separate tool that will map genes onto new assemblies (but >>>>>>> under the hood the tool will just be calling MAKER with certain >>>>>>> parameters restricted). I do this because if people commonly use it >>>>>>> mixed with things like SNAP I can start to get some very weird >>>>>>> behaviors. >>>>>>> >>>>>>> Thanks, >>>>>>> Carson >>>>>>> >>>>>>> From: Mikael Brandstr?m Durling >>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>>>>> To: Carson Holt >>>>>>> Cc: "maker-devel at yandell-lab.org" >>>>>>> Subject: Re: [maker-devel] Mapping gene names >>>>>>> >>>>>>> It seems that this could be a very useful option in those cases where >>>>>>> you have firm a priori knowledge of the placement of ESTs. However, >>>>>>> while trying it I note that est_forward implies that the est2genome >>>>>>> predictor is turned on, implicitly. Is this necessary for this to work? >>>>>>> I?m after the behavior you describe below where exonerate is made to try >>>>>>> really hard within a limited region to align an est, but I would not >>>>>>> like maker to produce est2genome predictions. 
>>>>>>> >>>>>>> In general, I think this maker_coor and est_forward is a feature set >>>>>>> that is worthy to be promoted into a documented feature. >>>>>>> >>>>>>> THanks, >>>>>>> Mikael >>>>>>> >>>>>>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>>>>>> >>>>>>> It will still work without est_forward. It just works a little >>>>>>> differently. Keep in mind this was a hidden feature I used to find >>>>>>> stubborn or hard to find missing genes after reassembly of a genome. >>>>>>> >>>>>>> If est_forward is provided, MAKER will parse the database to look for >>>>>>> the maker_coor tags early in the pipeline. Then it will create a list >>>>>>> of locations to search, and it will search them even if there are no >>>>>>> BLAST results to seed the search (normally MAKER gets a BLAST result >>>>>>> first and then polishes it with exonerate). So maker_coor=chr1 will >>>>>>> cause MAKER to look for a match using all of chr1 as the input to >>>>>>> exonerate even when BLAST finds nothing (this is a very very slow >>>>>>> search, but can help pick up one or two stubborn genes that don?t remap >>>>>>> well). To allow this, MAKER gives exonerate looser matching parameters >>>>>>> (i.e. allows for single base pair introns perhaps caused by assembly >>>>>>> errors). The logic here is that given the fact that I already told >>>>>>> MAKER that with some degree of confidence I expect sequence A to map to >>>>>>> to location X, it will try its hardest to make it match. >>>>>>> >>>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>>>>>> at line 1563, but only after a BLAST alignment has already seeded it to >>>>>>> the region (that BLAST result has the information in its description >>>>>>> parameter). MAKER will then ignore seeds completely outside of >>>>>>> maker_coor. In addition any BLAST seeds that overlap maker_coor will get >>>>>>> the search space for alignment polishing adjusted to match maker_coor >>>>>>> exactly. 
Also match parameters for exonerate will not be relaxed as
>>>>>>> they were with est_forward.
>>>>>>>
>>>>>>> As you can see, the behavior is slightly different (because it's an
>>>>>>> accidental feature).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>>
>>>>>>> From: Mikael Brandström Durling
>>>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>>>> To: Carson Holt
>>>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> That might be a useful and time saving accidental feature. But, reading
>>>>>>> the code, it seems that I need to supply maker_coor but not gene_id, as
>>>>>>> well as the configuration option est_forward for this to work. Any
>>>>>>> occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>>>> est_forward=1, right?
>>>>>>>
>>>>>>> Mikael
>>>>>>>
>>>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>>>>>
>>>>>>> Yes. That should work as well as an accidental feature.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling
>>>>>>> wrote:
>>>>>>>
>>>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>>>> the ests, without affecting the naming of the final genes? I.e., if I have
>>>>>>> a database of EST where I have a priori knowledge of their rough
>>>>>>> placement, can this placement be given to maker without providing
>>>>>>> est_forward=1?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikael
>>>>>>>
>>>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>>>>
>>>>>>> There is a way. It's not a standard option and it's undocumented, but
>>>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do
>>>>>>> just that. The option won't already be there so you'll have to type it
>>>>>>> in.
>>>>>>>
>>>>>>> There is also a feature designed to work with this option. If you add
>>>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>>>> naming. For example, gene_id= will ensure different
>>>>>>> isoforms that share a common gene_id get clustered into the same gene,
>>>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp,
>>>>>>> and just using maker_coor=chr1 will force it to only be mapped against
>>>>>>> chr1.
>>>>>>>
>>>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>>>> blast alignments of earlier transcript or protein annotations as a
>>>>>>> guide.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> From: Shaun Jackman
>>>>>>> Reply-To: Shaun Jackman
>>>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>>>> To:
>>>>>>> Subject: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence
>>>>>>> to annotate my genome. I've run Maker, and the annotation seems to have
>>>>>>> worked well. Is it possible to map the names of the genes from the
>>>>>>> related species to my annotation? I see the map_forward option, which
>>>>>>> applies to the model_gff parameter. Is there a similar option for est
>>>>>>> and protein?
>>>>>>>
>>>>>>> maker_opts.ctl
>>>>>>> est=NC_123456.frn
>>>>>>> protein=NC_123456.faa
>>>>>>> est2genome=1
>>>>>>> protein2genome=1
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shaun

From sjackman at gmail.com Wed May 14 18:22:37 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 17:22:37 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

I'm using MAKER 2.31.4.

http://sjackman.ca

On 14 May 2014 17:19, Carson Holt wrote:

> That should be fixed in the current download? It came up on the mailing
> list a couple of weeks ago. I'll check.
>
> --Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Wednesday, May 14, 2014 at 6:06 PM
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. I used other_gff to pass the following four-line GFF file of
> Barrnap rRNA annotations through. The output of gff3_merge is quite
> bizarre. See below.
>
> Input:
>
> ##gff-version 3
> 200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . Name=12S_rRNA;product=12S ribosomal RNA
> 200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA
> 200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA
> 200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA
>
> Output:
>
> ###
> ARRAY(0x7feceb928780)
> ###
> ARRAY(0x7feceaa548a0)
> ###
> ARRAY(0x7feceeb01c60)
> ###
> ARRAY(0x7fecedf6fef8)
> ###
>
> Cheers,
> Shaun
>
> http://sjackman.ca
>
> On 14 May 2014 14:18, Carson Holt wrote:
>
>> Thanks. Looks interesting. Also since output is already GFF3, you could
>> probably just use it with gff passthrough. It doesn't appear to support
>> eukaryotes though.
>>
>> --Carson
>>
>> Sent from my iPhone
>>
>> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>>
>> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>>
>> Cheers,
>> Shaun
>>
>> On 4 March 2014 18:33, Carson Holt wrote:
>>
>>> Trying to call non-coding RNA from ESTs or even sequence homology is
>>> extremely messy (non-trivial problem in most organisms with high false
>>> positive rate), so MAKER for the most part doesn't even try to do that. It
>>> focuses only on the coding genes. You can now use tRNAscan and snoscan in
>>> the newest version for some non-coding RNA support (those features were
>>> only added a couple of months ago). So just like other prediction tools
>>> (snap, augustus etc.), the primary focus has always been the coding genes.
>>> We've only started adding non-coding RNA support recently for iPlant, so
>>> it's still relatively immature.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Shaun Jackman
>>> Reply-To: Shaun Jackman
>>> Date: Tuesday, March 4, 2014 at 7:10 PM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks
>>> for the tip.
>>>
>>> The rRNA genes that are found with est2genome have the feature type set
>>> to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and
>>> *three_prime_UTR* features. Ideally the feature type would be set to
>>> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS
>>> features. Is that a feature that you would be interested in adding to
>>> MAKER? The rRNA gene names all start with "rrn" and the tRNA gene names
>>> with "trn", as is standard, so determining the appropriate type should be
>>> straightforward.
>>>
>>> Thanks again for your help with this. Cheers,
>>> Shaun
>>>
>>> On 27 February 2014 17:13, Carson Holt wrote:
>>>
>>>> Set single_exon=1, and the minimum size to a smaller value. I think
>>>> it's set to 250 right now. Also est2genome is looking for ORF, so if there
>>>> is none (as with tRNAs) they probably won't get picked up.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>>>
>>>> Sorry, ignore my previous question. est_forward also carries forward
>>>> the names of protein evidence and works like a charm. Thank you!
>>>>
>>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>>>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value
>>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>>>> these hits?
>>>>
>>>> organism_type=prokaryotic
>>>> est2genome=1
>>>> protein2genome=1
>>>> est_forward=1
>>>>
>>>> Cheers,
>>>> Shaun
>>>>
>>>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>>>
>>>>> Is there a corresponding protein_forward=1 option to map forward
>>>>> protein names from protein2genome?
>>>>>
>>>>> Cheers,
>>>>> Shaun
>>>>>
>>>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>>>>> wrote:
>>>>>
>>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>>> passing the gff3 to model_gff.
>>>>>
>>>>> --Carson
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>>>
>>>>> What you can do is run it once with just est_forward=1 and
>>>>> est2genome/protein2genome set to 1. Then take those results, pass them in
>>>>> as model_gff and use the map_forward option to then filter the results
>>>>> based on mRNA score and that would copy names onto new gene under the
>>>>> standard MAKER pipeline. Eventually it's really supposed to go into a
>>>>> separate tool that will map genes onto new assemblies (but under the hood
>>>>> the tool will just be calling MAKER with certain parameters restricted). I
>>>>> do this because if people commonly use it mixed with things like SNAP I can
>>>>> start to get some very weird behaviors.
>>>>>
>>>>> Thanks,
>>>>> Carson
>>>>>
>>>>> From: Mikael Brandström Durling
>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>> To: Carson Holt
>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>
>>>>> It seems that this could be a very useful option in those cases where
>>>>> you have firm a priori knowledge of the placement of ESTs. However, while
>>>>> trying it I note that est_forward implies that the est2genome predictor is
>>>>> turned on, implicitly. Is this necessary for this to work? I'm after the
>>>>> behavior you describe below where exonerate is made to try really hard
>>>>> within a limited region to align an est, but I would not like maker to
>>>>> produce est2genome predictions.
>>>>>
>>>>> In general, I think this maker_coor and est_forward is a feature set
>>>>> that is worthy to be promoted into a documented feature.
>>>>>
>>>>> Thanks,
>>>>> Mikael
>>>>>
>>>>> On 26 Feb 2014, at 17:09, Carson Holt wrote:
>>>>>
>>>>> It will still work without est_forward. It just works a little
>>>>> differently. Keep in mind this was a hidden feature I used to find
>>>>> stubborn or hard to find missing genes after reassembly of a genome.
>>>>>
>>>>> If est_forward is provided, MAKER will parse the database to look for
>>>>> the maker_coor tags early in the pipeline. Then it will create a list of
>>>>> locations to search, and it will search them even if there are no BLAST
>>>>> results to seed the search (normally MAKER gets a BLAST result first and
>>>>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to
>>>>> look for a match using all of chr1 as the input to exonerate even when
>>>>> BLAST finds nothing (this is a very very slow search, but can help pick up
>>>>> one or two stubborn genes that don't remap well). To allow this, MAKER
>>>>> gives exonerate looser matching parameters (i.e. allows for single base
>>>>> pair introns perhaps caused by assembly errors). The logic here is that
>>>>> given the fact that I already told MAKER that with some degree of
>>>>> confidence I expect sequence A to map to location X, it will try its
>>>>> hardest to make it match.
>>>>>
>>>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm
>>>>> at line 1563, but only after a BLAST alignment has already seeded it to the
>>>>> region (that BLAST result has the information in its description
>>>>> parameter). MAKER will then ignore seeds completely outside of maker_coor.
>>>>> In addition any BLAST seeds that overlap maker_coor will get the search
>>>>> space for alignment polishing adjusted to match maker_coor exactly. Also
>>>>> match parameters for exonerate will not be relaxed as they were with
>>>>> est_forward.
>>>>>
>>>>> As you can see, the behavior is slightly different (because it's an
>>>>> accidental feature).
>>>>>
>>>>> Thanks,
>>>>> Carson
>>>>>
>>>>> From: Mikael Brandström Durling
>>>>> Date: Wednesday, February 26, 2014 at 6:37 AM
>>>>> To: Carson Holt
>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>
>>>>> That might be a useful and time saving accidental feature. But,
>>>>> reading the code, it seems that I need to supply maker_coor but not
>>>>> gene_id, as well as the configuration option est_forward for this to work.
>>>>> Any occurrences of maker_coor in GI.pm seem to be conditioned on
>>>>> est_forward=1, right?
>>>>>
>>>>> Mikael
>>>>>
>>>>> On 26 Feb 2014, at 14:22, Carson Holt wrote:
>>>>>
>>>>> Yes. That should work as well as an accidental feature.
>>>>>
>>>>> --Carson
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>>>>> mikael.durling at slu.se> wrote:
>>>>>
>>>>> Can this use of maker_coor be used only to hint about the placement of
>>>>> the ests, without affecting the naming of the final genes? I.e., if I have a
>>>>> database of EST where I have a priori knowledge of their rough placement,
>>>>> can this placement be given to maker without providing est_forward=1?
>>>>>
>>>>> Thanks,
>>>>> Mikael
>>>>>
>>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>>
>>>>> There is a way. It's not a standard option and it's undocumented, but
>>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>>>> that. The option won't already be there so you'll have to type it in.
>>>>>
>>>>> There is also a feature designed to work with this option. If you add
>>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>>> naming. For example, gene_id= will ensure different isoforms
>>>>> that share a common gene_id get clustered into the same gene,
>>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, and
>>>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>>
>>>>> This is an undocumented way to remap genes onto new assemblies using
>>>>> blast alignments of earlier transcript or protein annotations as a guide.
>>>>>
>>>>> --Carson
>>>>>
>>>>> From: Shaun Jackman
>>>>> Reply-To: Shaun Jackman
>>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>>> To:
>>>>> Subject: [maker-devel] Mapping gene names
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm annotating a genome using a closely related genome from Genbank,
>>>>> using the .frn (RNA) and .faa (protein) files from Genbank as evidence to
>>>>> annotate my genome. I've run Maker, and the annotation seems to have worked
>>>>> well. Is it possible to map the names of the genes from the related species
>>>>> to my annotation? I see the *map_forward* option, which applies to
>>>>> the *model_gff* parameter. Is there a similar option for *est* and
>>>>> *protein*?
>>>>>
>>>>> *maker_opts.ctl*
>>>>>
>>>>> est=NC_123456.frn
>>>>> protein=NC_123456.faa
>>>>> est2genome=1
>>>>> protein2genome=1
>>>>>
>>>>> Thanks,
>>>>> Shaun

From torsten.seemann at monash.edu Wed May 14 17:33:55 2014
From: torsten.seemann at monash.edu (Torsten Seemann)
Date: Thu, 15 May 2014 09:33:55 +1000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Carson & Shaun

> It doesn't appear to support eukaryotes though.
>
> Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap
> --help output seems to be out of date.
>
> Barrnap predicts the location of ribosomal RNA genes in genomes. It
> supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria
> (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

It does support eukaryota and mitochondria, I just forgot to push the documentation changes. This has been resolved now in the 0.4.2 release.

--kingdom [X] Kingdom: euk arc bac mito (default 'bac')

Next release 0.5 will have an 'accurate' mode which will fine tune the predictions using cmalign glocal alignment.

Thanks for your interest!

--
--Dr Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA
--http://www.bioinformatics.net.au/

From cjfields at illinois.edu Wed May 14 21:23:03 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 15 May 2014 03:23:03 +0000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID: <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu>

\o/

(now I can get rid of rnammer forever!)

chris

On May 14, 2014, at 6:33 PM, Torsten Seemann wrote:

Carson & Shaun

It doesn't appear to support eukaryotes though.

Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap --help output seems to be out of date.

Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

It does support eukaryota and mitochondria, I just forgot to push the documentation changes. This has been resolved now in the 0.4.2 release.

--kingdom [X] Kingdom: euk arc bac mito (default 'bac')

Next release 0.5 will have an 'accurate' mode which will fine tune the predictions using cmalign glocal alignment.

Thanks for your interest!

--
--Dr Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA
--http://www.bioinformatics.net.au/

From sajeet at gmail.com Thu May 15 11:36:00 2014
From: sajeet at gmail.com (Sajeet Haridas)
Date: Thu, 15 May 2014 10:36:00 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To: <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu>
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu>
Message-ID:

My brief test of barrnap suggests that it does not perform well on rRNA genes with introns such as those found in fungal mitochondria. Setting a lower threshold for --reject and --evalue helps, but is not enough.

Looks like I cannot abandon rnammer for now.

FYI - if you want to test barrnap with fungal mitochondria, use --kingdom bacteria because they have 23S and 16S unlike the human mitochondria.

Sajeet

On Wed, May 14, 2014 at 8:23 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:

> \o/
>
> (now I can get rid of rnammer forever!)
>
> chris
>
> On May 14, 2014, at 6:33 PM, Torsten Seemann wrote:
>
> Carson & Shaun
>
>> It doesn't appear to support eukaryotes though.
>>
>> Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap
>> --help output seems to be out of date.
>>
>> Barrnap predicts the location of ribosomal RNA genes in genomes. It
>> supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria
>> (12S,16S) and eukaryotes (5S,5.8S,28S,18S).
>>
> It does support eukaryota and mitochondria, I just forgot to push the
> documentation changes. This has been resolved now in the 0.4.2 release.
>
> --kingdom [X] Kingdom: euk arc bac mito (default 'bac')
>
> Next release 0.5 will have an 'accurate' mode which will fine tune the
> predictions using cmalign glocal alignment.
>
> Thanks for your interest!
>
> --
> --Dr Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
> --Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA
> --http://www.bioinformatics.net.au/

From ranjani at uga.edu Thu May 15 13:00:47 2014
From: ranjani at uga.edu (Sivaranjani Namasivayam)
Date: Thu, 15 May 2014 19:00:47 +0000
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To:
References:
Message-ID: <1400180446764.46375@uga.edu>

Hi Carson,

I upgraded to the MAKER version 2.31.3 (from MAKER 2.10). I want to predict gene models directly from proteins. I provided proteins from a related organism as input and set protein2genome to 1. However I do not get any gene models predicted. I also tried this by using a transcriptome data set in addition to the protein dataset and set est2genome and protein2genome to 1. I get gene models from the transcripts but not proteins. When I look at the alignment of the proteins on the genome, they seem to be aligning rather well and I would expect to see a gene model predicted. Would you know why this might be?

Also the number of gene models predicted (directly from the transcriptome) in this version is lower than the previous version I was using (MAKER 2.10). I did notice this version is not predicting overlapping gene models, but that is not the rule.

Thanks,
Ranjani

________________________________
From: maker-devel on behalf of Carson Holt
Sent: Wednesday, April 30, 2014 10:55 AM
To: Carson Holt; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] FW: protein2genome gene models

Make sure you're using the current version of MAKER. It works on eukaryotes as well.

--Carson

From: Carson Holt
Date: Wednesday, April 30, 2014 at 8:53 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] FW: protein2genome gene models

From: Sivaranjani Namasivayam
Date: Wednesday, April 30, 2014 at 8:45 AM
To: "maker-devel-bounces at yandell-lab.org"
Subject: protein2genome gene models

Hi,

I want to examine the gene models predicted directly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome=1, but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns?

Thanks,
Ranjani

From torsten.seemann at monash.edu Thu May 15 16:42:53 2014
From: torsten.seemann at monash.edu (Torsten Seemann)
Date: Fri, 16 May 2014 08:42:53 +1000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu>
Message-ID:

Sajeet,

> Brief test of barrnap suggests that it does not perform well on rRNA genes
> with introns such as those found in fungal mitochondria. Setting a lower
> threshold for --reject and --evalue helps, but is not enough.
> Looks like I cannot abandon rnammer for now.
> FYI - if you want to test barrnap with fungal mitochondria, use --kingdom
> bacteria because they have 23S and 16S unlike the human mitochondria.

This is good feedback. Paul Gardner also mentioned the intron issue. A "fungi" kingdom is clearly needed. I am not a mycologist, so a detailed rRNA architecture for the eukaryotic phyla etc. is something I have started but need assistance with. The nhmmer alignment parameters could also be adjusted to improve results on intron-containing rRNAs.

Here is what I have so far in terms of models:
https://github.com/Victorian-Bioinformatics-Consortium/barrnap/blob/master/README.md#data-sources-for-hmm-models

- Do I need to split euk into protist / plant / animal / fungi?
- Should the current 'mito' be placed inside the current 'euk', as mito data is likely to end up in assemblies, but be kept separate for mito-only data?
- Plastids, chloroplasts, apicoplasts: I am not sure of the subtleties of these organelles' rRNA but am willing to learn.

Thank you again for testing. Any help appreciated,

--
--Dr Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA
--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA
--http://www.bioinformatics.net.au/

From sjackman at gmail.com Fri May 16 11:16:27 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 16 May 2014 10:16:27 -0700
Subject: [maker-devel] Specify multiple files to rmlib
Message-ID:

Hi, Carson. Some options of maker accept multiple files as a comma-separated list, but rmlib does not. Could it?

Thanks!
Shaun

P.S. Any update on the fix to other_gff?

http://sjackman.ca

From carsonhh at gmail.com Fri May 16 14:33:15 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 14:33:15 -0600
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

It could be done. I've made some changes to the subversion repository if you want to test it.
You should also be able to use labels, just as you can with other comma-separated lists in MAKER, using ':' to separate the label.

Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label

I've also found the other_gff issue. It was fixed in the subversion repository but not in the release package I made the other day, so I've updated the release to 2.31.5.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Friday, May 16, 2014 at 11:16 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Specify multiple files to rmlib

Hi, Carson. Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it?

Thanks!
Shaun

P.S. Any update on the fix to other_gff?

http://sjackman.ca

From carsonhh at gmail.com Fri May 16 14:42:50 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 14:42:50 -0600
Subject: [maker-devel] FW: protein2genome gene models
In-Reply-To: <1400180446764.46375@uga.edu>
References: <1400180446764.46375@uga.edu>
Message-ID:

Upgrade to 2.31.5.

Changes since 2.31.3:
* A protein2genome issue that was introduced in 2.31.3 was fixed.
* An issue with fasta_merge failing on trnascan results was fixed.
* other_gff input resulting in an ARRAY reference being printed was fixed.
* Naming of tRNA genes was improved to include amino acid identity.

--Carson

From: Sivaranjani Namasivayam
Date: Thursday, May 15, 2014 at 1:00 PM
To: Carson Holt , Carson Holt , "maker-devel at yandell-lab.org"
Subject: RE: [maker-devel] FW: protein2genome gene models

Hi Carson,

I upgraded to the MAKER version 2.31.3 (from MAKER 2.10). I want to predict gene models directly from proteins. I provided proteins from a related organism as input and set protein2genome to 1.
However I do not get any gene models predicted. I also tried this by using a transcriptome data set in addition to the protein dataset and set est2genome and protein2genome to 1. I get gene models from the transcripts but not proteins. When I look at the alignment of the proteins on the genome, they seem to be aligning rather well and I would expect to see a gene model predicted. Would you know why this might be?

Also the number of gene models predicted (directly from the transcriptome) in this version is lower than the previous version I was using (MAKER 2.10). I did notice this version is not predicting overlapping gene models, but that is not the rule.

Thanks,
Ranjani

From: maker-devel on behalf of Carson Holt
Sent: Wednesday, April 30, 2014 10:55 AM
To: Carson Holt; maker-devel at yandell-lab.org
Subject: Re: [maker-devel] FW: protein2genome gene models

Make sure you're using the current version of MAKER. It works on eukaryotes as well.

--Carson

From: Carson Holt
Date: Wednesday, April 30, 2014 at 8:53 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] FW: protein2genome gene models

From: Sivaranjani Namasivayam
Date: Wednesday, April 30, 2014 at 8:45 AM
To: "maker-devel-bounces at yandell-lab.org"
Subject: protein2genome gene models

Hi,

I want to examine the gene models predicted directly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome=1, but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns?

Thanks,
Ranjani

From sjackman at gmail.com Fri May 16 14:45:59 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 16 May 2014 13:45:59 -0700
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

Excellent. Thanks, Carson. Is the rmlib feature included in 2.31.5?

What is the purpose of the label? Does it affect the GFF file output by MAKER?

--
http://sjackman.ca

On 2014-May-16 at 13:33:23, Carson Holt (carsonhh at gmail.com) wrote:

It could be done. I've made some changes to the subversion repository if you want to test it. You should also be able to use labels just as you can with other comma separated lists in MAKER using ':' to separate the label.

Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label

I've also found the other_gff issue. It was fixed in the subversion repository but not in the release package I made the other day, so I've updated the release to 2.31.5.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Friday, May 16, 2014 at 11:16 AM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Specify multiple files to rmlib

Hi, Carson. Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it?

Thanks!
Shaun

P.S. Any update on the fix to other_gff?

http://sjackman.ca

From carsonhh at gmail.com Fri May 16 15:02:59 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 16 May 2014 15:02:59 -0600
Subject: [maker-devel] Specify multiple files to rmlib
In-Reply-To:
References:
Message-ID:

No. There are some implementation issues related to how repeats are processed and collapsed that may cause hidden bugs with the comma separated list, so it needs some more testing.

The label is added to the output GFF3. For example, protein=uniprot.fasta:uniprot would cause the gff3 label to be protein2genome:uniprot rather than just protein2genome. Programs like GBrowse know how to use the labels to generate on/off check boxes to turn just some of your protein results on/off in a viewer rather than all of them.

--Carson

From: Shaun Jackman
Date: Friday, May 16, 2014 at 2:45 PM
To: Carson Holt , "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Specify multiple files to rmlib

Excellent. Thanks, Carson. Is the rmlib feature included in 2.31.5?

What is the purpose of the label? Does it affect the GFF file output by MAKER?

--
http://sjackman.ca

On 2014-May-16 at 13:33:23, Carson Holt (carsonhh at gmail.com) wrote:

> It could be done. I've made some changes to the subversion repository if you
> want to test it. You should also be able to use labels just as you can with
> other comma separated lists in MAKER using ':' to separate the label.
>
> Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label
>
> I've also found the other_gff issue. It was fixed in the subversion
> repository but not in the release package I made the other day, so I've
> updated the release to 2.31.5.
>
> --Carson
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Friday, May 16, 2014 at 11:16 AM
> To: "maker-devel at yandell-lab.org"
> Subject: [maker-devel] Specify multiple files to rmlib
>
> Hi, Carson. Some options of maker accept multiple files as a comma separated
> list, but rmlib does not. Could it?
>
> Thanks!
> Shaun
>
> P.S. Any update on the fix to other_gff?
>
> http://sjackman.ca
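
The file:label convention Carson describes in the message above can be illustrated with a short Python sketch. This is not MAKER's own code (MAKER is written in Perl), only a rough model of the behavior as stated in the thread. The function names parse_labeled_list and gff3_label and the file other.fasta are hypothetical, and the sketch assumes that file names contain no ':' of their own.

```python
def parse_labeled_list(value):
    """Split a comma-separated option value into (path, label) pairs.

    A label, when present, follows the file name after a ':'.
    Entries without a label get None.
    Assumption: file names themselves contain no ':' characters.
    """
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        path, sep, label = item.partition(":")
        entries.append((path, label if sep and label else None))
    return entries


def gff3_label(base, label):
    """Return 'base:label' when a label was given, otherwise just 'base'."""
    return f"{base}:{label}" if label else base


if __name__ == "__main__":
    # uniprot.fasta:uniprot is the example from the thread;
    # other.fasta is a made-up unlabeled entry for contrast.
    for path, label in parse_labeled_list("uniprot.fasta:uniprot,other.fasta"):
        print(path, gff3_label("protein2genome", label))
```

With the input shown, the labeled entry is reported as protein2genome:uniprot and the unlabeled one as plain protein2genome, which matches the example given in the reply. How MAKER itself handles edge cases (empty labels, repeated labels) is not stated in the thread and is not modeled here.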
From cjfields at illinois.edu Tue May 20 13:17:14 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 20 May 2014 19:17:14 +0000
Subject: [maker-devel] tRNAscan and map_gff_ids
Message-ID: <520E7E32-B4E2-486F-B730-F15683679440@illinois.edu>

I found a problem with some tRNAscan output using MAKER 2.31.5. I had a full MAKER data set (run initially using MAKER 2.31.5) for which I generated an ID map. I then ran map_gff_ids as follows and got this error:

-system-specific-4.1$ map_gff_ids id.map Zalbi.all.gff3
Nested quantifiers in regex; marked by <-- HERE in m/trnascan-KB913038.1-noncoding-Undet_??? <-- HERE -gene-79.0/ at /home/groups/hpcbio/apps/maker/maker-2.31.5/bin/map_gff_ids line 111, <$IN> line 3067590.

The problematic lines:

----------------------------------------------
-system-specific-4.1$ grep "???" Zalbi.all.gff3
KB913038.1 maker gene 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0
KB913038.1 maker tRNA 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1;Parent=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|70|0
KB913038.1 maker exon 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1:exon:2193;Parent=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1
KB913039.1 maker gene 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0
KB913039.1 maker tRNA 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1;Parent=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|74|0
KB913039.1 maker exon 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1:exon:4036;Parent=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1
----------------------------------------------

I managed to get it going by using the following modifications (regex quotemeta) in map_gff_ids (lines 107-112):

    for my $id (@map_ids) {
        # Only if the value (or the portion preceding
        # the first colon) is equal to the map key.
        next unless ($value eq $id || $value =~ /^\Q$id\E:/);
        $value =~ s/\Q$id\E/$map{$id}/ unless($tag eq 'Name' && $id !~ /\-gene\-\d+\.\d+|^CG\:|^....\:|^[^\:]+\:temp\d+\:/);
    }

I'm guessing there may be a similar problem with map_fasta_ids?

chris

From carsonhh at gmail.com Tue May 20 13:43:48 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 20 May 2014 13:43:48 -0600
Subject: [maker-devel] tRNAscan and map_gff_ids
Message-ID:

Thanks. trnascan support is new enough that there are still issues like this that we need to find and fix. MAKER tries to use the codon name supplied by trnascan, and it looks like the codon is 'Undet_???'. I don't know why that is. We currently don't do any filtering of trnascan results (i.e. we keep everything). Since these results don't have a determinable codon, they might be something we really just want to filter out. At the very least I should change the codon to NNN instead of ??? to correspond to the standard ambiguity nucleotides used in FASTA format.

--Carson

From caigh02 at gmail.com Mon May 19 21:43:18 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Mon, 19 May 2014 23:43:18 -0400
Subject: [maker-devel] Maker exon number
Message-ID:

Hi Carson,

I am using MAKER to annotate a few small genomes. When looking through the GFF file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold starts with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks.

Guohong

scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1
scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0
scaffold1 maker mRNA 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840
scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 106 1684 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 2605 2985 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254
scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892
scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40259 40666 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40073 40154 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40745 41604 .
+ 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue May 20 14:34:20 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 20 May 2014 20:34:20 +0000 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: Hi Guohong, What version of MAKER are you running? Thanks, Daniel On May 19, 2014, at 9:43 PM, Guohong Cai wrote: > Hi Carson, > > I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold start with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks. > > Guohong > > > scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1 > scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0 > scaffold1 maker mRNA 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 > scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 106 1684 . 
+ 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 2605 2985 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 > scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892 > scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40259 40666 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40073 40154 . 
+ 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40745 41604 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue May 20 14:50:44 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 May 2014 14:50:44 -0600 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: I can do that. Just a note of caution though. The ID= attribute is not protected (it's just an identifier to relate things to one another for correct parentage). Downstream scripts that use or manipulate GFF3 files can change it (so relying on it to always be the same or even be informative is not guaranteed). --Carson From: Guohong Cai Date: Monday, May 19, 2014 at 9:43 PM To: Subject: [maker-devel] Maker exon number Hi Carson, I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold start with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks. Guohong scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1 scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-g ene-0.0 scaffold1 maker mRNA 106 2985 . + . 
ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-pr ocessed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12 ;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaff old1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaff old1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaff old1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 106 1684 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold 1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold 1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 2605 2985 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold 1-processed-gene-0.0-mRNA-1 scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0 .254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0| 0|0|0.83|1|1|6|0|892 scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 40259 40666 . + . 
ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 40073 40154 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 40745 41604 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 20 18:52:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 May 2014 18:52:34 -0600 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: I've gone ahead and made the change in the devlopment version. It will probably be convenient in most cases, but it's important to note one caveat. Exon features are shared in GFF3 format. So if there are multiple isoforms that contain the same exon, there will only be a single exon line in the GFF3, but it will list several transcript IDs in it's Parent= attribute. What does that have to do with with the ID= attribute or exon order? 
Well it means that ID=exon:2 in the first transcript may be the second exon, but in another transcript ID=exon:2 may be the first exon or third exon, etc. This is because there is only a single line for a given exon and it gets shared by all the transcripts. So it will always have the same ID= tag, but will hold a different position in different isoforms (so it's ordinal value will not go along with the ID in those cases). But since most gene calls from MAKER will have only one isoform (default) it could still be convenient in those cases. Thanks, Carson From: Guohong Cai Date: Monday, May 19, 2014 at 9:43 PM To: Subject: [maker-devel] Maker exon number Hi Carson, I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold start with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks. Guohong scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1 scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-g ene-0.0 scaffold1 maker mRNA 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-pr ocessed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12 ;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaff old1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaff old1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaff old1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 106 1684 . 
+ 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold 1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold 1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 2605 2985 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold 1-processed-gene-0.0-mRNA-1 scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0 .254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0| 0|0|0.83|1|1|6|0|892 scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 40259 40666 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap -gene-0.254-mRNA-1 scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 40073 40154 . 
+ 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 scaffold1 maker CDS 40745 41604 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-ge ne-0.254-mRNA-1 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From caigh02 at gmail.com Wed May 21 07:14:40 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Wed, 21 May 2014 08:14:40 -0500 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: Hi Daniel, I am using maker-2.31.5.---Guohong On Tue, May 20, 2014 at 3:34 PM, Daniel Ence wrote: > Hi Guohong, > > What version of MAKER are you running? > > Thanks, > Daniel > > > On May 19, 2014, at 9:43 PM, Guohong Cai > wrote: > > > Hi Carson, > > > > I am using MAKER to annotate a few small genomes. When looking through > the gff file, I notice that the exon numbers do not start from 0 or 1 for > each gene. Only the first gene in a scaffold start with exon 0. If the > first gene has 3 exons (0-2), then the second gene will start from exon 3 > (an example is shown below). It seems many people would prefer that in each > gene, the first exon be exon 1. Is it possible to make such a change? > Thanks. > > > > Guohong > > > > > > scaffold1 . contig 1 347483 . . . > ID=scaffold1;Name=scaffold1 > > scaffold1 maker gene 106 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0 > > scaffold1 maker mRNA 106 2985 . + . 
> ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 > > scaffold1 maker exon 106 1684 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker exon 1878 2440 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker exon 2605 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker CDS 106 1684 . + 0 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker CDS 1878 2440 . + 2 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker CDS 2605 2985 . + 0 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > > scaffold1 maker gene 38466 41604 . + . > ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 > > scaffold1 maker mRNA 38466 41604 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892 > > scaffold1 maker exon 38466 38511 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 38616 38742 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 38831 39986 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 40073 40154 . + . 
> ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 40259 40666 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker exon 40745 41604 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 38466 38511 . + 0 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 38616 38742 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 38831 39986 . + 1 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 40073 40154 . + 0 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 40259 40666 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > scaffold1 maker CDS 40745 41604 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > > > _______________________________________________ > > maker-devel mailing list > > maker-devel at box290.bluehost.com > > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From caigh02 at gmail.com Wed May 21 08:40:47 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Wed, 21 May 2014 09:40:47 -0500 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: Thanks a lot.---Guohong On Tue, May 20, 2014 at 7:52 PM, Carson Holt wrote: > I've gone ahead and made the change in the devlopment version. It will > probably be convenient in most cases, but it's important to note one > caveat. Exon features are shared in GFF3 format. 
So if there are multiple > isoforms that contain the same exon, there will only be a single exon line > in the GFF3, but it will list several transcript IDs in it's Parent= > attribute. > > What does that have to do with with the ID= attribute or exon order? Well > it means that ID=exon:2 in the first transcript may be the second exon, but > in another transcript ID=exon:2 may be the first exon or third exon, etc. > This is because there is only a single line for a given exon and it gets > shared by all the transcripts. So it will always have the same ID= tag, > but will hold a different position in different isoforms (so it's ordinal > value will not go along with the ID in those cases). But since most gene > calls from MAKER will have only one isoform (default) it could still be > convenient in those cases. > > Thanks, > Carson > > > From: Guohong Cai > Date: Monday, May 19, 2014 at 9:43 PM > To: > Subject: [maker-devel] Maker exon number > > Hi Carson, > > I am using MAKER to annotate a few small genomes. When looking through the > gff file, I notice that the exon numbers do not start from 0 or 1 for each > gene. Only the first gene in a scaffold start with exon 0. If the first > gene has 3 exons (0-2), then the second gene will start from exon 3 (an > example is shown below). It seems many people would prefer that in each > gene, the first exon be exon 1. Is it possible to make such a change? > Thanks. > > Guohong > > > scaffold1 . contig 1 347483 . . . > ID=scaffold1;Name=scaffold1 > scaffold1 maker gene 106 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0 > scaffold1 maker mRNA 106 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 > scaffold1 maker exon 106 1684 . + . 
> ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker exon 1878 2440 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker exon 2605 2985 . + . > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 106 1684 . + 0 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 1878 2440 . + 2 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 2605 2985 . + 0 > ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker gene 38466 41604 . + . > ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 > scaffold1 maker mRNA 38466 41604 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892 > scaffold1 maker exon 38466 38511 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 38616 38742 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 38831 39986 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40073 40154 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40259 40666 . + . > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40745 41604 . + . 
> ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38466 38511 . + 0 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38616 38742 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38831 39986 . + 1 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40073 40154 . + 0 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40259 40666 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40745 41604 . + 2 > ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From caigh02 at gmail.com Wed May 21 21:16:52 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Wed, 21 May 2014 23:16:52 -0400 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: Hi Carson, is the development version available for download? Only maker2.31.5 is available on Yandell Lab website.---Guohong On Tue, May 20, 2014 at 8:52 PM, Carson Holt wrote: > I've gone ahead and made the change in the devlopment version. It will > probably be convenient in most cases, but it's important to note one > caveat. Exon features are shared in GFF3 format. So if there are multiple > isoforms that contain the same exon, there will only be a single exon line > in the GFF3, but it will list several transcript IDs in it's Parent= > attribute. 
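The caveat quoted above (one exon line shared by several transcripts through a comma-separated Parent= list) can be illustrated with a small sketch. This is not MAKER code; the function names and the toy exon lines below are made up for the example. It only shows why a single exon ID can be the second exon of one isoform and the first exon of another.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn a GFF3 attribute string (ID=...;Parent=...) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def exon_ordinals(gff_lines):
    """Return {transcript_id: [(ordinal, exon_id), ...]} in 5'->3' order."""
    exons_by_transcript = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, strand = int(cols[3]), cols[6]
        attrs = parse_attributes(cols[8])
        # A shared exon names every transcript it belongs to, comma-separated.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons_by_transcript[parent].append((start, strand, attrs.get("ID", "")))

    result = {}
    for transcript, exons in exons_by_transcript.items():
        on_minus = exons[0][1] == "-"
        # Ascending start on the + strand, descending on the - strand.
        ordered = sorted(exons, key=lambda e: e[0], reverse=on_minus)
        result[transcript] = [(i, e[2]) for i, e in enumerate(ordered, start=1)]
    return result


# Hypothetical exon lines: tx-B lacks the first exon of tx-A.
example = [
    "chr1\tmaker\texon\t100\t200\t.\t+\t.\tID=exon:1;Parent=tx-A",
    "chr1\tmaker\texon\t300\t400\t.\t+\t.\tID=exon:2;Parent=tx-A,tx-B",
    "chr1\tmaker\texon\t500\t600\t.\t+\t.\tID=exon:3;Parent=tx-A,tx-B",
]
ordinals = exon_ordinals(example)
for transcript in sorted(ordinals):
    print(transcript, ordinals[transcript])
```

In the toy data, exon:2 is the second exon of tx-A but the first exon of tx-B, so the number embedded in the ID cannot double as a per-transcript ordinal once isoforms share exons.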

From fbarreto at ucsd.edu Thu May 22 23:13:37 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Thu, 22 May 2014 22:13:37 -0700
Subject: [maker-devel] Alternative splicing options
Message-ID:

Hi, all,

I just finished a fourth and final iterative round with Maker, training predictors in between, and I am very happy with the results. What I would like to try now is to annotate alternative splicing variants, and I know the ctrl file has the alt_splice option. However, I am intrigued by the lack of information regarding this option. I could not find many discussions in this group, and most genome publications using Maker are unclear about whether they annotated alternative transcripts, so my guess is they didn't. So I was wondering whether there is a reason for that. Is that function not well developed in Maker? Should I stay away from it?

Assuming it is OK to give it a try (provided I don't get discouraged here), what is the best approach to take, considering I already obtained what I consider a solid set of gene models after four rounds of annotation? Should I start over by turning on alt_splice, and training gene predictors from those outputs? Or would it be appropriate to simply repeat my latest round, changing only alt_splice=1?

Thanks for any help. I can see the light at the end of the tunnel!

Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From dence at genetics.utah.edu Fri May 23 08:55:50 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 23 May 2014 14:55:50 +0000
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

Hi Felipe,

The alternative splice option is a fully developed and functional option in MAKER. What it does is tell MAKER to consider gene models with mutually exclusive evidence. For example, if there are two models at a locus and evidence that supports one exon in one model and a different exon in another model, both those models might make it into the final geneset.

>From the workflow you described, I think you'd have to redo only the fourth and final round of MAKER annotation. As a general principle for trying out new options on your annotations, I'd recommend choosing a big scaffold, running it with alt_splice=1, and seeing how you like the results.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Fri May 23 09:07:26 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 May 2014 09:07:26 -0600
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

I'd like to add that alternate splice forms will be generated off of the mutually exclusive EST evidence, so how well it performs, as well as whether or not it can even generate other splice forms, will depend entirely on the quality of your EST evidence.

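One way to "see how you like the results" on a test scaffold is to count isoforms per gene in the GFF3 produced with and without alt_splice=1. The following is a minimal sketch under stated assumptions, not part of MAKER: it assumes gene models carry `maker` in the source column and that each mRNA names its gene in Parent=, as in the GFF3 excerpts posted to this list. The function names and example lines are hypothetical.

```python
from collections import Counter


def isoforms_per_gene(gff_lines, source="maker"):
    """Count mRNA features per parent gene for the given source column."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != source or cols[2] != "mRNA":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        gene = attrs.get("Parent")
        if gene:
            counts[gene] += 1
    return counts


def summarize(counts):
    """Reduce per-gene counts to a few numbers that are easy to compare."""
    return {
        "genes": len(counts),
        "multi_isoform_genes": sum(1 for n in counts.values() if n > 1),
        "max_isoforms": max(counts.values(), default=0),
    }


# Hypothetical lines standing in for one scaffold's output.
example = [
    "scaffold1\tmaker\tgene\t106\t2985\t.\t+\t.\tID=gene-A;Name=gene-A",
    "scaffold1\tmaker\tmRNA\t106\t2985\t.\t+\t.\tID=gene-A-mRNA-1;Parent=gene-A",
    "scaffold1\tmaker\tmRNA\t106\t2440\t.\t+\t.\tID=gene-A-mRNA-2;Parent=gene-A",
    "scaffold1\tmaker\tmRNA\t38466\t41604\t.\t+\t.\tID=gene-B-mRNA-1;Parent=gene-B",
    "scaffold1\test2genome\texpressed_sequence_match\t106\t1684\t.\t+\t.\tID=est-1",
]
counts = isoforms_per_gene(example)
summary = summarize(counts)
print(summary)
```

Running it on the two outputs for the same scaffold and comparing the summaries gives a quick feel for how many genes gained isoforms, which in turn reflects how much mutually exclusive EST evidence was available.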
--Carson

From fbarreto at ucsd.edu Fri May 23 09:56:27 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Fri, 23 May 2014 08:56:27 -0700
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID:

Hey guys,

Great to hear!! I will be anxious to try it out. Thanks for your prompt help!

Cheers,
Felipe

On Fri, May 23, 2014 at 8:07 AM, Carson Holt wrote:
> I'd like to add that alternate splice forms will be generated off of the
> mutually exclusive EST evidence, so how well it performs, as well as whether
> or not it can even generate other splice forms, will depend entirely on the
> quality of your EST evidence.
>
> --Carson

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From cjfields at illinois.edu Fri May 23 10:21:38 2014
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 23 May 2014 16:21:38 +0000
Subject: [maker-devel] Alternative splicing options
In-Reply-To:
References:
Message-ID: <14271D2B-4D83-47C9-8661-682599E94E8F@illinois.edu>

That is exactly what I have seen using this option; genes with very good transcriptome evidence (as one might expect) tend to have more isoforms. The problem we run into is not having a diverse enough transcriptome set to work with (ours tend to be tissue-specific, unfortunately), so we have some genes giving more isoforms than others, but we don't design the libraries so have no control over it. We are currently only using Trinity assemblies as input over using TopHat2/Cufflinks.

chris

On May 23, 2014, at 10:07 AM, Carson Holt wrote:

I'd like to add that alternate splice forms will be generated off of the mutually exclusive EST evidence, so how well it performs, as well as whether or not it can even generate other splice forms, will depend entirely on the quality of your EST evidence.

--Carson

From fbarreto at ucsd.edu Fri May 23 14:31:36 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Fri, 23 May 2014 13:31:36 -0700
Subject: [maker-devel] gff3_merge on models only for SNAP training?
Message-ID:

Hi, all,

I should have confirmed this well before starting my Maker runs, but better now than never. When generating a merged gff file to be used for SNAP training, is it OK to use the default gff output from gff3_merge, which contains all protein/EST evidence alignments (this is what I did)? Or should I have generated a gene models-only merged gff (using the -g flag) for training? I assume the Maker flag within the larger gff file will allow the subsequent scripts (e.g. maker2zff) to ignore the other alignments, but just wanted to check.

Thanks again!
Felipe

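To make the difference between the full merged GFF3 and a models-only file concrete, here is a rough sketch of selecting gene-model lines by their source column. It is an illustration only and does not reproduce how gff3_merge -g or maker2zff work internally; the function name, the choice to keep only the version directive, and the example lines are assumptions made for the example.

```python
def gene_model_lines(gff_lines, source="maker"):
    """Keep the GFF3 version directive plus features from the given source."""
    kept = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#"):
            if line.startswith("##gff-version"):
                kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        # Evidence alignments carry other sources (blastx, est2genome, ...).
        if len(cols) == 9 and cols[1] == source:
            kept.append(line)
    return kept


# Hypothetical merged file: one gene model plus evidence and sequence.
merged = [
    "##gff-version 3",
    "scaffold1\t.\tcontig\t1\t347483\t.\t.\t.\tID=scaffold1;Name=scaffold1",
    "scaffold1\tmaker\tgene\t106\t2985\t.\t+\t.\tID=gene-A;Name=gene-A",
    "scaffold1\tmaker\tmRNA\t106\t2985\t.\t+\t.\tID=gene-A-mRNA-1;Parent=gene-A",
    "scaffold1\tblastx\tprotein_match\t106\t1684\t.\t+\t.\tID=hit-1;Name=prot-1",
    "scaffold1\test2genome\texpressed_sequence_match\t106\t1684\t.\t+\t.\tID=est-1",
    "##FASTA",
    ">scaffold1",
    "ACGTACGT",
]
models = gene_model_lines(merged)
print(len(models), "of", len(merged), "lines kept")
```

A filter like this is only useful for inspecting what a models-only file would contain; for actual training input, the scripts shipped with MAKER remain the reference.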

From carsonhh at gmail.com Fri May 23 14:33:17 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 May 2014 14:33:17 -0600
Subject: [maker-devel] gff3_merge on models only for SNAP training?
In-Reply-To:
References:
Message-ID:

Yes. It's ok. Non-genic feature lines will be ignored.

--Carson

From jacques.dainat at imbim.uu.se Fri May 23 01:56:05 2014
From: jacques.dainat at imbim.uu.se (Jacques Dainat)
Date: Fri, 23 May 2014 09:56:05 +0200
Subject: [maker-devel] Possible error in tRNA annotation by maker
Message-ID:

Hi,

I would like to report a possible error in the tRNA annotation produced by maker. I saw the problem in the gff result file. It affects all tRNAs that have an intron and are on the + strand, and only those. In these cases the strand of one of the exons seems to be wrong (see the example below).

As an example, we have:

scaffold6501 maker gene 2126 2230 . + . XXX
scaffold6501 maker tRNA 2126 2230 . + . XXX
scaffold6501 maker exon 2185 2230 . - . XXX
scaffold6501 maker exon 2126 2163 . + . XXX

Theoretically, we should obtain:

scaffold6501 maker gene 2126 2230 . + . XXX
scaffold6501 maker tRNA 2126 2230 . + . XXX
scaffold6501 maker exon 2126 2163 . + . XXX
scaffold6501 maker exon 2185 2230 . + . XXX

Kind regards,

Jacques Dainat, PhD
BILS (Bioinformatics Infrastructure for Life Sciences)
Address: (room E10:3312)
Uppsala University, BMC
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala
Sweden

From marc.hoeppner at bils.se Tue May 27 02:12:07 2014
From: marc.hoeppner at bils.se (Marc Höppner)
Date: Tue, 27 May 2014 10:12:07 +0200
Subject: [maker-devel] Some questions regarding ab-initio training
Message-ID: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>

Hi,

I wanted to get some feedback regarding the training of ab-initio gene finders - it's not strictly Maker related, but I suppose there are many people on this list that have encountered and solved this issue in one way or another.

Specifically, I am trying to train Augustus (and possibly SNAP) for a plant genome. This has always been a very frustrating process for me, but while I have a better idea now how to do it, I still don't get the sort of accuracy that I am hoping for.
A quick run-through of my process:

Evidence build with maker on level 1 and 2 proteins from Uniprot + Sanger-sequenced EST data

Filtered for models with an AED <= 0.3

Loaded that into WebApollo, together with an existing reference annotation and the evidence tracks

Manually curated/selected 750 gene models using the following rules:
- Must have start/stop codon
- Must have as many exons as possible
- Must agree with evidence
- Must be >= 2kb apart from other gene models (provided as flanking regions for augustus to train intergenic sequence)

From these models, I created a GBK file, split it into 650 (train) and 100 (test) models and created a new profile using the documented procedure.

But:

While the naked ab-init models created through maker get a lot of genes "sort of right", I still see too many issues to be really satisfied. Problems include:

- random exon calls which are not supported by any line of evidence (~1 per gene model, I would guess)
- poor congruency with some gene models (especially ones not used for training/testing)

Is there any best-practice guide on how to improve this? The Augustus website is unfortunately quite poor on detail... My impression so far is that ramping up the number of training models isn't really doing too much beyond a certain point (tried 400, 500 and 750).

Regards,

Marc

Marc P. Hoeppner, PhD
Team Leader
BILS Genome Annotation Platform
Department for Medical Biochemistry and Microbiology
Uppsala University, Sweden
marc.hoeppner at bils.se

From carsonhh at gmail.com Tue May 27 09:25:39 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 27 May 2014 09:25:39 -0600
Subject: [maker-devel] Some questions regarding ab-initio training
In-Reply-To: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>
References: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se>
Message-ID:

Extra exons can be required for predictors to make sense of a region (they do the best they can). This can be due to imperfect assemblies or repeats.
For plants the repeat database is the one thing that will most affect the annotation quality. You may need to spend some time building the best repeat library you can. The repeat library is the next most important thing after training the predictor, because repeats confuse the predictor (sometimes a lot), causing it to behave oddly in those regions (repeats do encode real proteins and protein fragments). Also, when running with MAKER now, make sure to include the entire proteome of a related species and not just UniProt, and you will get better performance. Now that you have Augustus trained, using it inside of MAKER with an improved repeat library and additional protein evidence should give it the feedback that will allow it to perform better than it would with just naked ab initio prediction.

Thanks,
Carson

From carsonhh at gmail.com Tue May 27 09:26:25 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 27 May 2014 09:26:25 -0600
Subject: [maker-devel] Possible error in tRNA annotation by maker
In-Reply-To:
References:
Message-ID:

Do you have a small test contig I could use to duplicate the error? That will make it easier to fix.

Thanks,
Carson

From panos.ioannidis at gmail.com Wed May 28 01:28:14 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 09:28:14 +0200
Subject: [maker-devel] Problem with installation
Message-ID:

Hello Maker community,

I just finished installing Maker and even though everything seems to be okay, when I run

./maker -h

or

./maker

the program apparently hangs without giving any output, warning, or error.

Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/.

Any ideas?

Panos

From panos.ioannidis at gmail.com Wed May 28 02:26:08 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 10:26:08 +0200
Subject: [maker-devel] General question
Message-ID:

I'm going through the Maker tutorial and saw that among the input files you give it, there's a fasta file with proteins (the protein=xxx parameter in the maker_opts.ctl file).

What exactly are these proteins? I thought Maker both predicts genes (i.e. proteins) and also annotates them. Does it only do annotation of already predicted genes/proteins? But then, why is it using gene predictors like Augustus, SNAP, etc?

Thanks,
Panos

URL: From b.cantarel at gmail.com Wed May 28 05:11:18 2014 From: b.cantarel at gmail.com (Brandi Cantarel) Date: Wed, 28 May 2014 06:11:18 -0500 Subject: [maker-devel] General question In-Reply-To: References: Message-ID: Maker's predictions are improved with evidence. These proteins can be from uniprot (I recommend uniprot50) or from a closely related taxa. Maker uses comparisons to these proteins in its prediction. There is more detail on this in the paper. Sent from my iPhone > On May 28, 2014, at 3:26, Panos Ioannidis wrote: > > I'm going through the Maker tutorial and saw that among the input files you give it, there's a fasta file with proteins (the protein=xxx parameter in the maker_opts.ctl file). > > What exactly are these proteins? I thought Maker both predicts genes (i.e. proteins) and also annotates them. Does it only do annotation of already predicted genes/proteins? But then, why is it using gene predictors like Augustus, SNAP, etc? > > Thanks, > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 05:29:43 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 13:29:43 +0200 Subject: [maker-devel] General question In-Reply-To: References: Message-ID: Thanks Brandi. On Wed, May 28, 2014 at 1:11 PM, Brandi Cantarel wrote: > Maker's predictions are improved with evidence. These proteins can be > from uniprot (I recommend uniprot50) or from a closely related taxa. > > Maker uses comparisons to these proteins in its prediction. There is more > detail on this in the paper. 
> > Sent from my iPhone > > On May 28, 2014, at 3:26, Panos Ioannidis > wrote: > > I'm going through the Maker tutorial and saw that among the input files > you give it, there's a fasta file with proteins (the protein=xxxparameter in the > maker_opts.ctl file). > > What exactly are these proteins? I thought Maker both predicts genes (i.e. > proteins) and also annotates them. Does it only do annotation of already > predicted genes/proteins? But then, why is it using gene predictors like > Augustus, SNAP, etc? > > Thanks, > Panos > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed May 28 07:29:58 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 28 May 2014 13:29:58 +0000 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hi Panos, When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running? Thanks, Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: Hello Maker community, I just finished installing Maker and even though everything seems to be okay, when I give ./maker -h or ./maker the program apparently hangs without giving any output or warning or error. Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/. Any ideas? 
Panos _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 07:46:12 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 15:46:12 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hi Daniel, Here's the output of ./Build status ============================================================================== STATUS MAKER v2.31.4 ============================================================================== PERL Dependencies: VERIFIED External Programs: VERIFIED External C Libraries: VERIFIED MPI SUPPORT: DISABLED MWAS Web Interface: DISABLED MAKER PACKAGE: CONFIGURATION OK I think everything looks okay, right? On Wed, May 28, 2014 at 3:29 PM, Daniel Ence wrote: > Hi Panos, When you go to the src directory and type "./Build status", > what message do you get? Also, what version of maker are you running? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > > Hello Maker community, > > I just finished installing Maker and even though everything seems to be > okay, when I give > > ./maker -h > > or > > ./maker > > the program apparently hangs without giving any output or warning or > error. > > Just so you know, I have installed all dependencies (Perl libraries and > third-party programs) and am executing from bin/, not src/bin/. > > Any ideas? 
> > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed May 28 08:03:33 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 28 May 2014 14:03:33 +0000 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hi Panos, So I just tried the commands that you used on my install of maker, and it took a surprisingly long time for the error messages to print. The test that we use in the tutorials (it seems to run faster than running maker with -h or with no options) is maker -CTL, which will create control files that you use to set the many options for maker. Try running ./maker -CTL and let me know whether it creates those files. I guess that it might take more or less time, depending on your machine. ~Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 7:46 AM, Panos Ioannidis > wrote: Hi Daniel, Here's the output of ./Build status ============================================================================== STATUS MAKER v2.31.4 ============================================================================== PERL Dependencies: VERIFIED External Programs: VERIFIED External C Libraries: VERIFIED MPI SUPPORT: DISABLED MWAS Web Interface: DISABLED MAKER PACKAGE: CONFIGURATION OK I think everything looks okay, right? On Wed, May 28, 2014 at 3:29 PM, Daniel Ence > wrote: Hi Panos, When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running? 
Thanks, Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote: Hello Maker community, I just finished installing Maker and even though everything seems to be okay, when I give ./maker -h or ./maker the program apparently hangs without giving any output or warning or error. Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/. Any ideas? Panos _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 08:32:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 08:32:07 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Perl is a scripting language rather than a compiled language, and one thing that happens when you first use a new module or script is that the interpreter follows the dependency tree, validating that everything executes/loads correctly. Since you installed a number of dependencies and MAKER itself, the first time you launch MAKER Perl has to do this check on the dependency tree. This only happens the first time, and after that Perl remembers it already ran the check so the dependencies and MAKER will just start from then on. Normally this process takes less than 30 seconds; however, on some systems (especially clusters) there may be a heavy IO burden and this process can take a while. For example, does it take a moment for 'ls -al' to return in some directories rather than returning instantaneously like it is supposed to?
If it takes 3 seconds to return, for example, then each dependency check may take up to 3 seconds. If you just installed a bunch of new perl modules then there may be a hundred or more dependencies that may have to be validated for the first time. --Carson From: Daniel Ence Date: Wednesday, May 28, 2014 at 7:29 AM To: Panos Ioannidis Cc: "" Subject: Re: [maker-devel] Problem with installation Hi Panos, When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running? Thanks, Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote: > Hello Maker community, > > I just finished installing Maker and even though everything seems to be okay, > when I give > > ./maker -h > > or > > ./maker > > the program apparently hangs without giving any output or warning or error. > > Just so you know, I have installed all dependencies (Perl libraries and > third-party programs) and am executing from bin/, not src/bin/. > > Any ideas? > > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 10:13:05 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 18:13:05 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds.
I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one > thing that happens when you first use a new module or script Is that the > interpreter follows the dependency tree validating that everything > executes/loads correctly. Since you installed a number of dependencies and > MAKER itself, the first time you launch MAKER Perl has to do this check on > the dependency tree. This only happens the first time, and after that Perl > remembers it already ran the check so the dependencies and MAKER will just > start from then on. Normally this proccess takes less than 30 seconds; > however, on some systems (especially clusters) there may a heavy IO burden > and this process can take a while. For example does it take a moment for > 'ls -al' to return in some directories rather than returning > instantaneously like it is supposed to? If it takes 3 seconds to return or > example, then each dependency check may take up to 3 seconds. If you just > installed a bunch of new perl modules then there may be a hundred or more > dependencies that may have to be validated for the first time. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? 
Also, what version of maker are you running? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > > Hello Maker community, > > I just finished installing Maker and even though everything seems to be > okay, when I give > > ./maker -h > > or > > ./maker > > the program apparently hangs without giving any output or warning or error. > > Just so you know, I have installed all dependencies (Perl libraries and > third-party programs) and am executing from bin/, not src/bin/. > > Any ideas? > > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 10:15:20 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 10:15:20 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Normally it takes 30 seconds, but if your IO response is slow (I.e. 3 seconds per query which is why you should do the 'ls -al' test), it can take several minutes because it's an IO issue. --Carson From: Panos Ioannidis Date: Wednesday, May 28, 2014 at 10:13 AM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens. 
Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one thing > that happens when you first use a new module or script Is that the interpreter > follows the dependency tree validating that everything executes/loads > correctly. Since you installed a number of dependencies and MAKER itself, the > first time you launch MAKER Perl has to do this check on the dependency tree. > This only happens the first time, and after that Perl remembers it already ran > the check so the dependencies and MAKER will just start from then on. > Normally this proccess takes less than 30 seconds; however, on some systems > (especially clusters) there may a heavy IO burden and this process can take a > while. For example does it take a moment for 'ls -al' to return in some > directories rather than returning instantaneously like it is supposed to? If > it takes 3 seconds to return or example, then each dependency check may take > up to 3 seconds. If you just installed a bunch of new perl modules then there > may be a hundred or more dependencies that may have to be validated for the > first time. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? Also, what version of maker are you running? 
> > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be okay, >> when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? >> >> Panos >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 10:16:58 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 10:16:58 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: You may also want to look into if you need to reinstall perl on another drive. --Carson From: Carson Holt Date: Wednesday, May 28, 2014 at 10:15 AM To: Panos Ioannidis Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Normally it takes 30 seconds, but if your IO response is slow (I.e. 3 seconds per query which is why you should do the 'ls -al' test), it can take several minutes because it's an IO issue. 
--Carson From: Panos Ioannidis Date: Wednesday, May 28, 2014 at 10:13 AM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one thing > that happens when you first use a new module or script Is that the interpreter > follows the dependency tree validating that everything executes/loads > correctly. Since you installed a number of dependencies and MAKER itself, the > first time you launch MAKER Perl has to do this check on the dependency tree. > This only happens the first time, and after that Perl remembers it already ran > the check so the dependencies and MAKER will just start from then on. > Normally this proccess takes less than 30 seconds; however, on some systems > (especially clusters) there may a heavy IO burden and this process can take a > while. For example does it take a moment for 'ls -al' to return in some > directories rather than returning instantaneously like it is supposed to? If > it takes 3 seconds to return or example, then each dependency check may take > up to 3 seconds. If you just installed a bunch of new perl modules then there > may be a hundred or more dependencies that may have to be validated for the > first time. 
> > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? Also, what version of maker are you running? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be okay, >> when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? >> >> Panos >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 10:25:04 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 18:25:04 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: "ls -al" is instantaneous in all directories... I'll try installing it on my workstation, although it's not possible to do annotation on my machine! And the machine I currently have installed it, is our server and I can't really make any big changes there. Anyway, I'll let you know how it goes. 
P On Wed, May 28, 2014 at 6:16 PM, Carson Holt wrote: > You may also want to look into if you need to reinstall perl on another > drive. > > --Carson > > > From: Carson Holt > Date: Wednesday, May 28, 2014 at 10:15 AM > To: Panos Ioannidis > > Cc: Daniel Ence , "" > > Subject: Re: [maker-devel] Problem with installation > > Normally it takes 30 seconds, but if your IO response is slow (I.e. 3 > seconds per query which is why you should do the 'ls -al' test), it can > take several minutes because it's an IO issue. > > --Carson > > From: Panos Ioannidis > Date: Wednesday, May 28, 2014 at 10:13 AM > To: Carson Holt > Cc: Daniel Ence , "" > > Subject: Re: [maker-devel] Problem with installation > > Hello Daniel and Carson, > > Thank you both for your comments. > > Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes > but still nothing happens. > > Daniel, the same is true for maker -CTL; it appears as if it's doing > something, but if you give a top you'll see that the CPU usage is ALWAYS > 0%. > > Three things that might be helpful: > 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " > 2. when I give top I see that there are two maker processes running. Is > this normal? > 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - > I guess that's memory) for one of the two maker processes go to zero, but > it doesn't go away. > > > > > > On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > >> Perl is a scripting language rather than a compiled language, and one >> thing that happens when you first use a new module or script Is that the >> interpreter follows the dependency tree validating that everything >> executes/loads correctly. Since you installed a number of dependencies and >> MAKER itself, the first time you launch MAKER Perl has to do this check on >> the dependency tree. 
This only happens the first time, and after that Perl >> remembers it already ran the check so the dependencies and MAKER will just >> start from then on. Normally this proccess takes less than 30 seconds; >> however, on some systems (especially clusters) there may a heavy IO burden >> and this process can take a while. For example does it take a moment for >> 'ls -al' to return in some directories rather than returning >> instantaneously like it is supposed to? If it takes 3 seconds to return or >> example, then each dependency check may take up to 3 seconds. If you just >> installed a bunch of new perl modules then there may be a hundred or more >> dependencies that may have to be validated for the first time. >> >> --Carson >> >> >> >> From: Daniel Ence >> Date: Wednesday, May 28, 2014 at 7:29 AM >> To: Panos Ioannidis >> Cc: "" >> Subject: Re: [maker-devel] Problem with installation >> >> Hi Panos, When you go to the src directory and type "./Build status", >> what message do you get? Also, what version of maker are you running? >> >> Thanks, >> Daniel >> >> >> Daniel Ence >> Graduate Student >> dence at genetics.utah.edu >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> On May 28, 2014, at 1:28 AM, Panos Ioannidis >> wrote: >> >> Hello Maker community, >> >> I just finished installing Maker and even though everything seems to be >> okay, when I give >> >> ./maker -h >> >> or >> >> ./maker >> >> the program apparently hangs without giving any output or warning or >> error. >> >> Just so you know, I have installed all dependencies (Perl libraries and >> third-party programs) and am executing from bin/, not src/bin/. >> >> Any ideas? 
>> >> Panos >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Wed May 28 10:28:30 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 10:28:30 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Try perlbrew to set up your own local version of perl just for your user. http://perlbrew.pl --Carson From: Panos Ioannidis Date: Wednesday, May 28, 2014 at 10:13 AM To: Carson Holt Cc: Daniel Ence , "" Subject: Re: [maker-devel] Problem with installation Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one thing > that happens when you first use a new module or script is that the interpreter > follows the dependency tree validating that everything executes/loads > correctly.
> Since you installed a number of dependencies and MAKER itself, the
> first time you launch MAKER Perl has to do this check on the dependency tree.
> This only happens the first time, and after that Perl remembers it already ran
> the check so the dependencies and MAKER will just start from then on.
> Normally this process takes less than 30 seconds; however, on some systems
> (especially clusters) there may be a heavy IO burden and this process can take
> a while. For example, does it take a moment for 'ls -al' to return in some
> directories rather than returning instantaneously like it is supposed to? If
> it takes 3 seconds to return, for example, then each dependency check may take
> up to 3 seconds. If you just installed a bunch of new Perl modules then there
> may be a hundred or more dependencies that have to be validated for the
> first time.
>
> --Carson
>
> From: Daniel Ence
> Date: Wednesday, May 28, 2014 at 7:29 AM
> To: Panos Ioannidis
> Subject: Re: [maker-devel] Problem with installation
>
> Hi Panos, When you go to the src directory and type "./Build status", what
> message do you get? Also, what version of maker are you running?
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> dence at genetics.utah.edu
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote:
>
>> Hello Maker community,
>>
>> I just finished installing Maker and even though everything seems to be okay,
>> when I give
>>
>> ./maker -h
>>
>> or
>>
>> ./maker
>>
>> the program apparently hangs without giving any output or warning or error.
>>
>> Just so you know, I have installed all dependencies (Perl libraries and
>> third-party programs) and am executing from bin/, not src/bin/.
>>
>> Any ideas?
>>
>> Panos

From fbarreto at ucsd.edu Wed May 28 11:39:45 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Wed, 28 May 2014 10:39:45 -0700
Subject: [maker-devel] Adding non-overlapping models to final set
Message-ID:

Hi, all,

I finished generating Maker gene models. Following suggestions here and from
publications, I used IPRscan on the set of non-overlapping ab initio protein
models. This identified ~200 models with protein domains, and I would like to
add those to my final gene set.

However, I am having trouble figuring out how to use Maker's options to update
my final maker_genome.gff file to include these 200 models, without also
adding the remaining ~8000 non-overlapping models I don't want. The
discussions about the re-annotation options don't seem to get at this.

Do I have to first find a way to create a new gff file containing only the 200
new models, and then simply use gff3_merge with the full genome gff?

At this point, I am not concerned about incorporating IPRscan functional info
into the gff file. I simply want to generate an updated (and final) gene set
and then move on to functional annotation.

Thanks yet again!

Felipe
From dence at genetics.utah.edu Wed May 28 12:35:06 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 28 May 2014 18:35:06 +0000
Subject: [maker-devel] Adding non-overlapping models to final set
In-Reply-To:
References:
Message-ID: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>

Hi Felipe,

I'm glad to hear that you got some more genes from IPRscan. If you don't care
about getting the functional information from the IPRscan report into the gff
file, then you just need to pull those predictions out from all the ab initio
predictions that you don't care about and put them in a GFF3 file. Then you
supply that file to the "pred_gff" option and set keep_preds=1. That will
promote those predictions to full gene models. Then you can merge with your
other gff3 file.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Wed May 28 12:45:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 12:45:05 -0600
Subject: [maker-devel] Adding non-overlapping models to final set
In-Reply-To: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>
References: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>
Message-ID:

For convenience you can use the attached script to help pull out the
match/match_part features you want from the GFF3 file (or you can pull them
out yourself). Then do just like Daniel said: set keep_preds=1, give the
selected match/match_part features to pred_gff, and give your current MAKER
models to model_gff.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_select
Type: application/octet-stream
Size: 3237 bytes
Desc: not available
URL:

From fbarreto at ucsd.edu Wed May 28 14:28:48 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Wed, 28 May 2014 13:28:48 -0700
Subject: [maker-devel] Adding non-overlapping models to final set
In-Reply-To:
References: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>
Message-ID:

Awesome! Thanks for the tips and script. This should do the trick. Will come
back if I get stuck.
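
For readers following this thread, the selection step discussed above (keeping only the wanted match/match_part features from a GFF3 file) might look roughly like the sketch below. It is not the attached gff3_select script, whose contents are not shown in the archive. The function names, the choice to match on the Name attribute, and the example feature names are all assumptions made for illustration.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute column such as 'ID=x;Name=y' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def select_predictions(gff_lines, wanted_names):
    """Keep match features whose Name is wanted, plus their match_part children.

    Assumes each match line appears before its match_part lines.
    """
    kept_ids = set()
    selected = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        if cols[2] == "match" and attrs.get("Name") in wanted_names:
            kept_ids.add(attrs.get("ID"))
            selected.append(line.rstrip("\n"))
        elif cols[2] == "match_part" and attrs.get("Parent") in kept_ids:
            selected.append(line.rstrip("\n"))
    return selected


# Hypothetical feature lines; real MAKER output will have different names.
example = [
    "##gff-version 3",
    "ctg1\tsnap_masked\tmatch\t100\t500\t.\t+\t.\tID=m1;Name=snap-ctg1-abinit-gene-0.1",
    "ctg1\tsnap_masked\tmatch_part\t100\t500\t.\t+\t.\tID=m1:p1;Parent=m1",
    "ctg1\tsnap_masked\tmatch\t900\t1200\t.\t-\t.\tID=m2;Name=snap-ctg1-abinit-gene-0.2",
    "ctg1\tsnap_masked\tmatch_part\t900\t1200\t.\t-\t.\tID=m2:p1;Parent=m2",
]

for row in select_predictions(example, {"snap-ctg1-abinit-gene-0.1"}):
    print(row)
```

The selected lines would then be written to a file and supplied to pred_gff with keep_preds=1, with the existing MAKER models supplied to model_gff, as described in the replies above.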
Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From panos.ioannidis at gmail.com Thu May 29 03:21:24 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Thu, 29 May 2014 11:21:24 +0200
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

So I managed to install it on my workstation and it works fine!

Thanks for the information on perlbrew. I will also give it a try.

I did a test run on my workstation using just a few contigs and was wondering
where the annotation is saved. Is it the gff files (one gff per contig) in the
*.maker.output/ directory?

From dence at genetics.utah.edu Thu May 29 08:58:22 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 29 May 2014 14:58:22 +0000
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hi Panos,

The results are stored in the datastore directory in the "maker.output"
directory. You can merge those results into one gff file with the gff3_merge
accessory script. It's included in the bin directory.
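
As a rough illustration of where the per-contig results live, the sketch below reads a master datastore index log and lists the GFF file expected for each finished contig. It is only a sketch under stated assumptions: it assumes the index log is tab-separated with three columns (contig name, datastore-relative directory, status) and that each contig directory holds a file named after the contig with a .gff extension. The function name and the example paths are hypothetical. For the actual merge, gff3_merge remains the tool to use; its help output lists the exact options.

```python
import os


def finished_contig_gffs(index_lines, output_dir="."):
    """Return {contig: gff_path} for contigs whose last recorded status is FINISHED."""
    status = {}
    location = {}
    for line in index_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 3:
            continue  # skip anything that is not a three-column record
        contig, rel_dir, state = cols
        status[contig] = state        # later lines overwrite earlier ones
        location[contig] = rel_dir
    return {
        contig: os.path.join(output_dir, location[contig], contig + ".gff")
        for contig, state in status.items()
        if state == "FINISHED"
    }


# Hypothetical index log content; real directory names will differ.
example_index = [
    "ctg1\tgenome_datastore/05/1F/ctg1/\tSTARTED",
    "ctg1\tgenome_datastore/05/1F/ctg1/\tFINISHED",
    "ctg2\tgenome_datastore/A2/3C/ctg2/\tSTARTED",
]

for contig, path in finished_contig_gffs(example_index, "genome.maker.output").items():
    print(contig, path)
```

Listing the finished contigs this way is mainly useful as a sanity check before merging, for example to spot contigs that started but never finished.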
~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From caigh02 at gmail.com Thu May 29 13:15:39 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Thu, 29 May 2014 15:15:39 -0400
Subject: [maker-devel] maker gene order in gff output
Message-ID:

Hi Carson,

In the maker output, the genes have names like
"genemark-scaffold17-processed-gene-0.0". Many users will probably want to
give the genes different names eventually, such as GSGxxx (Genus Species
Gene #).

In the gff output, the scaffolds are not in order (either numerical order or
the order of the input assembly). On the same scaffold, the genes are not
listed in order either. This makes it a little harder for users to change the
gene IDs. We may want to name the genes in order from scaffold 1 to scaffold
N, and on each scaffold, order the genes from left to right (e.g. GSG00001,
GSG00002). Do you think you can order the genes in the gff output? For
example, order the scaffolds according to the input genome assembly, and on
each scaffold, order the genes from 5' to 3'.

Thanks.

Guohong
Rutgers University

From daniel.standage at gmail.com Thu May 29 14:37:24 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Thu, 29 May 2014 16:37:24 -0400
Subject: [maker-devel] Question about 'keep_pred' setting
Message-ID:

Good afternoon!

I have a quick question about the keep_pred setting in Maker. In older
versions of Maker, this was a binary value indicating whether unsupported
predictions should be kept.
I'm now using Maker 2.31.3, where it's described as a scaled value indicating a "concordance threshold" for unsupported predictions. As far as I can tell from the code, however, it's still treated in the same way as before. Could you briefly describe the motivation for this setting and the intended (although possibly incomplete) change in its functionality in new versions of Maker? Thanks! -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Thu May 29 14:44:28 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Thu, 29 May 2014 20:44:28 +0000 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: Message-ID: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Hi Daniel, Your interpretation of the code is correct. keep_preds is a binary setting. There's been some discussion behind-the-scenes about making it more flexible, but that hasn't been implemented yet. We need to fix what it says in the control file. Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 29, 2014, at 2:37 PM, Daniel Standage > wrote: Good afternoon! I have a quick question about the keep_pred setting in Maker. In older versions of Maker, this was a binary value indicating whether unsupported predictions should be kept. I'm now using Maker 2.31.3, where it's described as a scaled value indicating a "concordance threshold" for unsupported predictions. As far as I can tell from the code, however, it's still treated in the same way as before. Could you briefly describe the motivation for this setting and the intended (although possibly incomplete) change in its functionality in new versions of Maker? Thanks! -- Daniel S. Standage Ph.D. 
Candidate Computational Genome Science Laboratory Indiana University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Thu May 29 14:47:47 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 16:47:47 -0400 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: Thanks. Just curious: how would the intended behavior differ if keep_pred was set to, say, 0.5, instead of 0 or 1? -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 4:44 PM, Daniel Ence wrote: > Hi Daniel, > > Your interpretation of the code is correct. keep_preds is a binary > setting. There's been some discussion behind-the-scenes about making it > more flexible, but that hasn't been implemented yet. We need to fix what it > says in the control file. > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 29, 2014, at 2:37 PM, Daniel Standage > wrote: > > Good afternoon! > > I have a quick question about the keep_pred setting in Maker. In older > versions of Maker, this was a binary value indicating whether unsupported > predictions should be kept. I'm now using Maker 2.31.3, where it's > described as a scaled value indicating a "concordance threshold" for > unsupported predictions. As far as I can tell from the code, however, it's > still treated in the same way as before. 
> > Could you briefly describe the motivation for this setting and the > intended (although possibly incomplete) change in its functionality in new > versions of Maker? > > Thanks! > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 29 15:43:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 May 2014 15:43:35 -0600 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: There is a hidden score called abAED that measures concordance among the ab initio gene predictors . The idea was to have ab initio models that are the same across multiple ab initio predictor be kept if they're group concordance is high enough, then drop ab initio predictions that only happen in one ab initio predictor. Currently the option is all or nothing, the threshold would give a more fine grained control of keeping just some unsupported predictions. --Carson From: Daniel Standage Date: Thursday, May 29, 2014 at 2:47 PM To: Daniel Ence Cc: Maker Mailing List Subject: Re: [maker-devel] Question about 'keep_pred' setting Thanks. Just curious: how would the intended behavior differ if keep_pred was set to, say, 0.5, instead of 0 or 1? -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 4:44 PM, Daniel Ence wrote: > Hi Daniel, > > Your interpretation of the code is correct. keep_preds is a binary setting. > There's been some discussion behind-the-scenes about making it more flexible, > but that hasn't been implemented yet. 
We need to fix what it says in the > control file. > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 29, 2014, at 2:37 PM, Daniel Standage > wrote: > >> Good afternoon! >> >> I have a quick question about the keep_pred setting in Maker. In older >> versions of Maker, this was a binary value indicating whether unsupported >> predictions should be kept. I'm now using Maker 2.31.3, where it's described >> as a scaled value indicating a "concordance threshold" for unsupported >> predictions. As far as I can tell from the code, however, it's still treated >> in the same way as before. >> >> Could you briefly describe the motivation for this setting and the intended >> (although possibly incomplete) change in its functionality in new versions of >> Maker? >> >> Thanks! >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Thu May 29 16:29:39 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 18:29:39 -0400 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: Ah, that makes sense. Thanks! -- Daniel S. Standage Ph.D. 
Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 5:43 PM, Carson Holt wrote: > There is a hidden score called abAED that measures concordance among the > ab initio gene predictors . The idea was to have ab initio models that are > the same across multiple ab initio predictor be kept if they're group > concordance is high enough, then drop ab initio predictions that only > happen in one ab initio predictor. Currently the option is all or nothing, > the threshold would give a more fine grained control of keeping just some > unsupported predictions. > > --Carson > > > From: Daniel Standage > Date: Thursday, May 29, 2014 at 2:47 PM > To: Daniel Ence > Cc: Maker Mailing List > Subject: Re: [maker-devel] Question about 'keep_pred' setting > > Thanks. > > Just curious: how would the intended behavior differ if keep_pred was set > to, say, 0.5, instead of 0 or 1? > > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, May 29, 2014 at 4:44 PM, Daniel Ence > wrote: > >> Hi Daniel, >> >> Your interpretation of the code is correct. keep_preds is a binary >> setting. There's been some discussion behind-the-scenes about making it >> more flexible, but that hasn't been implemented yet. We need to fix what it >> says in the control file. >> >> >> Daniel Ence >> Graduate Student >> dence at genetics.utah.edu >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> On May 29, 2014, at 2:37 PM, Daniel Standage >> wrote: >> >> Good afternoon! >> >> I have a quick question about the keep_pred setting in Maker. In older >> versions of Maker, this was a binary value indicating whether unsupported >> predictions should be kept. I'm now using Maker 2.31.3, where it's >> described as a scaled value indicating a "concordance threshold" for >> unsupported predictions. 
As far as I can tell from the code, however, it's >> still treated in the same way as before. >> >> Could you briefly describe the motivation for this setting and the >> intended (although possibly incomplete) change in its functionality in new >> versions of Maker? >> >> Thanks! >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 29 21:11:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 May 2014 21:11:11 -0600 Subject: [maker-devel] maker gene order in gff output In-Reply-To: References: Message-ID: The maker_map_ids script that comes with MAKER can be used to generate new names of the style PREFIX###### or PREFIX_######. You can use the --sort_order flag to sort the contigs in whatever your preferred order is before generating the new names. Then use the map_gff_ids and map_fasta_ids to change the names in the gff3 and fasta files respectively. Here is some extra information from a tutorial where the maker_map_ids script is used --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_G MOD_Online_Training_2014#Post_Processing_of_Annotations --Carson From: Guohong Cai Date: Thursday, May 29, 2014 at 1:15 PM To: "" Subject: [maker-devel] maker gene order in gff output Hi Carson, In the maker output, the genes have names like "genemark-scaffold17- processed-gene-0.0". Many users probably will eventually give the genes different names, such as GSGxxx (Genus Species Gene #). 
In the gff output, the scaffolds are not in order (either numerical order or the order of input assembly). On the same scaffold, the genes are not listed in order either. This will make it a little harder for users to change the gene IDs. We may name the genes in order from scaffold 1 to scaffold N, and on each scaffold, order the genes from left to right (e.g. GSG00001, GSG00002). Do you think you can order the genes in the gff output? For example, order the scaffolds according to the input genome assembly, and on each scaffold, order the genes from 5' to 3'.

Thanks.

Guohong
Rutgers University

From caigh02 at gmail.com Fri May 30 05:40:17 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Fri, 30 May 2014 06:40:17 -0500
Subject: [maker-devel] maker gene order in gff output
In-Reply-To:
References:
Message-ID:

Great.

Guohong

On Thu, May 29, 2014 at 10:11 PM, Carson Holt wrote:

> The maker_map_ids script that comes with MAKER can be used to generate new
> names of the style PREFIX###### or PREFIX_######. You can use
> the --sort_order flag to sort the contigs in whatever your preferred order
> is before generating the new names.
>
> Then use the map_gff_ids and map_fasta_ids to change the names in the
> gff3 and fasta files respectively.
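
The ordering asked for in this thread (scaffolds in assembly order, genes left to right within each scaffold, names such as GSG00001) can be sketched in a few lines of Python. This is only an illustration of the idea and not how maker_map_ids works internally; the function name `build_id_map`, its parameters, and the sample GFF3 lines are all made up for the example.

```python
def build_id_map(gff_lines, scaffold_order, prefix="GSG", width=5):
    """Map existing gene IDs to PREFIX plus a zero-padded counter.

    Genes are numbered by scaffold (in the order given by scaffold_order)
    and then by ascending start coordinate. Scaffolds missing from
    scaffold_order are placed last, sorted by name.
    """
    rank = {name: i for i, name in enumerate(scaffold_order)}
    genes = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only gene features receive a new name here
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" not in attrs:
            continue
        genes.append((rank.get(cols[0], len(rank)), cols[0], int(cols[3]), attrs["ID"]))
    genes.sort()
    return {
        old_id: f"{prefix}{n:0{width}d}"
        for n, (_, _, _, old_id) in enumerate(genes, start=1)
    }


# Hypothetical input: three genes listed out of order, as in the MAKER output.
example_gff = [
    "scaffold2\tmaker\tgene\t500\t900\t.\t+\t.\tID=genemark-scaffold2-processed-gene-0.1",
    "scaffold1\tmaker\tgene\t2000\t2500\t.\t-\t.\tID=genemark-scaffold1-processed-gene-0.3",
    "scaffold1\tmaker\tgene\t100\t800\t.\t+\t.\tID=genemark-scaffold1-processed-gene-0.0",
]
id_map = build_id_map(example_gff, ["scaffold1", "scaffold2"])
for old_id, new_id in id_map.items():
    print(old_id, new_id, sep="\t")
```

The result is a two-column old-name/new-name table. Whether that layout matches the map file expected by map_gff_ids and map_fasta_ids is an assumption to check against the MAKER documentation before using it.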
> > Here is some extra information from a tutorial where the maker_map_ids > script is used --> > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Post_Processing_of_Annotations > > --Carson > > > From: Guohong Cai > Date: Thursday, May 29, 2014 at 1:15 PM > To: "" > Subject: [maker-devel] maker gene order in gff output > > Hi Carson, > > In the maker output, the genes have names like "genemark-scaffold17- > processed-gene-0.0". Many users probably will eventually give the genes > different names, such as GSGxxx (Genus Species Gene #). > > In the gff output, the scaffolds are not in order (either numerical order > or the order of input assembly). On the same scaffold, the genes are not > listed in order either. This will make it a little harder for users to > change the gene IDs. We may name the genes in order from scaffold 1 to > scaffold N, and and each scaffold, order the genes from left to right, e.g > GSG00001, GSG00002). Do you think you can order the genes in the gff > output? For example, order the scaffolds according to the input genome > assembly, and on each scaffold, order the genes from 5' to 3'. > > Thanks. > > Guohong > Rutgers University > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Sat May 31 09:23:23 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Sat, 31 May 2014 11:23:23 -0400 Subject: [maker-devel] Precomputed alignments Message-ID: Hello again! About a year ago I asked about using precomputed alignments with Maker. The thread quickly took a different direction as we tried to track down other issues, and I never got the thread back on its original track. 
So, to return to the original question, what exactly is required when providing pre-computed alignments in GFF3 format? For example, does it affect Maker's behavior whether a score is given? The "Target" attribute? The "Gap" attribute?

Thanks!

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From sjackman at gmail.com Fri May 2 13:40:42 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 2 May 2014 12:40:42 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson. Do you happen to have a patch that I could test out that fixes the naming of the tRNA identified by tRNAscan?

Is the MAKER subversion repository public, and if so, what's its URL?

Cheers,
Shaun

Shaun wrote...

The integration of MAKER-P with tRNAscan is very useful. The identified genes are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are conventionally named according to the amino acid and anticodon, such as trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the names with that convention?

On 6 March 2014 12:58, Carson Holt wrote:

> Yes. I'll fix the naming.
>
> Thanks,
> Carson

From carsonhh at gmail.com Fri May 2 13:50:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 02 May 2014 13:50:23 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

That should already be fixed in the current 2.31.3 download. I'll also send you the subversion credentials in a separate e-mail.

Thanks,
Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Friday, May 2, 2014 at 1:40 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson.
Do you happen to have a patch that I could test out that fixes the naming of the tRNA identified by tRNAscan?

Is the MAKER subversion repository public, and if so, what's its URL?

Cheers,
Shaun

Shaun wrote...
>
> The integration of MAKER-P with tRNAscan is very useful. The identified genes
> are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are
> conventionally named according to the amino acid and anticodon, such as
> trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the names
> with that convention?

On 6 March 2014 12:58, Carson Holt wrote:

> Yes. I'll fix the naming.
>
> Thanks,
> Carson

From sjackman at gmail.com Fri May 2 14:00:22 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Fri, 2 May 2014 13:00:22 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Fantastic. Thanks, Carson. I didn't realize that there was a point release of MAKER. It's not announced on the MAKER home page, which still reports Last Software Update v2.31 (Feb 11, 2014). Where are point releases announced?

The static link for MAKER 2.31 reports 403 Forbidden. Is there a new static link for MAKER 2.31.3?

Cheers,
Shaun

On 2 May 2014 12:50, Carson Holt wrote:

> That should already be fixed in the current 2.31.3 download. I'll also
> send you the subversion credentials in a separate e-mail.
>
> Thanks,
> Carson
>
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Friday, May 2, 2014 at 1:40 PM
>
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. Do you happen to have a patch that I could test out that fixes
> the naming of the tRNA identified by tRNAscan?
>
> Is the MAKER subversion repository public, and if so, what's its URL?
>
> Cheers,
> Shaun
>
> Shaun wrote...
>
> The integration of MAKER-P with tRNAscan is very useful. The identified
> genes are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are
> conventionally named according to the amino acid and anticodon, such as
> trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the
> names with that convention?
>
> On 6 March 2014 12:58, Carson Holt wrote:
>
>> Yes. I'll fix the naming.
>>
>> Thanks,
>> Carson
>>
>

From carsonhh at gmail.com Fri May 2 14:14:11 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 02 May 2014 14:14:11 -0600
Subject: [maker-devel] Mapping gene names
Message-ID:

I need to fix that last update tag. I did a point release, because there were a couple of very minor fixes that didn't justify a full release (tRNA naming and a fasta_merge bug for tRNAs - I think three lines total of code). There won't be another major version release for a while because we're working on MAKER-EVM which will be version 3.0 (joint project for full MAKER integration with EVM). So just point releases on 2.31 (which will be the very last version of MAKER2).

I'll fix the static link and add a new one for 2.31.3.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Friday, May 2, 2014 at 2:00 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Fantastic. Thanks, Carson. I didn't realize that there was a point release of MAKER. It's not announced on the MAKER home page, which still reports Last Software Update v2.31 (Feb 11, 2014). Where are point releases announced?

The static link for MAKER 2.31 reports 403 Forbidden. Is there a new static link for MAKER 2.31.3?

Cheers,
Shaun

On 2 May 2014 12:50, Carson Holt wrote:

> That should already be fixed in the current 2.31.3 download. I'll also send
> you the subversion credentials in a separate e-mail.
>
> Thanks,
> Carson
>
>
> From: Shaun Jackman
> Reply-To: Shaun Jackman
> Date: Friday, May 2, 2014 at 1:40 PM
>
> To: Carson Holt
> Cc: "maker-devel at yandell-lab.org"
> Subject: Re: [maker-devel] Mapping gene names
>
> Hi, Carson. Do you happen to have a patch that I could test out that fixes the
> naming of the tRNA identified by tRNAscan?
>
> Is the MAKER subversion repository public, and if so, what's its URL?
>
> Cheers,
> Shaun
>
> Shaun wrote...
>>
>> The integration of MAKER-P with tRNAscan is very useful. The identified genes
>> are named e.g. trnascan-205522-processed-gene-0.38. tRNA genes are
>> conventionally named according to the amino acid and anticodon, such as
>> trnW-CCA. Would it be possible for MAKER to name or perhaps prefix the names
>> with that convention?
>
> On 6 March 2014 12:58, Carson Holt wrote:
>
>> Yes. I'll fix the naming.
>>
>> Thanks,
>> Carson

From cynsb1987 at gmail.com Sun May 4 19:58:33 2014
From: cynsb1987 at gmail.com (hueytyng)
Date: Mon, 5 May 2014 11:58:33 +1000
Subject: [maker-devel] Non-unique top level ID
Message-ID:

Hi Carson,

I ran MAKER using RNAseq as evidence (tophat+cufflinks). The gff file is provided to Maker under "est_gff". Maker runs fine but there are a few failed contigs, and these error messages in my log:

ERROR: Non-unique top level ID for 1:JUNC00010801:0
While this is technically legal in GFF3, it usually indicates a poorly fomatted GFF3 file (perhaps you tried to merge two GFF3 files without accounting for unique IDs). MAKER will not handle these correctly.
--> rank=2, hostname=safs-raijen
ERROR: Failed while prepare section files
ERROR: Chunk failed at level:12, tier_type:3
FAILED CONTIG:scaffold11129|size28423

I do see multiple IDs in my gff. I have 9 RNAseq samples, is the way I merged them causing the error? This is what I've done to prepare the gff:

1.
merge cuffmerge output cuffmerge -o -p 4 assembly_list.txt cufflinks2gff3 merged.gtf > merged.gff 2. merge junctions find -name "junctions.bed" -exec cat {} \; >> all_junctions.bed tophat2gff3 all_junctions.bed > all_junctions.gff 3. combine cuffmerge and junctions gff3_merge -o tophatandcufflinks.gff merged.gff all_junctions.gff 4. provide in opts file est_gff=tophatandcufflinks.gff #EST evidence from an external gff3 file Thank you Jenny -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 08:18:18 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 08:18:18 -0600 Subject: [maker-devel] Non-unique top level ID In-Reply-To: References: Message-ID: If you use gff3_merge with the -l flag, then it will check for non-unique ID's and give new IDs to make them unique. Also in general it is better just to use the cufflinks results and exclude tophat results as they tend to be very noisy and decrease the quality of the final models overall. Thanks, Carson From: hueytyng Date: Sunday, May 4, 2014 at 7:58 PM To: Subject: [maker-devel] Non-unique top level ID Hi Carson, I ran MAKER using RNAseq as evidence (tophat+cufflinks). The gff file is provided to Maker under "est_gff". Maker runs fine but there are a few failed contigs, and these error messages in my log: ERROR: Non-unique top level ID for 1:JUNC00010801:0 While this is technically legal in GFF3, it usually indicates a poorly fomatted GFF3 file (perhaps you tried to merge two GFF3 files without accounting for unique IDs). MAKER will not handle these correctly. --> rank=2, hostname=safs-raijen ERROR: Failed while prepare section files ERROR: Chunk failed at level:12, tier_type:3 FAILED CONTIG:scaffold11129|size28423 I do see multiple IDs in my gff. I have 9 RNAseq samples, is the way I merged them causing the error? This is what I've done to prepare the gff: 1. 
merge cuffmerge output cuffmerge -o -p 4 assembly_list.txt cufflinks2gff3 merged.gtf > merged.gff 2. merge junctions find -name "junctions.bed" -exec cat {} \; >> all_junctions.bed tophat2gff3 all_junctions.bed > all_junctions.gff 3. combine cuffmerge and junctions gff3_merge -o tophatandcufflinks.gff merged.gff all_junctions.gff 4. provide in opts file est_gff=tophatandcufflinks.gff #EST evidence from an external gff3 file Thank you Jenny _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From online at davemessina.com Mon May 5 10:48:30 2014 From: online at davemessina.com (Dave Messina) Date: Mon, 5 May 2014 11:48:30 -0500 Subject: [maker-devel] MAKER / RepeatRunner configuration issue Message-ID: Hi, Even with the sample data, I'm getting a "Sequence contains no data" error from blastx during the RepeatRunner phase. I've uploaded a tarball with my run on the dpp sample data to the MAKER File Upload site (filename maker_test.tgz). Could you please take a look and give me your thoughts? Thanks! Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 10:53:09 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 10:53:09 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Use BLAST+ version 2.2.28. Also Make sure you are not using an old version of MAKER (2.31.3 is current). ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ --Carson From: Dave Messina Date: Monday, May 5, 2014 at 10:48 AM To: Subject: [maker-devel] MAKER / RepeatRunner configuration issue Hi, Even with the sample data, I'm getting a "Sequence contains no data" error from blastx during the RepeatRunner phase. 
I've uploaded a tarball with my run on the dpp sample data to the MAKER File Upload site (filename maker_test.tgz). Could you please take a look and give me your thoughts? Thanks! Dave _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From online at davemessina.com Mon May 5 12:05:54 2014 From: online at davemessina.com (Dave Messina) Date: Mon, 5 May 2014 13:05:54 -0500 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Thanks for your quick reply, Carson. I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue. I've uploaded a new tarball containing the latest (failed) output on the dpp sample data. Any thoughts you have on how to resolve this would be great. Thanks! Dave On Mon, May 5, 2014 at 11:53 AM, Carson Holt wrote: > Use BLAST+ version 2.2.28. Also Make sure you are not using an old > version of MAKER (2.31.3 is current). > > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ > > --Carson > > > From: Dave Messina > Date: Monday, May 5, 2014 at 10:48 AM > To: > Subject: [maker-devel] MAKER / RepeatRunner configuration issue > > Hi, > > Even with the sample data, I'm getting a "Sequence contains no data" error > from blastx during the RepeatRunner phase. > > I've uploaded a tarball with my run on the dpp sample data to the MAKER > File Upload site (filename maker_test.tgz). > > Could you please take a look and give me your thoughts? > > > Thanks! > Dave > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From carsonhh at gmail.com Mon May 5 13:32:01 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Mon, 05 May 2014 13:32:01 -0600
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References:
Message-ID:

I can't reproduce your issue, so it is probably something about your system or environment.

1. Is your /tmp directory full (or whatever your $TMPDIR environment variable is set to)? Use 'df -h /tmp' to check.
2. Are you running in a directory on an NFS drive? Is it true NFS or is it something like FUSE?
3. Is your current working directory full?
4. Are you setting TMP= in the control files to either an NFS mounted location or an in-memory mounted location? Same issue if you are setting the system's TMPDIR environment variable to one of these.
5. Is your default /tmp directory in fact locally mounted (some clusters set this to in-memory scratch)?
6. Even though you already checked, humor me and run this exact command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

--Carson

From: Dave Messina
Date: Monday, May 5, 2014 at 12:05 PM
To: Carson Holt
Cc:
Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue

Thanks for your quick reply, Carson.

I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue.

I've uploaded a new tarball containing the latest (failed) output on the dpp sample data. Any thoughts you have on how to resolve this would be great.

Thanks!
Dave

On Mon, May 5, 2014 at 11:53 AM, Carson Holt wrote:

> Use BLAST+ version 2.2.28. Also make sure you are not using an old version of
> MAKER (2.31.3 is current).
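
The temporary-directory questions in the checklist above (is it full, is it writable) can be gathered in one small script. This is a rough sketch using only the Python standard library; the function name `temp_dir_report` and the 1 GiB threshold are arbitrary choices for illustration, and the script does not detect whether the directory is NFS or in-memory mounted, which still needs `df -h` or the mount table.

```python
import os
import shutil
import tempfile


def temp_dir_report(min_free_bytes=1024**3):
    """Summarise the temp directory that Python resolves for this environment.

    tempfile.gettempdir() honours TMPDIR when it points at a usable
    directory, so the report reflects the same setting asked about above.
    """
    tmp = tempfile.gettempdir()
    usage = shutil.disk_usage(tmp)
    return {
        "path": tmp,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "writable": os.access(tmp, os.W_OK),
        "low_space": usage.free < min_free_bytes,
    }


if __name__ == "__main__":
    for key, value in temp_dir_report().items():
        print(f"{key}: {value}")
```

Running it from the same shell (and with the same TMPDIR) that launches MAKER keeps the report comparable with what the pipeline itself would see.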
> > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ > > --Carson > > > From: Dave Messina > Date: Monday, May 5, 2014 at 10:48 AM > To: > Subject: [maker-devel] MAKER / RepeatRunner configuration issue > > Hi, > > Even with the sample data, I'm getting a "Sequence contains no data" error > from blastx during the RepeatRunner phase. > > I've uploaded a tarball with my run on the dpp sample data to the MAKER File > Upload site (filename maker_test.tgz). > > Could you please take a look and give me your thoughts? > > > Thanks! > Dave > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 13:44:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 13:44:11 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Could you give me the full output of this command --> df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A I'm really mostly interested in the mount information. Some non-traditional network storage implementations can induce odd behaviors (for example by not supporting operations like hard links, etc.). --Carson From: Dave Messina Date: Monday, May 5, 2014 at 12:05 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Thanks for your quick reply, Carson. I'm using BLAST+ version 2.2.28, and even after upgrading from MAKER 2.31 to 2.31.3, unfortunately I'm still seeing the same issue. I've uploaded a new tarball containing the latest (failed) output on the dpp sample data. Any thoughts you have on how to resolve this would be great. Thanks! Dave On Mon, May 5, 2014 at 11:53 AM, Carson Holt wrote: > Use BLAST+ version 2.2.28. 
Also Make sure you are not using an old version of > MAKER (2.31.3 is current). > > ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/ > > --Carson > > > From: Dave Messina > Date: Monday, May 5, 2014 at 10:48 AM > To: > Subject: [maker-devel] MAKER / RepeatRunner configuration issue > > Hi, > > Even with the sample data, I'm getting a "Sequence contains no data" error > from blastx during the RepeatRunner phase. > > I've uploaded a tarball with my run on the dpp sample data to the MAKER File > Upload site (filename maker_test.tgz). > > Could you please take a look and give me your thoughts? > > > Thanks! > Dave > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.comhttp://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From online at davemessina.com Mon May 5 13:53:58 2014 From: online at davemessina.com (Dave Messina) Date: Mon, 5 May 2014 14:53:58 -0500 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A > Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > > 1. Is you /tmp directory full (or whatever you have $TMPDIR > environmental variable is set to). Use 'df -h /tmp' to check. > > $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > > 1. Are you running in a directory on an NFS drive? Is it true NFS or > is it something like FUSE. > > Same error on true NFS or on local disk. > > 1. 
Is your current working directory full. > > No. > > 1. Are you setting TMP= in the control files to either an NFS mounted > location or an in memory mounted location. Same issue if you are setting > the system's TMPDIR environmental variable to one of these. > > I tried setting it to /tmp just to be sure (no difference). > > 1. Is your default /tmp directory in fact locally mounted (some > clusters set this to in memory scratch). > > Yes. > > 1. Even though you already checked, humor me and run this exact > command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx > -version > > $ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version blastx: 2.2.28+ Package: blast 2.2.28, build Mar 12 2013 16:52:31 Thanks so much for your help. Best, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 14:00:57 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 14:00:57 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. 
Is you /tmp directory full (or whatever you have $TMPDIR environmental > variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Are you running in a directory on an NFS drive? Is it true NFS or is it > something like FUSE. Same error on true NFS or on local disk. > 1. Is your current working directory full. No. > 1. Are you setting TMP= in the control files to either an NFS mounted location > or an in memory mounted location. Same issue if you are setting the system's > TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). > 1. Is your default /tmp directory in fact locally mounted (some clusters set > this to in memory scratch). Yes. > 1. Even though you already checked, humor me and run this exact command --> > /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version $ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version blastx: 2.2.28+ Package: blast 2.2.28, build Mar 12 2013 16:52:31 Thanks so much for your help. Best, Dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Mon May 5 16:34:14 2014 From: carsonhh at gmail.com (Carson Holt) Date: Mon, 05 May 2014 16:34:14 -0600 Subject: [maker-devel] MAKER / RepeatRunner configuration issue In-Reply-To: References: Message-ID: After logging in I found the issue. You have a broken BioPerl build. Specifically Bio::DB::Fasta. Quite some time ago, there was a download direct from the BioPerl website that was broken and I think you may have that broken version. Just update to the current CPAN version. I was able to run fine when I forced MAKER to use a path I made for the the newer version of BioPerl. You can delete my credentials now. 
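
A quick way to see whether an installed Bio::DB::Fasta can index a FASTA file and hand a sequence back is a round trip on a tiny file. The sketch below drives a Perl one-liner from Python; it assumes `perl` is on the PATH, the helper name `bio_db_fasta_roundtrip` and the probe sequence are invented for this example, and a passing result does not prove the whole BioPerl build is sound.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Perl one-liner: index the FASTA file given as the first argument and
# print the sequence whose ID is given as the second argument.
PERL_SNIPPET = (
    "use Bio::DB::Fasta; "
    "my $db = Bio::DB::Fasta->new($ARGV[0]); "
    "print $db->seq($ARGV[1]);"
)


def bio_db_fasta_roundtrip(perl="perl"):
    """Return True or False for pass or fail, or None if perl is not found."""
    if shutil.which(perl) is None:
        return None
    expected = "ACGTACGTAC"
    with tempfile.TemporaryDirectory() as tmp:
        fasta = Path(tmp) / "probe.fa"
        fasta.write_text(f">probe\n{expected}\n")
        result = subprocess.run(
            [perl, "-e", PERL_SNIPPET, str(fasta), "probe"],
            capture_output=True,
            text=True,
        )
    # A missing or broken module shows up as a non-zero exit status
    # or as an empty or wrong sequence on stdout.
    return result.returncode == 0 and result.stdout.strip() == expected


if __name__ == "__main__":
    print("Bio::DB::Fasta round trip:", bio_db_fasta_roundtrip())
```

If the check returns False, updating BioPerl from CPAN as suggested above is the next step.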
Thanks, Carson From: Carson Holt Date: Monday, May 5, 2014 at 2:00 PM To: Dave Messina Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt Cc: Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt wrote: > df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Is you /tmp directory full (or whatever you have $TMPDIR environmental > variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / > 1. Are you running in a directory on an NFS drive? Is it true NFS or is it > something like FUSE. Same error on true NFS or on local disk. > 1. Is your current working directory full. No. > 1. Are you setting TMP= in the control files to either an NFS mounted location > or an in memory mounted location. Same issue if you are setting the system's > TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). > 1. Is your default /tmp directory in fact locally mounted (some clusters set > this to in memory scratch). Yes. > 1. 
Even though you already checked, humor me and run this exact command -->
> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

$ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version
blastx: 2.2.28+
Package: blast 2.2.28, build Mar 12 2013 16:52:31

Thanks so much for your help.

Best,
Dave

From sjackman at gmail.com Mon May 5 18:09:41 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Mon, 5 May 2014 17:09:41 -0700
Subject: [maker-devel] Fewer genes in MAKER 2.31.3
Message-ID:

Hi, Carson.

I'm annotating a 6 Mbp plant mitochondrial genome using GenBank coding nucleotide and protein sequences from related species. I'm seeing 50 genes annotated using MAKER 2.31, and 37 genes annotated using MAKER 2.31.3. The missing genes look good based on the evidence. I see protein_match evidence in the 2.31.3 GFF file, but no resulting gene and mRNA. Is there a ChangeLog indicating the changes from 2.31 to 2.31.3? Do you know of a change that might cause this? What information can I give you that would help debug this? My maker_opts.ctl file follows.

Cheers,
Shaun

#-----Genome (these are always required)
genome=pg29mt-concat.fa #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est=cds_na.fa #set of ESTs or assembled mRNA-seq in fasta format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=cds_aa.fa #protein sequence file in fasta format (i.e. from multiple organisms)

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=picea #select a model organism for RepBase masking in RepeatMasker
rmlib=rmlib.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/usr/local/opt/maker/libexec/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner

#-----Gene Prediction
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no

#-----External Application Behavior Options
cpus=4 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
est_forward=1 #map names and attributes forward from EST evidence, 1 = yes, 0 = no
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no

From myandell at genetics.utah.edu Mon May 5 23:06:25 2014
From: myandell at genetics.utah.edu (Mark Yandell)
Date: Tue, 6 May 2014 05:06:25 +0000
Subject: [maker-devel] MAKER / RepeatRunner configuration issue
In-Reply-To:
References: ,
Message-ID: <7A60AB257EFF2B48B1F4C814817EA05365FB90A5@mxb2.hg.genetics.utah.edu>

you are the Man, Carson.

--mark

________________________________________
From: maker-devel [maker-devel-bounces at yandell-lab.org] on behalf of Carson Holt [carsonhh at gmail.com]
Sent: Monday, May 05, 2014 4:34 PM
To: Dave Messina
Cc: maker-devel at yandell-lab.org
Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue

After logging in I found the issue. You have a broken BioPerl build. Specifically Bio::DB::Fasta. Quite some time ago, there was a download direct from the BioPerl website that was broken and I think you may have that broken version. Just update to the current CPAN version.
I was able to run fine when I forced MAKER to use a path I made for the the newer version of BioPerl. You can delete my credentials now. Thanks, Carson From: Carson Holt > Date: Monday, May 5, 2014 at 2:00 PM To: Dave Messina > Cc: > Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue This is one of those things that I would have to have access to your system since I can't duplicate it and it is only happening to you. If you can swing a temporary ssh account, I can look at it. But it's really just a shot in the dark otherwise. --Carson From: Dave Messina > Date: Monday, May 5, 2014 at 1:53 PM To: Carson Holt > Cc: > Subject: Re: [maker-devel] MAKER / RepeatRunner configuration issue Hi Carson, On Mon, May 5, 2014 at 2:44 PM, Carson Holt > wrote: df -h /Volumes/Qnap/projects/projectAnwar_SNGN0016AA-A Filesystem Type Size Used Avail Use% Mounted on 10.0.1.128:/projects nfs 13T 9.6T 3.1T 76% /Volumes/Qnap That one is on NFS, although the second tarball I uploaded was done in the /tmp dir, and that's on a local disk: Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / 1. Is you /tmp directory full (or whatever you have $TMPDIR environmental variable is set to). Use 'df -h /tmp' to check. $ df -h /tmp Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/vg_d5-lv_root ext4 50G 8.9G 38G 19% / 1. Are you running in a directory on an NFS drive? Is it true NFS or is it something like FUSE. Same error on true NFS or on local disk. 1. Is your current working directory full. No. 1. Are you setting TMP= in the control files to either an NFS mounted location or an in memory mounted location. Same issue if you are setting the system's TMPDIR environmental variable to one of these. I tried setting it to /tmp just to be sure (no difference). 1. Is your default /tmp directory in fact locally mounted (some clusters set this to in memory scratch). Yes. 1. 
Even though you already checked, humor me and run this exact command --> /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version

$ /Volumes/Qnap/external/Linux_x86_64/ncbi-blast/bin/blastx -version
blastx: 2.2.28+
Package: blast 2.2.28, build Mar 12 2013 16:52:31

Thanks so much for your help.

Best,
Dave

From kdelmore at zoology.ubc.ca Mon May 5 22:36:41 2014
From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca)
Date: Mon, 5 May 2014 21:36:41 -0700
Subject: [maker-devel] iprscan and ipr_update_gff
Message-ID: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>

Hi, I have a question about the interproscan scripts available with maker.

I'm following the recommendations posted by Carson in Aug 2011 to incorporate results from iprscan. I'm getting quite a few warning messages with ipr_update_gff; they're all the same and suggest that there's no value for $name. When I look through the updated gff, however, the dbxrefs have been added. Is this something I should be worried about?

I'm using iprscan version 5 and actually get some warning messages there as well but again, the output looks alright. In addition, some of my fastas don't get these warnings in iprscan and they still give me the error with ipr_update_gff so I don't think that's the problem. I'm using proteins from UniProt. My commands and errors are below. I've also attached the first 20000 lines from my initial gff and raw file from iprscan.

Thanks, I really appreciate your continued support.
Kira

###

commands for interproscan scripts available in maker
iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff
gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff
ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace

error after last step (just an example, a ton of similar lines):
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674.
Use of uninitialized value $name in hash element at /home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776.

###

commands for interproscan 5
interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \
  > interpro_6.out 2>&1
interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml

error after first step:
04/05/2014 19:22:09:269 25% completed
04/05/2014 21:27:36:305 50% completed
04/05/2014 21:32:34:236 75% completed
04/05/2014 21:38:01:379 90% completed
2014-05-04 21:50:22,761 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:248] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959/jobPIRSF-2.84
2014-05-04 21:50:22,908 [uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:253] WARN - At run completion, unable to delete temporary directory /lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174837921_l959
04/05/2014 21:50:23:380 100% done: InterProScan analyses completed

error after second step:
interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0
05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode...
2014-05-05 21:04:00,603 [uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run completion, unable to delete temporary directory /home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_210353293_gsjh

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6.maker.proteins.fasta.xml.raw
Type: application/octet-stream
Size: 1098375 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 6_first20000.gff
Type: application/octet-stream
Size: 2880873 bytes
Desc: not available
URL:

From carsonhh at gmail.com Tue May 6 08:31:55 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 08:31:55 -0600
Subject: [maker-devel] Fewer genes in MAKER 2.31.3
In-Reply-To:
References:
Message-ID:

Nothing in the scoring or gene selection has changed. Changes are:

Fix trnascan naming so codon is included in name
Fix fgenesh parsing when used with correct_est_fusion
Fix final ID bug when '/' character used in GFF3 input ID.
Fix a start codon issue that could come up when the right set of parameters was used (primarily correct_est_fusion and protein2genome).

If you can provide both gff3 outputs for comparison, I could probably tell you why. Set up both runs to make sure that settings are indeed identical.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Monday, May 5, 2014 at 6:09 PM
To: "maker-devel at yandell-lab.org"
Subject: [maker-devel] Fewer genes in MAKER 2.31.3

Hi, Carson.

I'm annotating a 6 Mbp plant mitochondrial genome using GenBank coding nucleotide and protein sequences from related species. I'm seeing 50 genes annotated using MAKER 2.31, and 37 genes annotated using MAKER 2.31.3. The missing genes look good based on the evidence. I see protein_match evidence in the 2.31.3 GFF file, but no resulting gene and mRNA. Is there a ChangeLog indicating the changes from 2.31 to 2.31.3? Do you know of a change that might cause this? What information can I give you that would help debug this? My maker_opts.ctl file follows.

Cheers,
Shaun

#-----Genome (these are always required)
genome=pg29mt-concat.fa #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----EST Evidence (for best results provide a file for at least one)
est=cds_na.fa #set of ESTs or assembled mRNA-seq in fasta format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=cds_aa.fa #protein sequence file in fasta format (i.e. from multiple organisms)

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=picea #select a model organism for RepBase masking in RepeatMasker
rmlib=rmlib.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/usr/local/opt/maker/libexec/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner

#-----Gene Prediction
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=1 #find tRNAs with tRNAscan, 1 = yes, 0 = no

#-----External Application Behavior Options
cpus=4 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
est_forward=1 #map names and attributes forward from EST evidence, 1 = yes, 0 = no
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no

From carsonhh at gmail.com Tue May 6 08:57:04 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 08:57:04 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca>
Message-ID:

You have entries in your interproscan output that aren't in your GFF3. Is your GFF3 file truncated?
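One way to check for such a mismatch is sketched below. This is only an illustration, not something shipped with MAKER: the function name `missing_ids` is hypothetical, and it assumes the raw InterProScan file is tab-separated with the protein ID in the first column, and that those protein IDs are expected to match the ID= attribute of mRNA features in the GFF3.

```shell
# Hypothetical helper (not part of MAKER): print IDs found in column 1 of an
# InterProScan raw file that have no matching mRNA ID= attribute in a GFF3.
missing_ids() {
    raw="$1"
    gff="$2"
    raw_ids=$(mktemp)
    gff_ids=$(mktemp)

    # Unique IDs referenced by InterProScan (first tab-separated column).
    cut -f1 "$raw" | LC_ALL=C sort -u > "$raw_ids"

    # Unique mRNA IDs from column 9 of the GFF3; stop at the ##FASTA section.
    awk -F'\t' '/^##FASTA/ { exit }
        $3 == "mRNA" {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) print substr(attrs[i], 4)
        }' "$gff" | LC_ALL=C sort -u > "$gff_ids"

    # comm -23 keeps lines that appear only in the first sorted file.
    LC_ALL=C comm -23 "$raw_ids" "$gff_ids"
    rm -f "$raw_ids" "$gff_ids"
}
```

With the file names from this thread it would be called as `missing_ids 6.maker.proteins.fasta.xml.raw 6.gff`; empty output would mean every InterProScan ID has a matching mRNA feature.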
--Carson On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" wrote: >Hi, I have a question about the interproscan scripts available with maker. > >I'm following the recommendations posted by Carson in Aug 2011 to >incorporate results from iprscan. I'm getting quite a few warning messages >with ipr_update_gff; they're all the same and suggest that there's no >value for $name. When I look through the updated gff, however, the dbxrefs >have been added. Is this something I should be worried about? > >I'm using iprscan version 5 and actually get some warning messages there >as well but again, the output looks alright. In addition, some of my >fastas don't get these warnings in iprscan and they still give me the >error with ipr_update_gff so I don't think that's the problem. I'm using >proteins from UniProt. My commands and errors are below. I've also >attached the first 20000 lines from my initial gff and raw file from >iprscan. > >Thanks, I really appreciate your continued support. >Kira > >### > >commands for interproscan scripts available in maker >iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw -inplace > >error after last step (just an example, a ton of similar lines): >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674. >Use of uninitialized value $name in hash element at >/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776. 
> > >### > >commands for interproscan 5 >interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >interpro_6.out 2>&1 >interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml > >error after first step: >04/05/2014 19:22:09:269 25% completed >04/05/2014 21:27:36:305 50% completed >04/05/2014 21:32:34:236 75% completed >04/05/2014 21:38:01:379 90% completed >2014-05-04 21:50:22,761 >[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >248] >WARN - At run completion, unable to delete temporary directory >/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >7921_l959/jobPIRSF-2.84 >2014-05-04 21:50:22,908 >[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >253] >WARN - At run completion, unable to delete temporary directory >/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >7921_l959 >04/05/2014 21:50:23:380 100% done: InterProScan analyses completed > >error after second step: >interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>2014-05-05 21:04:00,603 >[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >completion, unable to delete temporary directory >/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_2 >10353293_gsjh_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From kdelmore at zoology.ubc.ca Tue May 6 09:06:56 2014 From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca) Date: Tue, 6 May 2014 08:06:56 -0700 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> Message-ID: <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> Thanks for your reply. I have not truncated the gff3. I'm using files from the datastore that were written at the same time so I'm not sure how that would happen. I split my multifasta before running it through maker and have not merged the gff or protein.fasta for iprscan. That wouldn't be the problem would it? > You have entries in your interproscan output that aren't in your GFF3. Is > your GFF3 file truncated? > > --Carson > > > On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" > wrote: > >>Hi, I have a question about the interproscan scripts available with >> maker. >> >>I'm following the recommendations posted by Carson in Aug 2011 to >>incorporate results from iprscan. I'm getting quite a few warning >> messages >>with ipr_update_gff; they're all the same and suggest that there's no >>value for $name. When I look through the updated gff, however, the >> dbxrefs >>have been added. Is this something I should be worried about? >> >>I'm using iprscan version 5 and actually get some warning messages there >>as well but again, the output looks alright. 
In addition, some of my >>fastas don't get these warnings in iprscan and they still give me the >>error with ipr_update_gff so I don't think that's the problem. I'm using >>proteins from UniProt. My commands and errors are below. I've also >>attached the first 20000 lines from my initial gff and raw file from >>iprscan. >> >>Thanks, I really appreciate your continued support. >>Kira >> >>### >> >>commands for interproscan scripts available in maker >>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >> -inplace >> >>error after last step (just an example, a ton of similar lines): >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15674. >>Use of uninitialized value $name in hash element at >>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15776. 
>> >> >>### >> >>commands for interproscan 5 >>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >>interpro_6.out 2>&1 >>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >> >>error after first step: >>04/05/2014 19:22:09:269 25% completed >>04/05/2014 21:27:36:305 50% completed >>04/05/2014 21:32:34:236 75% completed >>04/05/2014 21:38:01:379 90% completed >>2014-05-04 21:50:22,761 >>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >>248] >>WARN - At run completion, unable to delete temporary directory >>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >>7921_l959/jobPIRSF-2.84 >>2014-05-04 21:50:22,908 >>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep: >>253] >>WARN - At run completion, unable to delete temporary directory >>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_17483 >>7921_l959 >>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed >> >>error after second step: >>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>>2014-05-05 21:04:00,603 >>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >>completion, unable to delete temporary directory >>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505_2 >>10353293_gsjh_______________________________________________ >>maker-devel mailing list >>maker-devel at box290.bluehost.com >>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > From carsonhh at gmail.com Tue May 6 09:09:13 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 06 May 2014 09:09:13 -0600 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> Message-ID: The file you sent was missing the ##FASTA entry and all sequence at the bottom for example. Is that the way it is in the datastore? --Carson On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" wrote: >Thanks for your reply. I have not truncated the gff3. I'm using files from >the datastore that were written at the same time so I'm not sure how that >would happen. I split my multifasta before running it through maker and >have not merged the gff or protein.fasta for iprscan. That wouldn't be the >problem would it? > >> You have entries in your interproscan output that aren't in your GFF3. >>Is >> your GFF3 file truncated? >> >> --Carson >> >> >> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" >> wrote: >> >>>Hi, I have a question about the interproscan scripts available with >>> maker. >>> >>>I'm following the recommendations posted by Carson in Aug 2011 to >>>incorporate results from iprscan. I'm getting quite a few warning >>> messages >>>with ipr_update_gff; they're all the same and suggest that there's no >>>value for $name. When I look through the updated gff, however, the >>> dbxrefs >>>have been added. 
Is this something I should be worried about? >>> >>>I'm using iprscan version 5 and actually get some warning messages there >>>as well but again, the output looks alright. In addition, some of my >>>fastas don't get these warnings in iprscan and they still give me the >>>error with ipr_update_gff so I don't think that's the problem. I'm using >>>proteins from UniProt. My commands and errors are below. I've also >>>attached the first 20000 lines from my initial gff and raw file from >>>iprscan. >>> >>>Thanks, I really appreciate your continued support. >>>Kira >>> >>>### >>> >>>commands for interproscan scripts available in maker >>>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >>> -inplace >>> >>>error after last step (just an example, a ton of similar lines): >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15242. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15353. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15674. >>>Use of uninitialized value $name in hash element at >>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>15776. 
>>> >>> >>>### >>> >>>commands for interproscan 5 >>>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ > >>>interpro_6.out 2>&1 >>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>> >>>error after first step: >>>04/05/2014 19:22:09:269 25% completed >>>04/05/2014 21:27:36:305 50% completed >>>04/05/2014 21:32:34:236 75% completed >>>04/05/2014 21:38:01:379 90% completed >>>2014-05-04 21:50:22,761 >>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>p: >>>248] >>>WARN - At run completion, unable to delete temporary directory >>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>83 >>>7921_l959/jobPIRSF-2.84 >>>2014-05-04 21:50:22,908 >>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>p: >>>253] >>>WARN - At run completion, unable to delete temporary directory >>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>83 >>>7921_l959 >>>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed >>> >>>error after second step: >>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >>>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>>>2014-05-05 21:04:00,603 >>>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >>>completion, unable to delete temporary directory >>>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505 >>>_2 >>>10353293_gsjh_______________________________________________ >>>maker-devel mailing list >>>maker-devel at box290.bluehost.com >>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > > From kdelmore at zoology.ubc.ca Tue May 6 09:26:07 2014 From: kdelmore at zoology.ubc.ca (kdelmore at zoology.ubc.ca) Date: Tue, 6 May 2014 08:26:07 -0700 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> Message-ID: <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca> I just printed the first 20000 lines of the gff to send to you because it was too large to send through email. I've included a dropbox link to the full file below. I've also included a link to the final gff with dbx refs; as I mentioned, it does seem to add them even with the error. If I run ipr_update_gff twice, I get the warnings on the first run but not on the second. Does that help diagnose the problem? The only other red flag I've encountered with maker was in including external gff3 from geneid and sgp2. These gff3s failed validation at the website suggested the the README file, with the warning message "cds: non-unique id" for all cds, but maker didn't give me a warning and they seem to be incorporated into the annotation fine. original gff https://www.dropbox.com/s/nimoh605jdk9myx/6.gff final gff https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta Thanks again for getting back to me. > The file you sent was missing the ##FASTA entry and all sequence at the > bottom for example. Is that the way it is in the datastore? 
> > --Carson > > > On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" > wrote: > >>Thanks for your reply. I have not truncated the gff3. I'm using files >> from >>the datastore that were written at the same time so I'm not sure how that >>would happen. I split my multifasta before running it through maker and >>have not merged the gff or protein.fasta for iprscan. That wouldn't be >> the >>problem would it? >> >>> You have entries in your interproscan output that aren't in your GFF3. >>>Is >>> your GFF3 file truncated? >>> >>> --Carson >>> >>> >>> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" >>> >>> wrote: >>> >>>>Hi, I have a question about the interproscan scripts available with >>>> maker. >>>> >>>>I'm following the recommendations posted by Carson in Aug 2011 to >>>>incorporate results from iprscan. I'm getting quite a few warning >>>> messages >>>>with ipr_update_gff; they're all the same and suggest that there's no >>>>value for $name. When I look through the updated gff, however, the >>>> dbxrefs >>>>have been added. Is this something I should be worried about? >>>> >>>>I'm using iprscan version 5 and actually get some warning messages >>>> there >>>>as well but again, the output looks alright. In addition, some of my >>>>fastas don't get these warnings in iprscan and they still give me the >>>>error with ipr_update_gff so I don't think that's the problem. I'm >>>> using >>>>proteins from UniProt. My commands and errors are below. I've also >>>>attached the first 20000 lines from my initial gff and raw file from >>>>iprscan. >>>> >>>>Thanks, I really appreciate your continued support. 
>>>>Kira >>>> >>>>### >>>> >>>>commands for interproscan scripts available in maker >>>>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>>>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>>>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >>>> -inplace >>>> >>>>error after last step (just an example, a ton of similar lines): >>>>Use of uninitialized value $name in hash element at >>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>15242. >>>>Use of uninitialized value $name in hash element at >>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>15353. >>>>Use of uninitialized value $name in hash element at >>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>15674. >>>>Use of uninitialized value $name in hash element at >>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>15776. >>>> >>>> >>>>### >>>> >>>>commands for interproscan 5 >>>>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ >>>> > >>>>interpro_6.out 2>&1 >>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>> >>>>error after first step: >>>>04/05/2014 19:22:09:269 25% completed >>>>04/05/2014 21:27:36:305 50% completed >>>>04/05/2014 21:32:34:236 75% completed >>>>04/05/2014 21:38:01:379 90% completed >>>>2014-05-04 21:50:22,761 >>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>>p: >>>>248] >>>>WARN - At run completion, unable to delete temporary directory >>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>>83 >>>>7921_l959/jobPIRSF-2.84 >>>>2014-05-04 21:50:22,908 >>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputSte >>>>p: >>>>253] >>>>WARN - At run completion, unable to delete temporary directory >>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_174 >>>>83 >>>>7921_l959 >>>>04/05/2014 21:50:23:380 100% done: InterProScan 
analyses completed
>>>>
>>>>error after second step:
>>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml
>>>>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0
>>>>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode...
>>>>2014-05-05 21:04:00,603
>>>>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run
>>>>completion, unable to delete temporary directory
>>>>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140505
>>>>_2
>>>>10353293_gsjh_______________________________________________
>>>>maker-devel mailing list
>>>>maker-devel at box290.bluehost.com
>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
>>>
>>>
>>>
>>
>>
>
>
>

From carsonhh at gmail.com Tue May 6 09:47:23 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Tue, 06 May 2014 09:47:23 -0600
Subject: [maker-devel] iprscan and ipr_update_gff
In-Reply-To: <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca>
Message-ID:

Ok. With the full file I can see what was causing the message. It is a parsing bug that was happening in a few cases, and I've now fixed it. But you can ignore it, because it has no effect on the output. It would only be an issue if the ID= and Name= tags were different in the GFF3 for the gene feature lines (which is never true for MAKER's output). It was correctly parsing the 'mRNA' Name and ID tags, but was sometimes having issues with the Name= tags for the 'gene' lines (but because they are redundant with the ID= tag, the script still finds what it needs to add the Dbxref= tags).

--Carson

On 5/6/14, 9:26 AM, "kdelmore at zoology.ubc.ca" wrote:

>I just printed the first 20000 lines of the gff to send to you because it
>was too large to send through email.
I've included a dropbox link to the >full file below. I've also included a link to the final gff with dbx refs; >as I mentioned, it does seem to add them even with the error. If I run >ipr_update_gff twice, I get the warnings on the first run but not on the >second. Does that help diagnose the problem? > >The only other red flag I've encountered with maker was in including >external gff3 from geneid and sgp2. These gff3s failed validation at the >website suggested in the README file, with the warning message "cds: >non-unique id" for all cds, but maker didn't give me a warning and they >seem to be incorporated into the annotation fine. > >original gff >https://www.dropbox.com/s/nimoh605jdk9myx/6.gff > >final gff >https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta > >Thanks again for getting back to me. > >> The file you sent was missing the ##FASTA entry and all sequence at the >> bottom for example. Is that the way it is in the datastore? >> >> --Carson >> >> >> On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" >> wrote: >> >>>Thanks for your reply. I have not truncated the gff3. I'm using files >>> from >>>the datastore that were written at the same time so I'm not sure how >>>that >>>would happen. I split my multifasta before running it through maker and >>>have not merged the gff or protein.fasta for iprscan. That wouldn't be >>> the >>>problem would it? >>> >>>> You have entries in your interproscan output that aren't in your GFF3. >>>>Is >>>> your GFF3 file truncated? >>>> >>>> --Carson >>>> >>>> >>>> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" >>>> >>>> wrote: >>>> >>>>>Hi, I have a question about the interproscan scripts available with >>>>> maker. >>>>> >>>>>I'm following the recommendations posted by Carson in Aug 2011 to >>>>>incorporate results from iprscan. I'm getting quite a few warning >>>>> messages >>>>>with ipr_update_gff; they're all the same and suggest that there's no >>>>>value for $name. 
When I look through the updated gff, however, the >>>>> dbxrefs >>>>>have been added. Is this something I should be worried about? >>>>> >>>>>I'm using iprscan version 5 and actually get some warning messages >>>>> there >>>>>as well but again, the output looks alright. In addition, some of my >>>>>fastas don't get these warnings in iprscan and they still give me the >>>>>error with ipr_update_gff so I don't think that's the problem. I'm >>>>> using >>>>>proteins from UniProt. My commands and errors are below. I've also >>>>>attached the first 20000 lines from my initial gff and raw file from >>>>>iprscan. >>>>> >>>>>Thanks, I really appreciate your continued support. >>>>>Kira >>>>> >>>>>### >>>>> >>>>>commands for interproscan scripts available in maker >>>>>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>>>>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>>>>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >>>>> -inplace >>>>> >>>>>error after last step (just an example, a ton of similar lines): >>>>>Use of uninitialized value $name in hash element at >>>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>>15242. >>>>>Use of uninitialized value $name in hash element at >>>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>>15353. >>>>>Use of uninitialized value $name in hash element at >>>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>>15674. >>>>>Use of uninitialized value $name in hash element at >>>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>>15776. 
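Since the warnings quoted above all share one shape, they can be tallied rather than read one by one. Below is a minimal Python sketch, assuming the stderr of ipr_update_gff was saved with one warning per line (as Perl prints them, not wrapped as in this email). The function name and the regular expression are illustrative and are not part of MAKER.

```python
import re

# Shape of the Perl warning quoted in the message, on a single line, e.g.:
# Use of uninitialized value $name in hash element at /path/ipr_update_gff line 107, <$IN> line 15242.
WARNING_RE = re.compile(
    r"Use of uninitialized value (\$\w+) in hash element at (\S+) line (\d+), "
    r"<\$IN> line (\d+)\."
)


def summarize_warnings(lines):
    """Group warnings by (variable, script line) and collect the input line numbers.

    The input line number is whatever line the $IN file handle had reached
    when the warning fired; lines that do not match the pattern are skipped.
    """
    summary = {}
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            variable, _script, script_line, input_line = match.groups()
            key = (variable, int(script_line))
            summary.setdefault(key, []).append(int(input_line))
    return summary


if __name__ == "__main__":
    sample = [
        "Use of uninitialized value $name in hash element at "
        "/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15242.",
        "Use of uninitialized value $name in hash element at "
        "/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line 15353.",
    ]
    for (variable, script_line), input_lines in summarize_warnings(sample).items():
        print(variable, script_line, len(input_lines), input_lines[:5])
```

If every warning comes from the same script line, as in the four examples quoted here, the list of input line numbers shows which records of the input file to inspect.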
>>>>> >>>>> >>>>>### >>>>> >>>>>commands for interproscan 5 >>>>>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup \ >>>>> > >>>>>interpro_6.out 2>&1 >>>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>>> >>>>>error after first step: >>>>>04/05/2014 19:22:09:269 25% completed >>>>>04/05/2014 21:27:36:305 50% completed >>>>>04/05/2014 21:32:34:236 75% completed >>>>>04/05/2014 21:38:01:379 90% completed >>>>>2014-05-04 21:50:22,761 >>>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputS >>>>>te >>>>>p: >>>>>248] >>>>>WARN - At run completion, unable to delete temporary directory >>>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_1 >>>>>74 >>>>>83 >>>>>7921_l959/jobPIRSF-2.84 >>>>>2014-05-04 21:50:22,908 >>>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputS >>>>>te >>>>>p: >>>>>253] >>>>>WARN - At run completion, unable to delete temporary directory >>>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_1 >>>>>74 >>>>>83 >>>>>7921_l959 >>>>>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed >>>>> >>>>>error after second step: >>>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>>>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >>>>>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>>>>>2014-05-05 21:04:00,603 >>>>>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >>>>>completion, unable to delete temporary directory >>>>>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_201405 >>>>>05 >>>>>_2 >>>>>10353293_gsjh_______________________________________________ >>>>>maker-devel mailing list >>>>>maker-devel at box290.bluehost.com >>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.or >>>>>g >>>> >>>> >>>> >>> >>> >> >> >> > > From carsonhh at gmail.com Tue May 6 09:54:41 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 06 May 2014 09:54:41 -0600 Subject: [maker-devel] iprscan and ipr_update_gff In-Reply-To: References: <0c2904cc6449ce214317b92576142fa6.squirrel@webmail.zoology.ubc.ca> <068c58fd476b11f5975c25f8d1073de4.squirrel@webmail.zoology.ubc.ca> <51f8ccb838b0e4bed9e06cb373bb7180.squirrel@webmail.zoology.ubc.ca> Message-ID: Actually looking a little closer, it wouldn't even matter if the ID= and Name= tags were different for the 'gene', because interproscan gives the results for the transcripts (mRNA) and not the gene. So Dbxref still gets populated correctly regardless. --Carson On 5/6/14, 9:47 AM, "Carson Holt" wrote: >Ok. With the full file I can see what was causing the message. It is >a parsing bug that was happening in a few cases, and I've now fixed it. >But you can ignore it, because it has no effect on the output. > >It would only be an issue if the ID= and Name= tags were different in the >GFF3 for the gene feature lines (which is never true for MAKER's >output). It was correctly parsing the 'mRNA' Name and ID tags, but was >sometimes having issues with the Name= tags for the 'gene' lines (but >because they are redundant with the ID= tag, the script still finds what it >needs to add the Dbxref= tags). 
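To check whether a GFF3 file is in the situation described above (gene lines whose Name= is missing or differs from ID=), the gene features can be scanned with a few lines of Python. This is only an illustration with a hand-rolled attribute parser; it is not how ipr_update_gff reads the file, and both function names are hypothetical.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column such as "ID=x;Name=y" into a dict."""
    attributes = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key] = value
    return attributes


def genes_with_name_mismatch(gff_lines):
    """Yield (ID, Name) for gene features whose Name is absent or differs from ID."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section starts here; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "gene":
            continue
        attributes = parse_attributes(columns[8])
        if attributes.get("Name") != attributes.get("ID"):
            yield attributes.get("ID"), attributes.get("Name")


if __name__ == "__main__":
    # Made-up feature lines, for illustration only.
    example = [
        "##gff-version 3",
        "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1;Name=g1",
        "chr1\tmaker\tgene\t200\t300\t.\t+\t.\tID=g2",
    ]
    print(list(genes_with_name_mismatch(example)))
```

An empty result means every gene line has Name equal to ID, which is the case where, per the explanation above, the warning has no effect on the output.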
> >--Carson > > >On 5/6/14, 9:26 AM, "kdelmore at zoology.ubc.ca" >wrote: > >>I just printed the first 20000 lines of the gff to send to you because it >>was too large to send through email. I've included a dropbox link to the >>full file below. I've also included a link to the final gff with dbx >>refs; >>as I mentioned, it does seem to add them even with the error. If I run >>ipr_update_gff twice, I get the warnings on the first run but not on the >>second. Does that help diagnose the problem? >> >>The only other red flag I've encountered with maker was in including >>external gff3 from geneid and sgp2. These gff3s failed validation at the >>website suggested in the README file, with the warning message "cds: >>non-unique id" for all cds, but maker didn't give me a warning and they >>seem to be incorporated into the annotation fine. >> >>original gff >>https://www.dropbox.com/s/nimoh605jdk9myx/6.gff >> >>final gff >>https://www.dropbox.com/s/3m2vwscjnz1y3o9/6.final_gff.fasta >> >>Thanks again for getting back to me. >> >>> The file you sent was missing the ##FASTA entry and all sequence at the >>> bottom for example. Is that the way it is in the datastore? >>> >>> --Carson >>> >>> >>> On 5/6/14, 9:06 AM, "kdelmore at zoology.ubc.ca" >>> wrote: >>> >>>>Thanks for your reply. I have not truncated the gff3. I'm using files >>>> from >>>>the datastore that were written at the same time so I'm not sure how >>>>that >>>>would happen. I split my multifasta before running it through maker and >>>>have not merged the gff or protein.fasta for iprscan. That wouldn't be >>>> the >>>>problem would it? >>>> >>>>> You have entries in your interproscan output that aren't in your >>>>>GFF3. >>>>>Is >>>>> your GFF3 file truncated? >>>>> >>>>> --Carson >>>>> >>>>> >>>>> On 5/5/14, 10:36 PM, "kdelmore at zoology.ubc.ca" >>>>> >>>>> wrote: >>>>> >>>>>>Hi, I have a question about the interproscan scripts available with >>>>>> maker. 
>>>>>> >>>>>>I'm following the recommendations posted by Carson in Aug 2011 to >>>>>>incorporate results from iprscan. I'm getting quite a few warning >>>>>> messages >>>>>>with ipr_update_gff; they're all the same and suggest that there's no >>>>>>value for $name. When I look through the updated gff, however, the >>>>>> dbxrefs >>>>>>have been added. Is this something I should be worried about? >>>>>> >>>>>>I'm using iprscan version 5 and actually get some warning messages >>>>>> there >>>>>>as well but again, the output looks alright. In addition, some of my >>>>>>fastas don't get these warnings in iprscan and they still give me the >>>>>>error with ipr_update_gff so I don't think that's the problem. I'm >>>>>> using >>>>>>proteins from UniProt. My commands and errors are below. I've also >>>>>>attached the first 20000 lines from my initial gff and raw file from >>>>>>iprscan. >>>>>> >>>>>>Thanks, I really appreciate your continued support. >>>>>>Kira >>>>>> >>>>>>### >>>>>> >>>>>>commands for interproscan scripts available in maker >>>>>>iprscan2gff3 6.maker.proteins.fasta.xml.raw 6.gff > 6.domains.gff >>>>>>gff3_merge 6.gff 6.domains.gff -o 6_w_domains.all.gff >>>>>>ipr_update_gff 6_w_domains.all.gff 6.maker.proteins.fasta.xml.raw >>>>>> -inplace >>>>>> >>>>>>error after last step (just an example, a ton of similar lines): >>>>>>Use of uninitialized value $name in hash element at >>>>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>>>15242. >>>>>>Use of uninitialized value $name in hash element at >>>>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>>>15353. >>>>>>Use of uninitialized value $name in hash element at >>>>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>>>15674. >>>>>>Use of uninitialized value $name in hash element at >>>>>>/home/kdelmore/tools/maker/bin/ipr_update_gff line 107, <$IN> line >>>>>>15776. 
>>>>>> >>>>>> >>>>>>### >>>>>> >>>>>>commands for interproscan 5 >>>>>>interproscan.sh -i 6.maker.proteins.fasta -f xml -goterms -iprlookup >>>>>>\ >>>>>> > >>>>>>interpro_6.out 2>&1 >>>>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>>>> >>>>>>error after first step: >>>>>>04/05/2014 19:22:09:269 25% completed >>>>>>04/05/2014 21:27:36:305 50% completed >>>>>>04/05/2014 21:32:34:236 75% completed >>>>>>04/05/2014 21:38:01:379 90% completed >>>>>>2014-05-04 21:50:22,761 >>>>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutput >>>>>>S >>>>>>te >>>>>>p: >>>>>>248] >>>>>>WARN - At run completion, unable to delete temporary directory >>>>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_ >>>>>>1 >>>>>>74 >>>>>>83 >>>>>>7921_l959/jobPIRSF-2.84 >>>>>>2014-05-04 21:50:22,908 >>>>>>[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutput >>>>>>S >>>>>>te >>>>>>p: >>>>>>253] >>>>>>WARN - At run completion, unable to delete temporary directory >>>>>>/lustre/home/kdelmore/interpro_good/interpro_6/temp/cl2n116_20140504_ >>>>>>1 >>>>>>74 >>>>>>83 >>>>>>7921_l959 >>>>>>04/05/2014 21:50:23:380 100% done: InterProScan analyses completed >>>>>> >>>>>>error after second step: >>>>>>interproscan.sh -mode convert -f raw -i 6.maker.proteins.fasta.xml >>>>>>05/05/2014 21:03:40:457 Welcome to InterProScan-5.3-46.0 >>>>>>05/05/2014 21:03:53:292 Running InterProScan v5 in CONVERT mode... 
>>>>>>2014-05-05 21:04:00,603 >>>>>>[uk.ac.ebi.interpro.scan.jms.converter.Converter:277] WARN - At run >>>>>>completion, unable to delete temporary directory >>>>>>/home/kdelmore/interpro_good/interpro_6/temp/jasper.westgrid.ca_20140 >>>>>>5 >>>>>>05 >>>>>>_2 >>>>>>10353293_gsjh_______________________________________________ >>>>>>maker-devel mailing list >>>>>>maker-devel at box290.bluehost.com >>>>>>http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.o >>>>>>r >>>>>>g >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >> >> > > From sjackman at gmail.com Thu May 8 16:26:34 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 8 May 2014 15:26:34 -0700 Subject: [maker-devel] est_forward and conflicting names In-Reply-To: References: Message-ID: Hi, Carson. Could you give an example of how to add gene_id= to the header of the FASTA file? I'm not clear on what you mean by this. In the FASTA header, what portion is the transcript name, and what portion is the gene name? Cheers, Shaun *http://sjackman.ca * On 2 May 2014 11:55, Carson Holt wrote: > Whichever has the best AED score I believe, but you can add gene_id= to > the header of each fasta file to ensure MAKER doesn't try and cluster > unrelated transcripts into a single gene. Then the transcript name and > gene name will be guaranteed to match up. > > --Carson > > > From: Shaun Jackman > Date: Wednesday, April 30, 2014 at 5:25 PM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] est_forward and conflicting names > > Hi, Carson. > > I've downloaded a number of genes from GenBank using Entrez Direct, which I'm > using with est and protein to annotate a plant mitochondrion. Most of > these reference sequences have sensible and consistent gene names, and so > I'm using est_forward to retain the gene names. This workflow is working > well for me. Some of the genes pulled in from GenBank have less useful > names like orf1234 or other numeric IDs. 
When multiple evidence sequences > map to the same location, how does est_forward choose which name to use? > If it's chosen arbitrarily, could it be possible to choose the most common > name instead? > > Thanks, > Shaun > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 8 16:33:36 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 08 May 2014 16:33:36 -0600 Subject: [maker-devel] est_forward and conflicting names In-Reply-To: References: Message-ID: When moving transcripts onto a new assembly, you may have multiple transcripts of the same gene. Because your transcript name should be your fasta ID there is no way for MAKER to know that they go together when moving the models forward, so you can use the gene= option to make MAKER aware that these belong to the same genes. They will be grouped and you recover all splice forms as a group. Example: >SMEDT_00004 gene=dpp AAAAAAA >SMEDT_00005 gene=dpp AAAAAAA --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, May 8, 2014 at 4:26 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] est_forward and conflicting names Hi, Carson. Could you give an example of how to add gene_id= to the header of the FASTA file? I'm not clear on what you mean by this. In the FASTA header, what portion is the transcript name, and what portion is the gene name? Cheers, Shaun http://sjackman.ca On 2 May 2014 11:55, Carson Holt wrote: > Whichever has the best AED score I believe, but you can add gene_id= to the > header of each fasta file to ensure MAKER doesn't try and cluster unrelated > transcripts into a single gene. Then the transcript name and gene name will > be guaranteed to match up. 
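How a gene= tag ties transcripts together can be pictured with a small Python sketch. It only illustrates the grouping idea from the SMEDT example in this thread: the first word of the header is taken as the transcript name and a bare gene=NAME token names the gene. The thread mentions both gene_id= and gene=; the sketch follows the gene= form. It is not MAKER's own parser, the function name is made up, and the fallback for untagged records is an assumption.

```python
import re
from collections import defaultdict

# A bare gene=NAME token preceded by whitespace. A bracketed "[gene=NAME]"
# is deliberately not matched by this pattern.
GENE_TAG_RE = re.compile(r"(?:^|\s)gene=(\S+)")


def group_by_gene(fasta_lines):
    """Map gene name -> list of transcript IDs, using gene= tags in FASTA headers.

    Records without a tag are grouped under their own transcript ID
    (an assumption made for this illustration).
    """
    groups = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        header = line[1:].strip()
        if not header:
            continue
        transcript_id = header.split()[0]
        match = GENE_TAG_RE.search(header)
        gene = match.group(1) if match else transcript_id
        groups[gene].append(transcript_id)
    return dict(groups)


if __name__ == "__main__":
    records = [
        ">SMEDT_00004 gene=dpp",
        "AAAAAAA",
        ">SMEDT_00005 gene=dpp",
        "AAAAAAA",
    ]
    print(group_by_gene(records))
```

Both transcripts end up under the single gene name dpp, which is the grouping the example headers are meant to express.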
> > --Carson > > > From: Shaun Jackman > Date: Wednesday, April 30, 2014 at 5:25 PM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] est_forward and conflicting names > > Hi, Carson. > > I've downloaded a number of genes from GenBank using Entrez Direct, which I'm > using with est and protein to annotate a plant mitochondrion. Most of these > reference sequences have sensible and consistent gene names, and so I'm using > est_forward to retain the gene names. This workflow is working well for me. > Some of the genes pulled in from GenBank have less useful names like orf1234 > or other numeric IDs. When multiple evidence sequences map to the same > location, how does est_forward choose which name to use? If it's chosen > arbitrarily, could it be possible to choose the most common name instead? > > Thanks, > Shaun > > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/mak > er-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Thu May 8 16:41:41 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Thu, 8 May 2014 15:41:41 -0700 Subject: [maker-devel] est_forward and conflicting names In-Reply-To: References: Message-ID: Interesting. Thanks for the clarification. I'm working on a plant mitochondrion, and so as far as I know, there's no alternative splicing. My protein FASTA file is composed of the protein sequences of ~100 species downloaded from GenBank. It looks like this: >cox1|lcl|KJ461445.1_cdsid_AHY20320.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=AHY20320.1] [location=complement(59212..60795)] ... >cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1] [protein_id=ACA62629.1] [location=245282..246856] ... 
>cox1|lcl|NC_023103.1_cdsid_YP_008964124.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_008964124.1] [location=join(317824..318438,319511..320368)] ... I'm not sure that I actually want the fancy behaviour that you describe, though it probably wouldn't hurt anything. Will this FASTA format trigger the fancy behaviour? Cheers, Shaun *http://sjackman.ca * On 8 May 2014 15:33, Carson Holt wrote: > When moving transcripts onto a new assembly, you may have multiple > transcripts of the same gene. Because your transcript name should be your > fasta ID there is no way for MAKER to know that they go together when > moving the models forward, so you can use the gene= option to make MAKER > aware that these belong to the same genes. They will be grouped and you > recover all splice forms as a group. > > Example: > > >SMEDT_00004 gene=dpp > AAAAAAA > > >SMEDT_00005 gene=dpp > AAAAAAA > > --Carson > > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Thursday, May 8, 2014 at 4:26 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] est_forward and conflicting names > > Hi, Carson. Could you give an example of how to add gene_id= to the > header of the FASTA file? I'm not clear on what you mean by this. In the > FASTA header, what portion is the transcript name, and what portion is the > gene name? > > Cheers, > Shaun > > *http://sjackman.ca * > > > On 2 May 2014 11:55, Carson Holt wrote: > >> Whichever has the best AED score I believe, but you can add gene_id= to >> the header of each fasta file to ensure MAKER doesn't try and cluster >> unrelated transcripts into a single gene. Then the transcript name and >> gene name will be guaranteed to match up. >> >> --Carson >> >> >> From: Shaun Jackman >> Date: Wednesday, April 30, 2014 at 5:25 PM >> To: "maker-devel at yandell-lab.org" >> Subject: [maker-devel] est_forward and conflicting names >> >> Hi, Carson. 
>> >> I've downloaded a number of genes from GenBank using Entrez Direct, which >> I'm using with est and protein to annotate a plant mitochondrion. Most >> of these reference sequences have sensible and consistent gene names, and >> so I'm using est_forward to retain the gene names. This workflow is >> working well for me. Some of the genes pulled in from GenBank have less >> useful names like orf1234 or other numeric IDs. When multiple evidence >> sequences map to the same location, how does est_forward choose which >> name to use? If it's chosen arbitrarily, could it be possible to choose the >> most common name instead? >> >> Thanks, >> Shaun >> >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 8 16:43:40 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 08 May 2014 16:43:40 -0600 Subject: [maker-devel] est_forward and conflicting names In-Reply-To: References: Message-ID: Only if you were to remove the brackets around gene=. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Thursday, May 8, 2014 at 4:41 PM To: Carson Holt Cc: "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] est_forward and conflicting names Interesting. Thanks for the clarification. I'm working on a plant mitochondrion, and so as far as I know, there's no alternative splicing. My protein FASTA file is composed of the protein sequences of ~100 species downloaded from GenBank. It looks like this: >cox1|lcl|KJ461445.1_cdsid_AHY20320.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=AHY20320.1] [location=complement(59212..60795)] ... >cox1|lcl|EU534409.1_cdsid_ACA62629.1 [gene=cox1] [protein=cox1] [protein_id=ACA62629.1] [location=245282..246856] ... 
>cox1|lcl|NC_023103.1_cdsid_YP_008964124.1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=YP_008964124.1] [location=join(317824..318438,319511..320368)] ... I'm not sure that I actually want the fancy behaviour that you describe, though it probably wouldn't hurt anything. Will this FASTA format trigger the fancy behaviour? Cheers, Shaun http://sjackman.ca On 8 May 2014 15:33, Carson Holt wrote: > When moving transcripts onto a new assembly, you may have multiple transcripts > of the same gene. Because your transcript name should be your fasta ID there > is no way for MAKER to know that they go together when moving the models > forward, so you can use the gene= option to make MAKER aware that these belong > to the same genes. They will be grouped and you recover all splice forms as a > group. > > Example: > >> >SMEDT_00004 gene=dpp > AAAAAAA > >> >SMEDT_00005 gene=dpp > AAAAAAA > > --Carson > > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Thursday, May 8, 2014 at 4:26 PM > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] est_forward and conflicting names > > Hi, Carson. Could you give an example of how to add gene_id= to the header of > the FASTA file? I'm not clear on what you mean by this. In the FASTA header, > what portion is the transcript name, and what portion is the gene name? > > Cheers, > Shaun > > > http://sjackman.ca > > > On 2 May 2014 11:55, Carson Holt wrote: >> Whichever has the best AED score I believe, but you can add gene_id= to the >> header of each fasta file to ensure MAKER doesn't try and cluster unrelated >> transcripts into a single gene. Then the transcript name and gene name will >> be guaranteed to match up. >> >> --Carson >> >> >> From: Shaun Jackman >> Date: Wednesday, April 30, 2014 at 5:25 PM >> To: "maker-devel at yandell-lab.org" >> Subject: [maker-devel] est_forward and conflicting names >> >> Hi, Carson. 
>> >> I've downloaded a number of genes from GenBank using Entrez Direct, which I'm >> using with est and protein to annotate a plant mitochondrion. Most of these >> reference sequences have sensible and consistent gene names, and so I'm using >> est_forward to retain the gene names. This workflow is working well for me. >> Some of the genes pulled in from GenBank have less useful names like orf1234 >> or other numeric IDs. When multiple evidence sequences map to the same >> location, how does est_forward choose which name to use? If it's chosen >> arbitrarily, could it be possible to choose the most common name instead? >> >> Thanks, >> Shaun >> >> >> >> _______________________________________________ maker-devel mailing list >> maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/ma >> ker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Wed May 14 15:07:52 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Wed, 14 May 2014 14:07:52 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA. Cheers, Shaun On 4 March 2014 18:33, Carson Holt wrote: > Trying to call non-coding RNA from ESTs or even sequence homology is > extremely messy (non-trivial problem in most organisms with high false > positive rate), so MAKER for the most part doesn't even try to do that. It > focuses only on the coding genes. You can now use tRNAscan and snoscan in > the newest version for some non-coding RNA support (those features were > only added a couple of months ago). So just like other prediction tools > (snap, augustus etc.), the primary focus has always been the coding genes. > We've only started adding non-coding RNA support recently for iPlant, so > it's still relatively immature. 
> > Thanks, > Carson > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Tuesday, March 4, 2014 at 7:10 PM > > To: Carson Holt > Cc: "maker-devel at yandell-lab.org" > Subject: Re: [maker-devel] Mapping gene names > > Hi, Carson. I set single_length=50, and it worked like a charm. Thanks > for the tip. > > The rRNA genes that are found with est2genome have the feature type set to > *mRNA* and have corresponding *five_prime_UTR*, *CDS* and > *three_prime_UTR* features. Ideally the feature type would be set to > *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS features. > Is that a feature that you would be interested in adding to MAKER? The rRNA > gene names all start with "rrn" and the tRNA gene names with "trn", as is > standard, so determining the appropriate type should be straightforward. > > Thanks again for your help with this. Cheers, > Shaun > > > On 27 February 2014 17:13, Carson Holt wrote: > >> Set single_exon=1, and the minimum size to a smaller value. I think it's >> set to 250 right now. Also est2genome is looking for an ORF, so if there is >> none (as with tRNAs) they probably won't get picked up. >> >> --Carson >> >> Sent from my iPhone >> >> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >> >> Sorry, ignore my previous question. est_forward also carries forward the >> names of protein evidence and works like a charm. Thank you! >> >> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They >> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value >> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing >> these hits? 
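One way to reason about which filter drops a hit is to write the thresholds down as explicit checks. The Python sketch below is only a mental model: bit_blastn and eval_blastn use the values quoted in the message, min_length stands in for the single-exon minimum size mentioned in the reply (recalled there as 250, later set to 50), and the hit length of 120 is a made-up illustration. It makes no claim about the order or manner in which MAKER applies its filters.

```python
def failed_filters(bits, evalue, length,
                   bit_blastn=40, eval_blastn=1e-10, min_length=250):
    """Return the names of the thresholds a hit does not satisfy.

    min_length is a stand-in for the single-exon minimum size discussed in
    the thread; the parameter name is chosen for this sketch only.
    """
    failures = []
    if bits < bit_blastn:
        failures.append("bit_blastn")
    if evalue > eval_blastn:
        failures.append("eval_blastn")
    if length < min_length:
        failures.append("min_length")
    return failures


if __name__ == "__main__":
    # Bit score and E value as quoted for rrn5; the length is hypothetical.
    print(failed_filters(bits=242, evalue=2e-66, length=120))
    print(failed_filters(bits=242, evalue=2e-66, length=120, min_length=50))
```

With these numbers only the length check fails at a minimum of 250, and nothing fails at 50, which is consistent with the report that lowering the single-exon minimum fixed the problem.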
>> >> organism_type=prokaryotic >> est2genome=1 >> protein2genome=1 >> est_forward=1 >> >> Cheers, >> Shaun >> >> >> On 27 February 2014 15:17, Shaun Jackman wrote: >> >>> Is there a corresponding protein_forward=1 option to map forward protein >>> names from protein2genome? >>> >>> Cheers, >>> Shaun >>> >>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) >>> wrote: >>> >>> Sorry I meant to say prefilter on the score in the mRNA column before >>> passing the gff3 to model_gff. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>> >>> What you can do is run it once with just est_forward=1 and >>> est2genome/protein2genome set to 1. Then take those results, pass them in >>> as model_gff and use the map_forward option to then filter the results >>> based on mRNA score and that would copy names onto new genes under the >>> standard MAKER pipeline. Eventually it's really supposed to go into a >>> separate tool that will map genes onto new assemblies (but under the hood >>> the tool will just be calling MAKER with certain parameters restricted). I >>> do this because if people commonly use it mixed with things like SNAP I can >>> start to get some very weird behaviors. >>> >>> Thanks, >>> Carson >>> >>> From: Mikael Brandström Durling >>> Date: Wednesday, February 26, 2014 at 3:04 PM >>> To: Carson Holt >>> Cc: "maker-devel at yandell-lab.org" >>> Subject: Re: [maker-devel] Mapping gene names >>> >>> It seems that this could be a very useful option in those cases where >>> you have firm a priori knowledge of the placement of ESTs. However, while >>> trying it I note that est_forward implies that the est2genome predictor is >>> turned on, implicitly. Is this necessary for this to work? I'm after the >>> behavior you describe below where exonerate is made to try really hard >>> within a limited region to align an est, but I would not like maker to >>> produce est2genome predictions. 
>>> >>> In general, I think this maker_coor and est_forward is a feature set >>> that is worthy to be promoted into a documented feature. >>> >>> Thanks, >>> Mikael >>> >>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>> >>> It will still work without est_forward. It just works a little >>> differently. Keep in mind this was a hidden feature I used to find >>> stubborn or hard to find missing genes after reassembly of a genome. >>> >>> If est_forward is provided, MAKER will parse the database to look for >>> the maker_coor tags early in the pipeline. Then it will create a list of >>> locations to search, and it will search them even if there are no BLAST >>> results to seed the search (normally MAKER gets a BLAST result first and >>> then polishes it with exonerate). So maker_coor=chr1 will cause MAKER to >>> look for a match using all of chr1 as the input to exonerate even when >>> BLAST finds nothing (this is a very very slow search, but can help pick up >>> one or two stubborn genes that don't remap well). To allow this, MAKER >>> gives exonerate looser matching parameters (i.e. allows for single base >>> pair introns perhaps caused by assembly errors). The logic here is that >>> given the fact that I already told MAKER that with some degree of >>> confidence I expect sequence A to map to location X, it will try its >>> hardest to make it match. >>> >>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>> at line 1563, but only after a BLAST alignment has already seeded it to the >>> region (that BLAST result has the information in its description >>> parameter). MAKER will then ignore seeds completely outside of maker_coor. >>> In addition any BLAST seeds that overlap maker_coor will get the search >>> space for alignment polishing adjusted to match maker_coor exactly. Also >>> match parameters for exonerate will not be relaxed as they were with >>> est_forward. 
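The maker_coor tag appears in this thread in two forms: a bare sequence name (maker_coor=chr1) and a name with a range (maker_coor=chr1:1-10000). A small Python sketch of reading such a tag from a FASTA header follows. The regular expression and the function name are illustrative assumptions, not the code in GI.pm, and sequence names that themselves contain a colon are not handled.

```python
import re

# maker_coor=SEQID or maker_coor=SEQID:START-END as a whitespace-delimited token.
COOR_RE = re.compile(r"(?:^|\s)maker_coor=([^\s:]+)(?::(\d+)-(\d+))?(?=\s|$)")


def parse_maker_coor(header):
    """Return (seqid, start, end) from a FASTA header, or None if no tag is present.

    start and end are None when only the sequence name was given.
    """
    match = COOR_RE.search(header)
    if match is None:
        return None
    seqid, start, end = match.groups()
    if start is None:
        return (seqid, None, None)
    return (seqid, int(start), int(end))


if __name__ == "__main__":
    # Hypothetical headers, for illustration only.
    print(parse_maker_coor(">est_0001 maker_coor=chr1:1-10000"))
    print(parse_maker_coor(">est_0002 maker_coor=chr1"))
    print(parse_maker_coor(">est_0003"))
```

Returning None for the range makes the "whole sequence" case explicit, which mirrors the distinction drawn above between searching all of chr1 and searching a bounded region.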
>>> >>> As you can see, the behavior is slightly different (because it's an >>> accidental feature). >>> >>> Thanks, >>> Carson >>> >>> >>> >>> From: Mikael Brandström Durling >>> Date: Wednesday, February 26, 2014 at 6:37 AM >>> To: Carson Holt >>> Cc: "maker-devel at yandell-lab.org" >>> Subject: Re: [maker-devel] Mapping gene names >>> >>> That might be a useful and time saving accidental feature. But, reading >>> the code, it seems that I need to supply maker_coor but not gene_id, as >>> well as the configuration option est_forward for this to work. Any >>> occurrences of maker_coor in GI.pm seem to be conditioned on set_forward=1 >>> right? >>> >>> Mikael >>> >>> 26 feb 2014 kl. 14:22 skrev Carson Holt : >>> >>> Yes. That should work as well as an accidental feature. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling < >>> mikael.durling at slu.se> wrote: >>> >>> Can this use of maker_coor be used only to hint about the placement of >>> the ests, without affecting the naming of the final genes? I.e., if I have a >>> database of EST where I have a priori knowledge of their rough placement, >>> can this placement be given to maker without providing est_forward=1? >>> >>> Thanks, >>> Mikael >>> >>> 26 feb 2014 kl. 01:58 skrev Carson Holt : >>> >>> There is a way. It's not a standard option and it's undocumented, but >>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just >>> that. The option won't already be there so you'll have to type it in. >>> >>> There is also a feature designed to work with this option. If you add >>> tags to your fasta headers, those can be used to guide the mapping and >>> naming. 
>>> For example, gene_id= will ensure different isoforms
>>> that share a common gene_id get clustered into the same gene,
>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>> sequence to only be mapped against chr1 within the range of 1-10000 bp, and
>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>
>>> This is an undocumented way to remap genes onto new assemblies using
>>> BLAST alignments of earlier transcript or protein annotations as a guide.
>>>
>>> --Carson
>>>
>>>
>>>
>>>
>>> From: Shaun Jackman
>>> Reply-To: Shaun Jackman
>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>> To:
>>> Subject: [maker-devel] Mapping gene names
>>>
>>> Hi,
>>>
>>> I'm annotating a genome using a closely related genome from GenBank,
>>> using the .frn (RNA) and .faa (protein) files from GenBank as evidence to
>>> annotate my genome. I've run MAKER, and the annotation seems to have worked
>>> well. Is it possible to map the names of the genes from the related species
>>> to my annotation? I see the *map_forward* option, which applies to the
>>> *model_gff* parameter. Is there a similar option for *est* and *protein*?
>>>
>>> *maker_opts.ctl*
>>>
>>> est=NC_123456.frn
>>> protein=NC_123456.faa
>>> est2genome=1
>>> protein2genome=1
>>>
>>> Thanks,
>>> Shaun

From carsonhh at gmail.com Wed May 14 15:18:52 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 May 2014 15:18:52 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Thanks. Looks interesting. Also since output is already GFF3, you could probably just use it with gff passthrough. It doesn't appear to support eukaryotes though.

--Carson

Sent from my iPhone

> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>
> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>
> Cheers,
> Shaun
>
>
>> On 4 March 2014 18:33, Carson Holt wrote:
>> Trying to call non-coding RNA from ESTs or even sequence homology is extremely messy (non-trivial problem in most organisms with high false positive rate), so MAKER for the most part doesn't even try to do that. It focuses only on the coding genes. You can now use tRNAscan and snoscan in the newest version for some non-coding RNA support (those features were only added a couple of months ago).
>> So just like other prediction tools (snap, augustus etc.), the primary focus has always been the coding genes. We've only started adding non-coding RNA support recently for iPlant, so it's still relatively immature.
>>
>> Thanks,
>> Carson

From sjackman at gmail.com Wed May 14 15:25:21 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 14:25:21 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson, Torsten.

> It doesn't appear to support eukaryotes though.

Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap --help output seems to be out of date.

Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

barrnap --help
...
--kingdom [X] Kingdom: [b]acteria [a]rchaea (default 'bacteria')

Cheers,
Shaun

http://sjackman.ca

On 14 May 2014 14:18, Carson Holt wrote:
> Thanks. Looks interesting. Also since output is already GFF3, you could
> probably just use it with gff passthrough. It doesn't appear to support
> eukaryotes though.
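Since the suggestion here is to hand Barrnap's GFF3 output to MAKER as passthrough, a quick structural check of the file beforehand can save a debugging round. The sketch below is only a sanity check written for this thread, assuming standard nine-column, tab-separated GFF3; it does not replicate whatever validation MAKER itself performs, and the function name `check_gff3_lines` is hypothetical. The example record is one of the Barrnap lines posted in this thread, with tabs restored.

```python
from typing import Iterable, List, Tuple


def check_gff3_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for malformed GFF3 feature lines."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, directives and comments carry no feature
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(fields)}")
            )
            continue
        start, end, strand = fields[3], fields[4], fields[6]
        if not (start.isdigit() and end.isdigit() and int(start) <= int(end)):
            problems.append((number, "start/end must be integers with start <= end"))
        if strand not in {"+", "-", ".", "?"}:
            problems.append((number, f"unexpected strand {strand!r}"))
    return problems


if __name__ == "__main__":
    record = "\t".join([
        "200408_86", "barrnap:0.4", "rRNA", "2171785", "2173036",
        ".", "+", ".", "Name=12S_rRNA;product=12S ribosomal RNA",
    ])
    for number, problem in check_gff3_lines(["##gff-version 3", record]):
        print(f"line {number}: {problem}")
```

A common way for such a file to break is for tabs to be converted to spaces when it is copied through email or a web page, which is why the column count is checked first.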
>
> --Carson
>
>
> Sent from my iPhone
>
> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>
> Hi, Carson. Perhaps MAKER could integrate Barrnap to predict rRNA.
>
> Cheers,
> Shaun

From sjackman at gmail.com Wed May 14 18:06:31 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 17:06:31 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Hi, Carson.

I used other_gff to pass the following four-line GFF file of Barrnap rRNA annotations through. The output of gff3_merge is quite bizarre. See below.

Input:

##gff-version 3
200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . Name=12S_rRNA;product=12S ribosomal RNA
200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA
200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA
200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA

Output:

###
ARRAY(0x7feceb928780)
###
ARRAY(0x7feceaa548a0)
###
ARRAY(0x7feceeb01c60)
###
ARRAY(0x7fecedf6fef8)
###

Cheers,
Shaun

http://sjackman.ca

On 14 May 2014 14:18, Carson Holt wrote:
> Thanks. Looks interesting.
Also since output is already GFF3, you could > probably just use it with gff passthrough. It doesn't appear to support > eukaryotes though. > > --Carson > > > Sent from my iPhone > > On May 14, 2014, at 3:07 PM, Shaun Jackman wrote: > > Hi, Carson. Perhaps MAKER could integrate Barrnapto predict rRNA. > > Cheers, > Shaun > > On 4 March 2014 18:33, Carson Holt wrote: > >> Trying to call non-coding RNA from ESTs or even sequence homology is >> extremely messy (non-trivial problem in most organisms with high false >> positive rate), so MAKER for the most part doesn?t even try to do that. It >> focuses only on the coding genes. You can now use tRNAscan and snoscan in >> the newest version for some non-coding RNA support (those features were >> only added a couple of months ago). So just like other prediction tools >> (snap, augustus etc.), the primary focus has always been the coding genes. >> We?ve only started adding non-coding RNA support recently for iPlant, so >> it?s still relatively immature. >> >> Thanks, >> Carson >> >> >> From: Shaun Jackman >> Reply-To: Shaun Jackman >> Date: Tuesday, March 4, 2014 at 7:10 PM >> >> To: Carson Holt >> Cc: "maker-devel at yandell-lab.org" >> Subject: Re: [maker-devel] Mapping gene names >> >> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks >> for the tip. >> >> The rRNA genes that are found with est2genome have the feature type set >> to *mRNA* and have corresponding *five_prime_UTR*, *CDS* and >> *three_prime_UTR* features. Ideally the feature type would be set to >> *rRNA* or *tRNA* as appropriate, and would omit the UTR and CDS >> features. Is that a feature that you would be interested in adding to >> MAKER? The rRNA gene names all start with ?rrn? and the tRNA gene names >> with ?trn?, as is standard, so determining the appropriate type should be >> straight forward. >> >> Thanks again for your help with this. 
Cheers, >> Shaun >> >> >> On 27 February 2014 17:13, Carson Holt wrote: >> >>> Set single_exon=1, and the minimum size to a smaller value. I think >>> it's set to 250 right now. Also est2genome is looking for ORF, so if there >>> is none (as with tRNAs) they probably won't get picked up. >>> >>> --Carson >>> >>> Sent from my iPhone >>> >>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote: >>> >>> Sorry, ignore my previous question. est_forward also carries forward the >>> names of protein evidence and works like a charm. Thank you! >>> >>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller >>> rrn4.5 and rrn5 and tRNA genes didn?t make it into the all.gff file. They >>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect >>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E Value >>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing >>> these hits? >>> >>> organism_type=prokaryotic >>> est2genome=1 >>> protein2genome=1 >>> est_forward=1 >>> >>> Cheers, >>> Shaun >>> >>> >>> On 27 February 2014 15:17, Shaun Jackman wrote: >>> >>>> Is there a corresponding protein_forward=1 option to map forward >>>> protein names from protein2genome? >>>> >>>> Cheers, >>>> Shaun >>>> >>>> On 2014-February-26 at 15:45:39 , Carson Holt (carsonhh at gmail.com) >>>> wrote: >>>> >>>> Sorry I meant to say prefilter on the score in the mRNA column before >>>> passing the gff3 to model_gff. >>>> >>>> --Carson >>>> >>>> Sent from my iPhone >>>> >>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote: >>>> >>>> What you can do is run it once with just est_forward=1 and >>>> est2genome/protein2genome set to 1. Then take those results, pass them in >>>> as model_gff and use the map_forward option to then filter the results >>>> based on mRNA score and that would copy names onto new gene under the >>>> standard MAKER pipeline. 
Eventually it?s really supposed to go into a >>>> separate tool that will map genes onto new assemblies (but under the hood >>>> the tool will just be calling MAKER with certain parameters restricted). I >>>> do this because if people commonly use it mixed with things like SNAP I can >>>> start to get some very weird behaviors. >>>> >>>> Thanks, >>>> Carson >>>> >>>> From: Mikael Brandstr?m Durling >>>> Date: Wednesday, February 26, 2014 at 3:04 PM >>>> To: Carson Holt >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] Mapping gene names >>>> >>>> It seems that this could be a very useful option in those cases where >>>> you have firm a priori knowledge of the placement of ESTs. However, while >>>> trying it I note that est_forward implies that the est2genome predictor is >>>> turned on, implicitly. Is this necessary for this to work? I?m after the >>>> behavior you describe below where exonerate is made to try really hard >>>> within a limited region to align an est, but I would not like maker to >>>> produce est2genome predictions. >>>> >>>> In general, I think this maker_coor and est_forward is a feature set >>>> that is worthy to be promoted into a documented feature. >>>> >>>> THanks, >>>> Mikael >>>> >>>> 26 feb 2014 kl. 17:09 skrev Carson Holt : >>>> >>>> It will still work without est_forward. It just works a little >>>> differently. Keep in mind this was a hidden feature I used to find >>>> stubborn or hard to find missing genes after reassembly of a genome. >>>> >>>> If est_forward is provided, MAKER will parse the database to look for >>>> the maker_coor tags early in the pipeline. Then it will create a list of >>>> locations to search, and it will search them even if there are no BLAST >>>> results to seed the search (normally MAKER gets a BLAST result first and >>>> then polishes it with exonerate). 
So maker_coor=chr1 will cause MAKER to >>>> look for a match using all of chr1 as the input to exonerate even when >>>> BLAST finds nothing (this is a very very slow search, but can help pick up >>>> one or two stubborn genes that don?t remap well). To allow this, MAKER >>>> gives exonerate looser matching parameters (i.e. allows for single base >>>> pair introns perhaps caused by assembly errors). The logic here is that >>>> given the fact that I already told MAKER that with some degree of >>>> confidence I expect sequence A to map to to location X, it will try its >>>> hardest to make it match. >>>> >>>> Without est_forward set, the maker_coor= flag still gets read in GI.pm >>>> at line 1563, but only after a BLAST alignment has already seeded it to the >>>> region (that BLAST result has the information in its description >>>> parameter). MAKER will then ignore seeds completely outside of maker_coor. >>>> In addition any BLAST seeds that overlap maker_coor will get the search >>>> space for alignment polishing adjusted to match maker_coor exactly. Also >>>> match parameters for exonerate will not be relaxed as they were with >>>> est_forward. >>>> >>>> As you can see the behavior, is slightly different (because it?s an >>>> accidental feature). >>>> >>>> Thanks, >>>> Carson >>>> >>>> >>>> >>>> From: Mikael Brandstr?m Durling >>>> Date: Wednesday, February 26, 2014 at 6:37 AM >>>> To: Carson Holt >>>> Cc: "maker-devel at yandell-lab.org" >>>> Subject: Re: [maker-devel] Mapping gene names >>>> >>>> That might be a useful and time saving accidental feature. But, reading >>>> the code, it seems that I need to supply maker_coor but not gene_id, as >>>> well as the configuration option est_forward for this to work. Any >>>> occurrences of maker_coor in GI.pm seems to be conditioned on set_forward=1 >>>> right? >>>> >>>> Mikael >>>> >>>> 26 feb 2014 kl. 14:22 skrev Carson Holt : >>>> >>>> Yes. That should work as well as an accidental feature. 
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 26, 2014, at 5:30 AM, Mikael Brandström Durling <
>>>> mikael.durling at slu.se> wrote:
>>>>
>>>> Can this use of maker_coor be used only to hint about the placement of
>>>> the ests, without affecting the naming of the final genes? I.e. if I have a
>>>> database of EST where I have a priori knowledge of their rough placement,
>>>> can this placement be given to maker without providing est_forward=1?
>>>>
>>>> Thanks,
>>>> Mikael
>>>>
>>>> On 26 Feb 2014, at 01:58, Carson Holt wrote:
>>>>
>>>> There is a way. It's not a standard option and it's undocumented, but
>>>> if you add est_forward=1 to the maker_opts.ctl file, then it will do just
>>>> that. The option won't already be there so you'll have to type it in.
>>>>
>>>> There is also a feature designed to work with this option. If you add
>>>> tags to your fasta headers, those can be used to guide the mapping and
>>>> naming. For example, gene_id= will ensure different isoforms
>>>> that share a common gene_id get clustered into the same gene,
>>>> and maker_coor=chr1:1-10000 in the fasta header will force a particular
>>>> sequence to only be mapped against chr1 within the range of 1-10000 bp and
>>>> just using maker_coor=chr1 will force it to only be mapped against chr1.
>>>>
>>>> This is an undocumented way to remap genes onto new assemblies using
>>>> blast alignments of earlier transcript or protein annotations as a guide.
>>>>
>>>> --Carson
>>>>
>>>> From: Shaun Jackman
>>>> Reply-To: Shaun Jackman
>>>> Date: Tuesday, February 25, 2014 at 5:06 PM
>>>> To:
>>>> Subject: [maker-devel] Mapping gene names
>>>>
>>>> Hi,
>>>>
>>>> I'm annotating a genome using a closely related genome from GenBank,
>>>> using the .frn (RNA) and .faa (protein) files from GenBank as evidence to
>>>> annotate my genome. I've run MAKER, and the annotation seems to have worked
>>>> well.
>>>> Is it possible to map the names of the genes from the related species
>>>> to my annotation? I see the *map_forward* option, which applies to the
>>>> *model_gff* parameter. Is there a similar option for *est* and
>>>> *protein*?
>>>>
>>>> *maker_opts.ctl*
>>>>
>>>> est=NC_123456.frn
>>>> protein=NC_123456.faa
>>>> est2genome=1
>>>> protein2genome=1
>>>>
>>>> Thanks,
>>>> Shaun

From carsonhh at gmail.com Wed May 14 18:19:43 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 14 May 2014 18:19:43 -0600
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

That should be fixed in the current download? It came up on the mailing
list a couple of weeks ago. I'll check.

--Carson

From: Shaun Jackman
Reply-To: Shaun Jackman
Date: Wednesday, May 14, 2014 at 6:06 PM
To: Carson Holt
Cc: "maker-devel at yandell-lab.org"
Subject: Re: [maker-devel] Mapping gene names

Hi, Carson. I used other_gff to pass the following four-line GFF file of
Barrnap rRNA annotations through. The output of gff3_merge is quite
bizarre. See below.

Input:

##gff-version 3
200408_86 barrnap:0.4 rRNA 2171785 2173036 . + . Name=12S_rRNA;product=12S ribosomal RNA
200408_86 barrnap:0.4 rRNA 3665772 3666686 . - . Name=16S_rRNA;product=16S ribosomal RNA (partial);note=aligned only 57 percent of the 16S ribosomal RNA
200408_86 barrnap:0.4 rRNA 3826637 3827887 . - . Name=12S_rRNA;product=12S ribosomal RNA
200408_86 barrnap:0.4 rRNA 4355857 4357119 . + . Name=12S_rRNA;product=12S ribosomal RNA

Output:

###
ARRAY(0x7feceb928780)
###
ARRAY(0x7feceaa548a0)
###
ARRAY(0x7feceeb01c60)
###
ARRAY(0x7fecedf6fef8)
###

Cheers,
Shaun
http://sjackman.ca

On 14 May 2014 14:18, Carson Holt wrote:
> Thanks. Looks interesting. Also since output is already GFF3, you could
> probably just use it with gff passthrough. It doesn't appear to support
> eukaryotes though.
>
> --Carson
>
> Sent from my iPhone
>
> On May 14, 2014, at 3:07 PM, Shaun Jackman wrote:
>
>> Hi, Carson. Perhaps MAKER could integrate Barrnap
>> to predict rRNA.
>>
>> Cheers,
>> Shaun
>>
>> On 4 March 2014 18:33, Carson Holt wrote:
>>> Trying to call non-coding RNA from ESTs or even sequence homology is
>>> extremely messy (non-trivial problem in most organisms with high false
>>> positive rate), so MAKER for the most part doesn't even try to do that. It
>>> focuses only on the coding genes. You can now use tRNAscan and snoscan in
>>> the newest version for some non-coding RNA support (those features were only
>>> added a couple of months ago). So just like other prediction tools (snap,
>>> augustus etc.), the primary focus has always been the coding genes. We've
>>> only started adding non-coding RNA support recently for iPlant, so it's
>>> still relatively immature.
>>>
>>> Thanks,
>>> Carson
>>>
>>> From: Shaun Jackman
>>> Reply-To: Shaun Jackman
>>> Date: Tuesday, March 4, 2014 at 7:10 PM
>>> To: Carson Holt
>>> Cc: "maker-devel at yandell-lab.org"
>>> Subject: Re: [maker-devel] Mapping gene names
>>>
>>> Hi, Carson. I set single_length=50, and it worked like a charm. Thanks for
>>> the tip.
>>>
>>> The rRNA genes that are found with est2genome have the feature type set to
>>> mRNA and have corresponding five_prime_UTR, CDS and three_prime_UTR
>>> features. Ideally the feature type would be set to rRNA or tRNA as
>>> appropriate, and would omit the UTR and CDS features. Is that a feature that
>>> you would be interested in adding to MAKER? The rRNA gene names all start
>>> with "rrn" and the tRNA gene names with "trn", as is standard, so
>>> determining the appropriate type should be straightforward.
>>>
>>> Thanks again for your help with this. Cheers,
>>> Shaun
>>>
>>> On 27 February 2014 17:13, Carson Holt wrote:
>>>> Set single_exon=1, and the minimum size to a smaller value. I think it's
>>>> set to 250 right now. Also est2genome is looking for ORF, so if there is
>>>> none (as with tRNAs) they probably won't get picked up.
>>>>
>>>> --Carson
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 27, 2014, at 5:27 PM, Shaun Jackman wrote:
>>>>
>>>>> Sorry, ignore my previous question. est_forward also carries forward the
>>>>> names of protein evidence and works like a charm. Thank you!
>>>>>
>>>>> The larger rrn16 and rrn23 genes annotated perfectly, but the smaller
>>>>> rrn4.5 and rrn5 and tRNA genes didn't make it into the all.gff file. They
>>>>> are in the blastn output, and in the evidence_0.gff. rrn5 has perfect
>>>>> identity, sufficient bits (242 > bit_blastn=40) and sufficient E-value
>>>>> (2e-66 < eval_blastn=1e-10). How should I debug which filter is removing
>>>>> these hits?
>>>>>
>>>>> organism_type=prokaryotic
>>>>> est2genome=1
>>>>> protein2genome=1
>>>>> est_forward=1
>>>>>
>>>>> Cheers,
>>>>> Shaun
>>>>>
>>>>> On 27 February 2014 15:17, Shaun Jackman wrote:
>>>>>> Is there a corresponding protein_forward=1 option to map forward protein
>>>>>> names from protein2genome?
>>>>>>
>>>>>> Cheers,
>>>>>> Shaun
>>>>>>
>>>>>> On 2014-February-26 at 15:45:39, Carson Holt (carsonhh at gmail.com)
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry I meant to say prefilter on the score in the mRNA column before
>>>>>>> passing the gff3 to model_gff.
>>>>>>>
>>>>>>> --Carson
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Feb 26, 2014, at 3:50 PM, Carson Holt wrote:
>>>>>>>
>>>>>>> What you can do is run it once with just est_forward=1 and
>>>>>>> est2genome/protein2genome set to 1. Then take those results, pass them
>>>>>>> in as model_gff and use the map_forward option to then filter the
>>>>>>> results based on mRNA score and that would copy names onto new gene
>>>>>>> under the standard MAKER pipeline. Eventually it's really supposed to
>>>>>>> go into a separate tool that will map genes onto new assemblies (but
>>>>>>> under the hood the tool will just be calling MAKER with certain
>>>>>>> parameters restricted). I do this because if people commonly use it
>>>>>>> mixed with things like SNAP I can start to get some very weird
>>>>>>> behaviors.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Carson
>>>>>>>
>>>>>>> From: Mikael Brandström Durling
>>>>>>> Date: Wednesday, February 26, 2014 at 3:04 PM
>>>>>>> To: Carson Holt
>>>>>>> Cc: "maker-devel at yandell-lab.org"
>>>>>>> Subject: Re: [maker-devel] Mapping gene names
>>>>>>>
>>>>>>> It seems that this could be a very useful option in those cases where
>>>>>>> you have firm a priori knowledge of the placement of ESTs. However,
>>>>>>> while trying it I note that est_forward implies that the est2genome
>>>>>>> predictor is turned on, implicitly. Is this necessary for this to work?
>>>>>>> I'm after the behavior you describe below where exonerate is made to try
>>>>>>> really hard within a limited region to align an est, but I would not
>>>>>>> like maker to produce est2genome predictions.
>>>>>>>
>>>>>>> In general, I think this maker_coor and est_forward is a feature set
>>>>>>> that is worthy to be promoted into a documented feature.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikael

From sjackman at gmail.com Wed May 14 18:22:37 2014
From: sjackman at gmail.com (Shaun Jackman)
Date: Wed, 14 May 2014 17:22:37 -0700
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

I'm using MAKER 2.31.4.

http://sjackman.ca

On 14 May 2014 17:19, Carson Holt wrote:
> That should be fixed in the current download? It came up on the mailing
> list a couple of weeks ago. I'll check.
>
> --Carson

From torsten.seemann at monash.edu Wed May 14 17:33:55 2014
From: torsten.seemann at monash.edu (Torsten Seemann)
Date: Thu, 15 May 2014 09:33:55 +1000
Subject: [maker-devel] Mapping gene names
In-Reply-To:
References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se>
Message-ID:

Carson & Shaun

> It doesn't appear to support eukaryotes though.
>
> Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap
> --help output seems to be out of date.
>
> Barrnap predicts the location of ribosomal RNA genes in genomes. It
> supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria
> (12S,16S) and eukaryotes (5S,5.8S,28S,18S).

It does support eukaryota and mitochondria, I just forgot to push the
documentation changes. This has been resolved now in the 0.4.2 release.

  --kingdom [X]     Kingdom: euk arc bac mito (default 'bac')

Next release 0.5 will have an 'accurate' mode which will fine tune the
predictions using cmalign glocal alignment.

Thanks for your interest!
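For anyone passing Barrnap output to MAKER through other_gff, as discussed earlier in this thread, it can be handy to sanity-check the GFF3 rows first. The sketch below is only a structural check based on the GFF3 layout (nine tab-separated columns, attributes as semicolon-separated key=value pairs); it is not MAKER's or Barrnap's own parser, it says nothing about the cause of the ARRAY(...) output reported above, and the function name is made up. The sample row is one of the rows quoted in the thread, with tabs assumed between columns.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict.

    Returns None for blank lines and for comment or directive lines
    (anything starting with '#'). Raises ValueError if the line does
    not have exactly nine tab-separated columns.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = unquote(value)  # GFF3 values may be URL-escaped
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


row = "200408_86\tbarrnap:0.4\trRNA\t2171785\t2173036\t.\t+\t.\tName=12S_rRNA;product=12S ribosomal RNA"
rec = parse_gff3_line(row)
print(rec["type"], rec["attributes"]["Name"], rec["end"] - rec["start"] + 1)
```

The quoted rows carry Name and product attributes but no ID attribute, which this check makes easy to see via rec["attributes"].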
-- *--Dr Torsten Seemann--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA* *--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA --http://www.bioinformatics.net.au/ * -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Wed May 14 21:23:03 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 15 May 2014 03:23:03 +0000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> Message-ID: <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu> \o/ (now I can get rid of rnammer forever!) chris On May 14, 2014, at 6:33 PM, Torsten Seemann > wrote: Carson & Shaun It doesn?t appear to support eukaryotes though. Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap --help output seems to be out of date. Barrnap predicts the location of ribosomal RNA genes in genomes. It supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria (12S,16S) and eukaryotes (5S,5.8S,28S,18S). It does support eukaryota and mitochondria, I just forgot to push the documentation changes. This has been resolved now in the 0.4.2 release. --kingdom [X] Kingdom: euk arc bac mito (default 'bac') Next release 0.5 will have an 'accurate' mode which will fine tune the predictions using cmalign glocal alignment. Thanks for your interest! -- --Dr Torsten Seemann --Victorian Bioinformatics Consortium, Monash University, AUSTRALIA --Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA --http://www.bioinformatics.net.au/ _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sajeet at gmail.com Thu May 15 11:36:00 2014 From: sajeet at gmail.com (Sajeet Haridas) Date: Thu, 15 May 2014 10:36:00 -0700 Subject: [maker-devel] Mapping gene names In-Reply-To: <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu> References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu> Message-ID: My brief test of barrnap suggests that it does not perform well on rRNA genes with introns such as those found in fungal mitochondria. Setting a lower threshold for --reject and --evalue helps, but is not enough. Looks like I cannot abandon rnammer for now. FYI - if you want to test barrnap with fungal mitochondria, use --kingdom bacteria because they have 23S and 16S unlike the human mitochondria. Sajeet On Wed, May 14, 2014 at 8:23 PM, Fields, Christopher J < cjfields at illinois.edu> wrote: > \o/ > > (now I can get rid of rnammer forever!) > > chris > > On May 14, 2014, at 6:33 PM, Torsten Seemann > wrote: > > Carson & Shaun > >> It doesn?t appear to support eukaryotes though. >> >> Barrnap supports bacteria, archaea, mitochondria and eukaryotes. The barrnap >> --help output seems to be out of date. >> >> Barrnap predicts the location of ribosomal RNA genes in genomes. It >> supports bacteria (5S,23S,16S), archaea (5S,5.8S,23S,16S), mitochondria >> (12S,16S) and eukaryotes (5S,5.8S,28S,18S). >> >> It does support eukaryota and mitochondria, I just forgot to push the > documentation changes. This has been resolved now in the 0.4.2 release. > > --kingdom [X] Kingdom: euk arc bac mito (default 'bac') > > Next release 0.5 will have an 'accurate' mode which will fine tune the > predictions using cmalign glocal alignment. > > Thanks for your interest! 
> > -- > > *--Dr Torsten Seemann --Victorian Bioinformatics Consortium, Monash > University, AUSTRALIA* > > *--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA > --http://www.bioinformatics.net.au/ * > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ranjani at uga.edu Thu May 15 13:00:47 2014 From: ranjani at uga.edu (Sivaranjani Namasivayam) Date: Thu, 15 May 2014 19:00:47 +0000 Subject: [maker-devel] FW: protein2genome gene models In-Reply-To: References: Message-ID: <1400180446764.46375@uga.edu> Hi Carson, I upgraded to MAKER version 2.31.3 (from MAKER 2.10). I want to predict gene models directly from proteins. I provided proteins from a related organism as input and set protein2genome to 1. However, I do not get any gene models predicted. I also tried this by using a transcriptome data set in addition to the protein dataset and set est2genome and protein2genome to 1. I get gene models from the transcripts but not from the proteins. When I look at the alignment of the proteins on the genome, they seem to be aligning rather well and I would expect to see a gene model predicted. Would you know why this might be? Also, the number of gene models predicted (directly from the transcriptome) in this version is lower than in the previous version I was using (MAKER 2.10). I did notice this version is not predicting overlapping gene models, but that is not the rule. 
Thanks, Ranjani ________________________________ From: maker-devel on behalf of Carson Holt Sent: Wednesday, April 30, 2014 10:55 AM To: Carson Holt; maker-devel at yandell-lab.org Subject: Re: [maker-devel] FW: protein2genome gene models Make sure you're using the current version of MAKER. It works on eukaryotes as well. --Carson From: Carson Holt > Date: Wednesday, April 30, 2014 at 8:53 AM To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] FW: protein2genome gene models From: Sivaranjani Namasivayam > Date: Wednesday, April 30, 2014 at 8:45 AM To: "maker-devel-bounces at yandell-lab.org" > Subject: protein2genome gene models Hi, I want to examine the gene models predicted diectly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome =1 , but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns? Thanks, Ranjani _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From torsten.seemann at monash.edu Thu May 15 16:42:53 2014 From: torsten.seemann at monash.edu (Torsten Seemann) Date: Fri, 16 May 2014 08:42:53 +1000 Subject: [maker-devel] Mapping gene names In-Reply-To: References: <7BD03296-3496-455B-9DB9-F5C58583FA7D@gmail.com> <104F9C77-CA30-4CB4-A5C4-F43EF30FD6DE@slu.se> <4FD78A68-DDBC-4325-BCE7-E803187BDA94@illinois.edu> Message-ID: Sajeet, Brief test of barrnap suggests that it does not perform well on rRNA genes > with introns such as those found in fungal mitochondria. Setting a lower > threshold for --reject and --evalue helps, but is not enough. > Looks like I cannot abandon rnammer for now. > FYI - if you want to test barrnap with fungal mitochondria, use --kingdom > bacteria because they have 23S and 16S unlike the human mitochondria. 
> This is good feedback. Paul Gardner also mentioned the intron issue. A "fungi" kingdom is clearly needed. I am not a mycologist, so coming up with a detailed rRNA architecture for eukaryotic phyla etc. is something I have started but need assistance with. Adjustment of nhmmer alignment parameters could be done to improve the intronic rRNAs too. Here is what I have so far in terms of models: https://github.com/Victorian-Bioinformatics-Consortium/barrnap/blob/master/README.md#data-sources-for-hmm-models - Do I need to split euk into protist / plant / animal / fungi? - Should the current 'mito' be placed inside the current 'euk', as mito data is likely to end up in assemblies, but be kept separate for mito-only data? - Plastids, chloroplasts, apicoplasts; I am not sure of the subtleties of these organelles' rRNA but am willing to learn. Thank you again for testing. Any help appreciated, -- *--Dr Torsten Seemann--Victorian Bioinformatics Consortium, Monash University, AUSTRALIA* *--Life Sciences Computation Centre, VLSCI, Parkville, AUSTRALIA --http://www.bioinformatics.net.au/ * -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjackman at gmail.com Fri May 16 11:16:27 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Fri, 16 May 2014 10:16:27 -0700 Subject: [maker-devel] Specify multiple files to rmlib Message-ID: Hi, Carson. Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it? Thanks! Shaun P.S. Any update on the fix to other_gff? http://sjackman.ca -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 16 14:33:15 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 16 May 2014 14:33:15 -0600 Subject: [maker-devel] Specify multiple files to rmlib In-Reply-To: References: Message-ID: It could be done. I've made some changes to the subversion repository if you want to test it. 
You should also be able to use labels just as you can with other comma separated lists in MAKER using ':' to separate the label. Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label I've also found the other_gff issue. It was fixed in the subversion repository but not in the release package I made the other day, so I've updated the release to 2.31.5. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Friday, May 16, 2014 at 11:16 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Specify multiple files to rmlib Hi, Carson. Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it? Thanks! Shaun P.S. Any update on the fix to other_gff? http://sjackman.ca _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 16 14:42:50 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 16 May 2014 14:42:50 -0600 Subject: [maker-devel] FW: protein2genome gene models In-Reply-To: <1400180446764.46375@uga.edu> References: <1400180446764.46375@uga.edu> Message-ID: Upgrade to 2.31.5. Changes since 2.31.3 *a protein2genome issue that was introduced in 2.31.3 was fixed *fasta_merge failing with trnascan results issue was fixed *other_gff input resulting in ARRAY reference being printed was fixed. *naming of tRNA genes was improved to include amino acid identity --Carson From: Sivaranjani Namasivayam Date: Thursday, May 15, 2014 at 1:00 PM To: Carson Holt , Carson Holt , "maker-devel at yandell-lab.org" Subject: RE: [maker-devel] FW: protein2genome gene models Hi Carson, I upgraded to the MAKER version 2.31.3 (from MAKER 2.10). I want to predict gene models directly from proteins. I provided proteins from a related organism as input and set protein2genome to 1. 
However I do not get any gene models predicted. I also tried this by using a transcriptome data set in addition to the protein dataset and set est2genome and protein2genome to 1. I get gene models from the transcripts but not proteins. When I look at the alignment of the proteins on the genome, they seem to be aligning rather well and I would expect to see a gene model predicted. Would you know why this might be? Also the number of gene models predicted (directly from the transriptome)in this version is lower than the previous version I was using (MAKER 2.10). I did notice this version is not predicting overlapping gene models, but that is not rule. Thanks, Ranjani From: maker-devel on behalf of Carson Holt Sent: Wednesday, April 30, 2014 10:55 AM To: Carson Holt; maker-devel at yandell-lab.org Subject: Re: [maker-devel] FW: protein2genome gene models Make sure you're using the current version of MAKER. It works on eukaryotes as well. --Carson From: Carson Holt Date: Wednesday, April 30, 2014 at 8:53 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] FW: protein2genome gene models From: Sivaranjani Namasivayam Date: Wednesday, April 30, 2014 at 8:45 AM To: "maker-devel-bounces at yandell-lab.org" Subject: protein2genome gene models Hi, I want to examine the gene models predicted diectly from protein data for my genome. MAKER has an option for this in the maker_opts.ctl file: protein2genome =1 , but it says for prokaryotes only. Will this not work for eukaryotes? Is it because of introns? Thanks, Ranjani _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sjackman at gmail.com Fri May 16 14:45:59 2014 From: sjackman at gmail.com (Shaun Jackman) Date: Fri, 16 May 2014 13:45:59 -0700 Subject: [maker-devel] Specify multiple files to rmlib In-Reply-To: References: Message-ID: Excellent. Thanks, Carson. Is the rmlib feature included in 2.31.5? What is the purpose of the label? Does it affect the GFF file output by MAKER? -- http://sjackman.ca On 2014-May-16 at 13:33:23 , Carson Holt (carsonhh at gmail.com) wrote: It could be done. I've made some changes to the subversion repository if you want to test it. You should also be able to use labels just as you can with other comma separated lists in MAKER using ':' to separate the label. Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label I've also found the other_gff issue. It was fixed in the subversion repository but not in the release package I made the other day, so I've updated the release to 2.31.5. --Carson From: Shaun Jackman Reply-To: Shaun Jackman Date: Friday, May 16, 2014 at 11:16 AM To: "maker-devel at yandell-lab.org" Subject: [maker-devel] Specify multiple files to rmlib Hi, Carson. Some options of maker accept multiple files as a comma separated list, but rmlib does not. Could it? Thanks! Shaun P.S. Any update on the fix to other_gff? http://sjackman.ca _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Fri May 16 15:02:59 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 16 May 2014 15:02:59 -0600 Subject: [maker-devel] Specify multiple files to rmlib In-Reply-To: References: Message-ID: No. There are some implementation issues related to how repeats are processed and collapsed that may cause hidden bugs with the comma separated list, so it needs some more testing. 
The label is added to the output GFF3. For example protein=uniprot.fasta:uniprot, would cause the gff3 label to be protein2genome:uniprot rather than just protein2genome. Programs like GBrowse know how to use the labels to generate on/off check boxes to turn just some of your protein results on/off in a viewer rather than all of them. --Carson From: Shaun Jackman Date: Friday, May 16, 2014 at 2:45 PM To: Carson Holt , "maker-devel at yandell-lab.org" Subject: Re: [maker-devel] Specify multiple files to rmlib Excellent. Thanks, Carson. Is the rmlib feature included in 2.31.5? What is the purpose of the label? Does it affect the GFF file output by MAKER? -- http://sjackman.ca On 2014-May-16 at 13:33:23 , Carson Holt (carsonhh at gmail.com) wrote: > It could be done. I've made some changes to the subversion repository if you > want to test it. You should also be able to use labels just as you can with > other comma separated lists in MAKER using ':' to separate the label. > > Example --> rmlib=repeats.fasta:some_label,repeats2.fasta:another_label > > I've also found the other_gff issue. It was fixed in the subversion > repository but not in the release package I made the other day, so I've > updated the release to 2.31.5. > > --Carson > > > > > From: Shaun Jackman > Reply-To: Shaun Jackman > Date: Friday, May 16, 2014 at 11:16 AM > To: "maker-devel at yandell-lab.org" > Subject: [maker-devel] Specify multiple files to rmlib > > Hi, Carson. Some options of maker accept multiple files as a comma separated > list, but rmlib does not. Could it? > > Thanks! > Shaun > > P.S. Any update on the fix to other_gff? > > http://sjackman.ca > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... 
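The `file:label` syntax described in the thread above (for example `rmlib=repeats.fasta:some_label,repeats2.fasta:another_label`, or `protein=uniprot.fasta:uniprot` yielding `protein2genome:uniprot` in the GFF3) can be illustrated with a minimal sketch. This is not MAKER's own parser; the function names `parse_labeled_list` and `gff3_label` are hypothetical, and splitting on the first ':' is an assumption that would not hold for file paths that themselves contain a colon.

```python
def parse_labeled_list(value):
    """Split 'a.fasta:label1,b.fasta:label2' into (path, label) pairs.

    Assumption: the first ':' in each entry separates the file from its
    optional label; entries without ':' get a label of None.
    """
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        path, sep, label = item.partition(":")
        entries.append((path, label if sep else None))
    return entries


def gff3_label(base, label):
    """Append the user label to the evidence type, e.g. protein2genome:uniprot."""
    return f"{base}:{label}" if label else base


if __name__ == "__main__":
    for path, label in parse_labeled_list(
        "repeats.fasta:some_label,repeats2.fasta:another_label"
    ):
        print(path, label)
    print(gff3_label("protein2genome", "uniprot"))
```

A viewer such as GBrowse could then group features by the suffix after the colon, which is the on/off behaviour Carson mentions.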
URL: From cjfields at illinois.edu Tue May 20 13:17:14 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 20 May 2014 19:17:14 +0000 Subject: [maker-devel] tRNAscan and map_gff_ids Message-ID: <520E7E32-B4E2-486F-B730-F15683679440@illinois.edu> I found a problem with some tRNAscan output using MAKER 2.31.5. I had a full MAKER data set (run initially using MAKER 2.31.5) that I mapped IDs for. This was then run as follows, with the requisite error: -system-specific-4.1$ map_gff_ids id.map Zalbi.all.gff3 Nested quantifiers in regex; marked by <-- HERE in m/trnascan-KB913038.1-noncoding-Undet_??? <-- HERE -gene-79.0/ at /home/groups/hpcbio/apps/maker/maker-2.31.5/bin/map_gff_ids line 111, <$IN> line 3067590. The problematic lines: ---------------------------------------------- -system-specific-4.1$ grep "???" Zalbi.all.gff3 KB913038.1 maker gene 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0 KB913038.1 maker tRNA 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1;Parent=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|70|0 KB913038.1 maker exon 23847890 23847958 . - . ID=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1:exon:2193;Parent=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA-1 KB913039.1 maker gene 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0 KB913039.1 maker tRNA 21710152 21710224 . - . ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1;Parent=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|74|0 KB913039.1 maker exon 21710152 21710224 . - . 
ID=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1:exon:4036;Parent=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA-1 ---------------------------------------------- I managed to get it going by using the following modifications (regex quotemeta) in map_gff_ids (lines 107-112):

    for my $id (@map_ids) {
        # Only if the value (or the portion preceding
        # the first colon) is equal to the map key.
        next unless ($value eq $id || $value =~ /^\Q$id\E:/);
        $value =~ s/\Q$id\E/$map{$id}/ unless($tag eq 'Name' && $id !~ /\-gene\-\d+\.\d+|^CG\:|^....\:|^[^\:]+\:temp\d+\:/);
    }

I'm guessing there may be a similar problem with map_fasta_ids? chris From carsonhh at gmail.com Tue May 20 13:43:48 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 May 2014 13:43:48 -0600 Subject: [maker-devel] tRNAscan and map_gff_ids Message-ID: Thanks. trnascan support is new enough that there are these kinds of issues that we need to find and fix. MAKER tries to use the codon name supplied by trnascan, and it looks like the codon is 'Undet_???'. I don't know why that is. We currently don't do any filtering of trnascan results (i.e. we keep everything). This might be something that we really just want to be filtering out since it doesn't have a determinable codon? At the very least I should change the codon to NNN instead of ??? to correspond to the standard ambiguity nucleotides used in FASTA format. --Carson On 5/20/14, 1:17 PM, "Fields, Christopher J" wrote: >I found a problem with some tRNAscan output using MAKER 2.31.5. I had a >full MAKER data set (run initially using MAKER 2.31.5) that I mapped IDs >for. This was then run as follows, with the requisite error: > >-system-specific-4.1$ map_gff_ids id.map Zalbi.all.gff3 >Nested quantifiers in regex; marked by <-- HERE in >m/trnascan-KB913038.1-noncoding-Undet_??? <-- HERE -gene-79.0/ at >/home/groups/hpcbio/apps/maker/maker-2.31.5/bin/map_gff_ids line 111, ><$IN> line 3067590. 
> >The problematic lines: > >---------------------------------------------- >-system-specific-4.1$ grep "???" Zalbi.all.gff3 >KB913038.1 maker gene 23847890 23847958 . - . ID=trnascan-KB913038.1-nonco >ding-Undet_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene >-79.0 >KB913038.1 maker tRNA 23847890 23847958 . - . ID=trnascan-KB913038.1-nonco >ding-Undet_???-gene-79.0-tRNA-1;Parent=trnascan-KB913038.1-noncoding-Undet >_???-gene-79.0;Name=trnascan-KB913038.1-noncoding-Undet_???-gene-79.0-tRNA >-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|70|0 >KB913038.1 maker exon 23847890 23847958 . - . ID=trnascan-KB913038.1-nonco >ding-Undet_???-gene-79.0-tRNA-1:exon:2193;Parent=trnascan-KB913038.1-nonco >ding-Undet_???-gene-79.0-tRNA-1 >KB913039.1 maker gene 21710152 21710224 . - . ID=trnascan-KB913039.1-nonco >ding-Undet_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene >-72.0 >KB913039.1 maker tRNA 21710152 21710224 . - . ID=trnascan-KB913039.1-nonco >ding-Undet_???-gene-72.0-tRNA-1;Parent=trnascan-KB913039.1-noncoding-Undet >_???-gene-72.0;Name=trnascan-KB913039.1-noncoding-Undet_???-gene-72.0-tRNA >-1;_AED=1.00;_eAED=1.00;_QI=0|-1|0|0|-1|0|1|74|0 >KB913039.1 maker exon 21710152 21710224 . - . ID=trnascan-KB913039.1-nonco >ding-Undet_???-gene-72.0-tRNA-1:exon:4036;Parent=trnascan-KB913039.1-nonco >ding-Undet_???-gene-72.0-tRNA-1 >---------------------------------------------- > >I managed to get it going by using the following modifications (regex >quotemeta) in map_gff_ids (lines 107-112): > > for my $id (@map_ids) { > # Only if the value (or the portion preceding > # the first colon) is equal to the map key. > next unless ($value eq $id || $value =~ /^\Q$id\E:/); > $value =~ s/\Q$id\E/$map{$id}/ unless($tag eq 'Name' && $id !~ >/\-gene\-\d+\.\d+|^CG\:|^....\:|^[^\:]+\:temp\d+\:/); > } > >I?m guessing there may be a similar problem with map_fasta_ids? 
> >chris >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From caigh02 at gmail.com Mon May 19 21:43:18 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Mon, 19 May 2014 23:43:18 -0400 Subject: [maker-devel] Maker exon number Message-ID: Hi Carson, I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold start with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks. Guohong scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1 scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0 scaffold1 maker mRNA 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 106 1684 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 2605 2985 . 
+ 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892 scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 40259 40666 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 40073 40154 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 40745 41604 . 
+ 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Tue May 20 14:34:20 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Tue, 20 May 2014 20:34:20 +0000 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: Hi Guohong, What version of MAKER are you running? Thanks, Daniel On May 19, 2014, at 9:43 PM, Guohong Cai wrote: > Hi Carson, > > I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold start with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks. > > Guohong > > > scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1 > scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0 > scaffold1 maker mRNA 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 > scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 106 1684 . 
+ 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker CDS 2605 2985 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 > scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 > scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892 > scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40259 40666 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40073 40154 . 
+ 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > scaffold1 maker CDS 40745 41604 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 > > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue May 20 14:50:44 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 May 2014 14:50:44 -0600 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: I can do that. Just a note of caution though. The ID= attribute is not protected (it's just an identifier to relate things to one another for correct parentage). Downstream scripts that use or manipulate GFF3 files can change it (so relying on it to always be the same or even be informative is not guaranteed). --Carson From: Guohong Cai Date: Monday, May 19, 2014 at 9:43 PM To: Subject: [maker-devel] Maker exon number Hi Carson, I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold start with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks. Guohong scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1 scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-g ene-0.0 scaffold1 maker mRNA 106 2985 . + . 
ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840 scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 106 1684 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker CDS 2605 2985 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1 scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254 scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892 scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 40259 40666 . + . 
ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 40073 40154 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 scaffold1 maker CDS 40745 41604 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Tue May 20 18:52:34 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 20 May 2014 18:52:34 -0600 Subject: [maker-devel] Maker exon number In-Reply-To: References: Message-ID: I've gone ahead and made the change in the development version. It will probably be convenient in most cases, but it's important to note one caveat. Exon features are shared in GFF3 format. So if there are multiple isoforms that contain the same exon, there will only be a single exon line in the GFF3, but it will list several transcript IDs in its Parent= attribute. What does that have to do with the ID= attribute or exon order? 
Well it means that ID=exon:2 in the first transcript may be the second exon, but in another transcript ID=exon:2 may be the first exon or third exon, etc. This is because there is only a single line for a given exon and it gets shared by all the transcripts. So it will always have the same ID= tag, but will hold a different position in different isoforms (so its ordinal value will not go along with the ID in those cases). But since most gene calls from MAKER will have only one isoform (default), it could still be convenient in those cases.

Thanks,
Carson

From: Guohong Cai
Date: Monday, May 19, 2014 at 9:43 PM
To:
Subject: [maker-devel] Maker exon number

Hi Carson,

I am using MAKER to annotate a few small genomes. When looking through the gff file, I notice that the exon numbers do not start from 0 or 1 for each gene. Only the first gene in a scaffold starts with exon 0. If the first gene has 3 exons (0-2), then the second gene will start from exon 3 (an example is shown below). It seems many people would prefer that in each gene, the first exon be exon 1. Is it possible to make such a change? Thanks.

Guohong

scaffold1 . contig 1 347483 . . . ID=scaffold1;Name=scaffold1
scaffold1 maker gene 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0
scaffold1 maker mRNA 106 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1;Parent=genemark-scaffold1-processed-gene-0.0;Name=genemark-scaffold1-processed-gene-0.0-mRNA-1;_AED=0.12;_eAED=0.13;_QI=0|0|0|1|1|1|3|0|840
scaffold1 maker exon 106 1684 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:0;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker exon 1878 2440 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:1;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker exon 2605 2985 . + . ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:exon:2;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 106 1684 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 1878 2440 . + 2 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker CDS 2605 2985 . + 0 ID=genemark-scaffold1-processed-gene-0.0-mRNA-1:cds;Parent=genemark-scaffold1-processed-gene-0.0-mRNA-1
scaffold1 maker gene 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254
scaffold1 maker mRNA 38466 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1;Parent=maker-scaffold1-snap-gene-0.254;Name=maker-scaffold1-snap-gene-0.254-mRNA-1;_AED=0.03;_eAED=0.03;_QI=0|0|0|0.83|1|1|6|0|892
scaffold1 maker exon 38466 38511 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:3;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 38616 38742 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:4;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 38831 39986 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:5;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40073 40154 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:6;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40259 40666 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:7;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker exon 40745 41604 . + . ID=maker-scaffold1-snap-gene-0.254-mRNA-1:exon:8;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38466 38511 . + 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38616 38742 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 38831 39986 . + 1 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40073 40154 .
+ 0 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40259 40666 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1
scaffold1 maker CDS 40745 41604 . + 2 ID=maker-scaffold1-snap-gene-0.254-mRNA-1:cds;Parent=maker-scaffold1-snap-gene-0.254-mRNA-1

From caigh02 at gmail.com Wed May 21 07:14:40 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Wed, 21 May 2014 08:14:40 -0500
Subject: [maker-devel] Maker exon number
In-Reply-To: References: Message-ID:

Hi Daniel,

I am using maker-2.31.5.

---Guohong

On Tue, May 20, 2014 at 3:34 PM, Daniel Ence wrote:
> Hi Guohong,
>
> What version of MAKER are you running?
>
> Thanks,
> Daniel

From caigh02 at gmail.com Wed May 21 08:40:47 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Wed, 21 May 2014 09:40:47 -0500
Subject: [maker-devel] Maker exon number
In-Reply-To: References: Message-ID:

Thanks a lot.

---Guohong

From caigh02 at gmail.com Wed May 21 21:16:52 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Wed, 21 May 2014 23:16:52 -0400
Subject: [maker-devel] Maker exon number
In-Reply-To: References: Message-ID:

Hi Carson,

is the development version available for download? Only maker2.31.5 is available on Yandell Lab website.

---Guohong

From fbarreto at ucsd.edu Thu May 22 23:13:37 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Thu, 22 May 2014 22:13:37 -0700
Subject: [maker-devel] Alternative splicing options
Message-ID:

Hi, all,

I just finished a fourth and final iterative round with Maker, training predictors in between, and I am very happy with the results. What I would like to try now is to annotate alternative splicing variants, and I know the ctrl file has the alt_splice option.
However, I am intrigued by the lack of information regarding this option. I could not find many discussions in this group, and most genome publications using Maker are unclear about whether they annotated alternative transcripts, so my guess is they didn't.
So I was wondering whether there is a reason for that. Is that function not well developed in Maker? Should I stay away from it?
Assuming it is OK to give it a try (provided I don't get discouraged here), what is the best approach to take, considering I already obtained what I consider a solid set of gene models after four rounds of annotation? Should I start over by turning on alt_splice, and training gene predictors from those outputs? Or would it be appropriate to simply repeat my latest round, changing only alt_splice=1?

Thanks for any help. I can see the light at the end of the tunnel!

Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From dence at genetics.utah.edu Fri May 23 08:55:50 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Fri, 23 May 2014 14:55:50 +0000
Subject: [maker-devel] Alternative splicing options
In-Reply-To: References: Message-ID:

Hi Felipe,

The alternative splice option is a fully developed and functional option in MAKER. What it does is tell MAKER to consider gene models with mutually exclusive evidence. For example, if there are two models at a locus and evidence that supports one exon in one model and a different exon in another model, both those models might make it into the final geneset.

From the workflow you described, I think you'd have to redo only the fourth and final round of MAKER annotation. As a general principle for trying out new options on your annotations, I'd recommend choosing a big scaffold, running it with alt_splice=1, and seeing how you like the results.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Fri May 23 09:07:26 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Fri, 23 May 2014 09:07:26 -0600
Subject: [maker-devel] Alternative splicing options
In-Reply-To: References: Message-ID:

I'd like to add that alternate splice forms will be generated off of the mutually exclusive EST evidence, so how well it performs, as well as whether or not it can even generate other splice forms, will depend entirely on the quality of your EST evidence.
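One way to check whether a test run with alt_splice=1 produced any additional splice forms is to count mRNA features per gene in the resulting GFF3. The sketch below is illustrative Python and not part of MAKER; the sample rows and gene names are hypothetical, and it assumes each mRNA names its gene in the Parent= attribute, as in the GFF3 excerpts earlier in this thread.

```python
from collections import Counter


def isoforms_per_gene(gff3_lines):
    """Count mRNA features per parent gene in an iterable of GFF3 lines."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # GFF3 files may carry sequence after this directive
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        for gene_id in attrs.get("Parent", "").split(","):
            if gene_id:
                counts[gene_id] += 1
    return counts


# Hypothetical rows: geneA has two isoforms, geneB has one.
SAMPLE = [
    "scaffold1\tmaker\tgene\t100\t900\t.\t+\t.\tID=geneA;Name=geneA\n",
    "scaffold1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=geneA-mRNA-1;Parent=geneA\n",
    "scaffold1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=geneA-mRNA-2;Parent=geneA\n",
    "scaffold1\tmaker\tgene\t2000\t2500\t.\t-\t.\tID=geneB;Name=geneB\n",
    "scaffold1\tmaker\tmRNA\t2000\t2500\t.\t-\t.\tID=geneB-mRNA-1;Parent=geneB\n",
]

COUNTS = isoforms_per_gene(SAMPLE)
MULTI = sorted(gene for gene, n in COUNTS.items() if n > 1)
print(dict(COUNTS))
print("genes with more than one isoform:", MULTI)
```

To apply it to a real file, pass an open file handle instead of the sample list, for example the merged GFF3 from the single test scaffold.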
--Carson From: Daniel Ence Date: Friday, May 23, 2014 at 8:55 AM To: Felipe Barreto Cc: MAKER group Subject: Re: [maker-devel] Alternative splicing options Hi Felipe, The alternative splice option is full-developed and functional option in MAKER. What it does is tell MAKER to consider gene models with mutually exclusive evidence. For example, if there are two models at a locus and evidence that supports one exon in one model and a different exon in another model, both those models might make it into the final geneset. >From the workflow you described, I think you'd have to redo only the fourth and final round of MAKER annotation. As a general principle for trying out new options on your annotations, I'd recommend choosing a big scaffold, running it with alt_splice=1, and seeing how you like the results. ~Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 22, 2014, at 10:13 PM, Felipe Barreto wrote: > Hi, all, > > I just finished a fourth and final iterative round with Maker, training > predictors in between, and I am very happy with the results. What I would > like to try now is to annotate alternative splicing variants, and I know the > ctrl file has the alt_splice option. > However, I am intrigued by the lack of information regarding this option. I > could not find many discussions in this group, and most genome publications > using Maker are unclear about whether they annotated alternative transcrips, > so my guess is they didn't. > So I was wondering whether there is a reason for that. Is that function not > well developed in Maker? Should I stay away from it? > > Assuming it is OK to give it a try (provided I don't get discouraged here), > what is the best approach to take, considering I already obtained what I > considered is a solid set of gene models after four rounds of annotation? 
> Should I start over by turning on alt_splice, and training gene predictors > from those outputs? Or would it be appropriate to simply repeat my latest > round, changing only alt_splice=1? > > > Thanks for any help. I can see the light at the end of the tunnel! > > Felipe > > -- > Felipe Barreto > Post-doctoral Scholar > Scripps Institution of Oceanography > University of California, San Diego > La Jolla, CA 92093 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From fbarreto at ucsd.edu Fri May 23 09:56:27 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Fri, 23 May 2014 08:56:27 -0700 Subject: [maker-devel] Alternative splicing options In-Reply-To: References: Message-ID: Hey guys, Great to hear!! I will be anxious to try it out. Thanks for your prompt help! Cheers, Felipe On Fri, May 23, 2014 at 8:07 AM, Carson Holt wrote: > I'd like to add that alternate splice forms will be generated off of the > mutually exclusive EST evidence, so how well it performs as well as whether > or not it can even generates other splice forms will depend entirely on the > quality of your EST evidence. > > --Carson > > > From: Daniel Ence > Date: Friday, May 23, 2014 at 8:55 AM > To: Felipe Barreto > Cc: MAKER group > Subject: Re: [maker-devel] Alternative splicing options > > Hi Felipe, > > The alternative splice option is full-developed and functional option in > MAKER. What it does is tell MAKER to consider gene models with mutually > exclusive evidence. 
For example, if there are two models at a locus and > evidence that supports one exon in one model and a different exon in > another model, both those models might make it into the final geneset. > > From the workflow you described, I think you'd have to redo only the > fourth and final round of MAKER annotation. As a general principle for > trying out new options on your annotations, I'd recommend choosing a big > scaffold, running it with alt_splice=1, and seeing how you like the > results. > > ~Daniel > > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 22, 2014, at 10:13 PM, Felipe Barreto > wrote: > > Hi, all, > > I just finished a fourth and final iterative round with Maker, training > predictors in between, and I am very happy with the results. What I would > like to try now is to annotate alternative splicing variants, and I know > the ctrl file has the alt_splice option. > However, I am intrigued by the lack of information regarding this option. > I could not find many discussions in this group, and most genome > publications using Maker are unclear about whether they annotated > alternative transcrips, so my guess is they didn't. > So I was wondering whether there is a reason for that. Is that function > not well developed in Maker? Should I stay away from it? > > Assuming it is OK to give it a try (provided I don't get discouraged > here), what is the best approach to take, considering I already obtained > what I considered is a solid set of gene models after four rounds of > annotation? Should I start over by turning on alt_splice, and training > gene predictors from those outputs? Or would it be appropriate to simply > repeat my latest round, changing only alt_splice=1? > > > Thanks for any help. I can see the light at the end of the tunnel! 
> > Felipe > > -- > Felipe Barreto > Post-doctoral Scholar > Scripps Institution of Oceanography > University of California, San Diego > La Jolla, CA 92093 > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjfields at illinois.edu Fri May 23 10:21:38 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 23 May 2014 16:21:38 +0000 Subject: [maker-devel] Alternative splicing options In-Reply-To: References: Message-ID: <14271D2B-4D83-47C9-8661-682599E94E8F@illinois.edu> That is exactly what I have seen using this option; genes with very good transcriptome evidence (as one might expect)tend to have more isoforms. The problem we run into is not having a diverse enough transcriptome set to work with (ours tend to be tissue-specific unfortunately), so we have some genes giving more isoforms than others, but we don?t design the libraries so have no control over it. We are currently only using Trinity assemblies as input over using TopHat2/Cufflinks. chris On May 23, 2014, at 10:07 AM, Carson Holt > wrote: I'd like to add that alternate splice forms will be generated off of the mutually exclusive EST evidence, so how well it performs as well as whether or not it can even generates other splice forms will depend entirely on the quality of your EST evidence. 
--Carson From: Daniel Ence Date: Friday, May 23, 2014 at 8:55 AM To: Felipe Barreto Cc: MAKER group Subject: Re: [maker-devel] Alternative splicing options Hi Felipe, The alternative splice option is a fully developed and functional option in MAKER. What it does is tell MAKER to consider gene models with mutually exclusive evidence. For example, if there are two models at a locus and evidence that supports one exon in one model and a different exon in another model, both those models might make it into the final geneset. From the workflow you described, I think you'd have to redo only the fourth and final round of MAKER annotation. As a general principle for trying out new options on your annotations, I'd recommend choosing a big scaffold, running it with alt_splice=1, and seeing how you like the results. ~Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 22, 2014, at 10:13 PM, Felipe Barreto wrote: Hi, all, I just finished a fourth and final iterative round with Maker, training predictors in between, and I am very happy with the results. What I would like to try now is to annotate alternative splicing variants, and I know the ctrl file has the alt_splice option. However, I am intrigued by the lack of information regarding this option. I could not find many discussions in this group, and most genome publications using Maker are unclear about whether they annotated alternative transcripts, so my guess is they didn't. So I was wondering whether there is a reason for that. Is that function not well developed in Maker? Should I stay away from it? Assuming it is OK to give it a try (provided I don't get discouraged here), what is the best approach to take, considering I already obtained what I consider a solid set of gene models after four rounds of annotation?
Should I start over by turning on alt_splice, and training gene predictors from those outputs? Or would it be appropriate to simply repeat my latest round, changing only alt_splice=1? Thanks for any help. I can see the light at the end of the tunnel! Felipe -- Felipe Barreto Post-doctoral Scholar Scripps Institution of Oceanography University of California, San Diego La Jolla, CA 92093 _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From fbarreto at ucsd.edu Fri May 23 14:31:36 2014 From: fbarreto at ucsd.edu (Felipe Barreto) Date: Fri, 23 May 2014 13:31:36 -0700 Subject: [maker-devel] gff3_merge on models only for SNAP training? Message-ID: Hi, all, I should have confirmed this well before starting my Maker runs, but better now than never. When generating a merged gff file to be used for SNAP training, is it OK to use the default gff output from gff3_merge, which contains all protein/EST evidence alignments (this is what I did)? Or should I have generated a gene models-only merged gff (using the -g flag) for training? I assume the Maker flag within the larger gff file will allow the subsequent scripts (e.g. maker2zff) to ignore the other alignments, but just wanted to check. Thanks again! Felipe -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carsonhh at gmail.com Fri May 23 14:33:17 2014 From: carsonhh at gmail.com (Carson Holt) Date: Fri, 23 May 2014 14:33:17 -0600 Subject: [maker-devel] gff3_merge on models only for SNAP training? In-Reply-To: References: Message-ID: Yes. It's ok. Non-genic feature lines will be ignored. --Carson From: Felipe Barreto Date: Friday, May 23, 2014 at 2:31 PM To: MAKER group Subject: [maker-devel] gff3_merge on models only for SNAP training? Hi, all, I should have confirmed this well before starting my Maker runs, but better now than never. When generating a merged gff file to be used for SNAP training, is it OK to use the default gff output from gff3_merge, which contains all protein/EST evidence alignments (this is what I did)? Or should I have generated a gene models-only merged gff (using the -g flag) for training? I assume the Maker flag within the larger gff file will allow the subsequent scripts (e.g. maker2zff) to ignore the other alignments, but just wanted to check. Thanks again! Felipe From jacques.dainat at imbim.uu.se Fri May 23 01:56:05 2014 From: jacques.dainat at imbim.uu.se (Jacques Dainat) Date: Fri, 23 May 2014 09:56:05 +0200 Subject: [maker-devel] Possible error in tRNA annotation by maker Message-ID: Hi, I would like to report a possible error that occurs when using the tRNA annotation by maker. I saw the problem in the gff result file. The problem occurs only in, and in all of, the tRNAs that have an intron and are on the + strand. Indeed, in this case the strand of one of the exons seems to be wrong (see the example below). For example, we have:
scaffold6501 maker gene 2126 2230 . + . XXX
scaffold6501 maker tRNA 2126 2230 . + . XXX
scaffold6501 maker exon 2185 2230 . - . XXX
scaffold6501 maker exon 2126 2163 . + . XXX
Theoretically, we should obtain:
scaffold6501 maker gene 2126 2230 . + . XXX
scaffold6501 maker tRNA 2126 2230 . + . XXX
scaffold6501 maker exon 2126 2163 . + . XXX
scaffold6501 maker exon 2185 2230 . + . XXX
kind regards, Jacques Dainat, PhD BILS (Bioinformatics Infrastructure for Life Sciences) Address: (room E10:3312) Uppsala University, BMC Department of Medical Biochemistry Microbiology, Genomics Husargatan 3, box 582 S-75123 Uppsala Sweden From marc.hoeppner at bils.se Tue May 27 02:12:07 2014 From: marc.hoeppner at bils.se (Marc Höppner) Date: Tue, 27 May 2014 10:12:07 +0200 Subject: [maker-devel] Some questions regarding ab-initio training Message-ID: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se> Hi, I wanted to get some feedback regarding the training of ab-initio gene finders - it's not strictly Maker related, but I suppose there are many people on this list that have encountered and solved this issue in one way or another. Specifically, I am trying to train Augustus (and possibly SNAP) for a plant genome. This has always been a very frustrating process for me, but while I have a better idea now how to do it, I still don't get the sort of accuracy that I am hoping for.
A quick run-through of my process: Evidence build with maker on level 1 and 2 proteins from Uniprot + Sanger-sequenced EST data Filtered for Models with an AED <= 0.3 Loaded that into WebApollo, together with an existing reference annotation and the evidence tracks Manually curated/selected 750 gene models using the following rules: - Must have start/stop codon - Must have as many exons as possible - Must agree with evidence - Must be >= 2 kb apart from other gene models (provided as flanking regions for augustus to train intergenic sequence) From these models, I created a GBK file, split it into 650 (train) and 100 (test) models and created a new profile using the documented procedure. But: While the naked ab initio models created through maker get a lot of genes "sort of right", I still see too many issues to be really satisfied. Problems include: - random exon calls which are not supported by any line of evidence (~1 per gene model, I would guess) - poor congruency with some gene models (especially ones not used for training/testing) Is there any best-practice guide on how to improve this? The Augustus website is unfortunately quite poor on detail. My impression so far is that ramping up the number of training models isn't really doing too much beyond a certain point (tried 400, 500 and 750). Regards, Marc Marc P. Hoeppner, PhD Team Leader BILS Genome Annotation Platform Department for Medical Biochemistry and Microbiology Uppsala University, Sweden marc.hoeppner at bils.se From carsonhh at gmail.com Tue May 27 09:25:39 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 27 May 2014 09:25:39 -0600 Subject: [maker-devel] Some questions regarding ab-initio training In-Reply-To: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se> References: <1CD4559D-7A9D-4F8C-92F4-F5228F4E23B8@bils.se> Message-ID: Extra exons can be required for predictors to make sense of a region (they do the best they can). This can be due to imperfect assemblies or repeats.
For plants the repeat database is the one thing that will most affect the annotation quality. You may need to spend some time building the best repeat library you can. The repeat library is the most important thing after training the predictor, because repeats confuse the predictor (sometimes a lot), causing it to behave oddly in those regions (because repeats do encode real proteins and protein fragments). Also, when running now with MAKER, make sure to include the entire proteome of a related species and not just UniProt, and you will get better performance. Now that you have Augustus trained, using it inside of MAKER with an improved repeat library and additional protein evidence should give it the feedback that will allow it to perform better than it would with just naked ab initio prediction. Thanks, Carson On 5/27/14, 2:12 AM, "Marc Höppner" wrote: >Hi, > >I wanted to get some feedback regarding the training of ab-initio gene >finders - it's not strictly Maker related, but I suppose there are many >people on this list that have encountered and solved this issue in one >way or another. > >Specifically, I am trying to train Augustus (and possibly SNAP) for a >plant genome. This has always been a very frustrating process for me, but >while I have a better idea now how to do it, I still don't get the sort >of accuracy that I am hoping for.
A quick run-through of my process: > >Evidence build with maker on level 1 and 2 proteins from Uniprot + >Sanger-sequenced EST data > >Filtered for Models with an AED <= 0.3 > >Loaded that into WebApollo, together with an existing reference >annotation and the evidence tracks > >Manually curated/selected 750 gene models using the following rules: >- Must have start/stop codon >- Must have as many exons as possible >- Must agree with evidence >- Must be >= 2 kb apart from other gene models (provided as flanking >regions for augustus to train intergenic sequence) > >From these models, I created a GBK file, split it into 650 (train) and >100 (test) models and created a new profile using the documented >procedure. > >But: > >While the naked ab initio models created through maker get a lot of genes >"sort of right", I still see too many issues to be really satisfied. >Problems include: > >- random exon calls which are not supported by any line of evidence (~1 >per gene model, I would guess) >- poor congruency with some gene models (especially ones not used for >training/testing) > >Is there any best-practice guide on how to improve this? The Augustus >website is unfortunately quite poor on detail. My impression so far is >that ramping up the number of training models isn't really doing too much >beyond a certain point (tried 400, 500 and 750). > >Regards, > >Marc > > >Marc P.
Hoeppner, PhD >Team Leader >BILS Genome Annotation Platform >Department for Medical Biochemistry and Microbiology >Uppsala University, Sweden >marc.hoeppner at bils.se > > >_______________________________________________ >maker-devel mailing list >maker-devel at box290.bluehost.com >http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org From carsonhh at gmail.com Tue May 27 09:26:25 2014 From: carsonhh at gmail.com (Carson Holt) Date: Tue, 27 May 2014 09:26:25 -0600 Subject: [maker-devel] Possible error in tRNA annotation by maker In-Reply-To: References: Message-ID: Do you have a small test contig I could use to duplicate the error? That will make it easier to fix. Thanks, Carson From: Jacques Dainat Date: Friday, May 23, 2014 at 1:56 AM To: Subject: [maker-devel] Possible error in tRNA annotation by maker Hi, I would like to submit a possible error that occurs by using the tRNA annotation by maker. I saw the problem in the gff result file. The problem occurs in only and for all the tRNA who have an intron and that are in the + strand. Indeed, in this case the strand of one of the exon seems to be wrong (see the example below). As exemple we have: scaffold6501 maker gene 2126 2230 . + . XXX scaffold6501 maker tRNA 2126 2230 . + . XXX scaffold6501 maker exon 2185 2230 . - . XXX scaffold6501 maker exon 2126 2163 . + . XXX Theoretically, we should obtain: scaffold6501 maker gene 2126 2230 . + . XXX scaffold6501 maker tRNA 2126 2230 . + . XXX scaffold6501 maker exon 2126 2163 . + . XXX scaffold6501 maker exon 2185 2230 . + . 
XXX kind regards, Jacques Dainat, PhD BILS (Bioinformatics Infrastructure for Life Sciences) Adress: (room E10:3312) Uppsala University, BMC Department of Medical Biochemistry Microbiology, Genomics Husargatan 3, box 582 S-75123 Uppsala Sweden _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 01:28:14 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 09:28:14 +0200 Subject: [maker-devel] Problem with installation Message-ID: Hello Maker community, I just finished installing Maker and even though everything seems to be okay, when I give ./maker -h or ./maker the program apparently hangs without giving any output or warning or error. Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/. Any ideas? Panos -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 02:26:08 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 10:26:08 +0200 Subject: [maker-devel] General question Message-ID: I'm going through the Maker tutorial and saw that among the input files you give it, there's a fasta file with proteins (the protein=xxx parameter in the maker_opts.ctl file). What exactly are these proteins? I thought Maker both predicts genes (i.e. proteins) and also annotates them. Does it only do annotation of already predicted genes/proteins? But then, why is it using gene predictors like Augustus, SNAP, etc? Thanks, Panos -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From b.cantarel at gmail.com Wed May 28 05:11:18 2014 From: b.cantarel at gmail.com (Brandi Cantarel) Date: Wed, 28 May 2014 06:11:18 -0500 Subject: [maker-devel] General question In-Reply-To: References: Message-ID: Maker's predictions are improved with evidence. These proteins can be from uniprot (I recommend uniprot50) or from a closely related taxa. Maker uses comparisons to these proteins in its prediction. There is more detail on this in the paper. Sent from my iPhone > On May 28, 2014, at 3:26, Panos Ioannidis wrote: > > I'm going through the Maker tutorial and saw that among the input files you give it, there's a fasta file with proteins (the protein=xxx parameter in the maker_opts.ctl file). > > What exactly are these proteins? I thought Maker both predicts genes (i.e. proteins) and also annotates them. Does it only do annotation of already predicted genes/proteins? But then, why is it using gene predictors like Augustus, SNAP, etc? > > Thanks, > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 05:29:43 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 13:29:43 +0200 Subject: [maker-devel] General question In-Reply-To: References: Message-ID: Thanks Brandi. On Wed, May 28, 2014 at 1:11 PM, Brandi Cantarel wrote: > Maker's predictions are improved with evidence. These proteins can be > from uniprot (I recommend uniprot50) or from a closely related taxa. > > Maker uses comparisons to these proteins in its prediction. There is more > detail on this in the paper. 
> > Sent from my iPhone > > On May 28, 2014, at 3:26, Panos Ioannidis wrote: > > I'm going through the Maker tutorial and saw that among the input files > you give it, there's a fasta file with proteins (the protein=xxx parameter in the > maker_opts.ctl file). > > What exactly are these proteins? I thought Maker both predicts genes (i.e. > proteins) and also annotates them. Does it only do annotation of already > predicted genes/proteins? But then, why is it using gene predictors like > Augustus, SNAP, etc? > > Thanks, > Panos From dence at genetics.utah.edu Wed May 28 07:29:58 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 28 May 2014 13:29:58 +0000 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hi Panos, When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running? Thanks, Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote: Hello Maker community, I just finished installing Maker and even though everything seems to be okay, when I give ./maker -h or ./maker the program apparently hangs without giving any output or warning or error. Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/. Any ideas?
Panos _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From panos.ioannidis at gmail.com Wed May 28 07:46:12 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 15:46:12 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hi Daniel, Here's the output of ./Build status ============================================================================== STATUS MAKER v2.31.4 ============================================================================== PERL Dependencies: VERIFIED External Programs: VERIFIED External C Libraries: VERIFIED MPI SUPPORT: DISABLED MWAS Web Interface: DISABLED MAKER PACKAGE: CONFIGURATION OK I think everything looks okay, right? On Wed, May 28, 2014 at 3:29 PM, Daniel Ence wrote: > Hi Panos, When you go to the src directory and type "./Build status", > what message do you get? Also, what version of maker are you running? > > Thanks, > Daniel > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 28, 2014, at 1:28 AM, Panos Ioannidis > wrote: > > Hello Maker community, > > I just finished installing Maker and even though everything seems to be > okay, when I give > > ./maker -h > > or > > ./maker > > the program apparently hangs without giving any output or warning or > error. > > Just so you know, I have installed all dependencies (Perl libraries and > third-party programs) and am executing from bin/, not src/bin/. > > Any ideas? 
> > Panos > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dence at genetics.utah.edu Wed May 28 08:03:33 2014 From: dence at genetics.utah.edu (Daniel Ence) Date: Wed, 28 May 2014 14:03:33 +0000 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hi Panos, So I just tried the commands that you used on my install of maker, and it took a surprisingly long time for the error messages to print. The test that we use in the tutorials (it seems to run faster than running maker with -h or with no options) is maker -CTL, which will create control files that you use to set the many options for maker. Try running ./maker -CTL and let me know whether it creates those files. I guess that it might take more or less time, depending on your machine. ~Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 7:46 AM, Panos Ioannidis > wrote: Hi Daniel, Here's the output of ./Build status ============================================================================== STATUS MAKER v2.31.4 ============================================================================== PERL Dependencies: VERIFIED External Programs: VERIFIED External C Libraries: VERIFIED MPI SUPPORT: DISABLED MWAS Web Interface: DISABLED MAKER PACKAGE: CONFIGURATION OK I think everything looks okay, right? On Wed, May 28, 2014 at 3:29 PM, Daniel Ence > wrote: Hi Panos, When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running? 
Thanks, Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote: Hello Maker community, I just finished installing Maker and even though everything seems to be okay, when I give ./maker -h or ./maker the program apparently hangs without giving any output or warning or error. Just so you know, I have installed all dependencies (Perl libraries and third-party programs) and am executing from bin/, not src/bin/. Any ideas? Panos From carsonhh at gmail.com Wed May 28 08:32:07 2014 From: carsonhh at gmail.com (Carson Holt) Date: Wed, 28 May 2014 08:32:07 -0600 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Perl is a scripting language rather than a compiled language, and one thing that happens when you first use a new module or script is that the interpreter follows the dependency tree, validating that everything executes/loads correctly. Since you installed a number of dependencies and MAKER itself, the first time you launch MAKER Perl has to do this check on the dependency tree. This only happens the first time, and after that Perl remembers it already ran the check, so the dependencies and MAKER will just start from then on. Normally this process takes less than 30 seconds; however, on some systems (especially clusters) there may be a heavy IO burden and this process can take a while. For example, does it take a moment for 'ls -al' to return in some directories rather than returning instantaneously like it is supposed to?
If it takes 3 seconds to return, for example, then each dependency check may take up to 3 seconds. If you just installed a bunch of new perl modules then there may be a hundred or more dependencies that may have to be validated for the first time. --Carson From: Daniel Ence Date: Wednesday, May 28, 2014 at 7:29 AM To: Panos Ioannidis Cc: "" Subject: Re: [maker-devel] Problem with installation Hi Panos, When you go to the src directory and type "./Build status", what message do you get? Also, what version of maker are you running? Thanks, Daniel Daniel Ence Graduate Student dence at genetics.utah.edu Eccles Institute of Human Genetics University of Utah 15 North 2030 East, Room 2100 Salt Lake City, UT 84112-5330 On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote: > Hello Maker community, > > I just finished installing Maker and even though everything seems to be okay, > when I give > > ./maker -h > > or > > ./maker > > the program apparently hangs without giving any output or warning or error. > > Just so you know, I have installed all dependencies (Perl libraries and > third-party programs) and am executing from bin/, not src/bin/. > > Any ideas? > > Panos From panos.ioannidis at gmail.com Wed May 28 10:13:05 2014 From: panos.ioannidis at gmail.com (Panos Ioannidis) Date: Wed, 28 May 2014 18:13:05 +0200 Subject: [maker-devel] Problem with installation In-Reply-To: References: Message-ID: Hello Daniel and Carson, Thank you both for your comments. Carson, I gave it a lot more than 30 seconds.
I gave it about 5 minutes but still nothing happens. Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%. Three things that might be helpful: 1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 " 2. when I give top I see that there are two maker processes running. Is this normal? 3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away. On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote: > Perl is a scripting language rather than a compiled language, and one > thing that happens when you first use a new module or script Is that the > interpreter follows the dependency tree validating that everything > executes/loads correctly. Since you installed a number of dependencies and > MAKER itself, the first time you launch MAKER Perl has to do this check on > the dependency tree. This only happens the first time, and after that Perl > remembers it already ran the check so the dependencies and MAKER will just > start from then on. Normally this proccess takes less than 30 seconds; > however, on some systems (especially clusters) there may a heavy IO burden > and this process can take a while. For example does it take a moment for > 'ls -al' to return in some directories rather than returning > instantaneously like it is supposed to? If it takes 3 seconds to return or > example, then each dependency check may take up to 3 seconds. If you just > installed a bunch of new perl modules then there may be a hundred or more > dependencies that may have to be validated for the first time. > > --Carson > > > > From: Daniel Ence > Date: Wednesday, May 28, 2014 at 7:29 AM > To: Panos Ioannidis > Cc: "" > Subject: Re: [maker-devel] Problem with installation > > Hi Panos, When you go to the src directory and type "./Build status", what > message do you get? 
Also, what version of maker are you running?
>
> Thanks,
> Daniel
>
> Daniel Ence
> Graduate Student
> dence at genetics.utah.edu
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> On May 28, 2014, at 1:28 AM, Panos Ioannidis wrote:
>
> Hello Maker community,
>
> I just finished installing Maker and even though everything seems to be
> okay, when I give
>
> ./maker -h
>
> or
>
> ./maker
>
> the program apparently hangs without giving any output or warning or error.
>
> Just so you know, I have installed all dependencies (Perl libraries and
> third-party programs) and am executing from bin/, not src/bin/.
>
> Any ideas?
>
> Panos
> _______________________________________________
> maker-devel mailing list
> maker-devel at box290.bluehost.com
> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

From carsonhh at gmail.com Wed May 28 10:15:20 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 10:15:20 -0600
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Normally it takes 30 seconds, but if your IO response is slow (i.e. 3 seconds per query, which is why you should do the 'ls -al' test), it can take several minutes because it's an IO issue.

--Carson

From: Panos Ioannidis
Date: Wednesday, May 28, 2014 at 10:13 AM
To: Carson Holt
Cc: Daniel Ence
Subject: Re: [maker-devel] Problem with installation

Hello Daniel and Carson,

Thank you both for your comments.

Carson, I gave it a lot more than 30 seconds. I gave it about 5 minutes but still nothing happens.

Daniel, the same is true for maker -CTL; it appears as if it's doing something, but if you give a top you'll see that the CPU usage is ALWAYS 0%.

Three things that might be helpful:
1. Ctrl-C doesn't work for killing maker; you have to give "kill -9 "
2. When I give top I see that there are two maker processes running. Is this normal?
3. When I press Ctrl-C, the resources in top (labeled VIRT, RES and SHR - I guess that's memory) for one of the two maker processes go to zero, but it doesn't go away.

On Wed, May 28, 2014 at 4:32 PM, Carson Holt wrote:
> Perl is a scripting language rather than a compiled language, and one thing that happens when you first use a new module or script is that the interpreter follows the dependency tree validating that everything executes/loads correctly. Since you installed a number of dependencies and MAKER itself, the first time you launch MAKER Perl has to do this check on the dependency tree. This only happens the first time, and after that Perl remembers it already ran the check so the dependencies and MAKER will just start from then on. Normally this process takes less than 30 seconds; however, on some systems (especially clusters) there may be a heavy IO burden and this process can take a while. For example, does it take a moment for 'ls -al' to return in some directories rather than returning instantaneously like it is supposed to? If it takes 3 seconds to return, for example, then each dependency check may take up to 3 seconds. If you just installed a bunch of new perl modules then there may be a hundred or more dependencies that may have to be validated for the first time.
>
> --Carson

From carsonhh at gmail.com Wed May 28 10:16:58 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 10:16:58 -0600
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

You may also want to look into whether you need to reinstall perl on another drive.

--Carson

From panos.ioannidis at gmail.com Wed May 28 10:25:04 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Wed, 28 May 2014 18:25:04 +0200
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

"ls -al" is instantaneous in all directories...

I'll try installing it on my workstation, although it's not possible to do annotation on my machine! And the machine I currently have it installed on is our server, and I can't really make any big changes there.

Anyway, I'll let you know how it goes.
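
[Archive note] The 'ls -al' latency check discussed in this thread can be sketched roughly as follows. This is only an illustration, assuming a POSIX-style shell where `date +%s` is available; the helper name `time_ls` is hypothetical and not part of MAKER.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAKER): print how many whole seconds
# a long listing of a directory takes. On a responsive filesystem this
# should print 0; several seconds would point to slow IO.
time_ls() {
    dir="${1:-.}"
    start=$(date +%s)
    ls -al "$dir" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Perl looks for modules in the directories listed in @INC, so those are
# reasonable places to test when Perl is available.
if command -v perl > /dev/null 2>&1; then
    perl -e 'print "$_\n" for @INC' | while read -r d; do
        if [ -d "$d" ]; then
            echo "$d: $(time_ls "$d")s"
        fi
    done
fi
```

If every directory reports 0 seconds, as Panos observed, slow IO is unlikely to explain the hang.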
P

From carsonhh at gmail.com Wed May 28 10:28:30 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 10:28:30 -0600
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Try perlbrew to set up your own local version of perl just for your user.
http://perlbrew.pl

--Carson

From fbarreto at ucsd.edu Wed May 28 11:39:45 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Wed, 28 May 2014 10:39:45 -0700
Subject: [maker-devel] Adding non-overlapping models to final set
Message-ID:

Hi, all,

I finished generating Maker gene models. Following suggestions here and from publications, I used IPRscan on the set of non-overlapping ab initio protein models. This identified ~200 models with protein domains, and I would like to add those to my final gene set.

However, I am having trouble figuring out how to use Maker's options to update my final maker_genome.gff file to include these 200 models, without also adding the remaining ~8000 non-overlapping models I don't want. The discussions about the re-annotation options don't seem to get at this.

Do I have to first find a way to create a new gff file containing only the 200 new models, and then simply use gff3_merge with the full genome gff?

At this point, I am not concerned about incorporating IPRscan functional info into the gff file. I simply want to generate an updated (and final) gene set and then move on to functional annotation.

Thanks yet again!

Felipe
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dence at genetics.utah.edu Wed May 28 12:35:06 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Wed, 28 May 2014 18:35:06 +0000
Subject: [maker-devel] Adding non-overlapping models to final set
In-Reply-To:
References:
Message-ID: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>

Hi Felipe,

I'm glad to hear that you got some more genes from IPRscan. If you don't care about getting the functional information from the IPRscan report and into the gff file, then you just need to pull those predictions out from all the ab-initio predictions that you don't care about and put them in a fasta file. Then you put that file in for the "pred_gff" option and set keep_preds=1. That will promote those predictions to full gene models. Then you can merge with your other gff3 file.

~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From carsonhh at gmail.com Wed May 28 12:45:05 2014
From: carsonhh at gmail.com (Carson Holt)
Date: Wed, 28 May 2014 12:45:05 -0600
Subject: [maker-devel] Adding non-overlapping models to final set
In-Reply-To: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>
References: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>
Message-ID:

For convenience you can use the attached script to help pull out the match/match_part features you want from the GFF3 file (or you can pull them out yourself). Then do just as Daniel said: set keep_preds=1, give the selected match/match_part features to pred_gff, and give your current MAKER models to model_gff.

--Carson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gff3_select
Type: application/octet-stream
Size: 3237 bytes
Desc: not available
URL:

From fbarreto at ucsd.edu Wed May 28 14:28:48 2014
From: fbarreto at ucsd.edu (Felipe Barreto)
Date: Wed, 28 May 2014 13:28:48 -0700
Subject: [maker-devel] Adding non-overlapping models to final set
In-Reply-To:
References: <4F6CDFA8-99A3-4D84-882A-C90BA521EEAC@genetics.utah.edu>
Message-ID:

Awesome! Thanks for the tips and script. This should do the trick. Will come back if I get stuck.
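
[Archive note] For readers without the attached gff3_select script, the "pull them out yourself" route might look something like the sketch below. It is not the attached script and makes no claim about that script's interface. It assumes the names in the list file match the `Name=` attribute of the ab initio `match` features, and that each `match` line appears before its `match_part` children. The function name `select_preds` and all file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the match/match_part features for a chosen
# set of predictions.
#   $1 = text file with one prediction name per line
#   $2 = MAKER GFF3 file containing the ab initio predictions
select_preds() {
    awk -F '\t' '
        NR == FNR { want[$1] = 1; next }   # first file: names to keep
        /^#/ || NF < 9 { next }            # skip comments and non-feature lines
        {
            id = ""; name = ""; parent = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)     id     = substr(attrs[i], 4)
                if (attrs[i] ~ /^Name=/)   name   = substr(attrs[i], 6)
                if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
            }
            if ($3 == "match" && (name in want)) { keep[id] = 1; print }
            else if ($3 == "match_part" && (parent in keep)) print
        }
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   select_preds wanted_names.txt genome.all.gff > selected_preds.gff
# Then, as described in the thread, set these in the MAKER control file:
#   model_gff=<your current MAKER models>
#   pred_gff=selected_preds.gff
#   keep_preds=1
```

Note that pred_gff expects GFF3 input, which is why the sketch writes GFF3 features rather than FASTA records.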
Felipe

--
Felipe Barreto
Post-doctoral Scholar
Scripps Institution of Oceanography
University of California, San Diego
La Jolla, CA 92093

From panos.ioannidis at gmail.com Thu May 29 03:21:24 2014
From: panos.ioannidis at gmail.com (Panos Ioannidis)
Date: Thu, 29 May 2014 11:21:24 +0200
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

So I managed to install it on my workstation and it works fine! Thanks for the information on perlbrew. I will also give it a try.

I did a test run on my workstation using just a few contigs and was wondering where the annotation is saved. Is it the gff files (one gff per contig) in the *.maker.output/ directory?

From dence at genetics.utah.edu Thu May 29 08:58:22 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 29 May 2014 14:58:22 +0000
Subject: [maker-devel] Problem with installation
In-Reply-To:
References:
Message-ID:

Hi Panos,

The results are stored in the datastore directory in the "maker.output" directory. You can merge those results into one gff file with the gff3_merge accessory script. It's included in the bin directory.
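
[Archive note] A minimal sketch of the merge step Daniel describes, assuming MAKER wrote its usual `<base>_master_datastore_index.log` file inside the `<base>.maker.output/` directory and that gff3_merge from MAKER's bin/ directory is on your PATH. The helper name `find_datastore_index` and the directory name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: locate the master datastore index inside a
# <base>.maker.output/ directory and print its path.
find_datastore_index() {
    outdir="$1"
    for f in "$outdir"/*_master_datastore_index.log; do
        if [ -e "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "no *_master_datastore_index.log found in $outdir" >&2
    return 1
}

# Usage (directory name is a placeholder):
#   index=$(find_datastore_index genome.maker.output) && gff3_merge -d "$index"
```

The `-d` option points gff3_merge at the datastore index so that the per-contig GFF3 files are combined into one file.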
~Daniel

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From caigh02 at gmail.com Thu May 29 13:15:39 2014
From: caigh02 at gmail.com (Guohong Cai)
Date: Thu, 29 May 2014 15:15:39 -0400
Subject: [maker-devel] maker gene order in gff output
Message-ID:

Hi Carson,

In the maker output, the genes have names like "genemark-scaffold17-processed-gene-0.0". Many users will probably eventually give the genes different names, such as GSGxxx (Genus Species Gene #).

In the gff output, the scaffolds are not in order (either numerical order or the order of the input assembly). On the same scaffold, the genes are not listed in order either. This makes it a little harder for users to change the gene IDs. We may want to name the genes in order from scaffold 1 to scaffold N and, on each scaffold, from left to right (e.g. GSG00001, GSG00002).

Do you think you can order the genes in the gff output? For example, order the scaffolds according to the input genome assembly, and on each scaffold, order the genes from 5' to 3'.

Thanks.

Guohong
Rutgers University

From daniel.standage at gmail.com Thu May 29 14:37:24 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Thu, 29 May 2014 16:37:24 -0400
Subject: [maker-devel] Question about 'keep_pred' setting
Message-ID:

Good afternoon!

I have a quick question about the keep_pred setting in Maker. In older versions of Maker, this was a binary value indicating whether unsupported predictions should be kept. I'm now using Maker 2.31.3, where it's described as a scaled value indicating a "concordance threshold" for unsupported predictions. As far as I can tell from the code, however, it's still treated in the same way as before.

Could you briefly describe the motivation for this setting and the intended (although possibly incomplete) change in its functionality in new versions of Maker?

Thanks!

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

From dence at genetics.utah.edu Thu May 29 14:44:28 2014
From: dence at genetics.utah.edu (Daniel Ence)
Date: Thu, 29 May 2014 20:44:28 +0000
Subject: [maker-devel] Question about 'keep_pred' setting
In-Reply-To:
References:
Message-ID: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu>

Hi Daniel,

Your interpretation of the code is correct. keep_preds is a binary setting. There's been some discussion behind the scenes about making it more flexible, but that hasn't been implemented yet. We need to fix what it says in the control file.

Daniel Ence
Graduate Student
dence at genetics.utah.edu
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

From daniel.standage at gmail.com Thu May 29 14:47:47 2014
From: daniel.standage at gmail.com (Daniel Standage)
Date: Thu, 29 May 2014 16:47:47 -0400
Subject: [maker-devel] Question about 'keep_pred' setting
In-Reply-To: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu>
References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu>
Message-ID:

Thanks. Just curious: how would the intended behavior differ if keep_pred was set to, say, 0.5, instead of 0 or 1?

--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Thu, May 29, 2014 at 4:44 PM, Daniel Ence wrote:
> Hi Daniel,
>
> Your interpretation of the code is correct. keep_preds is a binary
> setting. There's been some discussion behind-the-scenes about making it
> more flexible, but that hasn't been implemented yet. We need to fix what it
> says in the control file.
>
>
> Daniel Ence
> Graduate Student
> dence at genetics.utah.edu
> Eccles Institute of Human Genetics
> University of Utah
> 15 North 2030 East, Room 2100
> Salt Lake City, UT 84112-5330
>
> On May 29, 2014, at 2:37 PM, Daniel Standage wrote:
>
> Good afternoon!
>
> I have a quick question about the keep_pred setting in Maker. In older
> versions of Maker, this was a binary value indicating whether unsupported
> predictions should be kept. I'm now using Maker 2.31.3, where it's
> described as a scaled value indicating a "concordance threshold" for
> unsupported predictions. As far as I can tell from the code, however, it's
> still treated in the same way as before.
> > Could you briefly describe the motivation for this setting and the > intended (although possibly incomplete) change in its functionality in new > versions of Maker? > > Thanks! > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > _______________________________________________ > maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 29 15:43:35 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 May 2014 15:43:35 -0600 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: There is a hidden score called abAED that measures concordance among the ab initio gene predictors. The idea was to keep ab initio models that are the same across multiple ab initio predictors if their group concordance is high enough, and to drop ab initio predictions that only occur in one ab initio predictor. Currently the option is all or nothing; the threshold would give more fine-grained control over keeping just some unsupported predictions. --Carson From: Daniel Standage Date: Thursday, May 29, 2014 at 2:47 PM To: Daniel Ence Cc: Maker Mailing List Subject: Re: [maker-devel] Question about 'keep_pred' setting Thanks. Just curious: how would the intended behavior differ if keep_pred was set to, say, 0.5, instead of 0 or 1? -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 4:44 PM, Daniel Ence wrote: > Hi Daniel, > > Your interpretation of the code is correct. keep_preds is a binary setting. > There's been some discussion behind-the-scenes about making it more flexible, > but that hasn't been implemented yet.
We need to fix what it says in the > control file. > > > Daniel Ence > Graduate Student > dence at genetics.utah.edu > Eccles Institute of Human Genetics > University of Utah > 15 North 2030 East, Room 2100 > Salt Lake City, UT 84112-5330 > > On May 29, 2014, at 2:37 PM, Daniel Standage > wrote: > >> Good afternoon! >> >> I have a quick question about the keep_pred setting in Maker. In older >> versions of Maker, this was a binary value indicating whether unsupported >> predictions should be kept. I'm now using Maker 2.31.3, where it's described >> as a scaled value indicating a "concordance threshold" for unsupported >> predictions. As far as I can tell from the code, however, it's still treated >> in the same way as before. >> >> Could you briefly describe the motivation for this setting and the intended >> (although possibly incomplete) change in its functionality in new versions of >> Maker? >> >> Thanks! >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Thu May 29 16:29:39 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Thu, 29 May 2014 18:29:39 -0400 Subject: [maker-devel] Question about 'keep_pred' setting In-Reply-To: References: <4D18DA6B-C625-4FA9-8E11-FB7CC0DB7CCA@genetics.utah.edu> Message-ID: Ah, that makes sense. Thanks! -- Daniel S. Standage Ph.D. 
Candidate Computational Genome Science Laboratory Indiana University On Thu, May 29, 2014 at 5:43 PM, Carson Holt wrote: > There is a hidden score called abAED that measures concordance among the > ab initio gene predictors . The idea was to have ab initio models that are > the same across multiple ab initio predictor be kept if they're group > concordance is high enough, then drop ab initio predictions that only > happen in one ab initio predictor. Currently the option is all or nothing, > the threshold would give a more fine grained control of keeping just some > unsupported predictions. > > --Carson > > > From: Daniel Standage > Date: Thursday, May 29, 2014 at 2:47 PM > To: Daniel Ence > Cc: Maker Mailing List > Subject: Re: [maker-devel] Question about 'keep_pred' setting > > Thanks. > > Just curious: how would the intended behavior differ if keep_pred was set > to, say, 0.5, instead of 0 or 1? > > > -- > Daniel S. Standage > Ph.D. Candidate > Computational Genome Science Laboratory > Indiana University > > > On Thu, May 29, 2014 at 4:44 PM, Daniel Ence > wrote: > >> Hi Daniel, >> >> Your interpretation of the code is correct. keep_preds is a binary >> setting. There's been some discussion behind-the-scenes about making it >> more flexible, but that hasn't been implemented yet. We need to fix what it >> says in the control file. >> >> >> Daniel Ence >> Graduate Student >> dence at genetics.utah.edu >> Eccles Institute of Human Genetics >> University of Utah >> 15 North 2030 East, Room 2100 >> Salt Lake City, UT 84112-5330 >> >> On May 29, 2014, at 2:37 PM, Daniel Standage >> wrote: >> >> Good afternoon! >> >> I have a quick question about the keep_pred setting in Maker. In older >> versions of Maker, this was a binary value indicating whether unsupported >> predictions should be kept. I'm now using Maker 2.31.3, where it's >> described as a scaled value indicating a "concordance threshold" for >> unsupported predictions. 
As far as I can tell from the code, however, it's >> still treated in the same way as before. >> >> Could you briefly describe the motivation for this setting and the >> intended (although possibly incomplete) change in its functionality in new >> versions of Maker? >> >> Thanks! >> >> -- >> Daniel S. Standage >> Ph.D. Candidate >> Computational Genome Science Laboratory >> Indiana University >> _______________________________________________ >> maker-devel mailing list >> maker-devel at box290.bluehost.com >> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org >> >> >> > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsonhh at gmail.com Thu May 29 21:11:11 2014 From: carsonhh at gmail.com (Carson Holt) Date: Thu, 29 May 2014 21:11:11 -0600 Subject: [maker-devel] maker gene order in gff output In-Reply-To: References: Message-ID: The maker_map_ids script that comes with MAKER can be used to generate new names of the style PREFIX###### or PREFIX_######. You can use the --sort_order flag to sort the contigs in whatever your preferred order is before generating the new names. Then use map_gff_ids and map_fasta_ids to change the names in the gff3 and fasta files, respectively. Here is some extra information from a tutorial where the maker_map_ids script is used --> http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Post_Processing_of_Annotations --Carson From: Guohong Cai Date: Thursday, May 29, 2014 at 1:15 PM Subject: [maker-devel] maker gene order in gff output Hi Carson, In the maker output, the genes have names like "genemark-scaffold17-processed-gene-0.0". Many users probably will eventually give the genes different names, such as GSGxxx (Genus Species Gene #).
In the gff output, the scaffolds are not in order (either numerical order or the order of input assembly). On the same scaffold, the genes are not listed in order either. This will make it a little harder for users to change the gene IDs. We may name the genes in order from scaffold 1 to scaffold N, and on each scaffold, order the genes from left to right (e.g. GSG00001, GSG00002). Do you think you can order the genes in the gff output? For example, order the scaffolds according to the input genome assembly, and on each scaffold, order the genes from 5' to 3'. Thanks. Guohong Rutgers University _______________________________________________ maker-devel mailing list maker-devel at box290.bluehost.com http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From caigh02 at gmail.com Fri May 30 05:40:17 2014 From: caigh02 at gmail.com (Guohong Cai) Date: Fri, 30 May 2014 06:40:17 -0500 Subject: [maker-devel] maker gene order in gff output In-Reply-To: References: Message-ID: Great! Guohong On Thu, May 29, 2014 at 10:11 PM, Carson Holt wrote: > The maker_map_ids script that comes with MAKER can be used to generate new > names of the style PREFIX###### or PREFIX_######. You can use > the --sort_order flag to sort the contigs in whatever your preferred order > is before generating the new names. > > Then use the map_gff_ids and map_fasta_ids to change the names in the > gff3 and fasta files respectively.
> > Here is some extra information from a tutorial where the maker_map_ids > script is used --> > http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014#Post_Processing_of_Annotations > > --Carson > > > From: Guohong Cai > Date: Thursday, May 29, 2014 at 1:15 PM > To: "" > Subject: [maker-devel] maker gene order in gff output > > Hi Carson, > > In the maker output, the genes have names like "genemark-scaffold17- > processed-gene-0.0". Many users probably will eventually give the genes > different names, such as GSGxxx (Genus Species Gene #). > > In the gff output, the scaffolds are not in order (either numerical order > or the order of input assembly). On the same scaffold, the genes are not > listed in order either. This will make it a little harder for users to > change the gene IDs. We may name the genes in order from scaffold 1 to > scaffold N, and and each scaffold, order the genes from left to right, e.g > GSG00001, GSG00002). Do you think you can order the genes in the gff > output? For example, order the scaffolds according to the input genome > assembly, and on each scaffold, order the genes from 5' to 3'. > > Thanks. > > Guohong > Rutgers University > _______________________________________________ maker-devel mailing list > maker-devel at box290.bluehost.com > http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.standage at gmail.com Sat May 31 09:23:23 2014 From: daniel.standage at gmail.com (Daniel Standage) Date: Sat, 31 May 2014 11:23:23 -0400 Subject: [maker-devel] Precomputed alignments Message-ID: Hello again! About a year ago I asked about using precomputed alignments with Maker. The thread quickly took a different direction as we tried to track down other issues, and I never got the thread back on its original track. 
So, to return to the original question, what exactly is required when providing pre-computed alignments in GFF3 format? For example, does it affect Maker's behavior whether a score is given? The "Target" attribute? The "Gap" attribute? Thanks! -- Daniel S. Standage Ph.D. Candidate Computational Genome Science Laboratory Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: